planet code4lib

Hydra Project: Hydra Connect #2 presentations and posters

Tue, 2014-10-07 15:27

Poster images from the “show and tell” session, and slide packs from many presentations given at Hydra Connect #2 last week can be found linked from the Hydra wiki.  The wiki front page also has a nice group photo – the full size version, should you want it, is linked from the conference program at the time it was taken – 12:30 on Wednesday.

David Rosenthal: Economies of Scale in Peer-to-Peer Networks

Tue, 2014-10-07 15:00
In a recent IEEE Spectrum article entitled Escape From the Data Center: The Promise of Peer-to-Peer Cloud Computing, Ozalp Babaoglu and Moreno Marzolla (BM) wax enthusiastic about the potential for Peer-to-Peer (P2P) technology to eliminate the need for massive data centers. Even more exuberance can be found in Natasha Lomas' Techcrunch piece The Server Needs To Die To Save The Internet (LM) about the MaidSafe P2P storage network. I've been working on P2P technology for more than 16 years, and although I believe it can be very useful in some specific cases, I'm far less enthusiastic about its potential to take over the Internet.

Below the fold I look at some of the fundamental problems standing in the way of a P2P revolution, and in particular at the issue of economies of scale. After all, I've just written a post about the huge economies that Facebook's cold storage technology achieves by operating at data center scale.

Economies of Scale

Back in April, discussing a vulnerability of the Bitcoin network, I commented:

Gradually, the economies of scale you need to make money mining Bitcoin are concentrating mining power in fewer and fewer hands. I believe this centralizing tendency is a fundamental problem for all incentive-compatible P2P networks. ... After all, the decentralized, distributed nature of Bitcoin was supposed to be its most attractive feature.

In June, discussing Permacoin, I returned to the issue of economies of scale:

increasing returns to scale (economies of scale) pose a fundamental problem for peer-to-peer networks that do gain significant participation. One necessary design goal for networks such as Bitcoin is that the protocol be incentive-compatible, or as Ittay Eyal and Emin Gun Sirer (ES) express it:

the best strategy of a rational minority pool is to be honest, and a minority of colluding miners cannot earn disproportionate benefits by deviating from the protocol

They show that the Bitcoin protocol was, and still is, not incentive-compatible.

Even if the protocol were incentive-compatible, the implementation of each miner would, like almost all technologies, be subject to increasing returns to scale.

Since then I've become convinced that this problem is indeed fundamental. The simplistic version of the problem is this:
  • The income to a participant in a P2P network of this kind should be linear in their contribution of resources to the network.
  • The costs a participant incurs by contributing resources to the network will be less than linear in their resource contribution, because of the economies of scale.
  • Thus the proportional profit margin a participant obtains will increase with increasing resource contribution.
  • Thus the effects described in Brian Arthur's Increasing Returns and Path Dependence in the Economy will apply, and the network will be dominated by a few, perhaps just one, large participant.
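
The simplistic version above can be illustrated with a small numeric sketch. The function names, the unit prices and the cost exponent of 0.9 are assumptions chosen purely for illustration; they are not measurements from Bitcoin or any other network.

```python
# Illustrative only: income is linear in contribution, cost is sub-linear.
# The exponent 0.9 and the unit prices are assumed values, not measurements.

def income(contribution, reward_per_unit=1.0):
    """Income that is linear in the resources contributed."""
    return reward_per_unit * contribution

def cost(contribution, unit_cost=0.9, scale_exponent=0.9):
    """Cost that grows more slowly than contribution (economies of scale)."""
    return unit_cost * contribution ** scale_exponent

def profit_margin(contribution):
    """Profit expressed as a fraction of income."""
    return (income(contribution) - cost(contribution)) / income(contribution)

for size in (1, 10, 100, 1000):
    print(f"contribution={size:>5}  margin={profit_margin(size):.3f}")
```

Under these assumed numbers the margin rises steadily with the size of the contribution, which is the advantage that lets the largest participant pull away from the rest.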
The advantages of P2P networks arise from a diverse network of small, roughly equal resource contributors. Thus it seems that P2P networks which have the characteristics needed to succeed (by being widely adopted) also inevitably carry the seeds of their own failure (by becoming effectively centralized). Bitcoin is an example of this. Some questions arise:
  • Does incentive-compatibility imply income linear in contribution?
  • If not, are there incentive-compatible ways to deter large contributions?
  • The simplistic version is, in effect, a static view of the network. Are there dynamic effects also in play?
Does incentive-compatibility imply income linear in contribution? Clearly, the reverse is true. If income is linear in, and solely dependent upon, contribution there is no way for a colluding minority of participants to gain more than their just reward. If, however:
  • Income grows faster than linearly with contribution, a group of participants can pool their contributions, pretend to be a single participant, and gain more than their just reward.
  • Income grows more slowly than linearly with contribution, a group of participants that colluded to appear as a single participant would gain less than their just reward.
So it appears that income linear in contribution is the limiting case; anything faster is not incentive-compatible.
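
A minimal sketch of the pooling argument, assuming a reward of the form contribution ** alpha. That functional form, the function names and the member sizes are all hypothetical, chosen only to show the three cases:

```python
# Illustrative only: reward(c) = c ** alpha is an assumed functional form.

def reward(contribution, alpha):
    """Reward paid by the network for a given contribution."""
    return contribution ** alpha

def pooling_gain(contributions, alpha):
    """Reward of the pool acting as one participant, minus the sum of the
    rewards its members would have earned separately."""
    pooled = reward(sum(contributions), alpha)
    separate = sum(reward(c, alpha) for c in contributions)
    return pooled - separate

members = [1.0, 2.0, 3.0]
for alpha in (0.8, 1.0, 1.2):
    gain = pooling_gain(members, alpha)
    print(f"alpha={alpha}: gain from pooling = {gain:+.3f}")
```

With alpha above 1 the pool earns more than its members would separately, with alpha equal to 1 pooling changes nothing, and with alpha below 1 the pool earns less.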

Are there incentive-compatible ways to deter large contributions? In principle, the answer is yes. Arranging that income grows more slowly than contribution, and depends on nothing else, will do the trick. The problem lies in doing so.

Source: bitcoincharts.com

The actual income received by a participant is the value of the reward the network provides in return for the contribution of resources (in this example, Bitcoin), less the costs incurred in contributing those resources (in the Bitcoin case, the capital and running costs of the mining hardware). As the value of Bitcoins collapsed (as I write, BTC is about $320, down from about $1200 11 months ago and half its value in August) many smaller miners discovered that mining wasn't worth the candle.

The network has to arrange not just that the reward grows more slowly than the contribution, but that it grows more slowly than the cost of the contribution to any participant. If there is even one participant whose rewards outpace their costs, Brian Arthur's analysis shows they will end up dominating the network. Herein lies the rub. The network does not know what an individual participant's costs, or even  the average participant's costs, are and how they grow as the participant scales up their contribution.

So the network would have to err on the safe side, and make rewards grow very slowly with contribution, at least above a certain minimum size. Doing so would mean few if any participants above the minimum contribution, making growth dependent entirely on recruiting new participants. This would be hard because their gains from participation would be limited to the minimum reward. It is clear that mass participation in the Bitcoin network was fuelled by the (unsustainable) prospect of large gains for a small investment.

A network that assured incentive-compatibility in this way would not succeed, because the incentives would be so limited. A network that allowed sufficient incentives to motivate mass participation, as Bitcoin did, would share Bitcoin's vulnerability to domination by, as at present, two participants (pools, in Bitcoin's case).

Are there dynamic effects also in play? As well as increasing returns to scale, technology markets exhibit decreasing returns through time. Bitcoin is an extreme example of this. Investment in Bitcoin mining hardware has a very short productive life:
the overall network hash rate has been doubling every 3-4 weeks, and therefore, mining equipment has been losing half its production capability within the same time frame. After 21-28 weeks (7 halvings), mining rigs lose 99.3% of their value.

This effect is so strong that it poses temptations for the hardware manufacturers that some have found impossible to resist. The FBI recently caught Butterfly Labs using hardware that customers had bought and paid for to mine on their own behalf for a while before shipping it to the customers. They thus captured the most valuable week or so of the hardware's short useful life for themselves.
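
As a quick check on the halving arithmetic in the quotation above: seven halvings leave 1/128 of the original production capability, a loss of roughly 99.2%, close to the quoted 99.3%. The function name below is made up for this sketch.

```python
def remaining_fraction(halvings):
    """Fraction of the original production capability left after n halvings."""
    return 0.5 ** halvings

HALVINGS = 7
for weeks_per_halving in (3, 4):
    weeks = HALVINGS * weeks_per_halving
    lost = 1 - remaining_fraction(HALVINGS)
    print(f"after {weeks} weeks: {lost:.1%} of capability lost")
```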

Source: blockchain.info

Even with technology improvement rates much lower than the Bitcoin network hash rate increase, such as Moore's Law or Kryder's Law, where the useful life of hardware is much longer than 6 months, this effect can be significant. When new, more efficient technology is introduced, thus reducing the cost per unit contribution to a P2P network, it does not become instantly available to all participants. As manufacturing ramps up, the limited supply preferentially goes to the manufacturer's best customers, who would be the largest contributors to the P2P network. By the time supply has increased so that smaller contributors can enjoy the lower cost per unit contribution, the most valuable part of the technology's useful life is over.

Early availability of new technology acts to reduce the costs of the larger participants, amplifying their economies of scale. This effect must be very significant in Bitcoin mining, as Butterfly Labs noticed. At pre-2010 Kryder rates it would be quite noticeable, since storage media service lives were less than 60 months. At the much lower Kryder rates projected by the industry, storage media lifetimes will be extended and the effect will be correspondingly smaller.

Trust

BM admit that there are significant unresolved trust issues in P2P technology:
The people using such a cloud must trust that none of the many strangers operating it will do something malicious. And the providers of equipment must trust that the users won’t hog computer time.

These are formidable problems, which so far do not have general solutions. If you just want to store data in a P2P cloud, though, things get easier: The system merely has to break up the data, encrypt it, and store it in many places.

Unfortunately, even for storage this is inadequate. The system cannot trust the peers claiming to store the shards of the encrypted data but must verify that they actually are storing them. This is a resource-intensive process. Permacoin's proposal, to re-purpose resources already being expended elsewhere, is elegant but unlikely to be successful. Worse, the verification process consumes not just resources, but time. At each peer there is necessarily a window of time between successive verifications. During that time the system believes the peer has a good copy of the shard, but it might no longer have one.

Edge of the internet

P2P enthusiasts describe the hardware from which their network is constructed in similar terms. Here is BM:

the P2P cloud is made up of a diverse collection of different people’s computers or game consoles or whatever

and here is LM:

Users of MaidSafe’s network contribute unused hard drive space, becoming the network’s nodes. It’s that pooling — or, hey, crowdsourcing — of many users’ spare computing resource that yields a connected storage layer that doesn’t need to centralize around dedicated datacenters.

When the idea of P2P networks started in the 90s:

Their model of the edge of the Internet was that there were a lot of desktop computers, continuously connected and powered-up, with low latency and no bandwidth charges, and with 3.5" hard disks that were mostly empty. Since then, the proportion of the edge with these characteristics has become vanishingly small. The edge is now intermittently powered up and connected, with bandwidth charges, and only small amounts of local storage.

Monetary rewards

This means that, if the network is to gain mass participation, the majority of participants cannot contribute significant resources to it; they don't have suitable resources to contribute. They will have to contribute cash. This in turn means that there must be exchanges, converting between the rewards for contributing resources and cash, allowing the mass of resource-poor participants to buy from the few resource-rich participants.

Both Permacoin and MaidSafe envisage such exchanges, but what they don't seem to envisage is the effect on customers of the kind of volatility seen in the Bitcoin graph above. Would you buy storage from a service with this price history, or from Amazon? What exactly is the value to the mass customer of paying a service such as MaidSafe, by buying SafeCoin on an exchange, instead of paying Amazon directly, that would overcome the disadvantage of the price volatility?

As we see with Bitcoin, a network whose rewards can readily be converted into cash is subject to intense attack, and attracts participants ranging from sleazy to criminal. Despite its admirably elegant architecture, Bitcoin has suffered from repeated vulnerabilities. Although P2P technology has many advantages in resisting attack, especially the elimination of single points of failure and centralized command and control, it introduces a different set of attack vectors.

Measuring contributions

Discussion of P2P storage networks tends to assume that measuring the contribution a participant supplies in return for a reward is easy. A Gigabyte is a Gigabyte after all. But compare two Petabytes of completely reliable and continuously available storage, one connected to the outside world by a fiber connection to a router near the Internet's core, and the other connected via 3G. Clearly, the first has higher bandwidth, higher availability and lower cost per byte transferred, so its contribution to serving the network's customers is vastly higher. It needs a correspondingly greater reward.

In fact, networks would need to reward many characteristics of a peer's storage contribution as well as its size:
  • Reliability
  • Availability
  • Bandwidth
  • Latency
Measuring each of these parameters, and establishing "exchange rates" between them, would be complex, would lead to a very mixed marketing message, and would be the subject of disputes. For example, the availability, bandwidth and latency of a network resource depend on the location in the network from which the resource is viewed, so there would be no consensus among the peers about these parameters.

Conclusion

While it is clear that P2P storage networks can work, and can even be useful tools for small communities of committed users, the non-technical barriers to widespread adoption are formidable. They have been effective in preventing widespread adoption since the late 90s, and the evolution of the Internet has since raised additional barriers.

Terry Reese: MarcEdit 6 Update (10/6/2014)

Tue, 2014-10-07 13:47

I sent this note to the MarcEdit listserv late last night (or rather, early this morning), but forgot to post it here. Over the weekend, the Ohio State University Libraries hosted our second annual hackathon on the campus. It’s been a great event, and this year, I had one of the early morning shifts (12 am-5 am) so I decided to use the time to do a little hacking myself. Here’s a list of the changes:

  • Bug Fix: Merge Records Function: When processing using the control number option (or MARC21 primarily utilizing control numbers for matching) the program could merge incorrect data if large numbers of merged records existed without the data specified to be merged.  The tool would pull data from the previous record used and add that data to the matches.  This has been corrected.
  • Bug Fix: Network Task Directory: This tool was always envisioned as one that individuals would point at an existing folder.  However, if the folder didn’t exist prior to pointing to the location, the tool wouldn’t index new tasks.  This has been fixed.
  • Bug Fix: Task Manager (Importing new tasks) — When tasks were imported with multiple referenced task lists, the list could be unassociated from the master task.  This has been corrected.
  • Bug Fix:  If the plugins folder doesn’t exist, the current Plugin Manager doesn’t create one when adding new plugins.  This has been corrected.
  • Bug Fix: MarcValidator UI issue:  When resizing the form, the clipboard link wouldn’t move appropriately.  This has been fixed.
  • Bug Fix: Build Links Tool: Relator terms in the 1xx and 7xx fields were causing problems.  This has been corrected.
  • Bug Fix: RDA Helper: When parsing 260 fields with multiple copyright dates, the process would only handle one of the dates.  The process has been updated to handle all copyright values embedded in the 260$c.
  • Bug Fix: SQL Explorer:  The last build introduced a regression error so that when using the non-expanded SQL table schema, the program would crash.  This has been corrected.
  • Enhancement:  SQL Explorer expanded schema has been enhanced to include a column id to help track column value relationships.
  • Enhancement: Z39.50 Cataloging within the MarcEditor: When selecting the Z39.50/SRU Client, the program now seamlessly allows users to search using the Z39.50 client and automatically load the results directly into the open MarcEditor window.

Two other specific notes.  First, a few folks on the listserv have noted trouble getting MarcEdit to run on a Mac.  The issue appears to be MONO related.  Version 3.8.0 appears to have neglected to include a file in the build (which caused GUI operations to fail), and 3.10.0 brings the file back, but there was a build error with the component so the issue continues.  The problems are noted in their release notes as known issues and the bug tracker seems to suggest that this has been corrected in the alpha channels, but that doesn’t help anyone right now.  So, I’ve updated the Mac instructions to include a link to MONO 3.6.0, the last version tested as a standalone install that I know works.  From now on, I will include the latest MONO version tested, and a link to the runtime, to hopefully avoid this type of confusion in the future.

Second – I’ve created a nifty plugin related to the LibHub project.  I’ve done a little video recording and will be making that available shortly.  Right now, I’m waiting on some feedback.  The plugin will be initially released to LibHub partners to provide a way for them to move any data into the project for evaluation – but hopefully, in time, it can be made more widely available.

Updates can be downloaded automatically via MarcEdit, or can be found at:

Please remember, if you are running a very old copy of MarcEdit 5.8 or lower, it is best practice to uninstall the application prior to installing 6.0.



Thom Hickey: Another JSON encoding for MARC data

Tue, 2014-10-07 13:42

You might think that a well understood format such as MARC would have a single straight-forward way of being represented in JSON.  Not so!  There are lots of ways of doing it, all with their own advantages (see some references below).  Still, I couldn't resist creating yet another.

This encoding grew out of some experimentation with Go (Golang), in which encoding MARC in JSON was one of my test cases, as was the speed at which the resulting encoding could be processed.  Another inspiration was Rich Hickey's ideas about the relationship of data and objects:

...the use of objects to represent simple informational data is almost criminal in its generation of per-piece-of-information micro-languages, i.e. the class methods, versus far more powerful, declarative, and generic methods like relational algebra. Inventing a class with its own interface to hold a piece of information is like inventing a new language to write every short story.

That said, how to represent the data still leaves lots of options, as the multiple encodings of MARC into JSON show.

Go's emphasis on strict matching of types pushed me towards a very flat structure:

  • The record is encoded as an array of objects
  • Each object has a 'Type' and represents either the leader or a field

Here are examples of the different fields:

{"Type":"leader", "Data": "the leader goes here"}

{"Type":"cfield", "Tag":"001", "Data":"12345"}

{"Type":"dfield", "Tag":"245", "Inds":"00", "Data":"aThis is a title$bSubtitle"}

Note that the subfields do not get their own objects.  They are concatenated together into one string using standard MARC subfield delimiters (represented by a $ above), essentially the way they appear in an ISO 2709 encoding.  In Python (and in Go) it is easy to split these strings on the delimiter into subfields as needed.  
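
A minimal sketch of that split in Python. The helper name split_subfields is made up for illustration, and it assumes (as in the examples here) that each chunk begins with its one-character subfield code and that the real delimiter is U+001F:

```python
SUBFIELD_DELIMITER = "\x1f"  # MARC subfield delimiter (shown as $ in the short examples)

def split_subfields(data, delimiter=SUBFIELD_DELIMITER):
    """Return (code, value) pairs from a concatenated dfield Data string.

    Assumes each chunk starts with its one-character subfield code.
    """
    return [(chunk[0], chunk[1:]) for chunk in data.split(delimiter) if chunk]

field = {
    "Type": "dfield",
    "Tag": "245",
    "Inds": "14",
    "Data": "aThe freewheelin' Bob Dylan\x1fh[sound recording].",
}

for code, value in split_subfields(field["Data"]):
    print(code, value)
```

Because the split happens only when a caller asks for it, fields that are merely copied or filtered never pay the parsing cost.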

In addition to making it easy to import the JSON structure into Go (everything is easy in Python), the lack of structure makes reading and writing the list of fields very fast and simple. The main HBase table that supports WorldCat now has some 1.7 billion rows, so fast processing is essential, and we find that processing this encoding is much faster than processing the XML representation.  Although we do put the list of fields into a Python object, that object is derived from the list itself, so we can treat it as such, including adding new fields (and Types) as needed, which then get automatically carried along in the exported JSON.
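
One way such a list-derived object might look is sketched below. The class name Record, its helper methods and the "admin" Type are hypothetical, invented for this illustration; this is not the code actually used with WorldCat.

```python
import json

class Record(list):
    """A MARC record held as a plain list of field dicts, plus small helpers."""

    def leader(self):
        """Return the leader string, or None if the record has no leader."""
        for field in self:
            if field.get("Type") == "leader":
                return field["Data"]
        return None

    def fields(self, tag):
        """Return every field carrying the given tag."""
        return [field for field in self if field.get("Tag") == tag]

source = """[
  {"Type": "leader", "Data": "the leader goes here"},
  {"Type": "cfield", "Tag": "001", "Data": "12345"},
  {"Type": "dfield", "Tag": "245", "Inds": "00", "Data": "aThis is a title$bSubtitle"}
]"""

record = Record(json.loads(source))

# A non-MARC field with a made-up Type is just another list entry.
record.append({"Type": "admin", "Data": "example administrative note"})

# Because Record is a list, json.dumps serializes it without extra code.
exported = json.dumps(record)
print(exported)
```

Since the object is still a list, slicing, iteration and appending behave as usual, and anything appended travels along when the record is exported.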

We are also finding that a simple flat structure makes it easy to add information (e.g. administrative metadata) that doesn't fit into standard MARC without effort.

Here are a few MARC in JSON references (I know there have been others in the past).  As far as I can tell, Ross's is the most popular:

Ross Singer:

Clay Fouts:

Galen Charlton:

Bill Dueber:

A more general discussion by Jakob Voss

Here is a full example of a record using the same example Ross Singer uses (although the record itself appears to have changed):

[{"Data": "01471cjm a2200349 a 4500", "Type": "leader"},
{"Data": "5674874", "Tag": "001", "Type": "cfield"},
{"Data": "20030305110405.0", "Tag": "005", "Type": "cfield"},
{"Data": "sdubsmennmplu", "Tag": "007", "Type": "cfield"},
{"Data": "930331s1963 nyuppn eng d", "Tag": "008", "Type": "cfield"},
{"Data": "9(DLC) 93707283", "Tag": "035", "Type": "dfield", "Inds": " "},
{"Data": "a7\u001fbcbc\u001fccopycat\u001fd4\u001fencip\u001ff19\u001fgy-soundrec", "Tag": "906", "Type": "dfield", "Inds": " "},
{"Data": "a 93707283 ", "Tag": "010", "Type": "dfield", "Inds": " "},
{"Data": "aCS 8786\u001fbColumbia", "Tag": "028", "Type": "dfield", "Inds": "02"},
{"Data": "a(OCoLC)13083787", "Tag": "035", "Type": "dfield", "Inds": " "},
{"Data": "aOClU\u001fcDLC\u001fdDLC", "Tag": "040", "Type": "dfield", "Inds": " "},
{"Data": "deng\u001fgeng", "Tag": "041", "Type": "dfield", "Inds": "0 "},
{"Data": "alccopycat", "Tag": "042", "Type": "dfield", "Inds": " "},
{"Data": "aColumbia CS 8786", "Tag": "050", "Type": "dfield", "Inds": "00"},
{"Data": "aDylan, Bob,\u001fd1941-", "Tag": "100", "Type": "dfield", "Inds": "1 "},
{"Data": "aThe freewheelin' Bob Dylan\u001fh[sound recording].", "Tag": "245", "Type": "dfield", "Inds": "14"},
{"Data": "a[New York, N.Y.] :\u001fbColumbia,\u001fc[1963]", "Tag": "260", "Type": "dfield", "Inds": " "},
{"Data": "a1 sound disc :\u001fbanalog, 33 1/3 rpm, stereo. ;\u001fc12 in.", "Tag": "300", "Type": "dfield", "Inds": " "},
{"Data": "aSongs.", "Tag": "500", "Type": "dfield", "Inds": " "},
{"Data": "aThe composer accompanying himself on the guitar ; in part with instrumental ensemble.", "Tag": "511", "Type": "dfield", "Inds": "0 "},
{"Data": "aProgram notes by Nat Hentoff on container.", "Tag": "500", "Type": "dfield", "Inds": " "},
{"Data": "aBlowin' in the wind -- Girl from the north country -- Masters of war -- Down the highway -- Bob Dylan's blues -- A hard rain's a-gonna fall -- Don't think twice, it's all right -- Bob Dylan's dream -- Oxford town -- Talking World War III blues -- Corrina, Corrina -- Honey, just allow me one more chance -- I shall be free.", "Tag": "505", "Type": "dfield", "Inds": "0 "},
{"Data": "aPopular music\u001fy1961-1970.", "Tag": "650", "Type": "dfield", "Inds": " 0"},
{"Data": "aBlues (Music)\u001fy1961-1970.", "Tag": "650", "Type": "dfield", "Inds": " 0"},
{"Data": "3Preservation copy (limited access)\u001fu", "Tag": "856", "Type": "dfield", "Inds": "41"},
{"Data": "aNew", "Tag": "952", "Type": "dfield", "Inds": " "},
{"Data": "aTA28", "Tag": "953", "Type": "dfield", "Inds": " "},
{"Data": "bc-RecSound\u001fhColumbia CS 8786\u001fwMUSIC", "Tag": "991", "Type": "dfield", "Inds": " "}]


Note: As far as I know Rich Hickey and I are not related.

Open Knowledge Foundation: Open Definition v2.0 Released – Major Update of Essential Standard for Open Data and Open Content

Tue, 2014-10-07 11:00

Today Open Knowledge and the Open Definition Advisory Council are pleased to announce the release of version 2.0 of the Open Definition. The Definition “sets out principles that define openness in relation to data and content” and plays a key role in supporting the growing open data ecosystem.

Recent years have seen an explosion in the release of open data by dozens of governments including the G8. Recent estimates by McKinsey put the potential benefits of open data at over $1 trillion and other estimates put benefits at more than 1% of global GDP.

However, these benefits are at significant risk both from quality problems such as “open-washing” (non-open data being passed off as open) and from fragmentation of the open data ecosystem due to incompatibility between the growing number of “open” licenses.

The Open Definition eliminates these risks and ensures we realize the full benefits of open by guaranteeing quality and preventing incompatibility. See this recent post for more about why the Open Definition is so important.

The Open Definition was published in 2005 by Open Knowledge and is maintained today by an expert Advisory Council. This new version of the Open Definition is the most significant revision in the Definition’s nearly ten-year history.

It reflects more than a year of discussion and consultation with the community including input from experts involved in open data, open access, open culture, open education, open government, and open source. Whilst there are no changes to the core principles, the Definition has been completely reworked with a new structure and new text as well as a new process for reviewing licenses (which has been trialled with governments including the UK).

Herb Lainchbury, Chair of the Open Definition Advisory Council, said:

“The Open Definition describes the principles that define “openness” in relation to data and content, and is used to assess whether a particular licence meets that standard. A key goal of this new version is to make it easier to assess whether the growing number of open licenses actually make the grade. The more we can increase everyone’s confidence in their use of open works, the more they will be able to focus on creating value with open works.”

Rufus Pollock, President and Founder of Open Knowledge said:

“Since we created the Open Definition in 2005 it has played a key role in the growing open data and open content communities. It acts as the “gold standard” for open data and content guaranteeing quality and preventing incompatibility. As a standard, the Open Definition plays a key role in underpinning the “open knowledge economy” with a potential value that runs into the hundreds of billions – or even trillions – worldwide.”

What’s New

In process for more than a year, the new version was collaboratively and openly developed with input from experts involved in open access, open culture, open data, open education, open government, open source and wiki communities. The new version of the definition:

  • Has a complete rewrite of the core principles – preserving their meaning but using simpler language and clarifying key aspects.
  • Introduces a clear separation of the definition of an open license from an open work (with the latter depending on the former). This not only simplifies the conceptual structure but provides a proper definition of open license and makes it easier to “self-assess” licenses for conformance with the Open Definition.
  • The definition of an Open Work within the Open Definition is now a set of three key principles:
    • Open License: The work must be available under an open license (as defined in the following section but this includes freedom to use, build on, modify and share).
    • Access: The work shall be available as a whole and at no more than a reasonable one-time reproduction cost, preferably downloadable via the Internet without charge.
    • Open Format: The work must be provided in a convenient and modifiable form such that there are no unnecessary technological obstacles to the performance of the licensed rights. Specifically, data should be machine-readable, available in bulk, and provided in an open format or, at the very least, can be processed with at least one free/libre/open-source software tool.
  • Includes an improved license approval process to make it easier for license creators to check conformance of their license with the Open Definition and to encourage reuse of existing open licenses. It also outlines the process for submitting a license so that it can be checked for conformance against the Open Definition.
More Information
  • For more information about the Open Definition including the updated version visit:
  • For background on why the Open Definition matters, read the recent article ‘Why the Open Definition Matters’

This post was written by Herb Lainchbury, Chair of the Open Definition Advisory Council and Rufus Pollock, President and Founder of Open Knowledge

Open Knowledge Foundation: Brazilian Government Develops Toolkit to Guide Institutions in both Planning and Carrying Out Open Data Initiatives

Tue, 2014-10-07 10:20

This is a guest post by Nitai Silva of the Brazilian government’s open data team and was originally published on the Open Knowledge Brazil blog here.

Recently the Brazilian government released the Kit de Dados Abertos (open data toolkit). The toolkit is made up of documents describing the process, methods and techniques for implementing an open data policy within an institution. Its goal is both to demystify the logic of opening up data and to share with public employees the best practices that have emerged from a number of Brazilian government initiatives.

The toolkit focuses on the Plano de Dados Abertos – PDA (Open Data Plan) as the guiding instrument where commitments, agenda and policy implementation cycles in the institution are registered. We believe that making each public agency build its own PDA is a way to perpetuate the open data policy, making it a state policy and not just a transitory governmental action.

It is organized to facilitate the implementation of the main activity cycles that must be observed in an institution and provides links and manuals to assist in these activities. Emphasis is given to the actors/roles involved in each step and their responsibilities. It also helps to define a central person to monitor and maintain the PDA. The following diagram summarises the macro steps of implementing an open data policy in an institution:


Processo Sistêmico de um PDA (Systemic Process of a PDA)


The open data theme has been part of the Brazilian government’s agenda for over three years. Over this period, we have accomplished a number of important achievements, including passing the Lei de Acesso à Informação – LAI (FOIA) (Access to Information Law), making commitments as part of our Open Government Partnership Action Plan and developing the Infraestrutura Nacional de Dados Abertos (INDA) (Open Data National Infrastructure). However, despite these accomplishments, for many public managers, open data activities remain the exclusive responsibility of the Information Technology department of their respective institution. This gap is, in many ways, the cultural heritage of the hierarchical, departmental model of carrying out public policy and is observed in many institutions.

The launch of the toolkit is the first of a series of actions prepared by the Ministry of Planning to leverage open data initiatives in federal agencies, as was defined in the Brazilian commitments in the Open Government Partnership (OGP). The next step is to conduct several tailor made workshops designed to support major agencies in the federal government in the implementation of open data.

Although it was built with the aim of expanding the quality and quantity of open data made available by the federal executive branch agencies, we also made a conscious effort to make the toolkit generic enough for other branches and levels of government.

About the toolkit development:

It is also worth mentioning that the toolkit was developed on GitHub. Although GitHub is known as an online, distributed environment for developing software, it has long been used for co-creating text documents, even by governments. The toolkit is still hosted there, which allows anyone to make changes and propose improvements. The invitation is open: we welcome and encourage your collaboration.

Finally, I would like to thank Augusto Herrmann, Christian Miranda, Caroline Burle and Jamila Venturini for participating in the drafting of this post!

LibUX: Designing a Library Website with Zurb Triggers

Mon, 2014-10-06 22:54

A design trigger is a pattern meant to appeal to behavior and cognitive biases observed in users. Big data and the user experience boom have provided a lot of information about how people actually use the web, which designs work, and–although creepy–how it is possible to cobble together an effective site designed to social engineer users.

So not too long ago Zurb pulled a bunch of triggers together into a pretty useful and interesting collection, and I thought I’d take a crack at prototyping the front page of a library website.

Spoilers: the quality of my prototype kind of sucks, and blah-blah-blah let’s just call it low-fidelity prototyping, but I just want to ask you to, y’know, use your imagination.

Design for the Bottom Line

The Nightvale Public Library is a small, haunted public library that like many others has a bit of an image problem. It derives its annual budget from circulation numbers, library card registration, and resource usage. While there are a ton of interesting and good-looking public library websites to rip off, the Nightvale Public Library Web Librarian knows that a successful #libweb needs to convert traffic into measurements the institution uses to define success.

This website needs to

  • increase catalog usage
  • increase resource awareness and usage
  • increase library card registration

so let’s use design triggers to inspire this behavior.


 Any piece of information can act like an “anchor,” serving as a reference point for users to contrast against all other information. It can also bias how they make choices. Oftentimes, the anchor is the initial information you’re exposed to.

The first opportunity we should take is to present the, ah, “modernity” of the NPL – and modern it is, even if its offerings are small compared to those of some of the larger libraries in the state. If public libraries are concerned about future state and federal budgets, then it is worth drawing direct comparisons between the library and other popular web services users will be familiar with. A library website can’t compete with an Amazon, true, but it can and should exist in the same universe. If “relevancy” is in any way judged by design trends, then it helps NPL’s ulterior motives to rip off a few.

In this example, I used a background image of someone using a library website on their phone and made the catalog search crazy prominent.

Behavioral Goals
  • User’s first impression is of a clean, modern library
  • Draw user’s attention to search

Relative Value

Shoppers often have a reference price in mind for the cost of any particular item. Reference prices are as important for assessing relative value as the price difference between competing products.

We can then use our next opportunity to create an effective double-whammy that at once previews our wares and also draws comparisons to other services our users like. Library web services may not have the, er, video library available to Netflix, but there’s always a one-up: it’s free. Making users aware of the relative value of library services can increase brand loyalty, support, forgiveness for when databases inevitably break or there are troublesome user interfaces, and so on.

Libraries can also use “relative value triggers” to make the case for Friends membership and other donations. I’ll link-up the research later, but there is powerful evidence showing that if you ask people to donate, subscribe, try a beta, or whatever – they will.

Behavioral Goals
  • Impress user with breadth and currency of resources
  • Appeal to user’s sense of community, price consciousness, and so on
  • User feels good about her or his library


Having partial knowledge can create a feeling that drives us to fill those gaps in information. Novel and challenging situations can increase that drive, and give intrinsic value to tasks.

At this point, we have hopefully suggested to the user that the Nightvale Public Library is clean, modern, and otherwise just a pretty awesome feel-good service. We can at this point exploit those good feelings for real engagement. The idea is that this is a section using only the best and most recent content, because boring or badly written content will negate our efforts. It could be an awesome library video or a high quality, link-baity post encouraging use of an obscure resource, but the idea is that it’s good and interesting.

Libraries can use curiosity to guide users to various gems which create delight (lost episode coming soon!) – delight increases loyalty and good-feels all around.

Behavioral Goals
  • User has a glimpse of quality content, increasing the library’s overall sense of value
  • Heck, maybe the user even clicks something

Recognition over Recall

Memories rely on being able to retrace neural patterns and connections that were associated with the original stimulus. Cues make it easier to retrieve memories, over free recall without a cue.

We can use recognition over recall to immediately encourage an action – like card registration. Additionally, libraries can use this tactic to shape the general story they are telling. If even one of the above triggers worked as hoped, then we can use our final opportunity to transform these hazy impressions into a concrete feeling: the library is awesome, it has a lot of cool stuff, and it may even be better than competitors.

There’s also a trigger called the Zeigarnik Effect, which basically translates to, “Well, you came this far. Don’t you want to sign up?” It actually sounds a little evil:

It’s possible to take advantage of our tendency to seek closure. Once the initial hurdle of starting something is overcome, the internal tension caused from not completing a task can be a great motivator.

Oh, and libraries really need to make library card registration suck less. Even if you need loads of personal, creepy information, maybe you can at least hide the bulk of it from sight until the user is committed. Here’s an example:

See the Pen Simple Expanding Sign-Up Form by Michael Schofield (@michaelschofield) on CodePen.


Understanding and making use of design triggers is an important next step after libraries embrace usability testing. If anything, they allow for quick and informed prototypes, but more importantly they force you to think about the bottom line and define tangible measures of success.

So, what do you think? Would you register?

FYI: I used Balsamiq Mockups.

The post Designing a Library Website with Zurb Triggers appeared first on LibUX.

Jonathan Rochkind: Umlaut News: 4.0 and two new installations

Mon, 2014-10-06 21:27

Umlaut is, well, now I’m going to call it a known-item discovery layer, usually but not necessarily serving as a front-end to SFX.

Umlaut 4.0.0 has been released

This release is mostly back-end upgrades, including:

  •  Support for Rails 4.x (Rails 3.2 included to make migration easier for existing installations, but recommend upgrading to Rails 4.1 asap, and starting with Rails 4.1 in new apps)
  • Based on Bootstrap 3 (Umlaut 3.x was Bootstrap 2)
  • internationalization/localization support
  • A more streamlined installation process with a custom installer

Recent Umlaut Installations

Princeton University has a beta install of Umlaut, and is hoping to go live in production soon.

Durham University (UK) has a beta/soft launch of Umlaut live. 

Filed under: General

District Dispatch: ALA voice to join FCC Open Internet roundtable

Mon, 2014-10-06 19:27

John Windhausen, network neutrality counsel to the American Library Association (ALA) and president of Telepoly Consulting, will represent libraries and higher education institutions Tuesday as a panelist for an Open Internet roundtable discussion hosted by the Federal Communications Commission (FCC).

Windhausen will advocate for strong net neutrality principles during the panel “Construction of Legally Sustainable Rules” of the roundtable “Internet Openness and the Law,” which takes place at 11:30 a.m. on Tuesday, October 7, 2014. During the panel discussion, Windhausen will explore ways that the FCC can use its Section 706 authority with an “Internet-reasonable” standard that would recognize and support the Internet ecosystem and the Internet as an open platform for free speech, learning, research and innovation.

ALA, along with other higher education and library organizations, affirmed the critical importance of network neutrality to fulfilling our public interest missions and outlined the case for an Internet-reasonable standard in its filings (pdf) in the Open Internet proceeding. The ALA also urged the FCC to use available legal authority to adopt strong, enforceable net neutrality rules in a joint letter (pdf) with the Center for Democracy and Technology.

The roundtable is free, open to the public and will be streamed live.

The post ALA voice to join FCC Open Internet roundtable appeared first on District Dispatch.

Library of Congress: The Signal: We Want You Just the Way You Are: The What, Why and When of Fixity

Mon, 2014-10-06 16:10

Icons for Archive and Checksum, Designed by Iconathon Los Angeles, California, US 2013. Public Domain

Fixity, the property of a digital file or object being fixed or unchanged, is a cornerstone of digital preservation. Fixity information, from simple file counts or file size values to more precise checksums and cryptographic hashes, is data used to verify whether an object has been altered or degraded.
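To make the idea concrete, here is a minimal Python sketch of recording and later checking fixity information for a single file. It is only an illustration and is not taken from the NDSA document: the function names (file_checksum, record_fixity, check_fixity) are hypothetical, and the pairing of file size with a SHA-256 checksum is an assumption chosen for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(path):
    """Capture simple fixity information: size in bytes plus a SHA-256 checksum.

    The choice of these two values is an assumption made for this sketch.
    """
    return {"size": Path(path).stat().st_size, "sha256": file_checksum(path)}


def check_fixity(path, recorded):
    """Return True if the file still matches previously recorded fixity information."""
    return record_fixity(path) == recorded
```

A steward might call record_fixity when an object is first ingested, store the result somewhere separate from the object itself, and run check_fixity on whatever schedule suits the organization's needs and resources.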

Many in the preservation community know they should be establishing and checking the fixity of their content, but less understood is how, when and how often? The National Digital Stewardship Alliance Standards and Practices and Infrastructure working groups have published Checking Your Digital Content: What is Fixity and When Should I Be Checking It?  (PDF) to help stewards of digital objects answer these questions in a way that makes sense for their organization based on their needs and resources.

The document includes helpful information on the following topics:

  • Definitions of fixity and fixity information.
  • Eleven reasons to collect, check, maintain and verify fixity information.
  • Seven general approaches to fixity check frequency.
  • Six common fixity information-generating instruments compared against each other.
  • Four places to consider storing and referencing fixity information.

Many thanks to everyone else who participated in the drafting and review of the document:

  • Paula De Stefano, New York University.
  • Carl Fleischhauer, Library of Congress.
  • Andrea Goethals, Harvard University.
  • Michael Kjörling, independent researcher.
  • Nick Krabbenhoeft, Educopia Institute.
  • Chris Lacinak, AVPreserve.
  • Jane Mandelbaum, Library of Congress.
  • Kevin McCarthy, National Archives and Records Administration.
  • Kate Murray, Library of Congress.
  • Vivek Navale, National Archives and Records Administration.
  • Dave Rice, City University of New York.
  • Robin Ruggaber, University of Virginia.
  • Kate Zwaard, Library of Congress.

DuraSpace News: VIVO Data from Mars: LASP, VIVO, and MAVEN

Mon, 2014-10-06 00:00

Above are four of the first images taken by the IUVS instrument. IUVS obtained these false-color images about eight hours after the successful completion of MAVEN’s Mars orbital insertion maneuver on September 21, 2014.

From Michael Cox, Laboratory of Atmospheric and Space Physics, University of Colorado at Boulder

Ed Summers: Sign o’ the Times

Sun, 2014-10-05 17:42

a sign, a metaphor by Casey Bisson.

An old acquaintance took this photo in Coaldale, Nevada. I had to have a copy for myself.

Patrick Hochstenbach: Homework assignment #2 Sketchbookskool

Sun, 2014-10-05 11:34
Filed under: Comics Tagged: cartoon, cat, comics, copic, manual, sketchskool, watercolor

Riley Childs: Test Post

Sat, 2014-10-04 21:05

This is a test post to test Dublin Core on the code4lib Planet, It will go away in a few minutes

The post Test Post appeared first on Riley's blog at

Patrick Hochstenbach: Homework assignment #1 Sketchbookskool

Sat, 2014-10-04 08:39
I enrolled in Sketchbook Skool. As the first homework assignment we were asked to draw a recipe.
Filed under: Comics Tagged: cat, copic, recipe, sketchbookskool, watercolor

LITA: A Tested* Approach to Leveling Up

Sat, 2014-10-04 00:04

*Unscientifically, by a person from the internet.

If you’re a LITA member, then you’re probably very skilled in a few technical areas, and know just enough to be dangerous in several other areas. The latter can be a liability if you’ve just been volunteered to implement the Great New Tech Thing at your library. Do it right, and you just might be recognized for your ingenuity and hard work (finally!). Do it wrong, and you’ll end up in the pillory (again!).

Maybe the Great New Tech Thing requires you to learn a new programming or markup language. Perhaps you’re looking to expand on your skills–and resume–by adding a language. For many years, the library associations and schools have emphasized tech skills as an essential component of librarianship. The reasons are plentiful, and the means are easier than you might think. With a library card, a few free, open source software tools, and some time, you can level up your tech skills by learning a new language.

I humbly suggest the following approach to leveling up, which has worked for me.

What you’ll need

A computer. A Windows, OS X, or Linux laptop or desktop computer will suffice.

Resources. Online programming “schools”, such as Codecademy and Code School, are a great concept and work for some people, but I’ve personally found them to provide an incomplete education. The UI demands brevity, and therefore many of the explanations and instructions require a certain level of knowledge about coding in general that most beginners lack. I have found good ol’ fashioned books to be a better resource. Find titles that have exercises, and you’ll learn by doing. Actually building something practical makes the process enjoyable. The Visual Quickstart Guide series by Peachpit Press and the Head First series by O’Reilly usually teach through practical examples.

Books are a great source of knowledge, but so are your fellow coders. Most languages have a community with an online presence, and it would be a good idea to find those forums and bookmark them. But if you were to bookmark only one forum, it should be the Stack Overflow forum for the language you’re learning.

Some languages also have official documentation online, for example, and

Time. Carve out time wherever you can. If you take public transportation to work, use that time (if you can find a seat). Learn during your lunch break. Give up a season of your favorite TV show (you can always catch up later in a weekend binge-watch when the DVDs hit your library shelves).

Where to start

Here and now. Maybe you’re reading this because you’ve just been tapped to implement the Great New Tech Thing at your library. Or maybe you’re considering adding a skill to your resume. Whatever the reason, there’s no time like the present.

Leveling up for professional development affords you greater flexibility. Start with a language your friends know–they will be an invaluable resource if you get stuck along the way. Also, consider starting with a simple language that you can build upon. If you already know HTML, then PHP and JavaScript are natural progressions, and they open the door to object-oriented languages like C++, Java, or Python. Finally, make sure there’s a viable–if not growing–community around the language you want to learn. Not only does this give a sense of the language’s future and staying-power, the community can also provide support through online forums, conferences and meetups, etc.

If you’re new to programming languages, I hope this approach helps. If you’re a veteran coder, please share your learning approach in the comments.

HangingTogether: Jyaa mata Seki-san – farewell to our OCLC Research Fellow from Japan

Fri, 2014-10-03 23:48

Hideyuki Seki and Jim Michalko on the OCLC headquarters campus

We are about to say goodbye to Hideyuki Seki, our current OCLC Research Fellow from Japan. The Manager of the Media Center (a designation for all the libraries) at Keio University in Tokyo, Seki-san has been with us for the last two weeks spending time in both the San Mateo and Dublin offices. His time with us was structured so that he would learn enough about our work and goals that he could informally but effectively represent OCLC Research and the OCLC Research Library Partnership to his colleagues at Keio and to his peers in the Japanese research library community.

Seki-san arrived with a particular set of interests he hoped to explore during his brief residency with us. He wanted to know more about:

1) Invigoration of cooperation among research libraries in Japan

There is not a strong history of collaborative projects among Japanese research libraries and he wanted to see if the strong commitment to collaboration here had lessons that would be useful in building communities of interest and practice in his country.

2) Advancing Keio University’s research impact and reputation

Connecting the Keio Media Center’s activities to the research being done at the University in ways that enrich it and increase its impact is a particular challenge shared with many US university libraries. He wanted to see the range of responses that are emerging here and consider them in light of the culture of Japanese universities.

3) Future of the digital repository

This interest is connected to research reputation and support issues as well as concerns about digital surrogates for preservation and access. He wondered about the US view of impact and sustainability.

He was also curious about the way OCLC Research operates, how it supports the Partnership as well as the OCLC cooperative.

We structured a program for him and included him in our annual all-staff face-to-face planning meetings at headquarters in Dublin, Ohio. It was certainly a challenge for him to be immersed in our idiosyncratic vocabulary, our bundles of acronyms and the flood of idiomatic English we consistently let flow. We reminded ourselves of the difficulties we might be causing to his understanding but seemed powerless to temporarily amend our ways. Instead we relied on him to rise to the challenge. He did.

We benefited from his presence in a number of ways. Explaining why we were giving attention to certain topics occasionally challenged us to reconsider. The extent to which the Japanese publishing industry has maintained a library service landscape still tied to print was revelatory for most of us. (The publishers have been slow to offer e-journals given that their market is captive by language and they are addicted to their high margins.) The management regime within the administrative echelons of Japanese universities also dictates the pace of change and progress. Managers are routinely re-assigned on a regular schedule to new responsibility areas at just about the time that they know enough to implement new directions.

And it was great fun seeing our Bay Area and mid-Ohio tourist sites through his eyes. Everybody learned a lot. We trust he’ll judge it worth the journey. We’ve already decided it was worth our effort.

About Jim Michalko

Jim coordinates the OCLC Research office in San Mateo, CA, and focuses on relationships with research libraries and work that renovates the library value proposition in the current information environment.


District Dispatch: Librarians won’t stay quiet about surveillance

Fri, 2014-10-03 21:07

Photo by KPBS

The Washington Post highlighted the library community’s efforts to protect the public from government intrusion or censorship in “Librarians won’t stay quiet about government surveillance,” a feature article published today. It has been a longstanding belief in the library community that the possibility of surveillance—whether directly or through access to records of speech, research and exploration—undermines a democratic society.

Washington Post writer Andrea Peterson states:

In September 2003, Attorney General John Ashcroft called out the librarians. The American Library Association and civil liberties groups, he said, were pushing “baseless hysteria” about the controversial Patriot Act. He suggested that they were worried that spy agencies wanted to know “how far you have gotten on the latest Tom Clancy novel.”

In the case of government surveillance, they are not shushing. They’ve been among the loudest voices urging freedom of information and privacy protections.

Edward Snowden’s campaign against the National Security Agency’s data collection program has energized this group once again. And a new call to action from the ALA’s president means their voices could be louder and more coordinated than ever.


The post Librarians won’t stay quiet about surveillance appeared first on District Dispatch.