planet code4lib

District Dispatch: Watch and learn: Making the election connection for libraries

Wed, 2014-10-08 16:47

On Monday, October 6, 2014, the American Library Association (ALA) and Advocacy Guru Stephanie Vance collaborated to host “Making the Election Connection,” an interactive webinar that explored the ways that library advocates can legally engage during an election season, as well as what types of activities have the most impact. Library supporters who missed Monday’s advocacy webinar now have access to the archived video.

Making the Election Connection from ALA Washington on Vimeo.

The post Watch and learn: Making the election connection for libraries appeared first on District Dispatch.

Library of Congress: The Signal: Astronomical Data & Astronomical Digital Stewardship: Interview with Elizabeth Griffin

Wed, 2014-10-08 15:36

The following is a guest post from Jane Mandelbaum, co-chair of the National Digital Stewardship Alliance Innovation Working Group and IT Project Manager at the Library of Congress.

As part of our ongoing series of Insights interviews with individuals doing innovative work related to digital preservation and stewardship, we are interested in talking to practitioners from other fields on how they manage and use data over time.

To that end, I am excited to explore some of these issues and questions with Elizabeth Griffin. Elizabeth is an astrophysicist at the Dominion Astrophysical Observatory in Victoria, Canada. She is Chair of the International Astronomical Union Task Force for the Preservation and Digitization of Photographic Plates, and Chair of the Data at Risk Task Group of the International Council for Science Committee on Data for Science and Technology. Griffin presented on Preserving and Rescuing Heritage Information on Analogue Media (PDF) at Digital Preservation 2014. We’re interested in understanding how astronomers have been managing and using astronomical data and hope that others can learn from the examples of astronomers.

Jane: Do you think that astronomers deal with data differently than other scientists?

Elizabeth:  Not differently in principle – data are precious and need to be saved and shared – but the astronomical community has managed to get its act together efficiently, and is consequently substantially more advanced in its operation of data management and sharing than are other sciences.  One reason is that the community is relatively small compared to that of other natural sciences and its attendant international nature also requires careful attention to systems that have no borders.

Another is that its heritage records are photographic plates, requiring a Plate Archivist with duties to catalog what has been obtained; those archives contained a manageable amount of observations per observatory (until major surveys like the Sloan Digital Sky Survey became a practical possibility). Thus, astronomers could always access past observations, even if only as photographs, so the advantages of archiving even analogue data were established from early times.

Jane: It is sometimes said that astronomers are the scientists who are closest to practitioners of digital preservation because they are interested in using and comparing historical data observations over time.  Do astronomers think they are in the digital preservation business?

Elizabeth: Astronomers  know (not just “think”!) that they are in the digital preservation business, and have established numerous accessible archives (mirrored worldwide) that enable researchers to access past data.  But “historical” indicates different degrees; if a star changes by the day, then yesterday’s (born-digital) data are “historical,” whereas for events that have timescales of the order of a century, then “historical” data must include analogue records on photographic plates.

In the former case, born-digital data abound worldwide; in the latter, they are only patchily preserved in digitized form.  But the same element of “change” applies throughout the natural sciences, not just in astronomy.  Think of global temperature changes and the attendant alterations to glacier coverage, stream flows, dependent flora and fauna, air pollution and so on.   Hand-written data in any of the natural sciences, be they ocean temperatures, weather reports, snow and ice measurements or whatever, all belong to modern research, and all relevant scientists have got to see themselves as being in the digital preservation business, and to devote an aliquot portion of their resources to nurturing those precious legacy data.

We have no other routes to the truth about environmental changes that are on a longer time-scale than our own personal memories or records take us. Digital preservation of these types of data is vital for all aspects of knowledge regarding change in the natural world, and the scientists involved must join astronomers in being part of the digital preservation business.

Jane: What do you think the role of traditional libraries, museums and archives should be when dealing with astronomical data and artifacts?

Elizabeth: Traditional libraries and archives are invaluable for retaining and preserving documents that mapped or recorded observations at any point in the past. Some artifacts also need to be preserved and displayed, because so often the precision with which measurements could be made (and thence the reliability of what was quoted as the “measuring error”) was dependent upon the technology of the time (for instance, the use of metals with low expansion coefficients in graduated scales, the accuracy with which graduation marks could be inscribed into metal, the repeatability of the ruling engine used to produce a diffraction grating, etc.).

There is also cultural heritage to be read in the historic books and equipment, and it is important to keep that link visible if only so as to retain a correct perspective of where we are now at.  Science advances by the way people customarily think and by what [new] information they can access to fuel that thinking, so understanding a particular line of argument or theory can depend importantly upon the culture of the day.

International Year of Astronomy (NASA, Chandra, 2/10/09) Messier 101 (M101) from NASA’s Marshall Space Flight Center on Flickr.

Jane: The word “innovation” is often used in the context of science and technology, and teaching science.  See for example: The Marshmallow Challenge.  How do you think the concept of “innovation” can be most effectively used?

Elizabeth: “Innovation” has become something of a buzz-word in modern science, particularly when one is groping for a new way to dress up an old project for a grant proposal!  The public must also be rather bemused by it, since so many new developments today are described as “innovative.” What is important is to teach the concept of thinking outside the box.  That is usually how “innovative” ideas get converted into new technologies – not just cranking the same old handle to tease out one more decimal place – so whether you label it “innovation” or something else, the principle of steering away from the beaten track, and working across scientific disciplines rather than entombing them within specialist ivory towers, is the essential factor in true progress.

Jane: “Big data” analysis is often cited as valuable for finding patterns and/or exceptions.  How does this relate to the work of astronomers?

Elizabeth: Very closely!  Astronomers invented the “Virtual Observatory” in the late 20th Century, with the express purpose of federating large data-sets (those resulting from major all-sky surveys, for instance) but at different wavelengths (say) or with other significantly different properties, so that a new dimension of science could be perceived/detected/extracted.  There are so very many objects in an astronomer’s “target list” (our Galaxy alone contains some 10 billion stars, though amongst those are very many different categories and types) and it was always going to be way beyond individual human power and effort to perform such federating unaided.  Concepts of “big data” analyses assist the astronomer very substantially in grappling with that type of new science, though obviously there are guidelines to respect, such as making all metadata conform to certain standards.

Jane: What do you think astronomers have to teach others about generating and using the increasing amounts of data you are seeing now in astronomy?

Elizabeth: A great deal, but the “others” also need to understand how we got to where we now are.  It was not easy; there was not the “plentiful funding” that some outsiders like to assume, and all along the way there were (and still are) internecine squabbles over competitions for limited grant funds: public data or individual research is never an easy balance to strike!  The main point is to design the basics of a system that can work, and to persevere with establishing what it involves.

The basic system needs to be dynamic – able to accommodate changing conditions and moving goal-posts – and to identify resources that will ensure long-term longevity and support.  One such resource is clearly the funding to maintain and operate dynamic, distributed databases of the sort that astronomers now find usefully productive; another is the trained personnel to operate, develop and expand the capabilities, especially in an ever-changing environment.  A third is the importance of educating early-career scientists in the relevance and importance of computing support for compute-intensive sciences.  That may sound tautological, but it is very true that behind every successful modern researcher is a dedicated computing expert.

Teamwork has been an essential ingredient in astronomers’ ability to access and re-purpose large amounts of data.  The Virtual Observatory was not built just by computing experts; at least one third of committee members are present-day research astronomers, able to give first-hand explanations or caveats, and to transmit practical ideas.  These aspects are important ingredients in the model.  At the same time, astronomers still have a very long way to go; only very limited amounts of their non-digital (i.e. pre-digital) data have so far made it to the electronic world; most observations from “history” were recorded on photographic plates and the faithful translation of those records into electronic images or spectra is a specialist task requiring specialist equipment.  One of the big battles which such endeavors face is even a familial one, with astronomer contending against astronomer: most want to go for the shiny and new things, not the old and less sophisticated ones, and it is an uphill task to convince one’s peers that progress is sometimes reached by moving backwards!

Jane: What do you think will be different about the type of data you will have available and use in 10 years or 20 years?

Elizabeth: In essence nothing, just as today we are using basically the same type of data that we have used for the past 100+ years.  But access to those data will surely be a bit different, and if wishes grew on trees then we will have electronic access to all our large archives of historic photographic observations and metadata, alongside our ever-growing digital databases of new observations.

Jane: Do astronomers share raw data, and if so, how? When they do share, what are their practices for assigning credit and value to that work? Do you think this will change in the future?

Elizabeth: The situation is not quite like that.  Professional observing is carried out at large facilities which are nationally or internationally owned and operated.  Those data do not therefore “belong” to the observer, though the plans for the observing, and the immediate results which the Principal Investigator(s) of the observing program may have extracted, are intellectual property owned by the P.I. or colleagues unless or until published.  The corresponding data may have limited access restrictions for a proprietary period (usually of the order of 1 year, but can be changed upon request).

Many of the data thus stored are processed by some kind of pipeline to remove instrumental signatures, and are therefore no longer “raw”; besides, raw data from space are telemetered to Earth and would have no intelligible content until translated by a receiving station and converted into images or spectra of some kind.  Credit to the original observing should be cited in papers emanating from the research that others carry out on the same data once they are placed in the public domain.  I hope that will not change in the future.  It is all too tempting for some “armchair” astronomers (one thinks particularly of theoreticians) who do not carry out their own observing proposals, but wait to see what they can cream off from public data archives.  That is of course above board, but those people do not always appreciate the subtleties of the equipment or the many nuances that may have affected the quality or content of the output.

Jane: Do astronomers value quantitative data derived from observations differently than images themselves?

Elizabeth: Yes, entirely.  The good scientist is a skeptic,  and one very effective driver for the high profile of our database management schemes is the undeniable truth that two separate researchers may get different quantitative data from the same initial image, be that “image” a direct image of the sky or of an object, or its spectrum.  The initial image is therefore the objective element that should ALWAYS be retained for others to use; the quantitative measurements now in the journal are very useful, but are always only subjective, and never definitive.

Jane: How do you think citizen science projects such as Galaxy Zoo can be used to make a case for preservation of data?

Elizabeth: There is a slight misunderstanding here, or maybe just a bad choice of example! Galaxy Zoo is not a project in which citizens obtain and share data; the Galaxy data that are made available to the public have been acquired professionally with a major telescope facility; the telescope in question (the Sloan Telescope) obtained a vast number of sky images, and it is the classification of the many galaxies shown in those images that constitutes the “Galaxy Zoo” project. There is no case to be made out of that project for the preservation of data, since the data (being astronomical!) are already, and will continue to be, preserved anyway.

Your question might be better framed if it referred (for instance) to something like eBird, in which individuals report numbers and dates of bird sightings in their locations, and ornithologists are then able to piece together all that incoming information and worm out of it new information like migration patterns, changes in those patterns, etc.  It is the publication of new science like that that helps to build the case for data preservation, particularly when the data in question are not statutorily preserved.

Galen Charlton: Verifying our tools; a role for ALA?

Wed, 2014-10-08 14:59

It came to light on Monday that the latest version of Adobe Digital Editions is sending metadata on ebooks that are read through the application to an Adobe server — in clear text.

I’ve personally verified the claim that this is happening, as have lots of other people. I particularly like Andromeda Yelton’s screencast, as it shows some of the steps that others can take to see this for themselves.

In particular, it looks like any ebook that has been opened in Digital Editions or added to a “library” there gets reported on. The original report by Nate Hoffelder at The Digital Reader also said that ebooks that were not known to Digital Editions were being reported, though I and others haven’t seen that. At the moment, since nobody is saying that they’ve decompiled the program and analyzed exactly when Digital Editions sends its reports, it’s possible that Nate simply fell into a rare execution path.

This move by Adobe, whether or not they’re permanently storing the ebook reading history, and whether or not they think they have good intentions, is bad for a number of reasons:

  • By sending the information in the clear, anybody can intercept it and choose to act on somebody’s choice of reading material.  This applies to governments, corporations, and unenlightened but technically adept parents.  And as far as state actors are concerned – it actually doesn’t matter that Digital Editions isn’t sending information like name and email addresses in the clear; the user’s IP address and the unique ID assigned by Digital Editions will often be sufficient for somebody to, with effort, link a reading history to an individual.
  • The release notes from Adobe gave no hint that Digital Editions was going to start doing this. While Amazon’s Kindle platform also keeps track of reading history, at least Amazon has been relatively forthright about it.
  • The privacy policy and license agreement similarly did not explicitly mention this. There has been some discussion to the effect that if one looks at those documents closely enough, there is an implied suggestion that Adobe can capture and log anything one chooses to do with their software. But even if that’s the case – and I’m not sure that this argument would fly in countries with stronger data privacy protection than the U.S. – sending this information in the clear is completely inconsistent with modern security practices.
  • Digital Editions is part of the toolchain that a number of library ebook lending platforms use.

The last point is key. Everybody should be concerned about an app that spouts reading history in the clear, but librarians in particular have a professional responsibility to protect our users’ reading history.

What does it mean in the here and now? Some specific immediate steps I suggest for libraries are to:

  • Publicize the problem to their patrons.
  • Officially warn their patrons against using Digital Editions 4.0, and point to workarounds like pointing “” to “” in hosts files.
  • If patrons must use Digital Editions to borrow ebooks, recommend the use of earlier versions, which do not appear to be spying on users.

However, there are things that also need to be done in the long term.

Accepting DRM has been a terrible dilemma for libraries – enabling and supporting, no matter how passively, tools for limiting access to information flies against our professional values.  On the other hand, without some degree of acquiescence to it, libraries would be even more limited in their ability to offer current books to their patrons.

But as the Electronic Frontier Foundation points out,  DRM as practiced today is fundamentally inimical to privacy. If, following Andromeda Yelton’s post this morning, we value our professional soul, something has to give.

In other words, we have to have a serious discussion about whether we can responsibly support any level of DRM in the ebooks that we offer to our patrons.

But there’s a more immediate step that we can take. This whole thing came to light because a “hacker acquaintance” of Nate’s decided to see what Digital Editions is sending home. And a key point? Once the testing started, it probably didn’t take that hacker more than half an hour to see what was going on, and it may well have taken only five minutes.

While the library profession probably doesn’t count very many professional security researchers among its ranks, this sort of testing is not black magic.  Lots of systems librarians, sysadmins, and developers working for libraries already know how to use tcpdump and Wireshark and the like.

So what do we need to do? We need to stop blindly trusting our tools.  We need to be suspicious, in other words, and put anything that we would recommend to our patrons to the test to verify that it is not leaking patron information.
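
As a rough illustration of that kind of test, the Python sketch below searches the raw bytes of a captured request (for example, a payload saved from tcpdump or Wireshark) for values the tester knows belong to the ebook being read. Everything in it is an assumption made for illustration: the function name find_cleartext_leaks, the sample payload, the host name, and the book details are invented, and this is not the format that Digital Editions actually sends. A plain substring search like this only catches values sent unencrypted and unencoded, so an empty result would not prove that a tool is safe.

```python
from typing import List


def find_cleartext_leaks(payload: bytes, sensitive_values: List[str]) -> List[str]:
    """Return the sensitive values that appear verbatim in a captured payload.

    Hypothetical helper: a case-insensitive substring search over raw bytes.
    It cannot see through encryption, compression, or other encodings.
    """
    haystack = payload.lower()
    leaks = []
    for value in sensitive_values:
        needle = value.encode("utf-8").lower()
        # Skip empty strings, which would trivially match any payload.
        if needle and needle in haystack:
            leaks.append(value)
    return leaks


# Invented request body standing in for a real capture.
sample_payload = (
    b"POST /log HTTP/1.1\r\n"
    b"Host: logs.example.invalid\r\n"
    b"\r\n"
    b'{"title": "An Example Novel", "page": 42}'
)

# Values the tester knows describe the ebook that was opened.
known_values = ["An Example Novel", "Jane Q. Author"]

print(find_cleartext_leaks(sample_payload, known_values))
```

With the invented payload above, only the title is reported, because the author name never appears in the captured bytes. Such a check could be run against each capture taken while exercising a tool before recommending it to patrons.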

This is where organizations like ALA can play an important role.  Some things that ALA could do include:

  • Establishing a clearinghouse for reports of security and privacy violations in library software.
  • Distributing information on ways to perform security audits.
  • Testing library software in house and hiring security researchers as needed.
  • Providing institutional and legal support for these efforts.

That last point is key, and is why I’m calling on ALA in particular. There have been plenty of cases where software vendors have sued, or threatened to sue, folks who have pointed out security flaws. Rather than permitting that sort of chilling effect to be tolerated in the realm of library software, ALA can provide cover for individuals and libraries engaged in the testing that is necessary to protect our users.

Andromeda Yelton: ebooks choices and the missing soul of librarianship

Wed, 2014-10-08 13:47

We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.
American Library Association Code of Ethics

Yesterday I watched as Adobe Digital Editions told Adobe what book I was reading — title, author, publisher, year of publication, subject, description — and every page I’d read, and the time at which I read them. Adobe’s EULA states that it also collects my user ID and my general location.

I was able to watch this information be collected because it was all sent unencrypted, readable to any English-speaking human with access to any of the servers it passes through, in whatsoever jurisdiction, and also (if your wifi is unencrypted) the open air between my laptop and my router.

“The Council of the American Library Association strongly recommends that… [circulation and other personally identifying] records shall not be made available to any agency of state, federal, or local government except pursuant to such process, order or subpoena as may be authorized under the authority of, and pursuant to, federal, state, or local law relating to civil, criminal, or administrative discovery procedures or legislative investigative power [and that librarians] resist the issuance or enforcement of any such process, order, or subpoena until such time as a proper showing of good cause has been made in a court of competent jurisdiction.”
Policy on confidentiality of library records

Your patrons’ reading information is already part of a warrantless dragnet. Because it has been transmitted in cleartext, the government needs no further assistance from you, your patrons, or your vendors to read it. Even were they to present you with a valid subpoena, you would be powerless to resist it, because you have, in effect, already written the information on your walls; you have no technical ability to protect it.

“The American Library Association urges all libraries to…

  • Limit the degree to which personally identifiable information is collected, monitored, disclosed, and distributed; and avoid creating unnecessary records; and
  • Limit access to personally identifiable information to staff performing authorized functions; and…
  • Ensure that the library work with its organization’s information technology unit to ensure that library usage records processed or held by the IT unit are treated in accordance with library records policies; and
  • Ensure that those records that must be retained are secure; and
  • Avoid library practices and procedures that place personally identifiable information on public view.”

Resolution on the Retention of Library Usage Records

If Adobe Digital Editions is part of your technical stack — if your library offers Overdrive or 3M Cloud Library or EBL or ebrary or Baker & Taylor Axis 360 or EBSCO or MyiLibrary or quite possibly other vendors I haven’t googled yet — you are not doing this. You cannot do this.

…ebook models make us choose. And I don’t mean choosing which catalog, or interface, or set of contract terms we want — though we do make those choices, and they matter. I mean that we choose which values to advance, and which to sacrifice. We’re making those values choices every time we sign a contract, whether we talk about it or not.
me, Library Journal, 2012

In 2012 I wrote and spoke about how the technical affordances, and legal restrictions, of ebooks make us choose among fundamental library values in a way that paper books have not. About how we were making those choices about values whether we made them explicitly or not. About how we default to choosing access over privacy.

We have chosen access over privacy, and privacy is not an option left for us to choose.

Because: don’t underestimate this. This is not merely a question of a technical slip-up in one version of an Adobe product.

This is about the fact that we do not have the technical skills to verify whether our products are in line with the values we espouse, the policies we hold, or even the contracts we sign, and we do not delegate this verification to others who do. Our failure to verify affects all the software we run.

This is about the fact that best practice in software is generally to log promiscuously; you’re trained, as a developer, to keep all the information, just in case it comes in handy. It takes a conscious choice (or a slipshod incompetence) not to do so. Libraries must demand that our vendors make that choice, or else we are in the awkward position of trusting to their incompetence. This affects all the software we run.

This is about the fact that encryption products are often hard to use, the fact that secure https is not yet the default everywhere, the fact that anyone can easily see traffic on the unencrypted wireless networks found at so many libraries, the fact that anyone with the password (which, if you’re a library, is everyone) can see all the traffic on encrypted networks too. This affects all the software we run.

This is about Adobe. It is not just about Adobe. These are questions we should ask of everything. These are audits we should be performing on everything. This affects all the software we run.

I am usually a middle-ground person. I see multiple sides to every argument, I entertain arguments that have shades of the abhorrent to find their shades of truth. This is not an issue where I can do that.

If you have chosen, whether actively or by default, to trust that the technical affordances of your software match both your contracts and your values, you have chosen to let privacy burn. If you’re content with that choice, have the decency to stand up and say it: to say that playing nice with your vendors matters more to you than this part of professional ethics, that protecting patron privacy is not on your list of priorities.

If you’re not content with that choice, it is time to set something else on fire.

LITA: Managing Library Projects: General Tips

Wed, 2014-10-08 13:00
Image courtesy of Joel Dueck. Flickr 2007.

During my professional career, both before and after becoming a Librarian, I’ve spent a lot of time managing projects, even when that wasn’t necessarily my specific role. I’ve experienced the joys of Project Management in a variety of settings and industries, from tiny software startups to large, established organizations. Along the way, I’ve learned that, while there are general concepts that are useful in any project setting, the specific processes and tools needed to complete a given project depend on the nature of the task at hand and the organization’s profile. Here are some general strategies to keep in mind when tackling a complex project:

Pay special attention to connection points

Unless your project is entirely contained within one department, there will be places in your workflow where interaction between two or more disparate units will take place. Each unit has its own processes and goals, which may or may not serve your project’s purposes, so it’s important that you as PM keep the overall goals of the project in mind and ensure that work is being done efficiently in terms of the project’s needs, not just the department’s usual workflow. Each unit will likely also have its own jargon, so you need to make sure that information is communicated accurately between parties. It’s at these connection points that the project is most likely to fail, so keep your eye on what happens here.

Don’t reinvent the wheel

While a cross-functional project will potentially require the creation of new workflows and processes, it’s not a good idea to force project participants to go about their work in a way that is fundamentally different from what they usually do. First, it will steepen the learning curve and reduce efficiency, and second, because these staff members are likely to be involved in multiple projects simultaneously, it will increase confusion and make it more difficult for them to correctly follow your guidelines for what needs to be done. Try to design your workflows so that they take advantage of existing processes within departments as much as possible, and increase efficiency by modifying the way departments interact with one another to maximize results.

Choose efficient tools, not shiny ones

Even in the wealthiest organizations, resources are always at a premium, so when picking tools to use in managing your project don’t fall for the beautiful picture on the front of the box. Consider the cost of a particular tool, both in terms of price and the learning curve involved in bringing everyone attached to the project up to speed on how to use it. Sometimes the investment will be worth it; often you will be better off with something simpler that project staff already know. You can create complex project plans with MS Project or Abak 360, but for most projects I find that a rudimentary scheduling spreadsheet and a couple of quick and dirty projection models, all created with MS Excel, will do just as well. Free web-based tools can also be useful: one of my favorites is Lucid Chart, a workflow diagram creation tool that can replace Visio for many applications (and offers pretty good deals for educational institutions). The main concerns with this type of approach are whether having your project plans stored in the cloud makes sense from a security point of view, and the potential for a particular tool to disappear unexpectedly (anyone remember Astrid?).


Those are a few of the strategies that I have found useful in managing projects. What’s your favorite project management tip?

In the Library, With the Lead Pipe: The Right to Read: The How and Why of Supporting Intellectual Freedom for Teens

Wed, 2014-10-08 10:30

Teen girl working in the library (Asheboro Public Library – Flickr)

In brief: Intellectual freedom and equal access to information are central to libraries’ mission, but  libraries often fail to consider the intellectual freedom needs of teenage patrons, or lump teen patrons in with children in conversations of intellectual freedom. However, adolescence is developmentally distinct from childhood, and the freedom to access information of all kinds is vital for teen patrons. In this article, I outline the case for protecting intellectual freedom for young adults and provide practical steps libraries can take to do just that.


Recently, my grandmother sent me an article by Meghan Cox Gurdon called “The Case for Good Taste in Children’s Books.” Gurdon, the Wall Street Journal children’s books reviewer, has gained notoriety among young adult librarians, authors, educators, and readers for writing about (primarily decrying) the prevalence of serious, often unpleasant themes and topics in young adult literature. Her 2011 article “Darkness Too Visible” set off a firestorm in the YA world and led to the creation of #YAsaves, an online movement where readers, authors, and librarians share the impact of “dark” literature on their lives. The newer article has much the same premise as the earlier one: contemporary young adult literature covers topics too lurid, too grim, and too graphic for young readers. In addition, “The Case for Good Taste in Children’s Books” calls for authors, editors, and publishers to censor the contents of books for young people under the guise of quality and “good taste.” Although I have plenty to say about Gurdon’s arguments, this article is not a direct response to her. Others have responded more eloquently than I could hope to (Sherman Alexie’s response, “Why the Best Kids’ Books Are Written in Blood,” is my personal favorite). Instead, this article is a call for libraries to actively and consciously defend the rights of teenage readers and library patrons, brought on by a discussion of “The Case for Good Taste in Children’s Books.”

While discussing Gurdon’s article, I found myself repeating what I think of as the library party line on intellectual freedom for young people: parents and guardians have the right to decide what their children have access to, but they don’t have the right to decide what all children have access to. Caregivers raise their children with a certain set of values. They have the right to introduce their children to materials that reflect those values and to discourage their children from accessing materials that contradict or challenge their values. Whether or not we as librarians agree with those values is irrelevant; our responsibility is to provide access to a wide variety of materials representing many viewpoints and to help users find materials that fit their needs.

While we remain neutral with regard to the content of library materials, libraries actively encourage caregivers to participate in their children’s intellectual lives in a variety of ways. Collection development policies frequently include language that rests the responsibility for children’s library use in their guardians’ hands. Early literacy programs for caregivers encourage them to read with and to their children. We also facilitate participation in more practical ways, like linking the accounts of children and their guardians. In an ideal world, creating the opportunity for guardians and their children to talk about reading together would set a precedent for conversations that continue through adolescence. It’s not that caregivers should stop being involved in their children’s library use and reading habits when their children reach adolescence. There may be times, however, when a young person wants or needs information to which her guardian might want to restrict access. Because of the developmental needs of adolescence and libraries’ commitment to intellectual freedom, libraries should support intellectual freedom for teenagers rather than the right of guardians to control their children’s intellectual lives.

For teenagers the right to read—even materials with which adults in their lives may not be comfortable—is vitally important. Literature and information are tools for teens who are developing a sense of self and beginning to explore and understand the world as individuals independent from the family in which they were raised. Unfortunately, teens and teen materials are frequently targeted in efforts to censor information and restrict intellectual freedom. Luckily, there are concrete steps that libraries and librarians can take to protect our adolescent patrons’ privacy and their right to intellectual freedom.

Access to Information and Adolescent Development

First, librarians should understand how teens and children are different and what makes intellectual freedom particularly important for adolescents. Although definitions vary, adolescents are usually thought of as middle and high school students, roughly ages 12-18 (see note 1 at the end of this article). Teens and children are often lumped together in discussions of intellectual freedom. Discussing “youth” as ages 0-18 fails to account for the different developmental needs of children and teenagers, and the failure to differentiate is detrimental to teens. Adolescence is a time of vast neurological, physiological, emotional, and social change. Teenage brains are primed for learning and more open to new experiences, and more interested in novelty and new sensations, than human brains at any other point in our lifespan (for a more detailed discussion of the teen brain, see David Dobbs’s National Geographic article). Sexual development is one visible example of this change: only 2% of 12-year-olds are sexually active, but by age 16, a third of teens have had sex, and by 18, the number grows to nearly two-thirds. Developing sexuality, while notable, is just one of the many changes of adolescence. Cognitive changes, including the ability to grapple with complex and abstract ideas, mean that adolescents are much more interested in questions of morality and personhood than younger children (Steinberg, p. 32). Such rapid and all-encompassing change means that access to information is critical to young people. I’m using “information” in a broad sense; fictional narratives are as important as factual information for teens who are striving to understand the world around them and their place in it.

Establishing self-sufficiency and independence is one of the most significant outcomes of adolescence. Challenging and questioning the beliefs of their family and their culture is a natural and important aspect of teens’ blossoming independence. Blocking teens’ access to reading, viewing, and listening materials stifles an opportunity for teenagers to explore viewpoints or experiences outside of the frame of influence created by their caregivers. In addition, the process of developing and asserting independence can be a difficult one for teens and guardians, and teens are often experimenting with behaviors and beliefs with which their caregivers are uncomfortable. Adult discomfort is as much a part of adolescence as teen experimentation. Our job as librarians is not to stand on one side or the other, but to provide access to information on a wide range of topics, depicting a wide range of experiences, so that teenagers who come to the library looking to broaden their horizons find the materials to do so. There’s an abundance of good reasons to let teenagers read about difficult and sensitive subjects. Seeing their own difficult lives reflected back at them can give teens going through dark times a sense of hope and comfort. Reading about lives that are different from their own can give teenagers a deeper understanding of others. Studies have shown that reading literary fiction makes people more empathetic.

The thorniness of adolescent-guardian relationships and the importance of exploration and experimentation in adolescence mean that it is not enough to establish that public libraries do not monitor or restrict what materials young people check out. Public libraries should, as much as possible, treat adolescent patrons as adults with regard to their intellectual freedom and privacy. This is distinct from our treatment of young children, in that we encourage guardians to take an active role in the reading lives of their children and to monitor and censor where they deem appropriate. Encouraging caregiver censorship for teens is a disservice to adolescents in a way that it is not to younger children, especially given the often-complicated relationships between young people and their guardians.

Knowing that teens have a developmental need for intellectual freedom, librarians should also be aware that young adults are vulnerable to attacks on their right to read and right to information. Adult discomfort with the sudden maturity of teenagers means that challenges to young adult materials in public libraries, school libraries, and classroom curricula make up the vast majority of book challenges. From 1990 to 2009 (the most recent data available via ALA’s Office for Intellectual Freedom) the number of challenges in schools and school libraries was more than double the number of challenges in any other institution. In the same time period, “unsuited to age group” was the third most common reason given for book challenges. Obviously both of these statistics include children’s as well as young adult materials. Looking at frequently challenged titles lists gives a little more insight as to the breakdown of the challenges. On the 2013 list of the ten most frequently challenged titles, more than half are young adult novels (The Absolutely True Diary of a Part-time Indian, The Hunger Games, A Bad Boy Can Be Good for a Girl, Looking for Alaska, The Perks of Being a Wallflower, and the Bone series). Two additional titles are frequently taught in high school classes or included on summer reading lists (The Bluest Eye and Bless Me Ultima). The 2011 list is even heavier in young adult titles (the ttyl series, The Color of the Earth series, The Hunger Games trilogy, The Absolutely True Diary of a Part-time Indian, the Alice series, What My Mother Doesn’t Know, and the Gossip Girl series) with two additional titles that are classroom standards (Brave New World and To Kill a Mockingbird). Book challenges can result in lost opportunities beyond access to information and stories. Rainbow Rowell, whose YA novel Eleanor and Park was well-received by critics and teens alike, had an invitation to speak at a Minnesota high school and public library rescinded after parents challenged the book. Meg Medina, author of a novel about bullying called Yaqui Delgado Wants to Kick Your Ass, faced a similar situation in Virginia.

Due to their minority age, relative lack of power, and the not-uncommon idea that they don’t know what’s good for them, teens are relatively powerless in the face of attacks on their intellectual freedom, although they often speak out in support of challenged books. Public institutions may feel pressure to cave to the demands of tax-paying adults, but we are not serving teens’ best interests when we do so.

In fact, libraries can and should be defenders of teens’ intellectual freedom. Before I delve into the why and how of that assertion, I want to briefly acknowledge that I’m talking primarily about public libraries here. Earlier I mentioned the standard line that guardians are responsible for the reading habits and materials of their own children, but do not have the right to dictate what other children can and can’t read. This policy is an extension of the common assertion that public libraries do not act in loco parentis, or in the role of a parent or guardian. While public libraries do not act in loco parentis, schools have a legal mandate to do exactly that. The history of schools and the doctrine is a long and complex one, and the intersection between in loco parentis and schools’ responsibility to protect the constitutional rights of students is still being negotiated. For more on this topic, see the article by Richard Peltz-Steele included in the Additional Reading list at the end of the article.

Libraries, and particularly public libraries, occupy a space in teens’ lives that makes them uniquely suited to protect and defend teens’ intellectual freedom. Teens and young people (ages 14-24) represent nearly a quarter of public library users, a larger percentage than any other age group. By the time they are middle and high school students, many teenagers are using the library independently. Unlike the classroom, where topics and titles are governed by state and federal requirements, or are chosen unilaterally by the teacher, libraries offer information on virtually any topic of interest from lock-picking to the history of Russian firearms (both real, non-school reference questions I’ve answered). In our collections, we have books that guardians and teachers might not provide, either because they are unaware of them or because they object to the content. The wealth and variety of resources available in libraries make them an ideal match for minds that are receptive to new ideas and primed for learning. A huge amount of information is available online, of course, but many teenagers either don’t have internet access at home or are sharing a computer and internet connection, which can make searching for potentially sensitive information riskier. Additionally, our mandate to protect user privacy means that libraries are a safe space for teens to explore topics and read books that might be embarrassing or controversial.

Supporting Intellectual Freedom for Teens in Your Library

Unfortunately, the theoretical side of intellectual freedom is often the easy part. By and large, librarians seem to agree that we are not parents or guardians and that we do not censor materials because they are controversial. Implementing practices and policies that support our theoretical stance – walking the intellectual freedom walk, so to speak – can be more difficult than getting fired up about the right to read. What feels obvious in an abstract discussion of book bans and challenges and internet access can be complicated and daunting in the real world; turning theory into practice, especially in light of daily demands on our time and energy, is not always easy. So what can your library do to support intellectual freedom for teen patrons? Below you’ll find some suggestions (most of which, as an added benefit, will support intellectual freedom for all of your patrons).

Begin by reviewing your library’s policies. This sounds obvious – most collection development policies have some kind of language absolving libraries from monitoring or restricting the materials checked out by minors (the previously discussed in loco parentis clause). If you’ve never seen your library’s collection development policy, or if it’s been a while, start there. While you’re checking the collection development policy, also look for language that outlines the process for challenges to materials so that you are ready with a response in case of unhappy community members. This is pretty basic library school stuff, but when you’re a working librarian, it’s easy to get wrapped up in daily tasks and set things like policy updates aside. While you’re checking and possibly updating your collection development policy, also look for a statement on diversity within the collection. Most libraries are charged with meeting the needs and interests of their communities, but that does not mean catering only to the majority. In fact, a well-rounded collection should include voices and experiences that do not exist (or are not visible) in your community.

Review your collection as well. Does it have materials for those young people who do not share the majority beliefs, views, and experiences of your community? If you work in a more conservative area, do you have books on sex and sexuality for teenagers? If you work in a liberal area, do you have materials by conservative writers? Are a variety of religions, ethnicities, sexual orientations, and gender expressions represented in your fiction and nonfiction collections, particularly your young adult collections?

While you’re reviewing policies, look at your cardholder policy. Do you offer family cards (i.e., a single account that can be used and reviewed by all the members of a family)? While family cards are convenient, they make it nearly impossible to guarantee privacy for teen patrons. In cases where teens’ interests and caregivers’ values don’t align, privacy and intellectual freedom are nearly synonymous, so protecting teens’ privacy is a vital part of protecting their intellectual freedom. Even if you don’t offer family cards, do caregivers have access to information about their teenaged children’s check-outs? For an example of a library system whose cardholder rules protect teenagers’ right to privacy, and with it their freedom to access any kind of materials, check out the Seattle Public Library’s privacy policy. While you’re reviewing cardholder policy, also look at its implementation at your library. Do librarians or circulation staff offer information such as the titles of overdue books on a teen’s card to other family members, even if policy is designed to protect this information? If so, consider offering training on intellectual freedom and privacy to staff. Intellectual freedom is covered in library school, but often front-line staff aren’t librarians and may not have had the same depth of training on the importance of privacy and equal access. An organizational culture that supports intellectual freedom is as important as, and perhaps more important than, policies that do the same.

These suggestions aren’t world-shaking, but if it’s been a while since you did a policy or collection review, or since you reviewed the way that policy is put into practice at your institution, consider this a gentle reminder that policy, as rote as it may seem, can have real implications for young people.

While policy, collection, and practice are great places to make changes that support your teen patrons’ freedom to read, there are additional things libraries and librarians can do to facilitate access to information, especially information that might be embarrassing to ask about or otherwise controversial. First, consider creating an honor system collection. The defining feature of an honor system collection is that the books in the collection can be borrowed from the library without a library card or any other method of check-out. The collection can be as informal as a basket of high-interest books or can be processed and cataloged like the rest of the collection, although security tags and other measures should be de-activated or left off during processing. The Santa Cruz Public Library has an honor system collection called the Teen Self Help collection; titles are entered into the catalog and the records are browsable by tags. Honor system collections help protect teens’ privacy and remove intimidation and embarrassment, which can be particularly potent in adolescence, as barriers to access to information. These collections tend to focus on nonfiction titles, but could easily include popular fiction titles as well, especially those that frequently appear on banned and challenged lists – titles like Speak, The Absolutely True Diary of a Part-time Indian, and The Perks of Being a Wallflower.

Consider partnering with community organizations to promote intellectual freedom and access to information for teenagers. Reach out to local organizations for information on mental health, sex and sexuality, healthy relationships, drugs and alcohol, and other topics that teens may need information on. Examples include religious groups (check your library’s policy on posting religious material first, and be sure that multiple religions are represented), Planned Parenthood and other health organizations, and institutions that work with homeless youth. Many of these organizations have pamphlets or other information available. Create a community resources area in your teen section that provides access to information that doesn’t have to be checked out or returned. Set up a teen resources table at all teen programs, regardless of topic. Players at video game tournaments may not express their interest in or need for health or housing resources, but if the information is available and visible, those who need it are more likely to find and utilize it.

Most of the policy and practice changes I’ve suggested are relatively simple from a librarian’s point of view, but they can make a huge difference to teens for whom intellectual freedom is both vital and tenuous. The right to access materials of all kinds on all topics is a developmental necessity for young people, who are undergoing rapid intellectual, psychological, and social change. As librarians, it’s rarely difficult to talk intellectual freedom; in theory, we all agree that banning books is wrong and access to information of all kinds is right. Putting those ideas into practice, especially when faced with the possibility of controversial material, young people, and unhappy caregivers, can feel much more difficult, but changes like those I’ve suggested above can help bring the theoretical into practice, where it truly matters.

Thanks and Acknowledgments

Thank you to Ellie Collier, Erin Dorney, and Hugh Randle of the In the Library with the Lead Pipe Editorial Board for their insightful comments and grammatical finesse. In addition, I’m grateful to Issac Gilman and Amy Springer, who acted as external editors and provided helpful advice. Thanks to my husband, who now knows more about intellectual freedom for teenagers than any software developer needs to. Above all, thanks to the tireless Gretchen Kolderup, for her guidance, encouragement, and enthusiasm.

Citations and additional reading

“Adolescence.” Encyclopædia Britannica. Encyclopædia Britannica Online Library Edition. Encyclopædia Britannica, Inc., 2013. Accessed December 4, 2013.

Alexie, Sherman. “Why The Best Kids’ Books are Written in Blood.” The Wall Street Journal: Speakeasy, June 9, 2011. Accessed November 20, 2013.

Becker, Samantha et al. “Opportunity for All: How Library Policies and Practices Impact Public Internet Access, report no. IMLS-2011-RES-010.” Accessed May 18, 2014.

“Challenges by reason, initiator & institution for 1990-99 and 2000-09,” Banned & Challenged Books. Accessed November 20, 2013.

Dobbs, David. “Beautiful Brains.” National Geographic Magazine, October 2011. Accessed November 20, 2013.

Gurdon, Meghan Cox. “The Case for Good Taste in Children’s Books.” Imprimis 42, no. 7/8 (July/August 2013). Accessed November 20, 2013.

Gurdon, Meghan Cox. “Darkness Too Visible.” The Wall Street Journal, June 4, 2011. Accessed November 20, 2013.

Guttmacher Institute. “American Teens’ Sexual and Reproductive Health.” May 2014. Accessed May 17, 2014.

Peltz-Steele, Richard J. “Pieces of Pico: Saving Intellectual Freedom in the Public School Library.” Brigham Young University Education and Law Journal, Vol. 2005, p. 103, 2005. Accessed January 10, 2014.

Steinberg, Laurence, and Stephanie Dionne Sherk. “Adolescence.” The Gale Encyclopedia of Children’s Health: Infancy through Adolescence. Ed. Kristine Krapp and Jeffrey Wilson. Vol. 1. Detroit: Gale, 2006. 32-36. Accessed September 28, 2014.

“Talks Cancelled for YA Authors Meg Medina and Rainbow Rowell.” Blogging Censorship. National Coalition Against Censorship. September 13, 2014. Accessed September 28, 2014.


  1. A side note about ages: obviously, the transition from childhood to adulthood is an individual process, with every individual reaching milestones in different orders and at different times. Creating policies that differentiate children from adolescents necessitates an arbitrary cutoff, although the process is, of course, a gradual one. Many organizations that serve young people and youth service providers (including The Search Institute and YALSA, among others) seem to agree that 12 is an appropriate age for that arbitrary cut-off, but there is room for discussion and disagreement on this point.

LibUX: 015: A High Functioning Research Site with Sean Hannan

Wed, 2014-10-08 06:13

On #libux 15, Sean Hannan talks about designing a high functioning research site for the Johns Hopkins Sheridan Libraries and University Museums. I love it. It’s a crazy fast API-driven research dashboard mashing up research databases, LibGuides, and a magic, otherworldly carousel actually increasing engagement. I, uh, want to rip this site off so bad. Research tools are so incredibly difficult to build well, especially when libraries rely so heavily on third parties, that I’m glad to have taken the opportunity to pick Sean’s brain. Thirty minutes of whaaaaaa?! right here.

Cluster your content as it makes sense, but at the same time don’t force people to feel like they are stepping through a novel. They’re not going to read through the entire guide and say, ‘Aha! Now I can start my research!’ They’re there for a tiny bit of information and then they’re off.

Sean’s contributed a chapter about writing an API for APIs in More Library Mashups, which you can pre-order on the cheap.


On that podcast, I talk about optimizing browser painting, here’s two good links on it:

— Sean Hannan (@MrDys) October 8, 2014

The post 015: A High Functioning Research Site with Sean Hannan appeared first on LibUX.

Cynthia Ng: Using Reveal.JS on GitHub Pages for your Presentations

Wed, 2014-10-08 03:43
So after getting yelled at by @adr, I decided that I would finally move away from PowerPoint and use reveal.js for my presentations. Unfortunately, while it seemed fairly simple, all the instructions I found involved the command line, which I didn’t want to use unless it was absolutely necessary. So here are the basic instructions […]

Terry Reese: MarcEdit Sept. 2014 server log snapshot

Wed, 2014-10-08 02:23

Here’s a snapshot of the server log data as reported through Awstats for the subdomain. 

Server log stats for Sept. 2014:

  • Logged MarcEdit uses: ~190,000
  • Unique Users: ~17,000
  • Bandwidth Used: ~14 GB

Top 10 Countries by Bandwidth:

  1. United States
  2. Canada
  3. China
  4. India
  5. Australia
  6. Great Britain
  7. Mexico
  8. Italy
  9. Spain
  10. Germany

Countries by Use (with 100+ reported uses)

  • United States
  • Great Britain
  • New Zealand
  • Russian Federation
  • Hong Kong
  • Saudi Arabia
  • Czech Republic
  • El Salvador
  • European country
  • United Arab Emirates
  • South Africa
  • South Korea
  • Slovak Republic
  • Costa Rica
  • Republic of Serbia
  • Sri Lanka
  • Puerto Rico
  • Dominican Republic
  • Ivory Coast (Cote D’Ivoire)
  • Papua New Guinea
  • Netherlands Antilles
  • Palestinian Territories
  • Aland Islands

Jonathan Rochkind: Catching HTTP OPTIONS /* request in a Rails app

Tue, 2014-10-07 21:36

Apache sometimes seems to send an HTTP “OPTIONS /*” request to Rails apps deployed under Apache Passenger.  (Or is it “OPTIONS *”? Not entirely sure). With User-Agent of “Apache/2.2.3 (CentOS) (internal dummy connection)”.

Apache does doc that this happens sometimes, although I don’t understand it.

I’ve been trying to take my Rails error logs more seriously to make sure I handle any bugs revealed. 404’s can indicate a problem, especially when the referrer is my app itself. So I wanted to get all of those 404’s for Apache’s internal dummy connection out of my log.  (How I managed to fight with Rails logs enough to actually get useful contextual information on FATAL errors is an entirely different complicated story for another time).

How can I make a Rails app handle them?

Well, first, let’s do a standards check and see that RFC 2616 HTTP 1.1 Section 9 (I hope I have a current RFC that hasn’t been superseded) says:

If the Request-URI is an asterisk (“*”), the OPTIONS request is intended to apply to the server in general rather than to a specific resource. Since a server’s communication options typically depend on the resource, the “*” request is only useful as a “ping” or “no-op” type of method; it does nothing beyond allowing the client to test the capabilities of the server. For example, this can be used to test a proxy for HTTP/1.1 compliance (or lack thereof).

Okay, sounds like we can basically reply with whatever we want to this request, it’s a “ping or no-op”.  How about a 200 text/plain with “OK\n”?

Here’s a line I added to my Rails routes.rb file that seems to catch the “*” requests and just respond with such a 200 OK.

# In routes.rb: answer the server-wide OPTIONS ping with a plain-text 200 OK.
match ':asterisk', via: [:options],
  constraints: { asterisk: /\*/ },
  to: lambda { |env| [200, { 'Content-Type' => 'text/plain' }, ["OK\n"]] }

Since “*” is a special glob character to Rails routing, looks like you have to do that weird constraints trick to actually match it. (Thanks to mbklein, this does not seem to be documented and I never would have figured it out on my own).

And then we can use a little “Rack app implemented in a lambda” trick to just return a 200 OK right from the routing file, without actually having to write a controller action somewhere else just to do this.
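To see why a bare lambda is enough here: Rack only asks for an object that responds to call(env) and returns a status, headers, and body triplet. The sketch below exercises the same kind of lambda outside Rails. The env hash is hand-built purely for illustration (a real server passes many more keys), and the variable name options_ping is made up.

```ruby
# A Rack "app" is any object that responds to #call(env) and returns
# [status, headers, body]. A lambda satisfies that contract on its own.
options_ping = lambda do |_env|
  [200, { 'Content-Type' => 'text/plain' }, ["OK\n"]]
end

# Hypothetical, minimal env hash standing in for what the server would pass.
env = { 'REQUEST_METHOD' => 'OPTIONS', 'PATH_INFO' => '/*' }

status, headers, body = options_ping.call(env)
puts status
puts headers['Content-Type']
body.each { |chunk| print chunk }
```

The body is an array because Rack expects something that responds to each and yields strings.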

I have not yet tested this extensively, but I think it works? (Still worried if Apache is really requesting “OPTIONS *” instead of “OPTIONS /*” it might not be. Stay tuned.)
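If the request really does arrive as “OPTIONS *” and never matches the route, one possible fallback is to answer it in a small Rack middleware before routing happens. This is only a sketch under assumptions: the class name OptionsPing is made up, and the guess that the asterisk shows up in PATH_INFO as either “*” or “/*” has not been checked against Passenger.

```ruby
# Hypothetical Rack middleware: answers the server-wide OPTIONS ping itself
# and passes every other request down the stack untouched.
class OptionsPing
  # Assumption: the ping appears with one of these PATH_INFO values.
  PING_PATHS = ['*', '/*'].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    if env['REQUEST_METHOD'] == 'OPTIONS' && PING_PATHS.include?(env['PATH_INFO'])
      [200, { 'Content-Type' => 'text/plain' }, ["OK\n"]]
    else
      @app.call(env)
    end
  end
end

# Stand-in for the Rails app further down the stack.
inner = lambda { |_env| [404, { 'Content-Type' => 'text/plain' }, ["Not Found\n"]] }
stack = OptionsPing.new(inner)

ping  = stack.call({ 'REQUEST_METHOD' => 'OPTIONS', 'PATH_INFO' => '*' })
other = stack.call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/books' })
puts ping.first
puts other.first
```

In a Rails app, a middleware like this could be registered in the application config with something along the lines of config.middleware.insert_before 0, OptionsPing; whether that position in the stack is right for a given app is a judgment call.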

Filed under: General

Library of Congress: The Signal: What Does it Take to Be a Well-rounded Digital Archivist?

Tue, 2014-10-07 18:34

The following is a guest post from Peter Chan, a Digital Archivist at the Stanford University Libraries.

Peter Chan

I am a digital archivist at Stanford University. A couple of years ago, Stanford was involved in the AIMS project, which jump-started Stanford’s thinking about the role of a “digital archivist.” The project ended in 2011 and I am the only digital archivist hired as part of the project who is still on the job on a full-time basis. I recently had discussions with my supervisors about the roles and responsibilities of a digital archivist. This inspired me to take a look at job postings for “digital archivists” and what skills and qualifications organizations were currently looking for.

I looked at eight job advertisements for digital archivists that were published in the past 12 months. The responsibilities and qualifications required of digital archivists were very diverse in these organizations. However, all of them required formal training in archival theory and practice. Some institutions placed more emphasis on computer skills and preferred applicants to have programming skills such as PERL, XSLT, Ruby, HTML and experience working with SQL databases and repositories such as DSpace and Fedora. Others required knowledge of a variety of metadata standards. A few even desired knowledge of computer forensic tools such as FTK Imager, AccessData Forensic Toolkit and write blockers. Most of these tools are at least somewhat familiar to digital archivists/librarians.
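As a rough illustration of the kind of small script those programming skills can support, here is a sketch in Ruby (one of the languages the postings mention) that inventories a directory: it counts files, totals their size, groups them by extension, and records a SHA-256 checksum for each. The helper name inventory is made up for this example, and grouping by file extension is only a crude proxy for the signature-based format identification that dedicated tools perform.

```ruby
require 'digest'
require 'find'
require 'tmpdir'

# Walk a directory tree and summarize it: file count, total bytes, a count
# per extension, and a SHA-256 checksum per file for later fixity checks.
def inventory(root)
  summary = { count: 0, bytes: 0, by_extension: Hash.new(0), checksums: {} }
  Find.find(root) do |path|
    next unless File.file?(path)

    ext = File.extname(path).downcase
    ext = '(none)' if ext.empty?
    summary[:count] += 1
    summary[:bytes] += File.size(path)
    summary[:by_extension][ext] += 1
    summary[:checksums][path] = Digest::SHA256.file(path).hexdigest
  end
  summary
end

# Demo against a throwaway directory so nothing real is touched.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'letter.txt'), 'hello')
  File.write(File.join(dir, 'notes.TXT'), 'hi')
  File.write(File.join(dir, 'README'), 'x')

  report = inventory(dir)
  puts "#{report[:count]} files, #{report[:bytes]} bytes"
  report[:by_extension].sort.each { |ext, n| puts "#{ext}: #{n}" }
end
```

The script only reads files, but it is not a substitute for a hardware write blocker when working with original media.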

Screenshot from the ePADD project.

In my career, however, I have also found other skills useful to the job. In my experience working on two projects (ePADD and GAMECIP), I also found that the knowledge of Natural Language Processing and Linked Open Data/Semantic Web/Ontology was extremely useful. Because of those needs, I became familiar with the Stanford Named Entity Recognizer (NER) and the Apache OpenNLP library to extract personal names, organizational names and locations in email archives in the ePADD project. Additionally, familiarity with SKOS, Open Metadata Registry and Protégé helped publish controlled vocabularies as linked open data and to model the relationship among concepts in video game consoles in the GAMECIP project.

The table below summarizes the tasks I encountered during the past six years working in the field as well as the skills and tools useful to address each task.

Each entry pairs a task which may fall under the responsibilities of Digital Archivists ("Task") with the Knowledge / Skills / Software / Tools needed to work on the task ("Needs").

Collection Development (Interact with donors, creators, dealers, curators – hereafter “creators.”)

  • Task: Gain overall knowledge (computing habits of creators, varieties of digital material, hardware/software used, etc.) of the digital component of a collection.
    Needs: In-depth knowledge of computing habits, varieties of digital material, hardware/software for all formats (PC, Mac, devices, cloud, etc.). Tool: AIMS Born-Digital Material Survey.
  • Task: Explain to creators the topic of digital preservation, including differences between “bit preservation” and “preserving the abstract content encoded into bits”; migration / emulation / virtualization; “Trust Repository”; levels of preservation when necessary.
    Needs: In-depth knowledge of digital preservation. Background: “Ensuring the Longevity of Digital Information” by Jeff Rothenberg, January 1995 edition of Scientific American (Vol. 272, Number 1, pp. 42-7) (PDF); Reference Model for an Open Archival Information System (OAIS) (PDF); Preserving.exe: Toward a National Strategy for Software Preservation (PDF); Library of Congress Recommended Format Specifications; NDSA Levels of Preservation; Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) (PDF).
  • Task: Explain to creators how forensic software is used to accession, process and deliver born-digital collections when necessary – especially regarding sensitive/restricted materials.
    Needs: Special knowledge of making use of forensic software in an archival context. Tools: AccessData FTK, EnCase Forensic, etc.
  • Task: Explain to creators the use of natural language processing/data mining/visualization tools to process and deliver born-digital collections when necessary.
    Needs: General knowledge of tools used in processing and delivering born-digital archives such as entity extraction, networking and visualization software. Tools: Stanford Named Entity Recognizer (NER), Apache OpenNLP, Gephi, D3.js, HTML 5 PivotViewer, etc.
  • Task: Explain to creators about publishing born-digital collection metadata and/or contents in semantic web/linked open data vs. Encoded Archival Description finding aids/other HTML-based web publishing methods when necessary.
    Needs: Knowledge of linked data/semantic web/EAD finding aids/HTML-based web publishing methods.
  • Task: Explain web archiving to creators.
    Needs: General knowledge of web archiving, cataloging, delivery and preservation of web sites. Knowledge of web archiving software such as Heritrix and HTTrack. Knowledge of Wayback Machine from Internet Archive.
  • Task: Explain to creators about the archives profession in general.
    Needs: Knowledge of establishing and maintaining control, arranging and describing born-digital archival materials in accordance with accepted standards and practices to ensure the long-term preservation of collections.

Accessioning

  • Task: Copy files contained on storage media including obsolete formats such as 5.25 inch floppy disks, computer punch cards, etc.
    Needs: Knowledge of onboard 5.25 inch floppy disk controllers and hardware interfaces and tools, including IDE, SCSI, Firewire, SATA, FC5025, KryoFlux, Catweasel, Zip drives, computer tapes, etc. Knowledge of file systems such as FAT, NTFS, HFS, etc.
  • Task: Ensure source data in storage media will not be erased/changed accidentally during accessioning while maintaining a proper audit trail in copying files from storage media.
    Needs: Knowledge of write-protect notch/slide switch in floppy disks and hardware write blockers. Knowledge of forensic software (e.g., FTK Imager for PC and Command FTK Imager for Mac).
  • Task: Get file count, file size and file category of collections.
    Needs: Knowledge of forensic software (e.g. AccessData FTK, EnCase Forensic, BitCurator, etc.), JHOVE, DROID, Pronom, etc.
  • Task: Ensure computer viruses, if they exist in collection materials, are under control during accessioning.
    Needs: Knowledge of the unique nature of archival materials (no replacement, etc.), behavior of viruses stored in file containers and special procedures in using antivirus software for archival materials.
  • Task: Accession email archives.
    Needs: Knowledge of Internet protocol (POP, IMAP) and email format (Outlook, mbox). Knowledge of commercial software packages to archive and reformat email (Emailchemy, Mailstore). Knowledge of open source software such as ePADD (Email: Process, Accession, Discover and Deliver) to archive emails.
  • Task: Archive web sites.
    Needs: Knowledge of web archiving software such as Heritrix and HTTrack. Knowledge of legal issues in archiving web sites. Knowledge of web archiving services such as Archive-It.
  • Task: Create accession records for born-digital archives.
    Needs: Knowledge of archival data management systems such as Archivists’ Toolkit (AT) with the Multiple Extent Plugin, etc.

Arrangement and Description / Processing

  • Task: Screen out restricted, personal, classified and proprietary information such as social security numbers, credit card numbers, classified data, medical records, etc. in archives.
    Needs: Knowledge of the sensitivity of personally identifiable information (PII) and tools to locate PII (e.g. AccessData FTK, Identity Finder). Knowledge of legal restrictions on access to data (DMCA, FERPA, etc.).
  • Task: Classify text elements in born-digital materials into predefined categories such as the names of persons, organizations and locations when appropriate.
    Needs: Knowledge of entity extraction software and tools to perform entity extraction (such as Open Calais, Stanford Named Entity Recognizer, Apache OpenNLP).
  • Task: Show the network relationship of people in collections when appropriate.
    Needs: Knowledge of network graphs and tools such as Gephi, NodeXL.
  • Task: Create controlled vocabularies to facilitate arrangement and description when appropriate.
    Needs: Knowledge of the concepts of controlled vocabularies. Knowledge of W3C standard for publishing controlled vocabularies (SKOS). Knowledge of software for creating controlled vocabularies in SKOS such as SKOSjs and SKOS Editor. Knowledge of platforms for hosting SKOS controlled vocabularies such as Linked Media Framework and Apache Marmotta. Knowledge of services for publishing SKOS such as Open Metadata Registry and Poolparty, Inc.
  • Task: Model data in archives in RDF (Resource Description Framework).
    Needs: Knowledge of semantic web/linked data. Knowledge of commonly used vocabularies/schema such as DC and FOAF. Knowledge of vocabulary repositories such as Linked Open Vocabularies (LOV). Knowledge of tools to generate rdf/xml, rdf/json such as LODRefine and Karma, etc.
  • Task: Model concepts and relationships between them in archives (e.g. video game consoles) using ontology when appropriate.
    Needs: Knowledge of the W3C standard OWL (Web Ontology Language) and software to create ontologies using OWL such as Protégé and WebProtege.
  • Task: Describe files with special formats (e.g. born-digital photographic images).
    Needs: Knowledge of image metadata schema standards (IPTC, EXIF) and software to create/modify such metadata (Adobe Bridge, Photo Mechanic, etc.).
  • Task: Describe image files by names of persons in images with the help of software when appropriate.
    Needs: Knowledge of facial recognition functions in software such as Picasa, Photoshop Elements.
  • Task: Use visualization tools to represent data in archives when appropriate.
    Needs: Knowledge of open source JavaScript libraries for manipulating documents such as D3.js, HTML 5 PivotViewer and commercial tools such as IBM ManyEyes and Cooliris.
  • Task: Assign metadata to archived web sites.
    Needs: Knowledge of cataloging options available in web archiving services such as Archive-It or in web archiving software such as HTTrack.
  • Task: Create EAD finding aids.
    Needs: Knowledge of accepted standards and practices in creating finding aids. Knowledge of XML editors or other software (such as Archivists’ Toolkit) to create EAD finding aids.

Discovery and Access

  • Task: Deliver born-digital archives.
    Needs: Knowledge of copyright laws and privacy issues.
  • Task: Deliver born-digital archives in reading room computers.
    Needs: Knowledge of security measures required for workstations in reading rooms, such as disabling Internet access and USB ports, to prevent unintentional transfer of collection materials. Knowledge of software to deliver images in collections such as Adobe Bridge. Knowledge of software to read files with obsolete file formats such as QuickView Plus.
  • Task: Deliver born-digital archives using institutions’ catalog system.
    Needs: Knowledge of the interface required by the institutions’ catalog system to make the delivery.
  • Task: Deliver born-digital archives using institution repository systems.
    Needs: Knowledge of DSpace, Fedora, Hydra and the interfaces developed to facilitate such delivery.
  • Task: Publish born-digital archives using linked data/semantic web.
    Needs: Knowledge of linked data publishing platforms such as Linked Media Framework, Apache Marmotta, OntoWiki and linked data publishing services such as Open Metadata Registry.
  • Task: Deliver born-digital archives using exhibition software.
    Needs: Knowledge of open source exhibition software such as Omeka.
  • Task: Deliver archived web sites.
    Needs: Knowledge of delivery options available in Web Archiving Services such as Archive-It or in web archiving software such as HTTrack.
  • Task: Deliver email archives.
    Needs: Knowledge of commercial software such as Mailstore. Knowledge of open source software such as ePADD (Email: Process, Accession, Discover and Deliver).
  • Task: Deliver software collections using emulation/virtualization.
    Needs: Knowledge of emulation/virtualization tools such as KEEP, JSMESS, MESS, VMNetX and XenServer.
  • Task: Deliver finding aids of born-digital archives using union catalogs such as OAC.
    Needs: Knowledge of uploading procedures to respective union catalogs such as OAC.

Preservation

  • Task: Prepare the technical metadata (checksum, creation, modification and last access dates, file format, file size, etc.) of files in archives for transfer to preservation repository.
    Needs: Knowledge of forensic software such as AccessData FTK, EnCase Forensic and BitCurator. Programming skill in XSLT to extract the information when appropriate from reports generated by the software.
  • Task: Use emulation / virtualization strategy to preserve software collections.
    Needs: Knowledge of emulation/virtualization tools such as KEEP, JSMESS, MESS, VMNetX and XenServer.
  • Task: Use migration strategies to preserve digital objects.
    Needs: Knowledge of Library of Congress Recommended Format Specifications. Knowledge of migration tools such as Xena and Adobe Acrobat Professional, and of the CURL Exemplars in Digital Archives (Cedars) and Creative Archiving at Michigan and Leeds: Emulating the Old on the New (CAMiLEON) projects.
  • Task: Submit items to preservation repository.
    Needs: Knowledge of preservation systems such as Archivematica, LOCKSS and preservation services such as Portico, Tessella and DuraSpace. Knowledge of preservation repository interfaces. Advanced knowledge in Excel for batch input to the repository when appropriate.
  • Task: Preserve archived web sites.
    Needs: Knowledge of preservation options available in Web Archiving Services such as Archive-It. Knowledge of preserving web sites in preservation repository.
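To make the Preservation task of preparing technical metadata (checksum, dates, file size) more concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not what AccessData FTK, EnCase or BitCurator do internally; the function name `technical_metadata` and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import os
from datetime import datetime, timezone


def technical_metadata(path, chunk_size=1 << 20):
    """Collect basic technical metadata for one file (hypothetical helper)."""
    # Read the timestamps first: opening the file may update its access time.
    info = os.stat(path)

    digest = hashlib.sha256()  # SHA-256 is an assumption; use whatever the repository requires
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)

    def iso(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

    return {
        "path": path,
        "sha256": digest.hexdigest(),
        "size_bytes": info.st_size,
        "modified": iso(info.st_mtime),
        "accessed": iso(info.st_atime),
    }
```

Reading a file can change its last access date on some systems, so a sketch like this is best run against a working copy or media mounted read-only, in line with the write-blocking practice described under Accessioning.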

This list may seem dishearteningly comprehensive, but I acquired these skills over years of experience working as a digital archivist on a number of challenging projects. I didn’t start off knowing everything on this list. I learned these skills by going to conferences and workshops, attending the Natural Language Processing MOOC classes, and studying on my own with resources available online. A digital archivist starting out in this field does not need to have all these skills right off the bat, but does need to be open to, and capable of, consistently learning and applying new knowledge.

Of course, digital archivists in different institutions will have different responsibilities according to their particular situations. I hope this article will generate discussion of the work expected from digital archivists and the knowledge required for them to succeed. Finally, I would like to thank Glynn Edwards, my supervisor, who supports my exploratory investigation into areas which some organizations may consider irrelevant to the job of a digital archivist. As a reminder, my opinions aren’t necessarily those of my employer or any other organization.

LibraryThing (Thingology): NEW: Easy Share for Book Display Widgets

Tue, 2014-10-07 17:48

LibraryThing for Libraries is pleased to announce an update to our popular Book Display Widgets.

Introducing “Easy Share.” Easy Share is a tool for putting beautiful book displays on Facebook, Twitter, Pinterest, Tumblr, email newsletters and elsewhere. It works by turning our dynamic, moving widgets into shareable images, optimized for the service you’re going to use them on.

Why would I want an image of a widget?

Dynamic widgets require JavaScript. This works great on sites you control, like a library’s blog or home page. But many sites, including some of the most important ones, don’t allow JavaScript. Easy Share bridges that gap, allowing you to post your widgets wherever a photo or other image can go—everywhere from Facebook to your email newsletters.

How do I find Easy Share?

To use Easy Share, move your cursor over a Book Display Widget. A camera icon will appear in the lower right corner of the widget. Click on that to open up the Easy Share box.

How can I share my widgets?

You can share your widget in three ways:

  1. Download. Download an image of your widget. After selecting a size, click the “down” arrow to download the image. Each image is labeled with the name of your widget, so you can find it easily on your computer. Upload this image to Facebook or wherever else you want it to go.
  2. Link. Get a link (URL) to the image. Select the size you want, then click the link icon to get a link to copy into whatever social media site you want.
  3. Dynamic. “Dynamic” images change over time, so you can place a “static” image somewhere and have it change as your collection changes. To get a dynamic image, go to the edit page for a widget. Use the link there to embed this image into your website or blog. Dynamic images update whenever your widget updates. Depending on users’ browser “caching” settings, changes may not appear immediately, but the image will change over time.

You can also download or grab a link to an image of your widget from the widget edit page. Under the preview section, click “Take Screenshot.” You can see our blog post about that feature here.

Check out the LibraryThing for Libraries Wiki for more instructions.


Find out more about LibraryThing for Libraries and Book Display Widgets. And sign up for a free trial of either by contacting

DPLA: DPLA Community Reps Produce Hackathon Planning Guide, Now Available

Tue, 2014-10-07 16:45

We’re excited to announce the release of a new Community Reps-produced resource, GLAM Hack-in-a-box, a short guide to organizing and convening a hackathon using cultural heritage data from GLAM organizations (Galleries, Libraries, Archives, Museums) including DPLA. We hope this guide will serve as a useful resource for those either unfamiliar with or inexperienced in pulling together a hackathon.

Included in this hackathon guide

What is a hackathon?
Learn about what a hackathon is and who can participate in one. Common examples–and misconceptions–are covered in this introductory section.

Developing your program
Think through the key details of your hackathon’s program. Topics covered include audience, purpose and goals, format, and staffing. Example programs are included as well.

Working through the logistics
Understand the logistical details to consider when planning a hackathon. Topics covered include venue considerations, materials, and project management tips. Example materials are included as well.

Day-of and post-hackathon
Learn how to make the most of your hard work when it counts most: the day-of! Topics covered include key day-of considerations and common concerns.

Handy resources
Find a number of useful resources for planning a GLAM API-based hackathon, including DPLA, as well as guides that we used in the process of writing this document.

This free resource was produced by the DPLA Community Reps over the course of summer 2014.  Many thanks go out to Community Reps Chad Nelson and Nabil Kashyap for volunteering their time and energy to work on this guide. If you happen to use the guide in planning a hackathon for the first time, or, if you’ve planned a hackathon in the past and learned something new and/or have additional recommendations after reading it, let us know!

 All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

Hydra Project: Hydra Connect #2 presentations and posters

Tue, 2014-10-07 15:27

Poster images from the “show and tell” session, and slide packs from many presentations given at Hydra Connect #2 last week can be found linked from the Hydra wiki.  The wiki front page also has a nice group photo – the full size version, should you want it, is linked from the conference program at the time it was taken – 12:30 on Wednesday.

David Rosenthal: Economies of Scale in Peer-to-Peer Networks

Tue, 2014-10-07 15:00
In a recent IEEE Spectrum article entitled Escape From the Data Center: The Promise of Peer-to-Peer Cloud Computing, Ozalp Babaoglu and Moreno Marzolla (BM) wax enthusiastic about the potential for Peer-to-Peer (P2P) technology to eliminate the need for massive data centers. Even more exuberance can be found in Natasha Lomas' Techcrunch piece The Server Needs To Die To Save The Internet (LM) about the MaidSafe P2P storage network. I've been working on P2P technology for more than 16 years, and although I believe it can be very useful in some specific cases, I'm far less enthusiastic about its potential to take over the Internet.

Below the fold I look at some of the fundamental problems standing in the way of a P2P revolution, and in particular at the issue of economies of scale. After all, I've just written a post about the huge economies that Facebook's cold storage technology achieves by operating at data center scale.
Economies of Scale
Back in April, discussing a vulnerability of the Bitcoin network, I commented:

"Gradually, the economies of scale you need to make money mining Bitcoin are concentrating mining power in fewer and fewer hands. I believe this centralizing tendency is a fundamental problem for all incentive-compatible P2P networks. ... After all, the decentralized, distributed nature of Bitcoin was supposed to be its most attractive feature."

In June, discussing Permacoin, I returned to the issue of economies of scale:

"increasing returns to scale (economies of scale) pose a fundamental problem for peer-to-peer networks that do gain significant participation. One necessary design goal for networks such as Bitcoin is that the protocol be incentive-compatible, or as Ittay Eyal and Emin Gun Sirer (ES) express it:

'the best strategy of a rational minority pool is to be honest, and a minority of colluding miners cannot earn disproportionate benefits by deviating from the protocol'

"They show that the Bitcoin protocol was, and still is, not incentive-compatible.

"Even if the protocol were incentive-compatible, the implementation of each miner would, like almost all technologies, be subject to increasing returns to scale."

Since then I've become convinced that this problem is indeed fundamental. The simplistic version of the problem is this:
  • The income to a participant in a P2P network of this kind should be linear in their contribution of resources to the network.
  • The costs a participant incurs by contributing resources to the network will be less than linear in their resource contribution, because of the economies of scale.
  • Thus the proportional profit margin a participant obtains will increase with increasing resource contribution.
  • Thus the effects described in Brian Arthur's Increasing Returns and Path Dependence in the Economy will apply, and the network will be dominated by a few, perhaps just one, large participant.
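The simplistic version above can be illustrated numerically. The following is a toy Python sketch, not a model of any real network: the reward rate, cost coefficient and cost exponent are invented for illustration, and the only properties that matter are that income is linear in contribution while cost grows less than linearly.

```python
def profit_margin(contribution, reward_rate=1.0, cost_coeff=0.8, cost_exponent=0.9):
    """Proportional profit margin for a participant (all parameters are assumptions)."""
    income = reward_rate * contribution                 # linear in contribution
    cost = cost_coeff * contribution ** cost_exponent   # less than linear: economies of scale
    return (income - cost) / income


if __name__ == "__main__":
    # Larger contributors keep a larger share of each unit of income.
    for size in (1, 10, 100, 1000):
        print(size, round(profit_margin(size), 3))
```

With any cost exponent below 1, the margin rises with contribution, which is the condition under which increasing-returns dynamics favor the largest participants.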
The advantages of P2P networks arise from a diverse network of small, roughly equal resource contributors. Thus it seems that P2P networks which have the characteristics needed to succeed (by being widely adopted) also inevitably carry the seeds of their own failure (by becoming effectively centralized). Bitcoin is an example of this. Some questions arise:
  • Does incentive-compatibility imply income linear in contribution?
  • If not, are there incentive-compatible ways to deter large contributions?
  • The simplistic version is, in effect, a static view of the network. Are there dynamic effects also in play?
Does incentive-compatibility imply income linear in contribution? Clearly, the reverse is true. If income is linear in, and solely dependent upon, contribution there is no way for a colluding minority of participants to gain more than their just reward. If, however:
  • Income grows faster than linearly with contribution, a group of participants can pool their contributions, pretend to be a single participant, and gain more than their just reward.
  • Income grows more slowly than linearly with contribution, a group of participants that colluded to appear as a single participant would gain less than their just reward.
So it appears that income linear in contribution is the limiting case; anything faster is not incentive-compatible.
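The pooling argument can be checked with a small Python sketch. It assumes, purely for illustration, that reward is a power of contribution; the function names and the exponent values are hypothetical.

```python
def reward(contribution, exponent):
    """Reward as a power of contribution (an assumed functional form)."""
    return contribution ** exponent


def pooling_gain(contributions, exponent):
    """Extra reward a group earns by posing as one participant instead of many."""
    pooled = reward(sum(contributions), exponent)
    separate = sum(reward(c, exponent) for c in contributions)
    return pooled - separate


if __name__ == "__main__":
    group = [1, 1, 1, 1]
    for exponent in (0.5, 1, 2):
        print(exponent, pooling_gain(group, exponent))
```

An exponent above 1 makes the gain positive (pooling pays), an exponent of exactly 1 makes it zero, and an exponent below 1 makes it negative, matching the three cases discussed above.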

Are there incentive-compatible ways to deter large contributions? In principle, the answer is yes. Arranging that income grows more slowly than contribution, and depends on nothing else, will do the trick. The problem lies in doing so.

Source: bitcoincharts.com

The actual income received by a participant is the value of the reward the network provides in return for the contribution of resources (for example, the Bitcoin) less the costs incurred in contributing the resources (in the Bitcoin case, the capital and running costs of the mining hardware). As the value of Bitcoins collapsed (as I write, BTC is about $320, down from about $1200 11 months ago and half its value in August), many smaller miners discovered that mining wasn't worth the candle.

The network has to arrange not just that the reward grows more slowly than the contribution, but that it grows more slowly than the cost of the contribution to any participant. If there is even one participant whose rewards outpace their costs, Brian Arthur's analysis shows they will end up dominating the network. Herein lies the rub. The network does not know what an individual participant's costs, or even the average participant's costs, are and how they grow as the participant scales up their contribution.

So the network would have to err on the safe side, and make rewards grow very slowly with contribution, at least above a certain minimum size. Doing so would mean few if any participants above the minimum contribution, making growth dependent entirely on recruiting new participants. This would be hard because their gains from participation would be limited to the minimum reward. It is clear that mass participation in the Bitcoin network was fuelled by the (unsustainable) prospect of large gains for a small investment.

A network that assured incentive-compatibility in this way would not succeed, because the incentives would be so limited. A network that allowed sufficient incentives to motivate mass participation, as Bitcoin did, would share Bitcoin's vulnerability to domination by, as at present, two participants (pools, in Bitcoin's case).

Are there dynamic effects also in play? As well as increasing returns to scale, technology markets exhibit decreasing returns through time. Bitcoin is an extreme example of this. Investment in Bitcoin mining hardware has a very short productive life:
"the overall network hash rate has been doubling every 3-4 weeks, and therefore, mining equipment has been losing half its production capability within the same time frame. After 21-28 weeks (7 halvings), mining rigs lose 99.3% of their value."

This effect is so strong that it poses temptations for the hardware manufacturers that some have found impossible to resist. The FBI recently caught Butterfly Labs using hardware that customers had bought and paid for to mine on their own behalf for a while before shipping it to the customers. They thus captured the most valuable week or so of the hardware's short useful life for themselves.
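The halving arithmetic in the quoted passage is easy to reproduce. This short Python sketch assumes only what the quote states: production capability halves each time the network hash rate doubles, every 3 to 4 weeks. The function names are made up for the example.

```python
def remaining_fraction(halvings):
    """Fraction of original production capability left after a number of halvings."""
    return 0.5 ** halvings


def weeks_elapsed(halvings, weeks_per_doubling):
    """Time taken for that many halvings at a given doubling period."""
    return halvings * weeks_per_doubling


if __name__ == "__main__":
    lost = 1 - remaining_fraction(7)
    print(f"7 halvings take {weeks_elapsed(7, 3)}-{weeks_elapsed(7, 4)} weeks")
    print(f"capability lost: {lost:.2%}")
```

Seven halvings leave 1/128 of the original capability, a loss of a little over 99.2%, which is close to the figure quoted.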

Source: blockchain.info

This effect can be significant even at technology improvement rates much lower than the Bitcoin network hash rate increase, such as Moore's Law or Kryder's Law, under which the useful life of hardware is much longer than 6 months. When new, more efficient technology is introduced, thus reducing the cost per unit contribution to a P2P network, it does not become instantly available to all participants. As manufacturing ramps up, the limited supply preferentially goes to the manufacturers' best customers, who would be the largest contributors to the P2P network. By the time supply has increased so that smaller contributors can enjoy the lower cost per unit contribution, the most valuable part of the technology's useful life is over.

Early availability of new technology acts to reduce the costs of the larger participants, amplifying their economies of scale. This effect must be very significant in Bitcoin mining, as Butterfly Labs noticed. At pre-2010 Kryder rates it would be quite noticeable, since storage media service lives were less than 60 months. At the much lower Kryder rates projected by the industry, storage media lifetimes will be extended and the effect will be correspondingly smaller.
Trust
BM admit that there are significant unresolved trust issues in P2P technology:

"The people using such a cloud must trust that none of the many strangers operating it will do something malicious. And the providers of equipment must trust that the users won’t hog computer time.

"These are formidable problems, which so far do not have general solutions. If you just want to store data in a P2P cloud, though, things get easier: The system merely has to break up the data, encrypt it, and store it in many places."

Unfortunately, even for storage this is inadequate. The system cannot trust the peers claiming to store the shards of the encrypted data but must verify that they actually are storing them. This is a resource-intensive process. Permacoin's proposal, to re-purpose resources already being expended elsewhere, is elegant but unlikely to be successful. Worse, the verification process consumes not just resources, but time. At each peer there is necessarily a window of time between successive verifications. During that time the system believes the peer has a good copy of the shard, but it might no longer have one.
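One generic way to verify that a peer still holds a shard is a nonce-based challenge, sketched below in Python. This is an illustration of the general idea only; it is not the mechanism used by Permacoin, MaidSafe or any other system named here, and the function names are hypothetical.

```python
import hashlib
import os


def make_challenges(shard, count):
    """Owner precomputes (nonce, expected digest) pairs before handing out the shard."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        expected = hashlib.sha256(nonce + shard).hexdigest()
        challenges.append((nonce, expected))
    return challenges


def prove(shard, nonce):
    """Peer's answer: it has to hash the whole shard together with the fresh nonce."""
    return hashlib.sha256(nonce + shard).hexdigest()


def verify(challenge, response):
    """Owner checks the answer against the precomputed digest."""
    _nonce, expected = challenge
    return response == expected
```

Even in this toy form the costs mentioned above are visible: every check makes the peer read and hash the entire shard, each precomputed challenge can be used only once, and nothing is known about the shard between one challenge and the next.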
Edge of the internet
P2P enthusiasts describe the hardware from which their network is constructed in similar terms. Here is BM:

"the P2P cloud is made up of a diverse collection of different people’s computers or game consoles or whatever"

and here is LM:

"Users of MaidSafe’s network contribute unused hard drive space, becoming the network’s nodes. It’s that pooling — or, hey, crowdsourcing — of many users’ spare computing resource that yields a connected storage layer that doesn’t need to centralize around dedicated datacenters."

When the idea of P2P networks started in the 90s:

"Their model of the edge of the Internet was that there were a lot of desktop computers, continuously connected and powered-up, with low latency and no bandwidth charges, and with 3.5" hard disks that were mostly empty. Since then, the proportion of the edge with these characteristics has become vanishingly small. The edge is now intermittently powered up and connected, with bandwidth charges, and only small amounts of local storage."
Monetary rewards
This means that, if the network is to gain mass participation, the majority of participants cannot contribute significant resources to it; they don't have suitable resources to contribute. They will have to contribute cash. This in turn means that there must be exchanges, converting between the rewards for contributing resources and cash, allowing the mass of resource-poor participants to buy from the few resource-rich participants.

Both Permacoin and MaidSafe envisage such exchanges, but what they don't seem to envisage is the effect on customers of the kind of volatility seen in the Bitcoin graph above. Would you buy storage from a service with this price history, or from Amazon? What exactly is the value to the mass customer of paying a service such as MaidSafe, by buying SafeCoin on an exchange, instead of paying Amazon directly, that would overcome the disadvantage of the price volatility?

As we see with Bitcoin, a network whose rewards can readily be converted into cash is subject to intense attack, and attracts participants ranging from sleazy to criminal. Despite its admirably elegant architecture, Bitcoin has suffered from repeated vulnerabilities. Although P2P technology has many advantages in resisting attack, especially the elimination of single points of failure and centralized command and control, it introduces a different set of attack vectors.
Measuring contributions
Discussion of P2P storage networks tends to assume that measuring the contribution a participant supplies in return for a reward is easy. A Gigabyte is a Gigabyte after all. But compare two Petabytes of completely reliable and continuously available storage, one connected to the outside world by a fiber connection to a router near the Internet's core, and the other connected via 3G. Clearly, the first has higher bandwidth, higher availability and lower cost per byte transferred, so its contribution to serving the network's customers is vastly higher. It needs a correspondingly greater reward.

In fact, networks would need to reward many characteristics of a peer's storage contribution as well as its size:
  • Reliability
  • Availability
  • Bandwidth
  • Latency
Measuring each of these parameters, and establishing "exchange rates" between them, would be complex, would lead to a very mixed marketing message, and would be the subject of disputes. For example, the availability, bandwidth and latency of a network resource depends on the location in the network from which the resource is viewed, so there would be no consensus among the peers about these parameters.
Conclusion
While it is clear that P2P storage networks can work, and can even be useful tools for small communities of committed users, the non-technical barriers to widespread adoption are formidable. They have been effective in preventing widespread adoption since the late 90s, and the evolution of the Internet has since raised additional barriers.

Terry Reese: MarcEdit 6 Update (10/6/2014)

Tue, 2014-10-07 13:47

I sent this note to the MarcEdit listserv late last night (early this morning), but forgot to post it here. Over the weekend, the Ohio State University Libraries hosted our second annual hackathon on campus. It’s been a great event, and this year I had one of the early morning shifts (12 am-5 am), so I decided to use the time to do a little hacking myself. Here’s a list of the changes:

  • Bug Fix: Merge Records Function: When processing using the control number option (or MARC21 primarily utilizing control numbers for matching) the program could merge incorrect data if large numbers of merged records existed without the data specified to be merged.  The tool would pull data from the previous record used and add that data to the matches.  This has been corrected.
  • Bug Fix: Network Task Directory: This tool was always envisioned as one that individuals would point at an existing folder.  However, if the folder didn’t exist before the tool was pointed at the location, the tool wouldn’t index new tasks.  This has been fixed.
  • Bug Fix: Task Manager (Importing new tasks) — When tasks were imported with multiple referenced task lists, the list could be unassociated from the master task.  This has been corrected.
  • Bug Fix:  If the plugins folder doesn’t exist, the current Plugin Manager doesn’t create one when adding new plugins.  This has been corrected.
  • Bug Fix: MarcValidator UI issue:  When resizing the form, the clipboard link wouldn’t move appropriately.  This has been fixed.
  • Bug Fix: Build Links Tool: Relator terms in the 1xx and 7xx fields were causing problems.  This has been corrected.
  • Bug Fix: RDA Helper: When parsing 260 fields with multiple copyright dates, the process would only handle one of the dates.  The process has been updated to handle all copyright values embedded in the 260$c.
  • Bug Fix: SQL Explorer:  The last build introduced a regression error so that when using the non-expanded SQL table schema, the program would crash.  This has been corrected.
  • Enhancement:  SQL Explorer expanded schema has been enhanced to include a column id to help track column value relationships.
  • Enhancement: Z39.50 Cataloging within the MarcEditor: When selecting the Z39.50/SRU Client, the program now seamlessly allows users to search using the Z39.50 client and automatically load the results directly into the open MarcEditor window.

Two other specific notes.  First, a few folks on the listserv have noted trouble getting MarcEdit to run on a Mac.  The issue appears to be MONO related.  Version 3.8.0 appears to have neglected to include a file in the build (which caused GUI operations to fail), and 3.10.0 brings the file back, but there was a build error with the component so the issue continues.  The problems are noted in their release notes as known issues and the bug tracker seems to suggest that this has been corrected in the alpha channels, but that doesn’t help anyone right now.  So, I’ve updated the Mac instructions to include a link to MONO 3.6.0, the last version tested as a standalone install that I know works.  From now on, I will include the latest MONO version tested, and a link to the runtime, to hopefully avoid this type of confusion in the future.

Second, I’ve created a nifty plugin related to the LibHub project.  I’ve done a little video recording and will be making that available shortly.  Right now, I’m waiting on some feedback.  The plugin will be initially released to LibHub partners to provide a way for them to move any data into the project for evaluation, but hopefully in time it will be made more widely available.

Updates can be downloaded automatically via MarcEdit, or can be found at:

Please remember, if you are running a very old copy of MarcEdit 5.8 or lower, it is best practice to uninstall the application prior to installing 6.0.



Thom Hickey: Another JSON encoding for MARC data

Tue, 2014-10-07 13:42

You might think that a well understood format such as MARC would have a single straight-forward way of being represented in JSON.  Not so!  There are lots of ways of doing it, all with their own advantages (see some references below).  Still, I couldn't resist creating yet another.

This encoding grew out of some experimentation with Go (Golang), in which encoding MARC in JSON was one of my test cases, as was the speed at which the resulting encoding could be processed.  Another inspiration was Rich Hickey's ideas about the relationship of data and objects:

...the use of objects to represent simple informational data is almost criminal in its generation of per-piece-of-information micro-languages, i.e. the class methods, versus far more powerful, declarative, and generic methods like relational algebra. Inventing a class with its own interface to hold a piece of information is like inventing a new language to write every short story.

That said, how to represent the data still leaves lots of options, as the multiple encodings of MARC into JSON show.

Go's emphasis on strict matching of types pushed me towards a very flat structure:

  • The record is encoded as an array of objects
  • Each object has a 'Type' and represents either the leader or a field

Here are examples of the different fields:

{"Type":"leader", "Data": "the leader goes here"}

{"Type":"cfield", "Tag":"001", "Data":"12345"}

{"Type":"dfield", "Tag":"245", "Inds":"00", "Data":"aThis is a title$bSubtitle"}

Note that the subfields do not get their own objects.  They are concatenated together into one string using standard MARC subfield delimiters (represented by a $ above), essentially the way they appear in an ISO 2709 encoding.  In Python (and in Go) it is easy to split these strings on the delimiter into subfields as needed.  
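A minimal sketch of that split in Python, assuming the standard MARC subfield delimiter (U+001F) and the convention shown above, where each chunk starts with its one-character subfield code and the first subfield carries no leading delimiter. The name split_subfields is hypothetical and not part of any library:

```python
SUBFIELD_DELIMITER = "\x1f"  # standard MARC subfield delimiter (shown as $ in the examples)


def split_subfields(data, delimiter=SUBFIELD_DELIMITER):
    """Split a dfield's Data string into (code, value) pairs.

    Assumes each chunk begins with its one-character subfield code.
    """
    pairs = []
    for chunk in data.split(delimiter):
        if chunk:  # skip empty chunks, e.g. from an empty string or trailing delimiter
            pairs.append((chunk[0], chunk[1:]))
    return pairs


title = split_subfields("aThe freewheelin' Bob Dylan\x1fh[sound recording].")
# title is [('a', "The freewheelin' Bob Dylan"), ('h', '[sound recording].')]
```

Because the split happens only when a caller asks for it, fields that are never inspected stay as single strings and cost nothing to carry along.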

In addition to making it easy to import the JSON structure into Go (everything is easy in Python), the lack of structure makes reading and writing the list of fields very fast and simple. The main HBase table that supports WorldCat now has some 1.7 billion rows, so fast processing is essential, and we find that processing this encoding is much faster than processing the XML representation.  Although we do put the list of fields into a Python object, that object is derived from the list itself, so we can treat it as such, including adding new fields (and Types) as needed, which then get automatically carried along in the exported JSON.
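One way to picture an object that is derived from the list itself is a small list subclass. The sketch below is an assumption about the general shape, not the code actually used with WorldCat; the names Record, with_tag and to_json, and the "admin" Type, are hypothetical:

```python
import json


class Record(list):
    """Hypothetical record type: a plain list of field dicts plus a few helpers."""

    def with_tag(self, tag):
        """Return every field whose Tag matches (leader entries have no Tag)."""
        return [field for field in self if field.get("Tag") == tag]

    def to_json(self):
        """Serialize the whole list, including any non-MARC entries added later."""
        return json.dumps(self)


record = Record(json.loads(
    '[{"Type": "leader", "Data": "the leader goes here"},'
    ' {"Type": "cfield", "Tag": "001", "Data": "12345"}]'
))
# A new Type (illustrative only) rides along in the exported JSON.
record.append({"Type": "admin", "Data": "loaded 2014-10-07"})
exported = record.to_json()
```

Since the object is still a list, ordinary list operations such as append, slicing and iteration work unchanged, and json.dumps needs no custom encoder.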

We are also finding that a simple flat structure makes it easy to add information (e.g. administrative metadata) that doesn't fit into standard MARC without effort.

Here are a few MARC in JSON references (I know there have been others in the past).  As far as I can tell, Ross's is the most popular:

Ross Singer:

Clay Fouts:

Galen Charlton:

Bill Dueber:

A more general discussion by Jakob Voss

Here is a full example of a record using the same example Ross Singer uses (although the record itself appears to have changed):

[{"Data": "01471cjm a2200349 a 4500", "Type": "leader"},
{"Data": "5674874", "Tag": "001", "Type": "cfield"},
{"Data": "20030305110405.0", "Tag": "005", "Type": "cfield"},
{"Data": "sdubsmennmplu", "Tag": "007", "Type": "cfield"},
{"Data": "930331s1963 nyuppn eng d", "Tag": "008", "Type": "cfield"},
{"Data": "9(DLC) 93707283", "Tag": "035", "Type": "dfield", "Inds": " "},
{"Data": "a7\u001fbcbc\u001fccopycat\u001fd4\u001fencip\u001ff19\u001fgy-soundrec", "Tag": "906", "Type": "dfield", "Inds": " "},
{"Data": "a 93707283 ", "Tag": "010", "Type": "dfield", "Inds": " "},
{"Data": "aCS 8786\u001fbColumbia", "Tag": "028", "Type": "dfield", "Inds": "02"},
{"Data": "a(OCoLC)13083787", "Tag": "035", "Type": "dfield", "Inds": " "},
{"Data": "aOClU\u001fcDLC\u001fdDLC", "Tag": "040", "Type": "dfield", "Inds": " "},
{"Data": "deng\u001fgeng", "Tag": "041", "Type": "dfield", "Inds": "0 "},
{"Data": "alccopycat", "Tag": "042", "Type": "dfield", "Inds": " "},
{"Data": "aColumbia CS 8786", "Tag": "050", "Type": "dfield", "Inds": "00"},
{"Data": "aDylan, Bob,\u001fd1941-", "Tag": "100", "Type": "dfield", "Inds": "1 "},
{"Data": "aThe freewheelin' Bob Dylan\u001fh[sound recording].", "Tag": "245", "Type": "dfield", "Inds": "14"},
{"Data": "a[New York, N.Y.] :\u001fbColumbia,\u001fc[1963]", "Tag": "260", "Type": "dfield", "Inds": " "},
{"Data": "a1 sound disc :\u001fbanalog, 33 1/3 rpm, stereo. ;\u001fc12 in.", "Tag": "300", "Type": "dfield", "Inds": " "},
{"Data": "aSongs.", "Tag": "500", "Type": "dfield", "Inds": " "},
{"Data": "aThe composer accompanying himself on the guitar ; in part with instrumental ensemble.", "Tag": "511", "Type": "dfield", "Inds": "0 "},
{"Data": "aProgram notes by Nat Hentoff on container.", "Tag": "500", "Type": "dfield", "Inds": " "},
{"Data": "aBlowin' in the wind -- Girl from the north country -- Masters of war -- Down the highway -- Bob Dylan's blues -- A hard rain's a-gonna fall -- Don't think twice, it's all right -- Bob Dylan's dream -- Oxford town -- Talking World War III blues -- Corrina, Corrina -- Honey, just allow me one more chance -- I shall be free.", "Tag": "505", "Type": "dfield", "Inds": "0 "},
{"Data": "aPopular music\u001fy1961-1970.", "Tag": "650", "Type": "dfield", "Inds": " 0"},
{"Data": "aBlues (Music)\u001fy1961-1970.", "Tag": "650", "Type": "dfield", "Inds": " 0"},
{"Data": "3Preservation copy (limited access)\u001fu", "Tag": "856", "Type": "dfield", "Inds": "41"},
{"Data": "aNew", "Tag": "952", "Type": "dfield", "Inds": " "},
{"Data": "aTA28", "Tag": "953", "Type": "dfield", "Inds": " "},
{"Data": "bc-RecSound\u001fhColumbia CS 8786\u001fwMUSIC", "Tag": "991", "Type": "dfield", "Inds": " "}]


Note: As far as I know Rich Hickey and I are not related.

Open Knowledge Foundation: Open Definition v2.0 Released – Major Update of Essential Standard for Open Data and Open Content

Tue, 2014-10-07 11:00

Today Open Knowledge and the Open Definition Advisory Council are pleased to announce the release of version 2.0 of the Open Definition. The Definition “sets out principles that define openness in relation to data and content” and plays a key role in supporting the growing open data ecosystem.

Recent years have seen an explosion in the release of open data by dozens of governments including the G8. Recent estimates by McKinsey put the potential benefits of open data at over $1 trillion and other estimates put benefits at more than 1% of global GDP.

However, these benefits are at significant risk both from quality problems such as “open-washing” (non-open data being passed off as open) and from fragmentation of the open data ecosystem due to incompatibility between the growing number of “open” licenses.

The Open Definition eliminates these risks and ensures we realize the full benefits of open by guaranteeing quality and preventing incompatibility. See this recent post for more about why the Open Definition is so important.

The Open Definition was published in 2005 by Open Knowledge and is maintained today by an expert Advisory Council. This new version of the Open Definition is the most significant revision in the Definition’s nearly ten-year history.

It reflects more than a year of discussion and consultation with the community including input from experts involved in open data, open access, open culture, open education, open government, and open source. Whilst there are no changes to the core principles, the Definition has been completely reworked with a new structure and new text as well as a new process for reviewing licenses (which has been trialled with governments including the UK).

Herb Lainchbury, Chair of the Open Definition Advisory Council, said:

“The Open Definition describes the principles that define “openness” in relation to data and content, and is used to assess whether a particular licence meets that standard. A key goal of this new version is to make it easier to assess whether the growing number of open licenses actually make the grade. The more we can increase everyone’s confidence in their use of open works, the more they will be able to focus on creating value with open works.”

Rufus Pollock, President and Founder of Open Knowledge said:

“Since we created the Open Definition in 2005 it has played a key role in the growing open data and open content communities. It acts as the “gold standard” for open data and content guaranteeing quality and preventing incompatibility. As a standard, the Open Definition plays a key role in underpinning the “open knowledge economy” with a potential value that runs into the hundreds of billions – or even trillions – worldwide.”

What’s New

In process for more than a year, the new version was collaboratively and openly developed with input from experts involved in open access, open culture, open data, open education, open government, open source and wiki communities. The new version of the definition:

  • Has a complete rewrite of the core principles – preserving their meaning but using simpler language and clarifying key aspects.
  • Introduces a clear separation of the definition of an open license from an open work (with the latter depending on the former). This not only simplifies the conceptual structure but provides a proper definition of open license and makes it easier to “self-assess” licenses for conformance with the Open Definition.
  • The definition of an Open Work within the Open Definition is now a set of three key principles:
    • Open License: The work must be available under an open license (as defined in the following section but this includes freedom to use, build on, modify and share).
    • Access: The work shall be available as a whole and at no more than a reasonable one-time reproduction cost, preferably downloadable via the Internet without charge.
    • Open Format: The work must be provided in a convenient and modifiable form such that there are no unnecessary technological obstacles to the performance of the licensed rights. Specifically, data should be machine-readable, available in bulk, and provided in an open format or, at the very least, can be processed with at least one free/libre/open-source software tool.
  • Includes an improved license approval process to make it easier for license creators to check conformance of their license with the Open Definition and to encourage reuse of existing open licenses, and outlines the process for submitting a license so that it can be checked for conformance against the Open Definition.
More Information
  • For more information about the Open Definition including the updated version visit:
  • For background on why the Open Definition matters, read the recent article ‘Why the Open Definition Matters’

This post was written by Herb Lainchbury, Chair of the Open Definition Advisory Council and Rufus Pollock, President and Founder of Open Knowledge

Open Knowledge Foundation: Brazilian Government Develops Toolkit to Guide Institutions in both Planning and Carrying Out Open Data Initiatives

Tue, 2014-10-07 10:20

This is a guest post by Nitai Silva of the Brazilian government’s open data team and was originally published on the Open Knowledge Brazil blog here.

Recently the Brazilian government released the Kit de Dados Abertos (open data toolkit). The toolkit is made up of documents describing the process, methods and techniques for implementing an open data policy within an institution. Its goal is to both demystify the logic of opening up data and to share with public employees observed best practices that have emerged from a number of Brazilian government initiatives.

The toolkit focuses on the Plano de Dados Abertos – PDA (Open Data Plan) as the guiding instrument where commitments, agenda and policy implementation cycles in the institution are registered. We believe that making each public agency build its own PDA is a way to perpetuate the open data policy, making it a state policy and not just a transitory governmental action.

It is organised to facilitate the implementation of the main activity cycles that must be observed in an institution and provides links and manuals to assist in these activities. Emphasis is given to the actors/roles involved in each step and their responsibilities. It also helps to define a central person to monitor and maintain the PDA. The following diagram summarises the macro steps of implementing an open data policy in an institution:


Processo Sistêmico de um PDA (Systemic Process of a PDA)


The open data theme has been part of the Brazilian government’s agenda for over three years. Over this period, we have accomplished a number of important achievements, including passing the Lei de Acesso à Informação – LAI (FOIA) (Access to Information Law), making commitments as part of our Open Government Partnership Action Plan and developing the Infraestrutura Nacional de Dados Abertos (INDA) (Open Data National Infrastructure). However, despite these accomplishments, for many public managers, open data activities remain the exclusive responsibility of the Information Technology department of their respective institution. This gap is, in many ways, the cultural heritage of the hierarchical, departmental model of carrying out public policy and is observed in many institutions.

The launch of the toolkit is the first of a series of actions prepared by the Ministry of Planning to leverage open data initiatives in federal agencies, as was defined in the Brazilian commitments in the Open Government Partnership (OGP). The next step is to conduct several tailor-made workshops designed to support major agencies in the federal government in the implementation of open data.

Although it was built with the aim of expanding the quality and quantity of open data made available by federal executive branch agencies, we also made a conscious effort to make the toolkit generic enough for other branches and levels of government.

About the toolkit development:

It is also noteworthy that the toolkit was developed on GitHub. Although GitHub is known as an online, distributed environment for developing software, it has long been used for co-creating text documents, even by governments. The toolkit is still hosted there, which allows anyone to make changes and propose improvements. The invitation is open: we welcome and encourage your collaboration.

Finally I would like to thank Augusto Herrmann, Christian Miranda, Caroline Burle and Jamila Venturini for participating in the drafting of this post!