Planet Code4Lib

D-Lib: In Brief: The Clipper Project

Mon, 2015-11-16 14:14

LITA: Agile Development: Building an Agile Culture

Mon, 2015-11-16 14:00

Over the last few months I have described various components of Agile development. This time around I want to talk about building an Agile culture. Agile is more than just a codified process; it is a development approach, a philosophy, one that stresses flexibility and communication. For a development team to implement Agile successfully, the organization must embrace and practice the appropriate culture. In this post I will briefly discuss several tips that will help build an Agile culture.

The Right People

It all starts here: as with pretty much any undertaking, you need the right people in place, which is not necessarily the same as saying the best people. Agile development necessitates a specific set of skills that are not intrinsically related to coding mastery: flexibility, teamwork, and the ability to take responsibility for a project’s ultimate success are all extremely important. Once the team is formed, management should work to bring team members closer together and create the right environment for information sharing and investment.

Encourage Open Communication

Because of Agile’s quick pace and flexibility, and the lack of overarching structures and processes, open communication is crucial. A team must develop communication pathways and support structures so that all team members are aware of where the project stands at any one moment (the daily scrum is a great example of this). More important, however, is to convince the team to open up and conscientiously share individual progress, key roadblocks, and concerns about the path of development. Likewise, management must be proactive about sharing project goals and business objectives with the team. An Agile team is always looking for the most efficient way to deliver results, and the more information they receive about the motivation and goals that lie behind a project the better. Agile managers must actively encourage a culture that says “we’re all in this together, and together we will find the solution to the problem.” Silos are Agile’s kryptonite.

Empower the Team

Agile only works when everyone on the team feels responsible for the success of the project, and management must do its part by encouraging team members to take ownership of the results of their work, and trusting them to do so. Make sure everyone on the team understands the ultimate organizational need, assign specific roles to each team member, and then allow team members to find their own ways to meet the stated goals. Too often in development there is a basic disconnect between the people who understand the business needs and those who have the technical know-how to make them happen. Everyone on the team needs to understand what makes for a successful project, so that wasted effort is minimized.

Reward the Right Behaviors

Too often in development organizations, management metrics are out of alignment with process goals. Hours worked are a popular metric teams use to evaluate members, although often proxies like hours spent at the office, or time spent logged into the system, are used. With Agile, the focus should be on results. As long as a team meets the stated goals of a project, the less time spent working on the solution, the better. Remember, the key is efficiency, and developing software that solves the problem at hand with as few bells and whistles as possible. If a team is consistently beating its time estimates by a significant margin, it can recalibrate its estimation procedures. Spending all night at the office working on a piece of code is not a badge of honor, but a failure of the planning process.

Be Patient

Full adoption of Agile takes time. You cannot expect a team to change its fundamental philosophy overnight. The key is to keep working at it, taking small steps towards the right environment and rewarding progress. Above all, management needs to be transparent about why it considers this change important. A full transition can take years of incremental improvement. Finally, be conscious that the steady state for your team will likely not look exactly like the theoretical ideal. Agile is adaptable and each organization should create the process that works best for its own needs.

If you want to learn more about building an Agile culture, check out the following resources:

In your experience, how long does it take for a team to fully convert to the Agile way? What is the biggest roadblock to adoption? How is the process initiated and who monitors and controls progress?

“Scrum process” image by Lakeworks (own work) [GFDL or CC BY-SA 4.0-3.0-2.5-2.0-1.0], via Wikimedia Commons

Conal Tuohy: Taking control of an uncontrolled vocabulary

Mon, 2015-11-16 13:49

A couple of days ago, Dan McCreary tweeted:

Working on new ideas for NoSQL metadata management for a talk next week. Focus on #NoSQL, Documents, Graphs and #SKOS. Any suggestions?

— Dan McCreary (@dmccreary) November 14, 2015

It reminded me of some work I had done a couple of years ago for a project which was at the time based on Linked Data, but which later switched away from that platform, leaving various bits of RDF-based work orphaned.

One particular piece which sprang to mind was a tool for dealing with vocabularies. Whether it’s useful for Dan’s talk I don’t know, but I thought I would dig it out and blog a little about it in case it’s of interest more generally to people working in Linked Open Data in Libraries, Archives and Museums (LODLAM).

I told Dan:

@dmccreary i did a thing once with an xform using a sparql query to assemble a skos concept scheme, edit it, save in own graph. Of interest?

— Unholy Taco (@conal_tuohy) November 14, 2015

When he sounded interested, I made a promise:

@dmccreary i have the code somewhere. .. will dig it out

— Unholy Taco (@conal_tuohy) November 14, 2015

I know I should find a better home for this and the other orphaned LODLAM components, but for now, the original code can be seen here:

I’ll explain briefly how it works, but first, I think it’s necessary to explain the rationale for the vocabulary tool, and for that you need to see how it fits into the LODLAM environment.

At the moment there is a big push in the cultural sector towards moving data from legacy information systems into the “Linked Open Data (LOD) Cloud” – i.e. republishing the existing datasets as web-based sets of inter-linked data. In some cases people are actually migrating from their old infrastructure, but more commonly people are adding LOD capability to existing systems via some kind of API (this is a good approach, to my way of thinking – it reduces the cost and effort involved enormously). Either way, you have to be able to take your existing data and re-express it in terms of Linked Data, and that means facing up to some challenges, one of which is how to manage “vocabularies”.

Vocabularies, controlled and uncontrolled

What are “vocabularies” in this context? A “vocabulary” is a set of descriptive terms which can be applied to a record in a collection management system. For instance, a museum collection management system might have a record for a teacup, and the record could have a number of fields such as “type”, “maker”, “pattern”, “colour”, etc. The value of the “type” field would be “teacup”, for instance, but another piece in the collection might have the value “saucer” or “gravy boat” or what have you. These terms, “teacup”, “plate”, “dinner plate”, “saucer”, “gravy boat” etc, constitute a vocabulary.

In some cases, this set of terms is predefined in a formal list. This is called a “controlled vocabulary”. Usually each term has a description or definition (a “scope note”), and if there are links to other related terms (e.g. “dinner plate” is a “narrower term” of “plate”), as well as synonyms, including in other languages (“taza”, “plato”, etc), then the controlled vocabulary is called a thesaurus. A thesaurus or a controlled vocabulary can be a handy guide to finding things. You can navigate your way around a thesaurus, from one term to another, to find related classes of object which have been described with those terms, or the thesaurus can be used to automatically expand your search queries without you having to do anything; you can search for all items tagged as “plate” and the system will automatically also search for items tagged “dinner plate” or “bread plate”.
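The automatic query expansion just described amounts to following “narrower term” links transitively. Here is a minimal Python sketch of the idea, assuming a hypothetical in-memory thesaurus held as a dict from each term to its narrower terms (a real system would read these links from its thesaurus):

```python
from collections import deque

# Hypothetical thesaurus fragment: each term maps to its narrower terms.
NARROWER = {
    "plate": ["dinner plate", "bread plate"],
    "dinner plate": [],
    "bread plate": [],
}


def expand_query(term, narrower=NARROWER):
    """Return the term followed by every narrower term reachable from it, breadth-first."""
    expanded = [term]
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:  # guard against cycles in messy data
                seen.add(child)
                expanded.append(child)
                queue.append(child)
    return expanded


if __name__ == "__main__":
    print(expand_query("plate"))
```

A search for items tagged with any of the returned terms then picks up “dinner plate” and “bread plate” items without the user asking for them.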

In other cases, though, these vocabularies are uncontrolled. They are just tags that people have entered in a database, and they may be consistent or inconsistent, depending on who did the data entry and why. An uncontrolled vocabulary is not so useful. If the vocabulary includes the terms “tea cup”, “teacup”, “Tea Cup”, etc. as distinct terms, then it’s not going to help people to find things because those synonyms aren’t linked together. If it includes terms like “Stirrup Cup” it’s going to be less than perfectly useful because most people don’t know what a Stirrup Cup is (it is a kind of cup).

The vocabulary tool

So one of the challenges in moving to a Linked Data environment is taking the legacy vocabularies which our systems use, and bringing them under control; linking synonyms and related terms together, providing definitions, and so on. This is where my vocabulary tool would come in.

In the Linked Data world, vocabularies are commonly modelled using a system called Simple Knowledge Organization System (SKOS). Using SKOS, every term (a “Concept” in SKOS) is identified by a unique URI, and these URIs are then associated with labels (such as “teacup”), definitions, and with other related Concepts.
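To make the SKOS model concrete, this sketch writes one Concept as N-Triples: its type, the scheme it belongs to, a preferred label, optional alternative labels (a variant spelling, a Spanish synonym), an optional definition and an optional broader Concept. The example base URI and the helper names are hypothetical; the SKOS and RDF namespace URIs and property names are the standard ones.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text, lang=None):
    """Serialise a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def concept_triples(uri, scheme, pref_label, alt_labels=(), definition=None, broader=None):
    """Describe one SKOS Concept as a list of N-Triples lines.

    pref_label and each entry of alt_labels are (text, language-tag) pairs.
    """
    triples = [
        f"<{uri}> <{RDF_TYPE}> <{SKOS}Concept> .",
        f"<{uri}> <{SKOS}inScheme> <{scheme}> .",
        f"<{uri}> <{SKOS}prefLabel> {literal(*pref_label)} .",
    ]
    for alt in alt_labels:
        triples.append(f"<{uri}> <{SKOS}altLabel> {literal(*alt)} .")
    if definition:
        triples.append(f"<{uri}> <{SKOS}definition> {literal(definition, 'en')} .")
    if broader:
        triples.append(f"<{uri}> <{SKOS}broader> <{broader}> .")
    return triples


if __name__ == "__main__":
    base = "http://example.org/vocab/type/"  # hypothetical scheme URI
    for line in concept_triples(
        base + "teacup",
        base,
        ("teacup", "en"),
        alt_labels=[("Tea Cup", "en"), ("taza", "es")],
        broader=base + "cup",
    ):
        print(line)
```

Here “Tea Cup” survives only as a non-preferred label of the “teacup” Concept, which is one common SKOS way to record the kind of standardisation described in this post.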

The vocabulary tool is built with the assumption that a legacy vocabulary of terms has been migrated to RDF form by converting every one of the terms into a URI, simply by sticking a common prefix on it, and if necessary “munging” the text to replace or encode spaces or other characters which aren’t allowed in URIs. For example, this might produce a bunch of URIs like this:

  • etc.
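The munging step might look something like this minimal sketch. It assumes a hypothetical base URI and assumes spaces become underscores (the choice that the SPARQL query later in this post reverses with replace(..., "_", " ")); any other character outside the unreserved URI set is percent-encoded.

```python
from urllib.parse import quote

BASE = "http://example.org/vocab/type/"  # hypothetical common prefix for the "type" field


def term_to_uri(term, base=BASE):
    """Turn a legacy vocabulary term into a URI by prefixing the base URI.

    Spaces become underscores; everything outside the unreserved URI
    characters (letters, digits, "-", ".", "_", "~") is percent-encoded,
    including "/", so a term can never add an extra path segment.
    """
    return base + quote(term.strip().replace(" ", "_"), safe="")


if __name__ == "__main__":
    for term in ["teacup", "Tea Cup", "tea cup", "gravy boat"]:
        print(term_to_uri(term))
```

Note that “Tea Cup” and “tea cup” still end up as two different URIs: this mechanical conversion carries the inconsistencies of the uncontrolled vocabulary straight into the Linked Data, which is exactly what the tool is meant to help tidy up.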

What the tool then does is it finds all these URIs and gives you a web form which you can fill in to describe them and link them together. To be honest I’m not sure how far I got with this tool, but ultimately the idea would be that you would be able to organise the terms into a hierarchy, link synonyms, standardise inconsistencies by indicating “preferred” and “non-preferred” terms (i.e. you could say that “teacup” is preferred, and that “Tea Cup” is a non-preferred equivalent).

When you start the tool, you have the opportunity to enter a “base URI”, which in this case would be the common prefix shared by the term URIs; the tool would then find every such URI which was in use, and display them on the form for you to annotate. When you had finished imposing a bit of order on the vocabulary, you would click “Save” and your annotations would be stored in an RDF graph named with the base URI itself. Later, your legacy system might introduce more terms, and your Linked Data store would have some new URIs with that prefix. You would start up the form again, enter the base URI, and load all the URIs again. All your old annotations would also be loaded, and you would see the gaps where there were terms that hadn’t been dealt with; you could go and edit the definitions and click “Save” again.
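The reload step can be pictured as a merge: every URI currently in use gets a row on the form, previously saved labels are filled in, and whatever is still empty is a gap. Here is a sketch with plain dicts and lists standing in for the query results; the function name and data shapes are hypothetical.

```python
def load_form_rows(uris_in_use, saved_labels):
    """Merge term URIs found in the data with labels saved in an earlier session.

    Returns (rows, gaps): rows maps each URI to its saved label or None,
    and gaps lists the URIs that still need attention.
    """
    all_uris = sorted(set(uris_in_use) | set(saved_labels))
    rows = {uri: saved_labels.get(uri) for uri in all_uris}
    gaps = [uri for uri in all_uris if rows[uri] is None]
    return rows, gaps


if __name__ == "__main__":
    base = "http://example.org/vocab/type/"  # hypothetical base URI
    in_use = [base + "teacup", base + "saucer", base + "gravy_boat"]  # gravy_boat is new
    saved = {base + "teacup": "teacup", base + "saucer": "saucer"}
    rows, gaps = load_form_rows(in_use, saved)
    print(gaps)
```

Saved annotations are kept even for URIs that no longer turn up in the data, so earlier editing work is never silently lost.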

In short, the idea of the tool was to be able to use, and to continue to use, legacy systems which lack controlled vocabularies, and actually impose control over those vocabularies after converting them to LOD.

How it works

OK, here’s the technical bit.

The form is built using XForms technology, and I coded it to use a browser-based (i.e. JavaScript) implementation of XForms called XSLTForms.

When the XForm loads, you can enter the common base URI of your vocabulary into a text box labelled “Concept Scheme URI”, and click the “Load” button. When the button is clicked, the vocabulary URI is substituted into a pre-written SPARQL query and sent off to a SPARQL server. This SPARQL query is the tricky part of the whole system really: it finds all the URIs, and it loads any labels which you might have already assigned them, and if any don’t have labels, it generates one by converting the last part of the URI back into plain text.
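Before looking at the query itself, here is a rough sketch of the substitute-and-send step using Python's standard library rather than XForms. It assumes a hypothetical endpoint URL and a deliberately shortened placeholder query, not the full one. The request follows the SPARQL 1.1 Protocol convention of POSTing a form-encoded “query” parameter. Because the URI is pasted into the query text, the sketch first rejects characters that cannot appear inside a SPARQL IRI, so the input cannot break out of the angle brackets.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:3030/vocab/query"  # hypothetical SPARQL query endpoint

# A shortened, hypothetical template; %VOCABULARY% marks where the URI is substituted.
QUERY_TEMPLATE = """prefix skos: <http://www.w3.org/2004/02/skos/core#>
construct { ?term skos:prefLabel ?label }
where {
  bind(<%VOCABULARY%> as ?vocabulary)
  ?term skos:prefLabel ?label .
  filter(strstarts(str(?term), str(?vocabulary)))
}"""

# Characters that may not appear inside a SPARQL IRI reference (<...>); a rough check.
_BAD_IRI_CHARS = re.compile(r'[<>"{}|^`\\\s]')


def build_query_request(vocabulary_uri, endpoint=ENDPOINT):
    """Substitute the Concept Scheme URI into the query and wrap it in a POST request.

    Follows the SPARQL 1.1 Protocol form of a query: a form-encoded "query" parameter.
    """
    if not vocabulary_uri or _BAD_IRI_CHARS.search(vocabulary_uri):
        raise ValueError(f"not usable as an IRI: {vocabulary_uri!r}")
    query = QUERY_TEMPLATE.replace("%VOCABULARY%", vocabulary_uri)
    return Request(
        endpoint,
        data=urlencode({"query": query}).encode("utf-8"),
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "text/turtle",  # a CONSTRUCT query returns RDF
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_query_request("http://example.org/vocab/type/")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it; omitted here.
```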

```sparql
prefix skos: <http://www.w3.org/2004/02/skos/core#>

construct {
  ?vocabulary a skos:ConceptScheme ;
    skos:prefLabel ?vocabularyLabel .
  ?term a skos:Concept ;
    skos:inScheme ?vocabulary ;
    skos:prefLabel ?prefLabel .
  ?subject ?predicate ?object .
}
where {
  # <vocabulary-uri> is replaced with the Concept Scheme URI entered in the form
  bind(<vocabulary-uri> as ?vocabulary)
  {
    # give the scheme a placeholder name if it has no label yet
    optional { ?vocabulary skos:prefLabel ?existingVocabularyLabel }
    bind("Vocabulary Name" as ?vocabularyLabel)
    filter(!bound(?existingVocabularyLabel))
  }
  union
  {
    # every object URI starting with the base URI is a term;
    # unlabelled terms get a label made from the rest of the URI
    ?subject ?predicate ?term .
    bind(replace(substr(str(?term), strlen(str(?vocabulary)) + 1), "_", " ") as ?prefLabel)
    optional { ?term skos:prefLabel ?existingPrefLabel }
    filter(!bound(?existingPrefLabel))
    filter(strstarts(str(?term), str(?vocabulary)))
    filter(?term != ?vocabulary)
  }
  union
  {
    # plus everything previously saved in the vocabulary's own named graph
    graph ?vocabulary { ?subject ?predicate ?object }
  }
}
```
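The label-generating bind in the second branch of the query is the least obvious line. Here is the same logic restated in Python, purely for illustration (the function is hypothetical, and the check for an existing skos:prefLabel is left out). SPARQL's substr counts from 1, so starting at strlen + 1 keeps everything after the prefix.

```python
def default_label(term_uri, vocabulary_uri):
    """Restate the query's fallback label in Python:
    replace(substr(str(?term), strlen(str(?vocabulary)) + 1), "_", " ").

    Returns None for URIs the strstarts and != filters would exclude.
    """
    # filter(strstarts(str(?term), str(?vocabulary))) and filter(?term != ?vocabulary)
    if not term_uri.startswith(vocabulary_uri) or term_uri == vocabulary_uri:
        return None
    # SPARQL substr is 1-based, so substr(s, n + 1) is Python's s[n:]
    return term_uri[len(vocabulary_uri):].replace("_", " ")


if __name__ == "__main__":
    base = "http://example.org/vocab/type/"  # hypothetical base URI
    print(default_label(base + "gravy_boat", base))
```

Like the query, this does not decode percent-escapes, so a term minted with encoded characters gets an encoded default label until someone edits it on the form.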

The resulting list of terms and labels is loaded into the form as a “data instance”, and the form automatically grows to provide data entry fields for all the terms in the instance. When you click the “Save” button, the entire vocabulary, including any labels you’ve entered, is saved back to the server.
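The post doesn't say how the save request is made. One standard option, assumed here rather than taken from the original code, is the SPARQL 1.1 Graph Store HTTP Protocol. A PUT to the store with a graph= parameter replaces that named graph's contents wholesale, which matches saving the entire vocabulary back into the graph named by its base URI. The store URL is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

GRAPH_STORE = "http://localhost:3030/vocab/data"  # hypothetical Graph Store endpoint


def build_save_request(vocabulary_uri, ntriples_lines, store=GRAPH_STORE):
    """Build a PUT that replaces the named graph called vocabulary_uri.

    Uses the Graph Store HTTP Protocol's indirect identification (?graph=...);
    PUT replaces the whole graph, so the body must hold the entire vocabulary.
    """
    url = f"{store}?{urlencode({'graph': vocabulary_uri})}"
    body = ("\n".join(ntriples_lines) + "\n").encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/n-triples"},
        method="PUT",
    )


if __name__ == "__main__":
    base = "http://example.org/vocab/type/"  # hypothetical base URI
    triples = [
        f'<{base}teacup> <http://www.w3.org/2004/02/skos/core#prefLabel> "teacup"@en .',
    ]
    req = build_save_request(base, triples)
    print(req.get_method(), req.full_url)
```

PUT rather than POST is the deliberate choice here: under the same protocol a POST merges the body into the existing graph, so labels deleted on the form would linger.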

William Denton: Anthropocene librarianship

Sun, 2015-11-15 20:59

Anthropocene librarianship is the active response librarians make to the causes and effects of climate change so severe humans are creating a new geological epoch.

(I’ve been mulling this over this week and wanted to put the idea out there because it’s giving me a good framework for thinking about things. I’m curious to know what you make of it.)

What is the Anthropocene?

The idea was first set out in Crutzen and Stoermer (2000):

Considering these and many other major and still growing impacts of human activities on earth and atmosphere, and at all, including global, scales, it seems to us more than appropriate to emphasize the central role of mankind in geology and ecology by proposing to use the term “anthropocene” for the current geological epoch. The impacts of current human activities will continue over long periods.

They end with:

Without major catastrophes like an enormous volcanic eruption, an unexpected epidemic, a large-scale nuclear war, an asteroid impact, a new ice age, or continued plundering of Earth’s resources by partially still primitive technology (the last four dangers can, however, be prevented in a real functioning noösphere) mankind will remain a major geological force for many millennia, maybe millions of years, to come. To develop a world-wide accepted strategy leading to sustainability of ecosystems against human induced stresses will be one of the great future tasks of mankind, requiring intensive research efforts and wise application of the knowledge thus acquired in the noösphere, better known as knowledge or information society. An exciting, but also difficult and daunting task lies ahead of the global research and engineering community to guide mankind towards global, sustainable, environmental management.

For more, Wikipedia has a good overview. The Working Group on the ‘Anthropocene’ (which sits inside the International Union of Geological Sciences) defines it so (with odd punctuation):

The 'Anthropocene’ is a term widely used since its coining by Paul Crutzen and Eugene Stoermer in 2000 to denote the present time interval, in which many geologically significant conditions and processes are profoundly altered by human activities. These include changes in:

  • erosion and sediment transport associated with a variety of anthropogenic processes, including colonisation, agriculture, urbanisation and global warming.
  • the chemical composition of the atmosphere, oceans and soils, with significant anthropogenic perturbations of the cycles of elements such as carbon, nitrogen, phosphorus and various metals.
  • environmental conditions generated by these perturbations; these include global warming, ocean acidification and spreading oceanic 'dead zones’.
  • the biosphere both on land and in the sea, as a result of habitat loss, predation, species invasions and the physical and chemical changes noted above.

What does Anthropocene librarianship do?

Some examples, but you will be able to think of more:

  • Collections: building collections that serve our users’ needs regarding everything about climate change; sharing resources; keeping users informed about what we have and how it’s useful; providing reader’s advisory about climate fiction.
  • Preservation: preserving materials in all forms and carriers, including knowledge, culture, the web, data, code and research; collaborating with others on preserving languages, seeds, etc.; guaranteeing long-term stability of online sources; saving libraries and special collections at risk from disasters; storing original documents and special collections about climate-related research (e.g. Harvard Library’s Papers of the Intergovernmental Panel on Climate Change).
  • Sustainability: of our buildings and architecture (the green libraries work currently underway); of our practices, processes and platforms.
  • Greenhouse gas reductions: in buildings; power usage overall; from paper and power in printers and photocopiers; purchasing; book delivery between branches; conference arrangements.
  • Preparation: preparing for droughts, storms, floods, heat waves, higher sea levels, temperature increases, changes in agriculture, extinctions, climate migrations, conflict, regulations enforcing reduced carbon emissions, etc.
  • Disaster response: providing reference services; providing telephone and internet access; lending technology; supporting crisis mapping.
  • Climate migrations: providing services for incoming migrants; preserving what they leave behind.
  • Collaborations: with libraries, associations and communities in areas under pressure or at risk; with researchers; with climate change groups.
  • Communities: hosting shelter in cool air during heat waves; making meeting spaces available to community groups.
  • Advocacy: about the science and politics; about responses and remedies; about what libraries, archives and our local communities need and can do.
  • Information literacy and climate literacy: about the science and how it is done; about the politics and how it is made; about resources to help understand and respond to climate change; dealing with climate change deniers; using climate change as an example subject in instruction; providing subject guides, workshops, classes, reference service at climate change events.
  • Research: applying library and information science methods to climate change-related disciplines, their methods, scholars, publications, practices, discourse, etc.; collaborating on and supporting work by researchers in those fields.
  • Free and open: access, data, software, research; making all work in this area freely available to everyone under the best license (Creative Commons, GPL, etc.).
  • Social justice: understanding and explaining how climate change is connected to issues about economics, law, social policy, etc.
  • Values: recognizing values shared with environmental and other groups, such as preservation, conservation, stewardship and long time frames.
  • Prefiguration: “making one’s means as far as possible identical with one’s ends” as Graeber (2014) puts it; putting into practice today what we want our work, profession, institutions and organizations to be like in the future.

The term

There is debate about whether the term “Anthropocene” is valid and if so when the interval began. Boswell (1892) quotes Dr. Johnson: “Depend upon it, Sir, when any man knows he is to be hanged in a fortnight, it concentrates his mind wonderfully.” Climate change isn’t two weeks away, it’s last year, now, and decades and centuries ahead. “Anthropocene librarianship” is meant to help concentrate our minds.

The current literature

A search in Library and Information Science Abstracts (one of the major subscription article databases in LIS; it’s run by ProQuest) turns up nothing for the word “anthropocene.”

“Climate change” is one of its subject terms, however, and that shows 17 results today.

Here they are:

  • Adamich, Tom et al. “The Gov Doc Kids Group and Free Government Information.” IFLA Journal 38.1 (2012): 68–77.
  • Dutt, Bharvi, K. C. Garg, and Archita Bhatta. “A Quantitative Assessment of the Articles on Environmental Issues Published in English-Language Indian Dailies.” Annals of Library and Information Studies 60.3 (2013): 219–226.
  • Elia, Emmanuel F., Stephen Mutula, and Christine Stilwell. “Indigenous Knowledge Use in Seasonal Weather Forecasting in Tanzania: The Case of Semi-Arid Central Tanzania.” South African Journal of Libraries and Information Science 80.1 (2014): 18–27.
  • Etti, Susanne et al. “Growing the ERM Energy and Climate Change Practice Through Knowledge Sharing.” Journal of Information & Knowledge Management 9.3 (2010): 241–250.
  • Gordon-Clark, Matthew. “Paradise Lost? Pacific Island Archives Threatened by Climate Change.” Archival Science 12.1 (2012): 51–67.
  • Hall, Richard. “Towards a Resilient Strategy for Technology-Enhanced Learning.” Campus-Wide Information Systems 28.4 (2011): 234–249.
  • Hiroshi, Hirano. “Usage details of the Earth Simulator and sustained performance of actual applications.” Journal of Information Processing and Management 48.5 (2005): 268–275.
  • Holgate, Becky. “Global Climate Change.” The School Librarian 63.2 (2015): 84.
  • Islam, Md. Shariful. “The Community Development Library in Bangladesh.” Information Development 25.2 (2009): 99–111.
  • Johansen, Bruce E. “Media Literacy and ‘Weather Wars:’ Hard Science and Hardball Politics at NASA.” SIMILE: Studies in Media & Information Literacy Education 6.3 (2006): n. pag.
  • Luz, Saturnino, Masood Masoodian, and Manuel Cesario. “Disease Surveillance and Patient Care in Remote Regions: An Exploratory Study of Collaboration among Health-Care Professionals in Amazonia.” Behaviour & Information Technology 34.6 (2015): 548–565.
  • Murgatroyd, Peter, and Philip Calvert. “Information-Seeking and Information-Sharing Behavior in the Climate Change Community of Practice in the Pacific.” Science & Technology Libraries 32.4 (2013): 379–401.
  • Mwalukasa, Nicholaus. “Agricultural Information Sources Used for Climate Change Adaptation in Tanzania.” Library Review 62.4-5 (2013): 266–292.
  • Sabou, Marta, Arno Scharl, and Michael Fols. “Crowdsourced Knowledge Acquisition: Towards Hybrid-Genre Workflows.” International Journal on Semantic Web and Information Systems 9.3 (2013): 14–41.
  • Stoss, F. W. “The Heat Is on! U.S. Global Climate Change Research and Policy.” EContent 23.4 (2000): 36–38.
  • Vaughan, K. T. L. “Science and Technology Sources on the Internet. Global Warming and Climate Change Science.” Issues in Science and Technology Librarianship 32 (2001): n. pag.
  • Veefkind, V. et al. “A New EPO Classification Scheme for Climate Change Mitigation Technologies.” World Patent Information 34.2 (2012): 106–111.

Quite a mix, from around the world, and representative of the wide range of subject matter LIS has in its scope.

But only 17? Since 2000? This certainly isn’t a full literature review, but 17 is far too few for even a quick search. We need a lot more work done.

The Journal of Anthropocene Librarianship

Perhaps we could start The Journal of Anthropocene Librarianship to focus and grow attention in our discipline, while still engaging in inter- and transdisciplinary work beyond LIS. Of course it would be fully open access.

I found three new journals on the Anthropocene: Anthropocene (Elsevier, RoMEO green, allows some self-archiving), The Anthropocene Review (Sage), and Elementa: Science of the Anthropocene (BioOne, fully open access, see author guidelines). The introductory editorial in Anthropocene by Chin et al. sets out its aim:

Anthropocene openly seeks research that addresses the scale and extent of human interactions with the atmosphere, cryosphere, ecosystems, oceans, and landscapes. We especially encourage interdisciplinary studies that reveal insight on linkages and feedbacks among subsystems of Earth, including social institutions and the economy. We are concerned with phenomena ranging over time from geologic eras to single isolated events, and with spatial scales varying from grain scale to local, regional, and global scales. Papers that address new theoretical, empirical, and methodological advances are high priority for the Journal. We welcome contributions that elucidate deep history and those that address contemporary processes; we especially invite manuscripts with potential to guide and inform humanity into the future.

A broad approach like this but tailored to LIS could work well.

On the other hand, leaping to a journal is a big step. Maybe it’s best to follow the Code4Lib model: start with a mailing list and a web site, and grow. Or, do it all at once.

What about archives?

Libraries and archives work together closely but serve different purposes, and archivists are very different from librarians, so I won’t venture into describing what Anthropocene archives might be like. However, Matthew Gordon-Clark’s “Paradise Lost? Pacific Island Archives Threatened by Climate Change” (2012; the sea level rise predictions now are worse) is a perfect example of this work. Here’s the abstract:

Over the past 10 years, a clear pattern of increasing sea-level rises has been recorded across the Pacific region. As international work progresses on climate change, it is becoming clear that the expected rise of sea levels will have significant impacts upon low-lying islands and nations. Sea-level rises of less than 0.5 m are generally suggested, although some researchers have made more drastic projections. This paper describes the second stage of research into the impacts of climate change upon the national archival collections in low-lying Pacific islands and nations. This article follows on the argument that archival collection relocation will be necessary and sets the boundaries for further research. It will summarize current research into climate change models and predicted sea-level rises, identify Pacific islands and nations that will be the focus of detailed further research by setting a range of research boundaries based on the known geography of nations within the Pacific, arguing for a specific measurement of “low-elevation”, outlining other risk factors likely to affect the survival of threatened national archival collections and naming those islands and nations that are thus deemed to be at greatest risk of flooding and thus likely to need to relocate their archives. The goal is to demonstrate how archivists might inform the governmental policy in threatened islands and nations as well as what other nations might do to offer assistance.

A web search for anthropocene archives turns up a lot of results. Archives of the Anthropocene at the Max Planck Institute for the History of Science is interesting:

Taken seriously, the Anthropocene claims that the cultural has insinuated itself so thoroughly into the natural that any notion of an objective, unhumanized record of the earth will no longer be tenable. The Anthropocene hypothesis implies that the sciences of the archives will need to reorient themselves to a new, participatory sense of macro-duration and confront the possibilities that the unaccessioned “noise” of human artifacts might dwarf any authoritative signal that we believe our archives will communicate to the distant future.

Galleries and museums also have to deal with the problem. As a group we’re called the GLAM sector: galleries, libraries, archives and museums. Together: GLAMthropocene, saving the world.

Works cited

Boswell, James. The Life of Samuel Johnson, LL.D. Together with The Journal of a Tour to the Hebrides. Vol. 3. London: George Bell & Sons, 1892.

Chin, Anne et al. “Anthropocene: Human Interactions with Earth Systems.” Anthropocene 1 (2013): 1–2. DOI: 10.1016/j.ancene.2013.10.001

Crutzen, Paul J. and Eugene F. Stoermer. “The ‘Anthropocene’.” Global Change Newsletter 41 (2000): 17–18.

Gordon-Clark, Matthew. “Paradise Lost? Pacific Island Archives Threatened by Climate Change.” Archival Science 12.1 (2012): 51–67. DOI: 10.1007/s10502-011-9144-3

Graeber, David. “Anthropology and the rise of the professional-managerial class.” Journal of Ethnographic Theory 4.3 (2014): 73–88.

Mita Williams: What we've got here is failure to understand Scholarly Communication

Sun, 2015-11-15 14:34
If you follow conversations about Scholarly Communication (as I do), it is not uncommon to run into the frustrations of librarians and scholars who cannot understand why their peers continue to publish in journals that reside behind expensive paywalls. As someone who very much shares this frustration, I found this quotation particularly illuminating:

As in Latin, one dominant branch of meaning in "communication" has to do with imparting, quite apart from any notion of a dialog or interactive process. Thus communication can mean partaking, as in being a communicant (partaking in holy communion). Here "communication" suggests belonging to a social body via an expressive act that requires no response or recognition. To communicate by consuming bread and wine is to signify membership in a communion of saints both living and dead, but it is not primarily a message-sending activity (except perhaps as a social ritual to please others or as a message to the self or to God). Moreover, here to "communicate" is an act of receiving, not of sending; more precisely, it is to send by receiving. A related sense is the notion of a scholarly "communication" (monograph) or a "communication" as a message or notice. Here no sense of exchange, though some sort of audience, however vague or dispersed, is implied.

- "Speaking into the air", John Durham Peters, p.7

Hydra Project: Sufia 6.4.0 released

Fri, 2015-11-13 10:22

We are pleased to announce the release of Sufia 6.4.0.

Sufia 6.4.0 includes new features for uploading files to collections, enabling suggested citation formatting, as well as a number of bugfixes and refactorings.

See the release notes [1] for the upgrade process and for an exhaustive list of the work that has gone into this release. Thanks to the 18 contributors for this release, which comprised 100 commits touching 123 files: Adam Wead, Anna Headley, Brandon Straley, Carolyn Cole, Colin Gross, Dan Kerchner, Lynette Rayle, Hector Correa, Justin Coyne, Mike Giarlo, Michael Tribone, Nation Rogers, Olly Lyytinen, Piotr Hebal, Randy Coulman, Christian Aldridge, Tonmoy Roy, and Yinlin Chen.


William Denton: Foul, rainy, muddy sloppy morning

Fri, 2015-11-13 03:12

“It was a foul, rainy, muddy, sloppy morning, without a glimmer of sun, with that thick, pervading, melancholy atmosphere which forces for the time upon imaginative men a conviction that nothing is worth anything,” —Anthony Trollope, Ralph the Heir (1871), chapter XXIX.

District Dispatch: Fan fiction webinar now available

Thu, 2015-11-12 22:26

Harry Potter enthusiasts dress as Hogwarts students (image from Wikimedia).

An archive of the CopyTalk webinar on fan fiction and copyright issues originally broadcast on Thursday, November 5, 2015 is available.

Fan-created works are in general broadly available to people at the click of a link. Fan fiction hasn’t been the subject of any litigation, but it plays an increasing role in literacy as its creation and consumption have skyrocketed. Practice on the ground can matter as much as court cases, and the explosion of noncommercial creativity is a big part of the fair use ecosystem. This presentation touched on many of the ways in which creativity has impacted recent judicial rulings on fair use, from Google Books, to putting a mayor’s face on a T-shirt, to copying a competitor’s ad for a competing ad. Legal scholar and counsel to the Organization for Transformative Works, Rebecca Tushnet enlightened us.

This was a really interesting webinar. Do check it out!

Rebecca Tushnet clerked for Chief Judge Edward R. Becker of the Third Circuit Court of Appeals in Philadelphia and Associate Justice David H. Souter of the United States Supreme Court and spent two years as an associate at Debevoise & Plimpton in Washington, DC, specializing in intellectual property. After two years at the NYU School of Law, she moved to Georgetown, where she now teaches intellectual property, advertising law, and First Amendment law.

Her work currently focuses on the relationship between the First Amendment and false advertising law. She has advised and represented several fan fiction websites in disputes with copyright and trademark owners. She serves as a member of the legal team of the Organization for Transformative Works, a nonprofit dedicated to supporting and promoting fanworks, and is also an expert on the law of engagement rings.

Our next CopyTalk is December 3rd at 2pm Eastern/11am Pacific. Our topic will be the 1201 rulemaking and this year’s exemptions. Get ready for absurdity!

The post Fan fiction webinar now available appeared first on District Dispatch.

Open Knowledge Foundation: Calling all Project Assistants: we need you!

Thu, 2015-11-12 17:55

The mission of Open Knowledge International is to open up all essential public interest information and see it utilized to create insight that drives change. To this end we work to create a global movement for open knowledge, supporting a network of leaders and local groups around the world; we facilitate coordination and knowledge sharing within the movement; we build collaboration with other change-making organisations both within our space and outside; and, finally, we prototype and provide a home for pioneering products.

A decade after its foundation, Open Knowledge International is ready for its next phase of development. We started as an organisation that led the quest for the opening up of existing data sets – and in today’s world most of the big data portals run on CKAN, an open source software product developed first by us.

Today, it is not only about opening up of data; it is making sure that this data is usable, useful and – most importantly – used, to improve people’s lives. Our current projects (OpenSpending, OpenTrials, School of Data, and many more) all aim towards giving people access to data, the knowledge to understand it, and the power to use it in our everyday lives.

Now, we are looking for an enthusiastic

Project Assistant

(flexible location, part time)

to join the team to help deliver our projects around the world. We are seeking people who care about openness and have the commitment to make it happen.

We do not require applicants to have experience of project management – instead, we would like to work with motivated self-starters, able to demonstrate engagement with initiatives within the open movement. If you have excellent written and verbal communication skills, are highly organised and efficient with strong administration and analytical abilities, are interested in how projects are managed and are willing to learn, we want to hear from you.

The role includes the following responsibilities:

  • Monitoring and reporting of ongoing work progress to Project Managers and on occasion to other stakeholders
  • Research and investigation
  • Coordination of, and communication with, the project team, wider organisation, volunteers and stakeholders
  • Documentation, including creating presentations, document control, proof-reading, archiving, distributing and collecting
  • Meeting and event organisation, including scheduling, booking, preparing documents, minuting, and arranging travel and accommodation where needed
  • Project communication and promotion, including by email, blog, social media, networking online and in person
  • Liaising with staff across the organisation to offer and ask for support, e.g. public communication and finance

Projects you may be involved with include Open Data for Development, OpenTrials and OpenSpending, as well as new projects in future.

This role requires someone who can be flexible and comfortable with remote working, able to operate in a professional environment and participate in grassroots activities. Experience working as and with volunteers is advantageous.

You are comfortable working with people from different cultural, social and ethnic backgrounds. You are happy to share your knowledge with others, and you find working in transparent and highly visible environments interesting and fun.

Personally, you have a demonstrated commitment to working collaboratively, with respect and a focus on results over credit.

The position reports to the Project Manager and will work closely with other members of the project delivery team.

The role is part-time at 20 hours per week, paid by the hour. You will be compensated with a market salary, in line with the parameters of a non-profit organisation.

This would particularly suit recent graduates who have studied a complementary subject to Open Knowledge International, looking for some experience in the workplace.

Successful applicants must have excellent English language skills in both speaking and writing.

You can work from home, with flexibility offered and required. Some flexibility around work hours is useful, and there may be some (infrequent) international travel required.

We offer employment contracts for residents of the UK with valid permits, and services contracts to overseas residents.

Interested? Then send us a motivational letter and a one page CV via Please indicate your current country of residence, as well as your salary expectations (in GBP) and your earliest availability.

Early application is encouraged, as we are looking to fill the positions as soon as possible. These vacancies will close when we find a suitable candidate.

If you have any questions, please direct them to jobs [at]

David Rosenthal: SPARC Author Addendum

Thu, 2015-11-12 16:00
SPARC has a post, "Author Rights: Using the SPARC Author Addendum to secure your rights as the author of a journal article", announcing the result of an initiative to fix one of the fundamental problems of academic publishing: in most cases authors carelessly give up essential rights by signing, unchanged, a copyright transfer agreement written by the publisher's lawyers.

The publisher will argue that this one-sided agreement, often transferring all possible rights to the publisher, is absolutely necessary in order that the article be published. Despite their better-than-average copyright policy, ACM's claims in this regard are typical. I dissected them here.

The SPARC addendum was written by a lawyer, Michael W. Carroll of Villanova University School of Law, and is intended to be attached to, and thereby modify, the publisher's agreement. It performs a number of functions:
  • Preserves the author's rights to reproduce, distribute, perform, and display the work for non-commercial purposes.
  • Acknowledges that the work may already be the subject of non-exclusive copyright grants to the author's institution or a funding agency.
  • Imposes as a condition of publication that the publisher provide the author with a PDF of the camera-ready version without DRM.
The kicker is the final paragraph, which requests that the publisher return a signed copy of the addendum, and makes it clear that publishing the work in any way indicates assent to the terms of the addendum. This leaves the publisher with only three choices: agree to the terms, refuse to publish the work, or ignore the addendum.

Of course, many publishers will refuse to publish, and many authors at that point will cave in. The SPARC site has useful advice for this case. The more interesting case is the third, where the publisher simply ignores the author's rights as embodied in the addendum. Publishers are not above ignoring the rights of authors, as shown by the history of my article Keeping Bits Safe: How Hard Can It Be?, published both in ACM Queue (correctly with a note that I retained copyright) and in CACM (incorrectly claiming ACM copyright). I posted analysis of ACM's bogus justification of their copyright policy based on this experience. There is more here.

So what will happen if the publisher ignores the author's addendum? They will publish the paper. The author will not get a camera-ready copy without DRM. But the author will make the paper available, and the "kicker" above means they will be on safe legal ground. Not only did the publisher constructively agree to the terms of the addendum, but they also failed to deliver on their side of the deal. So any attempt to haul the author into court, or send takedown notices, would be very risky for the publisher.

2012 data from Alex Holcombe

Publishers don't need anything except permission to publish. Publishers want the rights beyond this to extract the rents that generate their extraordinary profit margins. Please use the SPARC addendum when you get the chance.

FOSS4Lib Recent Releases: Vivo - 1.8.1

Thu, 2015-11-12 14:26

Last updated November 12, 2015. Created by Peter Murray on November 12, 2015.
Log in to edit this page.

Package: Vivo
Release Date: Tuesday, November 10, 2015

Open Knowledge Foundation: Treasures from the Public Domain in New Essays Book

Thu, 2015-11-12 13:34

Open Knowledge project The Public Domain Review is very proud to announce the launch of its second book of selected essays! For nearly five years now we’ve been diligently trawling the rich waters of the public domain, bringing to the surface all sorts of goodness from various openly licensed archives of historical material: from the Library of Congress to the Rijksmuseum, from Wikimedia Commons to the wonderful Internet Archive. We’ve also been showcasing, each fortnight, new writing on a selection of these public domain works, and this new book picks out our very best offerings from 2014.

All manner of oft-overlooked histories are explored in the book. We learn of the strange skeletal tableaux of Frederik Ruysch, pay a visit to Humphry Davy high on laughing gas, and peruse the pages of the first ever picture book for children (which includes the excellent table of Latin animal sounds pictured below). There’s also fireworks in art, petty pirates on trial, brainwashing machines, truth-revealing diseases, synesthetic auras, Byronic vampires, and Charles Darwin’s photograph collection of asylum patients. Together the fifteen illustrated essays chart a wonderfully curious course through the last five hundred years of history — from sea serpents of the 16th-century deep to early-20th-century Ouija literature — taking us on a journey through some of the darker, stranger, and altogether more intriguing corners of the past.

Order by 18th November to benefit from a special reduced price and delivery in time for Christmas

If you want to get the book in time for Christmas (and we do think it’d make an excellent gift for that history-loving relative or friend!), then please make sure to order before midnight on Wednesday 18th November. Orders placed before this date will also benefit from a special reduced price!

Please visit the dedicated page on The Public Domain Review site to learn more and also buy the book!

Double page spread of Latin animal sounds – from our essay on the first ever children’s picture book.

Double page spread (full bleed!), showing a magnificent 18th-century print of a fireworks display at the Hague – from our essay on how artists have responded to the challenge of depicting fireworks through the ages.

Ed Summers: Data First Interventions

Thu, 2015-11-12 05:00

These are some remarks I made at the Web Archives conference at the University of Michigan, on November 12th, 2015. I didn’t have any slides other than this visual presentation.

Thanks for the opportunity to participate in this panel today. I’m really looking forward to the panel conversation so I will try to keep my remarks brief. A little over a year ago I began working as a software developer at the Maryland Institute for Technology in the Humanities (MITH). MITH has been doing digital humanities (DH) work for the last 15 years. Over that time it has acquired a rich and intertwingled history of work at the intersection of computing and humanities disciplines, such as textual studies, art history, film and media studies, music, electronic literature, games studies, digital forensics, the performing arts, digital stewardship, and more. Even after a year I’m still getting my head around the full scope of this work.

To some extent I think MITH and similar centers, conferences and workshops like THATCamp have been so successful at infusing humanities work with digital methods and tools that the D in DH isn’t as necessary as it once was. We’re doing humanities work that necessarily involves now-pervasive computing technology and digitized or born-digital collections. Students and faculty don’t need to be convinced that digital tools, methods and collections are important for their work. They are eager to engage with DH work, and to learn the tools and skills to do it. At least that’s been my observation in the last year. For the rest of my time I’d like to talk about how MITH does its work as a DH center, and how that intersects with material saved from the Web.

Traditionally, DH centers like MITH have been built on the foundation of faculty fellowships, which bring scholars into the center for a year to work on a particular project, and spread their knowledge and expertise around. But increasingly MITH has been shifting its attention to what we call the Digital Humanities Incubator model. The incubator model started in 2013 as a program to introduce University Library faculty, staff and graduate assistants to digitization, transcription, data modeling and data exploration through their own projects. Unfortunately, there’s not enough time here to describe the DH incubator in much more detail, but if you are interested I encourage you to check out Trevor Muñoz and Jennifer Guiliano’s Making Digital Humanities Work, where they talk about the development of the incubator. In the last year we’ve been experimenting with an idea that grew out of the incubator, which Neil Fraistat (MITH’s Director) has been calling the data first approach. Neil described this approach earlier this year at DH 2015 in Sydney using this particular example:

This past year, we at MITH experimented with digital skills development by starting not with a fellow or a project, but with a dataset instead: an archive of over 13 million tweets harvested … concerning the shooting of Michael Brown by a police officer in Ferguson, Missouri and the protests that arose in its wake. Beginning with this dataset, MITH invited arts and humanities, journalism, social sciences, and information sciences faculty and graduate students to gather and generate possible research questions, methods, and tools to explore it. In response to the enthusiastic and thoughtful discussion at this meeting, MITH created a series of five heavily attended workshops on how to build social media archives, the ethics and rights issues associated with using them, and the tools and methods for analyzing them. The point here was not to introduce scholars to Digital Humanities or to enlist them in a project, but to enable them through training and follow up consultations to do the work they were already interested in doing with new methods and tools. This type of training seems crucial to me if DH centers are actually going to realize their potential for becoming true agents of disciplinary transformation. And with something less than the resources necessary to launch and sustain a fellowship project, we were able to train a much larger constituency.

13 million tweets might sound like a lot. But really it’s not. It’s only 8GB of compressed, line-oriented JSON. The 140 characters that make up the text of each tweet are actually only 2% of the structured data that the Twitter API makes available for each tweet. The five incubator workshops Neil mentioned were often followed by a sneakernet-style transfer of data onto a thumb drive, accompanied by a brief discussion of the Twitter Terms of Service. The events in Ferguson aligned with the interests of the student body and faculty. The data collection led to collaborations with Bergis Jules at UC Riverside to create more Twitter datasets for Sandra Bland, Freddie Gray, Samuel DuBose and Walter Scott as awareness of institutionalized racism and police violence grew. The Ferguson dataset was used to create backdrops in town hall meetings attended by hundreds of students who were desperate to understand and contextualize Ferguson in their lives as students and citizens.
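
"Line-oriented" means one complete JSON object (one tweet) per line, which is part of why a collection this size stays small enough to hand around: it can be streamed and counted with ordinary command-line tools without loading everything into memory. A minimal sketch, assuming a gzip-compressed file with exactly one tweet per line; the function names and the file name ferguson.jsonl.gz used below are hypothetical.

```shell
# Count records in a gzip-compressed, line-oriented JSON file.
# Assumes one complete JSON object (one tweet) per line.
count_records() {
  # Stream-decompress and let awk report how many lines it saw.
  gzip -dc "$1" | awk 'END { print NR }'
}

# Report the uncompressed size in bytes, to compare with the size on disk.
uncompressed_bytes() {
  # tr strips the padding some wc implementations add.
  gzip -dc "$1" | wc -c | tr -d ' '
}
```

For example, `count_records ferguson.jsonl.gz` would print the number of tweets in the (hypothetical) file; because nothing is held in memory, the same pipeline works whether the file holds thirteen tweets or thirteen million.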

For me, this experience wasn’t about the power of Big Data. Instead it was a lesson in the necessity and utility of Small Data. Small data that is collected for a particular purpose, and whose provenance can fit comfortably in someone’s brain. Small data that intervened in business-as-usual collection development policies to offer new perspectives, inter- or anti-disciplinary engagement, and allegiances.

I think we’re still coming to understand the full dimensions of this particular intervention, especially when you consider some of the issues around ethics, privacy, persistence and legibility that it presents. But I think DH centers like MITH are well situated to be places for creative interventions such as this one around the killing of Michael Brown. We need more spaces for creative thinking, cultural repair, and assembling new modes of thinking about our experience in the historical moments that we find ourselves living in today. Digital Humanities centers provide a unique place for this radically interdisciplinary work to happen.

I’d be happy to answer more questions about the Ferguson dataset or activities around it, either now or after this session. Just come and find me, or email me…I’d love to hear from you. MITH does have some work planned over the coming year for building capacity specifically in the area of Digital Humanities and African American history and culture. We’re also looking at ways to help build a community of practice around the construction of collections like the Ferguson Twitter dataset, especially with regard to how they inform what we collect from the Web. In addition to working at MITH I’m also a PhD student in the iSchool at UMD, where I’m studying what I’m calling computer assisted appraisal, which is actually a strand of work going … I would be happy to talk to you about that stuff too, but that would be a different talk in itself. Thanks!

LITA: Jobs in Information Technology: November 11, 2015

Thu, 2015-11-12 03:42

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week:

Head of Processing, Yale University, New Haven, CT

Information Services Team Lead/Librarian (NASA), Cadence Group, Greenbelt, MD

Head of Collection Management, J. Willard Marriott Library, University of Utah, Salt Lake City, UT

Head of Graduate and Undergraduate Services, J. Willard Marriott Library, University of Utah, Salt Lake City, UT

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Roy Tennant: These Things I Will Do

Wed, 2015-11-11 22:24

Recently in a conversation with one of my daughters I remarked that the only issue I thought might be more devastating to humanity than the status of girls and women in society is global warming. Upon reflection, I now feel that I was wrong. There is nothing more devastating to humanity than the status of girls and women in society.

There are a few reasons for this, but I will simply cite an essential one. The essential reason is that for us to be able to collectively solve the biggest problems we face as humanity, we need ALL of our assets brought to bear, and that certainly includes the more than half of the planet who are female. The fact that many girls and women are marginalized, abused, and denied their full rights as human beings means that we are collectively severely crippled. And it has to stop.

Meanwhile, although I am in a female dominated profession, there are still a disproportionate number of men in positions of power and in generally higher-paying technical positions. For years the tech community Code4Lib has struggled to diversify and make women more comfortable in joining in, both virtually and at the annual conference. Thankfully, it appears that progress is being made.

But it is just a beginning — both for Code4Lib and for society more generally. So these are things I pledge to do to help:

  • Shut up. As a privileged white male, I’ve come to realize that my voice is the loudest in the room. And I don’t mean that in actual fact, although it is often true in that sense. I mean it figuratively. People pay attention to what I have to say just by the mere fact of my placement in the power hierarchy. The fact that I am speaking means a lesser-heard voice remains lesser-heard. So I will strive to not speak in situations where doing so can allow space for lesser-heard voices to speak.
  • Listen up. Having made space for lesser-heard voices, I need to listen to what they have to say. That means actively engaging with what they are saying, thinking carefully about it, and finding points of relevance to my situation.
  • Speak up. As someone in a position of power I know that it can be used for good or evil. Using it for evil doesn’t necessarily mean I knowingly cause harm, mind you, but I can cause harm nonetheless. Using my power for good may mean, at times, speaking up to others in power positions to create more inviting and inclusive situations for those with less power in social situations.
  • Step down. As someone who is often offered the podium at a conference or meeting, I’m trying to do better about not accepting offers until or unless there is at least equity in gender representation. This means sometimes walking away from gigs, which I have done and which I will continue to do until this female-dominated profession gives women their due.
  • Step up. Whether Edmund Burke said this or someone else, I nonetheless hold it to be true: “All that is necessary for the triumph of evil is that good men do nothing.” So sometimes I will need to spring into action to fight the evil of misogyny, whether it is overt and intended or subtle and unintentional.

There are no doubt other ways in which I can help, and I look forward to learning what those are. It’s a journey, I’ve found, in trying to understand what being on the top of the societal heap means and how it has shaped my perceptions and, unfortunately, actions.

Eric Hellman: Using Let's Encrypt to Secure an Elastic Beanstalk Website

Wed, 2015-11-11 21:25
Since I've been pushing the library and academic publishing community to implement HTTPS on all their information services, I was really curious to see how the new Let's Encrypt (LE) certificate authority is really working, with its "general availability" date imminent. My conclusion is that "general availability" will not mean "general usability" right away; its huge impact will take six months to a year to arrive. For now, it's really important for the community to put our developers to work on integrating Let's Encrypt into our digital infrastructure.

I decided to secure the website as my test example. It's still being developed and not quite ready for use, so if I screwed up it would be no disaster. The site is hosted using Elastic Beanstalk (EB) on Amazon Web Services (AWS), which is a popular and modern way to build scalable web services. The servers that Elastic Beanstalk spins up have to be completely configured in advance; you can't just log in and write some files. And EB does its best to keep servers serving. It's no small matter to shut down a server and run a temporary one, because EB will spin up another server to handle rerouted traffic. These characteristics of Elastic Beanstalk exposed some of the present shortcomings and future strengths of the Let's Encrypt project.

Here's the mission statement of the project:
Let’s Encrypt is a free, automated, and open certificate authority (CA), run for the public’s benefit.

While most of us focus on the word "free", the more significant word here is "automated":
Automatic: Software running on a web server can interact with Let’s Encrypt to painlessly obtain a certificate, securely configure it for use, and automatically take care of renewal.

Note that the objective is not to make it painless for website administrators to obtain a certificate, but to enable software to get certificates. If the former is what you want, in the near term, then I strongly recommend that you spend some money with one of the established certificate authorities. You'll get a certificate that isn't limited to 90 days, as the LE certificates are, you can get a wildcard certificate, and you'll be following the manual procedure that your existing web server software expects you to be following.

The real payoff for Let's Encrypt will come when your web server applications start expecting you to use the LE methods of obtaining security certificates. Then, the chore of maintaining certificates for secure web servers will disappear, and things will just work. That's an outcome worth waiting for, and worth working towards today.

So here's how I got Let's Encrypt working with Elastic Beanstalk for my test site.

The key thing to understand here is that before Let's Encrypt can issue me a certificate, I have to prove to them that I really control the hostname that I'm requesting a certificate for. So the Let's Encrypt client has to be given access to a "privileged" port on the host machine designated by DNS for that hostname. Typically, that means I have to have root access to the server in question.

In the future, Amazon should integrate a Let's Encrypt client with their Beanstalk Apache server software so all this is automatic, but for now we have to use the Let's Encrypt "manual mode". In manual mode, the Let's Encrypt client generates a cryptographic "challenge/response", which then needs to be served from the root directory of the web server.

Even running Let's Encrypt in manual mode required some jumping through hoops. It won't run on Mac OS X. It doesn't yet support the flavor of Linux used by Elastic Beanstalk, so it does no good to configure Elastic Beanstalk to install it there. Instead I used the Let's Encrypt Docker container, which works nicely, and I ran a Docker Machine inside VirtualBox on my Mac.

Having configured Docker, I ran
docker run -it --rm -p 443:443 -p 80:80 --name letsencrypt \
-v "/etc/letsencrypt:/etc/letsencrypt" \
-v "/var/lib/letsencrypt:/var/lib/letsencrypt" \ -a manual -d \
--server auth 
(the --server option requires your domain to be whitelisted during the beta period.) After paging through some screens asking for my email address and permission to log my IP address, the client responded with
Make sure your web server displays the following content at before continuing:
8wBDbWQIvFi2bmbBScuxg4aZcVbH9e3uNrkC4CutqVQ.hZuATXmlitRphdYPyLoUCaKbvb8a_fe3wVj35ISDR2A

To do this, I configured a virtual directory "/.well-known/acme-challenge/" in the Elastic Beanstalk console, mapped to a "letsencrypt/" directory in my application. I then made a file named "8wBDbWQIvFi2bmbBScuxg4aZcVbH9e3uNrkC4CutqVQ" with the specified content in my letsencrypt directory, committed the change with git, and deployed the application with the Elastic Beanstalk command line interface. After waiting for the deployment to succeed, I checked that the challenge URL responded correctly, and then hit <enter>. (Though the LE client tells you that the MIME type "text/plain" MUST be sent, Elastic Beanstalk sets no MIME header, which is allowed.)
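
Note that the file name is just the part of the challenge string before the first dot, and the whole string is the file's content. A minimal sketch of that file-writing step, assuming the "letsencrypt/" directory is the one mapped to "/.well-known/acme-challenge/" as described above; the function name write_challenge is hypothetical, and the git commit and deploy steps are left out.

```shell
# Write an HTTP challenge response file into the app's letsencrypt/ directory.
# $1: the full string the client asked to serve (token, a dot, then a thumbprint).
# $2: target directory (defaults to letsencrypt).
write_challenge() {
  keyauth=$1
  dir=${2:-letsencrypt}
  # The token is everything before the first dot; it becomes the file name.
  token=${keyauth%%.*}
  mkdir -p "$dir"
  # printf '%s' writes the content exactly, without adding a trailing newline.
  printf '%s' "$keyauth" > "$dir/$token"
  # Print the path so the caller can git add it.
  echo "$dir/$token"
}
```

Used as `git add "$(write_challenge '<string printed by the client>')"`, followed by a commit and `eb deploy`, this reproduces the manual steps with less room for copy-and-paste mistakes in a 43-character file name.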

IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at
   /etc/letsencrypt/live/ Your cert
   will expire on 2016-02-08. To obtain a new version of the
   certificate in the future, simply run Let's Encrypt again.

...except that, since I was running Docker inside VirtualBox on my Mac, I had to log into the Docker machine and copy three files out of that directory (cert.pem, privkey.pem, and chain.pem). I put them in my local <.elasticbeanstalk> directory. (See this note for a better way to do this.)
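
Because these certificates last only 90 days, "run Let's Encrypt again" needs to happen on a schedule rather than from memory. A minimal sketch of the checking half, using OpenSSL's `x509 -checkend`, which exits nonzero when a certificate will expire within the given number of seconds; the function name and the 30-day default threshold are assumptions, and cert.pem is the file copied out above.

```shell
# Succeeds (exit 0) if the certificate expires within the given number of days.
# $1: path to a PEM certificate; $2: threshold in days (default 30).
# An unreadable certificate also makes openssl fail, so it is reported as needing renewal.
needs_renewal() {
  days=${2:-30}
  # -checkend exits 0 if the certificate is still valid N seconds from now.
  if openssl x509 -checkend $(( days * 86400 )) -noout -in "$1" >/dev/null; then
    return 1
  fi
  return 0
}
```

For example, `needs_renewal cert.pem && echo "time to run Let's Encrypt again"` could run from cron; actually obtaining and deploying the new certificate is still the manual process described here.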

The final step was to turn on HTTPS in elastic beanstalk. But before doing that, I had to upload the three files to my AWS Identity and Access Management Console. To do this, I needed to use the aws command line interface, configured with admin privileges. The command was
aws iam upload-server-certificate \
--server-certificate-name gitenberg-le \
--certificate-body file://<.elasticbeanstalk>/cert.pem \
--private-key file://<.elasticbeanstalk>/privkey.pem \
--certificate-chain file://<.elasticbeanstalk>/chain.pem

One more trip to the Elastic Beanstalk configuration console (network/load balancer section), and the site was on HTTPS.
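
Before that upload, it can be worth a quick check that the key and certificate actually belong together, since copying files out of a VM by hand makes mix-ups easy. A minimal sketch with OpenSSL that compares the public key embedded in cert.pem with the one derived from privkey.pem; the function name is hypothetical, and this check is not part of the Let's Encrypt client or the AWS CLI.

```shell
# Succeeds if the private key corresponds to the certificate's public key.
# Comparing PEM-encoded public keys works for both RSA and EC keys.
key_matches_cert() {
  # Public key embedded in the certificate.
  cert_pub=$(openssl x509 -noout -pubkey -in "$1") || return 1
  # Public key derived from the private key file.
  key_pub=$(openssl pkey -pubout -in "$2") || return 1
  [ "$cert_pub" = "$key_pub" ]
}
```

For example, `key_matches_cert <.elasticbeanstalk>/cert.pem <.elasticbeanstalk>/privkey.pem && echo ok` before running the upload command.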

Given that my sys-admin skills are rudimentary, the fact that I was able to get Let's Encrypt to work suggests that they've done a pretty good job of making the whole process simple. However, the documentation I needed was non-existent, apparently because the LE developers want to discourage the use of manual mode. Figuring things out required a lot of error-message googling. I hope this post makes it easier for people to get involved to improve that documentation or build support for Let's Encrypt into more server platforms.

(Also, given that my sys-admin skills are rudimentary, there are probably better ways to do what I did, so beware.)

If you use web server software developed by others, NOW is the time to register a feature request. If you are contracting for software or services that include web services, NOW is the time to add a Let's Encrypt requirement into your specifications and contracts. Let's Encrypt is ready for developers today, even if it's not quite ready for rank and file IT administrators.

Evergreen ILS: Hack-A-Way 2015 Wrap Up

Wed, 2015-11-11 15:43

We wrapped up the 2015 Hack-A-Way this past Friday, and after a few days to reflect I wanted to write about the event and about future events. Saying what the impact of a coding event will be can be difficult. Lines of code written can be a misleading metric, and a given patch may not make it into production. However, a casual discussion could have major consequences years down the road. Indeed, at the first Hack-A-Way Bill Erickson’s presentation on web sockets had no immediate impact, but over the next year it became a key component in the decision to move to a web-based staff client. Still, I’m going to venture to say that there are impacts both immediate and long term.

I won’t go into detail on each thing that was worked on; you can read the collaborative notes here for that:

But, some highlights, for me, included:

The web-based staff client – Bill Erickson has done a lot of work on the Windows installer for Hatch, and Galen will help with the OS X version. Both PINES and SCLENDS have libraries looking forward to getting the web-based client into production to do real-world tests. I’m very excited about this.

Galen Charlton was confirmed as the release manager for Evergreen 3.0 (or whatever version number is selected).

Syrup – A fair bit of bandwidth was spent on how Syrup could be more tightly integrated into Evergreen for course reserves. I’m always excited to see academics get more support with Evergreen even if it’s not my personal realm.

Sqitch – Bill Erickson presented on this tool for managing SQL scripts. Sqitch plans let you specify dependencies between SQL scripts, which avoids the need to number them so that they run in a particular order, and Sqitch encourages the creation of deploy, revert, and verify scripts for each change. This may be a good tool to use during development, though production deployments are likely to still use the traditional upgrade scripts.
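
The difference between numbered scripts and declared dependencies can be illustrated with the standard `tsort` utility: each change states what it requires, and a valid run order is derived from those relationships instead of being encoded in file names. This is only an analogy for the idea, not how Sqitch itself is implemented, and the change names below are hypothetical.

```shell
# Hypothetical changes and their "requires" relationships, written as
# "prerequisite dependent" pairs: users needs appschema, flips needs users, etc.
edges='appschema users
appschema flips
users flips
users userflips
flips userflips'

# tsort topologically sorts the pairs into a run order that satisfies every
# dependency, so no change needs a number to land in the right place.
printf '%s\n' "$edges" | tsort
# prints appschema, users, flips, userflips (one per line)
```

Adding a new change then means adding its dependency pairs rather than renumbering existing scripts; tsort also complains if the dependencies form a loop.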

Twenty patches got merged during the Hack-A-Way with more following over the weekend that were tied to work done then.

Ken Cox, an Evergreen user, joined us as a developer and showed us the great work he has done on an Android app for Evergreen.

We discussed the steps to becoming a core committer, and several ideas were thrown around about how to encourage growth among potential developers via a mentoring program. No firm consensus emerged on what that program should look like, but I’m glad to say that, in an epilogue to the discussion, Kathy Lussier has been made a core committer! Kathy has long been a consistent contributor to Evergreen and a bug reviewer, so I’m excited to see this happen.

Search speed continues to be a contentious issue in the Evergreen community. Search relies on many layers, from hardware to Apache to Postgres to SQL queries to web page rendering, and even on factors beyond Evergreen such as the connection speed between server and client. As a result, comparisons need discipline and controls in place. Using external indexing and search products was discussed, but it’s a hard discussion to have. Frankly, it’s very easy to end up comparing apples to oranges, even between projects with similar tasks and goals. For example, Solr was cited as a very successful product used both commercially and with library products, but research and exploration will be needed before we can have a fuller discussion about it (or other products). MassLNC shared their search vision, which was a good starting place for the dialogue. Many systems administrators shared their best practices. We also discussed establishing a search performance baseline that takes into account variables such as system setups and record sizes, and then setting metric goals. Even possible changes to Postgres to accommodate our needs were floated for consideration.
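
Comparisons are only meaningful when like is compared with like. As a minimal sketch, assuming timing samples have already been collected as hypothetical (setup, record_count, milliseconds) tuples, the following groups them by those variables and reports a median and a nearest-rank 95th percentile per group. The grouping keys and metrics are illustrative choices, not an agreed Evergreen benchmark.

```python
import math
import statistics
from collections import defaultdict


def nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def baseline(samples):
    """Summarize (setup, record_count, ms) samples per (setup, record_count) group."""
    groups = defaultdict(list)
    for setup, record_count, ms in samples:
        groups[(setup, record_count)].append(ms)
    return {
        key: {
            "n": len(times),
            "median": statistics.median(times),
            "p95": nearest_rank(times, 95),
        }
        for key, times in groups.items()
    }
```

Keeping setup and record count in the key means a fast result on a small test database is never averaged together with a slow one on a large production-sized database.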

Related to the core committer discussion, we did an overview of Evergreen’s technical layout and the common paths for data into and out of the system.

Even so, all of this wonderful work is still an incomplete picture. It doesn’t capture the conversations, the half-finished patches, the bug testing, or the personal growth of participants that happened as well. Nor does it capture the kind hosting we received from MassLNC and NOBLE, who helped ferry us about, sent staff to participate, arranged hotels, kept the coffee flowing, and in general were as kind hosts as we could hope for. I feel I should write far more about the hosts, but I can’t thank each person individually since I don’t know everything each one did. Hosting the Hack-A-Way is always a big task, and the folks at MassLNC and NOBLE did a wonderful job that we are all very thankful for.

Now, about the operational side of the Hack-A-Way. There were some discussions about the future of the event and about managing remote participation, which has always been problematic. When the Hack-A-Way began, remote participation largely amounted to people updating IRC with what was going on. Then we tried adding a camera and using Google Hangouts, until the limitations of Google Hangouts became apparent. We tried a FLOSS product the next year, and that didn’t work well at all. Through all of this, the number of people wanting to participate remotely has kept growing. So, during this year’s event I created a list of things I want to do next year. Tragically, this will put more bandwidth strain on the hosting site, but we always seem to push the bandwidth to its limit (and sometimes beyond).

  • Ensure that the main presentation computer for group events has a usable microphone, is set up as part of the group event, and has its screen shared.
  • Have a secondary microphone/camera station that is mobile instead of on a stationary tripod. This will mean a dedicated laptop for this purpose. If I have time, I may set up a Raspberry Pi with a script that watches IRC and lets IRC users control the camera movement remotely, which might be fun.
  • Move to a more robust commercial product that has presentation controls (this year’s needs showed that was necessary). We also occasionally need to break into small groups with remote presence, which this won’t solve, so Google Hangouts will probably still have a use. We are going to try out a commercial product next year, but we will look for options that support our community as best we can, namely ones with Chrome native support via an HTML5 client.

Beyond that, we discussed the frequency and locations of the Hack-A-Way. Next year Evergreen Indiana is kind enough to host us, and there is already a submission in place for 2017. Ideas floated included extending the conference by 1–3 days for a hacking event, or even holding a second Hack-A-Way each year so that the year breaks into even segments of Conference / Hack-A-Way / Hack-A-Way, rather than the single Hack-A-Way falling midway between conferences as it does now. No decision was made except to continue the conversation and try to reach some decisions by the time of the conference in Raleigh.

The only sure thing is that the months between now and then will pass very quickly. I felt the Hack-A-Way was very successful, with a lot of work done and a lot of good conversations started. That is part of the point of gathering so many of us into one spot, since we are spread out and used to communicating only via IRC and email (with the occasional Facebook post thrown in).

Eric Lease Morgan: MARC, MARCXML, and MODS

Wed, 2015-11-11 15:19

This is the briefest of comparisons between MARC, MARCXML, and MODS. It was written for a set of library school students learning XML.

MARC is an acronym for Machine Readable Cataloging. It was designed in the 1960s, and its primary purpose was to ship bibliographic data on tape to libraries that wanted to print catalog cards. Consider the computing context of the time. There were no hard drives. RAM was beyond expensive. And the idea of a relational database had yet to be articulated. Consider the library’s access tool of the day: the card catalog. Consider the best practice of catalog cards: “Generate no more than four or five cards per book. Otherwise, we will not be able to accommodate all of the cards in our drawers.” MARC worked well, and considering the time, it represented a well-designed serial data structure complete with multiple checksum redundancy.
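
To make “serial data structure” concrete, here is a minimal Python sketch, not a complete MARC 21 implementation. A record is a 24-character leader (positions 00-04 hold the record length, positions 12-16 the base address of the data), then a directory of 12-character entries (tag, field length, starting offset), then the fields themselves, separated by control characters. The leader values other than length and base address are illustrative, and control fields, character sets, and validation are ignored. Because the leader length, directory lengths, and offsets all describe the same bytes, a reader can cross-check them.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_marc(fields):
    """Serialize (tag, body) pairs into a minimal ISO 2709-style MARC record."""
    directory, data = b"", b""
    for tag, body in fields:
        chunk = body + FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(chunk):04d}{len(data):05d}".encode("ascii")
        data += chunk
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)      # the leader is always 24 characters
    length = base + len(data) + 1   # +1 for the record terminator
    leader = f"{length:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + data + RECORD_TERMINATOR


def parse_marc(record):
    """Return (leader, declared_length, [(tag, body), ...]) for one record."""
    leader = record[:24].decode("ascii")
    declared_length = int(leader[0:5])   # positions 00-04
    base = int(leader[12:17])            # positions 12-16: base address of data
    directory = record[24:base - 1]      # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = record[base + start:base + start + length]
        fields.append((tag, body.rstrip(FIELD_TERMINATOR)))
    return leader, declared_length, fields


def subfields(body):
    """Split a data field into its indicators and (code, value) subfields."""
    indicators, *parts = body.split(SUBFIELD_DELIMITER)
    return indicators.decode("utf-8"), [
        (p[:1].decode("utf-8"), p[1:].decode("utf-8")) for p in parts
    ]
```

For example, a title field built as `b"10\x1faWalden /\x1fcHenry David Thoreau."` parses back into indicators `"10"` and subfields `a` and `c`; note the trailing “ /” punctuation that travels with the title.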

Someone then got the “cool” idea to create an online catalog from MARC data. The idea was logical, but it grew without a balance of library and computing principles. To make a long story short, library principles, sans any real understanding of computing principles, prevailed. The result was a bloating of the MARC record to include all sorts of administrative data that never would have made it onto a catalog card, and this data was delimited in the MARC record with all sorts of syntactical “sugar” in the form of punctuation. Moreover, as bibliographic standards evolved, the previously created data was not updated, and sometimes people simply ignored the rules. The consequence has been disastrous, and even Google can’t systematically parse the bibliographic bread & butter of Library Land.* The folks in the archives community, with the advent of EAD, are so much better off.

Soon after XML was articulated, the Library of Congress specified MARCXML, a data structure designed to carry MARC forward. For the most part, it addressed many of the necessary issues, but because it insisted on making the data in a MARCXML file 100% transformable into a “traditional” MARC record, MARCXML falls short. For example, without knowing the “secret codes” of cataloging (the numeric field names), it is very difficult to determine the authors, titles, and subjects of a book.
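
Here is a minimal Python sketch of what the “secret codes” mean in practice, using the standard library’s ElementTree and the MARCXML namespace http://www.loc.gov/MARC21/slim. The sample record is invented for illustration. The element names only say “datafield” and “subfield”; pulling out the author, title, and subject works only if you already know that tags 100, 245, and 650 hold them.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A made-up one-record MARCXML sample for illustration.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00000nam a2200000 a 4500</leader>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Thoreau, Henry David,</subfield>
      <subfield code="d">1817-1862.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Walden /</subfield>
      <subfield code="c">Henry David Thoreau.</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="0">
      <subfield code="a">Natural history.</subfield>
    </datafield>
  </record>
</collection>"""


def values(record, tag, code):
    """Text of every subfield `code` in every datafield numbered `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, NS)]


def describe(record):
    # Only outside knowledge maps the numbers to meaning:
    # 100 = personal-name main entry, 245 = title statement, 650 = topical subject.
    return {
        "authors": values(record, "100", "a"),
        "titles": values(record, "245", "a"),
        "subjects": values(record, "650", "a"),
    }


root = ET.fromstring(SAMPLE)
records = [describe(r) for r in root.findall("marc:record", NS)]
```

Note that the ISBD punctuation (“Walden /”, “Thoreau, Henry David,”) comes along with the data, because MARCXML preserves the MARC content exactly.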

The folks at the Library of Congress understood these limitations almost from the beginning, and consequently they created an additional bibliographic standard called MODS (Metadata Object Description Schema). This XML-based metadata schema goes a long way toward addressing both the computing practices of the day and the need for rich, full, and complete bibliographic data. Unfortunately, “traditional” MARC records are still the data structure ingested and understood by the profession’s online catalogs and “discovery systems”. Consequently, without a wholesale shift in practice, the profession’s intellectual content is figuratively stuck in the 1960s.
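
By contrast, MODS element names carry their own meaning. This minimal Python sketch reads an invented one-record sample in the MODS namespace http://www.loc.gov/mods/v3 with the standard library’s ElementTree; it covers only titleInfo/title, name/namePart with a textual role, and subject/topic, and real records vary (roles, for instance, are often coded rather than given as text).

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# A made-up one-record MODS sample for illustration.
SAMPLE = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods>
    <titleInfo><title>Walden</title></titleInfo>
    <name type="personal">
      <namePart>Thoreau, Henry David</namePart>
      <namePart type="date">1817-1862</namePart>
      <role><roleTerm type="text">author</roleTerm></role>
    </name>
    <subject><topic>Natural history</topic></subject>
  </mods>
</modsCollection>"""


def authors(mods):
    """Untyped nameParts of every name whose textual role is 'author'."""
    found = []
    for name in mods.findall("mods:name", NS):
        roles = [r.text for r in name.findall("mods:role/mods:roleTerm", NS)]
        if "author" in roles:
            found += [p.text for p in name.findall("mods:namePart", NS)
                      if p.get("type") is None]
    return found


def describe(mods):
    # No tag table is needed: "title", "name", and "topic" say what they hold.
    return {
        "titles": [t.text for t in mods.findall("mods:titleInfo/mods:title", NS)],
        "authors": authors(mods),
        "subjects": [t.text for t in mods.findall("mods:subject/mods:topic", NS)],
    }


root = ET.fromstring(SAMPLE)
records = [describe(m) for m in root.findall("mods:mods", NS)]
```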

* Consider the hodgepodge of materials digitized by Google and accessible in the HathiTrust. A search for Walden by Henry David Thoreau returns a myriad of titles, all exactly the same.

  1. MARC – An introduction to the MARC standard
  2. leader – All about the leader of a traditional MARC record
  3. MARC Must Die – An essay by Roy Tennant outlining why MARC is not a useful bibliographic format. Notice when it was written.
  4. MARCXML – Here are the design considerations for MARCXML
  5. MODS – This is an introduction to MODS

This is much more of an exercise than it is an assignment. The goal of the activity is not to get correct answers but instead to provide a framework for the reader to practice critical thinking about some of the bibliographic standards of the library profession. To the best of your ability, and in the form of a written essay between 500 and 1,000 words long, answer and address the following questions based on the contents of the given .zip file:

  1. Measured in characters (octets), what is the maximum length of a MARC record? (Hint: It is defined in the leader of a MARC record.)
  2. Given the maximum length of a MARC record (and therefore a MARCXML record), what are some of the limitations this imposes when it comes to full and complete bibliographic description?
  3. Given the attached .zip file, how many bibliographic items are described in the file named data.marc? How many records are described in the file named data.xml? How many records are described in the file named data.mods? How did you determine the answers to the previous three questions? (Hint: Open and read the files in your favorite text and/or XML editor.)
  4. What is the title of the book in the first record of data.marc? Who is the author of the second record in the file named data.xml? What are the subjects of the third record in the file named data.mods? How did you determine the answers to the previous three questions? Be honest.
  5. Compare & contrast the various bibliographic data structures in the given .zip file. There are advantages and disadvantages to all three.