Feed aggregator

Brown University Library Digital Technologies Projects: What is OCRA?

planet code4lib - Fri, 2015-04-10 15:40

OCRA is a platform for faculty to request digital course reserves in all formats.  Students access digital course reserves via Canvas or at the standalone OCRA site.  Students access physical reserves in library buildings via Josiah.

Hydra Project: SAVE THE DATE: Hydra Connect 2015 – Monday, September 21st – Thursday, September 24th

planet code4lib - Fri, 2015-04-10 15:21

Hydra today announced the dates for Hydra Connect 2015:

Hydra Connect 2015
Minneapolis, Minnesota
Monday, September 21 – Thursday, September 24, 2015

Registration and lodging details will be available in early June 2015.

The four-day event will be structured as follows:

  • Mon 9/21 – Workshops and leader-facilitated training sessions
  • Tue 9/22 – Main Plenary Session
  • Wed 9/23 – Morning Plenary Session, afternoon Un-conference breakout sessions
  • Thu 9/24 – All-day Un-conference breakouts and workgroup sessions

We are also finalizing details for a poster session within the program and a conference dinner to be held on one of the main conference evenings.

Please mark your calendars and plan on joining us this September!

David Rosenthal: 3D Flash - not as cheap as chips

planet code4lib - Fri, 2015-04-10 15:00
Chris Mellor has an interesting piece at The Register pointing out that while 3D NAND flash may be dense, it’s going to be expensive.

The reason is the enormous number of processing steps per wafer - between 96 and 144 deposition layers for the three leading 3D NAND flash technologies. Getting non-zero yields from that many steps involves huge investments in the fab:
Samsung, SanDisk/Toshiba, and Micron/Intel have already announced +$18bn investment for 3D NAND.
  • Samsung’s new Xi’an, China, 3D NAND fab involves a +$7bn total capex outlay
  • Micron has outlined a $4bn spend to expand its Singapore Fab 10
This compares with Seagate and Western Digital’s capex totalling ~$4.3 bn over the past three years.

Chris has a chart, from Gartner and Stifel, comparing the annual capital expenditure per TB of storage of NAND flash and hard disk. Each TB of flash contains at least 50 times as much capital as a TB of hard disk, which means it will be a lot more expensive to buy.

PS - "as cheap as chips" is a British usage.

Jonathan Rochkind: “Streamlining access to Scholarly Resources”

planet code4lib - Fri, 2015-04-10 14:14

A new Ithaka report, Meeting Researchers Where They Start: Streamlining Access to Scholarly Resources [thanks to Robin Sinn for the pointer], makes some observations about researcher behavior that many of us probably know, but that most of our organizations haven’t successfully responded to yet:

  • Most researchers work from off campus.
  • Most researchers do not start from library web pages, but from google, the open web, and occasionally licensed platform search pages.
  • More and more of researcher use is on smaller screens, mobile/tablet/touch.

The problem posed by the first two points is the difficulty in getting access to licensed resources. If you start from the open web, from off campus, and wind up at a paywalled licensed platform, you will not be recognized as a licensed user. Because you started from the open web, you won’t be going through EZProxy. As the Ithaka report says, “The proxy is not the answer… the researcher must click through the proxy server before arriving at the licensed content resource. When a researcher arrives at a content platform in another way, as in the example above, it is therefore a dead-end.”
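To make the click-through requirement concrete, here is a minimal sketch of what a proxied link amounts to: the publisher URL wrapped in the proxy's login prefix. The hostname `ezproxy.example.edu` is hypothetical, and the `login?url=` form is assumed from the common EZProxy starting-point convention; a real deployment may differ.

```python
# Hypothetical proxy prefix; substitute your institution's own.
DEFAULT_PREFIX = "https://ezproxy.example.edu/login?url="


def proxied_url(target_url, proxy_prefix=DEFAULT_PREFIX):
    """Return target_url wrapped in the proxy login prefix.

    A URL that already starts with the prefix is returned unchanged,
    so the function is safe to apply twice.
    """
    if target_url.startswith(proxy_prefix):
        return target_url
    return proxy_prefix + target_url
```

A researcher who lands on the bare publisher URL from a search engine never passes through this prefix, which is exactly the dead end the report describes.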

Shibboleth and UI problems

Theoretically, Shibboleth federated login is an answer to some of that. You get to a licensed platform from the open web, you click on a ‘login’ link, and you have the choice to log in via your university (or other host organization), using your institutional login at your home organization, which can authenticate you via Shibboleth to the third party licensed platform.

The problem here, as the Ithaka report notes, is that these Shibboleth federated login interfaces at our licensed content providers are terrible.

Most of them even use the word “Shibboleth” as if our patrons have any idea what this means. As the Ithaka report notes, “This login page is a mystery to most researchers. They can be excused for wondering “what is Shibboleth?” even if their institution is part of a Shibboleth federation that is working with the vendor, which can be determined on a case by case basis by pulling down the “Choose your institution” menu.”

Ironically, this exact same issue was pointed out in the NISO “Establishing Suggested Practices Regarding Single Sign-on” (ESPReSSO) report from 2011. The ESPReSSO report goes on to not only identify the problem but suggest some specific UI practices that licensed content providers could take to improve things.

Four years later, almost none have. (One exception is JStor, which actually acted on the ESPReSSO report, and as a result actually has an intelligible federated sign-on UI, which I suspect our users manage to figure out. It would have been nice if the Ithaka report had pointed out good examples, not just bad ones. edit: I just discovered JStor is actually currently owned by Ithaka, perhaps they didn’t want to toot their own horn.).

Four years from now, will the Ithaka report have had any more impact?  What would make it so?

There is one more especially frustrating thing to me regarding Shibboleth that isn’t about UI: even vendors that say they support Shibboleth support it very unreliably. Here at my place of work we’ve been very aggressive about configuring Shibboleth with any vendor that supports it, and we’ve found that Shibboleth often simply stops working at various vendors. They don’t notice until we report it (Shibboleth is not widely used, apparently). Then maybe they’ll fix it, maybe they won’t. In another example, ProQuest’s Shibboleth login requires the browser to access a web page on a four-digit non-standard port, and even though we told them several years ago that a significant portion of our patrons are behind a firewall that does not allow access to such ports, they’ve been uninterested in fixing or changing it. After all, what are we going to do, cancel our license? As the several years since we first complained about this issue show, obviously not. Which brings us to the next issue…

Relying on Vendors

As the Ithaka report notes, library systems have been effectively disintermediated in our researchers’ workflows. Our researchers go directly to third-party licensed platforms. We pay for these platforms, but we have very little control of them.

If a platform does not work well on a small screen/mobile device, there’s nothing we can do but plead. If a platform’s authentication system UI is incomprehensible to our patrons, likewise.

The Ithaka report recognizes this, and basically recommends that… we get serious when we tell our vendors to improve their UI’s:

Libraries need to develop a completely different approach to acquiring and licensing digital content, platforms, and services. They simply must move beyond the false choice that sees only the solutions currently available and instead push for a vision that is right for their researchers. They cannot celebrate content over interface and experience, when interface and experience are baseline requirements for a content platform just as much as a binding is for a book. Libraries need to build entirely new acquisitions processes for content and infrastructure alike that foreground these principles.

Sure. The problem is, this is completely, entirely, incredibly unrealistic.

If we were for real to stop “celebrating content over interface and experience”, and have that effected in our acquisitions process, what would that look like?

It might look like us refusing to license something with a terrible UX, even if it’s content our faculty need electronically. Can you imagine us telling faculty that? It’s not going to fly. The faculty wants the content even if it has a bad interface. And they want their pet database even if 90% of our patrons find it incomprehensible. And we are unable to tell them “no”.

Let’s imagine a situation that should be even easier. Let’s say we’re lucky enough to be able to get the same package of content from two different vendors with two different platforms. Let’s ignore the fact that “big deal” licensing makes this almost impossible (a problem which has only gotten worse since a D-Lib article pointed it out 14 years ago). Even in this fantasy land, where we say we could get the same content from two different platforms, let’s say one platform costs more but has a much better UX. In this continued time of library austerity budgets (which nobody sees ending anytime soon), could we possibly pick the more expensive one with the better UX? Will our stakeholders, funders, faculty, deans, ever let us do that? Again, we can’t say “no”.

edit: Is it any surprise, then, that our vendors find business success in not spending any resources on improving their UX?  One exception again is JStor, which really has a pretty decent and sometimes outstanding UI.  Is the fact that they are a non-profit endeavor relevant? But there are other non-profit content platform vendors which have UX’s at the bottom of the heap.

Somehow we’ve gotten ourselves in a situation where we are completely unable to do anything to give our patrons what we know they need.  Increasingly, to researchers, we are just a bank account for licensing electronic platforms. We perform the “valuable service” of being the entity you can blame for how much the vendors are charging, the entity you require to somehow keep licensing all this stuff on smaller budgets.

I don’t think the future of academic libraries is bright, and I don’t even see a way out. Any way out would take strategic leadership and risk-taking from library and university administrators… that, frankly, institutional pressures seem to make it impossible for us to ever get.

Is there anything we can do?

First, let’s make it even worse — there’s a ‘technical’ problem that the Ithaka report doesn’t even mention that makes it even worse. If the user arrives at a paywall from the open web, even if they can figure out how to authenticate, they may find that our institution does not have a license from that particular vendor, but may very well have access to the same article on another platform. And we have no good way to get them to it.

Theoretically, the OpenURL standard is meant to address exactly this “appropriate copy” problem. OpenURL has been a very successful standard in some ways, but the ways it’s deployed simply stop working when users don’t start from library web pages, when they start from the open web, and every place they end up has no idea what institution they belong to or their appropriate institutional OpenURL link resolver.

I think the only technical path we have (until/unless we can get vendors to improve their UI’s, and I’m not holding my breath) is to intervene in the UI.  What do I mean by intervene?

The LibX toolbar is one example: a toolbar you install in your browser that adds institutionally specific content and links to web pages, links that can help the user authenticate against a platform arrived at via the open web, even links that can scrape the citation details from a page and help the user get to another ‘appropriate copy’ with authentication.

The problem with LibX specifically is that browser toolbars seem to be a technical dead-end. It has proven pretty challenging to get a browser toolbar to keep working across browser versions. The LibX project seems more and more moribund: it may still be developed, but its documentation hasn’t kept pace, it’s unclear what it can do or how to configure it, and fewer browsers are supported. And especially as our users turn more and more to mobile (as the Ithaka report notes), they more and more often are using browsers in which plugins can’t be installed.

A “bookmarklet” approach might be worth considering, for targeting a wider range of browsers with less technical investment. Bookmarklets aren’t completely closed off in mobile browsers, although they are a pain in the neck for the user to add in many.

Zotero is another interesting example. Zotero, as well as its competitors including Mendeley, can successfully scrape citation details from many licensed platform pages. We’re used to thinking of Zotero as ‘bibliographic management’, but once it’s scraped those citation details, it can also send the user to the institutionally-appropriate link resolver with those citation details, which is what can get the user to the appropriate licensed copy, in an authenticated way. Here at my place of work we don’t officially support Zotero or Mendeley, and haven’t spent much time figuring out how to get the most out of even the bibliographic management packages we do officially support.
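The hand-off to a link resolver described above comes down to encoding the scraped citation as an OpenURL query string. The following is a rough sketch, not how Zotero itself does it: the resolver address `https://resolver.example.edu/openurl` and the citation dictionary keys are hypothetical, while the `rft.*` keys follow the OpenURL 1.0 (Z39.88-2004) key/encoded-value journal format.

```python
from urllib.parse import urlencode

# Map of (hypothetical) citation dict keys to OpenURL 1.0 KEV journal keys.
FIELD_MAP = {
    "article_title": "rft.atitle",
    "journal_title": "rft.jtitle",
    "issn": "rft.issn",
    "volume": "rft.volume",
    "issue": "rft.issue",
    "start_page": "rft.spage",
    "year": "rft.date",
}


def build_openurl(resolver_base, citation):
    """Build an OpenURL 1.0 link for a journal article citation.

    resolver_base is the institution's link resolver address (assumed
    to carry no query string of its own); citation is a plain dict.
    Missing or empty fields are simply left out of the query.
    """
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    for key, kev_key in FIELD_MAP.items():
        value = citation.get(key)
        if value:
            params.append((kev_key, str(value)))
    if citation.get("doi"):
        params.append(("rft_id", "info:doi/" + citation["doi"]))
    # urlencode percent-encodes values, so titles with '&' or spaces are safe.
    return resolver_base + "?" + urlencode(params)
```

The piece a vendor page cannot supply is `resolver_base`: it depends on which institution the user belongs to, which is why the tool doing the scraping has to know it.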

Perhaps we should spend more time with these, not just to support ‘bibliographic management’ needs, but as a method to get users from the open web to authenticated access to an appropriate copy. And perhaps we should do other R&D in ‘bookmarklets’; in machine learning for citation parsing so users can just paste a citation into a box (perhaps via bookmarklet) to get authenticated access to appropriate copy; in anything else we can think of to:

Get the user from the open web to licensed copies.  To be able to provide some useful help for accessing scholarly resources to our patrons, instead of just serving as a checkbook. With some library branding, so they recognize us as doing something useful after all.

LITA: “Why won’t my document print?!” — Two Librarians in Training

planet code4lib - Fri, 2015-04-10 13:30

For this post, I am joined by a fellow student in Indiana University’s Information and Library Science Department, Sam Ott! Sam is a first year student, also working toward a dual-degree Master of Library Science and Master of Information Science, who has over three years of experience working in paraprofessional positions in multiple public libraries. Sam and I are taking the same core classes, but he is focusing his studies on public libraries instead of my own focus on academic and research libraries. With these distinct end goals in mind, we wanted to write about how the technologies we are learning in library school are helping cultivate our skills in preparation for future jobs.



On the academic library track, much of the technology training seems to be abstract and theory based, paired with practical training. There is a push for students to learn digital encoding practices, such as TEI/XML, and to understand how these concepts function within a digital library/archive. Website architecture and development also appear as core classes and electives as ways to complement theoretical classes.

Specializations offer a chance to delve deeper into the theory and practice of one of these aspects, for example, Digital Libraries, Information Architecture, and Data Science. The student chapter of the Association for Information Science and Technology (ASIS&T) offers workshops through UITS, in addition to the courses offered, to introduce and hone UNIX, XML/XSLT, and web portfolio development skills.


ALA Midwinter Meeting, 2015.

On the public library track, the technology training is limited to two core courses (Representation and Organization, plus one chosen technology requirement) and electives. While most of the coursework for public libraries is geared toward learning how to serve multiple demographics, studying Information Architecture can allow for greater exposure to relevant technologies. However, the student’s schedule is filled by the former, with less time for technological courses.

One reason I chose to pursue the Master of Information Science was to bridge what I saw as a gap in technology preparation for public library careers. The MIS has been extremely helpful in allowing me to learn best practices for system design and how people interact with websites and computers. However, these classes are still geared toward the skills needed for an academic librarian or industry employee, and lack the everyday technology skills a public librarian may need, especially if there isn’t an IT department available.


We’ve considered a few options for courses and workshops which could provide a hands-on approach to daily technology use in any library. Since many academic librarians focused on digital tools still staff the reference desk and interact with patrons, this information is vital for library students moving on to jobs. We imagine a course or workshop series that introduces students to common issues staff and patrons face with library technologies. The topics of this course could include: rebooting and defragmenting computers, hooking up and using various audio visual technologies such as projectors, and troubleshooting the dreaded printer problems.

The troubleshooting method we want to avoid.

As public and academic libraries embrace the evolving digital trends, staff will need to understand how to use and troubleshoot ranges of platforms, makerspaces, and digital creativity centers. Where better to learn these skills than in school!

But we aren’t quite finished. An additional aspect to the course or workshop would be allowing the students to shadow, observe, and learn from University Information Technology Services as they troubleshoot common problems across all platforms. This practical experience both observing and learning how to fix frequent and repeated issues would give students a well-rounded experiential foundation while in library school.

If you are a LITA blog reader working in a public library, which skills would you recommend students learn before taking the job? What kinds of technology-related questions are frequently asked at your institution?

Open Knowledge Foundation: OpenCon 2015 is launched

planet code4lib - Fri, 2015-04-10 11:47

This blog post is cross-posted from the Open Access Working Group blog.

Details of OpenCon 2015 have just been announced!

OpenCon 2015: Empowering the Next Generation to Advance Open Access, Open Education and Open Data will take place on November 14-16 in Brussels, Belgium and bring together students and early career academic professionals from across the world to learn about the issues, develop critical skills, and return home ready to catalyze action toward a more open system for sharing the world’s information, from scholarly and scientific research, to educational materials, to digital data.

Hosted by the Right to Research Coalition and SPARC, OpenCon 2015 builds on the success of the first-ever OpenCon meeting last year which convened 115 students and early career academic professionals from 39 countries in Washington, DC. More than 80% of these participants received full travel scholarships, provided by sponsorships from leading organizations, including the Max Planck Society, eLife, PLOS, and more than 20 universities.

“OpenCon 2015 will expand on a proven formula of bringing together the brightest young leaders across the Open Access, Open Education, and Open Data movements and connecting them with established leaders in each community,” said Nick Shockey, founding Director of the Right to Research Coalition. “OpenCon is equal parts conference and community. The meeting in Brussels will serve as the centerpiece of a much larger network to foster initiatives and collaboration among the next generation across OpenCon’s three issue areas.”

OpenCon 2015’s three-day program will begin with two days of conference-style keynotes, panels, and interactive workshops, drawing both on the expertise of leaders in the Open Access, Open Education and Open Data movements and the experience of participants who have already led successful projects.

The third day will take advantage of the location in Brussels by providing a half-day of advocacy training followed by the opportunity for in-person meetings with relevant policy makers, ranging from the European Parliament and European Commission to embassies and key NGOs. Participants will leave with a deeper understanding of the conference’s three issue areas, stronger skills in organizing local and national projects, and connections with policymakers and prominent leaders across the three issue areas.

Speakers at OpenCon 2014 included the Deputy Assistant to the President of the United States for Legislative Affairs, the Chief Commons Officer of Sage Bionetworks, the Associate Director for Data Science for the U.S. National Institutes of Health, and more than 15 students and early career academic professionals leading successful initiatives. OpenCon 2015 will again feature leading experts. Patrick Brown and Michael Eisen, two of the co-founders of PLOS, are confirmed for a joint keynote at the 2015 meeting.

“For the ‘open’ movements to succeed, we must invest in capacity building for the next generation of librarians, researchers, scholars, and educators,” said Heather Joseph, Executive Director of SPARC (The Scholarly Publishing and Academic Resources Coalition). “OpenCon is dedicated to creating and empowering a global network of young leaders across these issues, and we are eager to partner with others in the community to support and catalyze these efforts.”

OpenCon seeks to convene the most effective student and early career academic professional advocates—regardless of their ability to pay for travel costs. The majority of participants will receive full travel scholarships. Because of this, attendance is by application only, though limited sponsorship opportunities are available to guarantee a fully funded place at the conference. Applications will open on June 1, 2015.

In 2014, more than 1,700 individuals from 125 countries applied to attend the inaugural OpenCon. This year, an expanded emphasis will be placed on building the community around OpenCon and on satellite events. OpenCon satellite events are independently hosted meetings that mix content from the main conference with live presenters to localize the discussion and bring the energy of an in-person OpenCon event to a larger audience. In 2014, OpenCon satellite events reached hundreds of students and early career academic professionals in nine countries across five continents. A call for partners to host satellite events has now opened and is available at

OpenCon 2015 is organized by the Right to Research Coalition, SPARC, and a committee of student and early career researcher organizations from around the world.

Applications for OpenCon 2015 will open on June 1st. For more information about the conference and to sign up for updates, visit You can follow OpenCon on Twitter at @Open_Con or using the hashtag #opencon.

Hydra Project: DPLA joins the Hydra Partners

planet code4lib - Fri, 2015-04-10 09:28

We are delighted to announce that the Digital Public Library of America (DPLA) has become the latest formal Hydra Partner. In their Letter of Intent, Mark Matienzo, DPLA’s Director of Technology, writes of their “upcoming major Hydra project, generously funded by the IMLS, and in partnership with Stanford University and Duraspace, [which] focuses on developing an improved set of tools for content management, publishing, and aggregation for the network of DPLA Hubs. This, and other projects, will allow us to make contributions to other core components of the Hydra stack, including but not limited to Blacklight, ActiveTriples, and support for protocols like IIIF and ResourceSync. We are also interested in continuing to contribute our metadata expertise to the Hydra community to ensure interoperability across our communities.”

A warm welcome into the Partners for all our friends at the DPLA!

HangingTogether: Managing Metadata for Image Collections

planet code4lib - Thu, 2015-04-09 18:03

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Naun Chew of Cornell and Stephen Hearn of the University of Minnesota. Focus group members manage a wide variety of image collections presenting challenges for metadata management. In some cases image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions concerning identification of works, depiction of entities, chronology, geography, provenance, genre, subjects (“of-ness” and “about-ness”) all present themselves; so do opportunities for crowdsourcing and interdisciplinary research.

Many describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed:

Variety of systems and schemas:  Image collections created in different parts of the university such as art or anthropology departments serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets, or unstructured accompanying data. Often the metadata created by other departments requires a lot of editing. The situation is simpler when all digitization is handled through one center and the library does all of the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata and others are using MODS (Metadata Object Description Schema). It was suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).

Duplicate metadata for different objects: There are cases where the metadata is identical for a set of drawings, even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty could add more details. Brown University extended authorizations to photographers to add to the metadata accompanying their photographs without any problems.

Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, a researcher took OCR’ed text retrieved from HathiTrust, ending up with millions of images. However, the researcher didn’t include the metadata of where the images came from. The challenge is to support both a specific purpose and group of people as well as large scale discovery.

Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata?  There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for forensics. For example, Stanford relied on dated photographs to determine when its “Gates of Hell” sculpture had been damaged. Brown University decided to describe a “blob” of various images of the same thing in different formats and then have descriptions of the specific versions hanging off it. The systems used within the OCLC Research Library Partnership do not yet have a good way to structure and represent relationships among images, such as components of a piece.

Integrating collections from different sources: Stanford is considering ingesting images from a local art museum, many of which are images for a single object, so that scholars can study the object over time. They are wondering how to represent them in their discovery layer. MIT is trying to integrate metadata coming from different departments so that they can contribute to different aggregators, such as the DPLA.  All involved get together to make sure that there is a shared understanding. Contributing and having images live in an aggregated way present new challenges.

Yale’s largest image collection is the Kissinger papers, with about 2 million scanned images. For much of the collection, metadata is very scanty. Meetings among the collection owner, metadata specialist and systems staff try to resolve insufficient or questionable data and to come to a shared understanding. They store two copies of each image: TIFF (for preservation and on request) and JPEG (for everything else).

Managing relationships with faculty and curators: It’s important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists and developers as all come from different perspectives.


Challenges of bringing together different images or versions of the same object in a large aggregation were explored by OCLC Research’s Europeana Innovation Pilots.  The pilots came up with a method for hierarchically structuring cultural objects at different similarity levels to find semantic clusters.


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.

DPLA: DPLA and Education: Findings and Recommendations from our Whiting Study

planet code4lib - Thu, 2015-04-09 17:09

During the last nine months, the Digital Public Library of America has been researching educational use with generous support from the Whiting Foundation. We’ve been learning from other online education resource providers in the cultural heritage world and beyond about what they have offered and how teachers and students have responded. We also convened focus groups of instructors in both K-12 and undergraduate higher education to hear about what they use and would like to see from DPLA. Today, we are proud to release our research findings and recommendations for DPLA’s future education work.

Preliminary feedback from educators suggested that DPLA was exciting as a single place to find content but occasionally overwhelming because of the volume of accessible material. In our focus groups, we learned that educators are eager to incorporate primary sources into instructional activities and independent student research projects, but we can better help them by organizing source highlights topically and giving them robust context. We also discovered how important it was to educators and students to be able to create their own primary-source-based projects with tools supported by DPLA. From other education projects, including many supported by our Hubs, we learned that sustainable education projects require teacher involvement, deep standards research, and specific outreach strategies. Based on this research, we recommend that DPLA and its teacher advocates build curated content for student use with teacher guidance, and that DPLA use its position at the center of a diverse mix of cultural heritage institutions to continue to facilitate conversations about educational use. We see this report as the beginning of a process of working with our many partners and educators to make large digital collections like DPLA more useful and used.

Péter Király: Seminar Programme: Göttingen Dialog in Digital Humanities (2015)

planet code4lib - Thu, 2015-04-09 17:02

The dialogs take place on Tuesdays at 17:00 during the Summer semester (from April 21st until July 14th). The venue of the seminars, to be announced, is at the Göttingen Centre for Digital Humanities (GCDH). The centre's address is: Heyne-Haus, Papendiek 16, D-37073 Göttingen.

April 21
Yuri Bizzoni, Angelo Del Grosso, Marianne Reboul (University of Pisa, Italy)
Diachronic trends in Homeric translations

April 28
Stefan Jänicke, Judith Blumenstein, Michaela Rücker, Dirk Zeckzer, Gerik Scheuermann (Universität Leipzig, Germany)
Visualizing the Results of Search Queries on Ancient Text Corpora with Tag Pies

May 5
Jochen Tiepmar (Universität Leipzig, Germany)
Release of the MySQL based implementation of the CTS protocol

May 12
Patrick Jähnichen, Patrick Oesterling, Tom Liebmann, Christoph Kurras, Gerik Scheuermann, Gerhard Heyer (Universität Leipzig, Germany)
Exploratory Search Through Visual Analysis of Topic Models

May 19
Christof Schöch (Universität Würzburg, Germany)
Topic Modeling Dramatic Genre

May 26
Peter Robinson (University of Saskatchewan, Canada)
Some principles for making of collaborative scholarly editions in digital form

June 2
Jürgen Enge, Heinz Werner Kramski, Susanne Holl (HAWK Hildesheim, Germany)
»Arme Nachlassverwalter...« Herausforderungen, Erkenntnisse und Lösungsansätze bei der Aufbereitung komplexer digitaler Datensammlungen

June 9
Daniele Salvoldi (Freie Universität Berlin, Germany)
A Historical Geographic Information System (HGIS) of Nubia based on the William J. Bankes Archive (1815-1822)

June 16
Daniel Burckhardt (HU Berlin, Germany)
Comparing Disciplinary Patterns: Gender and Social Networks in the Humanities through the Lens of Scholarly Communication

June 23
Daniel Schüller, Christian Beecks, Marwan Hassani, Jennifer Hinnell, Bela Brenger, Thomas Seidl, Irene Mittelberg (RWTH Aachen University, Germany, University of Alberta, Canada)
Similarity Measuring in 3D Motion Capture Models of Co-Speech Gesture

June 30
Federico Nanni (University of Bologna, Italy)
Reconstructing a website’s lost past - Methodological issues concerning the history of

July 7
Edward Larkey (University of Maryland, USA)
Comparing Television Formats: Using Digital Tools for Cross-Cultural Analysis

July 14
Francesca Frontini, Amine Boukhaled, Jean-Gabriel Ganascia (Laboratoire d’Informatique de Paris 6, Université Pierre et Marie Curie)
Mining for characterising patterns in literature using correspondence analysis: an experiment on French novels

As announced in the Call For Papers, the dialogs will take the form of a 45-minute presentation in English, followed by 45 minutes of discussion and student participation. Due to logistic and time constraints, the 2015 dialog series will not be video-recorded or live-streamed. A summary of the talks, together with photographs and, where available, slides, will be uploaded to the GCDH/eTRAP. For this reason, presenters are encouraged, but not obligated, to prepare slides to accompany their papers. Please also consider that the €500 award for best paper will be awarded on the basis of both the quality of the paper *and* the delivery of the presentation.

Camera-ready versions of the papers must reach Gabriele Kraft at gkraft(at)gcdh(dot)de by April 30.

The papers will not be uploaded to the GCDH/eTRAP website but, as previously announced, published as a special issue of Digital Humanities Quarterly (DHQ). For this reason, papers must be submitted in an editable format (e.g. .docx or LaTeX), not as PDF files.

A small budget for travel cost reimbursements is available.

Everybody is welcome to join in.

If anyone would like to tweet about the dialogs, the Twitter hashtag of this series is #gddh15.

For any questions, do not hesitate to contact gkraft(at)gcdh(dot)de. For further information and updates, visit or

We look forward to seeing you in Göttingen!

The GDDH Board (in alphabetical order):
Camilla Di Biase-Dyson (Georg August University Göttingen)
Marco Büchler (Göttingen Centre for Digital Humanities)
Jens Dierkes (Göttingen eResearch Alliance)
Emily Franzini (Göttingen Centre for Digital Humanities)
Greta Franzini (Göttingen Centre for Digital Humanities)
Angelo Mario Del Grosso (ILC-CNR, Pisa, Italy)
Berenike Herrmann (Georg August University Göttingen)
Péter Király (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)
Gabriele Kraft (Göttingen Centre for Digital Humanities)
Bärbel Kröger (Göttingen Academy of Sciences and Humanities)
Maria Moritz (Göttingen Centre for Digital Humanities)
Sarah Bowen Savant (Aga Khan University, London, UK)
Oliver Schmitt (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)
Sree Ganesh Thotempudi (Göttingen Centre for Digital Humanities)
Jörg Wettlaufer (Göttingen Centre for Digital Humanities & Göttingen Academy of Sciences and Humanities)
Ulrike Wuttke (Göttingen Academy of Sciences and Humanities)

This event is financially supported by the German Ministry of Education and Research (No. 01UG1509).

LITA: Job Opening: LITA Executive Director

planet code4lib - Thu, 2015-04-09 16:00

The Library and Information Technology Association (LITA), a division of the American Library Association, seeks a dynamic, entrepreneurial, forward-thinking Executive Director.

This is a fulfilling and challenging job that affords national impact on library technologists. As the successful candidate, you will be not only organized, financially savvy, and responsive, but also comfortable with technological change, project management, community management, and organizational change.

Interested in applying? For a full description and requirements, visit

Search Timeline

We will advertise for the position in April, conduct phone interviews in early May, and conduct in-person interviews with the top candidates at ALA Headquarters in Chicago, mid to late May.

Ideally, the candidate would start in June (perhaps just before ALA Annual Conference), and there would be a one-month overlap with current Executive Director Mary Taylor, who retires July 31.

Search Committee

  • Mary Ghikas, ALA Senior Associate Executive Director
  • Dan Hoppe, ALA Director of Human Resources
  • Keri Cascio, ALCTS Executive Director
  • Rachel Vacek, LITA President
  • Thomas Dowling, LITA Vice-President
  • Andromeda Yelton, LITA Director-at-Large
  • Isabel Gonzalez-Smith, LITA Emerging Leader

Library of Congress: The Signal: Mapping Words: Lessons Learned From a Decade of Exploring the Geography of Text

planet code4lib - Thu, 2015-04-09 13:34

The following is a guest post by Kalev Hannes Leetaru, Senior Fellow, George Washington University Center for Cyber & Homeland Security.

It is hard to imagine our world today without maps. Though not the first online mapping platform, the debut of Google Maps a decade ago profoundly reshaped the role of maps in everyday life, popularizing the concept of organizing information in space. When Flickr unveiled image geotagging in 2006, more than 1.2 million photos were geotagged in the first 24 hours. In August 2009, with the launch of geotagged tweets, Twitter announced that organizing posts according to location would usher in a new era of spatial serendipity, allowing users to “switch from reading the tweets of accounts you follow to reading tweets from anyone in your neighborhood or city–whether you follow them or not.”

As more and more of the world’s citizen-generated information becomes natively geotagged, we increasingly think of information as being created in space and referring to space, using geography to map conversation, target information, and even understand global communicative patterns. Yet, despite the immense power of geotagging, the vast majority of the world’s information does not have native geographic metadata, especially the vast historical archives of text held by libraries. It is not that libraries do not contain spatial information; it is that their rich descriptions of location are expressed in words rather than precise mappable latitude/longitude coordinates. A geotagged tweet can be directly placed on a map, while a textual mention of “a park in Champaign, USA” in a digitized nineteenth-century book requires highly specialized “fulltext geocoding” algorithms to identify, disambiguate (determine whether the mention is of Champaign, Illinois or Champaign, Ohio and which park is referred to) and convert textual descriptions of location into mappable geographic coordinates.
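The identify / disambiguate / convert sequence can be illustrated with a toy sketch. This is not the geocoding system described in this post: the two-entry gazetteer, the `geocode` function, and the "look for a region name elsewhere in the text" heuristic are all assumptions made for illustration, and the coordinates are rounded approximations.

```python
import re

# Toy gazetteer (hypothetical): one place name with two candidate places.
# Coordinates are rounded approximations, included only for illustration.
GAZETTEER = {
    "Champaign": [
        {"region": "Illinois", "lat": 40.12, "lon": -88.24},
        {"region": "Ohio", "lat": 40.14, "lon": -83.77},
    ],
}


def contains_word(word, text):
    """Return True if `word` occurs in `text` as a whole word."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


def geocode(text):
    """Identify known names, disambiguate by context, return coordinates."""
    results = []
    for name, candidates in GAZETTEER.items():
        # Step 1: identify a mention of the place name.
        if not contains_word(name, text):
            continue
        # Step 2: disambiguate using region names that co-occur in the text.
        matches = [c for c in candidates if contains_word(c["region"], text)]
        # Step 3: convert to coordinates only when exactly one candidate fits.
        if len(matches) == 1:
            chosen = matches[0]
            results.append((name, chosen["region"], chosen["lat"], chosen["lon"]))
        else:
            # Zero or several candidates fit: the mention stays unresolved.
            results.append((name, None, None, None))
    return results
```

With this sketch, a sentence that mentions both "Champaign" and "Illinois" resolves to the Illinois entry, while “a park in Champaign, USA” is identified but left unresolved, which is exactly the case that needs richer context than a toy heuristic can supply.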

Building robust algorithms capable of recognizing mentions of an obscure hilltop or a small rural village anywhere on Earth requires a mixture of state-of-the-art software algorithms and artistic handling of the enormous complexities and nuances of how humans express space in writing. This is made even more difficult by assumptions of shared locality made by content like news media, the mixture of textual and visual locative cues in television, and the inherent transcription error of sources like OCR and closed captioning.

Recognizing location across languages is especially problematic. The majority of textual location mentions on Twitter are in English regardless of the language of the tweet itself. On the other hand, mapping the geography of the world’s news media across 65 languages requires multilingual geocoding that takes into account the enormous complexity of the world’s languages. For example, the extensive noun declension of Estonian means that identifying mentions of “New York” requires recognizing “New York”, “New Yorki”, “New Yorgi”, “New Yorgisse”, “New Yorgis”, “New Yorgist”, “New Yorgile”, “New Yorgil”, “New Yorgilt”, “New Yorgiks”, “New Yorgini”, “New Yorgina”, “New Yorgita”, and “New Yorgiga”. Multiply this by the more than 10 million recognized locations on Earth and by 65 languages, and one can imagine the difficulties of recognizing textual geography.
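As a minimal sketch of what recognizing these inflected forms involves, the snippet below maps every surface form listed above back to one canonical name. The explicit form list and the names `build_matcher` and `find_mentions` are assumptions for illustration only; a production system would more plausibly rely on morphological analysis than on enumerating forms by hand.

```python
import re

# Surface forms of "New York" in Estonian, copied from the list above.
NEW_YORK_FORMS = [
    "New York", "New Yorki", "New Yorgi", "New Yorgisse", "New Yorgis",
    "New Yorgist", "New Yorgile", "New Yorgil", "New Yorgilt", "New Yorgiks",
    "New Yorgini", "New Yorgina", "New Yorgita", "New Yorgiga",
]


def build_matcher(forms):
    """Compile one regex that matches any listed form as a whole word."""
    # Try longer forms first; the trailing \b also rejects a form that is
    # only a prefix of a longer word (e.g. "New York" inside "New Yorker").
    ordered = sorted(forms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(form) for form in ordered) + r")\b"
    return re.compile(pattern)


NEW_YORK_RE = build_matcher(NEW_YORK_FORMS)


def find_mentions(text, canonical="New York"):
    """Return (matched surface form, canonical name) pairs found in text."""
    return [(m.group(0), canonical) for m in NEW_YORK_RE.finditer(text)]
```

Calling `find_mentions` on text containing “New Yorgisse” and “New Yorgist” returns both forms paired with the canonical “New York”. Even this tiny example needs fourteen entries for a single place in a single language, which gives a sense of the scale involved.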

For the past decade much of my work has centered on this intersection of location and information across languages and modalities, exploring the geography of massive textual archives through the twin lenses of the locations they describe and the impact of location on the production and consumption of information. A particular emphasis of my work has lain in expanding the study of textual geography to new modalities and transitioning the field from small human studies to at-scale computational explorations.

Over the past five years my studies have included the first large-scale explorations of the textual geography of news media, social media, Wikipedia, television, academic literature, and the open web, as well as the first large-scale comparisons of geotagging versus textual description of location in citizen media and the largest work on multilingual geocoding. The remainder of this blog post will share many of the lessons I have learned from these projects and the implications and promise they hold for the future of making the geography of library holdings more broadly available in spatial form.

Figure 1 – Locations of news outlets linked by the Drudge Report 2002-2008.

In the early 2000s, while an undergraduate student at the National Center for Supercomputing Applications, I launched an early open cloud geocoding and GIS platform that provided a range of geospatial services through a simple web interface and cloud API. The intense interest in the platform and the incredible variety of applications that users found for the geocoding API foreshadowed the amazing creativity of the open data community in mashing up geographic APIs and data. Over the following several years I undertook numerous small-scale studies of textual geography to explore how such information could be extracted and utilized to better understand various kinds of information behavior.

Some of my early papers include a 2005 study of the geographic focus and ownership of news and websites covering climate change and carbon sequestration (PDF) that demonstrated the importance of the dual role of the geography of content and consumer. In 2006 I co-launched a service that enabled spatial search of US Government funding opportunities (PDF), including alerts of new opportunities relating to specific locations. This reinforced the importance of location in information relevance: a contract to install fire suppression sprinklers in Omaha, Nebraska is likely of little interest to a small business in Miami, Florida, yet traditional keyword search does not contemplate the concept of spatial relevance.

Similarly, in 2009 I explored the impact of a news outlet’s physical location on the Drudge Report’s sourcing behavior and in 2010 examined the impact of a university’s physical location on its national news stature. These twin studies, examining the impact of physical location on news outlets and on newsmakers, emphasized the highly complex role that geography plays in mediating information access, availability, and relevance.

Figure 2 – Network of locations that most frequently co-occur with each other in coverage of Osama Bin Laden – center point is 200km from where he was ultimately found.

In Fall 2011 I published the first of what has become a half-decade series of studies expanding the application of textual geography to ever-larger and more diverse collections of material. Published in September 2011, Culturomics 2.0 was the first large-scale study of the geography of the world’s news media, identifying all mentions of location across more than 100 million news articles stretching across half a century.

A key finding was the centrality of geography to journalism: on average a location is mentioned every 200-300 words in a typical news article and this has held relatively constant for over 60 years. Another finding was that mapping the locations most closely associated with a public figure (in this case Osama Bin Laden) offers a strong estimate of that person’s actual location, while the network structure of which locations more frequently co-occur with each other yields powerful insights into perceptions of cultural and societal boundaries.

Figure 3 – Map of countries which are most commonly mentioned together in global news coverage – countries with the same color are more frequently mentioned with other countries of that color than with countries of a different color.

The following Spring I collaborated with supercomputer vendor SGI to conduct the first holistic exploration of the textual geography of Wikipedia. Wikipedia allows contributors to include precise latitude/longitude coordinates in articles, but because such coordinates must be manually entered in specialized code, just 4% of English-language articles had a single entry as of 2012, totaling just 1.1 million coordinates, primarily centered in the US and Western Europe. In contrast, 59% of English-language articles had at least one textual mention of a recognized location, totaling more than 80.7 million mentions of 2.8 million distinct places on Earth.

In essence, the majority of contributors to Wikipedia appear more comfortable writing the word “London” in an article than looking up its centroid latitude/longitude and entering it in specialized code. This has significant implications for how libraries leverage volunteer citizen geocoding efforts in their collections.

Figure 4 – Interactive Google Earth interface to Wikipedia’s coverage of Libya.

To explore how such information could be used to provide spatial search for large textual collections, a prototype Google Earth visualization was built to search Wikipedia’s coverage of Libya. A user could select a specific time period and instantly access a map of every location in Libya mentioned across all of Wikipedia with respect to that time period.

Finally, a YouTube video was created that visualizes world history 1800-2012 through the eyes of Wikipedia by combining the 80 million textual location mentions in the English Wikipedia with the 40 million date references to show which locations were mentioned together in an article with respect to a given year. Links were color-coded red for connections with a negative tone (usually indicating physical conflict like war) or green for connections with a positive tone.

Figure 5 – Video of world history 1800-2012 through the eyes of Wikipedia – links indicate locations mentioned together in an article with respect to the given year (red=negative tone, green=positive tone).

That Fall I collaborated with SGI once again, along with social media data vendor GNIP and the University of Illinois Geospatial Information Laboratory to produce the first detailed exploration of the geography of social media, which helped popularize the geocoding and mapping of Twitter. This project produced the first live map of a natural disaster, as well as the first live map of a presidential election.

At the time, few concrete details were available regarding Twitter’s geographic footprint and the majority of social media maps focused on the small percentage of natively geotagged tweets. Twitter offered a unique opportunity to compare textual and sensor-based geographies in that 2% of tweets are geotagged with precise GPS or cellular triangulation coordinates. Coupled with the very high correlation of electricity and geotagged tweets, this offers a unique ground truth of the actual confirmed location of Twitter users to compare against different approaches to geocoding textual location cues in estimating the location of the other 98% of tweets that often have textual information about location.

A key finding was that two-thirds of those 2% of tweets that are geotagged were sent by just 1% of all users, meaning that geotagged information on Twitter is extremely skewed. Another finding was that across the world location is primarily expressed in English regardless of the language that a user tweets in and that 34% of tweets have recoverable high-resolution textual locations. From a communicative standpoint, it turns out that half of tweets are about local events and half of tweets are directed at physically nearby users versus tweeting about global events or users elsewhere in the world, suggesting that geographic proximity plays only a minor role in communication patterns on broadcast media like Twitter.

Figure 6 – Animated heatmap of tweets relating to Hurricane Sandy.

A common pattern that emerges across both Wikipedia and Twitter is that even when native geotagging is available, the vast majority of location metadata resides in textual descriptions rather than precise GIS-friendly numeric coordinates. This is the case even when geotagging is made transparent and automatic through GPS tagging on mobile devices.

In Spring 2013 I launched the GDELT Project, which extends my earlier work on the geography of the news media by offering a live metadata firehose geocoding global news media on a daily basis. That Fall I collaborated with Roger Macdonald and the Internet Archive’s Television News Archive to create the first large-scale map of the geography of television news. More than 400,000 hours of closed captioning of American television news totaling over 2.7 billion words was geocoded to produce an animated daily map of the geographic focus of television news from 2009-2013.

Figure 7 – Animated map of locations mentioned in American television news 2009.

Closed captioning text proved to be extremely difficult to geocode. Captioning streams are in entirely uppercase letters, riddled with errors like “in two Paris of shoes” and long sequences of gibberish characters, and in some cases have a total absence of punctuation or other boundaries.

This required extensive adaptation of the geocoding algorithms to tolerate an enormous diversity of typographical errors more pathological in nature than those found in OCR’d content – approaches that were later used in creating the first live emotion-controlled television show for NBCUniversal’s Syfy channel. Newscasts also frequently rely on visual on-screen cues such as maps or text overlays for location references, and by their nature incorporate a rapid-fire sequence of highly diverse locations mentioned just sentences apart from each other, making the disambiguation process extremely complex.

Figure 8 – Heatmap of the locations most commonly mentioned in US Government publications, academic literature, and the global news media 1979-2014.

In Fall 2014 I collaborated with the US Army to create the first large-scale map of the geography of academic literature and the open web, geocoding more than 21 billion words of academic literature spanning the entire contents of JSTOR, DTIC, CORE, CiteSeerX, and the Internet Archive’s 1.6 billion PDFs relating to Africa and the Middle East, as well as a second project creating the first large-scale map of human rights reports. A key focus of this project was the ability to infuse geographic search into academic literature, enabling searches like “find the five most-cited experts who publish on water conflicts with the Nuers in this area of South Sudan” and thematic maps such as a heatmap of the locations most closely associated with food insecurity.

Figure 9 – Map of global protest and conflict activity drawn from the GDELT Project displayed on the NOAA Science on a Sphere.

As of spring 2015 the GDELT Project maps the geography of an ever-growing cross-section of the global news media in realtime across 65 languages. Every 15 minutes it machine translates all global news coverage it monitors in 65 languages, from Afrikaans and Albanian to Urdu and Vietnamese, and applies the world’s largest multilingual geocoding system to identify all mentions of location anywhere in the world, from a capital city to a remote hilltop. Over the past several years, GDELT’s mass realtime geocoding of the world’s news media has popularized the use of large-scale automated geocoding, with disciplines from political science to journalism now experimenting with the technology. GDELT’s geocoding capabilities now lie at the heart of numerous initiatives, from cataloging disaster coverage for the United Nations to mapping global conflict with the US Institute of Peace to modeling the patterns of world history.

Most recently, a forthcoming collaboration with cloud mapping platform CartoDB will enable ordinary citizens and journalists to create live interactive maps of the ideas, topics, and narratives pulsing through the global news media using GDELT. The example map below shows the geographic focus of Spanish (green), French (red), Arabic (yellow) and Chinese (blue) news media for a one hour period from 8-9AM EST on April 1, 2015, placing a colored dot at every location mentioned in the news media of each language. Ordinarily, mapping the geography of language would be an enormous technical endeavor, but by combining GDELT’s mass multilingual geocoding with CartoDB’s interactive mapping, even a non-technical user can create a map in a matter of minutes. This is a powerful example of what will become possible as libraries increasingly expose the spatial dimension of their collections in data formats that allow them to be integrated into popular mapping platforms. Imagine an amateur historian combining digitized georeferenced historical maps and geocoded nineteenth-century newspaper articles with modern census data to explore how a region has changed over time – these kinds of mashups would be commonplace if the vast archives of libraries were made available in spatial form.

Figure 10 – Geographic focus of world’s news media by language 8-9AM EST on April 1, 2015 (Green = locations mentioned in Spanish media, Red = French media, Yellow = Arabic media, Blue = Chinese media).

In short, as we begin to peer into the textual holdings of our world’s libraries using massive-scale data mining algorithms like fulltext geocoding, we are for the first time able to look across our collective informational heritage to see macro-level global patterns never before visible. Geography offers a fundamental new lens through which to understand and observe those new insights, and as libraries increasingly geocode their holdings and make that material available in standard geographic open data formats, they will enable an entirely new era where libraries become conveners of information and innovation that empower a new era of access and understanding of our world.

Open Knowledge Foundation: Document Freedom Day in Kathmandu, Nepal

planet code4lib - Thu, 2015-04-09 12:23

On 2015’s Document Freedom Day, Open Knowledge Nepal organized a seminar on Open Standards at CLASS Nepal at Maitighar, Kathmandu.

We intended to pitch openness to a new audience in Nepal and help them learn documentation skills. As we could not hope to teach documentation and spreadsheets in less than a day, we used the session to teach small bits of information and skills that participants could take home, and to gather information about their current knowledge and needs so that we could plan future events and trainings.

The target audience was office bearers and representatives of labor unions in many private and government organizations in Nepal. We also invited some students of Computer Science and Information Technology (CSIT). A few of the students are core members of the Open Knowledge Nepal team and also represented us at Open Data Day 2015 in Kathmandu. We invited the students so that they could get to know the audience they will be working with in the days to come.

It was a lazy March afternoon in Kathmandu, and participants began trickling in from around 2 pm. The organizers and students had already begun chatting about openness, tech, football and other things while waiting for enough participants to begin the event formally. Participants kept arriving in ones and twos until the hall was at its limit (35+), and we started formally just after 3:00 PM (NST).

Mr. Durga of CLASS Nepal opened the event by welcoming all participants and introducing CLASS Nepal. He then invited Mr. Lekhnath Pokhrel, the representative of UNI-Global Union at the event. He asked all participants to take full advantage of the seminar and announced that they would organize more useful events in the future. Nikesh Balami, our active member and Open Government lead, followed with his presentation on “Open Knowledge, Open Data, Open Standards, and Open Formats.” He started by gathering information about participants’ organizational backgrounds. This lightened the mood as everybody opened up to each other. Nikesh then introduced Open Knowledge Nepal and our activities to the hall (see the slides).

Kshitiz Khanal, Open Access lead at Open Knowledge Nepal, went next. This session was intended to be an open discussion and skill-sharing session on documentation and spreadsheet basics. We started by asking everybody to share their experience, their current skills, and the skills they would like to learn at the event.

We were in for a surprise. While we had prepared to teach pivot tables, our audience was interested in learning more basic skills. Most of our audience were familiar with documentation packages like Microsoft Word, some were using spreadsheets at work, and most of them had to use slides to present their work. We paired our students with members of our target audience so that one could teach the other. Based on the requests, we decided to teach basic spreadsheet actions such as sorting and filtering data and performing basic mathematical operations.

We also explained basic presentation principles: use pictures in place of words whenever possible, use as few words as possible, make the words you do use big, and rehearse before presenting. These may sound obvious, but they are not yet common practice because our audience was never taught them as part of any curriculum. This was well received. We also had an unexpected request: how to attach a sound recording to an email. We decided to teach how to use Google Drive, and demonstrated how it can be used to store documents and how the resulting links can be used to send any type of file by email.

There were a few female participants as well. This was a good turnout compared to most of our own and other tech / open events in Kathmandu, which have had no female participation. One of our female participants said that while she wants to learn more skills, she doesn’t have time to learn at home while taking care of her children, and at the office she mostly has her hands full with work.

Most of the work in many offices is documentation, and in this day and age strong documentation skills are almost mandatory. While having freedom in the sense of document freedom entails having access to proper tools, it also requires the skills to use those tools.

We learned about the circumstances and interests of people like our audience, and about the skill level we need to start from when preparing modules for similar events.

See the photo stream here and find a more detailed account here on the Open Knowledge Nepal blog.

Open Library Data Additions: Amazon Crawl: part fi

planet code4lib - Wed, 2015-04-08 23:04

Part fi of Amazon crawl..

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

Jonathan Rochkind: simple config for faster travis ruby builds

planet code4lib - Wed, 2015-04-08 20:08

There are a few simple things you can configure in your .travis.yml to make your travis builds faster for ruby builds. They are oddly under-documented by travis in my opinion, so I’m noting them here.


Odds are your ruby/rails app uses nokogiri. (all Rails 4.2 apps do, as nokogiri has become a rails dependency in 4.2)  Some time ago (in the past year I think?) nokogiri releases switched to building libxml and libxslt from source when you install the gem.

This takes a while. On various machines I’ve seen 30 seconds, two minutes, 5 minutes.  I’m not sure how long it usually takes on travis, as travis logs don’t seem to give timing for this sort of thing, but I know I’ve been looking at the travis live web console and seen it paused on “installing nokogiri” for a while.

But you can tell nokogiri to use already-installed libxml/libxslt system libraries if you know the system already has compatible versions installed — which travis seems to — with the ENV variable `NOKOGIRI_USE_SYSTEM_LIBRARIES=true`.  Although I can’t seem to find that documented anywhere by nokogiri, it’s the word on the street, and seems to be so.

You can set such in your .travis.yml thusly:

env:
  global:
    - NOKOGIRI_USE_SYSTEM_LIBRARIES=true

Use the new Travis architecture

Travis introduced a new architecture on their end using Docker, which is mostly invisible to you as a travis user.  But the architecture is, at the moment, opt-in, at least for existing projects. 

Travis plans to eventually start moving over even existing projects to the new architecture by default. You will still be able to opt-out, which you’d do mainly if your travis VM setup needed “sudo”, which you don’t have access to in the new architecture.

But in the meantime, what we want is to opt-in to the new architecture, even on an existing project. You can do that simply by adding:

sudo: false

To your .travis.yml.

Why do we care?  Well, travis suggests that the new architecture “promises short to nonexistent queue wait times, as well as offering better performance for most use cases.” But even more importantly for us, it lets you do bundler caching too…

Bundler caching

If you’re like me, a significant portion of your travis build time is just installing all those gems. On your personal dev box, you have gems you already installed, and when they’re listed in your Gemfile.lock they just get used; bundler/rubygems doesn’t need to go reinstalling them every time.

But the travis environment normally starts with a clean slate on every build, so every build it has to go reinstalling all your gems from your Gemfile.lock.

Aha, but travis has introduced a caching feature that can cache installed gems.  At first this feature was only available for paid private repos, but now it’s available for free open source repos if you are using the new travis architecture (above).

For most cases, simply add this to your .travis.yml:

cache: bundler

There can be complexities in your environment which require more complex setup to get bundler caching to work, see the travis docs.

Happy travis’ing

Travis offering free CI builds to open source software, and on such a well-designed platform, has seriously helped open source software quality and reliability improve by leaps and bounds. I think it’s one of the things that has allowed the ruby community to deal with fairly quickly changing ruby versions: you can run CI on every commit, on multiple ruby versions even.

I love travis.

It’s odd to me that they don’t highlight some of these settings in their docs better. In general, I think travis docs have been having trouble keeping up with travis changes — travis docs are quite good as far as being written well, but seem to sometimes be missing key information, or including not quite complete or right information for current travis behavior. I can’t even imagine how much AWS CPU time all those libxml/libxslt compilations on every single travis build are costing them!  I guess they’re working on turning on bundler caching by default, which will significantly reduce the number of times nokogiri gets built, once they do.


OCLC Dev Network: Web Services Maintenance April 17

planet code4lib - Wed, 2015-04-08 18:30

All Web services that require user level authentication will be unavailable during the installation window, which is between 2:00 – 4:00 AM local time, Friday April 17th. 

HangingTogether: Bracket competition: And the winner is …

planet code4lib - Wed, 2015-04-08 18:21

OCLC Research Collective Collections Tournament


Thanks to everyone who entered the 2015 OCLC Research Collective Collections Tournament Bracket Competition! A quick re-cap of the rules: all entrants picked a conference. If no one chose the winning conference, then a random drawing would be held among all entrants to determine the winner of the prize. Well, that’s where we’re at! No one picked Atlantic 10 to prevail, so everyone gets another chance to win!

A random drawing was held this morning in the Tournament offices (well, here in OCLC Research). The winner of the 2015 OCLC Research Collective Collections Tournament Bracket Competition is …

Carol Diedrichs!

Carol wins a $100 Visa Gift Card, along with the right to call herself Bracket Competition Champion! Congratulations! And thanks to all of our Bracket Competition participants for playing.

We hope you enjoyed the Collective Collections Tournament! Keep up to date with OCLC Research as we continue to use the concept of collective collections to explore a wide range of library topics.


More information:

Introducing the 2015 OCLC Research Collective Collections Tournament! Madness!

OCLC Research Collective Collections Tournament: Round of 32 Bracket Revealed!

Round of 32: Blow-outs, buzzer-beaters, and upsets!

Round of 16: The plot thickens … and so do the books

Round of 8: Peaches and Pumpkins

The Semi-Finals

Champion Revealed! Real-ly!

About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.


LITA: LITA Lightning Rounds at 2015 ALA Annual

planet code4lib - Wed, 2015-04-08 18:10

Will you be at the American Library Association Conference in San Francisco this June? Do you have a great new technology idea that you’d like to share informally with colleagues? How about a story related to a clever tech project that you just pulled off at your institution, successfully, or less-than-successfully?

The LITA Program Planning Committee (PPC) is now accepting proposals for a round of Lightning Talks to be given at ALA.

To submit your idea please fill out this form:

The lightning rounds will be Saturday June 27, 10:30 – 11:30

All presenters will be given 5 minutes to speak.

Proposals are due Monday, May 4 at midnight. Questions? Please contact PPC chair, Debra Shapiro,


Open Knowledge Foundation: Community building through the DM2E project

planet code4lib - Wed, 2015-04-08 16:58

During the past three years, Open Knowledge has been leading the community building work in the Digitised Manuscripts to Europeana (DM2E) project, a European research project in the area of Digital Humanities led by Humboldt University. Open Knowledge activities included the organisation of a series of events such as Open Data in Cultural Heritage workshops, running two rounds of the Open Humanities Awards and the establishment of OpenGLAM as an active volunteer-led community pushing for increased openness in cultural heritage.

DM2E and the Linked Open Web

As one of its core aims, the DM2E project worked on enabling libraries and archives to easily upload their digitised material into Europeana – the online portal that provides access to millions of items from a range of Europe’s leading galleries, libraries, archives and museums. In total, over 20 million manuscript pages from libraries, archives and research institutions were added during the three years of the project. In line with the Europeana Data Exchange Agreement, all contributing institutions agreed to make their metadata openly available under the Creative Commons Public Domain Dedication license (CC-0), which allows for easier reuse.

Since different providers make their data available in different formats, the DM2E consortium developed a toolset that converts metadata from a diverse range of formats into the DM2E model, an application profile of the Europeana Data Model (EDM). The software also allows these cultural heritage data sets to be contextualised and linked, which makes the material suitable for use within the Linked Open Web. An example of this is the Pundit tool, which Net7 developed to enable researchers to add annotations to a digital text and link them to related texts or other resources on the net.

Open Knowledge achievements

Open Knowledge was responsible for the community building and dissemination work within DM2E, which, apart from promoting and documenting the project results for a wide audience, focused on promoting and raising awareness around the importance of open cultural data. The presentation below sums up the achievements made during the project period, including the establishment of OpenGLAM as a community, the organisation of the event series and the Open Humanities Awards, next to the extensive project documentation and dissemination through various channels.

DM2E community building from Digitised Manuscripts to Europeana

OpenGLAM

In order to realise the value of the tools developed in DM2E, as well as to truly integrate the digitised manuscripts into the Linked Data Web, there need to be enough other open resources to connect to and an active community of cultural heritage professionals and developers willing to extend and re-use the work undertaken as part of DM2E. That is why Open Knowledge set up the OpenGLAM community: a global network of people and organisations who are working to open up cultural content and data. OpenGLAM focuses on promoting and furthering free and open access to digital cultural heritage by maintaining an overview of Open Collections, providing documentation on the process and benefits of opening up cultural data, publishing regular news and blog items and organising diverse events.

Since the start in 2012, OpenGLAM has grown into a large, global, active volunteer-led community (and one of the most prominent Open Knowledge working groups to date), supported by a network of organisations such as Europeana, the Digital Public Library of America, Creative Commons and Wikimedia. Apart from the wider community taking part in the OpenGLAM discussion list, there is a focused Working Group of 17 open cultural data activists from all over the world, a high-level Advisory Board providing strategic guidance and four local groups that coordinate OpenGLAM-related activities in their specific countries. Following the end of the DM2E project, the OpenGLAM community will continue to push for openness in digital cultural heritage.

Open Humanities Awards

As part of the community building efforts, Open Knowledge set up a dedicated series of awards focused on supporting innovative projects that use open data, open content or open source tools to further teaching and research in the humanities: the Open Humanities Awards. During the two competition rounds, held in 2013 and 2014, over 70 applications were received and 5 winning projects were carried out, ranging from an open source Web application that allows people to annotate digitised historical maps (Maphub) to an improved search application for Wittgenstein’s digitised manuscripts (Finderapp WITTfind). Winners published their results on a regular basis through the DM2E blog and presented their findings at conferences in the field, showing that the awards were an effective way to stimulate innovative digital humanities research using open data and content. Details on all winning projects, along with their results, are available in the final report.

DM2E event series

Over the course of the project, Open Knowledge organised a total of 18 workshops. These focused on promoting best practices in the legal and technical aspects of opening up metadata and cultural heritage content, on demonstrating and providing training with the tools and platforms developed in the project, and on hackdays and coding sprints. Highlights included the Web as Literature conference at the British Library in 2013, the Open Humanities Hack series and the Open Data in Cultural Heritage workshops, as a result of which several local OpenGLAM groups were started. A full list of events and their outcomes is available in the final report.

Open Data in Cultural Heritage Workshop: Starting the OpenGLAM group for Germany (15 July 2014, Berlin)

It has been a great experience being part of the DM2E consortium. Following the end of the project, the OpenGLAM community will be sustained and built upon, so that we can realise a world in which our shared cultural heritage is open to all regardless of their background, and where people are no longer passive consumers of cultural content created by an elite, but contribute, participate, create and share.


