planet code4lib

District Dispatch: IMLS awards libraries National Medals

Thu, 2015-05-21 16:26

(Left to right) Cecil County Public Library Director Denise Davis, Cecil County Community Member Thomas Cousar and Michelle Obama with National Medal.

Earlier this week, First Lady Michelle Obama joined the Institute of Museum and Library Services (IMLS) Acting Director Maura Marx to present the 2015 National Medal for Museum and Library Service to ten exemplary libraries and museums for their service to their communities. Now in its 21st year, the National Medal is the nation’s highest honor conferred on libraries and museums, and celebrates institutions that make a difference for individuals, families, and communities.

National Medal recipients include:

(Left to right) Erica Jesonis, Chief Librarian for Information Technology; Morgan Miller, Assistant Director for Public Service; U.S. Rep. Andy Harris (R-MD); Denise Davis, Cecil County Public Library Director, Frazier Walker, Community Relations Specialist.

During the event, First Lady Michelle Obama said to the recipients: “The services that you all provide are not luxuries. Just the opposite. Every day your institutions are keeping so many folks in this country from falling through the cracks. In many communities our libraries and museums are the places that help young people dream bigger and reach higher for their futures, the places that help new immigrants learn English and apply for citizenship…the places where folks can access a computer and send out a job application so they can get back to work and get back to the important process of supporting their families.”

Denise Davis, director of the Cecil County Public Library in Elkton, Md., spoke about receiving the prestigious recognition:

Public libraries have a powerful role in creating opportunities by keeping the doors to knowledge open, allowing creativity to flourish, and never letting barriers become insurmountable.

The next deadline for nominating a library or museum is October 1, 2015. Learn more about the National Medal at

The post IMLS awards libraries National Medals appeared first on District Dispatch.

Library of Congress: The Signal: The K-12 Web Archiving Program: Preserving the Web from a Youthful Point of View

Thu, 2015-05-21 15:00

This article is being co-published on the Teaching With the Library of Congress blog and was written by Butch Lazorchak and Cheryl Lederle.

If you believe the Web (and who doesn’t believe everything they read on the Web?), it boastfully celebrated its 25th birthday last year. Twenty-five years is long enough for the first “children of the Web” to be fully-grown adults, just now coming of age to recognize that the Web that grew up around them has irrevocably changed.

In this particular instance, change is good. It’s only by becoming aware of what we’re losing (or have already lost) that we’ll be spurred to action to preserve it. We’ve been aware of the value of the historic web for a number of years here at the Library of Congress, and we’ve worked hard to understand how to capture the Web through the Library’s Web Archiving program and the work we’ve done with partners at the Memento project and through the International Internet Preservation Consortium.

K-12 Web Archiving Program.

But let’s go back to those “children of the Web.” Nostalgia is a powerful driver for preservation, but most preservation efforts are driven by full-grown adults. If they’re able to bring a child’s perspective to their work it’s only through the prism of their own memory, and in any event, the nostalgic items they may wish to capture may not be around anymore by the time they get to them. What’s needed is not just a nostalgic memory of the web, but efforts to curate and capture the web with a perspective that includes the interests of the young. And who better to represent the interests of the young than children and teenagers themselves! Luckily the Library of Congress has such a program: the K-12 web archiving program.

The K-12 Web Archiving program has been operating since 2008, engaging hundreds of students from dozens of schools, large and small, across the U.S. in understanding what the Web means to them, and why it’s important to capture it. In partnership with the Internet Archive, the program enables schools to set up their own web capture tools and choose sets of web resources to collect: resources that represent the full range of youthful experience, including popular culture, commerce, news, entertainment and more.

Cheryl Lederle, an Educational Resource Specialist at the Library of Congress, notes that the program builds student awareness of the internet as a primary source as well as how quickly it can change. The program might best be understood through the reflections of participating teachers:

  • “The students gained an understanding of how history is understood through the primary sources that are preserved and therefore the importance of the selection process for what we are digitally preserving. But, I think the biggest gain was their personal investment in preserving their own history for future generations. The students were excited and fully engaged by being a part of the K-12 archiving program and that their choices were being preserved for their own children someday to view.” – MaryJane Cochrane, Paul VI Catholic High School
  • “The project introduced my students to historical thinking; awareness of digital data as a primary source and documentation of current events and popular culture; and helped foster an appreciation and awareness of libraries and historical archives.” – Patricia Carlton, Mount Dora High School

And participating students:

  • “Before this project, I was under the impression that whatever was posted on the Internet was permanent. But now, I realize that information posted on the Internet is always changing and evolving.”
  • “I find it very interesting that you can look back on old websites and see how technology has progressed. I want to look back on the sites we posted in the future to see how things have changed.”
  • “I was surprised by the fact that people from the next generation will also share the information that I have collected.”
  • “They’re really going to listen to us and let us choose sites to save? We’re eight!”

Collections from 2008-2014 are available for study on the K-12 Web Archiving site, and collections from the current school year will be added soon. Students examining these collections might:

  • Compare one school’s collections from different years.
  • Compare collections preserved by students of different grade levels in the same year.
  • Compare collections by students of the same grade level, but from different locations.
  • Create a list of Web sites they think should be preserved and organize them into two or three collections.

What did your students discover about the value of preserving Web sites?

David Rosenthal: Unrecoverable read errors

Thu, 2015-05-21 15:00
Trevor Pott has a post at The Register entitled Flash banishes the spectre of the unrecoverable data error, in which he points out that while disk manufacturers' quoted Bit Error Rates (BER) for hard disks are typically 10^-14 or 10^-15, SSD BERs range from 10^-16 for consumer drives to 10^-18 for hardened enterprise drives. Below the fold, a look at his analysis of the impact of this difference of up to 4 orders of magnitude.

When a disk in a RAID-5 array fails and is replaced, all the data on other drives in the array must be read to reconstruct the data from the failed drive. If an unrecoverable read error (URE) is encountered in this process, one or more data blocks will be lost. RAID-6 and up can survive increasing numbers of UREs.
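
To get a feel for the numbers, here is a minimal sketch of the rebuild risk. It assumes the quoted BER can be treated as an independent per-bit probability of an unrecoverable error (a simplification, since real errors tend to cluster) and uses decimal terabytes. The function name `p_rebuild_without_ure` and the 12 TB example size are illustrative choices, not taken from any vendor documentation.

```python
import math

TB = 1e12  # decimal terabyte, as used on drive data sheets


def p_rebuild_without_ure(bytes_to_read: float, ber: float) -> float:
    """Probability of reading `bytes_to_read` bytes without a single URE.

    Assumes every bit fails independently with probability `ber`,
    so P(no error) = (1 - ber) ** bits.
    """
    bits = bytes_to_read * 8
    # log1p keeps precision for tiny ber, where 1 - ber loses digits in floats.
    return math.exp(bits * math.log1p(-ber))


if __name__ == "__main__":
    # Compare the hard disk and SSD error rates mentioned in the post.
    for ber in (1e-14, 1e-15, 1e-16, 1e-18):
        p = p_rebuild_without_ure(12 * TB, ber)
        print(f"BER {ber:.0e}: P(read 12 TB cleanly) = {p:.4f}")
```

Under these assumptions, a 12 TB read at a BER of 10^-14 completes cleanly a little under 40% of the time, while at 10^-16 it completes cleanly about 99% of the time.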

It has been obvious for some time that, as hard disks got bigger without a corresponding decrease in BER, RAID technology had a problem: the probability of encountering a URE during reconstruction was going up, and thus so was the probability of losing data when a drive failed. As Trevor writes:
Putting this into rather brutal context, consider the data sheet for the 8TB Archive Drive from Seagate. This has an error rate of 10^14 bits. That is one URE every 12.5TB. That means Seagate will not guarantee that you can fully read the entire drive twice before encountering a URE.
Let's say that I have a RAID 5 of four 5TB drives and one dies. There is 12TB worth of data to be read from the remaining three drives before the array can be rebuilt. Taking all of the URE math from the above links and dramatically simplifying it, my chances of reading all 12TB before hitting a URE are not very good.
With 6TB drives I am beyond the math. In theory, I shouldn't be able to rebuild a failed RAID 5 array using 6TB drives that have a 10^14 BER. I will encounter a URE before the array is rebuilt and then I’d better hope the backups work.
So RAID 5 for consumer hard drives is dead.

Well, yes, but RAID-5, and RAID in general, is just one rather simple form of erasure coding. There are better forms of erasure coding for long-term data reliability. I disagree with Trevor when he writes:
There are plenty of ways to ensure that we can reliably store data, even as we move beyond 8TB drives. The best way, however, may be to put stuff you really care about on flash arrays. Especially if you have an attachment to the continued use of RAID 5.

Trevor is ignoring the economics. Hard drives are a lot cheaper for bulk storage than flash. As Chris Mellor pointed out in a post at The Register about a month ago, each byte of flash contains at least 50 times as much capital investment as a byte of hard drive. So it will be a lot more expensive, even if not 50 times as expensive. For the sake of argument, let's say it is 5 times as expensive. To a first approximation, cost increases linearly with the replication factor, but reliability increases exponentially. So, instead of a replication factor of 1.2 in a RAID-5 flash array, for the same money I can have a replication factor of 6 in a hard disk array. Data in the hard drive array would be much, much safer for the same money. Or suppose I used a replication factor of 2.5; the data would be a great deal safer for about 40% of the cost.
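
The cost side of this argument is simple enough to sketch. The snippet below assumes only what the paragraph states: cost grows linearly with replication factor, and flash costs some multiple of disk per byte (5 here, for the sake of argument). The function names are hypothetical, and the sketch does not model the reliability gain at all.

```python
def disk_replication_for_same_budget(flash_replication: float,
                                     cost_ratio: float) -> float:
    """Disk replication factor that costs the same as a flash array.

    Assumes cost is linear in replication factor and that flash costs
    `cost_ratio` times as much per byte as disk.
    """
    return flash_replication * cost_ratio


def relative_cost(disk_replication: float,
                  flash_replication: float,
                  cost_ratio: float) -> float:
    """Cost of a disk array as a fraction of the flash array's cost."""
    return disk_replication / (flash_replication * cost_ratio)


if __name__ == "__main__":
    same_budget = disk_replication_for_same_budget(1.2, 5)
    print(f"Same budget buys disk replication factor {same_budget:.1f}")
    print(f"Replication 2.5 on disk costs {relative_cost(2.5, 1.2, 5):.0%} "
          f"of the flash array")
```

With a flash replication factor of 1.2 and a cost ratio of 5, the same budget buys a disk replication factor of 6, and a disk replication factor of 2.5 costs roughly 40% as much as the flash array.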

DuraSpace News: NOW AVAILABLE: DSpace 5.2!

Thu, 2015-05-21 00:00
From Hardy Pottinger, on behalf of the DSpace 5.2 Release Team, and all the DSpace developers.

Winchester, MA  The DSpace developers are pleased to formally announce that DSpace 5.2 is now available. DSpace 5.2 is a bug-fix release and contains no new features. DSpace 5.2 can be downloaded immediately at either of the following locations:

  • SourceForge:

Ed Summers: SKOS and Wikidata

Wed, 2015-05-20 21:10

For #DayOfDH yesterday I created a quick video about some data normalization work I have been doing using Wikidata entities. I may write more about this work later, but the short version is that I have a bunch of spreadsheets with names in them (authors) in a variety of formats and transliterations, which I need to collapse into a unique identifier so that I can provide a unified display of the data per unique author. So for example, my spreadsheets have information for Fyodor Dostoyevsky using the following variants:

  • Dostoeieffsky, Feodor
  • Dostoevski
  • Dostoevski, F. M.
  • Dostoevski, Fedor
  • Dostoevski, Feodor Mikailovitch
  • Dostoevskii
  • Dostoevsky
  • Dostoevsky, Fiodor Mihailovich
  • Dostoevsky, Fyodor
  • Dostoevsky, Fyodor Michailovitch
  • Dostoieffsky
  • Dostoieffsky, Feodor
  • Dostoievski
  • Dostoievski, Feodor Mikhailovitch
  • Dostoievski, Feodore M.
  • Dostoievski, Thedor Mikhailovitch
  • Dostoievsky
  • Dostoievsky, Feodor Mikhailovitch
  • Dostoievsky, Fyodor
  • Dostojevski, Feodor
  • Dostoyeffsky
  • Dostoyefsky
  • Dostoyefsky, Theodor Mikhailovitch
  • Dostoyevski, Feodor
  • Dostoyevsky
  • Dostoyevsky, Fyodor
  • Dostoyevsky, F. M.
  • Dostoyevsky, Feodor Michailovitch
  • Dostoyevsky, Feodor Mikhailovich

So, obviously, I wanted to normalize these. But I also want to link the name up to an identifier that could be useful for obtaining other information, such as an image of the author, a description of their work, possibly link to works by the author, etc. I’m going to try to map the authors to Wikidata, largely because there are links from Wikidata to other places like the Virtual International Authority File, and Freebase, but there are also images on Wikimedia Commons, and nice descriptive text for the people. As an example here is the Wikidata page for Dostoyevsky.

To aid in this process I created a very simple command line tool and library called wikidata_suggest which uses Wikidata’s suggest API to interactively match up a string of text to a Wikidata entity. If Wikidata doesn’t have any suggestions, the utility falls back to looking in a page of Google’s search results for a Wikipedia page, and will optionally let you use that text.
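
For readers who want to try the lookup step by hand, here is a minimal sketch. It is not the wikidata_suggest code: it assumes that the "suggest API" corresponds to the MediaWiki `wbsearchentities` action on wikidata.org, and the function names and User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(text: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities query URL for a free-text name."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_candidates(response: dict) -> list:
    """Reduce an API response to (id, label, description) tuples."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


def search(text: str) -> list:
    """Query Wikidata over the network and return candidate entities."""
    request = urllib.request.Request(
        build_search_url(text),
        headers={"User-Agent": "name-matching-sketch/0.1"},  # hypothetical
    )
    with urllib.request.urlopen(request) as reply:
        return extract_candidates(json.load(reply))
```

Calling `search("Dostoevsky, Fyodor")` would return a short list of candidates for a person to choose from, which is the interactive step the tool automates.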


Soon after tweeting about the utility and the video I made about it I heard from Alberto who works on the NASA Astrophysics Data System and was interested in using wikidata_suggest to try to link up the Unified Astronomy Thesaurus to Wikidata.

@libcce map UAT to @wikidata? (Alberto Accomazzi, @aaccomazzi, May 20, 2015)

Fortunately the UAT is made available as a SKOS RDF file. So I wrote a little proof of concept script that loads a SKOS file, walks through each skos:Concept and asks you to match the skos:prefLabel to Wikidata using wikidata_suggest. Here’s a quick video I made of what this process looks like:

I guess this is similar to what you might do in OpenRefine, but I wanted a bit more control over how the data was read in, modified and matched up. I’d be interested in your ideas on how to improve it if you have any.
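
The SKOS walk described above could be sketched roughly as follows. This is not the script from the post: it uses only the standard library, assumes the file is RDF/XML with concepts written as `skos:Concept` elements carrying an `rdf:about` attribute, and takes a hypothetical `matcher` callback in place of the interactive wikidata_suggest prompt.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def iter_concepts(rdf_xml: str):
    """Yield (concept URI, preferred label) pairs from SKOS RDF/XML text."""
    root = ET.fromstring(rdf_xml)
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        # Takes the first prefLabel only; multilingual files may have several.
        label = concept.findtext(f"{{{SKOS}}}prefLabel")
        if uri and label:
            yield uri, label.strip()


def match_concepts(rdf_xml: str, matcher) -> dict:
    """Map concept URIs to whatever identifier `matcher(label)` returns.

    `matcher` stands in for the interactive lookup; concepts for which it
    returns nothing are left out of the result.
    """
    matches = {}
    for uri, label in iter_concepts(rdf_xml):
        entity_id = matcher(label)
        if entity_id:
            matches[uri] = entity_id
    return matches
```

Keeping the matcher as a plain callback means the same loop can be driven by an interactive prompt, a lookup table, or a batch search.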

It’s kind of funny how Day of Digital Humanities quickly morphed into Day of Astrophysics…

Nicole Engard: Bookmarks for May 20, 2015

Wed, 2015-05-20 20:30

Today I found the following resources and bookmarked them on Delicious.

  • Coffitivity: Coffitivity recreates the ambient sounds of a cafe to boost your creativity and help you work better.

The post Bookmarks for May 20, 2015 appeared first on What I Learned Today....

OCLC Dev Network: WMS Collection Management API - Problems with Write Operations

Wed, 2015-05-20 20:15

We have discovered problems with WMS Collection Management API write operations - including both Create and Update - and are advising users of this web service to limit their usage to the Read and Search operations only for now.

District Dispatch: Panel to discuss ebook lending growth at 2015 ALA Annual Conference

Wed, 2015-05-20 18:00

A leading panel of library and publishing experts will provide an update on the library ebook lending market and discuss the best ways for libraries to advance access to digital content at the American Library Association’s (ALA) 2015 Annual Conference in San Francisco. The interactive session, “Making Progress in Digital Content,” takes place from 10:30 to 11:30 a.m. on Sunday, June 28, 2015. The session will be held at the Moscone Convention Center in room 2018 of the West building.

During the session, an expert panel of library leaders from ALA’s Digital Content Working Group (DCWG) will provide insights on the most promising opportunities available to advance library access to digital content. Organizational leaders will discuss ALA’s efforts toward exploiting digital content access opportunities. Audience input will be sought to inform ALA priorities in this area. The program features DCWG co-chairs Carolyn Anthony and Erika Linke, along with additional guest panelists.

  • Carolyn Anthony, co-chair, ALA Digital Content Working Group; director, Skokie Public Library (Illinois); immediate past-president, Public Library Association
  • Erika Linke, co-chair, ALA Digital Content Working Group; associate dean of Libraries and director of Research and Academic Services, Carnegie Mellon University Libraries

View all ALA Washington Office conference sessions

The post Panel to discuss ebook lending growth at 2015 ALA Annual Conference appeared first on District Dispatch.

LITA: Jobs in Information Technology: May 20, 2015

Wed, 2015-05-20 17:24

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Director of Information Technology, Douglas County Libraries, Castle Rock, CO

Associate Product Owner, The Library Corporation (TLC), Inwood, WV

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Jonathan Rochkind: “First Rule of Usability? Don’t Listen to Users”

Wed, 2015-05-20 15:23

An interesting, brief 15-year-old column from noted usability expert Jakob Nielsen, which I saw posted today on reddit: First Rule of Usability? Don’t Listen to Users

Summary: To design the best UX, pay attention to what users do, not what they say. Self-reported claims are unreliable, as are user speculations about future behavior. Users do not know what they want.

I’m reposting here, even though it’s 15 years old, because I think many of us haven’t assimilated this message yet, especially in libraries, and it’s worth reviewing.

An even worse version of trusting users’ self-reported claims, I think, is trusting user-facing librarians’ self-reported claims about what they have generally noticed users self-reporting. It’s like taking the first problem and adding a game of ‘telephone’ to it.

Nielsen’s suggested solution?

To discover which designs work best, watch users as they attempt to perform tasks with the user interface. This method is so simple that many people overlook it, assuming that there must be something more to usability testing. Of course, there are many ways to watch and many tricks to running an optimal user test or field study. But ultimately, the way to get user data boils down to the basic rules of usability:

  • Watch what people actually do.
  • Do not believe what people say they do.
  • Definitely don’t believe what people predict they may do in the future.

Yep. If you’re not doing this, start. If you’re doing it, you probably need to do it more.  Easier said than done in a typical bureaucratic inertial dysfunctional library organization, I realize.

It also means we have a professional obligation to watch what the users do, and determine how to make things better for them. And then watch again to see if it did. That’s what makes us professionals. We cannot simply do what the users say; that is an abrogation of our professional responsibility, and it does not actually produce good outcomes for our patrons. Again, yes, this means we need library organizations that allow us to exercise our professional responsibilities and give us the resources to do so.

For real, go read the very short article. And consider what it would mean to develop in libraries taking this into account.

Filed under: General

District Dispatch: Libraries for Tomorrowland

Wed, 2015-05-20 15:07
[The following article was written by Christopher Harris, director of the School Library System for the Genesee Valley Educational Partnership in New York. Chris also serves as a Youth and Technology Fellow for the American Library Association’s Office for Information Technology Policy (OITP).]

Panelists at the Miles from Tomorrowland event.

“Miles from Tomorrowland,” a relatively new show from Disney Junior, has plotted a course to bring computer science to young viewers. Perhaps more importantly, the team behind the show has made intentional choices to encourage young girls to consider computer science and other science, technology, engineering, and mathematics (STEM) fields as great options for the future. The show has been developed as a collaborative effort between Disney, Google, and the National Aeronautics and Space Administration (NASA). On Monday, the three organizations gathered at the Google offices in Washington, D.C. to showcase the show's success and think about future plans. But what does this have to do with libraries? Patience…first we must consider the backstory.

Following in the footsteps of the original Star Trek where creator Gene Roddenberry cast Nichelle Nichols as Lt. Uhura in one of the first professional roles for an African-American female, “Miles from Tomorrowland” showcases women in key positions. In the show, a family is travelling through space working for the Tomorrowland Transportation Authority. The captain of the family ship is the mother, and Miles’ big sister Loretta is the computer science whiz who guides Miles through his adventures and is always ready to save the day. What are we currently doing in our libraries—and what more might we be doing—to also highlight women in STEM and leadership fields?

Miles from Tomorrowland show. Photo by Disney.

Nichelle Nichols’ portrayal of Uhura on Star Trek was inspirational to many women in the 1970s. She was even hired by NASA to help bring more women and minorities into the space program; a mission that resulted in the recruitment of the first six women astronauts – including the very notable Sally Ride – in 1978.

The idea of having a female astronaut as captain of the ship in “Miles from Tomorrowland” was based on real-life astronaut Dr. Yvonne Cagle. Dr. Cagle, the second African-American woman selected by NASA for astronaut training, completed her training in 1998 and has been working as a strong advocate for getting more young women into STEM fields ever since.

The statistics are a bit frightening when you really stop and consider them. In 1984, one year after Sally Ride became the first American woman in space and in the prime of Adm. Grace Hopper’s media coverage as a female computer scientist, only 37 percent of those entering computer science fields were women, according to research from Google. Things have not improved; in fact the situation has become quite alarming. In 2009, women made up only 18 percent of those pursuing computer science. Today, despite making up roughly half of the U.S. workforce, women fill less than 25 percent of STEM-related jobs (pdf) in the country.

“Miles from Tomorrowland” is hoping to change things. Creator Sascha Paladino spoke yesterday about the birth of his twin sons as inspiration for the show. He wanted his sons to see positive female role models in the captain and a big sister who used computer programming to solve problems. Quick…name three books from your library that feature female military commanders, female scientists, or female computer programmers.

For older readers, the first challenge is relatively easy thanks to some amazing science-fiction series like David Weber’s Honor Harrington or Elizabeth Moon’s Vatta and Serrano books (Moon is herself a former computer specialist with the Marines). Octavia Butler’s work also stands out for both promoting women characters and highlighting people of color as protagonists. For younger readers? Margaret Bechard’s “Star Hatchling” and Madeleine L’Engle’s “A Wrinkle in Time” come to mind. But where are the mainstream series like Junie B. Jones or Cam Jansen where girls are shown being interested in and excelling in STEM fields?

What I took away from the event yesterday is that there is a huge amount of interest in encouraging young girls to consider STEM careers. Google, Disney, and NASA are working on this through “Miles from Tomorrowland” for very young viewers, but also in other ways. Google’s Made with Code project empowered girls as coders able to take over their state’s Christmas tree at the White House and program the lights.

What I also heard yesterday is that more work is needed. Two Congresswomen, Representative Susan Brooks (R-IN) and Representative Suzan DelBene (D-WA), are working to highlight the importance of STEM in schools and the need to encourage girls to go into STEM fields. They have co-sponsored an effort to have computer programming added to ESEA as a required field for K-12 education. Who better to lead this instructional effort than the school librarians? Who better to support the early learning that can accompany efforts like “Miles from Tomorrowland” and the extension beyond the classroom than public librarians?

So then the question. The whole point of this post. Are you ready to step up and participate in this effort? Google’s research shows that one of the key contributing factors to young women considering a STEM field in college is encouragement from a parent or other adult (teacher, guidance counselor…librarian?). What will you do in your library to encourage girls and young women to become interested in STEM fields? How will you support their learning to help them build self-confidence as STEM experts? How will you help bridge the gender divide in STEM careers to better help the country fill the numerous current and future STEM field job openings?

How will your library and you as a librarian become a champion of STEM learning for all – but especially for the currently underrepresented such as women and minorities?

The post Libraries for Tomorrowland appeared first on District Dispatch.

Library of Congress: The Signal: The NDSR Boston Residents Reflect on their “20% Projects”

Wed, 2015-05-20 12:39

The following is a guest post by the entire group of NDSR-Boston residents as listed below. For their final posting, the residents present an overview of their individual professional development projects.

Rebecca Fraimow (WGBH)


One of the best things about this year’s NDSR in Boston  is the mandate to dedicate 20% of our time to projects outside of the specific bounds of our institution. Taking coursework, attending conferences, creating workshops — it’s all the kind of stuff that’s invaluable in the archival profession but is often hard to make time for on top of a full-time job, and I really appreciated that NDSR explicitly supported these efforts.

While I definitely took advantage of the time for my own personal professional development — investing time in Python and Ruby on Rails workshops and Harvard’s CopyrightX course, as well as presentations at AMIA, Code4Lib, Personal Digital Archives, NEA and NDSR-NE — the portion of my 20% that I’ve most appreciated is the opportunity to expand the impact of the program beyond the bounds of the immediate NDSR community. With the support of the rest of the Boston cohort, I partnered with my WGBH mentor, Casey Davis, to lead a series of workshops on handling audiovisual analog and digital material for students at the Simmons School of Library and Information Science. It was fantastic to get a chance to share the stuff I’ve learned with the next generation of archivists (and, who knows, maybe some of the next round of NDSR residents!).

As a cohort, we’ve also teamed up to design workflows and best practice documents for the History Project — a Boston-based, volunteer-run LGBT archive with a growing collection of digitized and born-digital items. This project is also, I think, a really great example of the ways that the program can make an impact outside of the relatively small number of institutions that host residents, and illustrates how valuable it is to keep expanding the circle of digital preservation knowledge.

Samantha Dewitt (Tufts University)


The NDSR residency has been a terrific experience for me, with the Tufts project proving to be a very good fit. Having been completely preoccupied with the subject of open science and Research Data Management in these past nine months, I am finding it hard to let go of the topic and I endeavor to continue working on this particular corner of the digital preservation puzzle. These days, data sharing and research data management frequently arise as topics of conversation in relation to research universities. Consequently, I had little trouble finding ways to add digital data preservation to my “20%” time. I looked forward to sharing the subject with my NDSR cohort whenever possible!

In November, our group attended a seminar on data curation at the Harvard-Smithsonian Center for Astrophysics. Several weeks later, I was able to meet with Dr. Micah Altman (MIT) to explore the subject of identifying and managing confidential information in research. Also in November, the Boston Library Consortium & Digital Science held a workshop at Tufts on Better Understanding the Research Information Management Landscape. Mark Hahnel, founder of Figshare, and Jonathan Breeze, CEO of Symplectic, spoke. This spring, Eleni Castro, research coordinator and data scientist at Harvard, met with our group to discuss the university’s new Dataverse 4.0 beta. Finally, in April, I was excited to be able to attend the Research Data Access and Preservation Summit in Minneapolis, MN. It has been a busy nine months!

Joey Heinen (Harvard Libraries)


The “20%” component of the National Digital Stewardship Residency is a great way for us to expand our interests, learn more about emerging trends and practices in the field and also to stay connected to any interests that might not align with our projects. My 20% involved a mixture of continuing education opportunities, organizing talks and tours and contributing to group projects which serve specific institutions or the field at large. For continuing education I learned some of the basics of Python programming through the Data Scientist Training for Librarians at Harvard.

For talks and tours, I organized a visit to the Northeast Document Conservation Center (largely to learn about the IRENE Audio Preservation System) and visits with the Harvard Art Museum’s Registration and Digital Infrastructure and Emerging Technologies departments. I also co-organized an event entitled “Catching Waves: A Panel Discussion on Sustainable Digital Audio Delivery” (webex recording available soon on Harvard Library’s YouTube Channel). For developing resources I participated in the AMIA/DLF 2014 Hack Day in a group that developed a tool for comparing the output of three A/V characterization tools (see the related blog post) and also designed digital imaging and audio digitization workflows for the History Project.

Finally, I participated in NDSR-specific panels at the National Digital Stewardship Alliance – Northeast meeting (NDSA-NE) and the Spring New England Archivists conference as well as individually at the recent American Institute for Conservation of Historic and Artistic Works conference. All in all I am pleased with the diversity of the projects and my level of engagement with both the local and national preservation communities. (As a project update, here is the most recent iteration of the Format Migration Framework (pdf)).

Tricia Patterson (MIT Libraries)


Two weeks left to go! And I ended up doing so much more than I initially anticipated during my residency. My project was largely focused on diagrammatically and textually documenting the low-level workflows of our digitization and digital preservation management processes, some of the results of which can be seen on the Digital Preservation Management workshop site. But beyond the core of the project, so much else was accomplished. I helped organize both an MIT host event and a field trip to the JFK Library and Museum for my NDSR compadres. Joey Heinen and I co-organized a panel on sustainable digital audio delivery, replete with stellar panelists from both MIT and Harvard. I collaborated with my NDSR peers on a side assignment for the History Project. I also shared my work with colleagues at many different venues: presenting at the New England Music Library Association, giving a brown bag talk at MIT, writing on our group blog, being accepted to present with my MIT colleagues at the International Association of Sound and Audiovisual Archives conference, and, in the final days of my residency, presenting at the Association of Recorded Sound Collections conference.

All in all, a lot has been crammed into nine brief months: engaging in hands-on experience, enhancing my technological and organizational knowledge, forging connections in the digital preservation community and beyond. It really ended up being a vigorous and dynamic catapult into the professional arena of digital preservation. Pretty indispensable, I’d say!

Jen LaBarbera (Northeastern University)


Though my project focused specifically on creating workflows and roadmaps for various kinds of digital materials, I found myself becoming more and more intrigued by the conceptual challenges of digital preservation for the digital humanities. Working on this project as part of a residency meant that I had some flexibility and was given the time and encouragement to pursue topics of interest, even if they were only indirectly related to my project at Northeastern University.

As a requirement of the residency, each resident had to plan and execute an event at their host institution, and we were given significant latitude to define that event. Instead of doing the standard tour and in-person demonstration of my work at Northeastern, Giordana Mecagni and I chose to reach out to some folks in our library-based Digital Scholarship Group to host a conversation exploring the intersections between digital preservation and digital humanities. The response from the Boston digital humanities and library community was fantastic; people were eager to dive into this conversation and talk about the challenges and opportunities presented in preserving the scholarly products of the still fairly new world of digital humanities. We had a stellar turnout from digital humanities scholars and librarians from all over the Boston area, from institutions within the NDSR Boston cohort and beyond. We didn’t settle on any concrete answers in our conversation, but we were able to highlight the importance of digital preservation within the digital humanities world.

My experience with NDSR Boston will continue to be informative and influential as I move on to the next step in my career, as the lead archivist at Lambda Archives of San Diego in sunny southern California. From the actual work on my project at Northeastern to the people we met through our “20%” activities – e.g. touring NEDCC, attending Rebecca’s AV archiving workshops at Simmons, working with the History Project to develop digital preservation plans and practices – I feel much more prepared to responsibly preserve and make available the variety of formats of digital material that will inevitably come my way in my new position at this LGBTQ community archive.

DPLA: Developing and implementing a technical framework for interoperable rights statements

Wed, 2015-05-20 12:10

Farmer near Leakey, holding a goat he has raised. Near San Antonio, 1973. National Archives and Records Administration.

Within the Technical Working Group of the International Rights Statements Working Group, we have been focusing our efforts on identifying a set of requirements and a technically sound and sustainable plan to implement the rights statements under development. Now that two of the Working Group’s white papers have been released, we realized it was a good time to build on the introductory blog post by our Co-Chairs, Emily Gore and Paul Keller. Accordingly, we hope this post provides a good introduction to our technical white paper, Recommendations for the Technical Infrastructure for Standardized International Rights Statements, and more generally, how our thinking has changed throughout the activities of the working group.

The core requirements

The Working Group realized early on that there was the need for a common namespace for rights statements in the context of national and international projects that aggregate cultural heritage objects. We saw the success of the work undertaken to develop and implement the Europeana Licensing Framework, but realized that a more general framework was needed, one that could be leveraged beyond the Europeana community. In addition, we established that there was a clear need to develop persistent, dereferenceable URIs to provide human- and machine-readable representations.

In non-technical terms, this identifies a number of specific requirements. First, the persistence requirement means that the URIs need to remain the same over time, so we can ensure that they can be accessed consistently over the long term. The “dereferenceability” requirement states that when we request a rights statement by its URI, we need to get a representation back for it, either human-readable or machine-readable depending on how it’s requested. For example, if a person enters a rights statement’s URI in their web browser’s address bar, they should get an HTML page in response that presents the rights statement’s text and more information. By comparison, if a piece of software or a metadata ingestion process requests the rights statement by its URI, it should get a machine-readable representation (say, using the linked data-compatible JSON-LD format) that it can interpret and reuse in some predictable way.
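To make the dereferenceability requirement a little more concrete, here is a minimal sketch of how a server might pick a representation from the request's `Accept` header. This is only an illustration under stated assumptions: the function names and the fallback to HTML are hypothetical, and nothing here reflects the Working Group's actual server implementation.

```python
# Hypothetical sketch of content negotiation for a rights statement URI.
# All names here are illustrative assumptions, not part of the white paper.

HTML = "text/html"
JSONLD = "application/ld+json"


def parse_accept(header):
    """Return media types from an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # q defaults to 1 when it is not given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            # Sort by descending q, then by original order in the header.
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def choose_representation(accept_header, default=HTML):
    """Pick the human-readable or machine-readable representation."""
    for media_type in parse_accept(accept_header or ""):
        if media_type in (HTML, "application/xhtml+xml"):
            return HTML
        if media_type in (JSONLD, "application/json"):
            return JSONLD
        if media_type == "*/*":
            return default
    return default


# A browser-style header prefers HTML; a harvester can ask for JSON-LD.
browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(choose_representation(browser))
print(choose_representation("application/ld+json"))
```

Falling back to HTML when no preference is stated is a design choice in this sketch; a real deployment could just as reasonably default to the machine-readable form.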

Beyond these requirements, we also identified the need for the machine-readable representation to provide specific kinds of additional information where appropriate, such as the name of the statement, the version of the statement, and where applicable, the jurisdiction where the rights statement applies. Finally, and most importantly, we needed a consistent way to provide translations of these statements that met the above requirements for dereferenceability, since they are intended to be reused by a broad international community of implementers.

Data modeling and implementation

After some discussion, we decided the best implementation for these rights statements was to develop a vocabulary implemented using the Resource Description Framework (RDF) and the Simple Knowledge Organization System (SKOS) standards. These standards are broadly used throughout the Web, and are both used within the Europeana Data Model and the DPLA Metadata Application Profile. We are also looking at the Creative Commons Rights Expression Language (ccREL) and Open Digital Rights Language (ODRL) models to guide our development. At this stage, we have a number of modeling issues still open, such as which properties to use for representing various kinds of human-readable documentation or providing guidance on how to apply community-specific constraints and permissions. Deciding whether (and how) rights statements can be extended in the future is also an intriguing point. We are looking for feedback on all these topics!
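As a rough illustration of what a machine-readable representation could look like, the sketch below builds a JSON-LD document for a single statement modeled as a SKOS concept, with language-tagged labels for translations. Since the modeling questions are still open, every specific here is an assumption made for the example: the `example.org` URI, the label text, and the use of `owl:versionInfo` for the version and `dcterms:coverage` for the jurisdiction are placeholders, not the Working Group's choices.

```python
import json

# Hypothetical rights statement; the URI, labels, and property choices
# are assumptions for illustration, not the Working Group's data model.
statement = {
    "@context": {
        "skos": "http://www.w3.org/2004/02/skos/core#",
        "owl": "http://www.w3.org/2002/07/owl#",
        "dcterms": "http://purl.org/dc/terms/",
    },
    "@id": "http://example.org/vocab/ExampleStatement/1.0/",
    "@type": "skos:Concept",
    # One language-tagged label per translation of the statement's name.
    "skos:prefLabel": [
        {"@value": "Example rights statement", "@language": "en"},
        {"@value": "Exemple de déclaration de droits", "@language": "fr"},
    ],
    "owl:versionInfo": "1.0",                    # version (assumed property)
    "dcterms:coverage": "Example jurisdiction",  # jurisdiction (assumed property)
}


def label_for(statement, language, fallback="en"):
    """Return the label in the requested language, else the fallback one."""
    labels = {
        label["@language"]: label["@value"]
        for label in statement["skos:prefLabel"]
    }
    return labels.get(language, labels.get(fallback))


print(json.dumps(statement, indent=2, ensure_ascii=False))
print(label_for(statement, "fr"))
```

Keeping the translations as language-tagged values on one resource is only one option; separate per-language documents behind the same URI would be another way to meet the same requirement.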

As part of the process, we have been managing our draft implementation of the data model in a GitHub repository to allow for ease of collaboration across the technical subgroup. As the proposed rights statements become finalized following the discussion period on the two white papers, we will be working to provide a web server to host the machine-readable and human-readable versions of the rights statements in accordance with our requirements. To guide our implementation, we are building on the Best Practice Recipes for Publishing RDF Vocabularies with a slight modification to allow for better support for the multilingual requirements of the Working Group. Advice from the technical experts in our community is also highly welcome on this approach.

The end of the public feedback period has been set to Friday 26th June 2015, but the Technical Working Group will try to answer comments on the white paper on a regular basis, in the hope of setting up a continuous, healthy stream of discussion.


The technical work on implementing the rights statements has been deeply collaborative, and would not have been possible without the dedicated efforts of the members of the Technical Working Group:

  • Bibliothèque Nationale de Luxembourg: Patrick Peiffer
  • Digital Public Library of America: Tom Johnson and Mark Matienzo
  • Europeana Foundation: Valentine Charles and Antoine Isaac
  • Kennisland: Maarten Zeinstra
  • University of California San Diego: Esmé Cowles
  • University of Oregon: Karen Estlund
  • Florida State University: Richard J. Urban

Library Tech Talk (U of Michigan): Quality in HathiTrust (Re-Posting)

Wed, 2015-05-20 00:00

This is a re-posting of a HathiTrust blog post. HathiTrust receives well over a hundred inquiries every month about quality problems with page images or OCR text of volumes in HathiTrust. That’s the bad news. The good news is that in most of these cases, there is something they can do about it. A new blog post is intended to shed some light on the thinking and practices about quality in HathiTrust.

LibUX: 019: Links Should Open in the Same Window

Tue, 2015-05-19 22:39

Where should links open – and does it matter? In this episode of the podcast, we explore the implications on the net user experience of such a seemingly trivial preference.


You can listen to LibUX on Stitcher, find us on iTunes, or subscribe to the straight feed. Consider signing up for our weekly newsletter, the Web for Libraries.

The post 019: Links Should Open in the Same Window appeared first on LibUX.

SearchHub: Infographic: The Woes of the CIOs

Tue, 2015-05-19 17:28
It’s tough out there for CIOs. They’re getting it from all sides and from all directions. Let’s take a look at the unique challenges CIOs face in trying to keep their organizations competitive and effective:

The post Infographic: The Woes of the CIOs appeared first on Lucidworks.

Islandora: Fedora 4 Project Update IV

Tue, 2015-05-19 15:31

As the project entered the fourth month, work continued on migration planning and mapping, migration-utils, and Drupal integration.

Migration work was split between working on migration-utils, migration mappings, data modeling (furthering Portland Common Data Model compliance), and working with the Islandora (Fedora 4 Interest Group), Fedora (Fedora Tech meetings), and Hydra (Hydra Metadata Working Group) communities on the preceding items. In addition, Audit Service, a key requirement of an Islandora community fcrepo3 -> fcrepo4 migration, finalized the second phase of the project. Community stakeholders are currently reviewing and providing feedback.

Work on migration-utils focused mainly on applying a number of mappings (outlined here) to the utility, adding support for object-to-object linking, and providing documentation on how to use the utility. This work can be demonstrated by building the Islandora 7.x-2.x Vagrant Box, cloning the migration-utils repository, and pointing migration-utils at a fcrepo3 native filesystem or directory of exported FOXML.

As for object modeling and inter-community work, an example of this work is the below image of a sample Islandora Large Image object modeled in the Portland Common Data Model. This model will continue to evolve as the communities work together in the various Hydra Metadata Working Group sub-working groups.

On the Drupal side of things, work was started on Middleware Services, a middleware service that will utilize the Fedora 4 REST API and the Drupal Services modules, and create an API for the majority of interactions between the two systems. In addition, a few Drupal modules have been created to leverage this: islandora_basic_image, islandora_collection, and islandora_dcterms.

In addition, the team has been exploring options with RDF integration and support in Drupal, as well as how to handle editing (Islandora XML Forms) the various descriptive metadata schemas the community uses. This is captured in a few issues in the issue queue: #27 and #28. Due to the importance of the issue, a special Fedora 4 Interest Group meeting was held to discuss how to proceed with this functionality in Islandora 7.x-2.x. The group's consensus was to solicit use cases from the community to better understand how to proceed with 7.x-2.x.

Work will continue on the migration and Drupal sides of the project into May.

David Rosenthal: How Google Crawls Javascript

Tue, 2015-05-19 15:00
I started blogging about the transition the Web is undergoing from a document to a programming model, from static to dynamic content, some time ago. This transition has very fundamental implications for Web archiving; what exactly does it mean to preserve something that is different every time you look at it? Not to mention the vastly increased cost of ingest, because executing a program takes a lot more, a potentially unlimited amount of, computation than simply parsing a document.

The transition has big implications for search engines too; they also have to execute rather than parse. Web developers have a strong incentive to make their pages search engine friendly, so although they have enthusiastically embraced Javascript they have often retained a parse-able path for search engine crawlers to follow. We have watched academic journals adopt Javascript, but so far very few have forced us to execute it to find their content.

Adam Audette and his collaborators at Merkle | RKG have an interesting post entitled We Tested How Googlebot Crawls Javascript And Here’s What We Learned. It is aimed at the SEO (Search Engine Optimization) world but it contains a lot of useful information for Web archiving. The TL;DR is that Google (but not yet other search engines) is now executing the Javascript in ways that make providing an alternate, parse-able path largely irrelevant to a site's ranking. Over time, this will mean that the alternate paths will disappear, and force Web archives to execute the content.
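A small sketch may help show the difference between parsing and executing. It uses Python's standard `html.parser` to collect links the way a parse-only crawler would; the sample page and the `LinkExtractor` and `extract_links` names are made up for illustration and are not taken from any real crawler. The link that the script would add to the page never shows up, because finding it requires running the Javascript.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags present in the static markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical page: one link is in the markup, one is created by a script.
PAGE = """<html><body>
<a href="/articles/1">Static link</a>
<script>
  var a = document.createElement("a");
  a.href = "/articles/2";
  document.body.appendChild(a);
</script>
</body></html>"""


def extract_links(html):
    """Return the links a parse-only crawler would discover."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Only the static link is found; /articles/2 exists only after execution.
print(extract_links(PAGE))
```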

District Dispatch: Ending “bulk collection” of library records on the line in looming Senate vote

Tue, 2015-05-19 13:14

Image Source: PolicyMic

Last week, the House of Representatives voted overwhelmingly, 338 to 88, for passage of the latest version of the USA FREEDOM Act, H.R. 2048. The bill — and the battle to achieve the first meaningful reform of the USA PATRIOT Act since it was enacted 14 years ago — now shifts to the Senate. There, the outcome may well turn on the willingness of individual voters to overwhelm Congress with demands that USA FREEDOM either be passed without being weakened, or that the now infamous “library provision” of the PATRIOT Act (Section 215) and others slated for expiration on June 1 simply be permitted to “sunset” as the Act provides if Congress takes no action. Now is the time for all librarians and library supporters — for you — to send that message to both of your US Senators. Head to the action center to find out how.

For the many reasons detailed in yesterday’s post, ALA and its many private and public sector coalition partners have strongly urged Congress to pass the USA FREEDOM Act of 2015 without weakening its key, civil liberties-restoring provisions. Already a finely-tuned compromise that delivers fewer privacy protections than last year’s Senate version of the USA FREEDOM Act, this year’s bill simply cannot sustain further material dilution and retain ALA’s (and many other groups’) support. The Obama Administration also officially endorsed and called for passage of the bill.

Unfortunately, the danger of the USA FREEDOM Act being blocked entirely or materially weakened is high. The powerful leader of the Senate, Mitch McConnell of Kentucky, is vowing to bar consideration of H.R. 2048 and, instead, to provide the Senate with an opportunity to vote only on his own legislation (co-authored with the Chair of the Senate Intelligence Committee) to reauthorize the expiring provisions of the PATRIOT Act with no privacy-protecting or other changes whatsoever. Failing the ability to pass that bill, Sen. McConnell and his allies have said that they will seek one or more short-term extensions of the PATRIOT Act’s expiring provisions.

Particularly in light of last week’s ruling by a federal appellate court that the government’s interpretation of its “bulk collection” authority under Section 215 was illegally broad in all key respects, ALA and its partners from across the political spectrum vehemently oppose any extension of the USA PATRIOT Act, of any duration, without meaningful reform.

The looming June 1 “sunset” date provides the best leverage since 2001 to finally recalibrate key parts of the nation’s surveillance laws to again respect and protect library records and all of our civil liberties. Please, contact your Senators now!

Additional Resources

House Judiciary Committee Summary of H.R. 2048

Statement of Sen. Patrick Leahy, lead sponsor of S. 1123 (May 11, 2015)

Open Technology Institute Comparative Analysis of select USA FREEDOM Acts of 2014 and 2015

“Patriot Act in Uncharted Legal Territory as Deadline Approaches,” National Journal (May 10, 2015)

“N.S.A. Collection of Bulk Call Data Is Ruled Illegal,” New York Times (May 7, 2015)

The post Ending “bulk collection” of library records on the line in looming Senate vote appeared first on District Dispatch.

LITA: Call for Writers

Tue, 2015-05-19 13:00
meme courtesy of Michael Rodriguez

The LITA blog is seeking regular contributors interested in writing easily digestible, thought-provoking blog posts that are fun to read (and hopefully to write!). The blog showcases innovative ideas and projects happening in the library technology world, so there is a lot of room for contributor creativity. Possible post formats could include interviews, how-tos, hacks, and beyond.

Any LITA member is welcome to apply. Library students and members of underrepresented groups are particularly encouraged to apply.

Contributors will be expected to write one post per month. Writers will also participate in peer editing and conversation with other writers – nothing too serious, just be ready to share your ideas and give feedback on others’ ideas. Writers should expect a time commitment of 1-3 hours per month.

Not ready to become a regular writer but you’d like to contribute at some point? Just indicate in your message to me that you’d like to be considered as a guest contributor instead.

To apply, send an email to briannahmarshall at gmail dot com by Friday, May 29. Please include the following information:

  • A brief bio
  • Your professional interests, including 2-3 example topics you would be interested in writing about
  • If possible, links to writing samples, professional or personal, to get a feel for your writing style

Send any and all questions my way!

Brianna Marshall, LITA blog editor