planet code4lib


Nicole Engard: Bookmarks for September 9, 2014

Tue, 2014-09-09 20:30

Today I found the following resources and bookmarked them:

  • Color Oracle: Color Oracle is a free color blindness simulator for Windows, Mac and Linux.


The post Bookmarks for September 9, 2014 appeared first on What I Learned Today....


Roy Tennant: In Memoriam: Anne Grodzins Lipow

Tue, 2014-09-09 20:23

I was reminded by her daughter on Facebook that Anne Grodzins Lipow passed away ten years ago today. In commemoration of that horrible event, I am posting the Foreword I wrote for Anne’s festschrift that was published in 2008.

On September 9, 2004 librarianship lost a true champion. Anne Grodzins Lipow was unique – of all the testimonials I’ve read about her that is one undeniable truth. We each knew a different set of Anne’s qualities, or engaged with her in a different way, but in the end it all came down to the fact that Anne was someone we could all say was “larger than life”.

The days after her passing were filled with personal testimonials that were mostly lodged as comments on the Infopeople blog. It was an odd experience for me to read these messages and realize that as much as I felt that I knew her, I barely knew her at all. I was like the proverbial blind man with his hands wrapped around one part of the elephant, while others had a firm grip on other body parts and would describe a very different animal. My reality, as deeply felt as it was, was only a pale shadow of the whole.

But for all that, it was a long, long shadow. As a newly-minted librarian at UC Berkeley in the second half of the 1980s, I knew Anne as the person who led the outreach and instructional efforts of the library. Before long, she saw in me the potential to be a good teacher, despite my fear of public speaking, so she pulled me into her program and began teaching me everything she knew about speaking, putting on workshops, making handouts, etc. Under her tutelage, I taught classes such as dialup access to the library catalog, when 300bps modems were still common.

As the Internet began making inroads into universities, Anne was there with newly developed workshops on how to use it. She was convinced very early on, as was I, that the Internet would be an essential technology for libraries. This led to her approaching my colleague John Ober (then on faculty at the library school at Berkeley) and me about doing a full-day Internet workshop scheduled to coincide with the 1992 ALA Annual Conference in San Francisco. Using a metaphor of John’s, we called it “Crossing the Internet Threshold”.

In preparing for the workshop, we created so many handouts that we needed to put them into a binder that began to look increasingly like a book in the making. With typical Anne flair, she arranged for the gifted librarian cartoonist Gary Handman (also our colleague at Berkeley) to create a snazzy cover for the binder, which she also used to create T-shirts (which many of us have to this day).

Anne knew enough about workshops to do a “trial run” before the big day, so we did one for UC Berkeley library staff a couple weeks before, which gave us feedback essential to making an excellent workshop. In the end, the workshop was such a hit that Anne ran with it. She took the binder of handouts we had created and made a book out of it — the first book of her newly-created business called Library Solutions Institute and Press. Her decision to publish the book herself rather than seek out a publisher was so typical of Anne. And how she did it will tell you a lot about her.

Despite the higher cost, Anne insisted on using domestic union printing shops for printing. While other publishers were publishing books overseas for a fraction of the cost, publishing for Anne was a political and social activity, through which she could do good for those around her. It was very important to her to treat people with respect and kindness, and she did it so well. That was the kind of person Anne was.

While every publisher I have worked with since Anne has insisted they are incapable of paying royalties any more frequently than twice a year, Anne paid her authors monthly. And whereas other publishers wait months to pay you for royalties earned long before, Anne would pay immediately. This meant that when books were returned, as they sometimes were, she took the loss for having paid the author royalties on books that had not been sold. That was the kind of person Anne was.

Anne continued to blaze new trails after libraries began climbing on the Internet bandwagon, due in no small measure to her books and workshops on the topic. Anne became a well-known and coveted consultant on a number of topics, but in particular on reference services.

Her “Rethinking Reference” institutes and book were widely acclaimed, and her book The Virtual Reference Librarian’s Handbook (2003) demonstrated that Anne was always at the cutting edge of librarianship. That was the kind of person Anne was.

I visited her after her cancer was diagnosed and after her treatment had failed. We all knew there was no hope, that she had only a matter of weeks to live. Despite the obvious ravages of the illness, Anne’s outlook remained bright and welcoming. She was happy to have her friends and family around her, and we talked of many things except the dark shadow that hung over us all. Even then, she was happy to see whoever came by, and to talk with them with a smile and good wishes. That was the kind of person Anne was.

A piece of all my major professional accomplishments I owe to Anne, and her great and good influence on me. She would deny this, despite its truth, wanting all the credit to accrue to me alone. That was the kind of person Anne was.


Each one of us who has contributed to this volume has been touched by Anne in our own, quite personal ways. Some of us have known of her work mostly by reputation and reading, while others were blessed with more direct and personal contact. But the fact remains that Anne cast a long professional shadow that will affect many librarians yet to come.

For those of us who created a monument of words to someone we love and respect, Anne had one final gift to give. As anyone who has ever created a present for someone they love knows, in so doing you think about the person for whom you are making the gift. Therefore, the authors of this volume have all spent more time with Anne, and as always it was time well spent. We know our readers will count it so too.

31 January 2008, Sonoma, CA

LITA: LITA Midwinter Institutes

Tue, 2014-09-09 19:34

Registration for LITA’s Midwinter Institutes opened today with ALA’s joint registration! Whether you’ll be attending Midwinter or are just looking for a great one-day continuing education event in the Chicago/Midwest area, we hope you’ll join us.

When? All workshops will be held on Friday, January 30, 2015, from 8:30-4:00

Cost for LITA Members: $235  (ALA $350 / Non-ALA $380)
(If you are a member of LITA use special code LITA2015 to receive the price of $235.)

Workshop Descriptions:

Developing mobile apps to support field research
Instructor: Wayne Johnston, University of Guelph Library

Researchers in most disciplines do some form of field research. Too often they collect data on paper, which is not only inefficient but vulnerable to data loss. Surveys and other data collection instruments can easily be created as mobile apps with the resulting data stored on the campus server and immediately available for analysis. The apps also enable added functionality like improved data validity through use of authority files and capturing GPS coordinates. This support to field research represents a new way for academic libraries to connect with researchers within the context of a broader research data management strategy.

Introduction to Practical Programming
Instructor: Elizabeth Wickes, University of Illinois at Urbana-Champaign

This workshop will introduce foundational programming skills using the Python programming language. There will be three sections to this workshop: a brief historical review of computing and programming languages (with a focus on where Python fits in), hands on practice with installation and the basics of the language, followed by a review of information resources essential for computing education and reference. This workshop will prepare participants to write their own programs, jump into programming education materials, and provide essential experience and background for the evaluation of computing reference materials and library program development. Participants from all backgrounds with no programming experience are encouraged to attend.

From Lost to Found: How User Testing Can Improve the User Experience of Your Library Website
Instructors: Kate Lawrence, EBSCO Information Services; Deirdre Costello, EBSCO Information Services; Robert Newell, University of Houston

When two user researchers from EBSCO set out to study the digital lives of college students, they had no idea the surprises in store for them. The online behaviors of “digital natives” were fascinating: from students using Google to find their library’s website, to what research terms and phrases students consider another language altogether: “library-ese.” Attendees of this workshop will learn how to conduct usability testing, and participate in a live testing exercise. Participants will leave the session with the knowledge and confidence to conduct user testing that will yield actionable and meaningful insights about their audience.


More details about these workshops will be coming in interviews with the instructors in October! If you have a question you’d like to ask the instructors, please contact LITA Education Chair Abigail Goben at [firstnamelastname]





LITA: 2014 LITA Forum: early bird rates available through Sept. 15

Tue, 2014-09-09 19:19
Don’t miss your chance to save up to $50 on registration for the 2014 LITA Forum “From Node to Network” to be held Nov. 5-8, 2014 at the Hotel Albuquerque in Albuquerque, N.M.

Don’t forget to book your room at the Hotel Albuquerque by Oct. 14, 2014 to guarantee the LITA room rate.

This year’s Forum will feature three keynote speakers

  • AnnMarie Thomas, Engineering Professor, University of St. Thomas
  • Lorcan Dempsey, Vice President, OCLC Research and Chief Strategist
  • Kortney Ryan Ziegler, Founder Trans*h4ck.

More than 30 concurrent colleague-inspired sessions and a dozen poster sessions will provide a wealth of practical information on a wide range of topics.

Networking opportunities, a major advantage of a smaller conference, are an important part of the Forum. Take advantage of the Thursday evening reception and sponsor showcase, the Friday networking dinners or Kitchen Table Conversations, plus meals and breaks throughout the Forum to get to know LITA leaders, Forum speakers, sponsors, and peers.

This year two preconference workshops will also be offered.

Linked Data for Libraries: How libraries can make use of Linked Open Data to share information about library resources and to improve discovery, access, and understanding for library users
Led by: Dean B. Krafft and Jon Corson-Rikert, Cornell University Library

Linked Open Data (LOD) provides an expressive and extensible mechanism for sharing information (metadata) about all the materials research libraries make available. In this workshop the presenters will introduce the principles and practices of creating and consuming Linked Open Data via a series of examples from sources relevant to libraries. They will provide an introduction to the technologies, tools, and types of data typically involved in creating and working with Linked Open Data and the semantic web. The preconference will also address the challenges of data quality, interoperability, authoritativeness, privacy, and other issues accompanying the adoption of new technologies as these apply to making use of Linked Open Data.

Learn Python by Playing with Library Data
Led by: Francis Kayiwa, Kayiwa Consulting

What can be more fun than learning Python? Learning Python by hacking on library data! In this workshop, you’ll learn Python basics by reading files, looking at MARC (yes MARC), building data structures, and analyzing library data (those logs aren’t going to appreciate themselves). By the end, you will have set up your Python environment, installed some useful packages, and learned how to write simple programs that you can use to impress your colleagues back at work.

2014 LITA Forum sponsors include EBSCO, Springshare, @mire, Innovative and OCLC.

Visit the LITA website for more information.

Library and Information Technology Association (LITA) members are information technology professionals dedicated to educating, serving, and reaching out to the entire library and information community.   LITA is a division of the American Library Association.

LITA and the LITA Forum fully support the Statement of Appropriate Conduct at ALA Conferences

Bill Dueber: Help me test yet another LC Callnumber parser

Tue, 2014-09-09 19:10

Those who have followed this blog and my code for a while know that I have a long, slightly sad, and borderline abusive relationship with Library of Congress call numbers.

They're a freakin' nightmare. They just are.

But, based on the premise that Sisyphus was a quitter, I took another stab at it, this time writing a real (PEG-) parser instead of trying to futz with extended regular expressions.

The results, so far, aren't too bad.

The gem is called lc_callnumber, but more importantly, I've put together a little heroku app to let you play with it, and then correct any incorrect parses (or tell me that it worked correctly) to build up a test suite.

So…Please try to break my LC Callnumber parser!

[Code for the app itself is on github; pull requests for both the app and the gem joyously received]
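To give a feel for why the regular-expression route gets painful, here is a minimal Python sketch of that style of parsing. It is not the lc_callnumber gem's implementation (the gem is a PEG parser, per the post); the function name `parse_simple`, the field names, and the patterns are all hypothetical, and the sketch only handles the tidiest case of class letters, a class number, and cutters.

```python
import re

# Hypothetical, deliberately simplified pattern: 1-3 class letters,
# a class number with an optional decimal part, then whatever is left.
CALLNUMBER = re.compile(
    r"""^\s*
    (?P<letters>[A-Z]{1,3})\s*
    (?P<digits>\d{1,4}(?:\.\d+)?)\s*
    (?P<rest>.*?)\s*$""",
    re.VERBOSE,
)

# A cutter here is just "optional dot, one letter, one or more digits".
CUTTER = re.compile(r"\.?\s*([A-Z]\d+)")


def parse_simple(callnumber):
    """Return a dict of parts, or None when the string does not fit the pattern."""
    match = CALLNUMBER.match(callnumber.upper())
    if match is None:
        return None
    return {
        "letters": match.group("letters"),
        "digits": match.group("digits"),
        "cutters": CUTTER.findall(match.group("rest")),
    }


if __name__ == "__main__":
    print(parse_simple("QA76.73.P98 L88 2014"))
```

Anything beyond that tidy shape (dates between cutters, "Suppl." notes, volume numbers) falls outside these two patterns, which is roughly the point where a real grammar starts to look attractive.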

David Rosenthal: Two Brief Updates

Tue, 2014-09-09 17:56
A couple of brief updates on topics I've been covering, Amazon's margins and the future of flash memory.

First, Benedict Evans has a fascinating and detailed analysis of Amazon's financial reports. Read the whole thing.

He shows how Amazon's strategy is not to generate and distribute profits, but to re-invest their cash flow into starting and developing businesses. Starting each business absorbs cash, but as they develop they turn around and start generating cash that can be used to start the next one.

His graphs and numbers make the case brilliantly. Here, for example, are Amazon's revenues and profits since launch; lots of revenues and almost no profit. But it is more revealing to focus, as Amazon does, on cash flow.

Here Evans shows Free Cash Flow (FCF), Capital Expenditure (capex), and Operating Cash Flow (OCF) as a proportion of revenue.
Amazon’s OCF margin has been very roughly stable for a decade, but the FCF has fallen, due to radically increased capex.
Here Evans shows capex as a proportion of sales, showing a relentless rise starting in late 2009.
That is, if Amazon was spending the same on capex per dollar of revenue as it was in 2009, it would have kept $3bn more in cash in the last 12 months.
What we're interested in here is the AWS business, which is most of the category Amazon calls "Other". Here is the growth of "Other" revenue. This is a market that Amazon is absolutely dominating. Its cash flow is doing two things, paying for the computing infrastructure Amazon needs to run its other, much larger, established businesses, and paying for the startup costs of new businesses.

As far as I can see, in the "cloud" business only Google has the same synergy between an established business, and a cloud business. Other competitors don't need the cloud scale of investment to support another, much larger existing business. They have to treat their cloud investments as a stand-alone business, which is much less efficient. And they are much smaller than AWS. So they aren't going to survive. IBM and Microsoft, I'm looking at you.

Second, Chris Mellor at The Register looks at the hype surrounding the "all-flash data center" and makes the point that Dave Anderson of Seagate has been making for years.
That leaves us with the view that all-flash data centres are not feasible at present. They may become feasible if the cost of flash falls to near-parity with nearline and bulk storage disk but there is another problem: the flash foundry capacity to build the stuff just doesn't exist.

In terms of exabytes of capacity, worldwide disk production is vastly higher than that of flash, and with flash fabs costing $7bn to $9bn apiece it is likely to remain so.

This is no small matter. An all-flash data centre would need approximately the same number of TB of storage as current all-disk or hybrid flash/disk data centres.
The flash foundry operators are paranoid about avoiding loss-making gluts of product, having seen the dire effects of that in the memory industry, with its persistent huge losses and dramatic supplier consolidation. They will be slow to bring new flash fab capacity online.

They are working towards increasing flash capacity by increasing wafer density through cell geometry shrinks, and also through building flash chips with stacked layers of cells, so-called 3D NAND.

These in themselves won't allow the flash industry to take on any substantial portion of worldwide disk capacity in the next few years. That requires many new fabs and there is no sign of that happening.
Not to mention that generating a return on a $7-9B investment requires that the product it builds be in the market for many years. Flash technology is approaching its limits, so the time during which flash will dominate the solid-state storage market with its premium pricing is short, too short to generate the necessary return.

HangingTogether: Innovative solutions for dealing with born-digital content in obsolete forms – Part 2

Tue, 2014-09-09 17:00

This is the second of three posts about a lightning talk session at SAA. Part 1 began with descriptions of the array of media an archives might confront and an effort to test how much can be done in house.

Part 2 picks up with four archivists talking about solutions to particularly challenging formats.

Abby Adams is the Digital Archivist at Hagley Museum & Library, an independent research library in Wilmington, Delaware, documenting American enterprise from its inception to present day with a focus on the intersection of industry, technology, and society. In 2012, Hagley received a large hybrid collection, consisting primarily of textual analog materials, in addition to a number of born-digital records. The records were created by various tech corporations during the normal course of business in the late 1990s and early 2000s and document aspects of the dot-com boom and bust, an area of research where primary sources are sorely lacking. Given the potentially high research value of the collection, Adams gave the preservation of the born-digital content high priority and culled hundreds of records cartons to discover the following obsolete media formats: 349 compact discs; 134 3.5” floppy disks; 113 digital linear tapes (DLT); 49 digital data storage tapes (DDS); 19 quarter-inch mini cartridges; 15 Travan cartridges; and 8 zip disks.

Although the CDs and floppy disks presented few problems, the remaining obsolete formats offered a lesson in how complex data recovery can be. Adams’ attempts to use “freecycled” drives and jerry-rig old PCs were just not working. Even if she could connect a computer to the exact generation DLT or DDS drive to read the tapes, she would also need to know the software program used to create the backup, which could vary widely depending on the date of creation, then successfully install it, and cross her fingers the media isn’t encrypted or corrupt. Since Hagley is a small shop with limited in-house resources, it was clear outsourcing the data extraction was the best course of action. After consulting several vendors, Adams and her coworker Kevin Martin found a company that specializes in data extraction and indexing of backup tapes. After establishing a budget for the first phase of the project, Adams and Martin sent the vendor a sample consisting of five DLT and three DDS tapes. Less than a week later, the vendor provided them access to the indexed data from seven out of eight tapes. Due to the size of the collection and Hagley’s limited in-house resources, Adams was strict with appraisal, retaining only about ten percent of the data. The original media was returned to Hagley a few weeks later. Having successfully completed the first phase of the project, Hagley will continue to use the same company for the remaining backup tapes.

Elise Warshavsky is the Digital Archivist at the Presbyterian Historical Society, which serves as the national archives of the Presbyterian Church, documenting the political and social history of the church. The archives acquired the laptop of Clifton Kirkpatrick, former Stated Clerk, the highest elected official within the church. The laptop contained files he had worked on as well as his email. Five years later Elise was hired and was asked to archive the Stated Clerk’s laptop. This was the nature of the “detailed instructions” she received regarding passwords, the types of files, and that there were 28,000 emails in the Novell GroupWise account:

The records manager who had originally received the laptop had converted the account to a Remote account enabling the email to live solely on the laptop. The records archivist had also reorganized the inbox and appraised each individual email, resulting in lost folder structure and possibly other lost metadata. The emails were readable, but because of a 50-year embargo on access to them, the goal was to ensure that these files would be readable in 50 years. After not being able to find a way to convert the GroupWise Remote email to another format, she finally contacted a company that makes a commercial grade email converter called Transend. They agreed to resurrect the Remote account on their GroupWise servers and then convert it to .pst, Microsoft’s open proprietary file format. Then she was able to move forward with her migration plan: convert to a more archival email format, .MBOX, as well as run a tool to batch export PDFs from each individual email and convert them to PDF/As – a format researchers would be able to search and access in 50 years.
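A migration like this usually ends with a sanity check on the converted mailbox. The Python sketch below is an illustration only, not the tool Elise used: it assumes the .MBOX output can be read by Python's standard `mailbox` module, and the function name `summarize_mbox` and the choice to count messages lacking a Date header are hypothetical.

```python
import mailbox


def summarize_mbox(path):
    """Count the messages in an mbox file and those missing a Date header."""
    # create=False: fail loudly instead of silently creating an empty file.
    box = mailbox.mbox(path, create=False)
    try:
        total = 0
        missing_date = 0
        for message in box:
            total += 1
            if message["Date"] is None:
                missing_date += 1
        return {"messages": total, "missing_date": missing_date}
    finally:
        box.close()
```

Comparing the reported message count with the number expected from the source account (28,000 in this case) is a cheap way to catch a truncated conversion before the original is set aside.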

Elise’s advice: If you get frustrated about not having the tools or skills necessary to complete a project, reach out to find help. There’s no need to develop resources in house when dealing with a unique, most likely not repeatable incident. Get help, and move on to doing what you do know how to do – accession, appraise, and preserve.

Ted Hull, Director of the Electronic Records Division at the National Archives at College Park, told of a project to recover content from 7-track tapes.

The Electronic Records Division accessions, processes, arranges for preservation, describes, and provides access to the born-digital federal records scheduled for permanent retention in the National Archives. They hold 932 series from over 100 federal agencies, consisting of over 750 million unique files and over 320 terabytes of data. 7-track magnetic tape was an industry standard from the 1950s to the 1970s, when it was generally replaced with 9-track magnetic tape. While most of the Archives’ content had already been transferred off of 7-track tape, in 2013 staff identified 13 remaining tapes containing records from the Federal Home Loan Bank Board, the Bureau of Indian Affairs, and the U.S. Joint Chiefs of Staff. The Archives reached out and found that the National Center for Atmospheric Research (NCAR) in Boulder, CO still had the capability to read 7-track tapes. NCAR was able to recover data from 9 tapes; the other 4 were blank. NCAR converted the binary-coded decimal encoding to ASCII and made the files available to NARA for direct download from their FTP site; NARA processed and accessioned the records, and the original tapes were returned to NARA for disposal.

Ben Goldman, the Digital Records Archivist for Pennsylvania State University Libraries, discovered 27 3-inch disks in a modern literary manuscript collection. They didn’t have the equipment needed to read the disks, and weren’t even sure if the disks were readable or even contained data worth recovering.
Amstrad disk from the Fiona Pitt-Kethley papers, Penn State University Special Collections Library

The author confirmed that she did own an Amstrad computer (a somewhat popular computer in the UK for a brief period in the 1980s), but because Ben didn’t know exactly what hardware or software was needed to read the disks, he decided to outsource recovery of the disks. He wanted to use the opportunity to come up with a model vendor agreement and to make the project an extension of their internal born-digital workflow. To that end, he created a media inventory spreadsheet to be used to identify the disks, their labels, their contents, the images derived from them, and to accommodate checksums after their eventual transfer. Mostly, however, he wanted to see if outsourcing was a viable option for archivists confronting elusive computer media formats and to see if core archival requirements could be met by outsourcing, whether service providers could adhere to emerging best practices, and to see if the costs were viable for archives. PSU provided funding for a project at $40 per disk.

[Tweet] Jason P. Evans Groth: $40/disk is same as person making $40k spending two hours to image obsolete disks, so maybe it is the right deal? #s601 #saa14
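The arithmetic behind that tweet is easy to check. A quick Python sketch, assuming a 40-hour week and 52 paid weeks (2,080 hours a year); that assumption, like the function name `labor_cost`, is mine and not from the tweet:

```python
# Assumption: a full-time year is 40 hours x 52 weeks = 2,080 hours.
HOURS_PER_YEAR = 40 * 52


def labor_cost(annual_salary, hours):
    """Salary cost of spending `hours` of staff time on a task."""
    return annual_salary / HOURS_PER_YEAR * hours


if __name__ == "__main__":
    # Two hours of a $40k salary comes to about $38.46, close to $40/disk.
    print(f"${labor_cost(40_000, 2):.2f}")
```

The comparison covers salary only; benefits, equipment, and the time spent hunting down a working drive would all push the in-house figure higher.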

Soon Ben had a signed vendor agreement with the Museum of Computer Culture to provide disk images that could be processed using forensic tools. They were to work from the inventory and follow naming conventions and provide checksums to ensure accurate transfer.
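For anyone wondering what the checksum half of such an agreement might look like in practice, here is a minimal Python sketch. It is an assumption-laden illustration, not Ben's spreadsheet or the vendor's process: the `.img` extension, the column names, the choice of SHA-256, and the function names `sha256_of` and `write_inventory` are all hypothetical.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_inventory(image_dir, inventory_csv):
    """Record name, size, and checksum for every *.img file in image_dir."""
    rows = []
    for image in sorted(Path(image_dir).glob("*.img")):
        rows.append({
            "filename": image.name,
            "bytes": image.stat().st_size,
            "sha256": sha256_of(image),
        })
    with open(inventory_csv, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["filename", "bytes", "sha256"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

If the vendor runs something like this before shipping and the archive runs it again on receipt, matching checksums show that the images arrived unaltered, and the filename column makes naming-convention drift obvious.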

Many months later, however, Ben was working with two other vendors – without a signed agreement. They found that disk images that were native to the Amstrad operating system couldn’t be migrated to modern formats or processed using common forensic tools. Instead, Ben received three versions of every file in three different formats, each with its own brand of lossy-ness and, in the end, there was no adherence to naming conventions and no checksums. Despite not really meeting his expectations, Ben doesn’t think of the project as a failure. “Fugitive media is defiant,” he warns. Communication is key and the vendor agreement should establish communication requirements. Beyond that, Ben is not sure this cost model will be sustainable. Instead, he suggests that archivists need to develop in-network options. There are technologies, resources, and talented people working on these issues. It would be nice to see some better community strategies for tackling the issues and supporting each other.

Next up: Part 3 will continue with three speakers representing the service provider point of view.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.


DPLA: September 10/15, 2014: Board and Board Finance Committee Open Calls

Tue, 2014-09-09 14:00

The DPLA Board and its Finance Committee will each hold an open conference call in September 2014.  Both of these calls are open to the public.

Board Finance Committee Open Call
September 10, 2014 at 1:00 PM EDT

Agenda and Dial-in


  • Overview of recent grant awards
  • Open comments and suggestions from the committee
  • Comments and suggestions from the public


Via the web:

Via telephone
United States: +1 (805) 309-0012
Access Code: 312-488-189
Audio PIN: Shown after joining the meeting
Meeting ID: 312-488-189


Board of Directors Open Call
September 15, 2014 at 3:00 PM EDT

Agenda and Dial-in



Public Session

  • Proposal to amend DPLA Bylaws to allow for increased number of Directors (Call to vote)
  • Overview of draft DPLA Strategic Plan
  • Update from Executive Director
  • Questions/comments from the public

Executive Session

  • Review of DPLA Handbook
  • Conflict of Interest Certification
  • Review of draft DPLA Strategic Plan
  • Funding and financial update


All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

LITA: Introducing the New LITA Blog Writers

Tue, 2014-09-09 13:00

You’ll still be able to find LITA announcements and events posted on the blog, but now there will also be original content by LITA members representing a variety of perspectives, from library students to  public, academic, and special librarians.

The LITA blog also welcomes guest posts. To submit an idea for consideration, please email me at briannahmarshall(at)gmail(dot)com with a bio, brief summary of your post topic, and link to a writing sample if possible.

Without further ado, here are the writers whose posts you’ll be reading in the coming months.

Bryan J. Brown

Bryan received his BS in English and Philosophy from the University of Southern Indiana, and is a recent graduate from Indiana University’s Department of Information and Library Science where he focused on digital libraries and metadata. After graduation, Bryan transplanted to Tallahassee, FL to be a developer at Florida State University Libraries’ Technology and Digital Scholarship Department. His professional interests include Open Source software in libraries and archives, digital preservation and the semantic web. For more information, check out

Lindsay Cronk

Lindsay – librarian, blogger, and adventurer – graduated with her MLIS from Valdosta State in 2012 and has been advocating and serving libraries through her work at LYRASIS ever since. Her interests include open source development models, tools for library marketing and outreach, student research behavior, and later career David Bowie. You can catch her online at her blog or tweeting @linds_bot.

Brittney Farley

Brittney is in her final year as an MSLIS student at the Florida State University’s iSchool. Her specializations include information management/technology and human-computer interaction. She received her BA in History from the University of Florida. She is currently a library assistant in the City of Boca Raton Public Library’s Instructional Services department. Brittney blends her background, as help desk assistant and researcher, to better serve patrons of varying technical understanding.

Lauren Hays

Lauren is the instructional and research librarian at MidAmerica Nazarene University in Olathe, KS. Along with her master’s in library science, she recently completed her second master’s degree in educational technology and also received a graduate certificate in online teaching and learning.  Her professional interests include information literacy, adult learners, online learning, technology, connected learning, and the scholarship of teaching and learning.   In her spare time, she can be found drinking coffee, reading, or planning her next trip.  Follow her on Twitter @Lib_Lauren.

John Klima

John is the Assistant Director of the Waukesha Public Library where one of his many hats is maintaining, upgrading, and innovating technology within the library. Klima wrote a number of articles on steampunk for Library Journal. In his spare time, he is the editor of The Bulletin, the professional publication of the Science Fiction and Fantasy Writers of America. From 2001 to 2013 he edited the Hugo-Award winning magazine Electric Velocipede. Klima has also edited several anthologies including Logorrhea: Good Words Make Good Stories, and Happily Ever After. He co-edited the anthology Glitter & Mayhem with Lynne M. Thomas and Michael Damian Thomas.

Brianna Marshall

Brianna is Digital Curation Coordinator at the University of Wisconsin-Madison, where she manages the institutional repository and develops campuswide services for research data management and curation. She received her Master of Information Science and Master of Library Science from Indiana University’s School of Informatics and Computing in May 2014. From 2012-2014 she was a writer and managing editor for the library student-run blog Hack Library School. Now she is excited to be the new LITA blog editor. She tweets on occasion at @notsosternlib and keeps a blog, too.

Leanne Mobley

Leanne recently earned her MLS from Indiana University and currently works as the Digital Literacy Librarian for the Martin County Library System. Her background is in media production and she is passionate about using technology to bring ideas to life. She is an ardent library lover and still carries her very first library card in her wallet. Find her on Twitter @hey_library.

Leanne Olson

Leanne is a Metadata Management Librarian at Western University in London, Ontario, Canada.  Her main library-related areas of interest include metadata and cataloguing, digital libraries, authority control, teaching, and library history.  She’s also a playwright and lover of the outdoors.  Much of her blogging will be done from her backyard, possibly under five feet of snow.

Michael Rodriguez

Michael is the newly minted eLearning Librarian at Hodges University in southwest Florida, with the faculty rank of Assistant Professor. He graduated in August 2014 with his MLIS from Florida State University and has a background in history and public librarianship. Michael is also a technologist, interested in software customization, distance education, and free web tools and apps. When not doing cool stuff at work, he kayaks among the many mangrove islands off the Florida coast. He tweets @topshelver and blogs at Shelver’s Cove.

Erik Sandall

Erik is Electronic Services Librarian and Webmaster at Mechanics’ Institute in San Francisco, Calif. His professional interests are in integrated library systems, content management systems, online databases, ebooks, and web design and development. When he’s not working on these things, Erik is probably playing soccer or practicing how to open a wine bottle without breaking the cork.

Leo Stezano

Leo is a Project Manager at the Avery Architecture and Fine Arts Library at Columbia University; this is his first library job since receiving his MLIS from Syracuse University in 2011. Previously he spent many years in the private sector, working in Project and Product Management and Business Analysis for a variety of companies. His professional interests include digital librarianship, process optimization, and innovative technical project philosophies. He also enjoys playing soccer and raising two toddlers. You can follow Leo at and on Twitter at @LeoStezano.

Grace Thomas

Grace is a first year grad student working toward a dual-degree MLIS at Indiana University. With a background in English, Computer Science, and Digital Humanities from the University of Nebraska-Lincoln, she is especially interested in digital libraries and archives, and digital preservation. Currently, she works as a Graduate Assistant with associate professors John Walsh and Noriko Hara in the IU School of Informatics and Computing, and on the Petrarchive Digital Archive Project. Grace spends the rest of her time in swimming pools, watching any and all dance performances, and exploring Bloomington by bicycle, occasionally tweeting about all of the above at @gracehthom.

John Miedema: Slow reading six years later. Digital technology has evolved, and so have I. There is a trade-off.

Tue, 2014-09-09 12:59

I was recently interviewed by The Wall Street Journal about slow reading. It has been a few years since I did one of these interviews. I wrote Slow Reading in 2008, six years ago. At the time, the Kindle had just been released and there was a surge of discussion about reading practices, to which I attribute the interest in my little book of research. The request for an interview suggests an ongoing interest in slow reading. So what do I have to say about the subject now?

I used to slow-read often. I would write book reviews, thinking myself progressive in a digital sense for blogging reviews in just four paragraphs. A shift began. My ongoing use of digital technology to read, write and think forced that shift along. I tried to write about that shift in a new online book project (I, Reader) but I failed. The shift was still in progress. I hit a wall at one point. I thought for a time I had reached the end of reading. In 2013, I stopped reading and writing. A year later I started again. I have a good perspective on the shift, but I have no immediate plans to resume writing about it.

So what did I tell the interviewer about slow reading? I confessed that I slow-read print books less often. I re-asserted that “Slow reading is a form of resistance,  challenging a hectic culture that requires speed reading of volumes of information fragments.” I admitted that my resistance is waning. Digital technology has evolved to allow for reading, not just for scanning of information fragments, but also for comprehension of complex and rich material. I was surprised and pleased to discover how digital technology has re-programmed my reading and writing skills to process information more quickly and deeply. I am smarter than I used to be.

I have resumed my writing of book reviews. I restored a selection of book reviews from the past, ones relevant to my current blogging purposes. I will be writing new reviews, probably less often. I will be writing them differently. Currently I am reading Book Was There: Reading in Electronic Times by Andrew Piper. I no longer take notes on paper as I read. I have been tweeting notes. I like the way it is evolving. I use a hashtag for the title and author, and sometimes a reader joins in. When I am done, I will write a very short review, two paragraphs tops, and post it here.

That’s not all I said to the interviewer. I said there has been a trade-off because of digital technology. There is always a trade-off. We just have to decide whether the gains are more than the losses. What have we lost? I lingered on this question because the loss is less than I anticipated. We still read. We still read rich and complex material. Students still prefer print books for serious reading but I expect they are going through the same transition as I did. What is lost, I assert, is long-form writing. Books born print can be scanned and put online, but books born digital are getting shorter all the time. It is no coincidence that my book, Slow Reading, was short. I was already a reader in transition. Digital technology prefers shortness. It is one reason that many kinds of poetry will survive and thrive on the web. Things should be as short and simple as possible (but not simpler, per the quote attributed to Einstein). Long-form novels and textbooks will be lost in time. It is a loss. Is it worth it?

Jakob Voss: Abbreviated URIs with rdfns

Tue, 2014-09-09 09:26

Working with RDF and URIs can be annoying because URIs such as “” are long and difficult to remember and type. Most RDF serializations make use of namespace prefixes to abbreviate URIs, for instance “dc” is frequently used to abbreviate “” so “” can be written as qualified name “dc:title“. This simplifies working with URIs, but someone still has to remember mappings between prefixes and namespaces. Luckily there is a registry of common mappings at

A few years ago I created the simple command line tool rdfns and a Perl library to look up URI namespace/prefix mappings. The program has since also become available as the Debian and Ubuntu package librdf-ns-perl. The newest version (not included in Debian yet) also supports reverse lookup to abbreviate a URI to a qualified name. Features of rdfns include:

look up namespaces (as RDF/Turtle, RDF/XML, SPARQL…)

$ rdfns foaf.ttl foaf.xmlns dbpedia.sparql foaf.json
@prefix foaf: .
xmlns:foaf=""
PREFIX dbpedia:
"foaf": ""

expand a qualified name

$ rdfns dc:title

look up a preferred prefix

$ rdfns geo

create a short qualified name of a URL

$ rdfns dc:title
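The two directions of lookup listed above (expanding a qualified name and abbreviating a full URI) can be sketched in a few lines of Python. This is only an illustration of the idea, not how rdfns is implemented: the small `NAMESPACES` table and the function names `expand` and `abbreviate` are assumptions made for this example, and the three namespace URIs are the commonly used ones for these prefixes rather than values taken from the tool's output.

```python
# Illustrative prefix table (an assumption for this sketch); a real tool
# would draw on a much larger registry of prefix/namespace mappings.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(qname, namespaces=NAMESPACES):
    """Expand a qualified name such as 'dc:title' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError(f"unknown prefix in {qname!r}")
    return namespaces[prefix] + local


def abbreviate(uri, namespaces=NAMESPACES):
    """Reverse lookup: return a qualified name for a URI, or None."""
    best = None
    for prefix, namespace in namespaces.items():
        # Prefer the longest matching namespace so nested namespaces
        # resolve to the most specific prefix.
        if uri.startswith(namespace) and (
            best is None or len(namespace) > len(namespaces[best])
        ):
            best = prefix
    if best is None:
        return None
    return f"{best}:{uri[len(namespaces[best]):]}"


if __name__ == "__main__":
    print(expand("dc:title"))
    print(abbreviate("http://xmlns.com/foaf/0.1/name"))
```

Keeping the mapping in one table means the same data serves both directions, which is the main convenience a shared prefix registry offers.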

I use RDF-NS for all RDF processing to improve readability and to avoid typing long URIs. For instance Catmandu::RDF can be used to parse RDF into a very concise data structure:

$ catmandu convert RDF --file rdfdata.ttl to YAML

Jonathan Rochkind: Cardo is a really nice free webfont

Tue, 2014-09-09 04:39

Some of the fonts on google web fonts aren’t that great. And I’m not that good at picking the good ones from the not-so-good ones on first glance either.

Cardo is a really nice old-style serif font that I originally found recommended on some list of “the best of google fonts”.

It’s got a pretty good character repertoire for latin text (and I think Greek). The Google Fonts version doesn’t seem to include Hebrew, even though some other versions might?  For library applications, the more characters the better, and it should have enough to deal stylishly with whatever letters and diacritics you throw at it in latin/germanic languages, and all the usual symbols (currency, punctuation; etc).

I’ve used it in a project that my eyeballs have spent a lot of time looking at (not quite done yet), and been increasingly pleased by it, it’s nice to look at and to read, especially on a ‘retina’ display. (I wouldn’t use it for headlines though)


DPLA: DPLA & Imgur’s Summer of Archives Comes to a Close

Tue, 2014-09-09 00:58

Back in June, we announced our collaboration with the Digital Public Library of America (DPLA) for the Summer of Archives–an experimental gallery endeavor that brought tons of historical OC gems to User Submitted. From perfectly looping space GIFs, to famous cats of history, to beautiful book covers, to celestial maps, we’re happy to call this experiment a huge and awesome success.

The very last Summer of Archives post is live in User Submitted right now. We’re going out the same way we came in–with historical GIFs!

Huge thanks to the DPLA for sharing this special content with Imgur all summer long. Be sure to check the DPLA Imgur account to revisit all of the submissions. If your thirst for history cannot be quenched, head over to the DPLA website for a vast array of great content.

This blogpost was originally published on the blog (view on

William Denton: Augustus

Tue, 2014-09-09 00:32

A few people recommended Stoner by John Williams to me, and they were right. It’s a gem.

I was in Book City tonight and the clerk was selling a customer on Stoner for a book club. Browsing the new release tables with Williams on my mind I saw a similar new edition from New York Review Books of Augustus, which is about that Augustus.

The first line is a doozy:

… I was with him at Actium, when the sword struck fire from metal, and the blood of soldiers was awash on deck and stained the blue Ionian Sea, and the javelin whistled in the air, and the burning hulls hissed upon the water, and the day was loud with the screams of men whose flesh roasted in the armor they could not fling off; and earlier I was with him at Mutina, where that same Marcus Antonius overran our camp and the sword was thrust into the empty bed where Caesar Augustus had lain, and where we persevered and earned the first power that was to give us the world; and at Philippi, where he traveled so ill he could not stand and yet made himself to be carried among his troops in a litter, and came near death again by the murderer of his father, and where he fought until the murderers of the mortal Julius, who became a god, were destroyed by their own hands.

DuraSpace News: Update 5: Beta Pilot Projects Set to Kick-Off

Tue, 2014-09-09 00:00
From David Wilcox, Fedora Product Manager

Winchester, MA  This is the fifth in a series of updates on the status of Fedora 4.0 as we move from the Beta [1] to the Production Release. The updates are structured around the goals and activities outlined in the July-December 2014 Planning document [2], and will serve to both demonstrate progress and call for action as needed. New information since the last status update is highlighted in bold text.

Library of Congress: The Signal: Hybrid Born-Digital and Analog Special Collecting: Megan Halsband on the SPX Comics Collection

Mon, 2014-09-08 17:29

Megan Halsband, Reference Librarian with the Library of Congress Serial and Government Publications Division.

Every year, The Small Press Expo in Bethesda, Md brings together a community of alternative comic creators and independent publishers. With a significant history of collecting comics, it made sense for the Library of Congress’ Serial and Government Publications Division and the Prints & Photographs Division to partner with SPX to build a collection documenting alternative comics and comics culture. In the last three years, this collection has been developing and growing.

While the collection itself is quite fun (what’s not to like about comics), it is also a compelling example of the way that web archiving can complement and fit into work developing a special collection. To that end, I am excited to talk with Megan Halsband, Reference Librarian with the Library of Congress Serial and Government Publications Division and one of the key staff working on this collection as part of our Content Matters interview series.

Trevor: First off, when people think Library of Congress I doubt “comics” is one of the first things that comes to mind. Could you tell us a bit about the history of the Library’s comics collection, the extent of the collections and what parts of the Library of Congress are involved in working with comics?

Megan: I think you’re right – the comics collection is not necessarily one of the things that people associate with the Library of Congress – but hopefully we’re working on changing that! The Library’s primary comics collections are two-fold – first there are the published comics held by the Serial & Government Publications Division, which appeared in newspapers/periodicals and later in comic books, as well as the original art, which is held by the Prints & Photographs Division.

Example of one of the many comics available through The Library of Congress National Digital Newspaper Program. The End of a Perfect Day. Mohave County miner and our mineral wealth (Kingman, Ariz.) October 14, 1921, p.2.

The Comic Book Collection here in Serials is probably the largest publicly available collection in the country, with over 7,000 titles and more than 125,000 issues. People wonder why our section at the Library is responsible for the comic books – and it’s because most comic books are published serially. Housing the comic collection in Serials also makes sense, as we are also responsible for the newspaper collections (which include comics). The majority of our comic books come through the US Copyright Office via copyright deposit, and we’ve been receiving comic books this way since the 1930s/1940s.

The Library tries to have complete sets of all the issues of major comic titles but we don’t necessarily have every issue of every comic ever published (I know what you’re thinking and no, we don’t have an original Action Comics No. 1 – maybe someday someone will donate it to us!). The other main section of the Library that works with comic materials is Prints & Photographs – though Rare Book & Special Collections and the area studies reading rooms probably also have materials that would be considered ‘comics.’

Trevor: How did the idea for the SPX collection come about? What was important about going out to this event as a place to build out part of the collection? Further, in scoping the project, what about it suggested that it would also be useful/necessary to use web archiving to complement the collection?

Megan: The executive director of SPX, Warren Bernard, has been working in the Prints & Photographs Division as a volunteer for a long time, and the collection was established in 2011 after a Memorandum of Understanding was signed between the Library and SPX. I think Warren really was a major driving force behind this agreement, but the curators in both Serials and Prints & Photographs realized that our collections didn’t include materials from this particular community of creators and publishers in the way that they should.

Small Press Expo floor in 2013

Given that SPX is a local event with an international reputation and awards program (SPX awards the Ignatz) and the fact that we know staff at SPX, I think it made sense for the Library to have an ‘official’ agreement that serves as an acquisition tool for material that we wouldn’t probably otherwise obtain. Actually going to SPX every year gives us the opportunity to meet with the artists, see what they’re working on and pick up material that is often only available at the show – in particular mini-comics or other free things.

Something important to note is that the SPX Collection – the published works, the original art, everything – is all donated to the Library. This is huge for us – we wouldn’t be able to collect the depth and breadth of material (or possibly any material at all) from SPX otherwise.  As far as including online content for the collection, the Library’s Comics and Cartoons Collection Policy Statement (PDF) specifically states that the Library will collect online/webcomics, as well as award-winning comics. The SPX Collection, with its web archiving component,  specifically supports both of these goals.

Trevor:  What kinds of sites were selected for the web archive portion of the collection? In this case, I would be interested in hearing a bit about the criteria in general and also about some specific examples. What is it about these sites that is significant? What kinds of documentation might we lose if we didn’t have these materials in the collection?

Archived web page from the American Elf web comic.

Megan: Initially the SPX webarchive (as I refer to it – though its official name is Small Press Expo and Comic Art Collection) was extremely selective – only the SPX website itself and the annual winner of the Ignatz Award for Outstanding Online Comic were captured. The staff wanted to see how hard it would be to capture websites with lots of image files (of various types). Turns out it works just fine (if there’s no paywall and no subscriber login credentials are required) – so we expanded the collection to include all the Ignatz nominees in the Outstanding Online Comic category as well.

Some of these sites, such as Perry Bible Fellowship and American Elf, are long-running online comics whose creators have been awarded Eisner, Harvey and Ignatz awards. There’s a great deal of content on these websites that isn’t published or available elsewhere – and I think that this is one of the major reasons for collecting this type of material. Sometimes the website might have initial drafts or ideas that are later published, and sometimes the online content is not directly related to published materials, but for in-depth research on an artist or publication this type of related content is often extremely useful.

Trevor: You have been working with SPX to build this collection for a few years now. Could you give us an overview of what the collection consists of at this point? Further, I would be curious to know a bit about how the idea of the collection is playing out in practice. Are you getting the kinds of materials you expected? Are there any valuable lessons learned along the way that you could share? If anyone wants access to the collection how would they go about that?

Megan: At this moment in time, the SPX Collection materials that are here in Serials include acquisitions from 2011-2013, plus two special collections that were donated to us, the Dean Haspiel Mini-Comics Collection and the Heidi MacDonald Mini-Comics Collection.  I would say that the collection has close to 2,000 items (we don’t have an exact count since we’re still cataloging everything) as well as twelve websites in the web archive. We have a wonderful volunteer who has been working on cataloging items from the collection, and so far there are over 550 records available in the Library’s online catalog.

Mini comics from the SPX collection

Personally, I didn’t have any real expectations of what kinds of materials we would be getting – I think that definitely we are getting a good selection of mini-comics, but it seems like there are more graphic novels than I anticipated. One of the fun things about this collection is the new and exciting things that you end up finding at the show – like an unexpected tiny comic that comes with its own magnifying glass or an oversize newsprint series.

The process of collecting has definitely gotten easier over the years. For example, the Head of the Newspaper Section, Georgia Higley, and I just received the items that were submitted in consideration for the 2014 Ignatz Awards. We’ll be able to prep permission forms/paperwork in advance of the show for the materials we’re keeping from this material, and it will help us cut down on potential duplication. This is definitely a valuable lesson learned! We’ve also come up with a strategy for visiting the tables at the show – there are 287 tables this year – so we divide up the ballroom between four of us (Georgia and I, as well as two curators from Prints & Photographs – Sara Duke and Martha Kennedy) to make it manageable.

We also try to identify items that we know we want to ask for in advance of the show – such as ongoing serial titles or debut items listed on the SPX website – to maximize our time when we’re actually there. Someone wanting to access the collection would come to the Newspaper & Current Periodical Reading Room to request the comic books and mini-comics. Any original art or posters from the show would be served in the Prints & Photographs Reading Room. As I mentioned – there is still a portion of this collection that is unprocessed – and may not be immediately accessible.

Trevor: Stepping back from the specifics of the collection, what about this do you think stands for a general example of how web archiving can complement the development of special collections?

Megan: One of the true strengths of the Library of Congress is that our collections often include not only the published version, but also the ephemeral material related to the published item/creator, all in one place. From my point of view, collecting webcomics gives the Library the opportunity to collect some of this ‘ephemera’ related to comics collections and only serves to enhance what we are preserving for future research. And as I mentioned earlier, some of the content on the websites provides context, as well as material for comparison, to the physical collection materials that we have, which is ideal from a research perspective.

Trevor:  Is there anything else with web archiving and comics on the horizon for your team? Given that web comics are such a significant part of digital culture I’m curious to know if this is something you are exploring. If so, is there anything you can tell us about that?

Megan: We recently began another web archive collection to collect additional webcomics beyond those nominated for Ignatz Awards – think Dinosaur Comics and XKCD. It’s very new (and obviously not available for research use yet) – but I am really excited about adding materials to this collection. There are a lot of webcomics out there – and I’m glad that the Library will now be able to say we have a selection of this type of content in our collection! I’m also thinking about proposing another archive to capture comics literature and criticism on the web – stay tuned!

HangingTogether: Innovative solutions for dealing with born-digital content in obsolete forms – Part 1

Mon, 2014-09-08 17:00

[Tweet] AB Schmuland: Obsolete media brings them in at 8 am EDT on a Saturday! #saa14 #s601

I chaired a lightning talk session at SAA 2014 in Washington DC on August 16. The premise was that many archives have received materials in forms that they cannot even read. Archives are acquiring born-digital content at increasing rates and it’s hard enough to keep up with current formats. It makes sense to reach out to the community for help with more obscure media. I found ten speakers who had confronted this problem and figured out innovative solutions to getting material into a form that could be more easily managed.

[Tweet] Jennifer Schaffner: “my name is ___ and I have born-digital on crazy old media that I can barely identify that I have no idea what to do with” #saa14 #s601

The speakers’ stories were so encouraging to others in similar situations that I wanted to share them further.

This is the first of three posts. We start with a talk about the array of media an archives might confront, followed by a talk about an effort to test how much can be done in house.

Lynda Schmitz Fuhrig, the Electronic Records Archivist at the Smithsonian Institution Archives, urged archivists to ingest materials off removable media as soon as possible, if possible. She itemized some of the more typical physical media the SI Archives has and the workstations they maintain to access them. Then she told of some successes they’d had getting content off less typical forms, like Digital Audio Tapes, data tapes, interactive compact discs, and digital videocassettes.

[Tweet] Kevin Schlottmann: National air and space website from 1994 recovered from tape in 2012 #s601 #saa14

Finally she cautioned about some of the media we may be overly confident about: CDs and DVDs – not just that drives to read them are no longer standard issue, but that their life spans can vary dramatically.

She suggested looking to schools, eBay, craigslist, and listservs to obtain out of date equipment and considering whether another archives could help with your format. For formats that simply cannot be read, she raised the possibility of waiting until a researcher wants it and seeing if the researcher is willing to pay to have a vendor transfer the data.

Moryma Aydelott, Special Assistant to the Director of Preservation at the Library of Congress, described developing cross-division in-house workflows for processing 3 ½” and 5 ¼” floppy disks.

The goal was to get a backup copy of the items stored on long term storage, while encouraging standard practices and increasing staff digital competencies. She described the software used (xcopy and FTK Imager) to get complete and unchanged copies of the content. Tabs that make the floppies read-only were used to prevent disks being accidentally overwritten during copying. After reading data off the disks, the workflow included steps to create checksums and other files using the BagIt specification, and for items to be inventoried as they’re saved to tape-based long term storage. The workflows were documented, staff was trained, and processes were customized to particular situations.
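The checksum step in a workflow like this can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not the Library of Congress workflow itself (the talk summary names xcopy and FTK Imager, not this script): the function names `make_bag` and `verify_bag` are hypothetical, SHA-256 is an arbitrary choice of algorithm, and the version string in `bagit.txt` is set to 0.97 as an assumption about which draft of the specification would have been current at the time.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write a checksum manifest.

    Returns the number of payload files recorded in the manifest.
    """
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    data_dir = bag_dir / "data"
    shutil.copytree(source_dir, data_dir)  # also creates bag_dir

    # Bag declaration file; the version number is an assumption here.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # One manifest line per payload file: "<checksum>  <path inside bag>".
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        relative = path.relative_to(bag_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return len(lines)


def verify_bag(bag_dir):
    """Return the payload paths that are missing or fail their checksum."""
    bag_dir = Path(bag_dir)
    failures = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, relative = line.split(maxsplit=1)
        target = bag_dir / relative
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative)
    return failures
```

Calling `make_bag("floppy_copy", "bag_0001")` on a directory of files read off a disk, and later `verify_bag("bag_0001")` after the bag has been moved to storage, would show whether any file changed in transit; both directory names are placeholders.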

[Tweet] Sasha Griffin: Balance outsourcing with developing staff competences in-house #s601 #saa14

Curatorial divisions had been contemplating transferring data off of these media but were unsure how to start, and this project gave them some help and confidence to get going. Now the Preservation Reformatting Division is assembling a lab with scanners, portable drives, and a FRED machine. It will be available to staff in all LC curatorial divisions and those staff are helping to determine other hardware and software the lab should include. A committee has formed to develop scalable ways of processing materials that can’t be processed in house.

Next up: Part 2 will continue with four speakers talking about solutions to particularly challenging formats.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.


OCLC Dev Network: Enhancements Planned for September 14

Mon, 2014-09-08 14:00

In addition to the upcoming VIAF changes we shared last week (currently planned for September 16), a separate  release on September 14 will bring enhancements to a couple of our WorldShare Web services.

Hydra Project: Hydra Connect #2 is a sell-out!

Mon, 2014-09-08 09:39

We’re more than pleased to tell you that Hydra’s second Connect meeting, to be held in Cleveland 30 September – 3rd October, is a sell-out!  Not only have we sold all the tickets, we have a waiting list of people hoping we might manage to find a little more space.  We’re looking forward to seeing 160 faces, friends old and new, at Case Western Reserve University in three weeks.

HangingTogether: Linked Data Survey results 6 – Advice from the implementers

Mon, 2014-09-08 08:00



OCLC Research conducted an international linked data survey for implementers between 7 July and 15 August 2014. This is the sixth--and last--post in the series reporting the results.  

An objective in conducting this survey was to learn from the experiences of those who had implemented or were implementing linked data projects/services.  We appreciate that so many gave advice. About a third of those who have implemented or are implementing a linked data project are planning to implement another within the next two years; another third are not sure.

Asked what they would do differently if they were starting their project again, respondents answered with issues clustered around organizational support and staffing, vocabularies, and technology. One noted that legal issues seriously delayed the release of their linked data service and that legal aspects need to be addressed early.

Organizational support and staffing:

  • Have a clear mandate for the project. Our issues have stemmed from our organization, not the technology or concept.
  • It would have been useful to have a greater in-house technical input.
  • With hindsight we have more realistic expectations. If funding would allow I would hire a programmer to the project.
  • Attempt to garner wider organisational support and resources before embarking on what was in essence a very personal project.
  • We also would have preferred to have done this as an official project, with staff resources allocated, rather than as an ad-hoc, project that we’ve crammed into our already full schedules.
  • Have dedicated technical project manager – or at least a bigger chunk of time.
  • Have more time planned and allocated for both myself and team members.

Vocabularies:


  • Build an ontology and formal data model from the ground up.
  • Align concepts we are publishing with other authorities, most of which didn’t exist at the time.
  • Vocabulary selection, avoid some of the churn related to that process.
  • Make more accurate and detailed records so that it is easier for people using the data to clear up ambiguity of similar names.
  • I might seek a larger number of partners to contribute their controlled vocabularies or thesauri in advance.

Technology:


  • We would immediately use Open Refine to extract and clean up data after the first export from Access
  • We would provide a SPARQL endpoint for the data if we had the opportunity.
  • We would give more thought to service resilience from the perspective of potential denial of service attacks.
  • Define the schema well before generating the records. Use the schema to validate all of the records before storing them in the system’s database.
  • It is still a pity that the Linked Data Pilot is not more integrated into the production system. It would have been easier if the LOD principles had been included in this production system from the beginning.
  • We might have done more to help our vendor understand the complexity of the LCNAF data service as well as the complexity of the MARC authority format.
  • Better user experience; we chose to focus on data mining vs data use.
  • Transforming the source data into semantic form before attempting to process it (clustering, clean up, matching).
  • A stable infrastructure is vital for the scalability of the project.
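
The respondent's suggestion to validate every record against a schema before storing it can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names in `SCHEMA` and the helper names `validate_record` and `partition_records` are hypothetical and do not come from any respondent's system.

```python
# Hypothetical schema: field name -> (expected Python type, required?)
SCHEMA = {
    "id": (str, True),
    "label": (str, True),
    "same_as": (list, False),
}


def validate_record(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, (expected_type, required) in schema.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    # Flag fields the schema does not know about instead of silently storing them.
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


def partition_records(records, schema=SCHEMA):
    """Split records into those safe to store and those needing review."""
    valid, rejected = [], []
    for record in records:
        problems = validate_record(record, schema)
        if problems:
            rejected.append((record, problems))
        else:
            valid.append(record)
    return valid, rejected
```

Only the `valid` list would then be written to the database; the `rejected` list keeps each failing record together with the reasons it failed, so it can be cleaned up and resubmitted.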

General advice

Much of the advice, both for those considering projects to consume linked data and for those considering projects to publish it, clusters around preparation and project management:

  • Ask what benefit doing linked data at all will really have.
  • There is more literature and online information relating to consuming linked data than there was when we started so our advice would be to read as widely as possible and consult with experts in the community.
  • Get a semantic web expert in the team
  • The same as any other project: have a detailed programme.
  • Have a focus. Do your research. Overestimate time spent.
  • Take a Linked Data class
  • Estimate the time required for the project and then double it.  The time to explain details of MARC, EAD, and local practices and standards to the vendor, to test functionality of the application, and to test navigational design elements of the application all require dedicated blocks of time.
  • Bone up on your tech skills.  It’s not magic; there is no wand you can wave.
  • Basic project management, basic data management, basic project planning are really important at the onset.
  • Having a detailed program before starting. Get institutional commitment.  Unless the plan is to do the smallest thing… the investment is great enough to warrant some kind of administrative blessing, at the minimum.
  • Take advantage of the many great (and free) resources for learning about RDF and linked data.
  • Start with a small project and then apply the knowledge gained and the tools built to larger scale projects.
  • Find people at other institutions who are doing linked data so you can bounce ideas off of each other.
  • Plan, plan, plan! Do research. Understand all that there is going on and what challenges you will have before you reach them.
  • Automate, automate, automate

Advice for those considering a project to consume linked data

  • Linking to external datasets is very important but also very difficult.
  • Find authorities for your specific domain from the outset, and if they don’t exist don’t be afraid to define and publish your own concepts.
  • Firm understanding of ontologies
  • Use CIDOC CRM / FRBRoo for cultural heritage sources. It will be far more cost-effective and provide the highest quality of data that can be integrated while preserving the variability and language of your data.
  • Pick a problem you can solve. Start with a core vocabulary. Lean toward JSON-LD instead of RDF/XML. As in agile, fail quickly and often. Store the index in a triplestore.
  • Decide what granularity of data you want to make available as linked data – no semantics for now. We cannot transform our data into linked data in a one-to-one relationship – there will be data that is not available as linked data. If you want to make your data discoverable, then semantics will work best.
  • Sometimes the data available just won’t work with your project. Keep in mind that something may look like a match at first but the devil is in the details. 
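
The warning that something "may look like a match at first" can be illustrated with a cautious matching rule. The sketch below is an assumption-laden example, not a method reported by any respondent: the record fields `name` and `birth_year` and the function names are hypothetical. It accepts a link to an external record only when the normalized names agree and a second detail corroborates the match.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip diacritics and commas, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().replace(",", " ").split())


def is_confident_match(local, candidate):
    """Accept a link only if the names agree AND the birth years agree."""
    if normalize_name(local["name"]) != normalize_name(candidate["name"]):
        return False
    local_year = local.get("birth_year")
    candidate_year = candidate.get("birth_year")
    # A name match without a corroborating detail is left for human review.
    return local_year is not None and local_year == candidate_year
```

Records that share a name but fail the second check are exactly the cases where the details matter, so a workflow along these lines would route them to a person rather than link them automatically.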

Advice for those considering a project to publish linked data

General advice: “Try to consume it first!”

Project management

  • It’s possible to participate in linked data projects even by producing data and leaving the work of linking to others.
  • Managing expectations of content creators is tough – people often have expectations of linked data that aren’t possible. The promise of being able to share and link things up can efface the work required to prepare materials for publication.
  • Always look at what others have done before you. Build a good relationship with the researcher with whom you are working; leverage the knowledge and experience of that person or persons. Carefully plan your project ahead of time, in particular the metadata.
  • Look at the larger surrounding issues.  It is not enough to just dump your data out there.  Be prepared to perform some sort of analytics to capture information as to uses of the data.  Also include a mechanism for feedback about the data and requested improvements/enhancements.  The social contract of linked data is just as important as the technical aspects of transforming and publishing the data.
  • Just do it, but consider if you’re just adding more bad data to the web — dumping a set of library records to RDF is pointless. Consider the value of publishing data. Reusing data is probably more interesting.
  • The assumption that the data needs to be there in order to be used is, I think, wrong. The usefulness of data is in its use; create a service one uses oneself and it is valuable and useful. Whether others actually use it is irrelevant.
  • Pay attention to reusing existing ontologies in order to improve interoperability and user comprehension of your published data.

Technical advice

  • Publish the highest quality possible that will also achieve semantic and contextual harmonisation. You will end up doing it again otherwise, so it is far more cost-effective and gets the best results.
  • Don’t use fixed field/ value data models. For cultural heritage data use CIDOC CRM / FRBRoo.
  • Offer a SPARQL endpoint to your data.
  • Use JSON-LD.
  • Museums need to take a good look at their data and make sure that they create granular data, i.e. each concept (actors, keywords, terms, objects, events, …) needs to have unique ids, which in turn will be referenced in URIs. Also publishing linked data means embracing a graph data structure, which is a total departure from traditional relational data structure: linked data forces you to make explicit what is only implicit in the database.  Modeling data for events is challenging but rewarding. Define what data entities your museum is responsible for… Being able to define URIs for entities means being able to give them unique identifiers, and there are many data issues that need to be taken care of within an institution.  Also, very important is that producing LOD requires the data manager to think differently about data, and not about information.  LOD requires that you make explicit knowledge that is only implicit in a traditional relational database.
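
To make the advice about unique ids, URIs and JSON-LD concrete, here is a minimal sketch of turning one relational row into a JSON-LD document. Everything specific in it is an assumption for illustration: the base URI `https://example.org/id/`, the row fields, and the helper names `mint_uri` and `object_to_jsonld` are hypothetical, and Dublin Core terms are used only as a familiar vocabulary, not as a recommended model.

```python
import json

# Hypothetical base URI; an institution would substitute a namespace it controls.
BASE_URI = "https://example.org/id/"


def mint_uri(entity_type, local_id):
    """Build a stable URI from an entity type and its unique local identifier."""
    return f"{BASE_URI}{entity_type}/{local_id}"


def object_to_jsonld(row):
    """Express a relational row as a small graph: an object linked to its maker."""
    return {
        "@context": {
            "dcterms": "http://purl.org/dc/terms/",
            "title": "dcterms:title",
            # Declaring the value as an @id makes the maker a node, not a string.
            "creator": {"@id": "dcterms:creator", "@type": "@id"},
        },
        "@id": mint_uri("object", row["object_id"]),
        "title": row["title"],
        "creator": mint_uri("actor", row["maker_id"]),
    }


row = {"object_id": 1024, "title": "Bronze figurine", "maker_id": 77}
doc = object_to_jsonld(row)
print(json.dumps(doc, indent=2))
```

The foreign key `maker_id`, which is only implicit as a relationship in the database, becomes an explicit link between two URIs, which is the shift from relational to graph thinking the respondent describes.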

Recommended Resources

This is a compilation of resources–conferences, linked data projects, listservs, websites–respondents found particularly valuable in learning more about linked data.

Conferences valuable in learning more about linked data: American Medical Informatics Association meetings,  Computer Applications in Archaeology, Code4Lib conferences, Digital Library Federation’s forums, Dublin Core Metadata Initiative, European Library Automation Group, European Semantic Web Conferences, International Digital Curation Conference, International Semantic Web Conference, Library and Information Technology Association’s national forums, Metadata and Digital Object Roundtable (in association with the Society of American Archivists), Scholarly Publishing and Academic Resources Coalition conferences, Semantic Web in Libraries, Theory and Practice of Digital Libraries

Linked data projects implementers track:

  • 270a Linked Dataspaces
  • AMSL, an electronic management system based on linked data technologies
  • Library of Congress’ BIBFRAME (included in the survey responses)
  • Bibliothèque Nationale de France’s Linked Open Data project
  • Bibliothèque Nationale de France’s OpenCat: Interesting data model – lightweight FRBR model together with reuse of commonly used web ontologies (DC; FOAF, etc.); scalable open source platform (cubicweb). Opencat aims to demonstrate that data published on can be re-used by other libraries, in particular public libraries.
  • COMSODE (Components Supporting the Open Data Exploitation)
  • Deutsche National Bibliothek’s Linked Data Service
  • Yale Digital Collections Center’s Digitally Enabled Scholarship with Medieval Manuscripts, linked data-based.
  • ESTC (English Short-Title Catalogue): Moving to a linked data model; tracked because one of the aims is to build communities of interest among researchers.
  • Libhub: Of interest because it has the potential to assess the utility of BIBFRAME as a successor to MARC21.
  • LIBRIS, the Swedish National Bibliography
  • Linked Data 4 Libraries (LD4L): “The use cases they created are valuable for communicating the possible uses of linked data to those less familiar with linked data and it will be interesting to see the tools that are developed as a result of the projects.” (Included in the survey responses)
  • Linked Jazz: Reveals relationships of the jazz community, something similar to what a survey respondent wants to accomplish.
  • North Carolina State University’s Organization Name Linked Data: Of interest because it demonstrates concepts in practice (included in the survey responses).
  • Oslo Public Library’s Linked Data Cataloguing: “It is attempting to look at implementing linked data from the point of view of actual need… of a real library for implementation. Cataloguing and all aspects of the system will be designed around linked data.” (Included in the survey responses)
  • Pelagios: Uses linked data principles to increase the discoverability of ancient data through place associations and a major spur for a respondent’s project.
  • PeriodO:  A gazetteer of scholarly assertions about the spatial and temporal extents of historical and archaeological periods; addresses spatial temporal definitions.
  • Spanish Subject Headings for Public Libraries Published as Linked Data (Lista de Encabezamientos de Materia para las Bibliotecas Públicas en SKOS)
  • OCLC’s WorldCat Works (included in the survey responses)

Listservs: Bibliographic Framework Transition Initiative Forum, DCMI (Dublin Core Metadata Initiative) listservs, Digital Library Federation, linked data platform working group


Analyze the responses yourself!

If you’d like to apply your own filters to the responses, or look at them more closely, the spreadsheet compiling all survey responses (minus the contact information which we promised we’d keep confidential) is available at:




About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.
