planet code4lib

HangingTogether: Innovative solutions for dealing with born-digital content in obsolete forms – Part 3

Wed, 2014-09-10 17:00

This is the third and final post about an SAA lightning talk session. Part 1 began with descriptions of the array of media an archives might confront and an effort to test how much can be done in house. In Part 2, we heard from archivists who had dealt with particularly challenging formats. Part 3 includes the service provider perspective, reaching out to the retrocomputing community, and concludes with some words about agreements between repositories and service providers.

Matthew McKinley, Digital Project Specialist at University of California, Irvine Libraries, gave an archives-to-archives service provider perspective. UCLA Special Collections has fifteen 5¼” floppy disks from the Southern California Women for Understanding collection. The collection inventory says the disks contain mailing lists and research data. He imagines the contents might be interesting research material, but he hasn’t been able to begin the project, because the legal departments and other administrators from both institutions have to sign off on the service agreement.

After finally getting the agreement through the UCI process, UCLA is on its third pass of the document.

Determining risks to obsolete media & cost of inaction, how do you factor in criticable & very slow-moving factors, like legal? #saa14 #s601

— Ricc Ferrante (@raferrante) August 16, 2014

His advice is to start with a vendor agreement and workflow for converting other media, such as audiovisual media. Learn from colleagues what roadblocks they ran into. Remember, this is an “innovative service” that administration may not understand or agree with, but you can use that to your advantage: make it known that data could be dying and time is of the essence.

The knowledge level of those involved affects the level of collaboration, oversight, and communication needed. In this situation, both project managers were familiar with archival terms and concepts. If your service provider has more technical experience, but no preservation or library & archive focus, you’ll need to make it very clear exactly what you need in both agreement & ongoing communication: metadata requirements, file format specifications, privacy and security concerns… Explain it and be sure they understand it. It could save you a lot of time and headaches later.

Matthew offered another insight that will make a transfer project go more smoothly: learn as much as you can about when and how the media was created. This is important for provenance metadata, but even more important for accessing the media and converting the content. What you get will depend on the creator’s knowledge or interest level — some may give you detailed version numbers and others may say “I used whatever came on my Mac in 1985.” But any information helps, whether about the donor’s hardware and software or about their computing behavior.

Margo Padilla, Strategic Programs Manager at the Metropolitan New York Library Council (METRO), talked about a born-digital migration service pilot project being coordinated by METRO and the Center for Jewish History, with support from the Delmas Foundation. She harkened back to the 2012 OCLC report, “Swatting the Long Tail of Digital Media: A Call for Collaboration,” which proposed a community-based approach for transferring content off of legacy media to more stable media.

The report suggested that a local institution become a hub, developing expertise and acquiring and maintaining necessary equipment to provide transfer services for institutions without the staff or resources to undertake this sort of work themselves. METRO is coordinating a working group to test this approach. The group is focused on the logistical issues and practical outcomes involved with this process, in order to test and refine real-world workflows, contractual agreements, and deliverables, and to document it for use by others. Partners in the working group include: The American Jewish Historical Society, The Guggenheim Museum Archives, The Leo Baeck Institute, The New-York Historical Society, and Queens Library.

Each organization will explore its own internal processes and requirements for inventorying legacy media, appraising and selecting them for transfer, and determining how they will be reintegrated into collections. So far, each participant has conducted a survey of its materials on digital storage media, and a small sample of test materials (floppies, Zip disks, and optical media) has been delivered to METRO.

As a non-profit consortium currently providing a range of other digital services, METRO is well situated to offer digital migration services on a free or cost-recovery basis to local repositories and is focusing on scoping and developing their service model. They will track cost and labor estimates for developing a collaborative forensics, migration, and media archaeology lab and providing transfer-related services such as metadata, content analysis, normalization, and redundant storage. They will also identify and refine other potential requirements such as insurance coverage, secure storage areas, delivery methods, and protocols for dealing with confidential data.

The group has developed a draft service agreement, taking into consideration questions of potential deliverables, security, turnaround times, and other standard agreement language. METRO is in the process of building a dedicated workstation and is about to begin transferring content, initially providing disk images with file-system analytics and a metadata export. As part of an iterative process to develop a tiered service model, each organization will analyze this initial information and let METRO know what sort of additional forensic processing or analysis they would like to receive. Discussions about expectations and scope of work, contract agreements, workflows, and deliverables are ongoing.

Stephen Torrence, Vice President and Co-Founder of the Museum of Computer Culture, gave his perspective as a data migration service provider. MCC, part of a network of collectors, focuses on hardware restoration and stepped forward as a service provider to assist archives with obsolete media.

He did a successful proof-of-concept project with The Ohio State University Libraries, working with three low-risk diskettes. This helped both parties to work through the process and establish a reasonable base price per disk (for disk image, files, and metadata) and a fee for file format conversion.

He’s also learned from less successful projects. He tried to help out the Texas Department of State Health Services when they needed to recover a medical image from a magneto-optical disk with a proprietary operating system. After much time and many attempts, he was finally able to read the disk, but couldn’t decode the bitstream. From this he learned to establish a fallback base cost to recover some of his investment. When he worked with PSU on the AMSTRAD disks, he found that getting non-US hardware and software was cost-prohibitive. Both the service provider and the client need to acknowledge practical limitations.

Stephen then gave advice about “how to be an awesome client:” Know the origin environment before beginning – it will keep costs down and allow more accurate time estimates. Accept exceptions – be prepared to iterate based on feedback and redefine success, if necessary. Share the impact – involving your provider in the import of the context or content can give them a better sense of purpose.

"saving old born-dig finding aids that saves hours of time and labor of archivists" gives vintage collector "warm fuzzies" #saa14 #s601

— Jennifer Schaffner (@genschaffner) August 16, 2014

Then we heard from an archivist who reached out to another community.

Dorothy Waugh, Digital Archives Project Archivist at Emory University, talked about the problem of knowing where to turn for help in working with obsolete physical media.

Dorothy Waugh at MARBL sez Step 1: admit you have a problem (wrt obsolete media of course) #s601 #saa14

— Sasha Griffin (@griffingate) August 16, 2014

The needs of the Digital Archives unit were often outside the scope of Emory’s desktop and IT support, and the complex technical language frequently found in online resources drove her to pursue options that offered the chance to ask questions in real time and get hands-on practice. She embarked on research into local retrocomputing groups and has since been participating in meetings of the Atlanta Historical Computing Society, or AHCS.

Dorothy discussed the benefits of the hands-on approach at these meetings, one immediate benefit being a broader understanding of the development of personal computers. AHCS is a great resource for locating hard-to-find hardware—she’s been lucky enough to receive a number of floppy disk and Zip disk drives from its members—and for troubleshooting difficulties with obsolete or specialized hardware or software: for instance, Dorothy has worked with members to solve some minor issues affecting Emory’s KryoFlux, a tool that aids in the imaging of aging floppy disks. The group also offered problem-solving assistance on accessing content on some proprietary floppy disks formatted for a mid-1980s VideoWriter word processor.

Throughout, Dorothy has taken full advantage of the opportunity to ask lots of questions, which have consistently been met with patience and enthusiasm. She stressed, though, some key differences between the retrocomputing community and the digital archives profession, not least the emphasis that archives place upon content as opposed to the retrocomputing community’s emphasis upon computers as cultural artifacts. As a result of these differences, Dorothy has learned the importance of being quite explicit about her objectives as a digital archivist, while remaining engaged with the retrocomputing community’s commitment to the preservation of authentic computing environments.

A remaining concern for Dorothy is how her participation with AHCS can evolve into more of a reciprocal partnership and she continues to seek ways by which she can achieve this. Her hope is that, in addition to serving as an opportunity for education and advocacy, these attempts will enable AHCS to identify additional opportunities for two-way collaboration between the two communities.

After the ten talks, I announced a new OCLC Research report, “Agreement Elements for Outsourcing Transfer of Born Digital Content,” by Ricky Erway, Ben Goldman, and Matthew McKinley. This document suggests elements that should be considered when constructing an agreement (or memorandum of understanding) for outsourcing the transfer of born-digital content from a physical medium while encouraging adherence to both archival principles and technical requirements. By being explicit about the expectations, processes, and deliverables, we can hope to develop an effective community-based approach for transferring content off of legacy media and into a form that can be managed, preserved, and used by researchers.

Master lesson of #saa14: You're part of a community of expertise, think of what you can do for others and who you can lean on for help #601

— Helen Schubrt Fields (@magicallyreal) August 16, 2014

We hope these vignettes inspire others facing challenges with archival content in obscure formats to find innovative options for dealing with obsolete media, so you can get on with the really important work of processing, preserving, and providing access to those unique collections.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.

Andromeda Yelton: Why I support the Ada Initiative. (You, too?)

Wed, 2014-09-10 13:30

I need to talk to you about emotion. From a brilliant Code4Lib lightning talk by Mark Matienzo.

I got involved with codes of conduct by accident. People were saying on Twitter that ALA should have one, and I didn’t want it to be one of those good ideas full of tweety energy that go nowhere, so I called their bluff and set up a Google Doc, and everyone else realized this made me a leader before I did.

One of the things that happens when people see you as a leader on this topic is you start to be a keeper of stories. People come to you, out of nowhere, and tell you about that time when someone was creepy. Or someone was drunk. Or someone didn’t keep their hands, or lips, to themselves. Or someone made them feel violated, blindsided, and maybe they smiled and kept the peace because that’s what women are taught to do, and maybe they didn’t understand until later what they should have said, how they should have said it. And maybe no one watching even understood how serious the problem was. Because we’re all taught, all of us, that women are not the protagonists of the story — we’re the love interest, the sassy sidekick, the prop there to raise someone else’s stakes. It’s easy not to see how hurt someone has been if you think the story is about someone else.

Stories change when you identify with a new protagonist. To me, a subtext of discussions about codes of conduct, and so many other issues that arise in social justice discourse communities, is this: we make new statements about who is the protagonist. Whose stakes matter. Whose perspective observers should take. With whom to empathize. We claim our own positions as protagonists in our stories, and demand that others do as well.

I find my work on the Ada Initiative advisory board compelling in large part because there are so many stories, so many emotions — people who have been treated in ways no one should be treated, people who have to waste energy on issues many of their colleagues don’t before they even get to the work they all share, people walking around with these raw and gaping and secret wounds. I want us all to be protagonists. I want us all to be able to spend our energy on love and intellect and creativity, not on responding to harassment, or to the threat of harassment, or even the implicit fear of it.

So that’s why I support the work of the Ada Initiative: because codes of conduct legitimize marginalized people’s own understandings of their (our) own experiences, and give them (us) concrete ways to push back when their (our) lines are crossed. Because AdaCamps give us environments where we don’t have to be the only woman in the room, and therefore space to be all the other things we are, too. Because Ally Skills workshops give men a framework for seeing the ways women’s lives can be entirely, invisibly, surprisingly different, right in front of them, and strategies for enacting their values.

And that’s why I, along with Bess Sadler, Chris Bourg, and Mark Matienzo, am providing a $5120 matching grant for donations from libraryland toward the Ada Initiative’s yearly fund drive. From today through the 15th, we’ll match every dollar pledged through the Ada Initiative’s library-specific campaign page, up to $5120.

If you’re happy that your favorite library association (e.g. ALA, DLF, Code4Lib, SAA, Access, In the Library with the Lead Pipe, Evergreen, Open Repositories, and OLA) has adopted a Code of Conduct, the Ada Initiative deserves part of the credit. If you liked Valerie Aurora’s keynote at Code4Lib 2014, or you’re looking forward to her Ally Skills Workshop at DLF Forum, or you think it’s cool that several library technologists have been to AdaCamp, please consider donating to the Ada Initiative, so that it can keep doing all those things (and hey, maybe more!).

Or if you just like feminist stickers. We can set you up with those, too.

Thank you.

Erin White: Why this librarian supports the Ada Initiative

Wed, 2014-09-10 13:00

This week the Ada Initiative is announcing a fundraising drive just for the library community. I’m pitching in, and I hope you will, too.

The Ada Initiative’s mission is to increase the status and participation of women in open technology and culture. The organization holds AdaCamps, ally workshops for men, and impostor syndrome trainings, and spreads awareness of the need for conference codes of conduct.

Library tech is a great place to be right now

Library tech is an increasingly gender-inclusive space. I’m especially happy to be part of the Code4Lib community. In late 2012 Code4Lib adopted a conference code of conduct, and at this year’s conference, the Ada Initiative’s Valerie Aurora joined us for the whole conference and keynoted on the final day, which was a treat. Meeting her was a big deal for me and I learned a lot from her talk.

And, there are awe-inspiring women role models in library tech-land: Bess Sadler, Andromeda Yelton, Coral Sheldon-Hess, Margaret Heller, and many others.

…but that doesn’t mean we can’t do better

Despite our forward momentum, there are still some fundamental gender gaps in libraryland.

I went to grad school because I liked building websites and wanted to get a theoretical background for my work in IT. Information science seemed like a natural place for me. I didn’t think I would become a librarian, but my path started to veer toward library technology as I finished my program. As that happened, I realized that there was a false distinction between the library science and information science programs at my school. So I wrote my master’s paper about it.

The distinction between the programs bothered me most because of how gender-divided they were, despite the trivial difference in core curricula (two courses). The year I did my research, 2009, the gender proportions were inverse: about 70% of library science students were women and about 70% of information science students were men. What my paper didn’t include, but should have, was a deeper analysis of the stark gender gaps between the programs and how that informed students’ perceptions of and interactions with each other and their career choices. As a woman in the information science program going into a career in librarianship, I was a deep outlier in the program. I identified myself as a technologist, while many women in the library science program did not, even though their tech aptitude was way higher than most mortals’. Something was holding other women back from choosing essentially the same degree path but with a more technical label.

Now with five years under my belt as a librarian, a few things have become clear to me:

The distinction between the grad programs was largely cultural, not curricular.

Libraries are technology. The past, present, and future of libraries is technology.

The future of library leadership is technology.

For such a female-dominated field, libraryland has a disproportionately low number of women in leadership roles and in technology roles. So few women align themselves as technologists even when they are doing work in technology. And so few women align themselves as leaders even when they are poised to take leadership roles.

We need to encourage more women to embrace technology leadership roles in libraries.

What we can do

This is cultural. This is something we need to talk about. This is something we need to work on. Even if that process is uncomfortable.

Thank you: Wren Lanier for clarifying edits and Laura Gariepy for additional ideas.

In the Library, With the Lead Pipe: Open for Business – Why In the Library with the Lead Pipe is Moving to CC-BY Licensing

Wed, 2014-09-10 10:30

Blown Away, CC-BY felixtsao (Flickr).

In brief: Lead Pipe is changing our licensing from CC-BY-NC to CC-BY. Here, we explain why.

In the Library with the Lead Pipe has, since we began publishing in 2008, been run by volunteers with a desire to spread ideas for positive change as widely as possible.

For this reason, we have required that all articles be published under a Creative Commons Attribution Non-Commercial (CC BY-NC 3.0 US) license.

Publishing under a CC BY-NC license has always been viewed by Lead Pipe as a way of balancing our commitment to authors (by ensuring they retain their own copyrights and are protected from unrecompensed commercial exploitation of their work) with our commitment to our readers (by ensuring our articles can be openly and freely accessed on our own site and distributed elsewhere for non-commercial purposes).

In the first half of 2014, however, as we took time to reflect on what had changed in over five years of publishing, we began to debate the merits of moving to a more permissive license.

Why change?

The central tension for any publisher is that of distribution versus control. The more effectively reading and publishing can be controlled, the less widely an article will be distributed and read. Alternatively, if wide distribution is given preference, we must relinquish control over how and to whom it is distributed. There are all sorts of solutions to this problem, depending on the goals of the author and the publisher.

As the Lead Pipe Editorial Board worked through our new documentation, we re-assessed our own mission. Our About page states that:

Lead Pipe intends to help improve communities, libraries, and professional organizations. Our goal is to explore new ideas and start conversations, to document our concerns and argue for solutions.

The question we began to discuss is whether our licensing matched this mission. Implied in this mission is that the conversations we start include as many people as possible. That is, we should privilege wide distribution over control. The definition of ‘non commercial’ in CC BY-NC licenses, however, is vague, and a recent court case indicates it may be much more narrowly applied than we anticipated. In this case, it was found that use of a CC-BY-NC licensed photo on the website of the German national broadcaster breached copyright. The Court found that ‘non-commercial’ use means ‘personal’ use, so use by a non-profit organisation on its advertising-free website still infringed. This case, and other legal opinions in Germany and other countries, potentially make Lead Pipe articles unusable even by the people and organizations we hope to support, such as educators and public broadcasters.

The discussion really kicked off in February when Micah (then a member of the Board) proposed an article co-authored by Chealsye Bowley, and suggested that (based on his research) we may need to re-assess our licensing. In March, Hugh was contacted by McGraw-Hill, which sought to negotiate terms to republish one of his (CC-BY-NC licensed) blog posts for use in school assessment software. Both Micah’s research and Hugh’s experience resulted in a proposal to the Lead Pipe Editorial Board regarding a change to CC-BY licensing. With Open Access mandates creeping closer to requiring CC-BY (especially in the UK, for example the Wellcome Trust and RCUK), and all our authors writing for love rather than money, we reached consensus relatively easily.

Back to the future

The aim of relicensing previously published articles is primarily to ensure that, to the greatest extent possible, our readers are able to understand what rights and obligations they have when re-using or re-publishing Lead Pipe articles. Whilst we suspected it was unattainable when we began this process, our ultimate goal has been to ensure all articles published by Lead Pipe are licensed the same way to ensure clarity regarding license terms.

Since Lead Pipe does not hold the copyright in the articles we publish, we needed to ask each of our authors to change the licensing on their articles. Not including (then) current Editorial Board members, this required us to contact 65 authors by email with an explanation of what we were asking, and why (the text of our initial email is included in the appendix below). We first did so on 13 July, and after a second nudge we gained agreement from 52 of our authors, with two authors declining to change their licensing. We have so far been unable to contact the remaining eleven authors, but continue our efforts to do so. Where an author (including any co-author) has not clearly stated that they agree to re-license, the existing license is retained.

It is important to note that this is a request we have made of our authors, but it is they who hold the copyright in, and decide how to license, their articles. As we wrote in our initial email to authors:

We recognize that if Lead Pipe required CC-BY licensing at the time you wrote your article, you may have chosen another publication with which to publish, or chosen not to write it at all.

Essentially, we wrote to our authors stating that we had changed our minds about the most appropriate form of licensing for their articles. Many were happy to change, some requested more information before making a decision, and some made an informed decision not to change their licensing. One of the lessons here is the value of retaining your own copyright as an author, and therefore retaining control over who can re-publish it and under what circumstances.

More open, more access

Lead Pipe has always aimed to be open, progressive, and a force for positive change. This requires us to go further than simply publishing provocative articles. We must be open to changing our own behaviour and procedures when evidence suggests they inhibit our goals. We have come to the view that changing the licensing of Lead Pipe articles will better align our practice with our goals.

From today, all new articles, and most existing articles, will be published under a Creative Commons Attribution International 4.0 (CC-BY 4.0) license.

Where previously-published authors have requested that the existing license remain, or we have been unable to ascertain their wishes, we have noted the non-commercial licensing terms at the end of their articles.

Lead Pipe would like to thank all of our authors for their positive and gracious responses to our relicensing process. We look forward to working with other amazing authors to explore new ideas and start conversations, and to helping spread those ideas even more widely.

Thanks to Katrina McAlpine for editing and advice on this editorial.

References and further reading

Daught, G, 25 March 2014. ‘I dropped the “NC” from my Creative Commons license’, Alpha Omega | Open Access.

Moody, G, 27 March 2014. ‘German Court Says Creative Commons ‘Non-Commercial’ Licenses Must Be Purely For Personal Use’, Techdirt.

Nowviskie, B, 11 May 2011. ‘Why Oh Why, CC-BY?’

Research Councils UK, 8 April 2013. RCUK Policy on Open Access and Supporting Guidance.

Rundle, H, 23 March 2014. ‘Creative Commons, Open Access, and hypocrisy’.

Rundle, H, 2 January 2013. ‘Mission Creep – a 3D printer will not save your library’.

Vandegrift, M & Bowley, C, 23 April 2014. ‘Librarian, Heal Thyself: A Scholarly Communications Analysis of LIS Journals’, In the Library with the Lead Pipe.

Wellcome Trust, June 2014. Open access: CC-BY licence required for all articles which incur an open access publication fee – FAQ.

Appendix: text of email to authors

Dear [author]

Since 2008, In the Library with the Lead Pipe has been publishing posts and articles that inspire, challenge and provoke librarians around the world. We are honoured to have had you as one of our authors. When Lead Pipe launched, we were determined not only to be inspiring and challenging to the profession, but also to be open. As a Lead Pipe author, we asked you to provide us with ‘first publisher’ attribution, and assign a Creative Commons Attribution-Non Commercial (CC-BY-NC) license to your article.

After more than five years of publishing, we have recently taken time to consider our position as a publisher. Lead Pipe started as a peer-reviewed group blog, but we have now repositioned ourselves as an open access, open peer reviewed journal. As part of this process we have reconsidered our licensing, and will be moving to a more permissive Creative Commons Attribution (CC-BY) license. In making this decision, we have recognized that CC-BY-NC licenses are surrounded by confusion and controversy. The definition of ‘non commercial’ is vague, and a recent court case indicates it may be applied much more narrowly than we anticipated, potentially making Lead Pipe articles unusable even by the people and organizations we hope to support, such as educators and public broadcasters.

As an Editorial Board, we aim for all the work published in Lead Pipe to find a wide audience and for it to help change library practice for the better. By removing the ‘non commercial’ license provisions we feel that this aim will more easily be achieved. We would therefore like not only to license all future works CC-BY, but also relicense all previously published articles from CC-BY-NC to CC-BY.

As one of our existing authors, we ask that you agree to relicense your article to a Creative Commons Attribution 4.0 International license. Our goal in relicensing all previously published articles is to provide clarity for our readers by ensuring our licensing is consistent throughout our website.

If you would like to read more about CC-BY versus CC-BY-NC licensing, we recommend the following:

If you are willing to agree to relicense your article, please reply to this email with the words:  “I agree to re-license all work published under my name in In the Library with the Lead Pipe, to a Creative Commons Attribution 4.0 International License”.

Whilst we would prefer for all authors to re-license their articles, we understand that you may prefer not to do so. We recognize that if Lead Pipe required CC-BY licensing at the time you wrote your article, you may have chosen another publication with which to publish, or chosen not to write it at all. If you would prefer your article to remain CC-BY-NC, we will ensure that its licensing is clearly indicated as such.

If you are sure that you would prefer to keep the existing CC-BY-NC license, please reply to this email with the words “Please keep the licensing of my article as CC-BY-NC”.

If you have any questions about re-licensing your work, or would like more information, please let us know by emailing [email].


Brett Bonfield
Ellie Collier
Erin Dorney
Emily Ford
Gretchen Kolderup
Hugh Rundle
Coral Sheldon-Hess
Micah Vandegrift

DuraSpace News: Islandora 7.x-1.4 Release Timeline

Wed, 2014-09-10 00:00
From Samantha Fritz, Interim Project and Community Manager, Islandora Foundation

Charlottetown, Prince Edward Island, CA

Islandora is pleased to announce that the upcoming 7.x-1.4 release is well underway! Islandora 7.x-1.4 will see the return of Islandora Solr Views, as well as the addition of one new module, Islandora Video.js, in addition to a number of improvements to existing modules and tools.

DuraSpace News: Islandora Camp Set for Denver, Colorado, Oct. 13-15

Wed, 2014-09-10 00:00
From Samantha Fritz, Interim Project and Community Manager, Islandora Foundation

Charlottetown, CA

Islandora Camp will be in Colorado this fall. Hosted by the University of Denver in association with the Colorado Alliance of Research Libraries, this first Central-US camp will take place October 13-15, 2014.

DuraSpace News: REGISTER: VIVO Project to Host Hackathon at Cornell, Oct. 13-15

Wed, 2014-09-10 00:00

From Layne Johnson, VIVO Project Director

Please Note: If you plan to attend the Hackathon please make your hotel reservations ASAP—the $129 rate at the Ithaca Hotel expires this Friday, September 12.

Roy Tennant: The Power of Powers of 2

Tue, 2014-09-09 23:43

Despite the fact that I consider myself a lifelong feminist, I am still surprised and dismayed at how easily I can overlook discriminatory behavior toward women. Or not even discriminatory behavior but things that are much more subtle, like situations that discourage women from speaking up or participating.

So when a colleague forwarded a notice about the Ada Initiative’s “Allies Workshop” (now called the “Ally Skills” workshop), I jumped at the chance to go. I had heard of the Ada Initiative and I was interested to hear what they had to tell me. The workshop I attended in San Francisco included mostly men from startup technology companies. I learned more about the subtle ways in which discrimination occurs and how to be a better ally to those experiencing such discrimination. I was also surprised and pleased to discover that I learned a lot from situations that others had experienced and described in our interactive sessions.

I left the workshop feeling more knowledgeable and empowered to help make a difference. But more importantly, I learned more about how my own behavior can be modified to provide space for voices that might otherwise go unheard. I also left being impressed with Valerie Aurora, who led the workshop. Little did I know at the time that we would cross paths again soon.

The 2014 Code4Lib Conference was able to sign on Valerie Aurora as a keynote speaker, and she requested to be interviewed rather than to give a speech. I was happy to offer to do the interview, which you can see here. It was one of the best sessions we’ve ever had at Code4Lib, and not because of the interviewer.

Now the Ada Initiative is asking for our help and I’m happy to help publicize the library-specific campaign to raise money to support and expand their work. Suggested donation amounts use powers of 2, which is totally a geek thing. Andromeda Yelton, Bess Sadler, Chris Bourg, and Mark Matienzo joined forces to help raise $4,096 (2 to the 12th power) in matching funds. This means that any donation you make is automatically doubled. The campaign runs from now to September 15th, so donate now!

Feel the power of using powers of 2 to build a more equitable and just society.

Bess Sadler: The Ada Initiative Has My Back

Tue, 2014-09-09 22:49

The Ada Initiative has my back. In the past several years they have been a transformative force in the open source software community and in the lives of women I know and care about. To show our support, Andromeda Yelton, Chris Bourg, Mark Matienzo and I have pledged to match up to $5120 of donations to the Ada Initiative made through this link before Tuesday September 16. That seems like a lot of money, right? Well, here’s my story about how the Ada Initiative helped me when I needed it most.

Thanks to the Ada Initiative, having a conference code of conduct has become an established best practice, and it is changing conference culture for the better. I’m so proud of the many library conferences and organizations who have adopted Code of Conduct policies this year. However, just because a conference has a policy in place doesn’t mean there won’t be any problems. I’d like to share something that happened to me this year, the way the Ada Initiative helped me and the conference in question deal with it, and how things have since improved.

I gave a talk at code4lib a few years ago called “Vampires vs Werewolves: Ending the War Between Developers and Sysadmins with Puppet”. The talk was well received enough that it became a little bit of a meme, and last year PuppetConf asked me if I’d give an updated version of it for PuppetConf 2013.

It was exciting to participate in such a huge conference. PuppetConf is well funded and professionally managed. They have a code of conduct in place, friendly and helpful staffers, and high quality content. I was having a great time up until right before my talk.

Unfortunately, the talk before mine was titled “Nobody Has to Die Today: Keeping the Peace With the Other Meat Sacks.” I watched this talk on the video monitor from backstage, while getting hooked up to a lapel microphone, already a little nervous about facing such a large audience.

The speaker was a large heavily muscled man who was shouting more than speaking into the microphone. He was shouting about violence, and about how many people get murdered in the workplace. He particularly mentioned the fact that murder is the number one cause of death for women in the workplace. I felt my blood running cold. My body felt flooded with fear, and I wanted to run. He went on to discuss the many ways he personally had hurt people, through his work as a bouncer, in martial arts, or just because someone made him angry. At this point I was literally shaking. I have been on the receiving end of violence. I have known people who have been murdered. I know people, especially women, who have been hospitalized with the kinds of injuries he was graphically describing having inflicted. In spite of the small print disclaimer on his slides that this presentation was not encouraging violence, it was doing precisely that. If you don’t communicate technical requirements in the way he specifies, apparently, you will get what’s coming to you and you will deserve it.

The conference staff backstage were horrified. Some high profile people in the audience were walking out. It was clear that I was upset (I was trying not to hyperventilate at this point) and someone kept asking me if I wanted to file a code of conduct complaint. The thought of this made my panic even worse. I could easily picture filing a complaint and then paying for it with my life, when this guy found out and beat me to death in the parking lot after the conference. I said I did not want to file a complaint. I tried to take deep breaths and to not break down crying. I was determined to give my talk in spite of my shaky emotional state.

And I did. I delivered my presentation, which went surprisingly well, except for the fact that in the video I am swaying back and forth. I don’t usually do that when I speak, and I read it as the outward manifestation of how upset I was.

Afterwards, I thought for a long time about writing to the conference with my concerns. I started to do so several times, but I always chickened out. It was too easy for me to picture this guy learning my name and coming after me. Before you dismiss me as paranoid please consider the stories of Anita Sarkeesian and Kathy Sierra. Women in technology face worse than sexist jokes. We face assault. We face death threats. If you defend the status quo, understand what you are defending.

It wasn’t until this year’s PuppetConf call for proposals that I complained. The conference had liked my talk last year, and invited me to submit another talk this year. I wrote to decline, and told them why. I also sent a copy of my letter to Valerie Aurora, asking for the advice of the Ada Initiative.

I am very pleased to say that PuppetConf took my concerns seriously. Working with the Ada Initiative, they strengthened their code of conduct, put more screening measures in place for presentations, and improved training for conference staff on how to deal with problematic situations. PuppetLabs is an example of a company that is doing things right. They have specific outreach programs to get more women involved in the Puppet community and they are pursuing similar strategies to encourage participation from underrepresented racial groups. I feel good about the fact that I’m sending members of my staff to PuppetConf 2014, and at this point I would gladly speak at the conference again.

As upsetting as this incident was, this is a story with a happy ending. Because the Ada Initiative exists, both PuppetConf and I had someone to go to for guidance in how to improve the situation. Honestly, I still feel a little afraid about writing this post. But I also believe that nothing gets better until people take the risk of speaking out publicly. I am choosing to take that risk, in order to better communicate about why this work matters.

The Ada Initiative continues to do great things. You can read their 2014 progress report here. I am particularly excited about the Ally Skills workshop that will be offered at the Digital Library Federation Forum on October 29. Today, librarians are showing our love for the Ada Initiative. Watch for blog and social media posts from friends in library land who will be sharing more stories about why the Ada Initiative matters, and follow the action on twitter under the hashtag #libs4ada. Join us in supporting the Ada Initiative’s mission and donate today!

State Library of Denmark: Small scale sparse faceting

Tue, 2014-09-09 20:35

While sparse faceting has a profound effect on response time in our web archive, we are a bit doubtful about the number of multi-billion-document Solr indexes out there. Luckily we also have our core index at Statsbiblioteket, which should be a bit more representative of your everyday Solr installation: single-shard, 50GB, 14M documents. The bulk of the traffic is user-issued queries, which involve spellcheck, edismax qf & pf on 30+ fields and faceting on 8 fields. In this context, the faceting is of course the focus.

Of the 8 facet fields, 6 are low-cardinality and 2 are high-cardinality. Sparse was only active for the 2 high-cardinality ones, namely subject (4M unique values, 51M instances (note to self: 51M!? How did it get so high?)) and author (9M unique values, 40M instances).

To get representative measurements, the logged response times were extracted for the hours 07-22; there’s maintenance going on at night and it makes measurement unreliable. Only user-entered searches with faceting were considered. To compare response times before and after enabling sparse faceting, the data for this Tuesday and last Tuesday were used.

50GB / 14M docs, logged timing from production, without (20140902) and with (20140909) sparse faceting

The performance improvement is palpable, with response time being halved compared to the non-sparse faceting. Reading the logs closely, the time spent on faceting the high-cardinality fields is now in the single-digit milliseconds for nearly all queries. We’ll have to do some tests to see what stops the total response time from getting down to that level. I am guessing spellcheck.
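The comparison described above (daytime hours only, faceted searches only, one Tuesday against the next) can be sketched in a few lines of Python. This is a minimal illustration, not the tooling actually used: the log format, the sample values and the `daytime_facet_times` helper are all assumptions made up for the example.

```python
from datetime import datetime
from statistics import median

# Hypothetical log format (an assumption, not Solr's real log layout):
# "<ISO timestamp> <response time in ms> <facet|nofacet>"
# The values below are invented sample data, not measurements.
SAMPLE_LOG = """\
2014-09-02T06:30:00 900 facet
2014-09-02T08:15:00 400 facet
2014-09-02T12:00:00 600 facet
2014-09-02T13:00:00 50 nofacet
2014-09-09T08:15:00 200 facet
2014-09-09T12:00:00 300 facet
2014-09-09T23:10:00 1200 facet
"""


def daytime_facet_times(log_text, day, first_hour=7, last_hour=22):
    """Return response times (ms) of faceted searches on `day` between the given hours."""
    times = []
    for line in log_text.splitlines():
        stamp, millis, kind = line.split()
        when = datetime.fromisoformat(stamp)
        if when.date().isoformat() != day:
            continue  # other day
        if not (first_hour <= when.hour < last_hour):
            continue  # night-time maintenance window
        if kind != "facet":
            continue  # only searches with faceting are compared
        times.append(int(millis))
    return times


before = daytime_facet_times(SAMPLE_LOG, "2014-09-02")
after = daytime_facet_times(SAMPLE_LOG, "2014-09-09")
print("median without sparse:", median(before), "ms")
print("median with sparse:", median(after), "ms")
```

Comparing medians (or higher percentiles) rather than means keeps a handful of slow outliers from dominating the picture.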

As always, sparse faceting is readily available for the adventurous at SOLR-5894.

Nicole Engard: Bookmarks for September 9, 2014

Tue, 2014-09-09 20:30

Today I found the following resources and bookmarked them:

  • Color Oracle: a free color blindness simulator for Windows, Mac and Linux.

Digest powered by RSS Digest

The post Bookmarks for September 9, 2014 appeared first on What I Learned Today....

Related posts:

  1. Another Satisfied Customer
  2. Amazon’s bestselling laptop is open source!
  3. September Workshops

Roy Tennant: In Memoriam: Anne Grodzins Lipow

Tue, 2014-09-09 20:23

I was reminded by her daughter on Facebook that Anne Grodzins Lipow passed away ten years ago today. In commemoration of that horrible event, I am posting the Foreword I wrote for Anne’s festschrift that was published in 2008.

On September 9, 2004 librarianship lost a true champion. Anne Grodzins Lipow was unique – of all the testimonials I’ve read about her that is one undeniable truth. We each knew a different set of Anne’s qualities, or engaged with her in a different way, but in the end it all came down to the fact that Anne was someone we could all say was “larger than life”.

The days after her passing were filled with personal testimonials that were mostly lodged as comments on the Infopeople blog. It was an odd experience for me to read these messages and realize that as much as I felt that I knew her, I barely knew her at all. I was like the proverbial blind man with his hands wrapped around one part of the elephant, while others had a firm grip on other body parts and would describe a very different animal. My reality, as deeply felt as it was, was only a pale shadow of the whole.

But for all that, it was a long, long shadow. As a newly-minted librarian at UC Berkeley in the second half of the 1980s, I knew Anne as the person who led the outreach and instructional efforts of the library. Before long, she saw in me the potential to be a good teacher, despite my fear of public speaking, so she pulled me into her program and began teaching me everything she knew about speaking, putting on workshops, making handouts, etc. Under her tutelage, I taught classes such as dialup access to the library catalog, when 300bps modems were still common.

As the Internet began making inroads into universities, Anne was there with newly developed workshops on how to use it. She was convinced very early on, as was I, that the Internet would be an essential technology for libraries. This led to her approaching my colleague John Ober (then on faculty at the library school at Berkeley) and me about doing a full-day Internet workshop scheduled to coincide with the 1992 ALA Annual Conference in San Francisco. Using a metaphor of John’s, we called it “Crossing the Internet Threshold”.

In preparing for the workshop, we created so many handouts that we needed to put them into a binder that began to look increasingly like a book in the making. With typical Anne flair, she arranged for the gifted librarian cartoonist Gary Handman (also our colleague at Berkeley) to create a snazzy cover for the binder, that she also used to create T-shirts (which many of us have to this day).

Anne knew enough about workshops to do a “trial run” before the big day, so we did one for UC Berkeley library staff a couple weeks before, which gave us feedback essential to making an excellent workshop. In the end, the workshop was such a hit that Anne ran with it. She took the binder of handouts we had created and made a book out of it — the first book of her newly-created business called Library Solutions Institute and Press. Her decision to publish the book herself rather than seek out a publisher was so typical of Anne. And how she did it will tell you a lot about her.

Despite the higher cost, Anne insisted on using domestic union printing shops for printing. While other publishers were publishing books overseas for a fraction of the cost, publishing for Anne was a political and social activity, through which she could do good for those around her. It was very important to her to treat people with respect and kindness, and she did it so well. That was the kind of person Anne was.

While every publisher I have worked with since Anne has insisted they are incapable of paying royalties any more frequently than twice a year, Anne paid her authors monthly. And whereas other publishers wait months to pay you for royalties earned long before, Anne would pay immediately. This meant that when books were returned, as they sometimes were, she took the loss for having paid the author royalties on books that had not been sold. That was the kind of person Anne was.

Anne continued to blaze new trails after libraries began climbing on the Internet bandwagon, due in no small measure to her books and workshops on the topic. Anne became a well-known and coveted consultant on a number of topics, but in particular on reference services.

Her “Rethinking Reference” institutes and book were widely acclaimed, and her book The Virtual Reference Librarian’s Handbook (2003) demonstrated that Anne was always at the cutting edge of librarianship. That was the kind of person Anne was.

I visited her after her cancer was diagnosed and after her treatment had failed. We all knew there was no hope, that she had only a matter of weeks to live. Despite the obvious ravages of the illness, Anne’s outlook remained bright and welcoming. She was happy to have her friends and family around her, and we talked of many things except the dark shadow that hung over us all. Even then, she was happy to see whoever came by, and to talk with them with a smile and good wishes. That was the kind of person Anne was.

A piece of all my major professional accomplishments I owe to Anne, and her great and good influence on me. She would deny this, despite its truth, wanting all the credit to accrue to me alone. That was the kind of person Anne was.


Each one of us who has contributed to this volume has been touched by Anne in our own, quite personal ways. Some of us have known of her work mostly by reputation and reading, while others were blessed with more direct and personal contact. But the fact remains that Anne cast a long professional shadow that will affect many librarians yet to come.

For those of us who created a monument of words to someone we love and respect, Anne had one final gift to give. As anyone who has ever created a present for someone they love knows, in so doing you think about the person for whom you are making the gift. Therefore, the authors of this volume have all spent more time with Anne, and as always it was time well spent. We know our readers will count it so too.

31 January 2008, Sonoma, CA

LITA: LITA Midwinter Institutes

Tue, 2014-09-09 19:34

Registration for LITA’s Midwinter Institutes opened today with ALA’s joint registration! Whether you’ll be attending Midwinter or are just looking for a great one day continuing education event in the Chicago/Midwest area, we hope you’ll join us.

When? All workshops will be held on Friday, January 30, 2015, from 8:30-4:00

Cost for LITA Members: $235  (ALA $350 / Non-ALA $380)
(If you are a member of LITA use special code LITA2015 to receive the price of $235.)

Workshop Descriptions:

Developing mobile apps to support field research
Instructor: Wayne Johnston, University of Guelph Library

Researchers in most disciplines do some form of field research. Too often they collect data on paper, which is not only inefficient but vulnerable to data loss. Surveys and other data collection instruments can easily be created as mobile apps with the resulting data stored on the campus server and immediately available for analysis. The apps also enable added functionality like improved data validity through use of authority files and capturing GPS coordinates. This support to field research represents a new way for academic libraries to connect with researchers within the context of a broader research data management strategy.

Introduction to Practical Programming
Instructor: Elizabeth Wickes, University of Illinois at Urbana-Champaign

This workshop will introduce foundational programming skills using the Python programming language. There will be three sections to this workshop: a brief historical review of computing and programming languages (with a focus on where Python fits in), hands on practice with installation and the basics of the language, followed by a review of information resources essential for computing education and reference. This workshop will prepare participants to write their own programs, jump into programming education materials, and provide essential experience and background for the evaluation of computing reference materials and library program development. Participants from all backgrounds with no programming experience are encouraged to attend.

From Lost to Found: How User Testing Can Improve the User Experience of Your Library Website
Instructors: Kate Lawrence, EBSCO Information Services; Deirdre Costello, EBSCO Information Services; Robert Newell, University of Houston

When two user researchers from EBSCO set out to study the digital lives of college students, they had no idea the surprises in store for them. The online behaviors of “digital natives” were fascinating: from students using Google to find their library’s website, to what research terms and phrases students consider another language altogether: “library-ese.” Attendees of this workshop will learn how to conduct usability testing, and participate in a live testing exercise via Participants will leave the session with the knowledge and confidence to conduct user testing that will yield actionable and meaningful insights about their audience.


More details about these workshops will be coming in interviews with the instructors in October! If you have a question you’d like to ask the instructors, please contact LITA Education Chair Abigail Goben at [firstnamelastname]





LITA: 2014 LITA Forum: early bird rates available through Sept. 15

Tue, 2014-09-09 19:19
Don’t miss your chance to save up to $50 on registration for the 2014 LITA Forum “From Node to Network” to be held Nov. 5-8, 2014 at the Hotel Albuquerque in Albuquerque, N.M.

Don’t forget to book your room at the Hotel Albuquerque by Oct. 14, 2014 to guarantee the LITA room rate.

This year’s Forum will feature three keynote speakers:

  • AnnMarie Thomas, Engineering Professor, University of St. Thomas
  • Lorcan Dempsey, Vice President, OCLC Research and Chief Strategist
  • Kortney Ryan Ziegler, Founder, Trans*h4ck

More than 30 concurrent colleague-inspired sessions and a dozen poster sessions will provide a wealth of practical information on a wide range of topics.

Networking opportunities, a major advantage of a smaller conference, are an important part of the Forum. Take advantage of the Thursday evening reception and sponsor showcase, the Friday networking dinners or Kitchen Table Conversations, plus meals and breaks throughout the Forum to get to know LITA leaders, Forum speakers, sponsors, and peers.

This year two preconference workshops will also be offered.

Linked Data for Libraries: How libraries can make use of Linked Open Data to share information about library resources and to improve discovery, access, and understanding for library users
Led by: Dean B. Krafft and Jon Corson-Rikert, Cornell University Library

Linked Open Data (LOD) provides an expressive and extensible mechanism for sharing information (metadata) about all the materials research libraries make available. In this workshop the presenters will introduce the principles and practices of creating and consuming Linked Open Data via a series of examples from sources relevant to libraries. They will provide an introduction to the technologies, tools, and types of data typically involved in creating and working with Linked Open Data and the semantic web. The preconference will also address the challenges of data quality, interoperability, authoritativeness, privacy, and other issues accompanying the adoption of new technologies as these apply to making use of Linked Open Data.

Learn Python by Playing with Library Data
Led by: Francis Kayiwa, Kayiwa Consulting

What can be more fun than learning Python? Learning Python by hacking on library data! In this workshop, you’ll learn Python basics by reading files, looking at MARC (yes MARC), building data structures, and analyzing library data (those logs aren’t going to appreciate themselves). By the end, you will have set up your Python environment, installed some useful packages, and learned how to write simple programs that you can use to impress your colleagues back at work.

2014 LITA Forum sponsors include EBSCO, Springshare, @mire, Innovative and OCLC.

Visit the LITA website for more information.

Library and Information Technology Association (LITA) members are information technology professionals dedicated to educating, serving, and reaching out to the entire library and information community.   LITA is a division of the American Library Association.

LITA and the LITA Forum fully support the Statement of Appropriate Conduct at ALA Conferences

Bill Dueber: Help me test yet another LC Callnumber parser

Tue, 2014-09-09 19:10

Those who have followed this blog and my code for a while know that I have a long, slightly sad, and borderline abusive relationship with Library of Congress call numbers.

They're a freakin' nightmare. They just are.

But, based on the premise that Sisyphus was a quitter, I took another stab at it, this time writing a real (PEG-) parser instead of trying to futz with extended regular expressions.

The results, so far, aren't too bad.

The gem is called lc_callnumber, but more importantly, I've put together a little heroku app to let you play with it, and then correct any incorrect parses (or tell me that it worked correctly) to build up a test suite.

So…Please try to break my LC Callnumber parser!

[Code for the app itself is on github; pull requests for both the app and the gem joyously received]
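To give a feel for what "a real parser" means here, as opposed to one large regular expression, the following Python sketch consumes a call number piece by piece and fails loudly on leftover input. It is only an illustration under strong assumptions: the grammar (class letters, class number with optional decimal, any number of cutters, optional four-digit year) is deliberately simplified, and the `Scanner` and `parse_callnumber` names are hypothetical. It is not the grammar or the API of the lc_callnumber gem, and real call numbers will break it quickly.

```python
import re


class ParseError(ValueError):
    """Raised when the input does not fit the simplified grammar."""


class Scanner:
    """Tiny cursor over the input text; each rule consumes from the current position."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def skip_space(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def take(self, pattern):
        """Match `pattern` at the cursor; advance and return the match, or return None."""
        match = re.compile(pattern).match(self.text, self.pos)
        if match:
            self.pos = match.end()
        return match


def parse_callnumber(text):
    """Parse a simplified LC-style call number into its parts (illustrative only)."""
    s = Scanner(text.strip())

    letters = s.take(r"[A-Z]{1,3}")
    if not letters:
        raise ParseError("expected 1-3 class letters")
    s.skip_space()

    digits = s.take(r"\d+")
    if not digits:
        raise ParseError("expected class number")
    decimal = s.take(r"\.(\d+)")  # optional decimal extension, e.g. ".76"

    cutters = []
    while True:
        s.skip_space()
        cutter = s.take(r"\.?([A-Z]\d+)")  # e.g. ".C65" or "G7"
        if not cutter:
            break
        cutters.append(cutter.group(1))

    s.skip_space()
    year = s.take(r"\d{4}")
    s.skip_space()
    if s.pos != len(s.text):
        raise ParseError(f"unparsed trailing text: {s.text[s.pos:]!r}")

    return {
        "letters": letters.group(),
        "digits": digits.group(),
        "decimal": decimal.group(1) if decimal else None,
        "cutters": cutters,
        "year": year.group() if year else None,
    }


print(parse_callnumber("QA76.76 .C65 2014"))
print(parse_callnumber("PS3566.Y55 G7"))
```

Because each rule reports exactly where it stopped, a failed parse can say what it expected, which is the kind of feedback a test suite of corrected parses can build on.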

David Rosenthal: Two Brief Updates

Tue, 2014-09-09 17:56
A couple of brief updates on topics I've been covering, Amazon's margins and the future of flash memory.

First, Benedict Evans has a fascinating and detailed analysis of Amazon's financial reports. Read the whole thing.

He shows how Amazon's strategy is not to generate and distribute profits, but to re-invest their cash flow into starting and developing businesses. Starting each business absorbs cash, but as they develop they turn around and start generating cash that can be used to start the next one.

His graphs and numbers make the case brilliantly. Here, for example, are Amazon's revenues and profits since launch; lots of revenues and almost no profit. But it is more revealing to focus, as Amazon does, on cash flow.

Here Evans shows Free Cash Flow (FCF), Capital Expenditure (capex), and Operating Cash Flow (OCF) as a proportion of revenue.

Amazon’s OCF margin has been very roughly stable for a decade, but the FCF has fallen, due to radically increased capex.

Here Evans shows capex as a proportion of sales, showing a relentless rise starting in late 2009.

That is, if Amazon was spending the same on capex per dollar of revenue as it was in 2009, it would have kept $3bn more in cash in the last 12 months.

What we're interested in here is the AWS business, which is most of the category Amazon calls "Other". Here is the growth of "Other" revenue. This is a market that Amazon is absolutely dominating. Its cash flow is doing two things, paying for the computing infrastructure Amazon needs to run its other, much larger, established businesses, and paying for the startup costs of new businesses.
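The arithmetic behind the capex argument can be made concrete with a small sketch. It assumes the usual definition of free cash flow as operating cash flow minus capital expenditure; the function names are hypothetical and the figures are invented round numbers in arbitrary units, not Amazon's reported results.

```python
def free_cash_flow(operating_cash_flow, capex):
    """FCF under the usual definition: operating cash flow minus capital expenditure."""
    return operating_cash_flow - capex


def cash_kept_at_old_capex_ratio(revenue, capex, old_capex_ratio):
    """Extra cash retained if capex had stayed at `old_capex_ratio` of revenue."""
    return capex - revenue * old_capex_ratio


# Invented illustrative figures (arbitrary units), NOT taken from Amazon's reports.
revenue = 200.0
operating_cash_flow = 20.0
capex = 12.0            # 6% of revenue today
old_capex_ratio = 0.03  # assumed capex share of revenue in the earlier period

print("FCF:", free_cash_flow(operating_cash_flow, capex))
print("cash kept at old ratio:",
      cash_kept_at_old_capex_ratio(revenue, capex, old_capex_ratio))
```

With a stable OCF margin, any rise in the capex share of revenue comes straight out of FCF, which is the pattern the graphs describe.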

As far as I can see, in the "cloud" business only Google has the same synergy between an established business, and a cloud business. Other competitors don't need the cloud scale of investment to support another, much larger existing business. They have to treat their cloud investments as a stand-alone business, which is much less efficient. And they are much smaller than AWS. So they aren't going to survive. IBM and Microsoft, I'm looking at you.

Second, Chris Mellor at The Register looks at the hype surrounding the "all-flash data center" and makes the point that Dave Anderson of Seagate has been making for years.
That leaves us with the view that all-flash data centres are not feasible at present. They may become feasible if the cost of flash falls to near-parity with nearline and bulk storage disk but there is another problem: the flash foundry capacity to build the stuff just doesn't exist.

In terms of exabytes of capacity, worldwide disk production is vastly higher than that of flash, and with flash fabs costing $7bn to $9bn apiece it is likely to remain so.

This is no small matter. An all-flash data centre would need approximately the same number of TB of storage as current all-disk or hybrid flash/disk data centres.
The flash foundry operators are paranoid about avoiding loss-making gluts of product, having seen the dire effects of that in the memory industry, with its persistent huge losses and dramatic supplier consolidation. They will be slow to bring new flash fab capacity online.

They are working towards increasing flash capacity by increasing wafer density through cell geometry shrinks, and also through building flash chips with stacked layers of cells, so-called 3D NAND.

These in themselves won't allow the flash industry to take on any substantial portion of worldwide disk capacity in the next few years. That requires many new fabs and there is no sign of that happening.

Not to mention that generating a return on a $7-9B investment requires that the product it builds be in the market for many years. Flash technology is approaching its limits, so the time during which flash will dominate the solid-state storage market with its premium pricing is short, too short to generate the necessary return.

HangingTogether: Innovative solutions for dealing with born-digital content in obsolete forms – Part 2

Tue, 2014-09-09 17:00

This is the second of three posts about a lightning talk session at SAA. Part 1 began with descriptions of the array of media an archives might confront and an effort to test how much can be done in house.

Part 2 picks up with four archivists talking about solutions to particularly challenging formats.

Abby Adams is the Digital Archivist at Hagley Museum & Library, an independent research library in Wilmington, Delaware, documenting American enterprise from its inception to present day with a focus on the intersection of industry, technology, and society. In 2012, Hagley received a large hybrid collection, consisting primarily of textual analog materials, in addition to a number of born-digital records. The records were created by various tech corporations during the normal course of business in the late 1990s and early 2000s and document aspects of the dot-com boom and bust, an area of research where primary sources are sorely lacking. Given the potentially high research value of the collection, Adams gave the preservation of the born-digital content high priority and culled hundreds of records cartons to discover the following obsolete media formats: 349 compact discs; 134 3.5” floppy disks; 113 digital linear tapes (DLT); 49 digital data storage tapes (DDS); 19 quarter-inch mini cartridges; 15 Travan cartridges; and 8 zip disks.

Although the CDs and floppy disks presented few problems, the remaining obsolete formats offered a lesson in how complex data recovery can be. Adams’ attempts to use “freecycled” drives and jerry-rig old PCs were just not working. Even if she could connect a computer to the exact generation DLT or DDS drive to read the tapes, she would also need to know the software program used to create the backup, which could vary widely depending on the date of creation, then successfully install it, and cross her fingers the media isn’t encrypted or corrupt. Since Hagley is a small shop with limited in-house resources, it was clear outsourcing the data extraction was the best course of action. After consulting several vendors, Adams and her coworker Kevin Martin found a company that specializes in data extraction and indexing of backup tapes. After establishing a budget for the first phase of the project, Adams and Martin sent the vendor a sample consisting of five DLT and three DDS tapes. Less than a week later, the vendor provided them access to the indexed data from seven out of eight tapes. Due to the size of the collection and Hagley’s limited in-house resources, Adams was strict with appraisal, retaining only about ten percent of the data. The original media was returned to Hagley a few weeks later. Having successfully completed the first phase of the project, Hagley will continue to use the same company for the remaining backup tapes.

Elise Warshavsky is the Digital Archivist at the Presbyterian Historical Society, which serves as the national archives of the Presbyterian Church, documenting the political and social history of the church. The archives acquired the laptop of Clifton Kirkpatrick, former Stated Clerk, the highest elected official within the church. The laptop contained files he had worked on as well as his email. Five years later, Elise was hired and asked to archive the Stated Clerk’s laptop. This was the nature of the “detailed instructions” she received regarding passwords, the types of files, and that there were 28,000 emails in the Novell GroupWise account:

The records manager who had originally received the laptop had converted the account to a Remote account, enabling the email to live solely on the laptop. The records archivist had also reorganized the inbox and appraised each individual email, resulting in lost folder structure and possibly other lost metadata. The emails were readable, but because of a 50-year embargo on access to them, the goal was to ensure that these files would still be readable in 50 years. After failing to find a way to convert the GroupWise Remote email to another format, she finally contacted the company that makes a commercial-grade email converter called Transend. They agreed to resurrect the Remote account on their GroupWise servers and then convert it to .pst, Microsoft’s proprietary, though openly documented, file format. Then she was able to move forward with her migration plan: convert to a more archival email format, MBOX, as well as run a tool to batch export PDFs from each individual email and convert them to PDF/As – a format researchers would be able to search and access in 50 years.
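
One inexpensive check after a migration like this is to confirm that the MBOX output still holds the expected number of messages. What follows is a minimal sketch using Python's standard `mailbox` module. The file name, the demo messages, and the choice to flag messages lacking a Date header are assumptions for illustration, not part of Elise's actual workflow.

```python
import mailbox
import os
import tempfile


def summarize_mbox(path):
    """Return (total_messages, messages_without_date) for an mbox file."""
    box = mailbox.mbox(path, create=False)  # raise if the file is missing
    try:
        total = 0
        missing_date = 0
        for message in box:
            total += 1
            if message["Date"] is None:
                missing_date += 1
        return total, missing_date
    finally:
        box.close()


def build_demo_mbox(path):
    """Write three made-up messages (one lacking a Date header) to `path`."""
    box = mailbox.mbox(path)
    try:
        for i in range(3):
            headers = f"Subject: Demo message {i}\nTo: archive@example.org\n"
            if i != 2:
                headers += "Date: Wed, 10 Sep 2014 17:00:00 +0000\n"
            box.add(headers + "\nBody text\n")
        box.flush()
    finally:
        box.close()


if __name__ == "__main__":
    # Hypothetical file name; a real run would point at the converted export.
    demo_path = os.path.join(tempfile.mkdtemp(), "stated_clerk_demo.mbox")
    build_demo_mbox(demo_path)
    total, missing_date = summarize_mbox(demo_path)
    print(f"{total} messages, {missing_date} without a Date header")
```

Comparing the total against the count reported by the source system (28,000 in this case) would show whether messages were dropped along the way, though it says nothing about lost folder structure.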

Elise’s advice: If you get frustrated about not having the tools or skills necessary to complete a project, reach out to find help. There’s no need to develop resources in house when dealing with a unique, most likely not repeatable incident. Get help, and move on to doing what you do know how to do – accession, appraise, and preserve.

Ted Hull, Director of the Electronic Records Division at the National Archives at College Park, told of a project to recover content from 7-track tapes.

The Electronic Records Division accessions, processes, arranges for preservation, describes, and provides access to the born-digital federal records scheduled for permanent retention in the National Archives. They hold 932 series from over 100 federal agencies, consisting of over 750 million unique files and over 320 terabytes of data. 7-track magnetic tape was an industry standard from the 1950s to the 1970s, when it was generally replaced by 9-track magnetic tape. While most of the Archives’ content had already been transferred off of 7-track tape, in 2013 staff identified 13 remaining tapes containing records from the Federal Home Loan Bank Board, the Bureau of Indian Affairs, and the U.S. Joint Chiefs of Staff. The Archives reached out and found that the National Center for Atmospheric Research (NCAR) in Boulder, CO still had the capability to read 7-track tapes. NCAR was able to recover data from 9 tapes; the other 4 were blank. NCAR converted the binary-coded decimal encoding to ASCII and made the files available to NARA for direct download from their FTP site. NARA processed and accessioned the records, and the original tapes were returned to NARA for disposal.

Ben Goldman, the Digital Records Archivist for Pennsylvania State University Libraries, discovered 27 3-inch disks in a modern literary manuscript collection. They didn’t have the equipment needed to read the disks and weren’t even sure if the disks were readable or contained data worth recovering.
Amstrad disk from the Fiona Pitt-Kethley papers, Penn State University Special Collections Library

The author confirmed that she did own an Amstrad computer (a somewhat popular computer in the UK for a brief period in the 1980s), but because Ben didn’t know exactly what hardware or software was needed to read the disks, he decided to outsource their recovery. He wanted to use the opportunity to come up with a model vendor agreement and to make the project an extension of their internal born-digital workflow. To that end, he created a media inventory spreadsheet to identify the disks, their labels, their contents, and the images derived from them, and to accommodate checksums after their eventual transfer. Mostly, however, he wanted to see whether outsourcing was a viable option for archivists confronting elusive computer media formats: whether core archival requirements could be met, whether service providers could adhere to emerging best practices, and whether the costs were viable for archives. PSU provided funding for a project at $40 per disk.

[Tweet] Jason P. Evans Groth: $40/disk is same as person making $40k spending two hours to image obsolete disks, so maybe it is the right deal? #s601 #saa14
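
The comparison in the tweet can be checked with a little arithmetic. This is a minimal sketch, assuming a 2,080-hour work year (40 hours × 52 weeks) and counting salary only, with no benefits or overhead. Neither assumption comes from the session.

```python
HOURS_PER_YEAR = 40 * 52  # assumption: 2,080 paid hours in a year


def labor_cost(annual_salary, hours, hours_per_year=HOURS_PER_YEAR):
    """Salary-only cost of `hours` of staff time (no benefits or overhead)."""
    return annual_salary / hours_per_year * hours


if __name__ == "__main__":
    two_hours = labor_cost(40_000, 2)
    print(f"Two hours at $40k/year: ${two_hours:.2f}")  # prints $38.46
```

Under those assumptions, two hours of staff time costs about $38.46, so the tweet's rough equivalence with $40 per disk holds.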

Soon Ben had a signed vendor agreement with the Museum of Computer Culture to provide disk images that could be processed using forensic tools. They were to work from the inventory and follow naming conventions and provide checksums to ensure accurate transfer.
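
The checksum requirement in the agreement can be verified on receipt. This is a minimal sketch, assuming the inventory has been reduced to a mapping of file names to SHA-256 values. The function names, the hash algorithm, and the file names are hypothetical, since the session did not specify them.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(manifest, image_dir):
    """Compare delivered files against {filename: expected_sha256}.

    Returns {filename: status}, where status is "ok", "missing",
    "mismatch", or "unexpected" (delivered but not in the manifest).
    """
    image_dir = Path(image_dir)
    report = {}
    for name, expected in manifest.items():
        path = image_dir / name
        if not path.is_file():
            report[name] = "missing"
        elif sha256_of(path) == expected.lower():
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    # Files outside the agreed naming convention show up here.
    for path in image_dir.iterdir():
        if path.is_file() and path.name not in manifest:
            report[path.name] = "unexpected"
    return report
```

Calling `verify_transfer(manifest, "delivery/")` against a vendor delivery returns one status per file. Anything other than "ok" is a prompt to go back to the vendor before the original media is disposed of.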

Many months later, however, Ben was working with two other vendors – without a signed agreement. They found that disk images native to the Amstrad operating system couldn’t be migrated to modern formats or processed using common forensic tools. Instead, Ben received three versions of every file in three different formats, each with its own brand of lossiness, and in the end there was no adherence to naming conventions and there were no checksums. Although the project did not really meet his expectations, Ben doesn’t think of it as a failure. “Fugitive media is defiant,” he warns. Communication is key, and the vendor agreement should establish communication requirements. Beyond that, Ben is not sure this cost model will be sustainable. Instead, he suggests that archivists need to develop in-network options. There are technologies, resources, and talented people working on these issues. It would be nice to see some better community strategies for tackling the issues and supporting each other.

Next up: Part 3 will continue with three speakers representing the service provider point of view.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.


DPLA: September 10/15, 2014: Board and Board Finance Committee Open Calls

Tue, 2014-09-09 14:00

The DPLA Board and its Finance Committee will each hold an open conference call in September 2014.  Both of these calls are open to the public.

Board Finance Committee Open Call
September 10, 2014 at 1:00 PM EDT

Agenda and Dial-in


  • Overview of recent grant awards
  • Open comments and suggestions from the committee
  • Comments and suggestions from the public


Via the web:

Via telephone
United States: +1 (805) 309-0012
Access Code: 312-488-189
Audio PIN: Shown after joining the meeting
Meeting ID: 312-488-189


Board of Directors Open Call
September 15, 2014 at 3:00 PM EDT

Agenda and Dial-in



Public Session

  • Proposal to amend DPLA Bylaws to allow for an increased number of Directors (Call to vote)
  • Overview of draft DPLA Strategic Plan
  • Update from Executive Director
  • Questions/comments from the public

Executive Session

  • Review of DPLA Handbook
  • Conflict of Interest Certification
  • Review of draft DPLA Strategic Plan
  • Funding and financial update


All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.