Planet Code4Lib - http://planet.code4lib.org

DuraSpace News: Islandora 7.x-1.4 Release Timeline

Wed, 2014-09-10 00:00
From Samantha Fritz, Interim Project and Community Manager, Islandora Foundation, Charlottetown, Prince Edward Island, CA. Islandora is pleased to announce that the upcoming 7.x-1.4 release is well underway! Islandora 7.x-1.4 will see the return of Islandora Solr Views, the addition of one new module, Islandora Video.js, and a number of improvements to existing modules and tools.

DuraSpace News: REGISTER: VIVO Project to Host Hackathon at Cornell, Oct. 13-15

Wed, 2014-09-10 00:00

From Layne Johnson, VIVO Project Director

Please Note: If you plan to attend the Hackathon please make your hotel reservations ASAP—the $129 rate at the Ithaca Hotel expires this Friday, September 12.

Roy Tennant: The Power of Powers of 2

Tue, 2014-09-09 23:43

Despite the fact that I consider myself a lifelong feminist, I am still surprised and dismayed at how easily I can overlook discriminatory behavior toward women. Or not even discriminatory behavior but things that are much more subtle, like situations that discourage women from speaking up or participating.

So when a colleague forwarded a notice about the Ada Initiative’s “Allies Workshop” (now called the “Ally Skills” workshop), I jumped at the chance to go. I had heard of the Ada Initiative and I was interested to hear what they had to tell me. The workshop I attended in San Francisco included mostly men from startup technology companies. I learned more about the subtle ways in which discrimination occurs and how to be a better ally to those experiencing such discrimination. I was also surprised and pleased to discover that I learned a lot from situations that others had experienced and described in our interactive sessions.

I left the workshop feeling more knowledgeable and empowered to help make a difference. But more importantly, I learned more about how my own behavior can be modified to provide space for voices that might otherwise go unheard. I also left being impressed with Valerie Aurora, who led the workshop. Little did I know at the time that we would cross paths again soon.

The 2014 Code4Lib Conference was able to sign on Valerie Aurora as a keynote speaker, and she requested to be interviewed rather than give a speech. I was happy to offer to do the interview, which you can see here. It was one of the best sessions we’ve ever had at Code4Lib, and not because of the interviewer.

Now the Ada Initiative is asking for our help and I’m happy to help publicize the library-specific campaign to raise money to support and expand their work. Suggested donation amounts use powers of 2, which is totally a geek thing. Andromeda Yelton, Bess Sadler, Chris Bourg, and Mark Matienzo joined forces to help raise $4,096 (2 to the 12th power) in matching funds. This means that any donation you make is automatically doubled. The campaign runs from now to September 15th, so donate now!
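
Since the tiers are literal powers of 2, the arithmetic is easy to check; a throwaway Python sketch (the tier range is my own illustration, not the campaign's actual list):

```python
# Donation tiers as powers of 2 (range is illustrative, not the campaign's
# actual list), plus the effect of the dollar-for-dollar match.
tiers = [2 ** n for n in range(4, 13)]  # $16, $32, ... $4096
print(tiers)

matched = [t * 2 for t in tiers]  # matching funds double each donation
print(matched[-1])  # a top-tier gift of 2**12 becomes 2**13 = 8192
```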

Feel the power of using powers of 2 to build a more equitable and just society.

Bess Sadler: The Ada Initiative Has My Back

Tue, 2014-09-09 22:49


The Ada Initiative has my back. In the past several years they have been a transformative force in the open source software community and in the lives of women I know and care about. To show our support, Andromeda Yelton, Chris Bourg, Mark Matienzo and I have pledged to match up to $5120 of donations to the Ada Initiative made through this link before Tuesday September 16. That seems like a lot of money, right? Well, here’s my story about how the Ada Initiative helped me when I needed it most.

Thanks to the Ada Initiative, having a conference code of conduct has become an established best practice, and it is changing conference culture for the better. I’m so proud of the many library conferences and organizations who have adopted Code of Conduct policies this year. However, just because a conference has a policy in place doesn’t mean there won’t be any problems. I’d like to share something that happened to me this year, the way the Ada Initiative helped me and the conference in question deal with it, and how things have since improved.

I gave a talk at code4lib a few years ago called “Vampires vs Werewolves: Ending the War Between Developers and Sysadmins with Puppet”. The talk was well received enough that it became a little bit of a meme, and last year PuppetConf asked me if I’d give an updated version of it for PuppetConf 2013.

It was exciting to participate in such a huge conference. PuppetConf is well funded and professionally managed. They have a code of conduct in place, friendly and helpful staffers, and high quality content. I was having a great time up until right before my talk.

Unfortunately, the talk before mine was titled “Nobody Has to Die Today: Keeping the Peace With the Other Meat Sacks.” I watched this talk on the video monitor from backstage, while getting hooked up to a lapel microphone, already a little nervous about facing such a large audience.

The speaker was a large heavily muscled man who was shouting more than speaking into the microphone. He was shouting about violence, and about how many people get murdered in the workplace. He particularly mentioned the fact that murder is the number one cause of death for women in the workplace. I felt my blood running cold. My body felt flooded with fear, and I wanted to run. He went on to discuss the many ways he personally had hurt people, through his work as a bouncer, in martial arts, or just because someone made him angry. At this point I was literally shaking. I have been on the receiving end of violence. I have known people who have been murdered. I know people, especially women, who have been hospitalized with the kinds of injuries he was graphically describing having inflicted. In spite of the small print disclaimer on his slides that this presentation was not encouraging violence, it was doing precisely that. If you don’t communicate technical requirements in the way he specifies, apparently, you will get what’s coming to you and you will deserve it.

The conference staff backstage were horrified. Some high profile people in the audience were walking out. It was clear that I was upset (I was trying not to hyperventilate at this point) and someone kept asking me if I wanted to file a code of conduct complaint. The thought of this made my panic even worse. I could easily picture filing a complaint and then paying for it with my life, when this guy found out and beat me to death in the parking lot after the conference. I said I did not want to file a complaint. I tried to take deep breaths and to not break down crying. I was determined to give my talk in spite of my shaky emotional state.

And I did. I delivered my presentation, which went surprisingly well, except for the fact that in the video I am swaying back and forth. I don’t usually do that when I speak, and I read it as the outward manifestation of how upset I was.

Afterwards, I thought for a long time about writing to the conference with my concerns. I started to do so several times, but I always chickened out. It was too easy for me to picture this guy learning my name and coming after me. Before you dismiss me as paranoid please consider the stories of Anita Sarkeesian and Kathy Sierra. Women in technology face worse than sexist jokes. We face assault. We face death threats. If you defend the status quo, understand what you are defending.

It wasn’t until this year’s PuppetConf call for proposals that I complained. The conference had liked my talk last year, and invited me to submit another talk this year. I wrote to decline, and told them why. I also sent a copy of my letter to Valerie Aurora, asking for the advice of the Ada Initiative.

I am very pleased to say that PuppetConf took my concerns seriously. Working with the Ada Initiative, they strengthened their code of conduct, put more screening measures in place for presentations, and improved training for conference staff on how to deal with problematic situations. PuppetLabs is an example of a company that is doing things right. They have specific outreach programs to get more women involved in the Puppet community and they are pursuing similar strategies to encourage participation from underrepresented racial groups. I feel good about the fact that I’m sending members of my staff to PuppetConf 2014, and at this point I would gladly speak at the conference again.

As upsetting as this incident was, this is a story with a happy ending. Because the Ada Initiative exists, both PuppetConf and I had someone to go to for guidance in how to improve the situation. Honestly, I still feel a little afraid about writing this post. But I also believe that nothing gets better until people take the risk of speaking out publicly. I am choosing to take that risk, in order to better communicate about why this work matters.

The Ada Initiative continues to do great things. You can read their 2014 progress report here. I am particularly excited about the Ally Skills workshop that will be offered at the Digital Library Federation Forum on October 29. Today, librarians are showing our love for the Ada Initiative. Watch for blog and social media posts from friends in library land who will be sharing more stories about why the Ada Initiative matters, and follow the action on twitter under the hashtag #libs4ada. Join us in supporting the Ada Initiative’s mission and donate today!

State Library of Denmark: Small scale sparse faceting

Tue, 2014-09-09 20:35

While sparse faceting has a profound effect on response time in our web-archive, we are a bit doubtful about the number of multi-billion-document Solr indexes out there. Luckily we also have our core index at Statsbiblioteket, which should be a bit more representative of your everyday Solr installation: single-shard, 50GB, 14M documents. The bulk of the traffic is user-issued queries, which involve spellcheck, edismax qf & pf on 30+ fields, and faceting on 8 fields. In this context, the faceting is of course the focus.

Of the 8 facet fields, 6 are low-cardinality and 2 are high-cardinality. Sparse was only active for the 2 high-cardinality ones, namely subject (4M unique values, 51M instances (note to self: 51M!? How did it get so high?)) and author (9M unique values, 40M instances).

To get representative measurements, the logged response times were extracted for the hours 07-22; there’s maintenance going on at night and it makes measurement unreliable. Only user-entered searches with faceting were considered. To compare before- and after sparse-enabling, the data for this Tuesday and last Tuesday were used.

50GB / 14M docs, logged timing from production, without (20140902) and with (20140909) sparse faceting

The performance improvement is palpable, with response time being halved compared to non-sparse faceting. Fine-reading the logs, the time spent on faceting the high-cardinality fields is now in the single-digit milliseconds for nearly all queries. We’ll have to do some tests to see what stops the total response time from getting down to that level. I am guessing spellcheck.

As always, sparse faceting is readily available for the adventurous at SOLR-5894.
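
For anyone who wants to reproduce a comparable query, here is a Python sketch of building the request URL. The field names mirror the setup described above; the facet.sparse parameter name is an assumption and should be checked against the SOLR-5894 patch notes, and the host and core name are placeholders.

```python
from urllib.parse import urlencode

# Build a faceted Solr query similar to the production setup above:
# faceting on the two high-cardinality fields, subject and author.
# "facet.sparse" is an assumed parameter name from the SOLR-5894 patch.
params = {
    "q": "kierkegaard",
    "facet": "true",
    "facet.field": ["subject", "author"],
    "facet.limit": 20,
    "facet.sparse": "true",
}
query = urlencode(params, doseq=True)  # doseq repeats facet.field per value
url = "http://localhost:8983/solr/collection1/select?" + query
print(url)
```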


Nicole Engard: Bookmarks for September 9, 2014

Tue, 2014-09-09 20:30

Today I found the following resources and bookmarked them:

  • Color Oracle: Color Oracle is a free color blindness simulator for Windows, Mac and Linux.

Digest powered by RSS Digest

The post Bookmarks for September 9, 2014 appeared first on What I Learned Today....

Related posts:

  1. Another Satisfied Customer
  2. Amazon’s bestselling laptop is open source!
  3. September Workshops

Roy Tennant: In Memoriam: Anne Grodzins Lipow

Tue, 2014-09-09 20:23

I was reminded by her daughter on Facebook that Anne Grodzins Lipow passed away ten years ago today. In commemoration of that horrible event, I am posting the Foreword I wrote for Anne’s festschrift that was published in 2008.

On September 9, 2004 librarianship lost a true champion. Anne Grodzins Lipow was unique – of all the testimonials I’ve read about her that is one undeniable truth. We each knew a different set of Anne’s qualities, or engaged with her in a different way, but in the end it all came down to the fact that Anne was someone we could all say was “larger than life”.

The days after her passing were filled with personal testimonials that were mostly lodged as comments on the Infopeople blog. It was an odd experience for me to read these messages and realize that as much as I felt that I knew her, I barely knew her at all. I was like the proverbial blind man with his hands wrapped around one part of the elephant, while others had a firm grip on other body parts and would describe a very different animal. My reality, as deeply felt as it was, was only a pale shadow of the whole.

But for all that, it was a long, long shadow. As a newly-minted librarian at UC Berkeley in the second half of the 1980s, I knew Anne as the person who led the outreach and instructional efforts of the library. Before long, she saw in me the potential to be a good teacher, despite my fear of public speaking, so she pulled me into her program and began teaching me everything she knew about speaking, putting on workshops, making handouts, etc. Under her tutelage, I taught classes such as dialup access to the library catalog, when 300bps modems were still common.

As the Internet began making inroads into universities, Anne was there with newly developed workshops on how to use it. She was convinced very early on, as was I, that the Internet would be an essential technology for libraries. This led to her approaching my colleague John Ober (then on faculty at the library school at Berkeley) and me about doing a full-day Internet workshop scheduled to coincide with the 1992 ALA Annual Conference in San Francisco. Using a metaphor of John’s, we called it “Crossing the Internet Threshold”.

In preparing for the workshop, we created so many handouts that we needed to put them into a binder that began to look increasingly like a book in the making. With typical Anne flair, she arranged for the gifted librarian cartoonist Gary Handman (also our colleague at Berkeley) to create a snazzy cover for the binder, that she also used to create T-shirts (which many of us have to this day).

Anne knew enough about workshops to do a “trial run” before the big day, so we did one for UC Berkeley library staff a couple weeks before, which gave us feedback essential to making an excellent workshop. In the end, the workshop was such a hit that Anne ran with it. She took the binder of handouts we had created and made a book out of it — the first book of her newly-created business called Library Solutions Institute and Press. Her decision to publish the book herself rather than seek out a publisher was so typical of Anne. And how she did it will tell you a lot about her.

Despite the higher cost, Anne insisted on using domestic union printing shops for printing. While other publishers were publishing books overseas for a fraction of the cost, publishing for Anne was a political and social activity, through which she could do good for those around her. It was very important to her to treat people with respect and kindness, and she did it so well. That was the kind of person Anne was.

While every publisher I have worked with since Anne has insisted they are incapable of paying royalties any more frequently than twice a year, Anne paid her authors monthly. And whereas other publishers wait months to pay you for royalties earned long before, Anne would pay immediately. This meant that when books were returned, as they sometimes were, she took the loss for having paid the author royalties on books that had not been sold. That was the kind of person Anne was.

Anne continued to blaze new trails after libraries began climbing on the Internet bandwagon, due in no small measure to her books and workshops on the topic. Anne became a well-known and coveted consultant on a number of topics, but in particular on reference services.

Her “Rethinking Reference” institutes and book were widely acclaimed, and her book The Virtual Reference Librarian’s Handbook (2003) demonstrated that Anne was always at the cutting edge of librarianship. That was the kind of person Anne was.

I visited her after her cancer was diagnosed and after her treatment had failed. We all knew there was no hope, that she had only a matter of weeks to live. Despite the obvious ravages of the illness, Anne’s outlook remained bright and welcoming. She was happy to have her friends and family around her, and we talked of many things except the dark shadow that hung over us all. Even then, she was happy to see whoever came by, and to talk with them with a smile and good wishes. That was the kind of person Anne was.

A piece of all my major professional accomplishments I owe to Anne, and her great and good influence on me. She would deny this, despite its truth, wanting all the credit to accrue to me alone. That was the kind of person Anne was.

 

Each one of us who has contributed to this volume has been touched by Anne in our own, quite personal ways. Some of us have known of her work mostly by reputation and reading, while others were blessed with more direct and personal contact. But the fact remains that Anne cast a long professional shadow that will affect many librarians yet to come.

For those of us who created a monument of words to someone we love and respect, Anne had one final gift to give. As anyone who has ever created a present for someone they love knows, in so doing you think about the person for whom you are making the gift. Therefore, the authors of this volume have all spent more time with Anne, and as always it was time well spent. We know our readers will count it so too.

31 January 2008, Sonoma, CA

LITA: LITA Midwinter Institutes

Tue, 2014-09-09 19:34

Registration for LITA’s Midwinter Institutes opened today with ALA’s joint registration! Whether you’ll be attending Midwinter or are just looking for a great one day continuing education event in the Chicago/Midwest area, we hope you’ll join us.

When? All workshops will be held on Friday, January 30, 2015, from 8:30-4:00

Cost for LITA Members: $235  (ALA $350 / Non-ALA $380)
(If you are a member of LITA use special code LITA2015 to receive the price of $235.)

Workshop Descriptions:

Developing mobile apps to support field research
Instructor: Wayne Johnston, University of Guelph Library

Researchers in most disciplines do some form of field research. Too often they collect data on paper, which is not only inefficient but vulnerable to data loss. Surveys and other data collection instruments can easily be created as mobile apps, with the resulting data stored on the campus server and immediately available for analysis. The apps also enable added functionality like improved data validity through use of authority files and capturing GPS coordinates. This support for field research represents a new way for academic libraries to connect with researchers within the context of a broader research data management strategy.

Introduction to Practical Programming
Instructor: Elizabeth Wickes, University of Illinois at Urbana-Champaign

This workshop will introduce foundational programming skills using the Python programming language. There will be three sections to this workshop: a brief historical review of computing and programming languages (with a focus on where Python fits in), hands on practice with installation and the basics of the language, followed by a review of information resources essential for computing education and reference. This workshop will prepare participants to write their own programs, jump into programming education materials, and provide essential experience and background for the evaluation of computing reference materials and library program development. Participants from all backgrounds with no programming experience are encouraged to attend.

From Lost to Found: How User Testing Can Improve the User Experience of Your Library Website
Instructors: Kate Lawrence, EBSCO Information Services; Deirdre Costello, EBSCO Information Services; Robert Newell, University of Houston

When two user researchers from EBSCO set out to study the digital lives of college students, they had no idea the surprises in store for them. The online behaviors of “digital natives” were fascinating: from students using Google to find their library’s website, to what research terms and phrases students consider another language altogether: “library-ese.” Attendees of this workshop will learn how to conduct usability testing, and participate in a live testing exercise via usertesting.com. Participants will leave the session with the knowledge and confidence to conduct user testing that will yield actionable and meaningful insights about their audience.

 

More details about these workshops will be coming in interviews with the instructors in October! If you have a question you’d like to ask the instructors, please contact LITA Education Chair Abigail Goben at [firstnamelastname]@gmail.com

LITA: 2014 LITA Forum: early bird rates available through Sept. 15

Tue, 2014-09-09 19:19

Don’t miss your chance to save up to $50 on registration for the 2014 LITA Forum “From Node to Network” to be held Nov. 5-8, 2014 at the Hotel Albuquerque in Albuquerque, N.M.

Don’t forget to book your room at the Hotel Albuquerque by Oct. 14, 2014 to guarantee the LITA room rate.

This year’s Forum will feature three keynote speakers

  • AnnMarie Thomas, Engineering Professor, University of St. Thomas
  • Lorcan Dempsey, Vice President, OCLC Research and Chief Strategist
  • Kortney Ryan Ziegler, Founder Trans*h4ck.

More than 30 concurrent colleague-inspired sessions and a dozen poster sessions will provide a wealth of practical information on a wide range of topics.

Networking opportunities, a major advantage of a smaller conference, are an important part of the Forum. Take advantage of the Thursday evening reception and sponsor showcase, the Friday networking dinners or Kitchen Table Conversations, plus meals and breaks throughout the Forum to get to know LITA leaders, Forum speakers, sponsors, and peers.

This year two preconference workshops will also be offered.

Linked Data for Libraries: How libraries can make use of Linked Open Data to share information about library resources and to improve discovery, access, and understanding for library users
Led by: Dean B. Krafft and Jon Corson-Rikert, Cornell University Library

Linked Open Data (LOD) provides an expressive and extensible mechanism for sharing information (metadata) about all the materials research libraries make available. In this workshop the presenters will introduce the principles and practices of creating and consuming Linked Open Data via a series of examples from sources relevant to libraries. They will provide an introduction to the technologies, tools, and types of data typically involved in creating and working with Linked Open Data and the semantic web. The preconference will also address the challenges of data quality, interoperability, authoritativeness, privacy, and other issues accompanying the adoption of new technologies as these apply to making use of Linked Open Data.

Learn Python by Playing with Library Data
Led by: Francis Kayiwa, Kayiwa Consulting

What can be more fun than learning Python? Learning Python by hacking on library data! In this workshop, you’ll learn Python basics by reading files, looking at MARC (yes MARC), building data structures, and analyzing library data (those logs aren’t going to appreciate themselves). By the end, you will have set up your Python environment, installed some useful packages, and learned how to write simple programs that you can use to impress your colleagues back at work.
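
To give a taste of the kind of exercise described (this is my own made-up example, not actual workshop material): tallying hypothetical circulation-log lines with a Counter, using only the standard library.

```python
from collections import Counter

# Made-up log lines in an invented format: date, action, call number.
log_lines = [
    "2014-09-09 CHECKOUT QA76.76",
    "2014-09-09 CHECKOUT PS3545",
    "2014-09-09 RETURN   QA76.76",
    "2014-09-10 CHECKOUT QA76.76",
]

# Count the action field; Counter is the workhorse of quick log analysis.
actions = Counter(line.split()[1] for line in log_lines)
print(actions)  # Counter({'CHECKOUT': 3, 'RETURN': 1})
```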

2014 LITA Forum sponsors include EBSCO, Springshare, @mire, Innovative and OCLC.

Visit the LITA website for more information.

Library and Information Technology Association (LITA) members are information technology professionals dedicated to educating, serving, and reaching out to the entire library and information community. LITA is a division of the American Library Association.

LITA and the LITA Forum fully support the Statement of Appropriate Conduct at ALA Conferences.

Bill Dueber: Help me test yet another LC Callnumber parser

Tue, 2014-09-09 19:10

Those who have followed this blog and my code for a while know that I have a long, slightly sad, and borderline abusive relationship with Library of Congress call numbers.

They're a freakin' nightmare. They just are.

But, based on the premise that Sisyphus was a quitter, I took another stab at it, this time writing a real (PEG-) parser instead of trying to futz with extended regular expressions.

The results, so far, aren't too bad.

The gem is called lc_callnumber, but more importantly, I've put together a little heroku app to let you play with it, and then correct any incorrect parses (or tell me that it worked correctly) to build up a test suite.

So…Please try to break my LC Callnumber parser!

[Code for the app itself is on github; pull requests for both the app and the gem joyously received]
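
For a sense of why LC call numbers resist simple pattern matching, here is a deliberately naive sketch in Python (the lc_callnumber gem itself is Ruby and uses a real PEG grammar; this regex is mine, handles only the tidy cases, and is exactly the kind of approach the post says breaks down):

```python
import re

# A naive LC call number pattern: letter class, class number, up to two
# cutters, optional year. Real shelflist data breaks this constantly.
LC = re.compile(
    r"""^\s*
        (?P<letters>[A-Z]{1,3})          # alphabetic class, e.g. QA
        \s*(?P<number>\d+(\.\d+)?)       # class number, e.g. 76.76
        (\s*\.?(?P<cutter1>[A-Z]\d+))?   # first cutter, e.g. C65
        (\s*\.?(?P<cutter2>[A-Z]\d+))?   # second cutter, e.g. A37
        (\s+(?P<year>\d{4}))?            # optional year
        \s*$""",
    re.VERBOSE,
)

m = LC.match("QA76.76.C65 A37 1986")
print(m.groupdict() if m else "no parse")
```

It parses "QA76.76.C65 A37 1986" happily; feed it the long tail of real call numbers and it falls apart, which is the argument for a proper parser and a crowd-sourced test suite.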

David Rosenthal: Two Brief Updates

Tue, 2014-09-09 17:56
A couple of brief updates on topics I've been covering, Amazon's margins and the future of flash memory.



First, Benedict Evans has a fascinating and detailed analysis of Amazon's financial reports. Read the whole thing.

He shows how Amazon's strategy is not to generate and distribute profits, but to re-invest their cash flow into starting and developing businesses. Starting each business absorbs cash, but as they develop they turn around and start generating cash that can be used to start the next one.

His graphs and numbers make the case brilliantly. Here, for example, are Amazon's revenues and profits since launch: lots of revenue and almost no profit. But it is more revealing to focus, as Amazon does, on cash flow.

Here Evans shows Free Cash Flow (FCF), Capital Expenditure (capex), and Operating Cash Flow (OCF) as a proportion of revenue. Amazon's OCF margin has been very roughly stable for a decade, but the FCF has fallen, due to radically increased capex.

Here Evans shows capex as a proportion of sales, showing a relentless rise starting in late 2009. That is, if Amazon was spending the same on capex per dollar of revenue as it was in 2009, it would have kept $3bn more in cash in the last 12 months.

What we're interested in here is the AWS business, which is most of the category Amazon calls "Other". Here is the growth of "Other" revenue. This is a market that Amazon is absolutely dominating. Its cash flow is doing two things: paying for the computing infrastructure Amazon needs to run its other, much larger, established businesses, and paying for the startup costs of new businesses.

As far as I can see, in the "cloud" business only Google has the same synergy between an established business and a cloud business. Other competitors don't have another, much larger existing business that needs the cloud scale of investment. They have to treat their cloud investments as a stand-alone business, which is much less efficient. And they are much smaller than AWS. So they aren't going to survive. IBM and Microsoft, I'm looking at you.
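
The identity underneath Evans's charts is simply free cash flow = operating cash flow minus capex; a toy Python calculation with made-up round numbers shows how a rising capex ratio eats FCF even when the OCF margin is stable:

```python
# All figures are made-up round numbers for illustration, not Amazon's actuals.
revenue = 80.0                 # $bn, hypothetical annual revenue
ocf = revenue * 0.08           # operating cash flow at a stable 8% margin
capex_2009 = revenue * 0.025   # capex at a hypothetical 2009-style ratio
capex_now = revenue * 0.06     # capex at a higher recent ratio

fcf_2009_ratio = ocf - capex_2009  # FCF if the capex ratio had stayed put
fcf_now = ocf - capex_now          # FCF at the higher capex ratio
extra_cash = fcf_2009_ratio - fcf_now  # cash kept at the old capex ratio
print(round(extra_cash, 2))
```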

Second, Chris Mellor at The Register looks at the hype surrounding the "all-flash data center" and makes the point that Dave Anderson of Seagate has been making for years.
That leaves us with the view that all-flash data centres are not feasible at present. They may become feasible if the cost of flash falls to near-parity with nearline and bulk storage disk but there is another problem: the flash foundry capacity to build the stuff just doesn't exist.

In terms of exabytes of capacity, worldwide disk production is vastly higher than that of flash, and with flash fabs costing $7bn to $9bn apiece it is likely to remain so.

This is no small matter. An all-flash data centre would need approximately the same number of TB of storage as current all-disk or hybrid flash/disk data centres.
...
The flash foundry operators are paranoid about avoiding loss-making gluts of product, having seen the dire effects of that in the memory industry, with its persistent huge losses and dramatic supplier consolidation. They will be slow to bring new flash fab capacity online.

They are working towards increasing flash capacity by increasing wafer density through cell geometry shrinks, and also through building flash chips with stacked layers of cells, so-called 3D NAND.

These in themselves won't allow the flash industry to take on any substantial portion of worldwide disk capacity in the next few years. That requires many new fabs and there is no sign of that happening.

Not to mention that generating a return on a $7-9B investment requires that the product it builds be in the market for many years. Flash technology is approaching its limits, so the time during which flash will dominate the solid-state storage market with its premium pricing is short, too short to generate the necessary return.

HangingTogether: Innovative solutions for dealing with born-digital content in obsolete forms – Part 2

Tue, 2014-09-09 17:00

This is the second of three posts about a lightning talk session at SAA. Part 1 began with descriptions of the array of media an archives might confront and an effort to test how much can be done in house.

Part 2 picks up with four archivists talking about solutions to particularly challenging formats.

Abby Adams is the Digital Archivist at Hagley Museum & Library, an independent research library in Wilmington, Delaware, documenting American enterprise from its inception to present day with a focus on the intersection of industry, technology, and society. In 2012, Hagley received a large hybrid collection, consisting primarily of textual analog materials, in addition to a number of born-digital records. The records were created by various tech corporations during the normal course of business in the late 1990s and early 2000s and document aspects of the dot-com boom and bust, an area of research where primary sources are sorely lacking. Given the potentially high research value of the collection, Adams gave the preservation of the born-digital content high priority and culled hundreds of records cartons to discover the following obsolete media formats: 349 compact discs; 134 3.5” floppy disks; 113 digital linear tapes (DLT); 49 digital data storage tapes (DDS); 19 quarter-inch mini cartridges; 15 Travan cartridges; and 8 zip disks.

Although the CDs and floppy disks presented few problems, the remaining obsolete formats offered a lesson in how complex data recovery can be. Adams' attempts to use "freecycled" drives and jury-rig old PCs were just not working. Even if she could connect a computer to a DLT or DDS drive of exactly the right generation to read the tapes, she would also need to know which software program was used to create the backup (which could vary widely depending on the date of creation), successfully install it, and cross her fingers that the media wasn't encrypted or corrupt. Since Hagley is a small shop with limited in-house resources, it was clear that outsourcing the data extraction was the best course of action. After consulting several vendors, Adams and her coworker Kevin Martin found a company that specializes in data extraction and indexing of backup tapes. After establishing a budget for the first phase of the project, Adams and Martin sent the vendor a sample consisting of five DLT and three DDS tapes. Less than a week later, the vendor provided them access to the indexed data from seven of the eight tapes. Due to the size of the collection and Hagley's limited in-house resources, Adams was strict with appraisal, retaining only about ten percent of the data. The original media was returned to Hagley a few weeks later. Having successfully completed the first phase of the project, Hagley will continue to use the same company for the remaining backup tapes.

Elise Warshavsky is the Digital Archivist at the Presbyterian Historical Society, which serves as the national archives of the Presbyterian Church, documenting the political and social history of the church. The archives acquired the laptop of Clifton Kirkpatrick, former Stated Clerk, the highest elected official within the church. The laptop contained files he had worked on as well as his email. Five years later, Elise was hired and was asked to archive the Stated Clerk's laptop. The "detailed instructions" she received amounted to the passwords, the types of files, and a note that there were 28,000 emails in the Novell GroupWise account.

The records manager who had originally received the laptop had converted the account to a Remote account, enabling the email to live solely on the laptop. The records archivist had also reorganized the inbox and appraised each individual email, resulting in lost folder structure and possibly other lost metadata. The emails were readable, but because of a 50-year embargo on access to them, the goal was to ensure that these files would still be readable in 50 years. After failing to find a way to convert the GroupWise Remote email to another format, she finally contacted a company called Transend that makes a commercial-grade email converter. They agreed to resurrect the Remote account on their GroupWise servers and then convert it to .pst, Microsoft's openly documented proprietary file format. Then she was able to move forward with her migration plan: convert to a more archival email format, MBOX, and run a tool to batch-export PDFs from each individual email and convert them to PDF/A, a format researchers would be able to search and access in 50 years.
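The resulting MBOX files can be inspected with standard tooling; for instance, Python's built-in mailbox module reads and writes the format. A minimal sketch of writing and reading back an mbox file (the filename and message contents are invented for illustration):

```python
import mailbox

# Hypothetical filename for illustration.
box = mailbox.mbox("kirkpatrick.mbox")
msg = mailbox.mboxMessage()
msg["From"] = "clerk@example.org"
msg["Subject"] = "General Assembly minutes"
msg.set_payload("Draft minutes attached.")
box.add(msg)    # appends with a "From " envelope line
box.flush()
box.close()

# Read the mbox back the way a future researcher might.
for message in mailbox.mbox("kirkpatrick.mbox"):
    print(message["Subject"])
```

Because MBOX is plain text with simple message delimiters, a file written this way stays readable even without specialized software, which is much of its appeal as an archival target.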

Elise’s advice: If you get frustrated about not having the tools or skills necessary to complete a project, reach out to find help. There’s no need to develop resources in house when dealing with a unique, most likely not repeatable incident. Get help, and move on to doing what you do know how to do – accession, appraise, and preserve.

Ted Hull, Director of the Electronic Records Division at the National Archives at College Park, told of a project to recover content from 7-track tapes.

The Electronic Records Division accessions, processes, arranges for preservation, describes, and provides access to the born-digital federal records scheduled for permanent retention in the National Archives. They hold 932 series from over 100 federal agencies, consisting of over 750 million unique files and over 320 terabytes of data. 7-track magnetic tape was an industry standard from the 1950s to the 1970s, when it was generally replaced with 9-track magnetic tape. While most of the Archives' content had already been transferred off of 7-track tape, in 2013 staff identified 13 remaining tapes containing records from the Federal Home Loan Bank Board, the Bureau of Indian Affairs, and the U.S. Joint Chiefs of Staff. The Archives reached out and found that the National Center for Atmospheric Research (NCAR) in Boulder, CO still had the capability to read 7-track tapes and was able to recover data from 9 of them; the other 4 were blank. NCAR converted the binary-coded decimal encoding to ASCII and made the files available to NARA for direct download from their FTP site; NARA processed and accessioned the records, and the original tapes were returned to NARA for disposal.
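The BCD-to-ASCII step NCAR performed amounts to a table lookup over 6-bit character frames. A rough sketch of the idea in Python; note that the mapping below is a small, partly invented sample for illustration, not any vendor's actual 7-track character set (real tables varied by manufacturer and era):

```python
# Illustrative (assumed) 6-bit BCD-to-ASCII table; real 7-track
# character codes differed between vendors and tape generations.
BCD_TO_ASCII = {
    0o01: "1", 0o02: "2", 0o03: "3", 0o04: "4", 0o05: "5",
    0o06: "6", 0o07: "7", 0o10: "8", 0o11: "9",
    0o20: " ", 0o61: "A", 0o62: "B", 0o63: "C",
}

def decode_bcd(frames):
    """Map a sequence of 6-bit tape frames to ASCII, '?' for unknown codes."""
    return "".join(BCD_TO_ASCII.get(f & 0o77, "?") for f in frames)

print(decode_bcd([0o61, 0o62, 0o63, 0o20, 0o01, 0o02]))  # ABC 12
```

The hard part of such a recovery is not this lookup but physically reading the frames off 45-year-old media, which is exactly why NCAR's surviving drive mattered.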

Ben Goldman, the Digital Records Archivist for Pennsylvania State University Libraries, discovered 27 3-inch disks in a modern literary manuscript collection. They didn't have the equipment needed to read the disks, and they weren't even sure whether the disks were readable or contained data worth recovering.
Amstrad disk from the Fiona Pitt-Kethley papers, Penn State University Special Collections Library

The author confirmed that she did own an Amstrad computer (a somewhat popular computer in the UK for a brief period in the 1980s), but because Ben didn't know exactly what hardware or software was needed to read the disks, he decided to outsource their recovery. He wanted to use the opportunity to come up with a model vendor agreement and to make the project an extension of their internal born-digital workflow. To that end, he created a media inventory spreadsheet to identify the disks, their labels, their contents, and the images derived from them, and to accommodate checksums after their eventual transfer. Mostly, however, he wanted to test whether outsourcing was a viable option for archivists confronting elusive computer media formats: whether core archival requirements could be met, whether service providers could adhere to emerging best practices, and whether the costs were viable for archives. PSU provided funding for a project at $40 per disk.
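Checksums like the ones Ben's inventory spreadsheet accommodates are straightforward to generate once images are in hand. A sketch in Python; the filename and sample bytes are hypothetical, and the talk doesn't specify an algorithm, so MD5 here is an assumption (any fixity algorithm would serve the same verification purpose):

```python
import hashlib
from pathlib import Path

def md5_checksum(path, chunk_size=1 << 20):
    """Stream a disk image in chunks and return its MD5 hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical image file standing in for a vendor-supplied disk image.
image = Path("disk_001.img")
image.write_bytes(b"sample image data")
print(md5_checksum(image))
```

Recording the digest at imaging time, and recomputing it after transfer from the vendor, is what lets an archivist assert the image arrived intact.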

[Tweet] Jason P. Evans Groth: $40/disk is same as person making $40k spending two hours to image obsolete disks, so maybe it is the right deal? #s601 #saa14

Soon Ben had a signed vendor agreement with the Museum of Computer Culture to provide disk images that could be processed using forensic tools. They were to work from the inventory and follow naming conventions and provide checksums to ensure accurate transfer.

Many months later, however, Ben was working with two other vendors, without a signed agreement. They found that disk images native to the Amstrad operating system couldn't be migrated to modern formats or processed using common forensic tools. Instead, Ben received three versions of every file in three different formats, each with its own brand of lossiness, and, in the end, there was no adherence to naming conventions and no checksums. Despite this, Ben doesn't think of the project as a failure. "Fugitive media is defiant," he warns. Communication is key, and the vendor agreement should establish communication requirements. Beyond that, Ben is not sure this cost model is sustainable. Instead, he suggests that archivists need to develop in-network options. There are technologies, resources, and talented people working on these issues. It would be nice to see some better community strategies for tackling them and supporting each other.

Next up: Part 3 will continue with three speakers representing the service provider point of view.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.


DPLA: September 10/15, 2014: Board and Board Finance Committee Open Calls

Tue, 2014-09-09 14:00

The DPLA Board and its Finance Committee will each hold an open conference call in September 2014.  Both of these calls are open to the public.

Board Finance Committee Open Call
September 10, 2014 at 1:00 PM EDT


Agenda

  • Overview of recent grant awards
  • Open comments and suggestions from the committee
  • Comments and suggestions from the public

Dial-in

Via the web:

https://global.gotomeeting.com/meeting/join/312488189

Via telephone
United States: +1 (805) 309-0012
Access Code: 312-488-189
Audio PIN: Shown after joining the meeting
Meeting ID: 312-488-189

 

Board of Directors Open Call
September 15, 2014 at 3:00 PM EDT


 

Agenda

Public Session

  • Proposal to amend DPLA By­laws to allow for increased number of Directors (Call to vote)
  • Overview of draft DPLA Strategic Plan
  • Update from Executive Director
  • Questions/comments from the public

Executive Session

  • Review of DPLA Handbook
  • Conflict of Interest Certification
  • Review of draft DPLA Strategic Plan
  • Funding and financial update

Dial-in


LITA: Introducing the New LITA Blog Writers

Tue, 2014-09-09 13:00

You’ll still be able to find LITA announcements and events posted on the blog, but now there will also be original content by LITA members representing a variety of perspectives, from library students to  public, academic, and special librarians.

The LITA blog also welcomes guest posts. To submit an idea for consideration, please email me at briannahmarshall(at)gmail(dot)com with a bio, brief summary of your post topic, and link to a writing sample if possible.

Without further ado, here are the writers whose posts you’ll be reading in the coming months.

Bryan J. Brown

Bryan received his BS in English and Philosophy from the University of Southern Indiana, and is a recent graduate from Indiana University’s Department of Information and Library Science where he focused on digital libraries and metadata. After graduation, Bryan transplanted to Tallahassee, FL to be a developer at Florida State University Libraries’ Technology and Digital Scholarship Department. His professional interests include Open Source software in libraries and archives, digital preservation and the semantic web. For more information, check out bryjbrown.github.io.

Lindsay Cronk

Lindsay – librarian, blogger, and adventurer – graduated with her MLIS from Valdosta State in 2012 and has been advocating and serving libraries through her work at LYRASIS ever since. Her interests include open source development models, tools for library marketing and outreach, student research behavior, and later career David Bowie. You can catch her online at her blog or tweeting @linds_bot.

Brittney Farley

Brittney is in her final year as an MSLIS student at the Florida State University’s iSchool. Her specializations include information management/technology and human-computer interaction. She received her BA in History from the University of Florida. She is currently a library assistant in the City of Boca Raton Public Library’s Instructional Services department. Brittney blends her background, as help desk assistant and researcher, to better serve patrons of varying technical understanding.

Lauren Hays

Lauren is the instructional and research librarian at MidAmerica Nazarene University in Olathe, KS. Along with her master’s in library science, she recently completed her second master’s degree in educational technology and also received a graduate certificate in online teaching and learning.  Her professional interests include information literacy, adult learners, online learning, technology, connected learning, and the scholarship of teaching and learning.   In her spare time, she can be found drinking coffee, reading, or planning her next trip.  Follow her on Twitter @Lib_Lauren.

John Klima

John is the Assistant Director of the Waukesha Public Library where one of his many hats is maintaining, upgrading, and innovating technology within the library. Klima wrote a number of articles on steampunk for Library Journal. In his spare time, he is the editor of The Bulletin, the professional publication of the Science Fiction and Fantasy Writers of America. From 2001 to 2013 he edited the Hugo-Award winning magazine Electric Velocipede. Klima has also edited several anthologies including Logorrhea: Good Words Make Good Stories, and Happily Ever After. He co-edited the anthology Glitter & Mayhem with Lynne M. Thomas and Michael Damian Thomas.

Brianna Marshall

Brianna is Digital Curation Coordinator at the University of Wisconsin-Madison, where she manages the institutional repository and develops campuswide services for research data management and curation. She received her Master of Information Science and Master of Library Science from Indiana University’s School of Informatics and Computing in May 2014. From 2012-2014 she was a writer and managing editor for the library student-run blog Hack Library School. Now she is excited to be the new LITA blog editor. She tweets on occasion at @notsosternlib and keeps a blog, too.

Leanne Mobley

Leanne recently earned her MLS from Indiana University and currently works as the Digital Literacy Librarian for the Martin County Library System. Her background is in media production and she is passionate about using technology to bring ideas to life. She is an ardent library lover and still carries her very first library card in her wallet. Find her on Twitter @hey_library.

Leanne Olson

Leanne is a Metadata Management Librarian at Western University in London, Ontario, Canada.  Her main library-related areas of interest include metadata and cataloguing, digital libraries, authority control, teaching, and library history.  She’s also a playwright and lover of the outdoors.  Much of her blogging will be done from her backyard, possibly under five feet of snow.

Michael Rodriguez

Michael is the newly minted eLearning Librarian at Hodges University in southwest Florida, with the faculty rank of Assistant Professor. He graduated in August 2014 with his MLIS from Florida State University and has a background in history and public librarianship. Michael is also a technologist, interested in software customization, distance education, and free web tools and apps. When not doing cool stuff at work, he kayaks among the many mangrove islands off the Florida coast. He tweets @topshelver and blogs at Shelver’s Cove.

Erik Sandall

Erik is Electronic Services Librarian and Webmaster at Mechanics’ Institute in San Francisco, Calif. His professional interests are in integrated library systems, content management systems, online databases, ebooks, and web design and development. When he’s not working on these things, Erik is probably playing soccer or practicing how to open a wine bottle without breaking the cork.

Leo Stezano

Leo is a Project Manager at the Avery Architecture and Fine Arts Library at Columbia University; this is his first library job since receiving his MLIS from Syracuse University in 2011. Previously he spent many years in the private sector, working in Project and Product Management and Business Analysis for a variety of companies. His professional interests include digital librarianship, process optimization, and innovative technical project philosophies. He also enjoys playing soccer and raising two toddlers. You can follow Leo at http://leosmlisblog.wordpress.com/ and on Twitter at @LeoStezano.

Grace Thomas

Grace is a first year grad student working toward a dual-degree MLIS at Indiana University. With a background in English, Computer Science, and Digital Humanities from the University of Nebraska-Lincoln, she is especially interested in digital libraries and archives, and digital preservation. Currently, she works as a Graduate Assistant with associate professors John Walsh and Noriko Hara in the IU School of Informatics and Computing, and on the Petrarchive Digital Archive Project. Grace spends the rest of her time in swimming pools, watching any and all dance performances, and exploring Bloomington by bicycle, occasionally tweeting about all of the above at @gracehthom.

John Miedema: Slow reading six years later. Digital technology has evolved, and so have I. There is a trade-off.

Tue, 2014-09-09 12:59

I was recently interviewed by The Wall Street Journal about slow reading. It has been a few years since I did one of these interviews. I wrote Slow Reading in 2008, six years ago. At the time, the Kindle had just been released and there was a surge of discussion about reading practices, to which I attribute the interest in my little book of research. The request for an interview suggests an ongoing interest in slow reading. So what do I have to say about the subject now?

I used to slow-read often. I would write book reviews, thinking myself progressive in a digital sense for blogging reviews in just four paragraphs. A shift began. My ongoing use of digital technology to read, write and think forced that shift along. I tried to write about that shift in a new online book project — I, Reader — but I failed. The shift was still in progress. I hit a wall at one point. I thought for a time I had reached the end of reading. In 2013, I stopped reading and writing. A year later I started again. I have a good perspective on the shift, but I have no immediate plans to resume writing about it.

So what did I tell the interviewer about slow reading? I confessed that I slow-read print books less often. I re-asserted that “Slow reading is a form of resistance,  challenging a hectic culture that requires speed reading of volumes of information fragments.” I admitted that my resistance is waning. Digital technology has evolved to allow for reading, not just for scanning of information fragments, but also for comprehension of complex and rich material. I was surprised and pleased to discover how digital technology has re-programmed my reading and writing skills to process information more quickly and deeply. I am smarter than I used to be.

I have resumed my writing of book reviews. I restored a selection of book reviews from the past, ones relevant to my current blogging purposes. I will be writing new reviews, probably less often, and I will be writing them differently. Currently I am reading Book Was There: Reading in Electronic Times by Andrew Piper. I no longer take notes on paper as I read; I have been tweeting notes instead. I like the way it is evolving. I use a hashtag for the title and author, and sometimes a reader joins in. When I am done, I will write a very short review, two paragraphs tops, and post it here.

That’s not all I said to the interviewer. I said there has been a trade-off because of digital technology. There is always a trade-off. We just have to decide whether the gains are more than the losses. What have we lost? I lingered on this question because the loss is less than I anticipated. We still read. We still read rich and complex material. Students still prefer print books for serious reading, but I expect they are going through the same transition as I did. What is lost, I assert, is long-form writing. Books born print can be scanned and put online, but books born digital are getting shorter all the time. It is no coincidence that my book, Slow Reading, was short. I was already a reader in transition. Digital technology prefers shortness. It is one reason that many kinds of poetry will survive and thrive on the web. Things should be as short and simple as possible (but not simpler, per the quote attributed to Einstein). Long-form novels and textbooks will be lost in time. It is a loss. Is it worth it?

Jakob Voss: Abbreviated URIs with rdfns

Tue, 2014-09-09 09:26

Working with RDF and URIs can be annoying because URIs such as “http://purl.org/dc/elements/1.1/title” are long and difficult to remember and type. Most RDF serializations make use of namespace prefixes to abbreviate URIs, for instance “dc” is frequently used to abbreviate “http://purl.org/dc/elements/1.1/” so “http://purl.org/dc/elements/1.1/title” can be written as qualified name “dc:title“. This simplifies working with URIs, but someone still has to remember mappings between prefixes and namespaces. Luckily there is a registry of common mappings at prefix.cc.
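The mechanics of abbreviation and expansion are simple string operations over a prefix map. A minimal Python sketch (the two mappings are genuine prefix.cc entries mentioned in this post; the function names are mine, not rdfns's API):

```python
# Small hand-maintained prefix map; rdfns resolves these from prefix.cc.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand(qname):
    """Expand a qualified name like 'dc:title' to a full URI."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local

def abbreviate(uri):
    """Abbreviate a URI to a qualified name if a known namespace matches."""
    for prefix, ns in PREFIXES.items():
        if uri.startswith(ns):
            return prefix + ":" + uri[len(ns):]
    return uri

print(expand("dc:title"))                            # http://purl.org/dc/elements/1.1/title
print(abbreviate("http://xmlns.com/foaf/0.1/name"))  # foaf:name
```

The rdfns tool described below does the same thing, but backed by the full prefix.cc registry rather than a hand-written map.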

A few years ago I created the simple command line tool rdfns and a Perl library to look up URI namespace/prefix mappings. Meanwhile the program is also available as a Debian and Ubuntu package, librdf-ns-perl. The newest version (not included in Debian yet) also supports reverse lookup to abbreviate a URI to a qualified name. Features of rdfns include:

look up namespaces (as RDF/Turtle, RDF/XML, SPARQL…)

$ rdfns foaf.ttl foaf.xmlns dbpedia.sparql foaf.json
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
xmlns:foaf="http://xmlns.com/foaf/0.1/"
PREFIX dbpedia: <http://dbpedia.org/resource/>
"foaf": "http://xmlns.com/foaf/0.1/"

expand a qualified name

$ rdfns dc:title
http://purl.org/dc/elements/1.1/title

lookup a preferred prefix

$ rdfns http://www.w3.org/2003/01/geo/wgs84_pos#
geo

create a short qualified name of a URL

$ rdfns http://purl.org/dc/elements/1.1/title
dc:title

I use RDF-NS for all RDF processing to improve readability and to avoid typing long URIs. For instance Catmandu::RDF can be used to parse RDF into a very concise data structure:

$ catmandu convert RDF --file rdfdata.ttl to YAML

Jonathan Rochkind: Cardo is a really nice free webfont

Tue, 2014-09-09 04:39

Some of the fonts on google web fonts aren’t that great. And I’m not that good at picking the good ones from the not-so-good ones on first glance either.

Cardo is a really nice old-style serif font that I originally found recommended on some list of “the best of google fonts”.

It’s got a pretty good character repertoire for Latin text (and I think Greek). The Google Fonts version doesn’t seem to include Hebrew, even though some other versions might?  For library applications, the more characters the better, and it should have enough to deal stylishly with whatever letters and diacritics you throw at it in Latin/Germanic languages, and all the usual symbols (currency, punctuation, etc.).

I’ve used it in a project that my eyeballs have spent a lot of time looking at (not quite done yet), and I’ve been increasingly pleased by it; it’s nice to look at and to read, especially on a ‘retina’ display. (I wouldn’t use it for headlines, though.)


Filed under: Uncategorized

DPLA: DPLA &amp; Imgur’s Summer of Archives Comes to a Close

Tue, 2014-09-09 00:58

Back in June, we announced our collaboration with the Digital Public Library of America (DPLA) for the Summer of Archives–an experimental gallery endeavor that brought tons of historical OC gems to User Submitted. From perfectly looping space GIFs, to famous cats of history, to beautiful book covers, to celestial maps, we’re happy to call this experiment a huge and awesome success.

The very last Summer of Archives post is live in User Submitted right now. We’re going out the same way we came in–with historical GIFs!

Huge thanks to the DPLA for sharing this special content with Imgur all summer long. Be sure to check the DPLA Imgur account to revisit all of the submissions. If your thirst for history cannot be quenched, head over to the DPLA website for a vast array of great content.

This blogpost was originally published on the Imgur.com blog (view on Imgur.com).

William Denton: Augustus

Tue, 2014-09-09 00:32

A few people recommended Stoner by John Williams to me, and they were right. It’s a gem.

I was in Book City tonight and the clerk was selling a customer on Stoner for a book club. Browsing the new release tables with Williams on my mind I saw a similar new edition from New York Review Books of Augustus, which is about that Augustus.

The first line is a doozy:

… I was with him at Actium, when the sword struck fire from metal, and the blood of soldiers was awash on deck and stained the blue Ionian Sea, and the javelin whistled in the air, and the burning hulls hissed upon the water, and the day was loud with the screams of men whose flesh roasted in the armor they could not fling off; and earlier I was with him at Mutina, where that same Marcus Antonius overran our camp and the sword was thrust into the empty bed where Caesar Augustus had lain, and where we persevered and earned the first power that was to give us the world; and at Philippi, where he traveled so ill he could not stand and yet made himself to be carried among his troops in a litter, and came near death again by the murderer of his father, and where he fought until the murderers of the mortal Julius, who became a god, were destroyed by their own hands.
