
Bess Sadler: The Ada Initiative Has My Back

planet code4lib - Tue, 2014-09-09 22:49


The Ada Initiative has my back. In the past several years they have been a transformative force in the open source software community and in the lives of women I know and care about. To show our support, Andromeda Yelton, Chris Bourg, Mark Matienzo and I have pledged to match up to $5120 of donations to the Ada Initiative made through this link before Tuesday September 16. That seems like a lot of money, right? Well, here’s my story about how the Ada Initiative helped me when I needed it most.






Thanks to the Ada Initiative, having a conference code of conduct has become an established best practice, and it is changing conference culture for the better. I’m so proud of the many library conferences and organizations who have adopted Code of Conduct policies this year. However, just because a conference has a policy in place doesn’t mean there won’t be any problems. I’d like to share something that happened to me this year, the way the Ada Initiative helped me and the conference in question deal with it, and how things have since improved.

I gave a talk at code4lib a few years ago called “Vampires vs Werewolves: Ending the War Between Developers and Sysadmins with Puppet”. The talk was well received enough that it became a little bit of a meme, and last year PuppetConf asked me if I’d give an updated version of it for PuppetConf 2013.

It was exciting to participate in such a huge conference. PuppetConf is well funded and professionally managed. They have a code of conduct in place, friendly and helpful staffers, and high quality content. I was having a great time up until right before my talk.

Unfortunately, the talk before mine was titled “Nobody Has to Die Today: Keeping the Peace With the Other Meat Sacks.” I watched this talk on the video monitor from backstage, while getting hooked up to a lapel microphone, already a little nervous about facing such a large audience.

The speaker was a large heavily muscled man who was shouting more than speaking into the microphone. He was shouting about violence, and about how many people get murdered in the workplace. He particularly mentioned the fact that murder is the number one cause of death for women in the workplace. I felt my blood running cold. My body felt flooded with fear, and I wanted to run. He went on to discuss the many ways he personally had hurt people, through his work as a bouncer, in martial arts, or just because someone made him angry. At this point I was literally shaking. I have been on the receiving end of violence. I have known people who have been murdered. I know people, especially women, who have been hospitalized with the kinds of injuries he was graphically describing having inflicted. In spite of the small print disclaimer on his slides that this presentation was not encouraging violence, it was doing precisely that. If you don’t communicate technical requirements in the way he specifies, apparently, you will get what’s coming to you and you will deserve it.

The conference staff backstage were horrified. Some high profile people in the audience were walking out. It was clear that I was upset (I was trying not to hyperventilate at this point) and someone kept asking me if I wanted to file a code of conduct complaint. The thought of this made my panic even worse. I could easily picture filing a complaint and then paying for it with my life, when this guy found out and beat me to death in the parking lot after the conference. I said I did not want to file a complaint. I tried to take deep breaths and to not break down crying. I was determined to give my talk in spite of my shaky emotional state.

And I did. I delivered my presentation, which went surprisingly well, except for the fact that in the video I am swaying back and forth. I don’t usually do that when I speak, and I read it as the outward manifestation of how upset I was.

Afterwards, I thought for a long time about writing to the conference with my concerns. I started to do so several times, but I always chickened out. It was too easy for me to picture this guy learning my name and coming after me. Before you dismiss me as paranoid please consider the stories of Anita Sarkeesian and Kathy Sierra. Women in technology face worse than sexist jokes. We face assault. We face death threats. If you defend the status quo, understand what you are defending.

It wasn’t until this year’s PuppetConf call for proposals that I complained. The conference had liked my talk last year, and invited me to submit another talk this year. I wrote to decline, and told them why. I also sent a copy of my letter to Valerie Aurora, asking for the advice of the Ada Initiative.

I am very pleased to say that PuppetConf took my concerns seriously. Working with the Ada Initiative, they strengthened their code of conduct, put more screening measures in place for presentations, and improved training for conference staff on how to deal with problematic situations. PuppetLabs is an example of a company that is doing things right. They have specific outreach programs to get more women involved in the Puppet community and they are pursuing similar strategies to encourage participation from underrepresented racial groups. I feel good about the fact that I’m sending members of my staff to PuppetConf 2014, and at this point I would gladly speak at the conference again.

As upsetting as this incident was, this is a story with a happy ending. Because the Ada Initiative exists, both PuppetConf and I had someone to go to for guidance in how to improve the situation. Honestly, I still feel a little afraid about writing this post. But I also believe that nothing gets better until people take the risk of speaking out publicly. I am choosing to take that risk, in order to better communicate about why this work matters.

The Ada Initiative continues to do great things. You can read their 2014 progress report here. I am particularly excited about the Ally Skills workshop that will be offered at the Digital Library Federation Forum on October 29. Today, librarians are showing our love for the Ada Initiative. Watch for blog and social media posts from friends in library land who will be sharing more stories about why the Ada Initiative matters, and follow the action on twitter under the hashtag #libs4ada. Join us in supporting the Ada Initiative’s mission and donate today!

State Library of Denmark: Small scale sparse faceting

planet code4lib - Tue, 2014-09-09 20:35

While sparse faceting has a profound effect on response time in our web archive, we are a bit doubtful about the number of multi-billion-document Solr indexes out there. Luckily we also have our core index at Statsbiblioteket, which should be a bit more representative of your everyday Solr installation: single-shard, 50GB, 14M documents. The bulk of the traffic is user-issued queries, which involve spellcheck, edismax qf & pf on 30+ fields and faceting on 8 fields. In this context, the faceting is of course the focus.

Of the 8 facet fields, 6 are low-cardinality and 2 are high-cardinality. Sparse was only active for the 2 high-cardinality ones, namely subject (4M unique values, 51M instances (note to self: 51M!? How did it get so high?)) and author (9M unique values, 40M instances).
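As a rough illustration, here is what such a faceted edismax request might look like from Python. The host, core, query and qf values are placeholders, and the parameter that actually switches on sparse faceting is whatever the SOLR-5894 patch defines, so it is omitted here.

import requests

# Hypothetical faceted edismax request in the spirit of the setup above.
# Host, core, query and qf values are placeholders, not the production config.
params = {
    "q": "some user query",
    "defType": "edismax",
    "qf": "title subject author",          # the real setup uses 30+ fields
    "facet": "true",
    "facet.field": ["subject", "author"],  # the two high-cardinality fields
    "facet.limit": 20,
    "wt": "json",
}
response = requests.get("http://localhost:8983/solr/collection1/select", params=params)
facets = response.json()["facet_counts"]["facet_fields"]["subject"]
# Solr returns facets as a flat [term, count, term, count, ...] list:
for term, count in zip(facets[::2], facets[1::2]):
    print(term, count)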

To get representative measurements, the logged response times were extracted for the hours 07-22; there’s maintenance going on at night and it makes measurement unreliable. Only user-entered searches with faceting were considered. To compare performance before and after enabling sparse faceting, the data for this Tuesday and last Tuesday were used.
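A minimal sketch of that extraction and comparison might look like the following; the log filenames and the line format (an ISO timestamp somewhere before Solr’s QTime value) are assumptions, not the actual production logs.

import re
from statistics import median

# Assumed log line format: ISO timestamp somewhere before Solr's QTime.
PATTERN = re.compile(r"T(\d{2}):\d{2}:\d{2}.*QTime=(\d+)")

def facet_query_times(logfile):
    """Collect QTime values for queries logged between 07:00 and 22:59."""
    times = []
    with open(logfile) as f:
        for line in f:
            m = PATTERN.search(line)
            if m and 7 <= int(m.group(1)) <= 22:
                times.append(int(m.group(2)))
    return times

for log in ("solr-20140902.log", "solr-20140909.log"):  # before / after
    times = facet_query_times(log)
    print(log, "queries:", len(times), "median:", median(times), "ms")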

50GB / 14M docs, logged timing from production, without (20140902) and with (20140909) sparse faceting

The performance improvement is palpable, with response times halved compared to non-sparse faceting. Fine-reading the logs, the time spent on faceting the high-cardinality fields is now in the single-digit milliseconds for nearly all queries. We’ll have to run some tests to see what stops the total response time from getting down to that level. I am guessing spellcheck.

As always, sparse faceting is readily available for the adventurous at SOLR-5894.


Nicole Engard: Bookmarks for September 9, 2014

planet code4lib - Tue, 2014-09-09 20:30

Today I found the following resources and bookmarked them:

  • Color Oracle: a free color blindness simulator for Windows, Mac and Linux.

Digest powered by RSS Digest

The post Bookmarks for September 9, 2014 appeared first on What I Learned Today....


Roy Tennant: In Memoriam: Anne Grodzins Lipow

planet code4lib - Tue, 2014-09-09 20:23

I was reminded by her daughter on Facebook that Anne Grodzins Lipow passed away ten years ago today. In commemoration of that horrible event, I am posting the Foreword I wrote for Anne’s festschrift that was published in 2008.

On September 9, 2004 librarianship lost a true champion. Anne Grodzins Lipow was unique – of all the testimonials I’ve read about her that is one undeniable truth. We each knew a different set of Anne’s qualities, or engaged with her in a different way, but in the end it all came down to the fact that Anne was someone we could all say was “larger than life”.

The days after her passing were filled with personal testimonials that were mostly lodged as comments on the Infopeople blog. It was an odd experience for me to read these messages and realize that as much as I felt that I knew her, I barely knew her at all. I was like the proverbial blind man with his hands wrapped around one part of the elephant, while others had a firm grip on other body parts and would describe a very different animal. My reality, as deeply felt as it was, was only a pale shadow of the whole.

But for all that, it was a long, long shadow. As a newly-minted librarian at UC Berkeley in the second half of the 1980s, I knew Anne as the person who led the outreach and instructional efforts of the library. Before long, she saw in me the potential to be a good teacher, despite my fear of public speaking, so she pulled me into her program and began teaching me everything she knew about speaking, putting on workshops, making handouts, etc. Under her tutelage, I taught classes such as dialup access to the library catalog, when 300bps modems were still common.

As the Internet began making inroads into universities, Anne was there with newly developed workshops on how to use it. She was convinced very early on, as was I, that the Internet would be an essential technology for libraries. This led to her approaching my colleague John Ober (then on faculty at the library school at Berkeley) and me about doing a full-day Internet workshop scheduled to coincide with the 1992 ALA Annual Conference in San Francisco. Using a metaphor of John’s, we called it “Crossing the Internet Threshold”.

In preparing for the workshop, we created so many handouts that we needed to put them into a binder that began to look increasingly like a book in the making. With typical Anne flair, she arranged for the gifted librarian cartoonist Gary Handman (also our colleague at Berkeley) to create a snazzy cover for the binder, which she also used to create T-shirts (which many of us have to this day).

Anne knew enough about workshops to do a “trial run” before the big day, so we did one for UC Berkeley library staff a couple weeks before, which gave us feedback essential to making an excellent workshop. In the end, the workshop was such a hit that Anne ran with it. She took the binder of handouts we had created and made a book out of it — the first book of her newly-created business called Library Solutions Institute and Press. Her decision to publish the book herself rather than seek out a publisher was so typical of Anne. And how she did it will tell you a lot about her.

Despite the higher cost, Anne insisted on using domestic union printing shops for printing. While other publishers were publishing books overseas for a fraction of the cost, publishing for Anne was a political and social activity, through which she could do good for those around her. It was very important to her to treat people with respect and kindness, and she did it so well. That was the kind of person Anne was.

While every publisher I have since worked with after Anne has insisted they are incapable of paying royalties any more frequently than twice a year, Anne paid her authors monthly. And whereas other publishers wait months to pay you for royalties earned long before, Anne would pay immediately. This meant that when books were returned, as they sometimes were, she took the loss for having paid the author royalties on books that had not been sold. That was the kind of person Anne was.

Anne continued to blaze new trails after libraries began climbing on the Internet bandwagon, due in no small measure to her books and workshops on the topic. Anne became a well-known and coveted consultant on a number of topics, but in particular on reference services.

Her “Rethinking Reference” institutes and book were widely acclaimed, and her book The Virtual Reference Librarian’s Handbook (2003) demonstrated that Anne was always at the cutting edge of librarianship. That was the kind of person Anne was.

I visited her after her cancer was diagnosed and after her treatment had failed. We all knew there was no hope, that she had only a matter of weeks to live. Despite the obvious ravages of the illness, Anne’s outlook remained bright and welcoming. She was happy to have her friends and family around her, and we talked of many things except the dark shadow that hung over us all. Even then, she was happy to see whoever came by, and to talk with them with a smile and good wishes. That was the kind of person Anne was.

A piece of all my major professional accomplishments I owe to Anne, and her great and good influence on me. She would deny this, despite its truth, wanting all the credit to accrue to me alone. That was the kind of person Anne was.


Each one of us who has contributed to this volume has been touched by Anne in our own, quite personal ways. Some of us have known of her work mostly by reputation and reading, while others were blessed with more direct and personal contact. But the fact remains that Anne cast a long professional shadow that will affect many librarians yet to come.

For those of us who created a monument of words to someone we love and respect, Anne had one final gift to give. As anyone who has ever created a present for someone they love knows, in so doing you think about the person for whom you are making the gift. Therefore, the authors of this volume have all spent more time with Anne, and as always it was time well spent. We know our readers will count it so too.

31 January 2008, Sonoma, CA

LITA: LITA Midwinter Institutes

planet code4lib - Tue, 2014-09-09 19:34

Registration for LITA’s Midwinter Institutes opened today with ALA’s joint registration! Whether you’ll be attending Midwinter or are just looking for a great one day continuing education event in the Chicago/Midwest area, we hope you’ll join us.

When? All workshops will be held on Friday, January 30, 2015, from 8:30-4:00

Cost for LITA Members: $235  (ALA $350 / Non-ALA $380)
(If you are a member of LITA use special code LITA2015 to receive the price of $235.)

Workshop Descriptions:

Developing mobile apps to support field research
Instructor: Wayne Johnston, University of Guelph Library

Researchers in most disciplines do some form of field research. Too often they collect data on paper, which is not only inefficient but vulnerable to data loss. Surveys and other data collection instruments can easily be created as mobile apps, with the resulting data stored on the campus server and immediately available for analysis. The apps also enable added functionality like improved data validity through use of authority files and capturing GPS coordinates. This support for field research represents a new way for academic libraries to connect with researchers within the context of a broader research data management strategy.

Introduction to Practical Programming
Instructor: Elizabeth Wickes, University of Illinois at Urbana-Champaign

This workshop will introduce foundational programming skills using the Python programming language. There will be three sections to this workshop: a brief historical review of computing and programming languages (with a focus on where Python fits in), hands-on practice with installation and the basics of the language, followed by a review of information resources essential for computing education and reference. This workshop will prepare participants to write their own programs, jump into programming education materials, and provide essential experience and background for the evaluation of computing reference materials and library program development. Participants from all backgrounds with no programming experience are encouraged to attend.

From Lost to Found: How User Testing Can Improve the User Experience of Your Library Website
Instructors: Kate Lawrence, EBSCO Information Services; Deirdre Costello, EBSCO Information Services; Robert Newell, University of Houston

When two user researchers from EBSCO set out to study the digital lives of college students, they had no idea of the surprises in store for them. The online behaviors of “digital natives” were fascinating: from students using Google to find their library’s website, to the research terms and phrases students consider another language altogether: “library-ese.” Attendees of this workshop will learn how to conduct usability testing, and participate in a live testing exercise via usertesting.com. Participants will leave the session with the knowledge and confidence to conduct user testing that will yield actionable and meaningful insights about their audience.


More details about these workshops will be coming in interviews with the instructors in October! If you have a question you’d like to ask the instructors, please contact LITA Education Chair Abigail Goben at [firstnamelastname]@gmail.com.


LITA: 2014 LITA Forum: early bird rates available through Sept. 15

planet code4lib - Tue, 2014-09-09 19:19
Don’t miss your chance to save up to $50 on registration for the 2014 LITA Forum, “From Node to Network,” to be held Nov. 5-8, 2014 at the Hotel Albuquerque in Albuquerque, N.M.

Don’t forget to book your room at the Hotel Albuquerque by Oct. 14, 2014 to guarantee the LITA room rate.

This year’s Forum will feature three keynote speakers:

  • AnnMarie Thomas, Engineering Professor, University of St. Thomas
  • Lorcan Dempsey, Vice President, OCLC Research and Chief Strategist
  • Kortney Ryan Ziegler, Founder, Trans*h4ck

More than 30 concurrent colleague-inspired sessions and a dozen poster sessions will provide a wealth of practical information on a wide range of topics.

Networking opportunities, a major advantage of a smaller conference, are an important part of the Forum. Take advantage of the Thursday evening reception and sponsor showcase, the Friday networking dinners or Kitchen Table Conversations, plus meals and breaks throughout the Forum to get to know LITA leaders, Forum speakers, sponsors, and peers.

This year two preconference workshops will also be offered.

Linked Data for Libraries: How libraries can make use of Linked Open Data to share information about library resources and to improve discovery, access, and understanding for library users
Led by: Dean B. Krafft and Jon Corson-Rikert, Cornell University Library

Linked Open Data (LOD) provides an expressive and extensible mechanism for sharing information (metadata) about all the materials research libraries make available. In this workshop the presenters will introduce the principles and practices of creating and consuming Linked Open Data via a series of examples from sources relevant to libraries. They will provide an introduction to the technologies, tools, and types of data typically involved in creating and working with Linked Open Data and the semantic web. The preconference will also address the challenges of data quality, interoperability, authoritativeness, privacy, and other issues accompanying the adoption of new technologies as these apply to making use of Linked Open Data.

Learn Python by Playing with Library Data
Led by: Francis Kayiwa, Kayiwa Consulting

What can be more fun than learning Python? Learning Python by hacking on library data! In this workshop, you’ll learn Python basics by reading files, looking at MARC (yes MARC), building data structures, and analyzing library data (those logs aren’t going to appreciate themselves). By the end, you will have set up your Python environment, installed some useful packages, and learned how to write simple programs that you can use to impress your colleagues back at work.
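For a taste of the kind of exercise described, here is a small sketch; it assumes the third-party pymarc package and a sample file of records, neither of which is part of the actual workshop materials.

from collections import Counter

from pymarc import MARCReader  # third-party package: pip install pymarc

tag_counts = Counter()
with open("records.mrc", "rb") as fh:   # hypothetical sample file
    for record in MARCReader(fh):
        title = record["245"]           # the MARC title statement field
        if title is not None:
            print(title["a"])           # subfield $a: title proper
        tag_counts.update(field.tag for field in record.fields)

# Which MARC fields appear most often in this batch?
print(tag_counts.most_common(10))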

2014 LITA Forum sponsors include EBSCO, Springshare, @mire, Innovative and OCLC.

Visit the LITA website for more information.

Library and Information Technology Association (LITA) members are information technology professionals dedicated to educating, serving, and reaching out to the entire library and information community. LITA is a division of the American Library Association.

LITA and the LITA Forum fully support the Statement of Appropriate Conduct at ALA Conferences

Bill Dueber: Help me test yet another LC Callnumber parser

planet code4lib - Tue, 2014-09-09 19:10

Those who have followed this blog and my code for a while know that I have a long, slightly sad, and borderline abusive relationship with Library of Congress call numbers.

They're a freakin' nightmare. They just are.

But, based on the premise that Sisyphus was a quitter, I took another stab at it, this time writing a real (PEG-) parser instead of trying to futz with extended regular expressions.
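For a sense of why the regex route breaks down, here is a deliberately naive Python sketch that copes only with the tidiest call numbers; dates, volume designations, multiple cutters, and spacing oddities all defeat it, which is where a PEG grammar starts to pay off. (This illustrates the problem; it is not the lc_callnumber gem's approach.)

import re

# Handles only: class letters, class number, one optional cutter.
LC_SIMPLE = re.compile(
    r"""^(?P<letters>[A-Z]{1,3})\s*       # classification letters, e.g. QA
        (?P<number>\d+(?:\.\d+)?)\s*      # class number, e.g. 76.73
        (?:\.?\s*(?P<cutter>[A-Z]\d+))?$  # one optional cutter, e.g. .R83
    """,
    re.VERBOSE,
)

for cn in ("QA76.73.R83", "PS3537 .A618", "N7433.4"):
    m = LC_SIMPLE.match(cn)
    print(cn, "->", m.groupdict() if m else "no match")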

The results, so far, aren't too bad.

The gem is called lc_callnumber, but more importantly, I’ve put together a little Heroku app to let you play with it, and then correct any incorrect parses (or tell me that it worked correctly) to build up a test suite.

So…Please try to break my LC Callnumber parser!

[Code for the app itself is on GitHub; pull requests for both the app and the gem joyously received]

David Rosenthal: Two Brief Updates

planet code4lib - Tue, 2014-09-09 17:56
A couple of brief updates on topics I've been covering: Amazon's margins and the future of flash memory.



First, Benedict Evans has a fascinating and detailed analysis of Amazon's financial reports. Read the whole thing.

He shows how Amazon's strategy is not to generate and distribute profits, but to re-invest their cash flow into starting and developing businesses. Starting each business absorbs cash, but as they develop they turn around and start generating cash that can be used to start the next one.

His graphs and numbers make the case brilliantly. Here, for example, are Amazon's revenues and profits since launch: lots of revenue and almost no profit. But it is more revealing to focus, as Amazon does, on cash flow.

Here Evans shows Free Cash Flow (FCF), Capital Expenditure (capex), and Operating Cash Flow (OCF) as a proportion of revenue.

Amazon’s OCF margin has been very roughly stable for a decade, but the FCF has fallen, due to radically increased capex.

Here Evans shows capex as a proportion of sales, showing a relentless rise starting in late 2009.

That is, if Amazon was spending the same on capex per dollar of revenue as it was in 2009, it would have kept $3bn more in cash in the last 12 months.

What we're interested in here is the AWS business, which is most of the category Amazon calls "Other". Here is the growth of "Other" revenue. This is a market that Amazon is absolutely dominating. Its cash flow is doing two things: paying for the computing infrastructure Amazon needs to run its other, much larger, established businesses, and paying for the startup costs of new businesses.

As far as I can see, in the "cloud" business only Google has the same synergy between an established business and a cloud business. Other competitors don't need the cloud scale of investment to support another, much larger existing business. They have to treat their cloud investments as a stand-alone business, which is much less efficient. And they are much smaller than AWS. So they aren't going to survive. IBM and Microsoft, I'm looking at you.

Second, Chris Mellor at The Register looks at the hype surrounding the "all-flash data center" and makes the point that Dave Anderson of Seagate has been making for years.
That leaves us with the view that all-flash data centres are not feasible at present. They may become feasible if the cost of flash falls to near-parity with nearline and bulk storage disk but there is another problem: the flash foundry capacity to build the stuff just doesn't exist.

In terms of exabytes of capacity, worldwide disk production is vastly higher than that of flash, and with flash fabs costing $7bn to $9bn apiece it is likely to remain so.

This is no small matter. An all-flash data centre would need approximately the same number of TB of storage as current all-disk or hybrid flash/disk data centres.
...
The flash foundry operators are paranoid about avoiding loss-making gluts of product, having seen the dire effects of that in the memory industry, with its persistent huge losses and dramatic supplier consolidation. They will be slow to bring new flash fab capacity online.

They are working towards increasing flash capacity by increasing wafer density through cell geometry shrinks, and also through building flash chips with stacked layers of cells, so-called 3D NAND.

These in themselves won't allow the flash industry to take on any substantial portion of worldwide disk capacity in the next few years. That requires many new fabs and there is no sign of that happening.

Not to mention that generating a return on a $7-9B investment requires that the product it builds be in the market for many years. Flash technology is approaching its limits, so the time during which flash will dominate the solid-state storage market with its premium pricing is short, too short to generate the necessary return.

HangingTogether: Innovative solutions for dealing with born-digital content in obsolete forms – Part 2

planet code4lib - Tue, 2014-09-09 17:00

This is the second of three posts about a lightning talk session at SAA. Part 1 began with descriptions of the array of media an archives might confront and an effort to test how much can be done in house.

Part 2 picks up with four archivists talking about solutions to particularly challenging formats.

Abby Adams is the Digital Archivist at Hagley Museum & Library, an independent research library in Wilmington, Delaware, documenting American enterprise from its inception to present day with a focus on the intersection of industry, technology, and society. In 2012, Hagley received a large hybrid collection, consisting primarily of textual analog materials, in addition to a number of born-digital records. The records were created by various tech corporations during the normal course of business in the late 1990s and early 2000s and document aspects of the dot-com boom and bust, an area of research where primary sources are sorely lacking. Given the potentially high research value of the collection, Adams gave the preservation of the born-digital content high priority and culled hundreds of records cartons to discover the following obsolete media formats: 349 compact discs; 134 3.5” floppy disks; 113 digital linear tapes (DLT); 49 digital data storage tapes (DDS); 19 quarter-inch mini cartridges; 15 Travan cartridges; and 8 zip disks.

Although the CDs and floppy disks presented few problems, the remaining obsolete formats offered a lesson in how complex data recovery can be. Adams’ attempts to use “freecycled” drives and jerry-rig old PCs were just not working. Even if she could connect a computer to the exact generation DLT or DDS drive to read the tapes, she would also need to know the software program used to create the backup, which could vary widely depending on the date of creation, then successfully install it, and cross her fingers the media isn’t encrypted or corrupt. Since Hagley is a small shop with limited in-house resources, it was clear outsourcing the data extraction was the best course of action. After consulting several vendors, Adams and her coworker Kevin Martin found a company that specializes in data extraction and indexing of backup tapes. After establishing a budget for the first phase of the project, Adams and Martin sent the vendor a sample consisting of five DLT and three DDS tapes. Less than a week later, the vendor provided them access to the indexed data from seven out of eight tapes. Due to the size of the collection and Hagley’s limited in-house resources, Adams was strict with appraisal, retaining only about ten percent of the data. The original media was returned to Hagley a few weeks later. Having successfully completed the first phase of the project, Hagley will continue to use the same company for the remaining backup tapes.

Elise Warshavsky is the Digital Archivist at the Presbyterian Historical Society, which serves as the national archives of the Presbyterian Church, documenting the political and social history of the church. The archives acquired the laptop of Clifton Kirkpatrick, former Stated Clerk, the highest elected official within the church. The laptop contained files he had worked on as well as his email. Five years later Elise was hired and was asked to archive the Stated Clerk’s laptop. The “detailed instructions” she received covered little more than the passwords, the types of files, and the fact that there were 28,000 emails in the Novell GroupWise account.

The records manager who had originally received the laptop had converted the account to a Remote account, enabling the email to live solely on the laptop. The records archivist had also reorganized the inbox and appraised each individual email, resulting in lost folder structure and possibly other lost metadata. The emails were readable, but because of a 50-year embargo on access to them, the goal was to ensure that these files would be readable in 50 years. After not being able to find a way to convert the GroupWise Remote email to another format, she finally contacted a company that makes a commercial-grade email converter called Transend. They agreed to resurrect the Remote account on their GroupWise servers and then convert it to .pst, Microsoft’s open proprietary file format. Then she was able to move forward with her migration plan: convert to a more archival email format, .MBOX, and run a tool to batch export PDFs from each individual email and convert them to PDF/As – a format researchers would be able to search and access in 50 years.
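The conversion itself was Transend's job, but once mail is in MBOX, even Python's standard library can open it for a quick sanity check; the filename below is made up.

import mailbox

mbox = mailbox.mbox("stated_clerk.mbox")  # hypothetical filename
print(len(mbox), "messages")              # one would expect ~28,000 here
for message in mbox:
    print(message["Date"], "|", message["Subject"])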

Elise’s advice: If you get frustrated about not having the tools or skills necessary to complete a project, reach out to find help. There’s no need to develop resources in house when dealing with a unique, most likely unrepeatable incident. Get help, and move on to doing what you do know how to do – accession, appraise, and preserve.

Ted Hull, Director of the Electronic Records Division at the National Archives at College Park, told of a project to recover content from 7-track tapes.

The Electronic Records Division accessions, processes, arranges for preservation, describes, and provides access to the born-digital federal records scheduled for permanent retention in the National Archives. They hold 932 series from over 100 federal agencies, consisting of over 750 million unique files and over 320 terabytes of data. 7-track magnetic tape was an industry standard from the 1950s to the 1970s, when it was generally replaced with 9-track magnetic tape. While most of the Archives’ content had already been transferred off of 7-track tape, in 2013 staff identified 13 remaining tapes containing records from the Federal Home Loan Bank Board, the Bureau of Indian Affairs, and the U.S. Joint Chiefs of Staff. The Archives reached out and found that the National Center for Atmospheric Research (NCAR) in Boulder, CO still had the capability to read 7-track tapes and was able to recover data from 9 tapes; the other 4 were blank. NCAR converted the binary-coded decimal encoding to ASCII and made the files available to NARA for direct download from their FTP site; NARA processed and accessioned the records, and the original tapes were returned to NARA for disposal.

Ben Goldman, the Digital Records Archivist for Pennsylvania State University Libraries, discovered 27 3-inch disks in a modern literary manuscript collection. They didn’t have the equipment needed to read the disks, and they weren’t even sure if the disks were readable or contained data worth recovering.
Amstrad disk from the Fiona Pitt-Kethley papers, Penn State University Special Collections Library

The author confirmed that she did own an Amstrad computer (a somewhat popular computer in the UK for a brief period in the 1980s), but because Ben didn’t know exactly what hardware or software was needed to read the disks, he decided to outsource recovery of the disks. He wanted to use the opportunity to come up with a model vendor agreement and to make the project an extension of their internal born-digital workflow. To that end, he created a media inventory spreadsheet to be used to identify the disks, their labels, their contents, the images derived from them, and to accommodate checksums after their eventual transfer. Mostly, however, he wanted to see if outsourcing was a viable option for archivists confronting elusive computer media formats and to see if core archival requirements could be met by outsourcing, whether service providers could adhere to emerging best practices, and to see if the costs were viable for archives. PSU provided funding for a project at $40 per disk.
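A short script is one way to populate the checksum column of such an inventory; the directory layout and CSV columns below are assumptions, not Ben's actual spreadsheet.

import csv
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """Hash a disk image in chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

with open("media_inventory.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["disk_image", "bytes", "sha256"])
    for image in sorted(Path("disk_images").glob("*.img")):
        writer.writerow([image.name, image.stat().st_size, sha256sum(image)])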

[Tweet] Jason P. Evans Groth: $40/disk is same as person making $40k spending two hours to image obsolete disks, so maybe it is the right deal? #s601 #saa14

Soon Ben had a signed vendor agreement with the Museum of Computer Culture to provide disk images that could be processed using forensic tools. They were to work from the inventory and follow naming conventions and provide checksums to ensure accurate transfer.

Many months later, however, Ben was working with two other vendors – without a signed agreement. They found that disk images that were native to the Amstrad operating system couldn’t be migrated to modern formats or processed using common forensic tools. Instead, Ben received three versions of every file in three different formats, each with its own brand of lossy-ness and, in the end, there was no adherence to naming conventions and no checksums. Despite not really meeting his expectations, Ben doesn’t think of the project as a failure. “Fugitive media is defiant,” he warns. Communication is key and the vendor agreement should establish communication requirements. Beyond that, Ben is not sure this cost model will be sustainable. Instead, he suggests that archivists need to develop in-network options. There are technologies, resources, and talented people working on these issues. It would be nice to see some better community strategies for tackling the issues and supporting each other.

Next up: Part 3 will continue with three speakers representing the service provider point of view.


DPLA: September 10/15, 2014: Board and Board Finance Committee Open Calls

planet code4lib - Tue, 2014-09-09 14:00

The DPLA Board and its Finance Committee will each hold an open conference call in September 2014.  Both of these calls are open to the public.

Board Finance Committee Open Call
September 10, 2014 at 1:00 PM EDT


Agenda

  • Overview of recent grant awards
  • Open comments and suggestions from the committee
  • Comments and suggestions from the public

Dial-in

Via the web:

https://global.gotomeeting.com/meeting/join/312488189

Via telephone:
United States: +1 (805) 309-0012
Access Code: 312-488-189
Audio PIN: Shown after joining the meeting
Meeting ID: 312-488-189


Board of Directors Open Call
September 15, 2014 at 3:00 PM EDT



Agenda

Public Session

  • Proposal to amend DPLA Bylaws to allow for increased number of Directors (Call to vote)
  • Overview of draft DPLA Strategic Plan
  • Update from Executive Director
  • Questions/comments from the public

Executive Session

  • Review of DPLA Handbook
  • Conflict of Interest Certification
  • Review of draft DPLA Strategic Plan
  • Funding and financial update

Dial-in


LITA: Introducing the New LITA Blog Writers

planet code4lib - Tue, 2014-09-09 13:00

You’ll still be able to find LITA announcements and events posted on the blog, but now there will also be original content by LITA members representing a variety of perspectives, from library students to public, academic, and special librarians.

The LITA blog also welcomes guest posts. To submit an idea for consideration, please email me at briannahmarshall(at)gmail(dot)com with a bio, brief summary of your post topic, and link to a writing sample if possible.

Without further ado, here are the writers whose posts you’ll be reading in the coming months.

Bryan J. Brown

Bryan received his BS in English and Philosophy from the University of Southern Indiana, and is a recent graduate from Indiana University’s Department of Information and Library Science where he focused on digital libraries and metadata. After graduation, Bryan transplanted to Tallahassee, FL to be a developer at Florida State University Libraries’ Technology and Digital Scholarship Department. His professional interests include Open Source software in libraries and archives, digital preservation and the semantic web. For more information, check out bryjbrown.github.io.

Lindsay Cronk

Lindsay – librarian, blogger, and adventurer – graduated with her MLIS from Valdosta State in 2012 and has been advocating for and serving libraries through her work at LYRASIS ever since. Her interests include open source development models, tools for library marketing and outreach, student research behavior, and later career David Bowie. You can catch her online at her blog or tweeting @linds_bot.

Brittney Farley

Brittney is in her final year as an MSLIS student at the Florida State University iSchool. Her specializations include information management/technology and human-computer interaction. She received her BA in History from the University of Florida. She is currently a library assistant in the City of Boca Raton Public Library’s Instructional Services department. Brittney blends her background as a help desk assistant and researcher to better serve patrons of varying technical understanding.

Lauren Hays

Lauren is the instructional and research librarian at MidAmerica Nazarene University in Olathe, KS. Along with her master’s in library science, she recently completed her second master’s degree in educational technology and also received a graduate certificate in online teaching and learning.  Her professional interests include information literacy, adult learners, online learning, technology, connected learning, and the scholarship of teaching and learning.   In her spare time, she can be found drinking coffee, reading, or planning her next trip.  Follow her on Twitter @Lib_Lauren.

John Klima

John is the Assistant Director of the Waukesha Public Library where one of his many hats is maintaining, upgrading, and innovating technology within the library. Klima wrote a number of articles on steampunk for Library Journal. In his spare time, he is the editor of The Bulletin, the professional publication of the Science Fiction and Fantasy Writers of America. From 2001 to 2013 he edited the Hugo-Award winning magazine Electric Velocipede. Klima has also edited several anthologies including Logorrhea: Good Words Make Good Stories, and Happily Ever After. He co-edited the anthology Glitter & Mayhem with Lynne M. Thomas and Michael Damian Thomas.

Brianna Marshall

Brianna is Digital Curation Coordinator at the University of Wisconsin-Madison, where she manages the institutional repository and develops campuswide services for research data management and curation. She received her Master of Information Science and Master of Library Science from Indiana University’s School of Informatics and Computing in May 2014. From 2012-2014 she was a writer and managing editor for the library student-run blog Hack Library School. Now she is excited to be the new LITA blog editor. She tweets on occasion at @notsosternlib and keeps a blog, too.

Leanne Mobley

Leanne recently earned her MLS from Indiana University and currently works as the Digital Literacy Librarian for the Martin County Library System. Her background is in media production and she is passionate about using technology to bring ideas to life. She is an ardent library lover and still carries her very first library card in her wallet. Find her on Twitter @hey_library.

Leanne Olson

Leanne is a Metadata Management Librarian at Western University in London, Ontario, Canada.  Her main library-related areas of interest include metadata and cataloguing, digital libraries, authority control, teaching, and library history.  She’s also a playwright and lover of the outdoors.  Much of her blogging will be done from her backyard, possibly under five feet of snow.

Michael Rodriguez

Michael is the newly minted eLearning Librarian at Hodges University in southwest Florida, with the faculty rank of Assistant Professor. He graduated in August 2014 with his MLIS from Florida State University and has a background in history and public librarianship. Michael is also a technologist, interested in software customization, distance education, and free web tools and apps. When not doing cool stuff at work, he kayaks among the many mangrove islands off the Florida coast. He tweets @topshelver and blogs at Shelver’s Cove.

Erik Sandall

Erik is Electronic Services Librarian and Webmaster at Mechanics’ Institute in San Francisco, Calif. His professional interests are in integrated library systems, content management systems, online databases, ebooks, and web design and development. When he’s not working on these things, Erik is probably playing soccer or practicing how to open a wine bottle without breaking the cork.

Leo Stezano

Leo is a Project Manager at the Avery Architecture and Fine Arts Library at Columbia University; this is his first library job since receiving his MLIS from Syracuse University in 2011. Previously he spent many years in the private sector, working in Project and Product Management and Business Analysis for a variety of companies. His professional interests include digital librarianship, process optimization, and innovative technical project philosophies. He also enjoys playing soccer and raising two toddlers. You can follow Leo at http://leosmlisblog.wordpress.com/ and on Twitter at @LeoStezano.

Grace Thomas

Grace is a first year grad student working toward a dual-degree MLIS at Indiana University. With a background in English, Computer Science, and Digital Humanities from the University of Nebraska-Lincoln, she is especially interested in digital libraries and archives, and digital preservation. Currently, she works as a Graduate Assistant with associate professors John Walsh and Noriko Hara in the IU School of Informatics and Computing, and on the Petrarchive Digital Archive Project. Grace spends the rest of her time in swimming pools, watching any and all dance performances, and exploring Bloomington by bicycle, occasionally tweeting about all of the above at @gracehthom.

John Miedema: Slow reading six years later. Digital technology has evolved, and so have I. There is a trade-off.

planet code4lib - Tue, 2014-09-09 12:59

I was recently interviewed by The Wall Street Journal about slow reading. It has been a few years since I did one of these interviews. I wrote Slow Reading in 2008, six years ago. At the time, the Kindle had just been released and there was a surge of discussion about reading practices, to which I attribute the interest in my little book of research. The request for an interview suggests an ongoing interest in slow reading. So what do I have to say about the subject now?

I used to slow-read often. I would write book reviews, thinking myself progressive in a digital sense for blogging reviews in just four paragraphs. A shift began. My ongoing use of digital technology to read, write and think forced that shift along. I tried to write about that shift in a new online book project — I, Reader — but I failed. The shift was still in progress. I hit a wall at one point. I thought for a time I had reached the end of reading. In 2013, I stopped reading and writing. A year later I started again. I have a good perspective on the shift, but I have no immediate plans to resume writing about it.

So what did I tell the interviewer about slow reading? I confessed that I slow-read print books less often. I re-asserted that “Slow reading is a form of resistance, challenging a hectic culture that requires speed reading of volumes of information fragments.” I admitted that my resistance is waning. Digital technology has evolved to allow for reading, not just for scanning of information fragments, but also for comprehension of complex and rich material. I was surprised and pleased to discover how digital technology has re-programmed my reading and writing skills to process information more quickly and deeply. I am smarter than I used to be.

I have resumed my writing of book reviews. I restored a selection of book reviews from the past, ones relevant to my current blogging purposes. I will be writing new reviews, probably less often. I will be writing them differently. Currently I am reading Book Was There: Reading in Electronic Times by Andrew Piper. I no longer take notes on paper as I read. I have been tweeting notes. I like the way it is evolving. I use a hashtag for the title and author, and sometimes a reader joins in. When I am done, I will write a very short review, two paragraphs tops, and post it here.

That’s not all I said to the interviewer. I said there has been a trade-off because of digital technology. There is always a trade-off. We just have to decide whether the gains are more than the losses. What have we lost? I lingered on this question because the loss is less than I anticipated. We still read. We still read rich and complex material. Students still prefer print books for serious reading, but I expect they are going through the same transition as I did. What is lost, I assert, is long-form writing. Books born print can be scanned and put online, but books born digital are getting shorter all the time. It is no coincidence that my book, Slow Reading, was short. I was already a reader in transition. Digital technology prefers shortness. It is one reason that many kinds of poetry will survive and thrive on the web. Things should be as short and simple as possible (but not simpler, per the quote attributed to Einstein). Long-form novels and textbooks will be lost in time. It is a loss. Is it worth it?

Jakob Voss: Abbreviated URIs with rdfns

planet code4lib - Tue, 2014-09-09 09:26

Working with RDF and URIs can be annoying because URIs such as “http://purl.org/dc/elements/1.1/title” are long and difficult to remember and type. Most RDF serializations make use of namespace prefixes to abbreviate URIs, for instance “dc” is frequently used to abbreviate “http://purl.org/dc/elements/1.1/” so “http://purl.org/dc/elements/1.1/title” can be written as the qualified name “dc:title”. This simplifies working with URIs, but someone still has to remember mappings between prefixes and namespaces. Luckily there is a registry of common mappings at prefix.cc.

A few years ago I created the simple command line tool rdfns and a Perl library to look up URI namespace/prefix mappings. Meanwhile the program is also available as the Debian and Ubuntu package librdf-ns-perl. The newest version (not included in Debian yet) also supports reverse lookup to abbreviate a URI to a qualified name. Features of rdfns include:

look up namespaces (as RDF/Turtle, RDF/XML, SPARQL…)

$ rdfns foaf.ttl foaf.xmlns dbpedia.sparql foaf.json
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
xmlns:foaf="http://xmlns.com/foaf/0.1/"
PREFIX dbpedia: <http://dbpedia.org/resource/>
"foaf": "http://xmlns.com/foaf/0.1/"

expand a qualified name

$ rdfns dc:title
http://purl.org/dc/elements/1.1/title

look up a preferred prefix

$ rdfns http://www.w3.org/2003/01/geo/wgs84_pos#
geo

create a short qualified name for a URL

$ rdfns http://purl.org/dc/elements/1.1/title
dc:title

I use RDF-NS for all RDF processing to improve readability and to avoid typing long URIs. For instance Catmandu::RDF can be used to parse RDF into a very concise data structure:

$ catmandu convert RDF --file rdfdata.ttl to YAML
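Conceptually the lookup is simple; here is a toy Python sketch of expansion and abbreviation with a hand-made two-entry mapping (the real rdfns/RDF::NS ships the full prefix.cc dataset):

NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand(qname):
    """dc:title -> http://purl.org/dc/elements/1.1/title"""
    prefix, _, local = qname.partition(":")
    return NAMESPACES[prefix] + local

def abbreviate(uri):
    """http://xmlns.com/foaf/0.1/Person -> foaf:Person"""
    for prefix, namespace in NAMESPACES.items():
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):]
    return uri  # unknown namespace: leave the URI unchanged

print(expand("dc:title"))
print(abbreviate("http://xmlns.com/foaf/0.1/Person"))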

Jonathan Rochkind: Cardo is a really nice free webfont

planet code4lib - Tue, 2014-09-09 04:39

Some of the fonts on google web fonts aren’t that great. And I’m not that good at picking the good ones from the not-so-good ones on first glance either.

Cardo is a really nice old-style serif font that I originally found recommended on some list of “the best of google fonts”.

It’s got a pretty good character repertoire for Latin text (and I think Greek). The Google Fonts version doesn’t seem to include Hebrew, even though some other versions might? For library applications, the more characters the better, and it should have enough to deal stylishly with whatever letters and diacritics you throw at it in Latin/Germanic languages, and all the usual symbols (currency, punctuation, etc.).

I’ve used it in a project that my eyeballs have spent a lot of time looking at (not quite done yet), and I’ve been increasingly pleased by it; it’s nice to look at and to read, especially on a ‘retina’ display. (I wouldn’t use it for headlines, though.)



DPLA: DPLA &amp; Imgur’s Summer of Archives Comes to a Close

planet code4lib - Tue, 2014-09-09 00:58

Back in June, we announced our collaboration with the Digital Public Library of America (DPLA) for the Summer of Archives–an experimental gallery endeavor that brought tons of historical OC gems to User Submitted. From perfectly looping space GIFs, to famous cats of history, to beautiful book covers, to celestial maps, we’re happy to call this experiment a huge and awesome success.

The very last Summer of Archives post is live in User Submitted right now. We’re going out the same way we came in–with historical GIFs!

Huge thanks to the DPLA for sharing this special content with Imgur all summer long. Be sure to check the DPLA Imgur account to revisit all of the submissions. If your thirst for history cannot be quenched, head over to the DPLA website for a vast array of great content.

This blogpost was originally published on the Imgur.com blog (view on Imgur.com).

William Denton: Augustus

planet code4lib - Tue, 2014-09-09 00:32

A few people recommended Stoner by John Williams to me, and they were right. It’s a gem.

I was in Book City tonight and the clerk was selling a customer on Stoner for a book club. Browsing the new release tables with Williams on my mind I saw a similar new edition from New York Review Books of Augustus, which is about that Augustus.

The first line is a doozy:

… I was with him at Actium, when the sword struck fire from metal, and the blood of soldiers was awash on deck and stained the blue Ionian Sea, and the javelin whistled in the air, and the burning hulls hissed upon the water, and the day was loud with the screams of men whose flesh roasted in the armor they could not fling off; and earlier I was with him at Mutina, where that same Marcus Antonius overran our camp and the sword was thrust into the empty bed where Caesar Augustus had lain, and where we persevered and earned the first power that was to give us the world; and at Philippi, where he traveled so ill he could not stand and yet made himself to be carried among his troops in a litter, and came near death again by the murderer of his father, and where he fought until the murderers of the mortal Julius, who became a god, were destroyed by their own hands.

DuraSpace News: Update 5: Beta Pilot Projects Set to Kick-Off

planet code4lib - Tue, 2014-09-09 00:00
From David Wilcox, Fedora Product Manager

Winchester, MA – This is the fifth in a series of updates on the status of Fedora 4.0 as we move from the Beta [1] to the Production Release. The updates are structured around the goals and activities outlined in the July-December 2014 Planning document [2], and will serve to both demonstrate progress and call for action as needed. New information since the last status update is highlighted in bold text.

Library of Congress: The Signal: Hybrid Born-Digital and Analog Special Collecting: Megan Halsband on the SPX Comics Collection

planet code4lib - Mon, 2014-09-08 17:29

Megan Halsband, Reference Librarian with the Library of Congress Serial and Government Publications Division.

Every year, the Small Press Expo in Bethesda, Md. brings together a community of alternative comic creators and independent publishers. With a significant history of collecting comics, it made sense for the Library of Congress’ Serial and Government Publications Division and the Prints & Photographs Division to partner with SPX to build a collection documenting alternative comics and comics culture. In the last three years, this collection has been developing and growing.

While the collection itself is quite fun (what’s not to like about comics?), it is also a compelling example of the way that web archiving can complement and fit into the work of developing a special collection. To that end, I am excited to talk with Megan Halsband, Reference Librarian with the Library of Congress Serial and Government Publications Division and one of the key staff working on this collection, as part of our Content Matters interview series.

Trevor: First off, when people think Library of Congress I doubt “comics” is one of the first things that comes to mind. Could you tell us a bit about the history of the Library’s comics collection, the extent of the collections and what parts of the Library of Congress are involved in working with comics?

Megan: I think you’re right – the comics collection is not necessarily one of the things that people associate with the Library of Congress – but hopefully we’re working on changing that! The Library’s primary comics collections are two-fold: first, the published comics held by the Serial & Government Publications Division, which appeared in newspapers and periodicals and later in comic books; and second, the original art, which is held by the Prints & Photographs Division.

Example of one of the many comics available through The Library of Congress National Digital Newspaper Program. The End of a Perfect Day. Mohave County miner and our mineral wealth (Kingman, Ariz.) October 14, 1921, p.2.

The Comic Book Collection here in Serials is probably the largest publicly available collection in the country, with over 7,000 titles and more than 125,000 issues. People wonder why our section at the Library is responsible for the comic books – it’s because most comic books are published serially. Housing the comic collection in Serials also makes sense, as we are responsible for the newspaper collections (which include comics). The majority of our comic books come through the US Copyright Office via copyright deposit, and we’ve been receiving comic books this way since the 1930s/1940s.

The Library tries to have complete sets of all the issues of major comic titles but we don’t necessarily have every issue of every comic ever published (I know what you’re thinking and no, we don’t have an original Action Comics No. 1 – maybe someday someone will donate it to us!). The other main section of the Library that works with comic materials is Prints & Photographs – though Rare Book & Special Collections and the area studies reading rooms probably also have materials that would be considered ‘comics.’

Trevor: How did the idea for the SPX collection come about? What was important about going out to this event as a place to build out part of the collection? Further, in scoping the project, what about it suggested that it would also be useful/necessary to use web archiving to complement the collection?

Megan: The executive director of SPX, Warren Bernard, has been working in the Prints & Photographs Division as a volunteer for a long time, and the collection was established in 2011 after a Memorandum of Understanding was signed between the Library and SPX. I think Warren really was a major driving force behind this agreement, but the curators in both Serials and Prints & Photographs realized that our collections didn’t include materials from this particular community of creators and publishers in the way that they should.

Small Press Expo floor in 2013

Given that SPX is a local event with an international reputation and awards program (SPX awards the Ignatz) and the fact that we know staff at SPX, I think it made sense for the Library to have an ‘official’ agreement that serves as an acquisition tool for material that we probably wouldn’t otherwise obtain. Actually going to SPX every year gives us the opportunity to meet with the artists, see what they’re working on and pick up material that is often only available at the show – in particular mini-comics or other free things.

Something important to note is that the SPX Collection – the published works, the original art, everything – is all donated to the Library. This is huge for us – we wouldn’t be able to collect the depth and breadth of material (or possibly any material at all) from SPX otherwise. As far as including online content for the collection, the Library’s Comics and Cartoons Collection Policy Statement (PDF) specifically states that the Library will collect online/webcomics, as well as award-winning comics. The SPX Collection, with its web archiving component, specifically supports both of these goals.

Trevor: What kinds of sites were selected for the web archive portion of the collection? In this case, I would be interested in hearing a bit about the criteria in general and also about some specific examples. What is it about these sites that is significant? What kinds of documentation might we lose if we didn’t have these materials in the collection?

Archived web page from the American Elf web comic.

Megan: Initially the SPX web archive (as I refer to it – though its official name is Small Press Expo and Comic Art Collection) was extremely selective – only the SPX website itself and the annual winner of the Ignatz Award for Outstanding Online Comic were captured. The staff wanted to see how hard it would be to capture websites with lots of image files (of various types). It turns out this works just fine (as long as there’s no paywall or subscriber login required) – so we expanded the collection to include all the Ignatz nominees in the Outstanding Online Comic category as well.

Some of these sites, such as Perry Bible Fellowship and American Elf, are long-running online comics whose creators have been awarded Eisner, Harvey and Ignatz awards. There’s a great deal of content on these websites that isn’t published or available elsewhere – and I think that this is one of the major reasons for collecting this type of material. Sometimes the website might have initial drafts or ideas that are later published, sometimes the online content is not directly related to published materials, but for in-depth research on an artist or publication, this type of related content is often extremely useful.
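For readers curious what “capturing” an image-heavy site actually involves: production web archiving is done with dedicated crawlers such as Heritrix, which write WARC files, but the basic mechanics are simple enough to sketch. The stdlib-only Python below is a toy illustration, not the Library’s actual setup, and the URL in it is a placeholder: it fetches one page, finds the img tags, and saves each referenced image alongside the HTML.

```python
# Toy sketch of a single-page capture: save the HTML, then fetch every
# image it references. Real crawlers (e.g., Heritrix) handle link discovery,
# politeness, deduplication, and WARC output; none of that is attempted here.
import os
import urllib.parse
import urllib.request
from html.parser import HTMLParser

class ImageFinder(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

def capture(page_url, out_dir="capture"):
    os.makedirs(out_dir, exist_ok=True)
    html = urllib.request.urlopen(page_url).read()
    # Keep the page itself.
    with open(os.path.join(out_dir, "page.html"), "wb") as f:
        f.write(html)
    # Resolve each image URL against the page and fetch it too.
    finder = ImageFinder()
    finder.feed(html.decode("utf-8", errors="replace"))
    for src in finder.images:
        img_url = urllib.parse.urljoin(page_url, src)
        # Simplified naming: collisions are possible in a real crawl.
        name = os.path.basename(urllib.parse.urlparse(img_url).path) or "image"
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(urllib.request.urlopen(img_url).read())
        print("saved", img_url)

if __name__ == "__main__":
    capture("https://example.com/")  # placeholder, not one of the archived sites
```

A paywall or login breaks exactly this step: the same requests come back with a login page instead of the comic, which is why such sites are much harder to archive.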

Trevor: You have been working with SPX to build this collection for a few years now. Could you give us an overview of what the collection consists of at this point? Further, I would be curious to know a bit about how the idea of the collection is playing out in practice. Are you getting the kinds of materials you expected? Are there any valuable lessons learned along the way that you could share? If anyone wants access to the collection how would they go about that?

Megan: At this moment in time, the SPX Collection materials that are here in Serials include acquisitions from 2011-2013, plus two special collections that were donated to us, the Dean Haspiel Mini-Comics Collection and the Heidi MacDonald Mini-Comics Collection.  I would say that the collection has close to 2,000 items (we don’t have an exact count since we’re still cataloging everything) as well as twelve websites in the web archive. We have a wonderful volunteer who has been working on cataloging items from the collection, and so far there are over 550 records available in the Library’s online catalog.

Mini comics from the SPX collection

Personally, I didn’t have any real expectations of what kinds of materials we would be getting – I think that definitely we are getting a good selection of mini-comics, but it seems like there are more graphic novels that I anticipated. One of the fun things about this collection are the new and exciting things that you end up finding at the show – like an unexpected tiny comic that comes with its own magnifying glass or an oversize newsprint series.

The process of collecting has definitely gotten easier over the years. For example, the Head of the Newspaper Section, Georgia Higley, and I just received the items that were submitted in consideration for the 2014 Ignatz Awards. We’ll be able to prep permission forms/paperwork in advance of the show for the items we’re keeping, which will help us cut down on potential duplication. This is definitely a valuable lesson learned! We’ve also come up with a strategy for visiting the tables at the show – there are 287 tables this year – so we divide up the ballroom among the four of us (Georgia and I, as well as two curators from Prints & Photographs – Sara Duke and Martha Kennedy) to make it manageable.

We also try to identify items that we know we want to ask for in advance of the show – such as ongoing serial titles or debut items listed on the SPX website – to maximize our time when we’re actually there. Someone wanting to access the collection would come to the Newspaper & Current Periodical Reading Room to request the comic books and mini-comics. Any original art or posters from the show would be served in the Prints & Photographs Reading Room. As I mentioned – there is still a portion of this collection that is unprocessed – and may not be immediately accessible.

Trevor: Stepping back from the specifics of the collection, what about this do you think stands for a general example of how web archiving can complement the development of special collections?

Megan: One of the true strengths of the Library of Congress is that our collections often include not only the published version, but also the ephemeral material related to the published item/creator, all in one place. From my point of view, collecting webcomics gives the Library the opportunity to collect some of this ‘ephemera’ related to comics collections and only serves to enhance what we are preserving for future research. And as I mentioned earlier, some of the content on the websites provides context, as well as material for comparison, to the physical collection materials that we have, which is ideal from a research perspective.

Trevor: Is there anything else with web archiving and comics on the horizon for your team? Given that web comics are such a significant part of digital culture, I’m curious to know if this is something you are exploring. If so, is there anything you can tell us about that?

Megan: We recently began another web archive collection to collect additional webcomics beyond those nominated for Ignatz Awards – think Dinosaur Comics and XKCD. It’s very new (and obviously not available for research use yet) – but I am really excited about adding materials to this collection. There are a lot of webcomics out there – and I’m glad that the Library will now be able to say we have a selection of this type of content in our collection! I’m also thinking about proposing another archive to capture comics literature and criticism on the web – stay tuned!
