Feed aggregator

Ariadne Magazine: Mining the Archives: Metadata Development and Implementation

planet code4lib - Fri, 2015-02-13 14:15

Martin White looks through the Ariadne archive to track the development and implementation of metadata in a variety of settings.

I was an early starter in the world of metadata. Within hours of arriving at the offices of the British Non-Ferrous Metals Research Association in Euston Street, London, in 1970 to start a career as an information scientist I was writing my first abstract. ‘Writing’ is the correct verb as my A3 abstract would be typed up on an IBM golfball typewriter for production. At the bottom of this form was a section called ‘Index Terms’ and it was made very clear at the outset that mistakes in the abstract were regrettable, but mistakes in indexing were unforgivable.

Ariadne Magazine: SUNCAT: Ten Years and Beyond

planet code4lib - Fri, 2015-02-13 14:13

Celia Jenkins charts the beginnings of SUNCAT, its development over the last ten years and what the future holds for the service.

2013 marked the 10th anniversary of SUNCAT. Back in 2003, SUNCAT (Serials Union CATalogue) started as a project undertaken by EDINA [1] in response to an observed need for better journals information in the UK, which was identified in the UKNUC report [2].

Ariadne Magazine: The Value of Open Access Publishing to Health and Social Care Professionals in Ireland

planet code4lib - Fri, 2015-02-13 14:04

Aoife Lawton and Eimear Flynn discuss the value of open access publishing for allied health professions and their work in the health services with particular reference to Ireland.

This article will focus on how open access publishing may add value to a number of health and social care professionals and their work in the health services. The results of two recent surveys are explored in relation to the research activity, barriers and awareness about open access publishing by health and social care professionals (HSCPs) working in the Irish health system.

Library of Congress: The Signal: Boxes of Hard Drives and Other Challenges at WGBH: An NDSR Project Update

planet code4lib - Fri, 2015-02-13 13:57

The following is a guest post by Rebecca Fraimow, National Digital Stewardship Resident at WGBH in Boston

Rebecca Fraimow

I have a pretty comprehensive list of goals to accomplish over the course of my time as the National Digital Stewardship Resident at WGBH’s Media, Library and Archives. That is:

  1. Document WGBH’s existing ingest workflow for production media and make recommendations for improvement.
  2. Design, implement, and complete an ingest process for over 70 hard drives worth of content created for the American Archive project, which needs to be backed up on LTO with appropriate metadata.
  3. Research the file failures that WGBH discovered last summer when initially pulling video files out of networked storage and putting them on aforementioned hard drives.
  4. Create a video webinar (or series of video webinars) putting together a set of digital media recommendations to share with other public media stations, based on everything above.

The scope of the work could have been overwhelming, but the structure of the projects has actually flowed very naturally. Starting with phase one – working with WGBH’s workflow – allowed me to ease into the daily operations of the archive. This involves QCing (or, checking for Quality Control) and editing the Filemaker metadata that WGBH receives along with the assets from every production, checking for consistency across the different deliverables and getting familiar with the master database and the institutional workflow as it currently stands.

At the time I arrived, the archives staff had only recently acquired a set of LTO-6 decks, the latest version of the Linear Tape-Open magnetic tape data storage technology.  As I got more comfortable understanding the needs of the archive, I also started looking at ways to make the LTO-6 workflow more standardized and eventually wrote a script to automate the generation and comparison of checksums for files during batch ingest. These smaller-scale projects served as building blocks for the creation of a set of workflow diagrams showing the ingest process as it exists at WGBH now, and the ingest process as we think it should exist in the future.
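A checksum generate-and-compare step of the kind described above can be sketched roughly as follows. This is not the script written at WGBH: the function names are hypothetical, and MD5 is assumed only because it is the checksum mentioned later in this post.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(source, copy):
    """Return (missing, mismatched) relative paths when checking copy against source."""
    missing = sorted(name for name in source if name not in copy)
    mismatched = sorted(
        name for name in source if name in copy and source[name] != copy[name]
    )
    return missing, mismatched
```

In a batch ingest, one manifest would be built from the source drive and another from the files as written to tape; an empty pair of lists from `compare_manifests` means every file arrived with its checksum unchanged.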

As for phase two, I knew that was ready to kick off when I walked into my cubicle one day and saw that it was entirely filled with boxes of hard drives…

At this point, I’d worked extensively with WGBH’s LTO system during phase one, so I was familiar with the possibilities and limitations of the technology and could put that knowledge to use in designing my own personal workflow for this massive ingest process. An LTO-6 tape, uncompressed and formatted as a Linear Tape File System (which allows the computer to directly access the tape as it would a hard drive), holds about 2.5 TB of data. To write this much data from hard drive to tape using a high-speed connection such as SATA or USB 3 takes about 4-5 hours. When you’re stuck with a slower connection, such as USB 2, it takes many times longer. We also had to generate metadata to live with the files and be confident that all the information that went onto the tape was 100% accurate, because data cannot truly be deleted from an LTO without erasing the whole tape.
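The transfer times above can be checked with a rough back-of-the-envelope sketch. The tape capacity and the 4-5 hour figure come from the text; the USB 2 throughput is an assumption chosen only for illustration, and real sustained rates vary with the drive and host.

```python
def transfer_hours(size_bytes, throughput_bytes_per_s):
    """Hours needed to move size_bytes at a sustained throughput."""
    return size_bytes / throughput_bytes_per_s / 3600


TAPE_BYTES = 2.5e12  # about 2.5 TB per uncompressed LTO-6 tape (from the text)

# Sustained rate implied by filling a tape in 4.5 hours (midpoint of 4-5 hours).
implied_fast = TAPE_BYTES / (4.5 * 3600)  # bytes per second

# ASSUMPTION: roughly 35 MB/s sustained over USB 2; not a figure from the post.
USB2_BYTES_PER_S = 35e6

print(f"implied fast-link rate: {implied_fast / 1e6:.0f} MB/s")
print(f"USB 2 estimate: {transfer_hours(TAPE_BYTES, USB2_BYTES_PER_S):.1f} hours")
```

Under that assumed USB 2 rate, a single tape's worth of data takes close to 20 hours, which is why the workflow depends on getting every drive onto a fast connection.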

Taking all these factors into account, we designed a workflow that included removing all hard drives from their casings (many of which did not include the kind of connections that we needed), barcoding them, and accessing them using high-speed docking stations. I also wrote a script that would allow me to batch-generate metadata in the Dublin Core-based PBCore standard for audiovisual material, incorporating technical information provided by the media file analysis program MediaInfo as well as MD5 checksums, before transferring files to LTO. While it’s not all running perfectly smoothly yet and there are always new complications to discover, at this point the workflow is streamlined enough that I can start using the hours when the computer is processing checksums or transfers to dedicate to working on phase three of the project.

Phase three is a new addition to the project plan as initially outlined by WGBH – this became part of the task list last summer, when WGBH started receiving very worrisome reports that a high percentage of the video files they were sending to be included in the American Archive project were failing. These video files were either showing severe signs of corruption in QC or failing to open at all. The persistent difficulties that the archives department had in pulling these files off of WGBH’s institutional LTO-4 tapes over network storage were part of the impetus for WGBH acquiring their own, directly connected LTO-6 decks, which led directly to all the work I’ve done above. Now my job is to analyze the failures and see if I can figure out why they occurred.

While I’m still in the beginning phases of this research, so far I’ve managed to rule out the idea that the files are getting corrupted directly on tape; checksum analysis of the files stored on the tapes reveals that they still have the same unique signature as they did before they were written to LTO. I’ve also discovered that the variety of failure types is most likely due to the different structures of the video files themselves – specifically, differences in the placement of the “moov atom,” a section of the media file that contains structural metadata for the file as a whole, and without which the file cannot be read.

Atomic structure of a Quicktime video file, showing the various different elements of data that make up the file structure (generated by Apple’s Atom Inspector).
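To see where the moov atom sits in a given file, the top-level atoms can be walked directly. The sketch below is not the analysis used in this research; it relies only on the general QuickTime atom layout (a 4-byte big-endian size followed by a 4-byte type), and the function names are hypothetical.

```python
import io
import struct


def list_top_level_atoms(stream):
    """Yield (atom_type, offset, size) for each top-level atom in a binary stream."""
    offset = 0
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of file, or a truncated header
        size, atom_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit extended size follows the type field.
            ext = stream.read(8)
            if len(ext) < 8:
                break
            size = struct.unpack(">Q", ext)[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the atom runs to the end of the file.
            size = stream.seek(0, io.SEEK_END) - offset
        if size < header_len:
            break  # malformed atom; stop rather than loop forever
        yield atom_type.decode("latin-1"), offset, size
        offset += size
        stream.seek(offset)


def describe_moov(stream):
    """Report whether a moov atom exists and where it sits relative to mdat."""
    names = [name for name, _, _ in list_top_level_atoms(stream)]
    if "moov" not in names:
        return "missing"
    if "mdat" in names and names.index("moov") > names.index("mdat"):
        return "after mdat"
    return "before mdat"
```

Calling `describe_moov(open(path, "rb"))` on a file whose moov atom follows the media data returns "after mdat"; if such a file is cut short before the end, the moov atom is lost and the result is "missing".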

I recently gave a presentation about these failures and my research into the problem at Code4Lib 2015 and will be sharing my slides, as well as my planned next steps for investigation, on the NDSR Boston blog.

As for phase four – creating a video webinar – well, that’s not coming up for another two months, which is probably a good thing given that this post is already getting too long.

My residency work so far has involved everything from graphing out workflows to writing bash scripts to batch editing XML metadata to taking apart hard drives (and putting them back together, and then taking them apart again…). There’s a new and unexpected challenge to conquer every day – it keeps me on my toes in the best possible way. One of the greatest things about digital preservation as a field is how it forces us all to constantly keep learning and pushing our work forward, and I’m incredibly excited to be a part of it.

Roy Tennant: Tennant’s Technology Tenets

planet code4lib - Thu, 2015-02-12 22:08

tenet — a principle or belief

Having worked in libraries my entire adult life (I began by volunteering at 17), I’ve seen a lot of technology come and go. I like to say that I’ve forgotten more library technology than most young librarians know. And mostly I’m happy about that. If an acoustic coupler modem never again darkens my door, color me happy.

But along with all of that time on the road come lessons learned. Opinions formed and solidified. Perhaps even calcified. So I’ve decided it’s time to dump a few of mine on my largely unsuspecting (and admittedly tiny) readership.

  1. Every technology can be improved upon or superseded. Toilet paper seems to be a technology that has reached a logical peak, but even there I’m not sure it can’t be improved upon. Just don’t ask me how.
  2. When adopting a new technology, neither an early adopter nor laggard be. Of course there are exceptions, or else we wouldn’t have early adopters, and there are those who aren’t paying attention, or else we wouldn’t have laggards. But generally speaking, you and your organization are going to want to implement a new technology after it has been tested and improved but before it’s virtually on its way out.
  3. The pace of technological change is inconsistent and unreliable. We tend to think about progress as advancing at a relatively constant pace, with new things happening all the time. Actually, technological change comes in fits and starts, punctuated by significant pauses in innovation. I happen to think we are in one of those pauses now, having made it through the revolutionary early days of the Internet and the web.
  4. A technology that solves a problem sufficiently will tend to stagnate. How much has the simple pencil evolved? Not much, in recent decades.
  5. A stagnant technology is one that is ripe for disruption. Just look at the vacuum cleaner, which went relatively unchanged for decades until Dyson came along and decided to accomplish the task in a completely new way. Then the entire rest of the market followed along, but they had been perfectly happy to sell us the same basic vacuum cleaner for years.
  6. The best technology cannot overcome the best marketing. Just look at VHS vs. Betamax. Betamax was considered the best technology, but VHS was rolled out more effectively. By “marketing” I’m including all of the various things you can do to encourage adoption, including the kind of consortium of organizations that helped VHS win the day.
  7. Your hardest problems are not technical, they’re political. If you’ve been working for longer than about a year this hardly needs explaining.
  8. This list is incomplete, not correct in every situation, and will change over time. Just like technology.

So sue me. Or better yet, argue with me in the comments. I can take it.


Image by Will Lion, Creative Commons BY-NC-ND 2.0

Cynthia Ng: Code4Lib 2015: Closing Keynote

planet code4lib - Thu, 2015-02-12 18:15
While I didn’t make it to Code4Lib 2015 this year, I did manage to catch the closing keynote on the livestream, so here are the notes. Architect and Wanderlust: The Web and Open Things Andromeda Yelton @ThatAndromeda Links and ideas from talk Nine years ago, we were in a much smaller building. No one knew … Continue reading Code4Lib 2015: Closing Keynote

District Dispatch: In progress we trust

planet code4lib - Thu, 2015-02-12 17:28

At yesterday’s copyright briefing for new members and staff of the U.S. House of Representatives, six aspects of copyright were discussed: the purpose of the copyright law, copyright term and extension, formalities, statutory damages, fair use and first sale – all in ninety minutes, amazing given the complexity of the topic and the fact that all of the panelists were lawyers (who have a tendency to talk at length). The panel of speakers was ably facilitated by Zach Graves from the R Street Institute (another lawyer!), and each panelist provided remarks with choice tidbits that even a “copyright know it all” like myself could savor – in part due to the fact that I needed to eat something because all of the bag lunches were gone when I arrived.

Protected by copyright for the life of the child plus 70 years.

The first speaker was Mike Godwin, Innovation Policy Director for the R Street Institute, who provided remarks on the purpose of the copyright law. When looking at what we tend to call “the copyright clause” in the U.S. Constitution, Godwin suggested an alternative name: “the progress clause.”

“To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries…” (U.S. Constitution, Article I, Section 8, Clause 8)

It is a helpful way to think about it. Both rights holders and users of creative works progress in some way—monetarily for rights holders, of course, but also progress achieved by sharing and gaining knowledge, by enjoying books or films, by being inspired to create new original and creative works.

Jonathan Band, from the Library Copyright Alliance (LCA), discussed how the extension of the copyright term (from 14 years and one 14-year renewal to life of the author plus 70 years) hinders the creation of new works. As copyrighted works are protected for longer and longer times, more content is prevented from moving to the public domain, leaving less material available to freely build on. Extending the term also hinders the dissemination of existing knowledge, as demonstrated by a chart of books available for purchase on Amazon.

By Paul Heald

There’s an enormous amount of content that is just not readily available for purchase because incremental extensions of the copyright term also retroactively protected materials that were bound for the public domain. Thus, publishers and other distributors are unable to re-publish these works still under protection. Alas, out of print books.

In addition, there are no longer any requirements to formally demonstrate or seek copyright protection. Because creative works are protected by copyright law at the point of fixation, even a child’s doodle is protected by copyright for life of the child plus 70 years. Lots of protected works are everywhere, and they aren’t going to the public domain any time soon.

Sherwin Siy, vice president for Legal Affairs for Public Knowledge, spoke about the statutory damages available to rights holders in a favorable court decision – $750 to $30,000 per work infringed. In a world where digital copies are made all of the time when using computing technology, an innocent infringer could easily rack up a high penalty if found liable. Rights holders are more likely to choose statutory damages over actual damages because they don’t have to provide the evidence to demonstrate that actual money was lost due to the infringement.

Rebecca Tushnet, law professor at Georgetown University Law Center, provided examples of services and activities that would not be possible without fair use: search engines, Facebook, fan fiction (check out Homelamb), digitizing one’s CD collection, preservation, reverse engineering, and the list goes on. Tushnet called fair use “distinctly an American creation” that supports both free markets and free speech.

Siy then spoke about first sale – the exception that enables secondary markets and library lending. First sale allows that once the library has lawfully acquired a copy of a protected work, the library can distribute it to others, furthering the free flow of information. In the digital environment, contracts and non-negotiated license agreements can circumvent library lending. A library cannot loan songs downloaded from iTunes because the license restricts use to “personal, non-commercial only.”

I almost always enjoy reading, hearing and talking about copyright, and I hope the copyright newbies packed in the standing room only meeting room in the Rayburn Building gained new understanding—there is a lot more to copyright than piracy. Copyright is a good thing. It is about progress, for all.

The post In progress we trust appeared first on District Dispatch.

Ed Summers: #c4l15

planet code4lib - Thu, 2015-02-12 16:45

code4lib 2015 is about to kick off in Portland this morning. Unfortunately I couldn’t make it this year, but I’m looking forward to watching the livestream over the next few days. Thanks so much to the conference organizers for setting up the livestream. The schedule has the details about who is speaking when.

As a little gift to real and virtual conference goers (mostly myself) I quickly created a little web app that will watch the Twitter stream for #c4l15 tweets, and keep track of which URLs people are talking about. You can see it running here, at least while the conference is going.

I’ve done this sort of thing in an ad hoc way with twarc and some scripts–mostly after (rather than during) an event. For example here’s a report of URLs mentioned during #dlfforum. But I wanted something a bit more dynamic. As usual the somewhat unkempt code is up on Github as a project named earls, in case you have ideas you’d like to try out.

earls is a node app that listens to Twitter’s filter stream API for tweets mentioning #c4l15. When it finds one it then looks for 1 or more links in the tweet. Each link is fetched (which also unshortens it), it tries to parse any HTML (thanks cheerio) to find a page title, and then stashes these details as well as the tweet in redis.
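The core of that pipeline (find links in a tweet, pull a title out of fetched HTML, keep counts) can be sketched in a few lines. This is not the earls code, which is a node app: the sketch is in Python, skips the Twitter stream and the network fetch entirely, and uses an in-memory counter where earls uses redis. The names `URL_PATTERN`, `page_title` and `count_links` are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Simplified URL matcher for illustration; real tweets carry link entities.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the page title from an HTML string, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip() or None


def count_links(tweets):
    """Count how many tweets mention each URL (repeats within one tweet count once)."""
    counts = Counter()
    for text in tweets:
        urls = {url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)}
        counts.update(urls)
    return counts
```

`count_links(tweets).most_common()` then gives the URLs ordered by how many tweets mentioned them, which is the kind of list the page displays.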

When you load the page it will show you the latest counts for all URLs it has found so far. Unfortunately at the moment you need to reload the page to get an update. If I have time I will work on making it update live in the page. earls could be used for other conferences, and ought to run pretty easily on heroku for free.

Oh, and you can see the JSON data here in case you have other ideas of things you’d like to do with the data.

Have a superb conference you crazy dreamers and doers!

Andromeda Yelton: #c4l15 talk extras

planet code4lib - Thu, 2015-02-12 16:00

These are links, background reading, and photo credits for my code4lib 2015 closing keynote “Architect for wanderlust: the web and open things”. Also, slides [PDF].

Links and ideas!!!

I mention a bunch of links in the talk and here they are so you can click on them. There’s also a bunch of ideas I mention, or allude to, or had in the back of my head, or wanted to put in this talk but couldn’t. Here you go. They fit into the talk in roughly this order.

This talk is under a CC BY-NC-SA license, except for third-party copyrighted content used here by permission.

Photo credits!!!

LaSells Stewart Center at OSU
Public domain
M.O. Stevens

meh cookie
Rick Harris / rickharris

sean hobson / seanhobson

408 status cat
Tomomi / girliemac

Arab spring
CC BY-SA 3.0
Essam Sharaf

code4lib montage
all rights reserved, used by permission
Ray Schwartz / schwartzray
all rights reserved, used by permission
Ray Schwartz / schwartzray
all rights reserved, used by permission
Declan Fleming / bigdpix
all rights reserved, used by permission
Declan Fleming / bigdpix
CC BY-NC-SA, by permission
Ranti Junus / ranti
CC BY-NC-ND, by permission
Cynthia Ng / ladyartemis

the Duck
all rights reserved, used by permission
Max Wang

story sculpture
schizoform / schizoform

passing of the duck
all rights reserved, used by permission
beelockwood / beelockwood

icelandic pony
Thomas Quine / quinet

selfie in a fractured mirror
David Goehring / CarbonNYC

Parts and Crafts
all rights reserved, used by permission
partsandcrafts / partsandcrafts

Parts & Crafts shelves
publiclaboratory / publiclaboratory

Thanks for listening.

OCLC Dev Network: WorldCat Metadata API Maintenance February 15

planet code4lib - Thu, 2015-02-12 14:00

The WorldCat Metadata API will be down worldwide for a brief maintenance lasting about 20 minutes and occurring between 2am and 8am ET on February 15th.

LITA: ALA Midwinter 2015 LITA Preconference Review: How User Testing Can Improve the User Experience of Your Library Website

planet code4lib - Thu, 2015-02-12 13:00

Editor’s note: This is a guest post by Tammi Owens

Last July, Winona State University’s Darrell W. Krueger Library rolled out a completely new website. This January we added to that new user experience by upgrading to LibGuides and LibAnswers v2. Now, we’re looking for continuous improvement through continuous user experience (UX) testing. Although I have some knowledge of the history and general tenets of user experience and website design, I signed up for this LITA pre-conference to dive into some case studies and ask specific questions of UX specialists. I hoped to come away with a concrete plan or framework for UX testing at our library. Specifically, I wanted to know how to implement the results of UX testing on our website.

The instructors

Kate Lawrence is the Vice President of User Research at EBSCO. Deirdre Costello is the Senior User Experience Researcher at EBSCO. I was a little nervous this seminar was going to be surreptitious vendor marketing, but there was no EBSCO marketing at all. Kate brought decades of experience in the user research sector to our conversations, and Deirdre, as a recent MLIS with library experience, was able to connect the dots between research and practice.

The session

There were six participants in our session, with a mix of public and university libraries represented. Participants who attended the session are at all stages of website redesign and have different levels of control over our institutional websites. Some of us report to committees, while others have complete ownership of their library’s site. As in the Python pre-conference, participant experience levels were mixed.

The session was divided into four main sections: “Why usability matters,” “Website best practices,” “Usability: Process,” and an overview of, a company EBSCO uses during their research. Kate and Deirdre presented each section with a slide deck, but interspersed videos and discussion into their formal presentation.

The introductions to usability and website best practices were review for me, but offered enough additional information and examples that I continued to be engaged throughout the morning. Some memorable moments for me were watching and discussing Steve Krug’s usability demo, and visiting two websites: and

After lunch, Kate went step-by-step through a typical usability testing process in her department. She has nine steps in her process (yes, nine!), but after she explained each step it somehow went from overwhelming and scary to doable and exciting.

After another break, Kate and Deirdre invited Sadaf Ahmed in to speak about the company Unfortunately, this was less hands-on than I expected it to be, but I was gobsmacked by the information that could be gleaned quickly using the tool. (In short: students use Google a lot more than I ever imagined.)

At the end of the day, Kate and Deirdre set aside time for us to create research questions with which to begin our UX testing. By that time, though, everyone was overloaded with new information and we all agreed we’d rather go home, apply our knowledge, and contact Kate and Deirdre directly for feedback.

Further study

To make sure we could implement user testing at our own institutions, Kate and Deirdre distributed USB drives filled with research plans, presentations, and reports. If they referenced it during the day, it went on our USB drives. This is proving to be beneficial as I make sense of my own notes from the session and begin the research plan for our first major UX test. Additionally, Kate ordered several books for all attendees to read in the coming weeks. These items alone, along with the new network we created among attendees during the day, may be the most valuable part of the session going forward.

Review in a nutshell

This pre-conference was, for me, well worth the time and money to attend. The case studies we discussed contributed to my understanding of how to ask small questions about our website in order to make a big impact on user experience. I left with exactly the tools I desired: a framework for user testing implementation, and connections to colleagues who are willing to help us make it happen at Winona State.

Tammi Owens is the Emerging Services and Liaison Librarian at Winona State University in Winona, MN. Along with being a liaison to three academic departments, her position at the library means she often coordinates technical projects and gets to play with cool toys. Find her on Twitter (@tammi_owens) during conferences and over email ( otherwise.

LITA: What is a Librarian?

planet code4lib - Thu, 2015-02-12 03:09


When people ask me what I do, I have to admit I feel a bit of angst. I could just say I’m a librarian. After all I’ve been in the library game for nearly 10 years now. I went to library school, got a library degree, and I now work at FSU’s Strozier library with a bunch of librarians on library projects. It feels a bit disingenuous to call myself a librarian though because the word “librarian” is not in my job title. Our library, like all others, draws a sharp distinction between librarians and staff. Calling myself a librarian may feel right, but it is a total lie in the eyes of Human Resources. If I take the HR stance on my job, “what I do” becomes  a lot harder to explain. The average friend or family member has a vague understanding of what a librarian is, but phrases like “web programming” and “digital scholarship” invite more questions than they answer (assuming their eyes don’t glaze over immediately and they change the subject). The true answer about “what I do” lies somewhere in the middle of all this, not quite librarianship and not just programming. When I first got this job, I spent quite a bit of time wrestling with labels, and all of this philosophical judo kept returning to the same questions: What is a librarian, really? And what’s a library? What is librarianship? These are probably questions that people in less amorphous positions don’t have to think about. If you work at a reference desk or edit MARC records in the catalog, you probably have a pretty stable answer to these questions.

At a place like Strozier library, where we have a cadre of programmers with LIS degrees and job titles like Digital Scholarship Coordinator and Data Research Librarian, the answer gets really fuzzy. I’ve discussed this topic with a few coworkers, and there seems to be a recurring theme: “Traditional Librarianship” vs. “What We Do”. “Traditional Librarianship” is the classic cardigan-and-cats view we all learned in library school, usually focusing on the holy trinity of reference, collection development and cataloging. These are jobs that EVERY library has to engage in to some degree, so it’s fair to think of these activities as a potential core for librarianship and libraries. The “What We Do” part of the equation encapsulates everything else: digital humanities, data management, scholarly communication, emerging technologies, web programming, etc. These activities have become a canonical part of the library landscape in recent years, and reflect the changing role libraries are playing in our communities. Libraries aren’t just places to ask questions and find books anymore.

The issue as I see it now becomes how we can reconcile the “What We Do” with the “Traditional” to find some common ground in defining librarianship; if we can do that then we might have an answer to our question. An underlying characteristic of almost all library jobs is that, even if they don’t fall squarely under one of the domains of this so-called “Traditional Librarianship”, they still probably include some aspects of it. Scholarly communication positions could be seen as a hybrid collection development/reference position due to the liaison work, faculty consultation and the quest to obtain Open Access faculty scholarship for the institutional repository. My programming work on the FSU Digital Library could be seen as a mix of collection development and cataloging since it involves getting new objects and metadata into our digital collections. The deeper I pursue this line of thinking, the less satisfying it gets. I’m sure you could make the argument that any job is librarianship if you repackage its core duties in just the right way. I don’t feel like I’m a librarian because I kinda sorta do collection development and cataloging.

I feel like a librarian because I care about the same things as other librarians. The same passion that motivates a “traditional” librarian to help their community by purchasing more books or helping a student make sense of a database is the same passion that motivates me to migrate things into our institutional repository or make a web interface more intuitive. Good librarians all want to make the world a better place in their own way (none of us chose librarianship because of the fabulous pay). In this sense, I suppose I see librarianship less as a set of activities and more as a set of shared values and duties to our communities. The ALA’s Core Values of Librarianship does a pretty good job of summing things up, and this has finally satisfied my philosophical quest for the Platonic ideal of a librarian. I no longer see place of work, job title, duties or education as having much bearing on whether or not you are truly a librarian. If you care about information and want to do good with it, that’s enough for me. Others are free to put more rigorous constraints on the profession if they want, but in order for libraries to survive I think we should be more focused on letting people in than on keeping people out.

What does librarianship mean to you? Follow along with other LITA bloggers as we explore this topic from different writers’ perspectives. Keep the conversation going in the comments!

Galen Charlton: The Vanilla Password Reflex, or libraries and security education by example

planet code4lib - Thu, 2015-02-12 02:54

At the first face-to-face meeting of the LITA Patron Privacy Technologies Interest Group at Midwinter, one of the attendees mentioned that they had sent out an RFP last year for library databases. One of the questions on the RFP asked how user passwords were stored — and a number of vendors responded that their systems stored passwords in plain text.

Here’s what I tweeted about that, and here is Dorothea Salo’s reply:


— Ondatra libskoolicus (@LibSkrat) January 31, 2015

This is a repeatable response, by the way — much like the way a hammer strike to the patellar ligament instigates a reflexive kick, mention of plain-text password storage will trigger an instinctual wail from programmers, sysadmins, and privacy and security geeks of all stripes.

Call it the Vanilla Password Reflex?

I’m not suggesting that you should whisper “plain text passwords” into the ear of your favorite system designer, but if you are the sort to indulge in low and base amusements…

A recent blog post by Eric Hellman discusses the problems with storing passwords in plain text in detail. The upshot is that it’s bad practice — if a system’s password list is somehow leaked, and if the passwords are stored in plain text, it’s trivially easy for a cracker to use those passwords to get into all sorts of mischief.

This matters, even “just” for library reference databases. If we take the right to reader privacy seriously, it has to extend to the databases offered by the library — particularly since many of them have features to store citations and search results in a user’s account.

As Eric mentions, the common solution is to use a one-way cryptographic hash function to transform the user’s password into a bunch of gobbledegook.

For example, “p@ssw05d” might be stored as the following hash:


To make it more secure, I might add some random salt and end up with the following salted hash:


To log in, the user has to prove that they know the password by supplying it, but rather than compare the password directly, the result of the one-way function applied to the password is compared with the stored hash.
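
The store-and-compare flow described above can be sketched in a few lines of Python. This is only an illustration, not what any particular vendor does: it assumes PBKDF2-HMAC-SHA256 from the standard library as the one-way function, and the function names, iteration count, and salt length are all choices made up for this sketch.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed values for this sketch only; check current guidance before reuse.
ITERATIONS = 200_000
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key). Only these two values are stored, never the password."""
    if salt is None:
        # A fresh random salt per user means identical passwords get different hashes.
        salt = os.urandom(SALT_BYTES)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-apply the one-way function to the supplied password and compare the results."""
    _, candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(candidate, stored_key)


if __name__ == "__main__":
    salt, key = hash_password("p@ssw05d")
    print(verify_password("p@ssw05d", salt, key))  # correct password
    print(verify_password("password", salt, key))  # wrong password
```

Note that nothing in this sketch can turn the stored key back into the password, which is why a well-designed system can reset a password but cannot reveal it.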

How is this more secure? If a hacker gets the list of password hashes, they won’t be able to deduce the passwords, assuming that the hash function is good enough. What counts as good enough? Well, relatively few programmers are experts in cryptography, but suffice it to say that there does exist a consensus on techniques for managing passwords and authentication.

The idea of using one-way functions to protect passwords is not new; in fact, it dates back to the 1960s. Nowadays, any programmer who wants to be considered a professional really has no excuse for writing a system that stores passwords in plain text.

Back to the “Vanilla Password Reflex”. It is, of course, not actually a reflex in the sense of an instinctual response to a stimulus — programmers and the like get taught, one way or another, about why storing plain text passwords is a bad idea.

Where does this put the public services librarian? Particularly the one who has no particular reason to be well versed in security issues?

At one level, it just changes the script. If a system is well designed, then when a user asks what their password is, it should be impossible to get an answer to the question. How should you respond to a patron who informs you that they’ve forgotten their password? Let them know that you can change it for them. If they wonder why you can’t just tell them, and they’re actually interested in the answer, tell them about one-way functions. If time is short, just blame the computer; that’s fine too.

However, libraries and librarians can have a broader role in educating patrons about online security and privacy practices: leading by example. If we insist that the online services we recommend follow good security design; if we use HTTPS appropriately; if we show that we’re serious about protecting reader privacy, it can only buttress programming that the library may offer about (say) using password managers or avoiding phishing and other scams.

There’s also a direct practical benefit: human nature being what it is, many people use the same password for everything. If you crack an ILS’s password list, you’ve undoubtedly obtained a non-negligible set of people’s online banking passwords.

I’ll end this with a few questions. Many public services librarians have found themselves, like it or not, in the role of providing technical support for e-readers, smartphones, and laptops. How often does online security come up during such interactions? How often do patrons come to the library seeking help against the online bestiary of spammers, phishers, and worse? What works in discussing online security with patrons, who of course can be found at all levels of computer savvy? And what doesn’t?

I invite discussion — not just in the comments section, but also on the mailing list of the Patron Privacy IG.

FOSS4Lib Recent Releases: Krikri - 0.1.3

planet code4lib - Thu, 2015-02-12 01:46

Last updated February 11, 2015. Created by Peter Murray on February 11, 2015.

Package: Krikri
Release Date: Friday, February 6, 2015

FOSS4Lib Updated Packages: Krikri

planet code4lib - Thu, 2015-02-12 01:45

Last updated February 11, 2015. Created by Peter Murray on February 11, 2015.

A Rails engine for metadata aggregation, enhancement, and quality control. Digital Public Library of America uses Kri-Kri as part of Heiðrún, its metadata ingestion system.

More information about Heidrun and Kri-kri can be found on DPLA's Technology Team site.

Package Type: Metadata Manipulation
License: MIT License
Development Status: In Development
Operating System: Linux
Programming Language: Rails, Ruby
Works well with: Blacklight

FOSS4Lib Recent Releases: Fedora Repository - 4.1.0

planet code4lib - Thu, 2015-02-12 01:41

Last updated February 11, 2015. Created by Peter Murray on February 11, 2015.

Package: Fedora Repository
Release Date: Wednesday, February 4, 2015

District Dispatch: Not my Maytag!!

planet code4lib - Wed, 2015-02-11 22:52

Today I attended a briefing for new staff and members of Congress on copyright (free lunch). Event speakers included Mike Godwin, innovation policy director, R Street Institute; Rebecca Tushnet, professor, Georgetown University Law School; Jonathan Band, copyright counsel, American Library Association; and Sherwin Siy, vice president for legal affairs, Public Knowledge. The program was remarkable because the speakers were well prepared, smart, and covered the aspects of copyright that are most relevant today. I already know this stuff, and I didn’t get a bag lunch (they ran out because too many people came), but I thoroughly enjoyed the program. How do you explain copyright in less than 90 minutes to a full room of people who likely know nothing about it?

What does this have to do with a washing machine?

Wait, I’m getting there.

Representative Blake Farenthold (R-TX) welcomed the group and talked briefly about the You Own Devices Act (YODA), reintroduced by Farenthold and Jared Polis (D-CO) on Wednesday. YODA seeks to address the problem that consumers will increasingly face when buying a tangible product whose operation is controlled by licensed software connected to the internet. Dig: You bought the refrigerator but you only licensed the software necessary to run the appliance. You cannot transfer or sell that refrigerator and its software companion because you would violate the terms of your software contract, ultimately restricting your right to first sale.

Farenthold used the example of a flat screen television. You buy a used one from your “must have the latest shiny” neighbor, but when you try to watch a film via your Netflix account, the TV will not work. There are some ridiculous examples like the automated cat litter box I wrote about earlier. Remember the cell phone unlocking controversy? It required legislation to fix so that consumers could lawfully transfer to another cell phone carrier.

Why are manufacturers making it so difficult to transfer products? Competition, baby. Sometimes it is about tying the consumer to one manufacturer to maximize profit, e.g., only Company X sells the printer cartridge necessary for your printer. Licensing the software inside of your tractor makes it harder to repair your tractor yourself because it requires breaking the software and the copyright law. Instead you have to use the company’s repair staff to fix your tractor, and the cost is sky high because there is no competition. Secondary markets like Purple Heart, the used bookstore, or eBay have always been problematic for rights holders because they reap no money from the second sale.

If Congress actually considers YODA in this Congressional session, wait for the sparks to fly. This is beyond the typical copyright controversies. This is about your freaking washing machine!

More on the copyright briefing in a future post, right here on the District Dispatch!

The post Not my Maytag!! appeared first on District Dispatch.

FOSS4Lib Recent Releases: ResCarta Toolkit - 5.1.0

planet code4lib - Wed, 2015-02-11 21:37
Package: ResCarta Toolkit
Release Date: Thursday, February 5, 2015

Last updated February 11, 2015. Created by rescarta on February 11, 2015.

ResCarta Toolkit version 5.1.0 incorporates one-click OCR and one-click AAT. The toolkit includes the Tesseract OCR engine for conversion of image to text and CMU Sphinx for transcription of audio files. The included Apache Tomcat server hosts the ResCarta-Web application which can perform full text search against both text and audio objects.

