Planet Code4Lib
Jenny Rose Halperin: New /contribute page

Wed, 2014-10-15 19:58

In an uncharacteristically short post, I want to let folks know that we just launched our new /contribute page.

I am so proud of our team! Thank you to Jess, Ben, Larissa, Jen, Rebecca, Mike, Pascal, Flod, Holly, Sean, David, Maryellen, Craig, PMac, Matej, and everyone else who had a hand. You all are the absolute most wonderful people to work with and I look forward to seeing what comes next!

I’ll be posting intermittently about new features and challenges on the site, but I first want to give a big virtual hug to all of you who made it happen and all of you who contribute to Mozilla in the future.

LITA: Jobs in Information Technology: October 15

Wed, 2014-10-15 17:55

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing.  Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Assistant Coordinator, Stacks and Circulation,  Colorado State University,  Fort Collins, CO

Digital Archivist, University of Georgia Libraries,  Athens,  GA

Metadata Systems Specialist, NYU, Division of Libraries, New York City,  NY

Visit the LITA Job Site for more available jobs and for information on submitting a  job posting.

Roy Tennant: The Great Plateau

Wed, 2014-10-15 15:37

I had what you might call an unusual early adulthood. Whereas most young adults march off to college and garner the degree that will define their life, I dropped out of school after the 8th grade, attended an alternative high school (read dope-smoking, although I passed at the time) for two years, then dropped out entirely. The story is long, but I helped to build two dome homes in Indiana, built and slept in a treehouse through an Indiana winter, and returned to California where I had been mostly raised, two weeks after I turned 18, with not much more than bus fare and a duffle bag.

From there I built my own life, on my own terms, which meant (oddly enough, although there are reasons if you cared to ask) a job at the local community college library in the foothills of the Sierra Nevada Mountains and a life in the outdoors, which had always beckoned.

This is all background for the point I want to make. In the end, I paused before seriously attending college for about seven years. I dabbled in courses, I learned to run rivers and many other things. And that made all of the difference.

In the end, what made the difference was the timing. Had I entered college when I should have (in 1975), that would have been too early for the computer revolution. As it was, I entered college exactly with the computer revolution. I remember writing my first software program just as I was getting serious about pursuing my college education in the early 1980s, on a Commodore PET computer. My fate was sealed, and I didn’t even realize it.

Later, at Humboldt State University where I majored in Geography and minored in Computer Science, I wrote programs in FORTRAN to process rainfall data for my Geography professor. From there, I jumped on every single computer and network opportunity there was to be had.

I was an early and enthusiastic adopter (and proselytizer in the various organizations where I found work) for the Macintosh computer. I still was, when I joined OCLC seven years ago and broke the Microsoft stranglehold that still existed.

I was an operator of an early automated circulation system (CLSI) at Humboldt State. And not long after that, I co-wrote the first book about the Internet aimed at librarians.

So I am here to tell you, that after a career of being on the cutting edge, the cutting edge doesn’t seem so cutting anymore. We seem to have reached, in libraries and I would argue in society more generally, a technical plateau. We might see innovation around the edges, but there is nothing I can point to that is truly transformative like the Internet was.

This is not necessarily a problem. In fact, systemic, major change can be downright painful. Believe me, I lived it in trying to make others understand how transformative it would be when few actually wanted to hear it. But for someone like me who counted his salad days as finding and pursuing the next truly transformative technology, this feels like a desert. Well, call it a plateau.

A long straight stretch without much struggle, or altitude gain, or major benefit. It is what it is. But you will have to forgive me if I regret the days when massive change was obvious, and surprising, and massively enabling.

Photo of the Tonto Plateau, Grand Canyon National Park, by Roy Tennant,

David Rosenthal: The Internet of Things

Wed, 2014-10-15 15:00
In 1996, my friend Steven McGeady gave a fascinating and rather prophetic keynote address to the Harvard Conference on the Internet and Society. In his introduction, Steven said:
I was worried about speaking here, but I'm even more worried about some of the pronouncements that I have heard over the last few days, ... about the future of the Internet. I am worried about pronouncements of the sort: "In the future, we will do electronic banking at virtual ATMs!," "In the future, my car will have an IP address!," "In the future, I'll be able to get all the old I Love Lucy reruns - over the Internet!" or "In the future, everyone will be a Java programmer!"

This is bunk. I'm worried that our imagination about the way that the 'Net changes our lives, our work and our society is limited to taking current institutions and dialling them forward - the "more, better" school of vision for the future.

I have the same worries that Steven did about discussions of the Internet of Things that looms so large in our future. They focus on the incidental effects, not on the fundamental changes. Barry Ritholtz points me to a post by Jon Evans at TechCrunch entitled The Internet of Someone Else's Things that is an exception. Jon points out that the idea that you own the Smart Things you buy is obsolete:
They say “possession is nine-tenths of the law,” but even if you physically and legally own a Smart Thing, you won’t actually control it. Ownership will become a three-legged stool: who physically owns a thing; who legally owns it; …and who has the ultimate power to command it. Who, in short, has root.

What does this have to do with digital preservation? Follow me below the fold.

On a smaller scale than the Internet of Things (IoT), we already have at least two precursors that demonstrate some of the problems of connecting to the Internet huge numbers of devices over which consumers don't have "root" (administrative control). The first is mobile phones. As Jon says:
Your phone probably has three separate computers in it (processor, baseband processor, and SIM card) and you almost certainly don’t have root on any of them, which is why some people refer to phones as “tracking devices which make phone calls.”

The second is home broadband routers. My friend Jim Gettys points me to a short piece by Vint Cerf entitled Bufferbloat and Other Internet Challenges that takes off from Jim's work on these routers. Vint concludes:
I hope it’s apparent that these disparate topics are linked by the need to find a path toward adapting Internet-based devices to change, and improved safety. Internet users will benefit from the discovery or invention of such a path, and it’s thus worthy of further serious research.

Jim got sucked into working on these problems when, back in 2010, he got fed up with persistent network performance problems on his home's broadband internet service, and did some serious diagnostic work. You can follow the whole story, which continues, on his blog. But the short, vastly over-simplified version is that he discovered that Moore's Law had converted a potential problem with TCP first described in 1985 into a nightmare.

Back then, the idea of a packet switch with effectively infinite buffer storage was purely theoretical. A quarter of a century later, RAM was so cheap that even home broadband routers had packet buffers so large as to be almost infinite. TCP depends on dropping packets to signal that a link is congested. Very large buffers mean packets don't get dropped, so the sender never finds out that some link is congested, so it never slows down. Jim called this phenomenon "bufferbloat", and started a crusade to eliminate it. In less than two years, Kathleen Nichols and Van Jacobson, working with Jim and others, had a software fix to the TCP/IP stack, called CoDel.
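The core of CoDel's fix is simple: instead of watching queue length, it watches how long each packet sat in the queue, and starts dropping only when that delay stays above a small target for a sustained interval. The following is a simplified illustrative sketch of that control law, with hypothetical class and method names; real implementations live inside operating-system queue disciplines and differ in detail:

```python
# Simplified sketch of CoDel's drop decision (illustrative only;
# class and method names are hypothetical, not from any real stack).
TARGET = 0.005    # target sojourn time: 5 ms
INTERVAL = 0.100  # initial observation interval: 100 ms

class CoDel:
    """Decide whether to drop a packet based on its queue sojourn time."""

    def __init__(self):
        self.first_above_time = None  # deadline set when delay first exceeds TARGET
        self.count = 0                # drops so far in the current dropping state

    def should_drop(self, now, sojourn_time):
        if sojourn_time < TARGET:
            # Queue is draining fast enough: reset state, keep the packet.
            self.first_above_time = None
            self.count = 0
            return False
        if self.first_above_time is None:
            # Delay just crossed TARGET; arm a timer one INTERVAL out.
            self.first_above_time = now + INTERVAL
            return False
        if now >= self.first_above_time:
            # Delay stayed above TARGET for a whole INTERVAL: drop this
            # packet, and shorten the next interval as drops accumulate
            # (the sqrt control law), pushing senders to slow down.
            self.count += 1
            self.first_above_time = now + INTERVAL / (self.count ** 0.5)
            return True
        return False
```

A queue with a brief burst never drops anything (the delay falls back below TARGET before the interval expires), while a standing queue, the signature of bufferbloat, triggers steadily increasing drops until the senders back off.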

CoDel isn't a complete solution; further work has produced even more fixes, but it makes a huge difference. Problem solved, right? All we needed to do was to deploy CoDel everywhere in the Internet that manages a packet buffer, which is every piece of hardware connected to it. This meant convincing every vendor of an internet-connected device that they needed to adopt and deploy CoDel not just in new products they were going to ship, but in all the products that they had ever shipped that were still connected to the Internet.

For major vendors such as Cisco this was hard, but for vendors of consumer devices, including even Cisco's Linksys division, it was simply impossible. There is no way for Linksys to push updates of the software to their installed base. Worse, many networking chips implement on-chip packet buffering; their buffer management algorithms are probably both unknowable and unalterable. So even though there is a pretty good fix for bufferbloat that, if deployed, would be a major improvement to Internet performance, we will have to wait for much of the hardware in the edge of the Internet to be replaced before we can get the benefit.

We know that the Smart Things the IoT is made of are full of software. That's what makes them smart. Software has bugs and performance problems like the ones Jim found. More importantly it has vulnerabilities that allow the bad guys to compromise the systems running it. Botnets assembled from hundreds of thousands of compromised home routers have been around from at least 2009 to the present. Other current examples include the Brazilian banking malware that hijacks home routers' DNS settings, and the Moon worm that is scanning the Internet for vulnerable Linksys routers (who do you think would want to do that?). It isn't just routers that are affected. For example, network storage boxes have been hijacked to mine $620K worth of Dogecoin, and (PDF):
HP Security Research reviewed 10 of the most popular devices in some of the most common IoT niches revealing an alarmingly high average number of vulnerabilities per device. Vulnerabilities ranged from Heartbleed to Denial of Service to weak passwords to cross-site scripting.

Just as with bufferbloat, it's essentially impossible to eliminate the vulnerabilities that enable these bad guys. It hasn't been economic for low-cost consumer product vendors to provide the kind of automatic or user-approved updates that PC and smartphone systems now routinely provide; the costs of the bad guys' attacks are borne by the consumer. It is only fair to mention that there are some exceptions. The Nest smoke detector can be updated remotely; Google did this when it was discovered that it might disable itself instead of reporting a fire. Not, as Vint points out, that the remote update systems have proven adequately trustworthy:
Digital signatures and certificates authenticating software’s origin have proven only partly successful owing to the potential for fabricating false but apparently valid certificates by compromising certificate authorities one way or another.

See, for an early example, the Flame malware. Further, as Jon points out:
When you buy a Smart Thing, you get locked into its software ecosystem, which is controlled by its manufacturer, whether you like it or not.

Even valid updates are in the vendor's interest, which may not be yours.

This will be the case for the Smart Things in the IoT too. The IoT will be a swamp of malware. In Charles Stross' 2011 novel Rule 34 many of the deaths Detective Inspector Liz Cavanaugh investigates are caused by malware-infested home appliances; you can't say you weren't warned of the risks of the IoT. Jim has a recent blog post about this problem, with links to pieces he inspired by Bruce Schneier and Dan Geer. All three are must-reads.

This whole problem is another example of a topic I've often blogged about, the short-term thinking that pervades society and makes investing, or even planning, now to reap benefits or avoid disasters in the future so hard. In this case, the disaster is already starting to happen.

Finally, why is this relevant to digital preservation? I've written frequently about the really encouraging progress being made in delivering emulation in browsers and as a cloud service in ways that make running really old software transparent. This solves a major problem in digital preservation that has been evident since Jeff Rothenberg's seminal 1995 article.

Unfortunately, the really old software that will be really easy for everyone to run will have all the same bugs and vulnerabilities it had when it was new. Because old vulnerabilities, especially in consumer products, don't go away with time, attempts to exploit really old vulnerabilities don't go away either. And we can't fix the really old software to make the bugs and vulnerabilities go away, because the whole point of emulation is to run the really old software exactly the way it used to be. So the emulated system will be really, really vulnerable and it will be attacked. How are we going to limit the damage from these vulnerabilities?

William Denton: Lehman Libraries

Wed, 2014-10-15 02:25

This summer I spied and acquired a copy of The Foolish Gentlewoman by Margery Sharp, who also wrote Cluny Brown and The Rescuers and sequels (none of which I’ve read). It’s the 1948 Canadian edition, published by Wm. Collins Sons & Co., Canada, at 70 Bond Street in Toronto (the Collins in HarperCollins).

What caught my eye was this, on the front endpapers:

I suppose Lehman Libraries was a private subscription library, but I’ve never heard of it and a quick search online didn’t turn anything up. If anyone knows anything about it I’d be happy to hear.

SearchHub: Stump the Chump is Coming to D.C.!

Tue, 2014-10-14 21:38

In just under a month, Lucene/Solr Revolution will be coming to Washington D.C. — and once again, I’ll be in the hot seat for Stump The Chump.

If you are not familiar with “Stump the Chump” it’s a Q&A style session where “The Chump” (That’s Me!) is put on the spot with tough, challenging, unusual questions about Lucene & Solr — live, on stage, in front of hundreds of rambunctious convention goers, with judges who have all seen and thought about the questions in advance and get to mock The Chump (still me) and award prizes to people whose questions do the best job of “Stumping The Chump”.

People frequently tell me it’s the most fun they’ve ever had at a Tech Conference — You can judge for yourself by checking out the videos from last years events: Lucene/Solr Revolution 2013 in Dublin, and Lucene/Solr Revolution 2013 in San Diego.

I’ll be posting more details in the weeks ahead, but until then you can subscribe to this blog (or just the “Chump” tag) to stay informed.

And if you haven’t registered for Lucene/Solr Revolution yet, what are you waiting for?!?!

FOSS4Lib Recent Releases: Umlaut - 4.0

Tue, 2014-10-14 20:09
Package: Umlaut
Release Date: Monday, October 6, 2014

Last updated October 14, 2014. Created by Peter Murray on October 14, 2014.

This release is mostly back-end upgrades, including:

  • Support for Rails 4.x (Rails 3.2 included to make migration easier for existing installations, but recommend upgrading to Rails 4.1 asap, and starting with Rails 4.1 in new apps)
  • Based on Bootstrap 3 (Umlaut 3.x was Bootstrap 2)
  • internationalization/localization support
  • A more streamlined installation process with a custom installer

LITA: Midwinter Workshop Highlight: Meet the UX Presenters!

Tue, 2014-10-14 20:08

We asked our LITA Midwinter Workshop Presenters to tell us a little more about themselves and what to expect from their workshops in January. This week, we’re hearing from Kate Lawrence, Deirdre Costello, and Robert Newell, who will be presenting the workshop:

From Lost to Found: How User Testing Can Improve the User Experience of Your Library Website
(For registration details, please see the bottom of this blog post)

LITA: We’ve seen your formal bios but can you tell us a little more about you?

Kate: If I didn’t work as a user researcher, I would be a professional backgammon player or cake decorator (I am a magician with fondant!). Or both.

Deirdre: I’m horse crazy!

Robert: In a past life I was a professional actor. If you pay really really close attention (like, don’t blink), you might spot me in a few episodes of Friday Night Lights or Prison Break.

LITA: User Testing is a big area. Who is your target audience for this workshop?

Presenters: This is a perfect workshop for people who want to learn user testing in a supportive environment. We will teach people how to test their websites in the real world – we understand that time and other resources are limited. This is for anyone who wants to know what it’s like for patrons to try accessing their library’s resources through their website.

LITA: How much experience with UX do attendees need to succeed in the workshop?

Presenters: Experience isn’t required, but an understanding of the general UX field and goals is useful. Attendees are encouraged to come with a potential usability study topic in mind. From Robert: “You just need to be able to put your social scientist hat on and look at user testing as an informal (and fun!) psychology experiment.”

LITA: If your workshop was a character from the Marvel or Harry Potter universe, which would it be, and why?

Kate: Having just read the Harry Potter series with my two kids, I can say that our workshop will inspire like Dumbledore, give you a chuckle like those naughty Weasley twins, teach you like the astute Minerva McGonagall would, and leave you smiling with satisfaction just like the brilliant Hermione Granger.

Deirdre: Marvel: definitely Wolverine. Tough and sassy with a heart of gold, calls everyone “bub.” Harry Potter: 100% Hermione. I’m an avid reader, rule-follower and overachiever. (LITA note: I think those describe Deirdre, maybe not the workshop?)

Robert: I’m gonna say Mystique. Mystique can literally put herself in someone else’s shoes (human or Mutant). When we conduct usability testing, we’re directly observing what it’s like to be in the user’s shoes and we’re seeing things from their perspective.

LITA: Name one concrete thing your attendees will be able to take back to their libraries after participating in your workshop.

Kate: The knowledge about how to conduct a user test on their library site, a coupon for a free test from, and support and encouragement from a team of experienced researchers.

Deirdre: The skills to plan, recruit for and execute small-sample usability tests. The ability to communicate the findings for those tests in a way that will advocate for their users.

Robert: The ability to validate your ideas about your website with direct, reliable user feedback. Whenever you think, “This might work, but would it make sense to our users?” You’ll have the skills and tools to go find out.

LITA: What kind of gadgets/software do your attendees need to bring?

Presenters: Whatever note-taking method you prefer; a laptop or mobile device to follow along is recommended but isn’t required. Kate recommends “A laptop. A pen and paper. A positive, can-do attitude!”

LITA: Respond to this scenario: You’re stuck on a desert island. A box washes ashore. As you pry off the lid and peer inside, you begin to dance and sing, totally euphoric. What’s in the box?

Kate: I’m assuming my family is on the island with me, and in that case – I want that box to contain Hershey’s hugs, the white chocolate kisses with milk chocolate swirls. I’m obsessed!

Deirdre: Hostess Orange Cupcakes.

Robert: A gallon of Coppertone Oil Free Faces SPF 50+ Sunscreen. I’m sorry but I’m fair skinned with a ton of freckles and a desert island scenario just screams melanoma to me.

Thank you to Kate, Deirdre, and Robert for giving us this interview! We’re looking forward to their UX Workshop at Midwinter in Chicago. We’ll hear from our other workshop presenters in the coming weeks!

More information about Midwinter Workshops. 

Registration Information: LITA members get one third off the cost of Midwinter workshops. Use the discount promotional code LITA2015 during online registration to automatically receive your member discount. Start the process at the ALA web sites:

  • Conference web site:
  • Registration start page:
  • LITA Workshops registration descriptions:

When you start the registration process, and BEFORE you choose the workshop, you will encounter the Personal Information page. On that page there is a field to enter the discount promotional code LITA2015, as in the example below. If you do so, then when you get to the workshop-choosing page the discounted prices, of $235, are automatically displayed and entered. The discounted total will be reflected in the Balance Due line on the payment page. Please contact the LITA Office if you have any registration questions.

District Dispatch: Free Wi-fi in the Allegheny Mountains

Tue, 2014-10-14 19:13

Allegheny Mountains. Photo by Nicholas A. Tonelli via flickr.

Last week, Emily Sheketoff, executive director of the American Library Association (ALA) Washington Office, Cathleen Bourdon, associate executive director of ALA Communication and Member Relations, and I (staff lackey) took a road trip to the Snowshoe resort in West Virginia to speak at the West Virginia Library Association Conference. The five-hour drive from D.C. to Snowshoe, W.V., was a pastoral treat, with fall leaves at their peak in the Allegheny Mountains.

There was a gas station in Warrensville where a gallon was only $3.09! The folksy diner there served a grilled cheese sandwich for $2.50. We saw a lot of cows (which is a big deal for folks who live in cities and rarely leave their offices). Emily’s theory that pending rainfall could be determined by whether a cow was standing or lying down proved to be inconclusive.

Once we got to Snowshoe, we experienced firsthand the difficulties a rural state like West Virginia has with access to broadband. We were assured prior to the trip that Wi-Fi was free, but upon arrival learned that that meant free at the Starbucks (which closes at 4pm). AT&T and T-Mobile were the only cellular networks supported. Because of the Robert C. Byrd Green Bank Telescope and potential interference with its operation, a large swath of land surrounding the area requires that all radio transmissions be severely limited. Check out the West Virginia Broadband map to see for yourself. Library-wise, over 65 percent of West Virginia libraries still require increased broadband based on the Digital Inclusion Survey.

For those of us suffering digital overload, this might not seem too bad. Cheap gas, low cost grilled cheese sandwiches, and beautiful mountains sound great, so who needs broadband? Everyone. In today’s connected world, how can people succeed without broadband?

The post Free Wi-fi in the Allegheny Mountains appeared first on District Dispatch.

HangingTogether: The Elusive User

Tue, 2014-10-14 15:04


[city man watching fog dust | pixabay]

Each year, OCLC Research staff gather together to review current activities and to plan for the upcoming year. During this year’s meeting, which happened in September, we reviewed our activity areas. I lead the User Behavior Studies and Synthesis activity area; our group engaged in a discussion about describing and possibly renaming the activity area. We discussed “user behavior studies” and whether this terminology is overused and whether it reflects the whole picture of studying and identifying how individuals engage with technology; how they seek, access, and use information; and how and why they demonstrate these behaviors and do what they do.

I wonder if we, as librarians and information professionals, spend too much time contemplating and discussing users of our services and resources, and if this energy would be better spent on identifying those individuals who choose not to use library services and resources. I wonder why we are fixated on users of library services and resources and why we do not expend energy on learning about those who go elsewhere for their technology and information needs, and on trying to position library services and resources in their workflows and personal and professional landscapes. Marie L. Radford and I define these individuals who do not use library services and resources as potential users.

If we do buy into this need to identify potential users and their behaviors, what do we call this group? Are these individuals users also, just not users of library services and resources? The term potential user seems cumbersome and not very enticing when trying to promote interest and activity in this area. Even more difficult is identifying a term that describes both users and potential users of library services and resources. Could that term be Elusive Users? According to Choose Your Words, “Anything elusive is hard to get a hold of. It eludes you.” Does this term, elusive, accurately describe the individuals who we observe, interview, and track in various contexts of using technology and acquiring information? I invite you to share your ideas in the comments!

About Lynn Connaway

Senior Research Scientist at OCLC Research. I study how people get & use information & engage with technology.

Mail | Web | Twitter | More Posts (2)

LibraryThing (Thingology): Job: Library Developer at LibraryThing (Telecommute)

Tue, 2014-10-14 14:22
UPDATE: We are offering $1,000 of books to the person who finds us a library developer. Code! Code! Code!

LibraryThing, the company behind and LibraryThing for Libraries, is looking to hire a top-notch developer/programmer.

We like to think we make “products that don’t suck,” as opposed to much of what’s developed for libraries. We’ve got new ideas and not enough developers to make them. That’s where you come in.

The Best Person
  • Work for us in Maine, or telecommute in your pajamas. We want the best person available.
  • If you’re junior, this is a “junior” position. If you’re senior, a “senior” one. Salary is based on your skills and experience.
Technical Skills
  • LibraryThing is mostly non-OO PHP. You need to be a solid PHP programmer or show us you can become one quickly.
  • You should be experienced in HTML, JavaScript, CSS and SQL.
  • We welcome experience with design and UX, Python, Solr, and mobile development.
The highly-photogenic LibraryThing staff only use stock photos ironically.

What We Value
  • Execution is paramount. You must be a sure-footed and rapid coder, capable of taking on jobs and finishing them with diligence and expedition.
  • Creativity, diligence, optimism, and outspokenness are important.
  • Experience with library data and systems is favored.
  • LibraryThing is an informal, high-pressure and high-energy environment. This puts a premium on speed and reliability, communication and responsibility.
  • Working remotely gives you freedom, but also requires discipline and internal motivation.
  • Gold-plated health insurance.
  • Cheese.
How To Apply
  • We have a simple quiz, developed back in 2011. If you can do it in under five minutes, you should apply for the job! If not, well, wasn’t that fun anyway?
  • To apply, send a resume. Skip the cover letter, and go through the blog post in your email, responding to the tangibles and intangibles bullet-by-bullet.
  • Also include your solution to the quiz, and how long it took you. Anything under five minutes is fine. If it takes you longer than five minutes, we won’t know. But the interview will involve lots of live coding.
  • Feel free to send questions to, or Skype chat Tim at LibraryThingTim.
  • Please put “Library developer” somewhere in your email subject line.

LibraryThing (Thingology): Send us a programmer, win $1,000 in books.

Tue, 2014-10-14 14:04

We just posted a new job post Job: Library Developer at LibraryThing (Telecommute).

To sweeten the deal, we are offering $1,000 worth of books to the person who finds them. That’s a lot of books.

Rules! You get a $1,000 gift certificate to the local, chain or online bookseller of your choice.

To qualify, you need to connect us to someone. Either you introduce them to us—and they follow up by applying themselves—or they mention your name in their email (“So-and-so told me about this”). You can recommend yourself, but if you found out about it from someone else, we hope you’ll do the right thing and make them the beneficiary.

Small print: Our decision is final, incontestable, irreversible and completely dictatorial. It only applies when an employee is hired full-time, not part-time, contract or for a trial period. If we don’t hire someone for the job, we don’t pay. The contact must happen in the next month. If we’ve already been in touch with the candidate, it doesn’t count. Void where prohibited. You pay taxes, and the insidious hidden tax of shelving. Employees and their families are eligible to win, provided they aren’t work contacts. Tim is not.

» Job: Library Developer at LibraryThing (Telecommute)

Library of Congress: The Signal: Close Reading, Distant Reading: Should Archival Appraisal Adjust?

Tue, 2014-10-14 13:34

From time to time, co-chairs of the National Digital Stewardship Alliance Arts and Humanities Content Working Group will bring you guest posts addressing the future of research and development for digital cultural heritage as a follow-up to a dynamic forum held at the 2014 Digital Preservation Conference.  Anyone interested in contributing a posting for The Signal on this topic should contact either jsternfeld at or gail at

The following is a guest post from Meg Phillips, External Affairs Liaison, National Archives and Records Administration. Opinions expressed are those of the author and do not necessarily represent positions of the National Archives and Records Administration.

Meg Phillips, External Affairs Liaison at the National Archives and Records Administration and member of the NDSA Coordinating Committee.

Digital humanists and digital historians are employing research methods that most of us did not anticipate when we were learning to be archivists.  Do new types of research mean archivists should re-examine the way we learned to do appraisal?

The new types of researchers are experimenting with methods beyond the scholarly tradition of “close reading.”  When paper archives were the only game in town, close reading was all a researcher could do – it’s what we generally mean by “reading.”  Researchers studied individual records, extracting meaning and context from the information contained in each document.  Now, however, digital humanists are using born-digital or digitized collections to explore the benefits of computational analysis techniques, or “distant reading.” They are using computer programs to analyze patterns and find meaning in entire corpora of records without a human ever reading any individual record at all.

I have been interested in digital scholarship and its implications for archives for a while, but I hadn’t heard the phrase “distant reading” until seeing Franco Moretti’s book “Distant Reading” reviewed earlier this year. (See  “What is Distant Reading?” in the New York Times and “In Praise of Overstating the Case: A review of Franco Moretti, Distant Reading” in Digital Humanities Quarterly for a taste of the debate over the book.)  The phrase stuck with me as provocative shorthand for a new way of using records, and I started thinking about what distant reading might mean for archival appraisal.

Our traditions of archival appraisal are based on locating records that reward close reading.  A series appraised as permanent contains individual records that contain historically valuable information.  Both appraisal itself and the culling that happens during transfer or processing focus on removing records that do not contain permanently valuable information.

Now, however, it is possible to ask and answer entirely new kinds of questions with born-digital or digitized records. What did the network of influence in an organization look like?  How did communication flow? Was the chief executive interacting with a particular vendor unusually often? When did a new concept or term first appear and how quickly did use of the new term spread?  How did a disease spread through a community?  Not only is it possible, but early adopters are now teaching these research methods to a new generation of students.  For example, Professor Matthew Connelly is teaching a seminar at the London School of Economics called Hacking the Archives.  The course challenges students of international history to explore the new kinds of questions computational research allows.  These are questions whose answers emerge not from deep reading of individual records but from analysis of patterns in  large bodies of records.
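As a toy illustration of this kind of computational analysis (my own sketch, not any particular project's method; the corpus and the search term are invented), a few lines of code can trace when a term first appears in a dated corpus and how quickly its use spreads, without anyone reading an individual document:

```python
from collections import Counter

def term_timeline(corpus, term):
    """Count occurrences of a term per year across a dated corpus.

    corpus: iterable of (year, text) pairs; term: a lowercase word.
    Returns a dict mapping year -> occurrence count, letting a
    researcher see when a term emerges and how fast it spreads.
    """
    counts = Counter()
    for year, text in corpus:
        counts[year] += text.lower().split().count(term)
    return dict(counts)

# Hypothetical memo snippets with dates, for illustration only.
corpus = [
    (1995, "Please fax the report"),
    (1998, "Email the report and email the minutes"),
    (2001, "Email the agenda"),
]
print(term_timeline(corpus, "email"))  # {1995: 0, 1998: 2, 2001: 1}
```

The same pattern-over-corpus approach, scaled up, is what lets a researcher ask about networks of influence or disease spread without close reading any single record.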

The National Archives from user silbersam on Flickr.

The interesting thing about these questions is that the answers may rely on the presence of records that would clearly be temporary if judged on their individual merits. Consider email messages like “Really sick today – not coming in” or a message from the executive of a  regulated company saying “Want to meet for lunch?” to a government policymaker. In the aggregate, the patterns of these messages  may paint a picture of disease spread or the inner workings of access and influence in government.  Those are exactly the kinds of messages traditional archival practice would try to cull. In these cases, appraising an entire corpus of records as permanent would support distant reading much better.  The informational value of the whole corpus cannot be captured by selecting just the records with individual value.

If we adjusted practice to support more distant reading, archivists would still do appraisal, deciding what is worth permanent preservation.  We would just be doing it at a different level of granularity – appraising the research value of an entire email system, SharePoint site or social media account, for example.

Incidentally, on a practical level this level of appraisal might also lead to disposition instructions that are easier for creating offices to carry out.

Figuring out how to do appraisal to support both distant reading and close reading would be an excellent project for the archival and digital preservation fields.  What questions would we want to answer?  We could start with some questions like these:

  • How many researchers are actually engaged in distant reading?  What fields do they work in?  Are their numbers increasing?
  • Do they want to apply computational techniques to archival materials, for example Federal records in the National Archives, or in any other environment?  Perhaps they are getting their source material somewhere else, bypassing archives.
  • To what extent do their research methods rely on having a complete set of the records created rather than a subset of the most permanently valuable records?
  • Do current definitions of a record and current recordkeeping regulations support a change to appraisal of entire corpora of records?
  • How would we know which corpora of records were most useful to researchers?
  • Is the benefit of distant reading worth the cost and risk of retaining more material that could have personal privacy or other protected content?
  • Is there a meaningful difference between trying to support computational research and actually just keeping everything?  (Perhaps this whole discussion is just the modern version of the old tension between historians who want to save everything and archivists who are trying to put their resources toward the most important materials.)

Staff at the National Archives and other institutions are starting to create opportunities for archivists to discuss questions like these. Josh Sternfeld of NEH, Jordan Steele of Johns Hopkins, and Paul Wester and I from NARA will be holding a panel discussion of these issues at the Fall 2014 Mid-Atlantic Regional Archives Conference meeting in Baltimore, for example. Paul and I will also be speaking with Matthew Connelly and others on an American Historical Association panel at the 2015 annual meeting in New York City, “Are We Losing History? Capturing Archival Records for a New ERA of Research.”

However, we need to create even more opportunities for archivists to explore these issues with digital humanists. A forum that pulled together digital researchers, archivists, librarians and technologists could be a great opportunity for us all to learn from each other. Such an event could also spread the word about the exciting new things that can be done with digital primary sources and the rich collections of digital resources that are now available in archives and libraries.

Of course, we can also blog about the issues and hope that the community leaps into the fray!

In that spirit, do you think archival appraisal needs to change, and if so, how?

PeerLibrary: Towards Open Access to Research and Knowledge for Development

Tue, 2014-10-14 04:00

Beyond establishing an online database of publicly accessible academic articles, PeerLibrary has committed itself to providing an open space where people are encouraged to collaborate and communicate with one another. This week we want to highlight a fascinating article in our database, titled “Towards Open and Equitable Access to Research and Knowledge for Development”, by Professor Leslie Chan et al. at the University of Toronto. Professor Chan’s team focuses on the importance of research and the necessity for anyone, not just academic scholars, to be able to engage in and conduct research as well as share and critique one another’s ideas. He claims that only open collaboration will effectively promote human development and allow the merging of different identities. With the words of Professor Chan in mind, PeerLibrary hopes to aid the development of free collaboration under the guiding principle of the universal right to education.

Dan Scott: DCMI 2014: holdings in open source library systems

Tue, 2014-10-14 01:07

My slides from DCMI 2014: in the wild: open source libraries++.

Last week I was at the Dublin Core Metadata Initiative 2014 conference, where Richard Wallis, Charles McCathie Nevile and I were slated to present on schema.org and the work of the W3C Bibliographic Extension Community Group (#schemabibex). As a first-timer at DCMI, I wasn't sure what kind of an audience to expect: there is a peer-reviewed papers track, and a series of sessions on a truly intimidating topic (RDF Application Profiles), but on the other hand our own topic was fairly basic. As it turned out, there was an invigoratingly mixed set of backgrounds present, and Eric Miller's opening keynote, which gave an oral history of the origins of DCMI and a look towards the future challenges for the organization, reassured me that I wasn't going to be out of my depth.

Special kudos to Eric for his analogy of the Web to a credit card, which offers both human-readable and machine-readable data. A nice, clean image!

Richard, Charles and I opted to structure our 1.5 hour session as a series of short talks followed by a long period of discussion. However, as often happens, the excitement of speaking in front of a room so full that we had to jam in extra chairs led to that plan breaking down. I cut my own materials back to illustrating how one of my primary contributions to the #schemabibex effort--representing library holdings using schema.org's GoodRelations-based Product/Offer model--had been implemented in free software library systems, including Evergreen, Koha, and VuFind. I walked from a basic bibliographic record (represented as a Product) through to the associated borrowable items (represented as Offers with a price of $0.00, call numbers as SKUs, and barcodes as serialNumbers) that were offered by a specific Library with its own set of operating hours, address, and contact information... all published out of the box as RDFa in modern Evergreen systems.
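A minimal sketch of the Product/Offer pattern described above, expressed here as JSON-LD built in Python. The title, call number, barcode, and library details are invented for illustration, and real Evergreen output is RDFa embedded in catalogue pages rather than a standalone JSON-LD document:

```python
import json

# Hypothetical holding: all values are invented for illustration.
holding = {
    "@context": "http://schema.org",
    "@type": "Product",                 # the bibliographic record
    "name": "Example Title",
    "offers": {
        "@type": "Offer",
        "price": "0.00",                # borrowable, not for sale
        "priceCurrency": "USD",
        "sku": "PS3545 .E6 2014",       # call number as SKU
        "serialNumber": "31234000123456",  # item barcode
        "availability": "http://schema.org/InStock",
        "offeredBy": {
            "@type": "Library",
            "name": "Example Branch Library",
            "telephone": "+1-555-0100",
        },
    },
}

print(json.dumps(holding, indent=2))
```

Because the structure is just schema.org vocabulary, the same data a search engine crawls can also be consumed by any client that parses the page's structured data, which is the basis of the "simple API" argument below.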

I did stray a little to posit that the use case for schema.org is not and should not be limited to "search engine optimization", but that this very simple level of structured data could fairly easily form the basis of an API. In the rather limited discussion that we were able to hold at the end of the session (and encroaching on break time), Charles counselled that libraries shouldn't really bother with dumbing down their beautiful metadata simply to publish schema.org, while I countered that the pursuit of publishing beautiful metadata in the past has generally led librarians to publish no metadata at all, and that schema.org was a great first step towards building a web of cultural heritage metadata meant for machine consumption.

I wish I could have stayed longer at DCMI, but it was Thanksgiving in Canada and there were families to visit and feast with--not to mention children to help take care of--so I had to depart after just a day and a half. I'm encouraged by the steps the organization is taking to renew itself, and I hope to be able to participate again in the future.

DPLA: DPLA and DigitalNZ present GIF IT UP, an international GIF-making competition: October 13 – December 1, 2014

Mon, 2014-10-13 14:28

It’s a public domain celebration! The Digital Public Library of America and DigitalNZ are very excited to announce the launch of GIF IT UP, an international competition over the next six weeks to find the best GIFs reusing public domain and openly licensed digital video, images, text, and other material available via our search portals. The winners will have their work featured and celebrated online at the Public Domain Review and Pretty sweet, huh?


Cat Galloping (1887). The still images used in this GIF come from Eadweard Muybridge’s “Animal locomotion: an electro-photographic investigation of consecutive phases of animal movements” (1872-1885). Courtesy USC Digital Library, 2010. View original record (item is in the public domain). GIF available under a CC-BY license.

How it works. The GIF IT UP competition has six categories:

  1. Animals
  2. Planes, trains, and other transport
  3. Nature and the environment
  4. Your hometown, state, or province
  5. WWI, 1914-1918
  6. GIF using a stereoscopic image

A winner will be selected in each of these categories and, if necessary, a winner will be awarded in two fields: use of an animated still public domain image, and use of video material.

To view the competition’s official homepage, visit

Judging. GIF IT UP will be co-judged by Adam Green, Editor of the Public Domain Review and by Brian Wolly, Digital Editor of Entries will be judged on coherence with category theme, thoroughness of entry (correct link to source material and contextual information), creativity, and originality.

Gallery. All entries that meet the criteria outlined below in the Guidelines and Rules will be posted to the GIF IT UP Tumblr Gallery. The gallery entries with the most Tumblr “notes” will receive the people’s choice award and will appear online at the Public Domain Review and alongside the category winners.

Submit. To participate, please first take a moment to read “How it Works” and the guidelines and rules on the GIF IT UP homepage, and then submit your entry by clicking here.

Deadline. The competition deadline is December 1, 2014 at 5:00 PM EST / December 2, 2014 at 10:00 AM GMT+13.

GIFtastic Resources. You can find more information about GIF IT UP–including select DPLA and DigitalNZ collections available for re-use and a list of handy GIF-making tips and tools–over on the GIF IT UP homepage.

Questions. For questions or other inquiries, email us at or, or tweet us @digitalnz or @dpla. Good luck and happy GIFing!

 All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

LITA: ADE in the Library eBook Data Lifecycle

Mon, 2014-10-13 13:53

Reader: “Hey, I heard there is some sort of problem with those ebooks I checked out from the library?”

Librarian: “There are technical problems, potential legal problems, and philosophical problems – but not with the book itself nor your choice to read it.”

As mentioned, there are (at least) three sides to the problem. Nate Hoffelder* discovered the technical problem with the way the current version (4) of Adobe Digital Editions (ADE) manages the ebook experience, which was confirmed by security researcher Benjamin Daniel Mussler, and later reviewed by Eric Hellman. The technical problem, that arguably private data is sent in plain text from a reader’s device to a central data-store, seems pretty obvious once it was discovered. The potential legal problem stems from laws in every state which protect reader privacy which set expectations for data security, plus other laws which may apply. The philosophical problem has several facets, which could be simplified down to the tension between privacy and convenience.

When a widely used software platform is found to be logging data unexpectedly and transmitting it for some unknown use, it causes great unease among users. When that transmission happens in plain text over easily intercepted channels, it causes anger among technologists who think a leading software developer should know better. And when this all happens in the context of the library world, where privacy is highly valued, there is outrage, as expressed by LITA Board member Andromeda Yelton.

Here are the library profession’s basic positions:

  1. Each individual’s reading choices and behavior should be private (i.e. anonymized or, better, not tracked)
  2. Data gathered for user-desired functionality across devices should be private (i.e. anonymized)
  3. Insofar as there is any tracking of reading choices and behavior, there should be an opt-out option readily available to individuals (i.e., not buried in the fine print)

In his October 9th post from The Digital Shift, Matt Enis reports that Adobe is working to correct the problem of data being transmitted in clear text but “maintains that its collection of this data is covered under its user agreement.” The data that corporations transmit should be limited to the elements necessary to provide the desired functionality, yet restricted enough that an individual’s activity remains private.

To join the conversation, begin to educate yourself using our ADE Primer, below, plus the following resources:

A Primer on how Adobe Digital Editions (ADE) works with library ebooks

I’m a reader and I go to use a library ebook
(via Overdrive or other downloading service offered):

  1. what will I need to install on my device(s)?
    (laptop, tablet, phone, & iPod let’s assume)

    • laptop/computer: Adobe Digital Editions (ADE), activated with an Adobe ID
    • tablet, phone, iPod, etc.: Bluefire Reader (or compatible) app, activated with an Adobe ID
  2. how do the various devices know which page to show me next when I switch between them?
    • access and synchronization across devices are managed using the Adobe ID, the information associated with the ebook, and data tracked by ADE
  3. what technologies are behind the scenes?
    • the ADE managed digital rights management (DRM) required by the ebook publisher
    • the ebook reader software/app
    • the internet
  4. what data is needed to be able to do the sync?
    • the minimum required data is arguably the UserID, BookID, and a page-accessed timestamp
    • the current ADE version, ADE4, tracks significantly more data than the minimums above
  5. how is that data shared between devices?
    • Users can access their ADE account from up to 6 different devices. When accessing the ID/account from a new device the user must “activate” the device by logging into the Adobe ID/Account to prove that the user is the legitimate account holder.
    • ADE4 shares all ebook data it tracks in plain-text in an unsecured channel over the internet
  6. what functionality would not work if this were suddenly not provided?
    • if ADE did not provide reader tracking data, each time a reader opened an ebook on a different device they would have to remember the page they were on and navigate back to it to continue reading where they left off
    • A computer can be anonymously activated using ADE, however this will prevent the items from being accessible from more than one computer/device. The ebooks would then be considered to be “owned” by that computer and would not be available to be accessed from other devices.
    • if ADE were completely withdrawn from availability, ebook DRM would prevent use of ADE-managed DRM-protected ebooks
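To make question 4 in the primer concrete, a minimal synchronization record could be as small as the sketch below. This is a hypothetical payload of my own devising, not Adobe's actual wire format; the privacy complaint is that ADE4 sent considerably more than this, and sent it over unencrypted HTTP:

```python
import json
import time

# Hypothetical minimal sync payload -- NOT Adobe's actual format.
# Three fields are arguably enough to resume reading on any device.
sync_record = {
    "user_id": "anon-7f3a9c",            # pseudonymous account identifier
    "book_id": "urn:isbn:9780000000000",  # placeholder identifier
    "last_position": {"page": 142},
    "timestamp": int(time.time()),
}

payload = json.dumps(sync_record)
# Sent over HTTPS, this reveals little to an eavesdropper; sent over
# plain HTTP, anyone on the network path can read every field.
print(payload)
```

Anything beyond fields like these (titles read, pages viewed, reading duration) serves purposes other than synchronization, which is the crux of the tracking debate below.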

From a technology point of view, the clear-text data transmitted suggests the data may be for synchronization, but it seems, first and foremost, to support various licensing business models. Because Adobe might in the future have customers who want to use Adobe DRM to expire a book after a certain number of hours or pages read, they may feel the need to collect that data. Adobe’s data collection seems to be working as intended here. Clear-text transmission is clearly a bug, but that this data about patron reading habits is being transmitted to Adobe is a feature of the software.

The philosophical discussion which needs to happen around ebooks and DRM should include:

  • what data elements enable user-desired functionality
  • what data elements enable digital rights management
  • what data elements above are/are not within ALA’s stated professional ethics
  • whether tracking ebook user behavior is acceptable *at all*

From libraryland conversations around the issue so far, opinions have ranged from ‘tracking is not the problem, the clear-text transmission is‘ to ‘tracking is very much a problem, it’s unacceptable.’

Issues like this highlight the need to revisit stated positions and evaluate where the balance point is between accommodating user functionality and protecting against collection of personally identifiable data, or metadata.

*Post updated to correctly credit Nate Hoffelder as the original discoverer (my apologies!)

John Miedema: How Watson Works in Four Steps

Mon, 2014-10-13 13:34

A good overview of how IBM’s Watson works. When humans seek to understand something and to make a decision, we go through four steps:

  1. Observe visible phenomena and bodies of evidence;
  2. Draw on what we know to interpret evidence and to generate hypotheses;
  3. Evaluate which hypotheses are right or wrong; and
  4. Decide the best option and act accordingly.

So does Watson. Key to its success is the ability to process unstructured inputs using Natural Language Processing (NLP).
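The four steps map onto a generic evidence-scoring loop. The sketch below illustrates that pattern only (with an invented toy scorer), not IBM's implementation:

```python
def decide(evidence, hypothesis_generators, score):
    """Generic observe -> hypothesize -> evaluate -> decide loop.

    evidence: the observed inputs; hypothesis_generators: functions
    that propose candidate answers from the evidence; score: rates a
    hypothesis against the evidence. Returns the best-scoring
    hypothesis, or None if no hypotheses were generated.
    """
    # Step 1: observe (the evidence is passed in).
    # Step 2: draw on what we know to generate candidate hypotheses.
    hypotheses = [h for gen in hypothesis_generators for h in gen(evidence)]
    # Step 3: evaluate each hypothesis against the evidence.
    ranked = sorted(hypotheses, key=lambda h: score(h, evidence), reverse=True)
    # Step 4: decide on the best option.
    return ranked[0] if ranked else None

# Toy usage: a hard-coded scorer standing in for NLP-based evaluation.
evidence = {"clue": "capital of France"}
gens = [lambda e: ["Paris", "Lyon", "Marseille"]]
score = lambda h, e: 1.0 if h == "Paris" else 0.1
print(decide(evidence, gens, score))  # Paris
```

In Watson, the interesting work happens inside the generators and the scorer, where NLP turns unstructured text into evidence that can be ranked.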

District Dispatch: Free webinar: Helping patrons understand Ebola

Mon, 2014-10-13 06:48

Photo by Phil Moyer

Reminder: On Tuesday, October 14, 2014, library leaders from the U.S. National Library of Medicine will host the free webinar “Fighting Ebola and Infectious Diseases with Information: Resources and Search Skills can Arm Librarians.” The webinar will teach participants how to find and share reliable health information.

Recent outbreaks across the globe and in the U.S. have increased public awareness of the potential public health impacts of infectious diseases. As a result, many librarians are assisting their patrons in finding credible information sources on topics such as Ebola, Chikungunya and pandemic influenza.

Speakers include:

Siobhan Champ-Blackwell
Siobhan Champ-Blackwell is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center. She selects material to be added to the NLM disaster medicine grey literature database and is responsible for the Center’s social media efforts. She has over 10 years of experience in providing training on NLM products and resources.

Elizabeth Norton
Elizabeth Norton is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center where she has been working to improve online access to disaster health information for the disaster medicine and public health workforce. She has presented on this topic at national and international association meetings and has provided training on disaster health information resources to first responders, educators, and librarians working with the disaster response and public health preparedness communities.

Date: Tuesday, October 14, 2014
Time: 2:00 PM – 3:00 PM Eastern
Register for the free event

If you cannot attend this live session, a recorded archive will be available to view at your convenience. To view past webinars also done in collaboration with iPAC, please visit

The post Free webinar: Helping patrons understand Ebola appeared first on District Dispatch.

Patrick Hochstenbach: Homework assignment #3 Sketchbookskool

Sun, 2014-10-12 07:07
This week we were asked to go to a park and draw people using line art. It was raining, so I decided to go to the Church of Our Lady, which is always a tourist attraction.   Filed under: Doodles