Feed aggregator

Eric Hellman: Suggested improvements for a medical journal privacy policy

planet code4lib - Wed, 2015-04-01 17:03

After I gave the New England Journal of Medicine a failing grade for user privacy, noting that their website used more trackers than any other scholarly journal website I looked at, the Massachusetts Medical Society asked me to review the privacy policy for and make changes that would improve its transparency. On the whole, their website privacy policy is more informative and less misleading than most privacy policies I've looked at. Still, there's always room for improvement. They've kindly allowed me to show you the changes I recommended:

Last updated: April 1, 2015
Governing Principles is owned and operated by the Massachusetts Medical Society (“MMS”). We take privacy issues seriously and are committed to protecting your personal information. We want to say that up front because it sounds nice and is legally meaningless. Please take a moment to review our privacy policy, which explains how we collect, use, and safeguard information you enter at and any of our digital applications (such as our iPhone and iPad applications). This privacy policy applies only to information collected by MMS through and our digital applications. This privacy policy does not govern personal information furnished to MMS through any other means.

WHAT INFORMATION DO WE COLLECT?

Information You Provide to Us

We will request information from you if you establish a personal profile to gain access to certain content or services, if you ask to be notified by e-mail about online content, or if you participate in surveys we conduct. This requires the input of personal information and preferences that may include, but is not limited to, details such as your name, address (postal and e-mail), telephone number, or demographic information. You can't use secure communications to give us this information, so you should consider anything you tell us to be public information. If you request paid content from, including subscriptions, we will also ask for payment information such as credit card type and number. Our payment providers won't actually let us see your credit card number, because there are federal regulations and such.
Information That Is Automatically Collected

Log Files

We use log files to collect general data about the movement of visitors through our site and digital applications. This may include some or includes all of the following information: the Internet Protocol Address (IP Address) of your computer or other digital device, host name, domain name, browser version and platform, date and time of requests, and the files downloaded or viewed. We use this information to track what you read and to measure and analyze traffic and usage of and our digital applications. We build our site in such a way that this information is leaked to our advertisers, our widget providers, our analytics partners, the advertising partners of our widget providers, all the ISPs that connect us, and government entities such as the NSA, the Great Firewall of China, and the "Five Eyes" group.

Cookies

We use cookies to collect information and help personalize your user experience us make more money. We store minimal personally identifying information ten tracking identifiers in cookies and protect allow our partners to access this information. We do not store complete records or credit card numbers in cookies. We don't put chocolate chips in cookies either. Even if they're the other kind of cookies. Because we read about the health effects of fatty foods, in NEJM of course. You can find out more about how we use cookies at our Cookie Information page which is a separate page because it's more confusing that way. Most web browsers automatically accept cookies. Browsers can be configured to prevent this, but if you do not accept any cookies from, you will not be able to use the site.
The site will function if you block third party cookies.

In some cases we also work with receive services or get paid by third party vendors (such as Google, Google's DoubleClick Ad Network, Checkm8, Scorecard Research, Unica, AddThis, Crazy Egg, Flashtalking, Monetate, DoubleVerify, and SLI Systems) who help deliver advertisements on our behalf across the Internet, and vendors like Coremetrics, Chartbeat and Mii Solutions, who provide flashy dashboards for our managers. These vendors may use cookies to collect information about your activity at our site (i.e., the pages you have visited) in order to help deliver particular ads that they believe you would find most relevant. You can opt out of those vendors' use of cookies to tailor advertising to you by visiting Except for Checkm8, Scorecard Research, Unica, Crazy Egg, Monetate, Coremetrics, Chartbeat, Mii Solutions and SLI Systems. And even if you opt out of advertising customization, these companies still get all the information. We have no idea how long they retain the information or what they do with the information other than ad targeting and data dashboarding.

Clear Gifs (Web Beacons/Web Bugs)

We may also use clear gifs which are tiny graphics with unique identifiers that function similarly to cookies to help us to track site activity. We do not use these to collect personally identifying information, because that's impossible. We also do not use clear gifs to shovel snow, even though we've had a whole mess of it. Oh and by the way, some of our partners have used "flash cookies", which you can't delete. And maybe even "canvas fingerprints". But they pay us money or give us services, so we don't want to interfere.

Information that you provide to us will be used to process, fulfill, and deliver your requests for content and services. We may send you information about our products and services, unless you have indicated you do not wish to receive further information.

Information that is automatically collected is used to monitor usage patterns at and at our digital applications in order to help us improve our service offerings. We do not sell or rent your e-mail address to any third party. You may unsubscribe from our e-mail services at any time. Life is short. You may have a heart attack at any time, or get run over by a truck. For additional information on how to unsubscribe from our e-mail services, please refer to the How to Make Changes to Your Information section of this Privacy Policy.

We may report aggregate information about usage to third parties, including our service vendors and advertisers. These advertisers may include your competitors, so be careful. For additional information, please also see our Internet Advertising Policy. We may also disclose personal and demographic information about your use of and our digital applications to the countless companies and individuals we engage to perform functions on our behalf. Examples may include hosting our Web servers, analyzing data, and providing marketing assistance. These companies and individuals are obligated to maintain your personal information as confidential and may have access to your personal information only as necessary to perform their requested function on our behalf, which is usually to earn us more money, except as detailed in their respective privacy policies. So of course, these companies may sell the data collected in the course of your interaction with us.
Advertisers

We contract with third-party advertisers and their agents to post banner and other advertisement at our site and digital applications. These advertisements may link to Web sites not under our control. These third-party advertisers may use cookie technology or similar means i.e. Flash to measure the effectiveness of their ads or may otherwise collect personally identifying information from you when you leave our site or digital applications. We are not responsible or liable for any content, advertising, products or other materials offered from such advertisers and their agents. Transactions that occur between you and the third-party advertisers are strictly between you and the third party and are not our responsibility. You should review the privacy policy of any third-party advertiser and its agent, as their policies may differ from ours.

Advertisement Servers

In addition to advertising networks run by Google, which know everything about you already, we use a third-party ad server, CheckM8, to serve advertising at Using an advertising network diminishes our ability to control what advertising is shown on the NEJM website. Instead, auctions are held between advertisers that want to show you ads. Complicated algorithms decide which ads you are most likely to click on and generate the most revenue for us. We're thinking of outsourcing our peer-review process for our article content to similar sorts of software agents, as it will save us a whole lot of money. Anyway, if you see ads for miracle drugs on our site, it's because we really need these advertising dollars to continue our charitable work of publicizing top quality medical research, not because these drugs have been validated by top quality medical research. CheckM8 does not collect any personally identifiable information regarding consumers who view or interact with CheckM8 advertisements. CheckM8 solely collects non-personally identifiable ad delivery and reporting data. For further information, see CheckM8’s privacy policy. Please note that the opt-out website we mentioned above doesn't cover CheckM8, and there's not a good way to opt out of CheckM8, so there. The Massachusetts Medical Society takes in about $25 million per year in advertising revenue, so we really don't want you to opt out of our targeted advertising.

When you submit personal information via or our digital applications, your information is protected both online and offline with what we believe to be appropriate physical, electronic, and managerial procedures to safeguard and secure the information we collect. For information submitted via, we use the latest Secure Socket Layer (SSL) technology to encrypt your credit card and personal information. But other information is totally up for grabs.

Any data or personal information that you submit to us as user-generated content becomes public and may be used by MMS in connection with, our digital applications, and other MMS publications in any and all media. For more information, see our User-Generated Content Guidelines. We'll have the right to publish your name and location worldwide forever if you do so, and we can sue you if you try to use a pseudonym.

Do Not Track Signals

Like most web services, at this time we do not alter our behavior or change our services in response to do not track signals. In other words, our website tracks you, even if you use technical means to tell us you do not want us to track you.

Compliance with Legal Process

We may disclose personally identifying information if we are required to do so by law or we in good faith believe that such action is necessary to (1) comply with the law or legal process; (2) protect our rights and property; (3) protect against misuse or the unauthorized use of our Web site; or (4) protect the personal safety or property of our users or the public. So, for example, if you are involved in a divorce proceeding, we can help your spouse verify that you weren't staying late at your office reading up on the latest research like you said you were.

Children is not intended for children under 13 years of age. We do not knowingly collect or store any personal information from children under 13. If we did not have this disclaimer, our lawyer would not let us do things we want to do. If you are under 13, we're really impressed, you should spend more time outside getting fresh air.

Changes to This Policy

This privacy policy may be periodically updated. We will post a notice that this policy has been amended by revising the “Last updated” date at the top of this page. Use of constitutes consent to any policy then in effect. So basically, what we say here is totally meaningless with respect to your ability to rely on it. Oh well.

Mark E. Phillips: Metadata Edit Events: Part 5 – Identifying an average metadata editing time.

planet code4lib - Wed, 2015-04-01 16:39

This is the fifth post in a series of posts related to metadata edit events for the UNT Libraries’ Digital Collections from January 1, 2014 to December 31, 2014.  If you are interested in the previous posts in this series,  they talked about the when, what, who, and first steps of duration.

In this post we are going to try and come up with the “average” amount of time spent on metadata edits in the dataset.

The first thing I wanted to do was to figure out which of the values mentioned in the previous post about duration buckets I could ignore as noise in the dataset.

As a reminder, the duration for a metadata edit event starts when a user opens a metadata record in the edit system and ends when they submit the record back to the system as a publish event.  The duration is the difference in seconds between those two timestamps.
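To make that definition concrete, here is a minimal sketch of the calculation. The function name and the timestamps are made up for illustration; they are not taken from the edit system itself.

```python
from datetime import datetime


def edit_duration_seconds(opened_at: datetime, published_at: datetime) -> float:
    """Seconds between opening a record and publishing it (hypothetical helper)."""
    return (published_at - opened_at).total_seconds()


# Illustrative timestamps only, not real edit events.
opened = datetime(2014, 3, 5, 9, 15, 0)
published = datetime(2014, 3, 5, 9, 16, 37)
print(edit_duration_seconds(opened, published))  # 97.0
```

A record left open in a forgotten tab would simply produce a very large value here, which is why the outlier question matters.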

There are a number of factors that can cause the duration data to vary wildly.  A user can have a number of tabs open at the same time while only working on one of them.  They may open a record and then walk off without editing that record.  They could also be using a browser automation tool like Selenium that automates the metadata edits and therefore pushes the edit time down considerably.

In doing some tests of my own editing skills it isn’t unreasonable to have edits that are four or five seconds in duration if you are going in to change a known value from a simple dropdown. For example adding a language code to a photograph that you know should be “no-language” doesn’t take much time at all.

My gut feeling based on the data in the previous post was to say that edits that have a duration of over one hour should be considered outliers.  This would remove 844 events from the total 94,222 edit events leaving me 93,378 (99%) of the events.  This seemed like a logical first step but I was curious if there were other ways of approaching this.

I had a chat with the UNT Libraries’ Director of Research & Assessment Jesse Hamner and he suggested a few methods for me to look at.

IQR for calculating outliers

I took a stab at using the Interquartile Range of the dataset as the basis for identifying the outliers.  With a little bit of R I was able to find the following information about the duration dataset.

Min.   :      2.0
1st Qu.:     29.0
Median :     97.0
Mean   :    363.8
3rd Qu.:    300.0
Max.   : 431644.0

With that I have Q1 of 29 and a Q3 of 300,  this gives me an IQR of 271.

So the range for outliers is Q1–1.5 × IQR  for the low end and Q3+1.5 × IQR on the high end.

With these numbers, values under -377.5 or over 706.5 should be considered outliers.
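The arithmetic can be rechecked with a few lines of Python. This is only a sketch of the standard 1.5 × IQR rule using the quartiles reported above; the function name is my own, and the original numbers came from R, not from this code.

```python
from typing import Tuple


def iqr_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return the (low, high) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles taken from the summary of the duration dataset.
low, high = iqr_fences(29.0, 300.0)
print(low, high)  # -377.5 706.5
```

Since a duration can never be negative, only the upper fence does any work for this dataset.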

Note: I’m pretty sure there are some different ways of dealing with IQR and datasets that end at zero, so that’s something to investigate.

For me the key here is that I’ve come up with 706.5 seconds being the ceiling for a valid event duration based on this method.  That’s 11 minutes and 47 seconds.  If I limit the dataset to edit events that are under 707 seconds, I am left with 83,239 records.  That is now just 88% of the dataset, with 12% being considered outliers.  I thought this seemed to be too many records to ignore, so after talking with my resident expert in the library I had a new method.

Two Standard Deviations

I took a look at what the timings would look like if I based my outliers on the standard deviations.  Edit events that are under 1,300 seconds (21 min 40 sec) in duration amount to 89,547, which is 95% of the values in the dataset.  I also wanted to see what keeping 97.5% of the dataset would look like.  Edit durations under 2,100 seconds (35 minutes) result in 91,916 usable edit events for calculations, which is right at 97.6%.

Comparing the methods

The following table takes the four duration ceilings that I tried (IQR, 95% and 97.5%, and the gut-feeling one hour) and makes them a bit more readable. The total number of duration events in the dataset before limiting is 94,222.

Duration Ceiling (seconds)   Events Remaining   Events Removed   % Remaining
707                          83,239             10,983           88%
1,300                        89,547             4,675            95%
2,100                        91,916             2,306            97.6%
3,600                        93,378             844              99%
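The removed counts and percentages in the table follow directly from the remaining-event counts and the 94,222 total. Here is a small sketch that rechecks them; the function and variable names are hypothetical, and the counts are copied from the table.

```python
TOTAL_EVENTS = 94_222


def ceiling_summary(remaining: int, total: int = TOTAL_EVENTS):
    """Return (events removed, percent remaining rounded to one decimal)."""
    removed = total - remaining
    pct_remaining = round(100.0 * remaining / total, 1)
    return removed, pct_remaining


# (duration ceiling in seconds, events remaining under that ceiling)
rows = [(707, 83_239), (1_300, 89_547), (2_100, 91_916), (3_600, 93_378)]
for ceiling, remaining in rows:
    removed, pct = ceiling_summary(remaining)
    print(f"{ceiling:>5}  {remaining:>6}  {removed:>6}  {pct}%")
```

The table rounds most of the percentages to whole numbers, so the one-decimal values printed here will differ slightly in presentation.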

Just for kicks I calculated the average time spent on editing records across the datasets that remained for the various cutoffs to get an idea how the ceilings changed things.

Duration Ceiling   Events Included   Events Ignored   Mean     Stddev    Sum          Average Edit Duration   Total Edit Hours
707                83,239            10,983           140.03   160.31    11,656,340   2:20                    3,238
1,300              89,547            4,675            196.47   260.44    17,593,387   3:16                    4,887
2,100              91,916            2,306            233.54   345.48    21,466,240   3:54                    5,963
3,600              93,378            844              272.44   464.25    25,440,348   4:32                    7,067
431,644            94,222            0                363.76   2311.13   34,274,434   6:04                    9,521

In the table above you can see what the different duration ceilings do to the data analyzed.  I calculated the mean of the various datasets and their standard deviations (really, Solr's StatsComponent did that).  I converted those means into minutes and seconds in the “Average Edit Duration” column, and the final column is the number of person hours that were spent editing metadata in 2014 based on the various datasets.
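The two derived columns are simple unit conversions. A minimal sketch, with helper names of my own choosing, assuming the mean is rounded to the nearest second and the sum to the nearest hour:

```python
def as_minutes_seconds(seconds: float) -> str:
    """Format a duration in seconds as m:ss, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


def as_hours(total_seconds: float) -> int:
    """Convert a sum of seconds to whole person hours, rounded to nearest."""
    return round(total_seconds / 3600)


# Mean and sum for the 2,100-second ceiling, copied from the table.
print(as_minutes_seconds(233.54))  # 3:54
print(as_hours(21_466_240))        # 5963
```

Running the same helpers over the other rows reproduces the remaining values in those two columns.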

In going forward I will be using 2,100 seconds as my duration ceiling and ignoring the edit events that took longer than that period of time.  I’m going to do a little work in figuring out the costs associated with metadata creation in our collections for the last year.  So check back for the next post in this series.

As always feel free to contact me via Twitter if you have questions or comments.

CrossRef: CrossRef staff at upcoming conferences

planet code4lib - Wed, 2015-04-01 15:22

CrossRef International Workshop, April 29, Shanghai, China - Ed Pentz and Pippa Smart presenting.

CSE 2015 Annual Meeting, May 15-18, Philadelphia, PA - Rachael Lammey and Chuck Koscher presenting.

MLA '15 "Librarians Without Limits", May 15-20, Austin, TX. Exhibiting at booth number 234.

2015 SSP 37th Annual Meeting, May 27-29, Arlington, VA. Exhibiting at table 6.

CrossRef International Workshop, June 11, Vilnius, Lithuania - Ed Pentz and Pippa Smart presenting.

PKP Scholarly Publishing Conference 2015, August 11-14, Vancouver, BC - Karl Ward attending.

ISMTE 8th Annual North American Conference, August 20-21, Baltimore, MD - Rachael Lammey presenting.

ALPSP Conference, 9-11 September, London, UK - CrossRef staff attending.

John Miedema: Evernote Random. A daily email link to a random note. Keep your content alive.

planet code4lib - Wed, 2015-04-01 15:20

I write in bits and pieces. I expect most writers do. I think of things at the oddest moments. I surf the web and find a document that fits into a writing project. I have an email dialog and know it belongs with my essay. It is almost never a good time to write so I file everything. Evernote is an excellent tool for aggregating all of the bits in notebooks. I have every intention of getting back to them. Unfortunately, once the content is filed, it usually stays buried and forgotten.

I need a way to keep my content alive. The solution is a daily email, a link to a random Evernote note. I can read the note to keep it fresh in memory. I can edit the note, even just one change to keep it growing.

I looked around for a service but could not find one. I did find an IFTTT recipe for emailing a daily link to a random Wikipedia page. IFTTT sends the daily link to a Wikipedia page that automatically generates a random entry. In the end, I had to build an Evernote page to do a similar thing.

You can set up Evernote Random too, but you need a few things.

  • An Evernote account, obviously.
  • A web host that supports PHP.
  • A bit of technical skill. I have already written the Evernote script that generates the random link. But you have to walk through some technical Evernote setup steps, like generating keys and testing your script in their sandbox.
  • The Evernote Random script. It has all the instructions.
  • An IFTTT recipe. That’s the easy part.

Take the script. Improve it. Share it. Sell it. Whatever you like. I would enjoy hearing about it.

NYPL Labs: The Internet Loves Digital Collections (March 2015)

planet code4lib - Wed, 2015-04-01 14:49
West 52nd Street between 5th Avenue and 6th Avenue. Odd Numbers.

What was the most viewed image on NYPL's Digital Collections platform in March 2015?

It was a door.

Specifically, a door on the north side of 52nd Street between 5th Avenue and 6th Avenue. (Pictured at right; you can see what it looks like today at the bottom of this post.)

Why was that image the most viewed? Here's the story: The image comes from "The Roy Colmer New York City doors photograph collection," which includes 3,122 images related to a set of "photographic prints used in Colmer's conceptual art piece, Doors, NYC (1976)" (from the collection description).

A blog post from early 2014 commemorating Colmer and his work describes the project a bit more fully:

From November 1975 to September 1976, Colmer photographed more than 3,000 doors, inclusive and in sequence, on 120 intersections and streets of Manhattan from Wall Street to Fort Washington. The project, although documentary in nature, was essentially conceptual to Colmer, for whom Doors, NYC was as much an exploration of the serial possibilities of photography as of its ability to capture a place.

Meanwhile, for quite some time David Lowe, a specialist in our photography division, has been working with the division's metadata to create what he calls the Photo Geographies. Colmer's door project was among the first mapping projects of Lowe's geodata work (see map embedded below).

This project in turn attracted the attention of NPR's History Dept. (among others). And it was this NPR post that drove the most traffic to our Digital Collections site, and the photo above in particular.

That's the story for this month! Check back in a few weeks for more stories from our Digital Collections.

The view of 52nd st. today:

FOSS4Lib Recent Releases: Evergreen - 2.8.0

planet code4lib - Wed, 2015-04-01 14:18
Package: Evergreen
Release Date: Tuesday, March 31, 2015

Last updated April 1, 2015. Created by gmcharlt on April 1, 2015.

New features and enhancements of note in Evergreen 2.8.0 include:

Evergreen ILS: Evergreen web team extends protocol support

planet code4lib - Wed, 2015-04-01 11:27

In light of the upcoming adoption of HTTP/2, the Evergreen web team decided to survey the space of online information-sharing protocols.

Since a span of 26 years has evidently not proven sufficient to shake all the bugs out of HTTP, we’ve decided to hedge our bets and extend our Internet presence to include support for a protocol that’s been patiently waiting in the wings… just in case.

We are therefore pleased to announce the availability of


We are proud to join the ever-expanding Gophersphere, and hope members of the Evergreen community find this to be a useful, if light-hearted, historical resource.

To join in the fun, install your favorite Gopher client and start exploring! Here’s a little preview:


Hydra Project: IMLS funds collaborative development of “Hydra-in-a-Box”

planet code4lib - Wed, 2015-04-01 09:18

The IMLS, the US Institute of Museum and Library Services, has announced that it will be funding a $2M proposal to build a turnkey and cloud-ready Hydra solution over the next 2.5 years. DPLA, Stanford & DuraSpace submitted the joint proposal; alignment with the Hydra community, and distributed input on the design, specification and development, is structurally built into the grant.

The text of the announcement reads:

“The Digital Public Library of America (DPLA), Stanford University, and DuraSpace will foster a greatly expanded network of open-access, content-hosting “hubs” that will enable discovery and interoperability, as well as the reuse of digital resources by people from this country and around the world. At the core of this transformative network are advanced digital repositories that not only empower local institutions with new asset management capabilities, but also connect their data and collections. Currently, DPLA’s hubs, libraries, archives, and museums more broadly use aging, legacy software that was never intended or designed for use in an interconnected way, or for contemporary web needs. The three partners will engage in a major development of the community-driven open source Hydra project to provide these hubs with a new all-in-one solution, which will also allow countless other institutions to easily join the national digital platform.”

This work provides a wonderful chance to accelerate the convergence of the Hydra community on a robust, broadly useful, and common codebase. It also looks likely to rapidly expand the Hydra user base not only in the US but worldwide.  Our congratulations to all concerned!

William Denton: Data and Goliath

planet code4lib - Wed, 2015-04-01 03:16

I recommend Bruce Schneier’s new book Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World to everyone.

Schneier, as you probably know, is a security expert. A real one, a good one, and a thoughtful one. He wrote the book on implementing cryptography in software, he designed the playing card encryption method used in Neal Stephenson’s Cryptonomicon, and he helped reporters understand the Snowden documents.

This is his post-Snowden book, on everything that’s known about how we’re being monitored every second of our lives, by whom, why this is a very serious problem, and what we can do about it. His three section headings set it out clearly: The World We’re Creating, What’s at Stake, and What to Do About It. In each section he explains things clearly and understandably without requiring any major technical knowledge. Often there isn’t time to get into technical details, anyway: we are monitored so minutely online, and what the NSA and other spy agencies do is so staggeringly intrusive, that the briefest description of one technique or system is all that’s needed to get the point across before moving on to another.

Data and Goliath is the book I’ve been waiting for, the one that lays it all out and brings all of the recent discoveries and revelations together. It has much that is new, such as discussions of why privacy is necessary so that people have the freedom to break some laws that ultimately lead to societal change (homosexuality is one of his examples), and good arguments against the “but I have nothing to hide, I don’t care” idiocy of people who ignorantly give up all their privacy. (Glenn Greenwald’s No Place to Hide: Edward Snowden, the NSA and the U.S. Surveillance State has more good arguments, such as: you may not care, but millions of people around the world trying to make things better do, and they’re in danger of being arrested and beaten just for speaking their mind).

He ends with what we can do, from the small scale (paying cash, browser privacy extensions, leaving cell phones at home) to the large (major political action). Here’s the final paragraph of the penultimate chapter:

There is strength in numbers, and if the public outcry grows, governments and corporations will be forced to respond. We are trying to prevent an authoritarian government like the one portrayed in Orwell’s Nineteen Eighty-Four, and a corporate-ruled state like the ones portrayed in countless dystopian cyberpunk science fiction novels. We are nowhere near either of those endpoints, but the train is moving in both those directions, and we need to apply the brakes.

Schneier’s page about the book has lots of links to excerpts and reviews. Have a look, then get the book. You should read it.

Here in Canada, just this week we learn that the Communications Security Establishment continues to spy on people worldwide and see the Conservatives push Bill C-51. Schneier’s book helps here as everywhere else: what is happening, why it matters, and what to do.

Terry Reese: MarcEdit 101 Webinar Series

planet code4lib - Wed, 2015-04-01 01:17

The MarcEdit 101 Webinar Series was created over the course of multiple months for the CARLI consortium in Spring 2015.  In late March 2015, CARLI reached out to me and requested that these webinars be made available to the larger MarcEdit community, so if you find these webinars useful, please reach out and thank the folks at CARLI.

A couple of notes: these webinars are being made available as is, save for the following modifications:

  1. Attendee names have been anonymized.  While I’m certain most attendees would have no problem with their names showing up in these webinar lists, the original intended audience was locally scoped to CARLI and its members.  Masking attendees was done primarily because of this change of scope.
  2. The Q/A at the end of the sessions has generally been removed from the webinars.  Again, these are localized webinars and questions asked during the webinars tend to be within the scope of this consortium.

I’ll be making these videos available over the next couple of months.  Again, if you find these webinars useful, please make sure you let the folks at CARLI know.

Series URL:


Peter Murray: What Does it Mean to Have Unlimited Storage in the Cloud?

planet code4lib - Wed, 2015-04-01 00:33

We’ve seen big announcements recently about unlimited cloud storage offerings for a flat monthly or annual fee. Dropbox offers it for subscribers to its Business plan. Similarly, Google has unlimited storage for Google Apps for Business customers. In both cases, though, you have to be part of a business group of some sort. Then Microsoft announced unlimited storage for all Office 365 customers (Home, School, and soon Business) as a bundled offering of OneDrive with the Office suite of products. Now comes word today from Amazon of unlimited storage for consumers, with no need to be part of a business grouping or have bundled software come with it.

Today a colleague asked why all of this cloud storage couldn’t be used as file storage for the Islandora hosting service that is offered by LYRASIS. On the surface, it would seem to be a perfect backup strategy — particularly if you subscribed to multiple of these services and ran audits between them to make sure that they were truly in sync. Alas, the terms of service prevent you from doing something like that. Here is an excerpt from Amazon:

1.2 Using Your Files with the Service. You may use the Service only to store, retrieve, manage, and access Your Files for personal, non-commercial purposes using the features and functionality we make available. You may not use the Service to store, transfer or distribute content of or on behalf of third parties, to operate your own file storage application or service, to operate a photography business or other commercial service, or to resell any part of the Service. You are solely responsible for Your Files and for complying with all applicable copyright and other laws, including import and export control laws and regulations, and with the terms of any licenses or agreements to which you are bound. You must ensure that Your Files are free from any malware, viruses, Trojan horses, spyware, worms, or other malicious or harmful code.

- Amazon Cloud Drive Terms of Use, Last updated March 25, 2015

It did get me wondering, though. Decades ago the technology community created RAID storage: Redundant Array of Inexpensive Disks. The concept is that if you copy your data across many different disks, you can survive the failure of one of those disks and rebuild the information from the remaining drives. We also have virtual storage systems like iRODS and distributed file systems like Google File System and Apache Hadoop Distributed File System. I wonder what it would take to layer these concepts together to have a cloud-independent, cloud-redundant storage array for personal backups. Sort of like a poor-man’s RAID over Dropbox/Amazon/Microsoft/Google. Something that would take care of the file verifications, the rebuilding from redundant copies, and the caching of content between services. Even if we couldn’t use it for our library services, it would be a darn good way to ensure the survivability of our cloud-stored files against the failure of a storage provider’s business model.
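As a thought experiment only, the file-verification piece of such a layer might look something like the sketch below. It treats each provider as holding a full mirror of a file (closer to RAID 1 mirroring than to parity-based RAID), compares SHA-256 digests, and flags the copies that disagree with the majority. The provider names and in-memory byte strings are stand-ins; no real Dropbox, Amazon, Microsoft, or Google API is used or implied.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Return the most common digest and the providers whose copy differs.

    Assumes at least one copy exists and that a clear majority agrees;
    a tie would need a tie-breaking policy that is not sketched here.
    """
    digests = {provider: sha256_hex(blob) for provider, blob in copies.items()}
    majority_digest, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(p for p, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


def repair(copies: Dict[str, bytes]) -> Dict[str, bytes]:
    """Rebuild every copy from one that matches the majority digest."""
    majority_digest, _damaged = audit_copies(copies)
    good = next(b for b in copies.values() if sha256_hex(b) == majority_digest)
    return {provider: good for provider in copies}


# Hypothetical provider names with in-memory stand-ins for stored files.
stored = {"provider_a": b"thesis draft", "provider_b": b"thesis draft",
          "provider_c": b"thesis dr@ft"}
print(audit_copies(stored)[1])  # ['provider_c']
```

A real version would also have to deal with upload and download limits, partial failures, and the terms of service quoted above, none of which this sketch touches.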


Evergreen ILS: Evergreen 2.8.0 released

planet code4lib - Wed, 2015-04-01 00:12

The Evergreen community is pleased to announce the release of version 2.8.0 of the Evergreen open source integrated library system. Please visit the download page to get it!

New features and enhancements of note in Evergreen 2.8.0 include:

  • Acquisitions improvements to help prevent the creation of duplicate orders and duplicate purchase order names.
  • In the select list and PO view interfaces, beside the line item ID, the number of catalog copies already owned is now displayed.
  • A new Apache access handler that allows resources on an Evergreen web server, or which are proxied via an Evergreen web server, to be authenticated using a user’s Evergreen credentials.
  • Copy locations can now be marked as deleted. This allows information about disused copy locations to be retained for reporting purposes without cluttering up location selection drop-downs.
  • Support for matching authority records during MARC import. Matches can be made against MARC tag/subfield entries and against a record’s normalized heading and thesaurus.
  • Patron message center: a new mechanism via which messages can be sent to patrons for them to read while logged into the public catalog.
  • A new option to stop billing activity on zero-balance billed transactions, which will help reduce the incidence of patron accounts with negative balances.
  • New options to void lost item and long overdue billings if a loan is marked as claims returned.
  • The staff interface for placing holds now offers the ability to place additional holds on the same title.
  • The active date of a copy record is now displayed more clearly.
  • A number of enhancements have been made to the public catalog to better support discoverability by web search engines.
  • There is now a direct link to “My Lists” from the “My Account” area in the upper-right part of the public catalog.
  • There is a new option for TPAC to show more details by default.

For more information about what’s in the release, check out the release notes.

As release manager, I would like to thank the many people and institutions who contributed to this release in various ways, including testing, writing documentation, writing code, helping project teams and committees to run smoothly, and providing financial support.


LITA: Desk Set – A Critical Response

planet code4lib - Tue, 2015-03-31 23:42
A Provocative Documentary

At the LITA Blog, we know you look to us as a source for what’s going on in technology and librarianship. When we discovered Desk Set, a recent documentary that takes the viewer through the process of one library’s struggle to integrate a new technology, we knew you would want to know our responses. Never fear: the LITA bloggers are here with the kind of hard-hitting commentary you’ve come to expect from us.

A Startling Lack of Diversity

Not a Bunny Watson Lauren H.

Office romance, machines, corporate mergers, and job security: what tiresome topics for a documentary. Desk Set should represent the good work librarians do every day, but instead the writers and directors choose to represent a view of librarianship that no longer exists in the modern world. Librarians are smart, intelligent workers who deserve respect, and they deserve a documentary that shows them conducting intellectual work.

Furthermore, why are the librarians only women? Men should have equal representation. Those working on the film might have thought they were helping raise the view of women by having a single working woman run the library, but instead they succumbed to stereotypes. There is certainly a troubling lack of interaction in this workplace.

Misused Space and Resources

Shame on the Federal Broadcasting Network Lindsay C.

In addition to the many other troubling questions raised in this odd documentary, Desk Set, as a librarian, I find the work site conditions and management particularly unsettling. What sort of workplace implements technology like the EMERAC without an advance audit and training? I can only suggest that FBN stakeholders be engaged in the process of reassessing such reckless deployments and untested software patches. Perhaps a staff member could be sent to an assessment training session (I would suggest the obviously underappreciated Peg Costello) so that an appropriate implementation plan could be developed.

The erroneous pink slip incident is particularly telling. Had library and other staff been properly trained in the automation process, panic and morale issues could have been completely avoided.

Beyond these gentle suggestions though, I must insist that Richard Sumner review his own product design, as a self-destruct button seems like a dangerous liability for any computer.

Mike Cutler uses his influence as boss to control Bunny et al

Mike Cutler, a Harassment Suit in Waiting John K.

If only the problems in the office were limited to EMERAC. The man, and of course it’s a man, who oversees the reference department at FBN has Bunny Watson wrapped around his finger. Mike Cutler is reprehensible as a boss. He rarely shows himself in the reference department and when he does it’s only to press advances on Ms. Watson or give her his work to finish. Ms. Watson is very clear that she does not want Mr. Cutler touching her in the office and yet he pays no heed to her wishes, only serving to fulfill his base desires.

Not only does Mr. Cutler harass Ms. Watson at work, but the documentary shows that he’s stalking her. Mr. Cutler shows up at Ms. Watson’s house, unannounced, barges through the front door, and makes advances on her, even going so far as to demand food and drink. Thank goodness Mr. Sumner was there—for an evening of intellectual discussion it seems—to keep things from getting out of hand. As if Ms. Watson didn’t have enough to worry about!

If he hadn’t gotten transferred to FBN’s West Coast offices, I would have expected Ms. Watson to file a harassment suit against her boss. Only in today’s office environment could a cad like Mr. Cutler get a promotion after the unprofessional way in which he acted.


Are you looking for an adorable, classic, and entirely charming (and entirely fictional) film with librarians and supercomputers? Desk Set is the perfect option for you this April Fools’ Day. We hope you’ll post your own responses below, and know that you can always count on us to know when to take a topic seriously.

John Miedema: Categories? Tags? Pffft. Words are the true pit of chaos. Or not.

planet code4lib - Tue, 2015-03-31 23:31

Categories? Very eighteenth century. Tags? So Web 2.0. Pretty cryptic stuff. What will Lila do differently? Let’s take another step.

Tags are messier than categories; I called tags evil. But tags are easier to manage than the next level down, the words themselves. Tags are messy when left to humans, but tags can be managed with automation. Many services auto-suggest tags, controlling the vocabulary. Lila will generate its own tags, refreshing them on demand. Tags can be managed.

Words are the true pit of chaos. People conform to the rules of language when they write, or they don’t. People make up words on the fly. Down the rabbit hole. But is it so bad? It happens time and again that we think an information problem is too complex to be automated, only to analyze it and discover that we can do a good chunk of what we hoped following a relatively simple set of rules. One mature technology is keyword search. Keyword search is so effective we take it for granted. Words can be managed with the right technologies.

Another mature technology is Natural Language Processing (NLP). Its history dates back to the 1950s. The field is enjoying a resurgence of interest in the context of cognitive computing. Consider that a person can learn basic capability in a second language with only a couple thousand words and some syntax for combining them. Words and syntax. Data and rules. Build dictionaries with words and their variant forms. Assign parts-of-speech. Use pattern recognition to pick out words occurring together. Run through many examples to develop context sensitivity. Shakespeare it is not, but human meaning can be extracted from unstructured text in this way for many useful purposes.
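As a rough illustration of two of those steps, a dictionary of variant forms and a count of words occurring together, here is a small sketch. It is not Lila's code: the `VARIANTS` table is a tiny hand-made example and the function names are hypothetical. A real system would use a full lexicon and a proper NLP toolkit.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-dictionary mapping variant forms to a base word.
VARIANTS = {
    "libraries": "library",
    "books": "book",
    "catalogs": "catalog",
    "catalogued": "catalog",
}


def normalize(text):
    """Lowercase the text, split it into words, and fold variants."""
    words = re.findall(r"[a-z']+", text.lower())
    return [VARIANTS.get(w, w) for w in words]


def cooccurrences(passages):
    """Count how many passages each pair of distinct words shares."""
    pairs = Counter()
    for passage in passages:
        terms = sorted(set(normalize(passage)))
        pairs.update(combinations(terms, 2))
    return pairs


slips = ["Libraries lend books", "The library catalogs books"]
pairs = cooccurrences(slips)
```

Here `pairs[("book", "library")]` counts both slips, because "Libraries" and "library" fold to the same base word. Pairs that recur across many passages are one simple signal of a common subject.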

Lila’s purpose is to make connections between passages of text (“slips”) and to suggest hierarchical views, e.g., a table of contents. I’ve talked a lot about how Lila can compute connections. Keywords and NLP can be used effectively to find common subjects across passages. Hierarchy is something different. How can the words in a passage say something about how it should be ordered relative to other external passages? We can go no deeper than the words. It’s all we have to work with. To compute hierarchy, Lila needs something different, something special. Stay tuned.

DPLA: DPLAfest 2015: Dive into the DPLA Hub network

planet code4lib - Tue, 2015-03-31 18:30

The DPLAfest schedule is packed with interesting sessions: everything from ebooks to project management to digitization and education has a space in the lineup. A set of those programs is related to the DPLA Hubs and the work that they do. In addition to showcasing the incredible work being done by institutions that are part of the DPLA Hub network, it’s a great way for attendees from aspiring Hubs to find out more about the application process. Here are some Hub Highlights from this year’s fest:

Best Practices for Establishing a DPLA Service Hub in Your State/Region: Gear up for the 2015 Content Hub or Service Hub open calls by learning more about what it takes to be a DPLA network Hub. Session speakers include DPLA’s Director for Content Emily Gore, Assistant Director for Content Amy Rudersdorf, and Data Services Coordinator Gretchen Gueguen.

Best Practices for Digitization Training: Want to know more about designing a digitization training program curriculum? Join this session led by two members of the DPLA Hub Network and participants in the Public Library Partnerships Project.

Newspapers and the DPLA: Extra, extra! Learn all about it in this session exploring the potential to integrate newspaper content into the DPLA. Speakers include Head of the Digital Scholarship Center, University of Oregon Libraries, Karen Estlund, and DPLA’s Emily Gore. We’d love to hear your ideas and expertise as we explore this new opportunity.

DPLA Hubs Showcase: The Hubs Showcase will combine a lot of learning with a little bit of fun as we give nine speakers five minutes each to talk about the innovative and unique work they’re doing at their institutions across the US. Topics will range from geospatial metadata best practices in the Mountain West to Fedora/Blacklight systems aggregating South Carolina content, image sharing via the International Image Interoperability Framework (IIIF), access to digitized newspapers in North Carolina, and much more. Set your timer for an information-packed hour.


SearchHub: Speak at Revolution 2015: Call for Papers open through May 8

planet code4lib - Tue, 2015-03-31 17:08

As 2015 whizzes by (seriously, March where did you go?), the team at Lucidworks is busy preparing for another exciting Lucene/Solr Revolution, the largest conference dedicated to open source Lucene/Solr, taking place this year in Austin, TX October 13-16. The Call for Papers is now open and we are looking forward to seeing this year’s speaking proposals.

The speakers at Lucene/Solr Revolution really are the heart of the conference and provide attendees with an opportunity to hear about so many different ways Lucene and Solr can be used, implemented, and optimized. Last year, we had speakers from companies like Apple, Bloomberg, LinkedIn, Twitter, Airbnb, and more share how they are leveraging Lucene and Solr to better their business. Amazon Web Services, Sematext, Cloudera, Basis Technology, and others shed light on best practices for data visualization, scaling, log management, multilingual search, and more.

This year, we want to hear from you! Whether you are a developer, business leader, Solr committer, or someone who just loves search and open source, and have something interesting to share, submit your proposal today. Don’t miss out on the chance to teach, innovate, motivate, and share your experiences with the Solr community and engage with the best and brightest across the industry. Speaking submissions will be accepted through May 8.

Registration for the conference is also now open! Early bird registration is in effect through May 31. Register today to save up to $500 on Conference-Only and Training + Conference packages. For all things Lucene/Solr Revolution 2015, follow us on Twitter @lucenesolrrev and join us on Facebook.

The post Speak at Revolution 2015: Call for Papers open through May 8 appeared first on Lucidworks.

District Dispatch: Net neutrality battle continues in new venues

planet code4lib - Tue, 2015-03-31 17:00

Tom Wheeler. Photo by Adweek.

In the month since the Federal Communications Commission (FCC) voted to approve an Order to Protect and Promote the Open Internet, the American Library Association (ALA) and its allies have been wading into the details published by the Commission March 12 to better understand the implications for libraries and higher education. It’s also been a busy time with Congressional hearings and the first lawsuits filed, so an update is in order.

As coalition partner EDUCAUSE notes in a recent blog, the new Order references our coalition’s ideas and proposals nearly 20 times, indicating the impact and value of our engagement in this far-reaching proceeding.

First, and most importantly, the new “bright line” rules adopted to protect against internet service providers (ISPs) blocking, degrading or prioritizing legal internet traffic align with the key concerns of the library and higher education coalition, as we wrote earlier. These rules apply to both mobile and fixed broadband, which our coalition has consistently advocated for since 2009. The Commission also strengthened transparency requirements to include the duty to disclose prices and fees, as well as network performance and practices.

The Commission also laid out a standard for future conduct to address concerns that may arise with new technologies and practices. The Order establishes that ISPs cannot “unreasonably interfere with or unreasonably disadvantage” the ability of consumers to select the online content and services they want and the ability of content providers to reach these consumers. Here again, the FCC cites the higher education and library comments proposing an “internet reasonable” standard that would protect the unique and open character of the Internet.

Our coalition also sought to ensure libraries and educational institutions are again explicitly included in network neutrality protections and to differentiate between public broadband internet access and private networks. The Commission explicitly affirmed both of these points.

Finally, ALA raised concerns about how forbearance from some Title II regulations might impact Universal Service Fund (USF) programs like the E-rate program. We are pleased the Commission recognized that “Even prior to the classification of broadband Internet access service adopted here, the Commission already supported broadband services to schools, libraries, and health care providers and supported broadband-capable networks in high-cost areas.” As a result, the FCC determined that broadband Internet access services would be subject to Section 254 USF protections, but not immediately required to provide USF contributions. It is expected that the contribution question will be addressed in a separate FCC proceeding. The Order also preempts any state from imposing any new state USF contributions on broadband at this time.

Congress has been busy on the network neutrality front, as well. Over the last two weeks Congress has held a series of hearings—including directly questioning FCC Chairman Tom Wheeler and fellow FCC Commissioners—that concluded in the House Judiciary Committee last week. While these hearings were contentious—and, at times, contentious among the commissioners themselves—it’s fair to say no major shift in policy by the commissioners was detected nor were minds changed in Congress.

ALA joined (pdf) 137 other groups and companies in a letter thanking Chairman Wheeler, Commissioner Mignon Clyburn and Commissioner Jessica Rosenworcel for their leadership in protecting the Open Internet. It was read into the record at the March 17 House Oversight hearing by Congresswoman Eleanor Holmes Norton (D-DC).

Senate Commerce Committee Chairman John Thune (R-SD) and House Energy and Commerce Committee Chairman Fred Upton (R-MI) introduced legislation earlier this year that would codify some network neutrality protections while also undermining FCC authority to regulate broadband internet access. Together these two elements leave a gulf between Democrats and Republicans in finding a legislative solution that would truly protect network neutrality and potentially forestall years of litigation. A bipartisan (and non-binding) budget amendment expressing support for network neutrality rules did manage to clear the Senate last Friday, perhaps signaling room for compromise. Republicans are apparently targeting Senator Bill Nelson (D-FL), who has indicated openness to discussions but is unwilling to accept language that would usurp FCC authority and oversight powers, for a compromise deal.

As of March 31st, the FCC Order had not yet been published in the Federal Register, which begins the true “shot clock” for Congressional and legal action on the Order. Trade association US Telecom (which includes AT&T, Verizon, Frontier and CenturyLink among its members) and Alamo Broadband couldn’t wait that long—filing suit in two different appeals courts on March 23. Using language all of us are likely to hear a lot more about, US Telecom argues (pdf) the FCC decision is “arbitrary, capricious, and an abuse of discretion.” FCC lawyers have argued the suits were filed prematurely and should be rejected.

Expect more drama but likely little immediate resolution. Carrying our message to Capitol Hill about why network neutrality is so vital for libraries and all of our users will be essential. I hope many of you will join with us on National Library Legislative Day to amplify our voice and impact.

The post Net neutrality battle continues in new venues appeared first on District Dispatch.

District Dispatch: The Washington Pest: National Library Legislative Day news

planet code4lib - Tue, 2015-03-31 16:56

SCOOP!  ALA Washington Office Encourages Inter-Chapter Fight

Washington, DC – the ALA Washington Office has confirmed that it’s actively pitting state chapter against state chapter in the run-up to National Library Legislative Day 2015 (NLLD). In the latest in a record-shattering string of 4,268 reminders about the event, held on May 4 – 5 in the nation’s capital, the Office of Government Relations publicly admitted today that they are actively encouraging competition between the state chapters. “It’s true. We want every state library chapter in the nation to fight tooth and nail . . . to bring more people to NLLD per capita than anybody else, that is, and we plan to reward the winner handsomely,” says OGR Director Adam Eisgrau.

Not-so-secret memoranda leaked to The Pest confirm that all members of the largest state delegation registered for and attending NLLD (as a percentage of the state’s total population) will receive a guided, behind-the-scenes tour of the incredible United States Capitol building.  “We need heroic turnout from every state this year,” added Eisgrau.  “It’s time to take the registration gloves off!”

Register for NLLD today and be sure to book now for a special Capitol Hill hotel room rate; the offer ends this week! The deadline to register online for NLLD is April 24th.

For more information or assistance of any kind, please contact Lisa Lindle, ALA Washington’s Grassroots Communications Specialist, at or 202-403-8222.


The post The Washington Pest: National Library Legislative Day news appeared first on District Dispatch.

HangingTogether: Round of Eight: Peaches and Pumpkins

planet code4lib - Tue, 2015-03-31 16:47

OCLC Research Collective Collections Tournament


In December 1891, Dr. James Naismith rigged up an elevated peach basket in a Springfield, Massachusetts gymnasium – and the game of basketball was born. Now, nearly 125 years later, we celebrate Dr. Naismith’s innovation with the third round of the OCLC Research Collective Collections Tournament. We are down to four conferences … did yours make the cut?


Competition in the Round of 8 goes to the very roots of basketball: which conference collective collection has the most materials published in the birthplace of basketball, Springfield, Massachusetts?* The Atlantic 10 came through as the big winner, with more than 2,300 publications originating from Springfield. Conference USA was close behind with more than 1,900 publications, winning handily over our erstwhile tournament Cinderella, Big South. After bracket-busting victories over the mighty America East and Big Ten conferences in earlier rounds, Big South could not get past Conference USA in the Round of 8. So the clock has struck midnight, and Big South’s fairytale tournament run has turned back into a pumpkin. Summit League and Missouri Valley join the Atlantic 10 and Conference USA in the next round.

Springfield, Massachusetts boasts much more than the honor of being the birthplace of basketball. Theodor Geisel, better known as Dr. Seuss, was born there (visit the Dr. Seuss Collection at the University of California, San Diego’s Geisel Library). Merriam-Webster, Inc., publisher of the eponymous dictionary, is headquartered in Springfield (visit the Warren N. and Suzanne B. Cordell Collection of Dictionaries at Indiana State University’s Cunningham Memorial Library, to which Merriam-Webster donated 500 volumes). And the Milton Bradley Company, maker of such classic games as Chutes & Ladders, Twister, and Candyland, was founded in Springfield in 1860 (visit the George M. Fox collection of rare children’s books at the San Francisco Public Library, which began as the archives of a publishing company acquired by Milton Bradley in 1920).

Needless to say, the works of Dr. Seuss, Merriam-Webster, and yes, even Milton Bradley, are part of the conference collective collections competing in this tournament!

Bracket competition participants: Remember, if the conference you chose is now watching the tournament from the sidelines, there is still a ray of hope! If no one picked the tournament Champion, all entrants will be part of a random drawing for the big prize!

The tournament semi-finals are next! Results will be posted April 3.


*Number of publications in conference collective collections that were published in Springfield, Massachusetts. Data is current as of January 2015.


More information:

Introducing the 2015 OCLC Research Collective Collections Tournament! Madness!

OCLC Research Collective Collections Tournament: Round of 32 Bracket Revealed!

Round of 32: Blow-outs, buzzer-beaters, and upsets!

Round of 16: The plot thickens … and so do the books

About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.


