
Open Library Data Additions: Amazon Crawl: part 13

planet code4lib - Tue, 2015-06-02 06:35

Part 13 of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

District Dispatch: ALA draws line in sand on USA FREEDOM amendments

planet code4lib - Tue, 2015-06-02 02:37


The United States Senate adjourned today with the stage set for votes Tuesday afternoon on at least three “hostile” amendments to the USA FREEDOM Act filed by Senate Majority Leader Mitch McConnell (R-KY).  As explained in a letter by Washington Office Executive Director Emily Sheketoff that will be delivered to all Senators ahead of Tuesday’s votes, passage of any one such amendment would water down the USA FREEDOM Act so seriously as to cause ALA to reverse course and oppose the bill.

Now is the time for one last push by librarians everywhere to again call and email their Senators to deliver a simple message: 1) VOTE “NO” on any and every amendment that would weaken the USA FREEDOM Act; and 2) PASS the bill now without change so that the President can sign it without delay.

Please, visit ALA’s Legislative Action Center to send that urgent message now.

For detailed information on the pending amendments and why they’re utterly unacceptable, please see this analysis by our coalition compatriots at the Center for Democracy and Technology.  The ALA Washington Office’s “line in the sand” letter is available here: USAF Letter 060115.

The post ALA draws line in sand on USA FREEDOM amendments appeared first on District Dispatch.

LibUX: 020: Localizing the User Experience with Robert Laws

planet code4lib - Mon, 2015-06-01 23:59

Robert Laws is the Digital Services Librarian for Georgetown University’s School of Foreign Service in Qatar. In this episode of LibUX, Robert discusses customizing Drupal and LibGuides to present a more localized version of those sites for his campus. He gives tips on how he got started and how to stay relevant in the world of web services. Since Robert is our first international guest, Amanda asked him about the challenges of regional restrictions on content.

You can listen to LibUX on Stitcher, find us on iTunes, or subscribe to the straight feed. Consider signing up for our weekly newsletter, the Web for Libraries.

The post 020: Localizing the User Experience with Robert Laws appeared first on LibUX.

District Dispatch: Update on 1201 proceedings

planet code4lib - Mon, 2015-06-01 22:06

In the last two weeks, the Copyright Office held ten hearings in Los Angeles and Washington, D.C., hearing the arguments for and against circumvention of digital locks—Section 1201 of the Digital Millennium Copyright Act—on the proposed classes of works, including cell phones, video games, e-readers, and, oh yes, farm equipment. Many have said that these hearings are unbearably long, but in a weird way, I like to attend them (and ALA Council). Unfortunately, I was out of town and missed the hearings. So read along with me: reports on the hearings from Brandon Butler of the Washington College of Law at American University and from Rebecca Tushnet of Georgetown Law.

The post Update on 1201 proceedings appeared first on District Dispatch.

HangingTogether: What’s changed in linked data implementations?

planet code4lib - Mon, 2015-06-01 20:47

Last year we received 96 responses to the OCLC Research “International Linked Data Survey for Implementers” reporting 172 linked data projects or services in 15 countries, of which 76 were described. Of the 76 projects described, 27 (36%) were not yet implemented and 13 (17%) had been in production for less than a year.

So we were curious – what might have changed in the last year? OCLC Research decided to repeat its survey to learn details of specific projects or services that format metadata as linked data and/or make subsequent uses of it.  We’re curious to see whether the projects that had not yet been implemented have now been, whether any of last year’s respondents would have any different answers, and whether we could encourage linked data implementers who didn’t respond to last year’s survey to respond to this year’s.

The questions are the same so we can more easily compare results. (Some multiple-choice questions have additional options drawn from the “other” responses to last year’s survey, and some open-ended questions are now multiple-choice, again based on last year’s responses.) The target audiences are staff who have implemented or are implementing linked data projects or services, either by publishing data as linked data, by ingesting linked data resources into their own data or applications, or both.

The survey is available at

We are asking that responses be completed by 17 July 2015. As with last year’s survey, we will share the examples collected for the benefit of others wanting to undertake similar efforts, wondering what is possible to do and how to go about it. We summarized last year’s results in a series of blog posts here: 1) Who’s doing it; 2) Examples in production; 3) Why and what institutions are consuming; 4) Why and what institutions are publishing; 5) Technical details; 6) Advice from the implementers.

What do you think has changed in the last year?



About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.

Mail | Web | Twitter | More Posts (59)

Patrick Hochstenbach: Triennale Brugge 2015

planet code4lib - Mon, 2015-06-01 18:23
Filed under: Doodles Tagged: brugge, triennale, urban, urbansketching

District Dispatch: Experts to demystify 3D printing policies at 2015 ALA Conference

planet code4lib - Mon, 2015-06-01 18:22

As more and more libraries nationwide begin to offer 3D printing services, library leaders are now confronting a litany of copyright, trademark and patent complications that arise from the new technology. To help the library community address 3D printing concerns, the American Library Association (ALA) Committee on Legislation’s (COL) Copyright Subcommittee will explore 3D printing policy issues at the 2015 ALA Annual Conference in San Francisco.

Join Tomas A. Lipinski, dean of the University of Wisconsin-Milwaukee’s School of Information Studies and COL Copyright Subcommittee member, St. Louis’ University City Public Library Director Patrick Wall, and other policy experts at the session “Copyright and 3D Printing: Be Informed, Be Fearless, Be Smart!” for a “plain English” discussion of 3D printing, its copyright implications, and the patent and trademark issues that this breakthrough technology raises for libraries everywhere. The session will take place from 10:30 to 11:30 a.m. on Saturday, June 27, 2015, at the Moscone Convention Center in room 2001 of the West Building.

Lipinski has worked in a variety of legal settings including the private, public and non-profit sectors. He currently teaches, researches and speaks frequently on various topics within the areas of information law and policy, especially copyright, free speech and privacy issues in schools and libraries. Patrick Wall has been the director of St. Louis’ University City Public Library since March of 2011 and was its assistant director for the previous eight years. He also serves as President of the Municipal Library Consortium of St. Louis County, a group of nine libraries providing collective public access to more than 700,000 volumes.

  • Tomas A. Lipinski, dean of the University of Wisconsin-Milwaukee’s School of Information Studies, member of the American Library Association Committee on Legislation
  • Patrick Wall, director, University City Public Library (St. Louis)

View all ALA Washington Office conference sessions

The post Experts to demystify 3D printing policies at 2015 ALA Conference appeared first on District Dispatch.

ACRL TechConnect: Where do Library Staff Learn About Programming? Some Preliminary Survey Results

planet code4lib - Mon, 2015-06-01 14:05

[Editor’s Note:  This post is part of a series of posts related to ACRL TechConnect’s 2015 survey on Programming Languages, Frameworks, and Web Content Management Systems in Libraries.  The survey was distributed between January and March 2015 and received 265 responses.  A longer journal article with additional analysis is also forthcoming.  For a quick summary of the article below, check out this infographic.]

Our survey on programming languages in libraries has resulted in a mountain of fascinating data.  One of the goals of our survey was to better understand how staff in libraries learn about programming and develop their coding skills.  Based upon anecdotal evidence, we hypothesized that library staff members are often self-taught, learning through a combination of on-the-job learning and online tutorials.  Our findings indicate that respondents use a wide variety of sources to learn about programming, including MOOCs, online tutorials, Google searches, and colleagues.

Are programming skills gained by formal coursework, or in Library Science Master’s Programs?

We were interested in identifying sources of programming learning, whether formal coursework (as part of a degree or continuing-education program) or Massive Open Online Courses (MOOCs).  Nearly two-thirds of respondents indicated they had an MLS or were working on one:

When asked about coursework taken in programming, application, or software development, results were mixed, with the most popular choice being 1-2 classes:

However, of those respondents who have taken a course in programming (about 80% of all respondents) AND indicated that they either had an MLS or were attending an MLS program, only about a third had taken any of those courses as part of a Master’s in Library Science program:

Resources for learning about programming

The final question of the survey asked respondents, in an open-ended way, to describe resources they use to learn about programming.  It was a pretty complex question:

Please list or describe any learning resources, discussion boards or forums, or other methods you use to learn about or develop your skills in programming, application development, or scripting. Please include links to online resources if available. Examples of resources include, but are not limited to: MOOC courses, local community/college/university courses on programming, books, the Code4Lib listserv, Stack Overflow, etc.

Respondents gave, in many cases, incredibly detailed responses – and most respondents indicated a list of resources used.  After coding the responses into 10 categories, some trends emerged.  The most popular resources for learning about programming, by far, were courses (whether those courses were taken formally in a classroom environment, or online in a MOOC environment):

To better illustrate what each category entails, here are the top five resources in each category:

By far, the most commonly cited learning resource was Stack Overflow, followed by the Code4Lib Listserv and books/ebooks (unspecified).  Results may skew a little toward these resources because they were mentioned as examples in the question, priming respondents to include them in their responses.  Since links to the survey were distributed, among other places, on the Code4Lib listserv, its prominence may also be influenced by response bias. One area that was a little surprising was the number of respondents who included social networks (including in-person networks like co-workers) as resources – indeed, respondents who mentioned colleagues as learning resources were particularly enthusiastic, as one put it:

…co-workers are always very important learning resources, perhaps the most important!

Preliminary Analysis

While the data isn’t conclusive enough to draw any strong conclusions yet, a few thoughts come to mind:

  • About 3/4 of respondents indicated that programming was either part of their job description, or that they use programming or scripting as part of their work, even if it’s not expressly part of their job.  And yet, only about a third of respondents with an MLS (or in the process of getting one) took a programming class as part of their MLS program.  Programming is increasingly an essential skill for library work, and this survey seems to support the view that there should be more programming courses in library school curricula.
  • Obviously programming work is not monolithic – there’s lots of variation among those who do programming work that isn’t reflected in our survey, and this survey may have unintentionally excluded those who are hobby coders.  Most questions focused on programming used when performing work-related tasks, so additional research would be needed to identify learning strategies of enthusiast programmers who don’t have the opportunity to program as part of their job.
  • Respondents indicated that learning on the job is an important aspect of their work; they may not have time or institutional support for formal training or courses, and figure things out as they go along using forums like Stack Overflow and Code4Lib’s listserv.  As one respondent put it:

Codecademy got me started. Stack Overflow saves me hours of time and effort, on a regular basis, as it helps me with answers to specific, time-of-need questions, helping me do problem-based learning.

TL;DR?  Here’s an infographic:

In the next post, I’ll discuss some of the findings related to ways administration and supervisors support (or don’t support) programming work in libraries.

LITA: Negotiate!

planet code4lib - Mon, 2015-06-01 13:00

I’m going to say it: Librarians are rarely effective negotiators. Way too often we pay full prices for mediocre resources without demur. Why?

Credit: Flickr user changeorder

First of all, most librarians are introverts and/or peaceable sorts who dislike confrontation. Second, we are unlikely to get bonuses or promotions when we save our organizations money, so there goes most of the extrinsic motivation for driving a hard bargain with vendors. Third and most importantly, we go into the library business because libraries aren’t a business. Most of us deliver government-funded public services, so we have zero profit motive, and our non-business mentality is almost a professional value in itself. But this failure to negotiate weakens our value to the communities we serve.

Libraries pay providers over a billion dollars a year for digital services and resources, only to get overpriced subscriptions and comparatively shoddy products. When did you last meet a librarian who loved their ILS? Meanwhile, we lose whatever dignity remains to us when our national associations curry favor with “Library Champions” like Elsevier, soliciting these profiteers to give back a minuscule fraction of their profits squeezed from libraries. We forget that vendors exist because of us.

Recently I sat in a dealer’s office for ninety minutes, refusing to budge till I got a better deal on my new car. The initial offer was 7% APR. The final offer was 0.9% APR with new all-season floor mats thrown in. The experience awoke me to the realization that I, as the customer, always held the leverage in any business relationship. I was thrilled.

I applied that realization to my work managing electronic resources, renegotiating contracts, haggling reduced rates, and saving about 10% of my annual budget my first year while delivering equivalent levels of service. This money could then be shifted to fund other e-resources and services, or saved so as to forestall forced budget cuts and make the library look good to external administrators keen to cut costs.

The key to negotiation is not to fold at the first “no.” Initial price quotes and contracts are a starting point for negotiation, by no means the final offer. Trim unneeded services to obtain a price reduction. Renegotiate, don’t renew, contracts. Ask to renew existing subscriptions at the previous year’s price, dodging the 5% annual increase that most providers slap on products. And take nothing at face value! I once saved $4000 on a single bill because I phoned to ask for a definitive list of our product subscriptions only to discover that the provider had neglected to document one very active subscription. Sooo… we didn’t have to pay for it.

Don’t hesitate to call out bad service either. A company president once personally phoned me because I had rather vociferously objected to his firm’s abysmal customer service. Bear in mind, though, that most vendor reps are delightful people who care about libraries too. So when you’re negotiating, be firm and persistent but please don’t be a jerk.

Long-term solutions to vendor overpricing and second-rate products include consortiums, open access publishing, and open source software. But the simplest and quickest short-term solution for us individuals is to negotiate to get your money’s worth. Vendors want to keep your business, so to get a better deal, sometimes all you have to do is ask.

Michael Rodriguez is the E-Learning Librarian at Hodges University in Florida. He manages the library’s digital services and resources, including 130-plus databases, the library website, and the ILS. He also teaches and tutors students in the School of Liberal Studies and the School of Technology, runs social media for LITA, and does freelance professional training and consulting. He tweets @topshelver and blogs at Shelver’s Cove.

Cherry Hill Company: Recap from the DrupalCon Drupal 4 Libraries BoF

planet code4lib - Sun, 2015-05-31 05:08

We had a great time at the Drupal 4 Libraries Birds of a Feather gathering at DrupalCon Los Angeles. So many people participated that we had to find extra chairs in order to accommodate everyone. We had representation from a diverse range of interests. Many attendees were from academic libraries, but we also had individuals join from public libraries, and we had a few vendors who work with libraries in the room, as well. Following is a recap of the tools people mentioned during the discussion.

Digital Asset Management Systems

Digital asset management systems took up a good portion of the conversation. This discussion started with a question about good photo archiving solutions. It was suggested that, as long as the archive is not large, Drupal on its own — with content types and views — may be the best solution for the photo archives needed. It was also suggested, however, that Drupal is not necessarily scalable for large archives, or for storing original assets....

Read more »

Open Library Data Additions: Amazon Crawl: part bf

planet code4lib - Sat, 2015-05-30 22:33

Part bf of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

State Library of Denmark: Heuristically correct top-X facets

planet code4lib - Sat, 2015-05-30 15:11

For most searches in our Net Archive, we have acceptable response times, due to the use of sparse faceting with Solr. Unfortunately, as well as expectedly, some of the searches are slow: response times measured in minutes, if we’re talking worst case. It is tied to the number of hits: getting the top-25 most popular links from pages about hedgehogs will take a few hundred milliseconds, while getting the top-25 links from all pages from 2010 takes minutes. Visualised, the response times look like this:

Massive speed drop-off for higher result sets

Everything beyond 1M hits is slow, everything beyond 10M hits is coffee time. Okay for batch analysis, but we’re aiming for interactive use.

Get the probably correct top-X terms by sampling

Getting the top-X terms for a given facet can be achieved by sampling: instead of processing all hits in the result set, some of them are skipped. The result set iterator conveniently provides an efficient advance method, making this very easy. As we will only use sampling with larger result sets, there should be enough data to be quite sure that the top-25 terms are the correct ones, although their counts will be somewhat off.

This of course all depends on how high X is in top-X, the particular corpus, etc. The biggest danger is clusters of content in the corpus, which might be skipped entirely. Maybe the skipping could be done in small steps? Process 100 documents, skip 500, process the next 100… Tests will have to be made.
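The process/skip sampling idea can be sketched in plain Python. This is a toy illustration of the technique, not the actual Solr/sparse-faceting code; the function and parameter names are invented:

```python
from collections import Counter

def sample_top_terms(doc_ids, facet_terms, top_x=25, process=100, skip=500):
    """Estimate the top-X facet terms by processing short runs of hits and
    skipping ahead between runs, instead of visiting every hit."""
    counts = Counter()
    i = 0
    while i < len(doc_ids):
        for doc in doc_ids[i:i + process]:   # process a small run of hits...
            counts.update(facet_terms.get(doc, ()))
        i += process + skip                  # ...then advance past a block
    return [term for term, _ in counts.most_common(top_x)]
```

With a large, reasonably mixed result set, the ranking of the sampled counts should match the true ranking even though the counts themselves are too low by roughly the sampling factor.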

Fine count the top-X terms

With the correct terms isolated, precisely those terms can be fine counted. This is nearly the same as vanilla distributed faceting, with the exception that all shards must fine count all the top-X terms, instead of only the terms they had not already processed earlier.

Of course the fine counting could be skipped altogether, which would be faster and potentially very usable for interactive exploratory use, where the exact counts do not really matter.
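The fine-count phase can be sketched the same way (again an invented illustration; in the real distributed setup each shard would count only the candidate terms over its own hits, and the counts would then be merged):

```python
from collections import Counter

def fine_count(doc_ids, facet_terms, candidates):
    """Second pass: exact counts over all hits, but only for the candidate
    terms found by sampling; every other term is ignored."""
    wanted = set(candidates)
    counts = Counter()
    for doc in doc_ids:
        counts.update(t for t in facet_terms.get(doc, ()) if t in wanted)
    return dict(counts)
```

The second pass is cheap because the set of terms to count is tiny (top-X) compared to the full facet vocabulary.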

But there’s no guarantee?

No. Do remember that vanilla Solr distributed faceting is also a best-effort, with the same guarantee as above: The terms are not guaranteed to be the correct ones, but their counts are.

Seems simple enough

Ticket #38 for sparse faceting has been opened and we could really use this in the Danish Net Archive Search. No promises though.

Note 2015-05-30

Knut Anton Bøckman mentioned on Twitter that Primo has a faceting mechanism that looks similar to my proposal. It seems that Primo uses the top-200 hits to select the facets (or rather terms?), then do a fine-count on those.

It might work well to base the term selection on the top hits, rather than sampling randomly through all the hits, but I am afraid that 200 is so small a sample that some of the terms will differ from the right ones. I understand the need for a small number though: Getting the top-million hits or just top-hundred-thousand is costly.

David Rosenthal: The Panopticon Is Good For You

planet code4lib - Sat, 2015-05-30 15:00
As Stanford staff I get a feel-good email every morning full of stuff about the wonderful things Stanford is doing. Last Thursday's linked to this article from the medical school about Stanford's annual Big Data in Biomedicine conference. It is full of gee-whiz speculation about how the human condition can be improved if massive amounts of data are collected about every human on the planet and shared freely among medical researchers. Below the fold, I give a taste of the speculation and, in my usual way, ask what could possibly go wrong?

All the following quotes are from the article:
In his keynote address, Lloyd Minor, MD, dean of the School of Medicine, defined a term, “precision health,” as “the next generation of precision medicine.” Precision health, he said, is the application of precision medicine to prevent or forestall disease before it occurs. “Whereas precision medicine is inherently reactive, precision health is prospective,” he said. “Precision medicine focuses on diagnosing and treating people who are sick, while precision health focuses on keeping people healthy.”

The fuel that powers precision health, Minor said, is big data: the merging of genomics and other ways of measuring what’s going on inside people at the molecular level, as well as the environmental, nutritional and lifestyle factors they’re exposed to, as captured by both electronic medical records and mobile-health devices.

This isn't just what would normally be thought of as medical data:
Precision health requires looking beyond medical data to behavioral data, several speakers said. This is especially true in a modern society where it is behavior, not infectious disease, that’s increasingly the cause of disability and mortality, noted Laura Carstensen, PhD, professor of psychology and founding director of the Stanford Center on Longevity.

But not to worry, we can now collect all sorts of useful data from people's smartphones:
That’s where mobile devices for monitoring everyday behavior can be useful in ways electronic health records can’t. Several speakers touched on the potential for using mobile-health devices to survey behavior and chronic disease and, perhaps, provide insights that could be used to support better behavior.
By monitoring 24/7 which room of one’s home one is in at any given minute over a 100-day period, you can detect key changes in behavior — changes in sleep-wake rhythms, for instance — that can indicate or even predict the onset of a health problem.

An expert in analyzing conversations, [Intel fellow Eric] Dishman recounted how he’d learned, for example, that “understanding the opening patterns of a phone conversation can tell you a lot,” including giving clues that a person is entering the initial stages of Alzheimer’s disease. Alternatively, “the structure of laughter in a couple’s conversation can predict marital trouble months before it emerges.”

If only we could get rid of these pesky privacy requirements:
“Medical facilities won’t share DNA information, because they feel compelled to protect patients’ privacy. There are legitimate security and privacy issues. But sharing this information is vital. We’ll never cure rare DNA diseases until we can compare data on large numbers of people. And at the level of DNA, every disease is a rare disease: Every disease from A to Z potentially has a genomic component that can be addressed if we share our genomes.”

The potential benefits of having this data widely shared across the medical profession are speculative, but plausible. It is not speculative at all, however, to state that the data will also be shared with governments, police, insurance companies, lawyers, advertisers and, most of all, criminals. Anyone who has been paying the slightest attention to the news over the last few years cannot possibly believe that these vast amounts of extremely valuable data, widely shared among researchers, will never leak or be subpoenaed. Only if you believe "it's only metadata, there's nothing to worry about" can you believe that the data, the whole point of which is that it is highly specific to an individual, can be effectively anonymized. Saying "There are legitimate security and privacy issues. But ..." is simply a way of ignoring those issues, because actually addressing them would reveal that the downsides vastly outweigh the upsides.

Once again, we have an entire conference of techno-optimists, none of whom can be bothered to ask themselves "what could possibly go wrong?". In fact, in this case what they ought to be asking themselves is "what's the worst that could happen?", because the way they're going the worst is what is going to happen.

These ideas are potentially beneficial, and in a world where data could be perfectly anonymized and kept perfectly secure for long periods of time despite being widely shared, they should certainly be pursued. But this is not that world, and to behave as if it is violates the precept "First, do no harm" which, while not strictly part of the Hippocratic Oath, is, I believe, part of the canon of medical ethics.

Terry Reese: Enhancements to the MarcEdit Replace Function — making complex conditional edits easy

planet code4lib - Sat, 2015-05-30 06:09

MarcEdit provides lots of different ways for users to edit their data.  However, one use case that comes up often is the ability to perform an action on a field or fields based on the presence of data within another field.  While you can currently do this in MarcEdit by isolating the specific records to edit and then working on just those items, more could be done to make this process easier.  So, to that end, I’ve updated the Replace function to include a new conditional element that allows MarcEdit to pre-filter records using an in-string or regular expression query before evaluating data for replacement.  Here’s how it will work…

When you first open the Replace Window:

Notice that the conditional string text has been replaced.  The old wording was confusing to folks because it didn’t reflect exactly what was being done.  Rather, this is an option that allows a user to run an in-string or regular expression search across your entire record before the Find/Replace is run.  The search options grouped below affect *only* the Find/Replace textboxes.  They do not affect the options that are enabled when Perform Find/Replace If… is checked; those data fields have their own toggles for in-string (has) or regular expression (regex) matching.


If you check the box, the following information will be displayed:

Again – the If [Textbox] [REGEX] is a search that is performed and must evaluate as true in order for the paired find and replace to run.  Use cases for this function are things like:

  • I want to modify the field x but only if foobar is found in field y.


There are other ways to do this, such as extracting data into separate files for processing or writing a script, but this will give users a great deal more flexibility when they want to perform operations only if specific data is found within a field.


A simple example would be below:

This is a simplified example of how this function works.  A user wants to change the 050 field to an 090 field, but only if the data in the 945$a falls in the range m-z.  That’s what the new option allows.  By checking the Perform Find/Replace If option, I can provide a pre-search that filters the set of records on which the primary Find/Replace pair will actually run.  Make sense?  I hope so.
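The filter-then-edit flow can be sketched in plain Python. This is an illustration of the general idea only, not MarcEdit's implementation; the record representation and all names are invented:

```python
import re

def conditional_replace(records, cond_field, cond_pattern,
                        find_field, replace_field):
    """Rename find_field to replace_field, but only in records where
    cond_field matches cond_pattern -- the pre-search acts as a filter."""
    cond = re.compile(cond_pattern)
    for record in records:  # each record: dict of field tag -> list of values
        if any(cond.search(v) for v in record.get(cond_field, [])):
            if find_field in record:
                record[replace_field] = record.pop(find_field)
    return records
```

For the 050-to-090 example above, the condition pattern would be something like `^[m-z]` applied to the 945$a values, so only matching records have their 050 renamed.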

Finally – I’ve updated the code around the task wizard so that this information can be utilized within tasks.  This enhancement will be in the next available update.


pinboard: Regex Crossword

planet code4lib - Fri, 2015-05-29 19:12
One note on this one: since it's a crossword, you have to treat all regexen as though they are anchored at front and back. E.g., if the game says A?B?, you need to treat that as /^A?B?$/.
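In Python terms, the difference between the unanchored and anchored reading looks like this:

```python
import re

pattern = "A?B?"

# An unanchored search matches anywhere, so even "XYZ" "matches"
# via the empty string:
print(bool(re.search(pattern, "XYZ")))     # True

# fullmatch pins the pattern to both ends, as the crossword requires:
print(bool(re.fullmatch(pattern, "AB")))   # True
print(bool(re.fullmatch(pattern, "XYZ")))  # False
```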

pinboard: lbjay/apache-elk-in-five-minutes · GitHub

planet code4lib - Fri, 2015-05-29 17:20
Minutes away from ensnaring yet another audience w/ the siren song of Elasticsearch/Logstash/Kibana. #code4lib

LITA: Volunteer to join the LITA AV Club

planet code4lib - Fri, 2015-05-29 16:38

You know you always wanted to be part of the cool gang; well, now is your big chance. Be a part of creating the LITA AV club. Help make videos of important LITA conference presentations like the Top Tech Trends panel and LITA Forum Keynotes. Create the recordings to share these exciting and informative presentations with your LITA colleagues who weren’t able to attend. Earn the undying gratitude of all LITA members.

Sound right up your alley? We’ll need a couple of chief wranglers plus a bunch of hands on folks. The group can organize via email now, and meet up in San Francisco say sometime Friday or early Saturday, June 26th and 27th. Details galore to be worked on by all. If you have enough fun you can always turn the Club into a LITA Interest Group and achieve immortality, fame and fortune, or more likely the admiration of your fellow LITAns.

To get started email Mark Beatty at:
I’ll gather names and contacts and create a “space” for you all to play.

Thanks. We can tell you are even cooler now than you were before you read this post.

Karen Coyle: International Cataloguing Principles, 2015

planet code4lib - Fri, 2015-05-29 14:05
IFLA is revising the International Cataloguing Principles and asked for input. Although I doubt that it will have an effect, I did write up my comments and send them in. Here's my view of the principles, including their history.

The original ICP dates from 1961 and read like a very condensed set of cataloging rules. [Note: As T Berger points out, this document was entitled "Paris Principles", not ICP.] It was limited to choice and form of entries (personal and corporate authors, titles). It also stated clearly that it applied to alphabetically sequenced catalogs:
The principles here stated apply only to the choice and form of headings and entry words -- i.e. to the principal elements determining the order of entries -- in catalogues of printed books in which entries under authors' names and, where these are inappropriate or insufficient, under the titles of works are combined in one alphabetical sequence.

The basic statement of principles was not particularly different from those stated by Charles Ammi Cutter in 1875.

ICP 1961

Note that the ICP does not include subject access, which was included in Cutter's objectives for the catalog. Somewhere between 1875 and 1961, cataloging became descriptive cataloging only. Cutter's rules did include a fair amount of detail about subject cataloging (in 13 pages, as compared to 23 pages on authors).

The next version of the principles was issued in 2009. This version is intended to be "applicable to online catalogs and beyond." This is a post-FRBR set of principles, and the objectives of the catalog are given in points with headings find, identify, select, obtain and navigate. Of course, the first four are the FRBR user tasks. The fifth one, navigate, as I recall was suggested by Elaine Svenonius and obviously was looked on favorably even though it hasn't been added to the FRBR document, as far as I know.

The statement of functions of the catalog in this 2009 draft is rather long, but the "find" function gives an idea of how the goals of the catalog have changed:

ICP 2009  
It's worth pointing out a couple of key changes. The first is the statement "as the result of a search..." The 1961 principles were designed for an alphabetically arranged catalog; this set of principles recognizes that there are searches and search results in online catalogs, and it never mentions alphabetical arrangement. The second is that there is specific reference to relationships, and that these are expected to be searchable along with attributes of the resource. The third is that there is something called "secondary limiting of a search result." This latter appears to reflect the use of facets in search interfaces.

The differences between the 2015 draft of the ICP and this 2009 version are relatively minor. The big jump in thinking takes place between the 1961 version and the 2009 version. My comments (pdf) to the committee are as much about the 2009 version as the 2015 one. I make three points:
    1.  The catalog is a technology, and cataloging is therefore in a close relation to that technology
    Although the ICP talks about "find," etc., it doesn't relate those actions to the form of the "authorized access points." There is no recognition that searching today is primarily on keyword, not on left-anchored strings.

    2. Some catalog functions are provided by the catalog but not by cataloging
    The 2015 ICP includes among its principles that of accessibility of the catalog for all users. Accessibility, however, is primarily a function of the catalog technology, not the content of the catalog data. It also recommends (to my great pleasure) that the catalog data be made available for open access. This is another principle that is not content-based. Equally important is the idea, which is expressed in the 2015 principles under "navigate" as: "... beyond the catalogue, to other catalogues and in non-library contexts." This is clearly a function of the catalog, with the support of the catalog data, but what data serves this function is not mentioned.

    3. Authority control must be extended to all elements that have recognized value for retrieval
    This mainly refers to the inclusion of the elements that serve as limiting facets on retrieved sets. None of the elements listed here are included in the ICP's instructions on "authorized access points," yet these are, indeed, access points. Uncontrolled forms of dates, places, content, carrier, etc., are simply not usable as limits. Yet nowhere in the document is the form of these access points addressed.

    There is undoubtedly much more that could be said about the principles, but this is what seemed to me to be appropriate to the request for comment on this draft.

      Peter Murray: Setting the Right Environment: Remote Staff, Service Provider Participants, and Big-Tent Open Source Communities

      planet code4lib - Fri, 2015-05-29 13:45

      I was asked recently to prepare a 15 minute presentation on lessons learned working with a remote team hosting open source applications. The text of that presentation is below with links added to more information. Photographs are from DPLA and Flickr, and are used under Public Domain or Creative Commons derivatives-okay licenses. Photographs link to their sources.

      Thank you for the opportunity to talk with you today. This is a description of a long-running project at LYRASIS to host open source software on behalf of our members and others in the cultural heritage community. The genesis of this project is member research done at the formation of LYRASIS from SOLINET, PALINET and NELINET. Our membership told us that they wanted the advantages of open source software but did not have the resources within their organization to host it themselves. Our goals were — and still are — to create sustainable technical infrastructure for open source hosting, to provide top-notch support for clients adopting that hosted open source, and to be a conduit through which clients engage in the open source community.

      In the past couple of years, this work has focused on three software packages: the Islandora digital asset system, the ArchivesSpace system for archival finding aids, and most recently the CollectionSpace system for museum objects. Each of these, in sequence, involved new learning and new skills. First was Islandora. For those who are not familiar with Islandora, it is a digital asset system built atop Drupal and the Fedora Commons repository system. It is a powerful stack with a lot of moving parts, and that makes it difficult for organizations to set up. One needs experience in PHP and Drupal, Java servlet engines, Solr, and Fedora Commons, among other components. In our internal team those skills were distributed among several staff members, and we are spread out all over the country: I’m in central Ohio, there is a developer in California, a data specialist in Baltimore, a sysadmin in Buffalo, two support and training staff in Atlanta, and servers in the cloud all over North America.

      Importance of Internal Communication

      That first goal I mentioned earlier — creating a sustainable technical architecture — took a lot of work and experimentation for us. All of us had worked in library IT in earlier jobs. Except for our sysadmin, though, none of us had built a hosted service to scale. It was a fast-moving time, with lots of small successes and failures, swapping of infrastructure components, and on-the-fly procedures. It was hard to keep up. We took a page from the Scrum practice and instituted a daily standup meeting. The meeting started at 10:30am eastern time, which got our west coast person up just a little early, and — since we were spread out all over the country — used a group Skype video conference.

      The morning standups usually took longer than the typical 15 minutes. In addition to everyone’s reports, we shared information about activities with the broader LYRASIS organization as well as informal things about our personal lives — what our kids were doing, our vacation plans, or laughing about the latest internet meme. It was the sort of sharing that would happen naturally when meeting someone at the building entrance or popping a head over a cubicle wall, and that helped cement our social bonds. We kept the Skype window open throughout the day and used the text chat function to post status updates, ask questions of each other, and share links to funny cat pictures. Our use of this internal communication channel has evolved over the years. We no longer have the synchronous video call every morning for our standup; we post our morning reports as chat messages. If we were to hire a new team member, I would make a suggestion to the team that we restart the video calls at least for a brief period to acclimate the new person to the group. We’ve also moved from Skype to Slack — a better tool for capturing, organizing, searching, and integrating our activity chatter. What started out as a suggestion by one of our team members to switch to Slack for the seven of us has grown organically to include about a third of the LYRASIS staff.

      In their book “Remote: Office Not Required” the founders of 37Signals describe the “virtual water cooler”. They say that the idea is to have a single, permanent chat room where everyone hangs out all day to shoot the breeze, post funny pictures, and generally goof around. They acknowledge that it can also be used to answer questions about work, but its primary function is to provide social cohesion. With a distributed team, initiating communication with someone is an intentional act. It doesn’t happen serendipitously by meeting at a physical water cooler. The solution is to lower the barrier of initiating that communication while still respecting the boundaries people need to get work done.

      How does your core team communicate among its members? How aware are they of what each other is doing? Do they know each other’s strengths and feel comfortable enough to call on each other for help? Do the members share a sense of forward accomplishment with the project as a whole?

      Clear Demarcation of Responsibilities between Hosting Company and Organizational Home

      One of the unique things about the open source hosting activity at LYRASIS is that for two of the projects we are also closely paired with organizational homes. Both ArchivesSpace and CollectionSpace have separate staff within LYRASIS that report to their own community boards and have their own financial structure. LYRASIS provides human resource and fiscal administrative services to the organizational homes, and we share resources and expertise among the groups. From the perspective of a client of our services, though, it can seem like the hosting group and the organizational home are one entity. We run into confusion about why we in the hosting group cannot add new features or address bugs in the software. We gently remind our clients that the open source software is bigger than our hosting of it — that there is an organizational home that is advancing the software for all users and not just our hosted clients.

      Roles between open source organizations and hosting companies should be clearly defined as well, and the open source organization must help hosting providers make this distinction clear to the provider’s clients as well as self-hosting institutions. For instance, registered service provider agreements could include details for how questions about software functionality are handed off between the hosting provider and the organizational home. I would also include a statement from the registered service provider about the default expectations for when code and documentation will be contributed back to the community’s effort. This would be done in such a way as to give a service provider an avenue to distinguish itself from others while also strengthening the core community values of the project. While there is significant overlap, there are members of ArchivesSpace that are not hosted by LYRASIS and there are clients hosted by LYRASIS that are not members of ArchivesSpace.

      How does your project divide responsibilities between the community and the commercial affiliates? What are the expectations that hosted adopters should have about the roles of support, functionality enhancement, and project governance?

      Empowering the Community

      Lastly, one of the clear benefits of developing software as open source is the shared goals of the community participants. Whether someone is a project developer, a self-hosted user of the software, a service provider, or a client hosted by a service provider, everyone wants to see the software thrive and grow. While the LYRASIS hosting service does provide a way for clients to use the functionality of open source software, what we are really aiming to offer is a path for clients to get engaged in the project’s community by removing the technology barriers to hosting. We are selling a service, but sometimes I think the service that we are selling is not necessarily the one that the client is initially looking for. What clients come to us seeking is a way to make use of the functions that they see in the open source software. What we want them to know is how adopting open source software is different. As early as the first inquiry about hosting, we let clients know that the organizational home exists and offer to make an introduction to the project’s community organizer. When a project announces important information, we reflect that information on to our client mailing list. When a client files an issue in the LYRASIS hosting ticket system for an enhancement request, we forward that request to the project but we also urge the client to send the description of their use case through the community channels.

      Maintaining good client support while also gently directing the client into the community’s established channels is a tough balancing act. Some clients get it right away, and become active participants in the community. Others are unable or unwilling to take that leap to participation in the project’s greater community. As a hosting provider we’ve learned to be flexible and supportive wherever the client is on its journey in adopting open source. Open source communities need to be looking for ways a hosted client’s staff — no matter what the level of technical expertise — can participate in the work of the community.

      Do you have low barriers of entry for comments, corrections, and enhancements to the documentation? Is there a pipeline in place for triaging issue reports and comments that both help the initiator and funnel good information into the project’s teams? And is that triaging work valued on par with code contributions? Can you develop a mentoring program that aids new adopters into the project’s mainstream activities?


      As you can probably tell, I’m a big believer in the open source method of developing the systems and services that our patrons and staff need. I have worked on the adopter side of open source — using DSpace and Fedora Commons and other library-oriented software…to say nothing of more mainstream open source projects. I have worked on the service provider side of open source — making Islandora, ArchivesSpace and CollectionSpace available to organizations that cannot host the software themselves and empowering them to join the community. Through this experience I’ve learned a great deal about how software projects think, what adopters look for in projects, and what other service providers need to be successful community participants. Balancing the needs of the project, the needs of self-hosted adopters, and the needs of service providers is delicate — but the results are worth it.

      I also believe that by using new technologies and strategies, distributed professionals can build a hosting service that is attractive to clients. We may not be able to stand around a water cooler or conference table, but we can replicate the essence of those environments with tools, policies, and a collaborative attitude. In doing so we have more freedom to hire the staff that are the right fit for our organization, no matter where they are located.

