
Feed aggregator

District Dispatch: My First Book Expo America

planet code4lib - Tue, 2015-06-02 19:34


I went for work, of course. Our assignment was to meet with small to mid-sized trade publishers to talk about ebooks and business models, make contacts, share expectations, and identify obstacles. We came away with a lot of good information and ideas, which are discussed in Carol Anthony’s e-content blog post.

But on a personal note, it was strange talking about ebooks with so many print books around! Sure, there are a lot of books in libraries, in bookstores, or at ALA conference exhibits, but nothing like this extravaganza. Imagine well-crafted displays everywhere, piles of books, ribbons and banners, huge house-sized posters of David Baldacci, publisher swag, and people — lots of people.

Who cares about ebooks! Long live print!

It was hard to concentrate on ebooks at times, because the exhibit hall was loud, and did I mention—there were all of these books around! Pretty, colorful books; darling children’s books; gardening books; a bunch of recipe books that I wanted; and of course, cute photos of puppies and kittens on some of the book covers. Don’t get me wrong. I love books, but … I could not wait to get out of there.

On the last day, I arrived early to meet with a representative from Sourcebooks. We planned to meet at the Sourcebooks exhibit. Already, there were lines of people outside of the exhibit hall. I then learned that registrants have a limited opportunity to grab free books each morning when the exhibits open. There were several guards positioned, anticipating a frenzy. Everyone was pretty excited, and ready to run.

When the exhibits did open, the mass rush ensued. One contingent of women dashed past me, their lieutenant yelling “Go to HarperCollins!” I tried to get out of the way, but not before I was bumped firmly from behind on my right side. Thanks to the ridiculously heavy bag over my shoulder, the blow was magnified to such an extent that I twirled a full 360 degrees but remained standing … and still in everybody’s way. I could hear breathing down my neck. “Go! Go! Move out of the way!” (Some people even cursed.)

Luckily, Sourcebooks was only halfway up the exhibit aisle, so I was relatively unscathed, but some were not so lucky. Later in the day, I did see one woman down … and she was carrying a cane.

Oh the humanity!

Apparently, Book Con is even more intense, making Book Expo America look like a walk in the park. Print is alive, books are beautiful, and free books — they can be dangerous.

The post My First Book Expo America appeared first on District Dispatch.

David Rosenthal: Brittle systems

planet code4lib - Tue, 2015-06-02 15:00
In my recent rant on the Internet of Things, I linked to Mike O'Dell's excellent post to Dave Farber's IP list, Internet of Obnoxious Things, and suggested you read it. I'm repeating that advice as, below the fold, I start from a different part of Mike's post.

Mike writes:
The problem with pursuing such a goal is that it has led us down a path of "brittle failure" where things work right up until they fail, and then they fail catastrophically. The outcome is forced to be binary.

In most of Computer Science, there have been only relatively modest efforts directed at building systems which fail gracefully, or partially. Certainly some sub-specialties have spent a lot of effort on this notion, but it is not the norm in the education of a journeyman system builder.

If it is the case that we are unlikely to build any large system which is fail-proof, and that certainly seems to be the situation, we need to focus on building systems which can tolerate, isolate, and survive local failures.

My response also made the IP list:
Mike is absolutely right to point out the brittle nature of most current systems. But education isn't going to fix this. My co-authors and I won Best Paper at SOSP2003 for showing a system in a restricted application space that, under attack, failed slowly and made "alarming noises". The analogy is with suspension bridges - they use stranded cables for just this reason.

However, the cost differential between stranded and solid cables in a bridge is small. Brittle fault-tolerant systems such as Byzantine Fault Tolerance are a lot more expensive than a non-fault-tolerant system that (most of the time) does the same job. Systems such as the one we showed are a lot more expensive than BFT. This is because three essential aspects of a (and, I believe, any) solution are rate limits, excess replication and randomization.

The problem is that vendors of systems are allowed to disclaim liability for their products. Given that even the most egregious failure is unlikely to cause more than reputational harm, why would a vendor even implement BFT, let alone something much more expensive?

Just finding techniques that allow systems to fail gracefully is not going to be enough (not that it is happening). We need techniques that do so with insignificant added cost. That is a truly hard problem. But we also need to change the law so that vendors cannot escape financial liability for the failures of their products. That is an even harder problem.

I should explain the comment about the importance of "rate limits, excess replication and randomization":
  • Rate Limits: The design goal of almost all systems is to do what the user wants as fast as possible. This means that when the bad guy wrests control of the system from the user, the system will do what the bad guy wants as fast as possible. Doing what the bad guy wants as fast as possible pretty much defines brittleness in a system; failures will be complete and abrupt. In last year's talk at UC Berkeley's Swarm Lab I pointed out that rate limits were essential to LOCKSS, and linked to Paul Vixie's article Rate-Limiting State making the case for rate limits on DNS, NTP and other Internet services. Imposing rate limits on system components makes the overall system more expensive.
  • Excess Replication: The standard fault-tolerance technique, Byzantine Fault Tolerance (BFT), is brittle. As faults in the system increase, it works perfectly until they pass a threshold. After that the system is completely broken. The reason is that BFT defines the minimum number of replicas that can survive a given number of faults. In order to achieve this minimum, every replica is involved in every operation of the system. There is no cushion of excess, unnecessary replicas to help the system retain some functionality above the threshold at which it stops behaving perfectly. The LOCKSS system was not concerned with minimizing the number of replicas. It assumed that it had excess replicas, Lots Of Copies, so it could Keep Stuff Safe by failing gradually as faults increased. Adding replicas to the system makes it more expensive.
  • Randomization: In general, the more predictable the behavior of the system the easier it is to attack. Randomizing the system's behavior makes it unpredictable. A significant part of the LOCKSS system's defenses is that since the selection of replicas to take part in each operation is random, the bad guy cannot predict which they are. Adding randomization to the system makes it more expensive (and harder to debug and test).
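The trade-off in the last two bullets can be made concrete with a toy model. Nothing below is LOCKSS code; the function names and numbers are illustrative. The sketch assumes the classic n ≥ 3f + 1 Byzantine bound, and models an over-provisioned random poll as a hypergeometric draw:

```python
from math import comb

def bft_survives(n, f):
    # Classic BFT bound: n replicas tolerate at most floor((n - 1) / 3)
    # Byzantine faults. One fault past the bound and all guarantees vanish,
    # which is exactly the abrupt, binary failure described above.
    return f <= (n - 1) // 3

def poll_success(n_good, n_bad, sample):
    # Probability that a random sample of replicas drawn from an
    # over-provisioned pool contains a strict majority of good ones
    # (a hypergeometric tail). As bad replicas accumulate, this
    # probability degrades gradually rather than falling off a cliff.
    total = n_good + n_bad
    need = sample // 2 + 1
    return sum(
        comb(n_good, g) * comb(n_bad, sample - g)
        for g in range(need, sample + 1)
    ) / comb(total, sample)

# BFT with 10 replicas: fine at 3 faults, broken at 4. Binary.
print(bft_survives(10, 3), bft_survives(10, 4))

# Random polling over 100 replicas: the odds of a good outcome fall
# smoothly as the bad guy subverts more of them.
for bad in (10, 30, 50):
    print(bad, poll_success(100 - bad, bad, 10))
```

The sketch also hints at why the cost is real: the polling variant only degrades gracefully because it pays for far more replicas than the BFT minimum requires.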
Debugging and testing were key to Karl Auerbach's contribution to the IP list discussion (reproduced in full by permission):
One of the motivations for packet switching and the ARPAnet was the ability to continue communications even during/after a nuclear holocaust. (Yes, I know that some people claim that that was not the purpose - but I was there, at SDC, from 1972 building ARPAnet like networks with that specific purpose.)

In recent years, or decades, we seem to be moving towards network architectures that are more brittle.

For example, there is a lot of discussion about "Software Defined Networks" and Openflow - which to my mind is ATM re-invented. Every time I look at it I think to myself "this design invites brittle failures."

My personal concern is slightly different. I come from a family of repairmen - radio and then TV - so when I look at something I wonder "how can it break?" and "how can it be repaired?".

We've engineered the internet so that it is not easy to diagnose problems. Unlike Ma Bell we have not learned to make remote loopbacks a mandatory part of many parts of the system. Thus we often have a flat, one sided view of what is happening. And if we need the view from the other end we often have to ask assistance of non-technical people who lack proper tools or knowledge how to use them.

As a first step we ought to be engineering more test points and remote loopback facilities into internet protocols and devices.

And a second step ought to be the creation of a database of network pathology. With that we can begin to create tools that help us reason backwards from symptoms towards causes. I'm not talking artificial intelligence or even highly expert systems. Rather this would be something that would help us look at symptoms, understand possible causes, and know what tests we need to run to begin to evaluate which of the possible causes are candidates and which are not.

Examples of brittle systems abound:
  • SSL is brittle in many ways. Browsers trust a pre-configured list of certificate authorities, whose role is to provide the illusion of security. If any one of them is malign or incompetent, the system is completely broken, as we see with the recent failure of the official Chinese certificate authority.
  • IP routing is brittle. Economic pressures have eliminated the "route around failure" property of the IP networks that Karl was building to survive nuclear war. Advertising false routes is a routine trick used by the bad guys to divert traffic for interception.
  • Perimeter security as implemented in firewalls is brittle. Once the bad guy is inside there are few limits on what, and how fast, he can do Bad Things.
  • The blockchain, and its applications such as Bitcoin, are brittle.
The blockchain is brittle because it can be taken over by a conspiracy. As I wrote in another of my contributions to the IP list, responding to and quoting from this piece of techno-optimism:
The revolution in progress can generally be described as “disintermediation”. It is the transference of trust, data, and ownership infrastructure from banks and businesses into distributed peer to peer network protocols.

A distributed “world wide ledger” is one of several technologies transforming our highly centralized structures. This technology, cryptically named the “block chain” is embodied in several distributed networks such as Bitcoin, Eris Industries DB, and Ethereum.

Through an encrypted world wide ledger built on a block chain, trust in the systems maintained by third party human institutions can be replaced by trust in math. In block chain systems, account identity and transactions are cryptographically verified by network “consensus” rather than by trust in a single third party.

These techno-optimists never seem to ask “what could possibly go wrong?” To quote from this blog post:
Since then, there has been a flood of proposals to base other P2P storage systems, election voting, even a replacement for the Internet on blockchain technology. Every one of these proposals for using the blockchain as a Solution for Everything I've looked at appears to make three highly questionable assumptions:
There have been times in the past when a single mining pool controlled more than 50% of the mining power, and thus the blockchain. That pool is known to have abused their control of the blockchain.

As I write this, 3 pools control 57% of the mining power. Thus a conspiracy between three parties would control the blockchain.

More than two decades ago at Sun I was convinced that making systems ductile (the opposite of brittle) was the hardest and most important problem in system engineering. After working on it in the LOCKSS Program for nearly 17 years I'm still convinced that this is true.

Library of Congress: The Signal: Dodge that Memory Hole: Saving Digital News

planet code4lib - Tue, 2015-06-02 14:00

Newspapers are some of the most-used collections at libraries. They have been carefully selected and preserved and represent what is often referred to as “the first draft of history.” Digitized historical newspapers provide broad and rich access to a community’s past, enabling new kinds of inquiry and research. However, these kinds of resources are at risk of being lost to future users. Networked digital technologies have changed how we communicate with each other and have rapidly changed how information is disseminated. These changes have had a drastic effect on the news industry, disrupting delivery mechanisms, upending business models and dispersing resources across the web.

Current library acquisition and preservation methods for news are closely linked to the physical newspaper. Ensuring that the new modes of journalism, which are moving toward a “digital- and mobile-first” model, are captured and preserved at libraries and other memory institutions is the main goal of the Dodging the Memory Hole series of events. The first was organized in November 2014 by the Reynolds Journalism Institute at the University of Missouri.  The most recent took place in May of 2015 and was organized by the Educopia Institute at the Charlotte Mecklenburg Public Library in Charlotte, NC.

Hong Kong, 31st day of the Umbrella Revolution, taken October 28, 2014 by Pasu Au Yeung.

I had the opportunity to close out the May meeting and highlight areas where continued work would have an impact in helping libraries collect, preserve and provide access to born-digital news. A (slightly longer but hopefully clearer) version of my talk (pdf) is below.

I want to start with a photograph from last year’s protest in Hong Kong known as the Umbrella Revolution. The picture speaks to the complexity of the problem we face in capturing and preserving the news of today. The protest was unique in that it was one of the first protests in China organized, sustained and broadcast via social media. Capturing a diverse set of materials about this news event would mean capturing the stories from established media companies and the writings and images from individual blogs and other social media. This is especially important in the case of the Umbrella Revolution because official media outlets (and social media accounts) in China are often censored. This protest was also an example of how activism in general has adapted due to networked digital technologies. Future researchers studying social and political movements happening right now would never get the whole story without access to the social media.

The role of the journalist is to get the story out and just like other publishers in the digital age, they’ve had to adapt to stay relevant. Digital storytelling is becoming more dynamic,  exemplified by publications like Highline, a new long-form product from Huffington Post which is richly illustrated with audio and visual elements and is translated into a variety of languages. We can expect that in the pursuit of getting the story out and advancing story telling, news content will come from more sources, be more dynamic and continue using all kinds of formats and distribution mechanisms.


Libraries have also been transformed by digital technologies. There are a large number of digitized collections; we are creating vast and rich resources and, I think, providing great access and good stewardship to a large amount of this digitized content. Chronicling America and the Digital Public Library of America are great examples of this. However, there are gaps–or holes–in our collections, especially the born-digital content about contemporary events. Libraries haven’t broadly adopted collecting practices relevant to the current publishing environment, which today is dominated by the web.

Several people at this meeting mentioned the study done by Andy Jackson (ppt) at the British Library. I have his permission to share these slides, which he presented at the recent General Assembly of the International Internet Preservation Consortium. It is a simple but powerful study of ten years’ worth (2004-2014) of content from the UK Web Archive. It aims to find out what the archive holds that is no longer on the live web. He looked at a sample of URLs per year and analyzed the content to determine whether the content at each URL in the archive was still at the same URL on the live web. He broke down and color-coded the URLs according to a percentage scale expressing whether the content was moved, changed, missing or gone. He found that after one year, half of the content was either gone or had been changed so much as to be unrecognizable. After ten years almost no content still resides at its original URL. This analysis was done across all domains, but you can make a logical assumption that news content wouldn’t fare any better if subjected to the same type of analysis.

Fifty percent of URLs in the UK Web Archive have lost or missing content after one year. After ten years nearly all content is moved, changed, missing or gone. Credit: Taken from a presentation given by Andy Jackson at the IIPC GA, Apr 27, 2015. The full presentation is available at
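Jackson's actual methodology is richer than this, but the bucketing he describes can be approximated with a toy classifier. The function name and the 50% line-overlap cutoff below are illustrative assumptions, and the "moved" bucket (the same content found at a different URL) is omitted because it would require searching beyond the original URL:

```python
def classify(archived_html, live_html):
    # Toy version of the changed/missing/gone buckets: compare an archived
    # capture of a URL with whatever the live web returns for it now.
    if live_html is None:
        return "gone"                  # the URL no longer resolves at all
    if archived_html == live_html:
        return "unchanged"             # content still at its original URL
    # Crude similarity: fraction of archived lines still present live.
    archived = set(archived_html.splitlines())
    live = set(live_html.splitlines())
    overlap = len(archived & live) / max(len(archived), 1)
    return "changed" if overlap >= 0.5 else "missing"

print(classify("<h1>story</h1>", "<h1>story</h1>"))   # unchanged
print(classify("<h1>story</h1>", None))               # gone
```

Run over a per-year sample of archive URLs, tallies of these buckets yield exactly the kind of decay curve the slide above shows: link rot is a measurable, gradable property, not a binary one.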

We have clear data that if content is not captured from the web soon after its creation, it is at risk. Which brings me to what I think is our main challenge with collecting born-digital news: library acquisition policies and practices. Libraries collect the majority of their content by buying something–a newspaper subscription, a standing order for a serial publication, a package of titles from a publisher, an access license from an aggregator, etc. The news content that’s available for purchase and printed in a newspaper is a small subset of the content that’s created and available online. Videos, interactive graphs, comments and other user-generated data are almost exclusively available online. The absence of an acquisition stream for this content puts it at risk of being lost to future library and archives users.

Establishing relationships (and eventually agreements) with the organizations that create, distribute and own news content is one of the more promising strategies for libraries to collect digital news content.  Brian Hocker from KXAS-TV, an NBC affiliate in the Dallas area, shared the story of how KXAS partnered with the University of North Texas Libraries to digitize, share and ultimately preserve their station’s video archives as part of the Portal for Texas History. Jim Kroll from the Denver Public library also shared his story of acquiring the archives of the Rocky Mountain News after the newspaper ceased publication. Both stories emphasized the importance of establishing lasting relationships with decision-makers from news outlets in their respective communities. They also each created donor agreements that provided community access to the news archives which can serve as models for future agreements.

The relationships that enabled these agreements were the result of what I think of as entrepreneurial collection development in the model of acquiring special collections. The archives were pursued actively and over time; they represent a new type of content, required a new type of relationship with a donor and were a good fit–both geographically and topically–with existing collections at UNT and DPL.

Web archiving is another promising strategy to capture and preserve born-digital news. The Library of Congress recently announced its effort to save news websites, specifically those not affiliated with traditional news companies. Ben Walsh, creator of PastPages, announced that his service is now Memento-compliant, which will allow the archived front pages of major-market newspaper websites that PastPages collects to be available in a Memento search. These projects will capture content at a national level, but the hyper-local news sites, citizen journalism and other niche blogs–news that used to be published as community newsletters or pamphlets–are most likely not being captured. Internet Archive’s Archive-It service is a mechanism for smaller libraries to engage in web archiving and capture some of this unique content. Capturing the social media around news events continues to be challenging, but tools have been developed to capture tweets, and collections of tweets around news events are being captured and shared.

The Dodging the Memory Hole events have thus far been excellent opportunities to bring librarians, archivists, the news industry and technologists together to help save news content for future generations. Look for more from this group on awareness raising, studies on what news content has already been lost, collaborations with the developers of news content management systems, and more guidance on developing donation agreements. To read more about the event, check out Trevor Owens’ report on the IMLS blog.

Open Knowledge Foundation: Why Open Contracting Matters to the OGP Agenda in Africa

planet code4lib - Tue, 2015-06-02 13:53

This is a guest post by Seember Nyager. Seember is an Open Knowledge/Code4Africa Open Government Fellow advocating for the adoption of open contracting data standards in Nigeria.

To be honest, the state of public services across Africa shames us. Often, public services do not meet generally accepted standards of efficiency, regular maintenance and service delivery. In most cases, it is unknown, and unlikely, that public services followed any specifications during contract execution, and service delivery is often poor and non-standardized.

The state of public services on the continent is hard to reconcile with the abundance of our natural resources and the amount of external financing channeled to Africa each year. The standard of public service delivery has consequences, sometimes tragic ones, and the prevalence of tragedy is witnessed in our health care systems. Arguably the most tragic consequence of low standards in public service delivery is the erosion of trust between the Government and the people, as this is the greatest saboteur of good intentions in the public interest.

There is no quick fix to the infrastructure and service delivery deficit that plagues the continent. Some public services such as efficient transportation networks may only be fully operational after a decade. But there are ways to rebuild trust between Governments and the citizens and chart a formidable course for sustained efficiency in public service delivery.

In another vein, citizens of OGP participating countries may not know about the OGP and, in light of the current commitments being made by countries, may view OGP as an abstract concept that they need not involve themselves with. But there is compelling reason to believe that citizens of OGP participating countries may be able to relate to and internalize the values behind the OGP if Open Contracting practices are made part of the OGP agenda in each of these countries.

Open contracting advocates for all stages leading to public service delivery to be exposed to scrutiny, subject to narrowly defined exceptions. It also advocates that such routine information ought not to have to be requested but should be made readily available through multiple channels, so that, as much as possible, the people know where responsibility for the success or failure of a public project lies and can participate in the contracting process that ultimately leads to public service delivery.

The scrutiny of the public contracting process requires that information is presented in ways that enable one set of information to be linked to other related information on a public project or service to be delivered. This would require data standards to be followed. Open contracting would require that information is shared through multiple channels and taken to people in formats that they can understand. Open contracting requires that information on public contracts includes milestones that show expectations at each stage of contract implementation and the specifications that must be met at each milestone. Open contracting requires that there is publicly available information about the service to be expected at the end of contract execution. Open contracting requires information around the contracting process to be regularly updated, and contracting information should facilitate continuous dialogue between representatives of Government, the people, the contractors and other stakeholders within a community.
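As a sketch of what such data standards and linkable milestones might look like in practice, the record below loosely echoes the shape of the Open Contracting Data Standard (a stable ocid tying tender, milestones and documents together). The field values and the helper function are illustrative assumptions, not a conformant OCDS release:

```python
# Illustrative only: field names loosely echo the Open Contracting Data
# Standard (OCDS); this is not a conformant OCDS release.
contract_record = {
    "ocid": "ocds-example-000001",    # stable ID linking every stage
    "tender": {"title": "Rural clinic construction", "value": 1200000},
    "milestones": [
        {"title": "Foundation complete", "dueDate": "2015-09-01",
         "status": "met"},
        {"title": "Handover inspection", "dueDate": "2016-03-01",
         "status": "scheduled"},
    ],
    "documents": [{"title": "Bill of quantities", "format": "text/csv"}],
}

def overdue(record, today):
    # Milestones past due and not yet met: the hook for public scrutiny
    # and continuous dialogue. ISO dates compare correctly as strings.
    return [m["title"] for m in record["milestones"]
            if m["dueDate"] < today and m["status"] != "met"]

print(overdue(contract_record, "2016-06-01"))   # ['Handover inspection']
```

Because every stage shares the same ocid, a citizen-facing tool can join tender, milestone and payment data without guesswork, which is the practical payoff of following a standard.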

For OGP Africa participating countries like Kenya and Ghana, which have FOI and RTI bills currently going through parliament, it is recommended that their bills reflect the proactive disclosure provisions on public finance information contained in the Model Law on Access to Information. This would provide the legal backing for a robust open contracting practice to thrive. For OGP Africa participating countries like South Africa that are currently undergoing a reform of public sector procurement, it is recommended that there be clear requirements, backed by law, to ensure public participation in each phase of the contracting process.

For OGP participating countries like Sierra Leone, which already have robust access to information and public procurement laws, it is recommended that contracting data, such as pricing benchmarks for public contracts, is made readily available, follows a specified standard, is updated regularly and is distributed through multiple channels, in ways that the people can understand.

Committing to open contracting practices would require Government and civil society organizations to work closely together, and the OGP provides that platform. Further, the Open Contracting Partnership and the Web Foundation have developed open contracting data standards that would be of great help to each country willing to adopt open contracting practices. With Nigeria not yet an OGP participant, I am hopeful that my country will prioritize trust in public service delivery by adopting the spirit and practice of Open Contracting.

Seember can be reached on twitter @Seember1

LITA: Create, Build, Code and Hack with a choice of 4 LITA preconferences

planet code4lib - Tue, 2015-06-02 13:00

Register now for one of four exciting LITA preconferences at 2015 ALA Annual in San Francisco.

On Friday, June 26, at the 2015 ALA Annual Conference in San Francisco, the Library and Information Technology Association (LITA) brings you a choice of 4 dynamic, useful and fun preconferences. These all-day preconferences, 8:30 a.m. – 4:00 p.m., will teach you how to create, build, code and hack the newest trends in technology for libraries. Register through the 2015 ALA Annual Conference website. The price to register is: $235 for LITA members (use special code LITA2015); $350 for ALA members; and $380 for non-members.

Creating Better Tutorials Through User-Centered Instructional Design. Hands-on workshop with experts from the University of Arizona. Event Code: LIT1

Build a Circuit & Learn to Program an Arduino in a Silicon Valley Hackerspace: Panel of Inventors & Librarians Working Together for a More Creative Tomorrow. This workshop will convene at Noisebridge, a maker space in San Francisco. Clearly, it will be hands on. Event Code: LIT3

Learn to Teach Coding and Mentor Technology Newbies – in Your Library or Anywhere! Work with experts from Black Girls CODE to become master technology teachers. Event Code: LIT2

Let’s Hack a Collaborative Library Website! This hands-on experience will consist of a morning in-depth introduction to the tools, followed by an afternoon building a single collaborative library website. Event Code: LIT4

Through hands-on activities, participants will learn to code, build, create and teach others new initiatives such as video tutorials, collaborative website tools, programming languages and Arduino boards. These events are intended for any librarian wanting to stretch themselves and meet their patrons in these new hands-on technology worlds.

Notable preconference presenters include: Yvonne Mery, Leslie Sult and Rebecca Blakiston of the University of Arizona Libraries; Mitch Altman of Noisebridge; Brandon (BK) Klevence of the Maker Jawn Initiative (Philadelphia, PA); Angi Chau of the Castilleja School (Palo Alto, CA); Tod Colegrove and Tara M. Radniecki of the University of Nevada, Reno; Kimberly Bryant and Lake Raymond of Black Girls CODE; Kate Bronstad and Heather J. Klish of Tufts University; and Junior Tidal of the New York City College of Technology.

See the LITA conference web site for information about LITA events including details on the preconferences, the LITA Presidents program with Lou Rosenfeld, the Top Technology Trends panel, and social events.

For questions, contact Mark Beatty, LITA Programs and Marketing Specialist at or (312) 280-4268.

Open Library Data Additions: Amazon Crawl: part 13

planet code4lib - Tue, 2015-06-02 06:35

Part 13 of the Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

District Dispatch: ALA draws line in sand on USA FREEDOM amendments

planet code4lib - Tue, 2015-06-02 02:37


The United States Senate adjourned today with the stage set for votes Tuesday afternoon on at least three “hostile” amendments to the USA FREEDOM Act filed by Senate Majority Leader Mitch McConnell (R-KY).  As explained in a letter by Washington Office Executive Director Emily Sheketoff that will be delivered to all Senators ahead of Tuesday’s votes, passage of any one such amendment would water down the USA FREEDOM Act so seriously as to cause ALA to reverse course and oppose the bill.

Now is the time for one last push by librarians everywhere to again call and email their Senators to deliver a simple message: 1) VOTE “NO” on any and every amendment that would weaken the USA FREEDOM Act; and 2) PASS the bill now without change so that the President can sign it without delay.

Please, visit ALA’s Legislative Action Center to send that urgent message now.

For detailed information on the pending amendments and why they’re utterly unacceptable, please see this analysis by our coalition compatriots at the Center for Democracy and Technology.  The ALA Washington Office’s “line in the sand” letter is available here: USAF Letter 060115.

The post ALA draws line in sand on USA FREEDOM amendments appeared first on District Dispatch.

LibUX: 020: Localizing the User Experience with Robert Laws

planet code4lib - Mon, 2015-06-01 23:59

Robert Laws is the Digital Services Librarian for Georgetown University’s School of Foreign Service in Qatar. In this episode of LibUX, Robert discusses customizing Drupal and LibGuides to present a more localized version of those sites for his campus. He gives tips on how he got started and how to stay relevant in the world of web services. As our first international guest, Amanda asked him about the challenges of regional restrictions on content.

You can listen to LibUX on Stitcher, find us on iTunes, or subscribe to the straight feed. Consider signing up for our weekly newsletter, the Web for Libraries.

The post 020: Localizing the User Experience with Robert Laws appeared first on LibUX.

District Dispatch: Update on 1201 proceedings

planet code4lib - Mon, 2015-06-01 22:06

In the last two weeks, the Copyright Office held ten hearings in Los Angeles and Washington, D.C. and heard the arguments for and against circumvention of digital locks—Section 1201 of the Digital Millennium Copyright Act—on the proposed classes of works, including cell phones, video games, e-readers, and oh yes, farm equipment. Many have said that these hearings are long and unbearable, but in a weird way, I like to attend them (as I do ALA Council). Unfortunately, I was out of town and missed the hearings. So read along with me: reports on the hearings from Brandon Butler of the Washington College of Law at American University and Rebecca Tushnet of Georgetown Law.

The post Update on 1201 proceedings appeared first on District Dispatch.

HangingTogether: What’s changed in linked data implementations?

planet code4lib - Mon, 2015-06-01 20:47

Last year we received 96 responses to the OCLC Research “International Linked Data Survey for Implementers” reporting 172 linked data projects or services in 15 countries, 76 of which were described. Of those 76 projects, 27 (36%) were not yet implemented and 13 (17%) had been in production for less than a year.

So we were curious – what might have changed in the last year? OCLC Research decided to repeat its survey to learn details of specific projects or services that format metadata as linked data and/or make subsequent uses of it.  We’re curious to see whether the projects that had not yet been implemented have now been, whether any of last year’s respondents would have any different answers, and whether we could encourage linked data implementers who didn’t respond to last year’s survey to respond to this year’s.

The questions are the same so we can more easily compare results. (Some multiple-choice questions have additional options drawn from the “other” answers in last year’s responses, and some open-ended questions are now multiple-choice, again based on last year’s responses.) The target audiences are staff who have implemented or are implementing linked data projects or services, either by publishing data as linked data, by ingesting linked data resources into their own data or applications, or both.

The survey is available at

We are asking that responses be completed by 17 July 2015. As with last year’s survey, we will share the examples collected for the benefit of others wanting to undertake similar efforts, wondering what is possible to do and how to go about it. We summarized last year’s results in a series of blog posts here: 1) Who’s doing it; 2) Examples in production; 3) Why and what institutions are consuming; 4) Why and what institutions are publishing; 5) Technical details; 6) Advice from the implementers.

What do you think has changed in the last year?



About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.



Patrick Hochstenbach: Triennale Brugge 2015

planet code4lib - Mon, 2015-06-01 18:23
Filed under: Doodles Tagged: brugge, triennale, urban, urbansketching

District Dispatch: Experts to demystify 3D printing policies at 2015 ALA Conference

planet code4lib - Mon, 2015-06-01 18:22

As more and more libraries nationwide begin to offer 3D printing services, library leaders are now confronting a litany of copyright, trademark and patent complications that arise from the new technology. To help the library community address 3D printing concerns, the American Library Association (ALA) Committee on Legislation’s (COL) Copyright Subcommittee will explore 3D printing policy issues at the 2015 ALA Annual Conference in San Francisco.

Join Tomas A. Lipinski, dean of the University of Wisconsin-Milwaukee’s School of Information Studies and COL Copyright Subcommittee member, St. Louis’ University City Public Library Director Patrick Wall, and other policy experts at the session “Copyright and 3D Printing: Be Informed, Be Fearless, Be Smart!” for a “plain English” discussion of 3D printing, its copyright implications, and the patent and trademark issues that this breakthrough technology raises for libraries everywhere. The session will take place from 10:30 to 11:30 a.m. on Saturday, June 27, 2015, at the Moscone Convention Center in room 2001 of the West Building.

Lipinski has worked in a variety of legal settings including the private, public and non-profit sectors. He currently teaches, researches and speaks frequently on various topics within the areas of information law and policy, especially copyright, free speech and privacy issues in schools and libraries. Patrick Wall has been the director of St. Louis’ University City Public Library since March of 2011 and was its assistant director for the previous eight years. He also serves as President of the Municipal Library Consortium of St. Louis County, a group of nine libraries providing collective public access to more than 700,000 volumes.

  • Tomas A. Lipinski, dean of the University of Wisconsin-Milwaukee’s School of Information Studies, member of the American Library Association Committee on Legislation
  • Patrick Wall, director, University City Public Library (St. Louis)

View all ALA Washington Office conference sessions

The post Experts to demystify 3D printing policies at 2015 ALA Conference appeared first on District Dispatch.

ACRL TechConnect: Where do Library Staff Learn About Programming? Some Preliminary Survey Results

planet code4lib - Mon, 2015-06-01 14:05

[Editor’s Note:  This post is part of a series of posts related to ACRL TechConnect’s 2015 survey on Programming Languages, Frameworks, and Web Content Management Systems in Libraries.  The survey was distributed between January and March 2015 and received 265 responses.  A longer journal article with additional analysis is also forthcoming.  For a quick summary of the article below, check out this infographic.]

Our survey on programming languages in libraries has resulted in a mountain of fascinating data.  One of the goals of our survey was to better understand how staff in libraries learn about programming and develop their coding skills.  Based upon anecdotal evidence, we hypothesized that library staff members are often self-taught, learning through a combination of on-the-job learning and online tutorials.  Our findings indicate that respondents use a wide variety of sources to learn about programming, including MOOCs, online tutorials, Google searches, and colleagues.

Are programming skills gained by formal coursework, or in Library Science Master’s Programs?

We were interested in identifying sources of programming learning, whether formal coursework (as part of a degree or continuing-education program) or Massive Open Online Courses (MOOCs). Nearly two-thirds of respondents indicated they had an MLS or were working on one:

When asked about coursework taken in programming, application, or software development, results were mixed, with the most popular choice being 1-2 classes:

However, of those respondents who have taken a course in programming (about 80% of all respondents) AND indicated that they either had an MLS or were attending an MLS program, only about a third had taken any of those courses as part of a Master’s in Library Science program:

Resources for learning about programming

The final question of the survey asked respondents, in an open-ended way, to describe resources they use to learn about programming.  It was a pretty complex question:

Please list or describe any learning resources, discussion boards or forums, or other methods you use to learn about or develop your skills in programming, application development, or scripting. Please include links to online resources if available. Examples of resources include, but are not limited to: MOOC courses, local community/college/university courses on programming, books, the Code4Lib listserv, Stack Overflow, etc.

Respondents gave, in many cases, incredibly detailed responses – and most respondents indicated a list of resources used.  After coding the responses into 10 categories, some trends emerged.  The most popular resources for learning about programming, by far, were courses (whether those courses were taken formally in a classroom environment, or online in a MOOC environment):

To better illustrate what each category entails, here are the top five resources in each category:

By far, the most commonly cited learning resource was Stack Overflow, followed by the Code4Lib Listserv and books/ebooks (unspecified). Results may skew a little toward these resources because they were mentioned as examples in the question, priming respondents to include them in their responses. Since links to the survey were distributed, among other places, on the Code4Lib listserv, its prominence may also be influenced by response bias. One area that was a little surprising was the number of respondents who included social networks (including in-person networks like co-workers) as resources – indeed, respondents who mentioned colleagues as learning resources were particularly enthusiastic. As one respondent put it:

…co-workers are always very important learning resources, perhaps the most important!

Preliminary Analysis

While the data isn’t conclusive enough to draw any strong conclusions yet, a few thoughts come to mind:

  • About 3/4 of respondents indicated that programming was either part of their job description, or that they use programming or scripting as part of their work even if it’s not expressly part of their job. And yet, only about a third of respondents with an MLS (or in the process of getting one) took a programming class as part of their MLS program. Programming is increasingly an essential skill for library work, and this survey seems to support the view that there should be more programming courses in library school curricula.
  • Obviously programming work is not monolithic – there’s lots of variation among those who do programming work that isn’t reflected in our survey, and this survey may have unintentionally excluded those who are hobby coders.  Most questions focused on programming used when performing work-related tasks, so additional research would be needed to identify learning strategies of enthusiast programmers who don’t have the opportunity to program as part of their job.
  • Respondents indicated that learning on the job is an important aspect of their work; they may not have time or institutional support for formal training or courses, and figure things out as they go along using forums like Stack Overflow and Code4Lib’s listserv.  As one respondent put it:

Codecademy got me started. Stack Overflow saves me hours of time and effort, on a regular basis, as it helps me with answers to specific, time-of-need questions, helping me do problem-based learning.

TL;DR?  Here’s an infographic:

In the next post, I’ll discuss some of the findings related to ways administration and supervisors support (or don’t support) programming work in libraries.

LITA: Negotiate!

planet code4lib - Mon, 2015-06-01 13:00

I’m going to say it: Librarians are rarely effective negotiators. Way too often we pay full prices for mediocre resources without demur. Why?

Credit: Flickr user changeorder

First of all, most librarians are introverts and/or peaceable sorts who dislike confrontation. Second, we are unlikely to get bonuses or promotions when we save our organizations money, so there goes most of the extrinsic motivation for driving a hard bargain with vendors. Third and most importantly, we go into the library business because libraries aren’t a business. Most of us deliver government-funded public services, so we have zero profit motive, and our non-business mentality is almost a professional value in itself. But this failure to negotiate weakens our value to the communities we serve.

Libraries pay providers over a billion dollars a year for digital services and resources, only to get overpriced subscriptions and comparatively shoddy products. When did you last meet a librarian who loved their ILS? Meanwhile, we lose whatever dignity remains to us when our national associations curry favor with “Library Champions” like Elsevier, soliciting these profiteers to give back a minuscule fraction of their profits squeezed from libraries. We forget that vendors exist because of us.

Recently I sat in a dealer’s office for ninety minutes, refusing to budge till I got a better deal on my new car. The initial offer was 7% APR. The final offer was 0.9% APR with new all-season floor mats thrown in. The experience awoke me to the realization that I, as the customer, always held the leverage in any business relationship. I was thrilled.

I applied that realization to my work managing electronic resources, renegotiating contracts, haggling down rates, and saving about 10% of my annual budget in my first year while delivering equivalent levels of service. This money could then be shuffled to fund other e-resources and services, or saved so as to forestall forced budget cuts and make the library look good to external administrators keen to cut costs.

The key to negotiation is not to fold at the first “no.” Initial price quotes and contracts are a starting point for negotiation, by no means the final offer. Trim unneeded services to obtain a price reduction. Renegotiate, don’t renew, contracts. Ask to renew existing subscriptions at the previous year’s price, dodging the 5% annual increase that most providers slap on products. And take nothing at face value! I once saved $4000 on a single bill because I phoned to ask for a definitive list of our product subscriptions only to discover that the provider had neglected to document one very active subscription. Sooo… we didn’t have to pay for it.

Don’t hesitate to call out bad service either. A company president once personally phoned me because I had rather vociferously objected to his firm’s abysmal customer service. Bear in mind, though, that most vendor reps are delightful people who care about libraries too. So when you’re negotiating, be firm and persistent but please don’t be a jerk.

Long-term solutions to vendor overpricing and second-rate products include consortiums, open access publishing, and open source software. But the simplest and quickest short-term solution for us individuals is to negotiate to get your money’s worth. Vendors want to keep your business, so to get a better deal, sometimes all you have to do is ask.

Michael Rodriguez is the E-Learning Librarian at Hodges University in Florida. He manages the library’s digital services and resources, including 130-plus databases, the library website, and the ILS. He also teaches and tutors students in the School of Liberal Studies and the School of Technology, runs social media for LITA, and does freelance professional training and consulting. He tweets @topshelver and blogs at Shelver’s Cove.

Cherry Hill Company: Recap from the DrupalCon Drupal 4 Libraries BoF

planet code4lib - Sun, 2015-05-31 05:08

We had a great time at the Drupal 4 Libraries Birds of a Feather gathering at DrupalCon Los Angeles. So many people participated that we had to find extra chairs in order to accommodate everyone. We had representation from a diverse range of interests. Many attendees were from academic libraries, but we also had individuals join from public libraries, and we had a few vendors who work with libraries in the room, as well. Following is a recap of the tools people mentioned during the discussion.

Digital Asset Management Systems

Digital asset management systems took up a good portion of the conversation. This discussion started with a question about good photo archiving solutions. It was suggested that, as long as the archive is not large, Drupal on its own — with content types and views — may be the best solution for the photo archives needed. It was also suggested, however, that Drupal is not necessarily scalable for large archives, or for storing original assets....

Read more »

Open Library Data Additions: Amazon Crawl: part bf

planet code4lib - Sat, 2015-05-30 22:33

Part bf of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

State Library of Denmark: Heuristically correct top-X facets

planet code4lib - Sat, 2015-05-30 15:11

For most searches in our Net Archive, we have acceptable response times, due to the use of sparse faceting with Solr. Unfortunately, as well as expectedly, some of the searches are slow: response times measured in minutes, if we’re talking worst case. It is tied to the number of hits: getting the top-25 most popular links from pages about hedgehogs will take a few hundred milliseconds; getting the top-25 links from all pages from 2010 takes minutes. Visualised, the response times look like this:

Massive speed drop-off for higher result sets

Everything beyond 1M hits is slow, everything beyond 10M hits is coffee time. Okay for batch analysis, but we’re aiming for interactive use.

Get the probably correct top-X terms by sampling

Getting the top-X terms for a given facet can be achieved by sampling: Instead of processing all hits in the result set, some of them are skipped. The result set iterator conveniently provides an efficient advance-method, making this very easy. As we will only use sampling with larger result sets, there should be enough data to be quite sure that the top-25 terms are the correct ones, although their counts are somewhat off.

This of course all depends on how high X is in top-X, the concrete corpus, etc. The biggest danger is clusters of content in the corpus, which might be skipped entirely. Maybe the skipping could be done in small steps? Process 100 documents, skip 500, process the next 100… Tests will have to be made.
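As a rough illustration of that stepped-skipping idea (not the actual sparse-faceting code; the function name and the in-memory document model are stand-ins), the sampling pass might look like this:

```python
from collections import Counter

def sample_top_terms(doc_ids, get_terms, x=25, chunk=100, skip=500):
    """Estimate the top-X facet terms by sampling the result set:
    process `chunk` documents, skip `skip`, repeat. `get_terms` maps
    a document id to its facet terms; in Solr the skipping would use
    the result set iterator's advance method instead of slicing."""
    counts = Counter()
    i = 0
    while i < len(doc_ids):
        for doc_id in doc_ids[i:i + chunk]:
            counts.update(get_terms(doc_id))
        i += chunk + skip  # jump past the skipped span
    return [term for term, _ in counts.most_common(x)]
```

With a skewed corpus the frequent terms surface even though only a sixth of the hits are touched; the counts, as noted above, will be somewhat off until fine counting.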

Fine count the top-X terms

With the correct terms isolated, precisely those terms can be fine-counted. This is nearly the same as vanilla distributed faceting, with the exception that all shards must fine-count all the top-X terms, instead of only the terms they had not already processed earlier.

Of course the fine counting could be skipped altogether, which would be faster and potentially very usable for interactive exploratory use, where the exact counts do not really matter.
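The fine-counting step can then be limited to the sampled candidates. A minimal sketch, with each shard modeled as a plain term-to-count dict (a stand-in for the real per-shard facet requests):

```python
from collections import Counter

def fine_count(shards, candidates):
    """Ask every shard for exact counts of every candidate term,
    mirroring the variation on distributed faceting described above:
    all shards count all top-X terms."""
    totals = Counter()
    for shard in shards:
        for term in candidates:
            totals[term] += shard.get(term, 0)
    return totals
```

Terms outside the candidate list are simply never counted, which is where the "probably correct" caveat comes from.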

But there’s no guarantee?

No. Do remember that vanilla Solr distributed faceting is also best-effort, with the same guarantee as above: the terms are not guaranteed to be the correct ones, but their counts are.

Seems simple enough

Ticket #38 for sparse faceting has been opened and we could really use this in the Danish Net Archive Search. No promises though.

Note 2015-05-30

Knut Anton Bøckman mentioned on Twitter that Primo has a faceting mechanism that looks similar to my proposal. It seems that Primo uses the top-200 hits to select the facets (or rather terms?), then do a fine-count on those.

It might work well to base the term selection on the top hits, rather than sampling randomly through all the hits, but I am afraid that 200 is so small a sample that some of the terms will differ from the right ones. I understand the need for a small number though: Getting the top-million hits or just top-hundred-thousand is costly.

David Rosenthal: The Panopticon Is Good For You

planet code4lib - Sat, 2015-05-30 15:00
As Stanford staff I get a feel-good email every morning full of stuff about the wonderful things Stanford is doing. Last Thursday's linked to this article from the medical school about Stanford's annual Big Data in Biomedicine conference. It is full of gee-whiz speculation about how the human condition can be improved if massive amounts of data is collected about every human on the planet and shared freely among medical researchers. Below the fold, I give a taste of the speculation and, in my usual way, ask what could possibly go wrong?

All the following quotes are from the article:
In his keynote address, Lloyd Minor, MD, dean of the School of Medicine, defined a term, “precision health,” as “the next generation of precision medicine.” Precision health, he said, is the application of precision medicine to prevent or forestall disease before it occurs. “Whereas precision medicine is inherently reactive, precision health is prospective,” he said. “Precision medicine focuses on diagnosing and treating people who are sick, while precision health focuses on keeping people healthy.”

The fuel that powers precision health, Minor said, is big data: the merging of genomics and other ways of measuring what’s going on inside people at the molecular level, as well as the environmental, nutritional and lifestyle factors they’re exposed to, as captured by both electronic medical records and mobile-health devices.

This isn't just what would normally be thought of as medical data:
Precision health requires looking beyond medical data to behavioral data, several speakers said. This is especially true in a modern society where it is behavior, not infectious disease, that’s increasingly the cause of disability and mortality, noted Laura Carstensen, PhD, professor of psychology and founding director of the Stanford Center on Longevity.

But not to worry, we can now collect all sorts of useful data from people's smartphones:
That’s where mobile devices for monitoring everyday behavior can be useful in ways electronic health records can’t. Several speakers touched on the potential for using mobile-health devices to survey behavior and chronic disease and, perhaps, provide insights that could be used to support better behavior.
By monitoring 24/7 which room of one’s home one is in at any given minute over a 100-day period, you can detect key changes in behavior — changes in sleep-wake rhythms, for instance — that can indicate or even predict the onset of a health problem.

An expert in analyzing conversations, [Intel fellow Eric] Dishman recounted how he’d learned, for example, that “understanding the opening patterns of a phone conversation can tell you a lot,” including giving clues that a person is entering the initial stages of Alzheimer’s disease. Alternatively, “the structure of laughter in a couple’s conversation can predict marital trouble months before it emerges.”

If only we could get rid of these pesky privacy requirements:
“Medical facilities won’t share DNA information, because they feel compelled to protect patients’ privacy. There are legitimate security and privacy issues. But sharing this information is vital. We’ll never cure rare DNA diseases until we can compare data on large numbers of people. And at the level of DNA, every disease is a rare disease: Every disease from A to Z potentially has a genomic component that can be addressed if we share our genomes.”

The potential benefits of having this data widely shared across the medical profession are speculative, but plausible. But it’s not speculative at all to state that the data will also be shared with governments, police, insurance companies, lawyers, advertisers and, most of all, criminals. Anyone who has been paying the slightest attention to the news over the last few years cannot possibly believe that these vast amounts of extremely valuable data, widely shared among researchers, will never leak or be subpoenaed. Only if you believe “it’s only metadata, there’s nothing to worry about” can you believe that the data, the whole point of which is that it is highly specific to an individual, can be effectively anonymized. Saying “There are legitimate security and privacy issues. But …” is simply a way of ignoring those issues, because actually addressing them would reveal that the downsides vastly outweigh the upsides.

Once again, we have an entire conference of techno-optimists, none of whom can be bothered to ask themselves "what could possibly go wrong?". In fact, in this case what they ought to be asking themselves is "what's the worst that could happen?", because the way they're going the worst is what is going to happen.

These ideas are potentially beneficial, and in a world where data could be perfectly anonymized and kept perfectly secure for long periods despite being widely shared, they should certainly be pursued. But this is not that world, and to behave as if it is violates the precept “First, do no harm” which, while not strictly part of the Hippocratic Oath, I believe is part of the canon of medical ethics.

Terry Reese: Enhancements to the MarcEdit Replace Function — making complex conditional edits easy

planet code4lib - Sat, 2015-05-30 06:09

MarcEdit provides lots of different ways for users to edit their data. However, one use case that comes up often is the ability to perform an action on a field or fields based on the presence of data in another field. While you can currently do this in MarcEdit by isolating the specific records to edit and then working on just those items, more could be done to make this process easier. So, to that end, I’ve updated the Replace function to include a new conditional element that allows MarcEdit to presort records using an in-string or regular expression query before evaluating data for replacement. Here’s how it will work…

When you first open the Replace Window:

Notice that the conditional string text has been replaced. The old wording was confusing to folks because it didn’t reflect exactly what was being done. Rather, this is an option that allows a user to run an in-string or regular expression search across the entire record before the Find/Replace is run. The search options grouped below affect *only* the Find/Replace textboxes; they do not affect the options that are enabled when Perform Find/Replace If… is checked. Those data fields have their own toggles for in-string (has) or regular expression (regex) matching.


If you check the box, the following information will be displayed:

Again – the If [Textbox] [REGEX] is a search that is performed and must evaluate as true in order for the paired find and replace to run.  The use cases for this function are things like:

  • I want to modify the field x but only if foobar is found in field y.


There are other ways to do this by extracting data from files and creating lots of different files for processing or writing a script – but this will give users a great deal more flexibility when wanting to perform options, but only if specific data is found within a field.


A simple example would be below:

This is a non-real-world example of how the function works.  A user wants to change the 050 field to an 090 field, but only if the data in the 945$a falls in the range m-z.  That’s what the new option allows.  By checking the Perform Find/Replace If option, I can provide a pre-search that filters the set of records on which the primary Find/Replace pair is actually performed.  Make sense?  I hope so.
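The logic of the option can be sketched outside MarcEdit too. Below is a hypothetical Python version, modeling a record as its mnemonic text lines; the function, field values, and record model are illustrative stand-ins, not MarcEdit's actual implementation:

```python
import re

def conditional_replace(record_lines, condition, find, replace):
    """Run a regex find/replace over a record's lines, but only if
    `condition` matches somewhere in the record first, mimicking the
    Perform Find/Replace If pre-search."""
    if not re.search(condition, "\n".join(record_lines)):
        return record_lines  # condition false: record left untouched
    return [re.sub(find, replace, line) for line in record_lines]

record = [
    "=050  00$aQA76.73",
    "=945  01$amonograph",
]
# Retag 050 as 090, but only when 945$a starts with m-z:
edited = conditional_replace(record, r"=945 .*\$a[m-z]", r"^=050", "=090")
# edited[0] == "=090  00$aQA76.73"
```

If the 945$a had started with a-l instead, the pre-search would fail and the record would come back unchanged, which is exactly the filtering behavior described above.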

Finally – I’ve updated the code around the task wizard so that this information can be utilized within tasks.  This enhancement will be in the next available update.



Subscribe to code4lib aggregator