
PeerLibrary: Free From Our Alphabetic Cage

planet code4lib - Mon, 2015-04-06 17:01

What is a logo?

Is a logo a representation of an organization’s values, goals, strengths, heart, and solidarity in the cause of lubricating the annals of academic knowledge and the communication of it to the people of the third planet from the sun?

PeerLibrary used to be represented by the letter “P”, but could that honestly describe an organization that seeks to change not merely academic literature’s presentation to the masses, but honestly the universe as a whole? In the mind’s eye of PeerLibrary (because we are busy ameliorating the wrongs of modern society), we are making EVERYTHING change.

PeerLibrary was once described by the now apparent shallowness that is the letter “P.” Insanity. We tried to represent a fundamentally brave, bold, and brilliant burst out of the box that academia has been hoping for since the advent of the intellectual superfreeway that is the interwebz, and now we see that we must move on. Additionally, the word pronounced “Pee” is simply not representative of an organization that seeks to waste neither the full power of academic literature nor the electronic superverse. PeerLibrary is not some yellow-green, warm, odorous entity wishing to be routinely expelled from users, but instead something engaging and enthralling that will not let the user let it go. Users will not turn away and slam the door on PeerLibrary because PeerLibrary will never bother them anyway. PeerLibrary will let them in and let them see something that they want to know and will not wish to let go.

Why would participants in the social experiment that is PeerLibrary not wish to let it go? Simply put, they want to get the most out of academic literature. At the physical level, academic literature is just a list of words and figures that researchers combined to describe their research. In order for an individual to turn this into something useful for themselves, they would want to comprehend the background of the topic, the direction the researchers decided to take investigation and why, the setup and results of their experimentation, in addition to the author’s conclusion on the supposition investigated. Post-comprehension, the viewer may wish to replicate the experiment, or design their own experiment. In both of these phases of academic literature review, scholars may want to discuss their thoughts and interpretations of the material with others. This desire could stem from an enjoyment of the accompaniment of an arrangement of folks or merely from a perspective that deep understanding comes most effectively from a discussion rather than instruction.

This is the power of PeerLibrary: to take the traditional library ideology of transferring knowledge from source-to-person and expand it, now technologically empowered, to source-to-people.

So, as you now see before you, our logo is thus a book, one page text, the other a web. Alas, representation.

HangingTogether: The OCLC Evolving Scholarly Record Workshop, Chicago Edition

planet code4lib - Mon, 2015-04-06 16:00

On March 23, 2015, we held the third in the Evolving Scholarly Record Workshop series at Northwestern University. The workshops build on the framework in the OCLC Research report, The Evolving Scholarly Record.

Jim Michalko, Vice President, OCLC Research Library Partnership, introduced the third of four workshops to address library roles and new communities of practice in the stewardship of the evolving scholarly record.

Cliff Lynch, Director of CNI, started out by talking about memory institutions as a system (more than individual collections) to capture both the scholarly record and the endlessly ramifying cultural record. It’s impossible to capture them completely, but hopefully we are sampling the best.

It is our role to safeguard the evidentiary record upon which the scholarly record and future scholarship depend. But the scholarly record is taking on new definitions. It includes the relationship between the data and the science based upon it. Its contents are both refereed and un-refereed. It includes videos, blogs, websites, social media… And even the traditional should be made accessible in new ways. There is an information density problem, and prioritization must be done.

We need to be careful when thinking about the scholarly record and look at new ways in which scholarly information flows.

There is a lot of stuff that doesn’t make it into IRs because all eyes are on capturing things that are already published somewhere. The eyes are on the wrong ball…

[presentations are available on the event page]

Brian Lavoie, Research Scientist in OCLC Research, provided a framework for a common understanding and shared terminology for the day’s conversations.

He defined the scholarly record as being the portions of scholarly outputs that have been systematically gathered, organized, curated, identified and made persistently accessible.

OCLC developed the Evolving Scholarly Record Framework to help support discussions, to define key categories of materials and stakeholder roles, to be high-level so it can be cross-disciplinary and practical, to serve as a common reference point across domains, and to support strategic planning. The major component is still outcomes, but in addition there are materials from the process (e.g., algorithms, data, preprints, blogs, grant reviews) and materials from the aftermath (e.g., blogs, reviews, commentaries, revisions, corrections, repurposing for new audiences).

The stakeholder ecosystem combines old roles (fix, create, collect, and use) in new combinations and among a variety of organizations.  To succeed, selection of the scholarly record must be supported by a stable configuration of stakeholder roles.

We’ve been doing this, but passively and often at the end of a researcher’s career.  We need to do so much more, proactively and by getting involved early in the process.

Herbert Van de Sompel, Scientist at Los Alamos National Laboratory, gave his perspective on archiving the evolving scholarly record. A scholarly communication system has to support the research process (which is more visible than ever before) and fulfill these functions:

  • Registration: allows claims of precedence for a scholarly finding (e.g., manuscript submission), which is now less discrete and more continuous
  • Certification: establishes the validity of the claim (e.g., peer review), which is becoming less formal
  • Awareness: allows actors to remain aware of new claims (alerts, stacks browsing, web discovery), which is trending toward instantaneous
  • Archiving: allows preservation of the record (by libraries and other stakeholders), which is evolving from medium- to content-driven.

Herbert characterized the future in the following ways:  The scholarly record is undergoing massive extension with objects that are heterogeneous, dynamic, compound, inter-related and distributed across the web – and often hosted on common web platforms that are not dedicated to scholarship.

Our goal is to achieve the ability to persistently, precisely, and seamlessly revisit the Scholarly Web of the Past and of the Now at some point in the Future.  We need to capture compound objects, with context, and in a state of flux at the request of the owner and at the time of relevance.

Herbert’s distinction between recording and archiving is critical. Recording platforms make no commitment to long-term access or preservation.  They may be a significant part of the scholarly process, but they are not a dependable part of the scholarly record.

We need to start creating workflows that support researcher-motivated movement of objects from private infrastructure to recording infrastructure and support curator-motivated movement of objects and context from recording infrastructure to archiving infrastructure.

Sarah Pritchard, Dean of Libraries, Northwestern University, put things in the campus politics and technology context.

The evolving scholarly record requires that we work with a variety of stakeholders on campus:  faculty and students (as creators), academic departments (as managers of course content and grey literature), senior administrators (general counsel, CFO, HR), trustees (governance policy), office of research (as proxy for funder’s requirements), information technology units, and disciplinary communities.

There are many research information systems on campus, beyond the institutional repository: course management systems, faculty research networking systems, grant and sponsored research management systems, student and faculty personnel systems, campus servers and intranets, and (because the campus boundaries are pervious) disciplinary repositories, cloud and social platforms. And also office hard drives.

Policies and compliance issues go far beyond the content licensing libraries are familiar with: copyright (at the institutional and individual levels), privacy of records (student work, clinical data, business records), IT security controls and web content policies, state electronic records retention laws, open access (institutionally or funder mandated), and rights of external system owners (hosted content).

Sarah finished with some provocative thoughts:

  • The library sees itself as a “selector”, but many may see this as overstepping
  • The library looks out for the institution which can be at odds with the faculty sense of individual professional identity
  • There is a high cost to change the technical infrastructure and workflow mechanisms and to reshape governance and policy
  • There is a lack of a sense of urgency

She recommended that we start with low hanging fruit, engage centers of expertise, find pilot project opportunities, and accept that there won’t be a wholesale move into this environment.

Sarah Pritchard’s presentation really affected me: sort of a rallying cry to go out and make things happen!

The campus context provided a perfect launching point for the Breakout Discussions. From ten pages of notes, I’ve distilled the following action-oriented outcomes:

Within the library

  • If your library has receded from your university goals and strategies, move the library back into the prime business of your institution with a roster of candidate service offerings to re-position yourselves in the campus community.
  • Earn reputation through service provision and through access as opposed to reputation through ownership.
  • Selection
    • Ask yourself, what are we selecting? How do we define the object? What commitments will we make? And how does it fit into the broader system?
    • Consider some minimum requirements in terms of number of hits or other indications of interest for blogs/websites to be archived.  Those indexed by organizations like MLA or that are cited in scholarly articles seem worthy.
    • Declare collections of record so that others can depend on them, but beware of the commitment if you have to create new storage and access systems for a particular type of material.
    • Communicate when you have taken on a commitment to web archiving particular resources, possibly via the MARC preservation commitment field.
    • A lot of stuff doesn’t get archived because we focus on materials that are already well-tended elsewhere. Look for the at-risk materials.
    • Accept adequate content sampling.
  • Focus on training librarians.  Get them to use the dissertation as the first opportunity to establish a relationship, establish an ORCID, and mint a DOI.  Do some of these things that publishers do to provide a gateway to infrastructure that is not campus-centric but system-centric.
  • Decide where the library will focus; it can’t be expert in all things.  Assess where the vulnerabilities are and set priorities.
  • Provide a solution where none exists to capture the things that have fallen through the cracks.
  • Technical solutions
    • Linked data could be the glue for connecting IDs with institutions. Identifiers for individuals and for organizations, and possibly identifiers for departments, funding agencies, projects…
    • Follow a standard to create metadata to provide consistency in the way it’s formed, in the content, and in the identifiers being used.
  • Use technology that is ready now to
    • help with link rot (the URL is bad) and reference rot (the content has changed), so researchers can reference a resource as it was when they used the data or cited it. Memento makes it easy to retrieve an archived web page as it existed at a point in time.
    • provide identifiers
      • ORCID and ISNI are ready for researcher identification.
      • DOIs and Memento are ready for use.
    • harvest web resources. Archive-It is ready for web harvesting and the Internet Archive’s Wayback Machine is ready for accessing archived web pages.
    • transport big data packets. Globus is a solution for researchers and institutions.
    • create open source repositories. Consider using DSpace, EPrints, Fedora or Drupal to make your own.
  • Explore ways in which people track conversation around the creation of an output, like the Future of the Book platform or Twitter conversations. Open Annotation is a solution that allows people to discuss where they prefer.
  • Before building a data repository, ask for whom are we doing this and why?  If no one is asking for it, turn your attention elsewhere.
  • Create a hub for scholars who don’t know what they need, where the main activity may be referring researchers to other services.
  • To get quick support, promote and provide assistance with the DMPTool, minting DOIs, and archiving that information.
  • Get your message into two simple sentences.
  • Evolve the model and the people to move from support to collaboration
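
Several bullets above lean on persistent identifiers: ORCID iDs for researchers, DOIs for objects. As a rough illustration, here is a small sketch that normalizes raw identifiers into resolver URLs; the input forms handled are common ones, not a complete treatment of either specification, and the sample identifiers are made up.

```python
# Sketch: normalize DOIs and ORCID iDs into resolvable URLs.
# The patterns below cover common input forms only; the sample
# identifiers are hypothetical.

def doi_url(doi: str) -> str:
    """Turn a bare DOI like '10.1234/abc' into a resolver URL."""
    doi = doi.strip()
    # Strip an existing resolver prefix or "doi:" scheme if present.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return "https://doi.org/" + doi

def orcid_url(orcid: str) -> str:
    """Turn a 16-character ORCID iD into a profile URL, hyphenated in groups of four."""
    chars = orcid.replace("-", "").strip()
    groups = [chars[i:i + 4] for i in range(0, 16, 4)]
    return "https://orcid.org/" + "-".join(groups)

print(doi_url("doi:10.1234/example.5678"))  # https://doi.org/10.1234/example.5678
print(orcid_url("0000000212345678"))        # https://orcid.org/0000-0002-1234-5678
```

A library offering DOI minting or ORCID support will constantly receive identifiers in mixed forms; normalizing to a single resolver URL is the step that makes them linkable across systems.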

With researchers

  • Do the work to understand researchers’ perspectives.  Meet them where they live.  A good way to engage researchers is to ask them what’s important in their field. Then ask who is looking after it. Include grad students and untenured and newly-tenured faculty as they may be most receptive.
  • Data services may vary dramatically among disciplines.  Social Sciences want help with SPSS and R.  Others want GIS.  For STEM and Humanities there are completely different needs.
  • Before supporting an open access journal, ask the relevant community: do you need a journal, who is the audience, and what is the best way to communicate with them?
  • Stop hindering researchers with roadblocks relating to using cameras or scanners, copying, or putting up web pages.
  • Help users make good choices in use of existing disciplinary data repositories and provide a local option for disciplines lacking good choices.
  • Help faculty avoid having to profile themselves in multiple venues. Offer bibliography and resume services and portability as they move from institution to institution.
  • Explain the benefits of deposit in the record to students and faculty in terms of their portfolio and resume, and for collaboration.
  • To educate reluctant researchers, use assistants in the workflow, i.e. grant management assistants or use graduate student ambassadors to discount rumors and half-truths.  Try quick lunch and learn workshops.  Market through established channels and access points.
  • Talk to researchers about the levels of granularity available to appropriately manage access to their content.
  • Coordinate with those writing proposals and make sure they know that if they expect library staff to do some of the work, the library needs to be involved in the discussion. Get involved early in the research proposal process. Stress that maintenance has to be built in.    When committing to archiving, include an MOU covering service levels and end-of-life.
  • A formalized request process may help with communication.

With other parts of your institution

  • Get at least one other partner on campus on board early — maybe an academic faculty or department who are moving in the same direction you need to go (or administration, grants manager, IT people, educators, other librarians, funders).
  • Begin with a strategy, a call for partnership and implementation, then have conversations with faculty departments to get an environmental scan. Identify what is needed (e.g., GIS, text-mining, data analysis), and distill into areas you can support internally or send along to campus partners.
  • Don’t duplicate services. Cede control to another area on the campus.  Communicate what is going on in different divisions and establish relationships. Provide guidance to get researchers to those places.
  • Work with associate deans and others at that level to find out about grant opportunities.
  • Develop partnerships with research centers and computing services, deciding where in the lifecycle things are to be archived and by whom.
  • Other parts of the university may decide to license data from vendors like Elsevier. The library has a relationship with that vendor; offer to do the negotiation.
  • Spin your message to a stakeholder’s context (e.g., archiving the scholarly record is a part of business continuity planning and risk management for the University’s CFO).
  • Coordinate with other campus pockets of activity involved in assigning DOIs, data management, and SEO activities for the non-traditional objects to optimize institutional outcomes. Integrating these objects into the infrastructure makes them able to circulate with the rest of the record.
  • Alliances on campus should be about integrating library services into the campus infrastructure. Unless you’ve done that on campus, you’re not doing your best to connect to the larger scholarly record.

With external entities

  • We should work with scholarly societies to learn about what we need to collect in a particular discipline (data sets, lab books, etc.) — and how to work with those researchers to get those things.
  • Identify the things that can be done elsewhere and those that need to be done locally. Storing e-science data sets may not be a local thing, whereas support for collaboration may be.
  • Make funder program officers aware of how libraries can help with grant proposals, so they can refer researchers’ questions back to the library.
  • Rely on external services like JSTOR, arXiv, SSRN, and ICPSR, which are dependable delivery and access systems with sustainable business models.
  • Use centers of excellence. Consider offering your expertise, for instance, with a video repository and rely on another institution for data deposit.
  • Work with publishers to provide the related metadata that might, for instance, be associated with a dataset uploaded to PLoS ONE.
  • To help with the impact of researcher output, work with others, such as Symplectic, because they have the metadata we need.
  • To establish protocols for transferring between layers, make sure conversations include W3C and IETF.
  • Identify pockets of interoperability and find how to connect rather than waiting for interoperability to happen.

We are at the beginning of this; it will get better.

Thanks to all of our participants, but particularly to our hosts at Northwestern University, our speakers, and our note-takers. We’re looking forward to culminating the series at the workshop in San Francisco in June, where we’ll focus on how we can collaboratively move things forward to do our best to ensure stewardship of the scholarly record.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.


LITA: Teamwork and Jazz

planet code4lib - Mon, 2015-04-06 14:01

“Jazz Players” by Pedro Ribeiro Simões / CC BY 2.0

Jazz is a pretty unique genre that demands a lot from musicians; a skilled jazz artist must not only be adept at their instrument but also be a highly skilled improviser and communicator. Where other styles of music may only require that a musician remember how to play a piece and run through it the same way every time, good jazz artists can play the same song in an infinite number of ways. Furthermore, they must also be able to collaborate with other jazz artists who can also play the same song an infinite number of ways. This makes jazz an inherently human art form because a listener never knows what to expect; when a jazz group performs, the outcome is the unpredictable result of each musician’s personal taste and style merging into a group effort.

In a lot of ways, team projects are kind of like a jazz performance: you have several people with different skill sets coming together to work toward a common goal, and the outcome is dependent on the people involved. While there are obvious limits to how far we can stretch this metaphor, I think we can learn a lot about being an effective team member from some of the traits all jazz greats have in common.


Trust your bandmates

Many hands make light work. Sometimes we may feel like we could get more done if we simply work alone, but this puts an artificial limit on how effective you can be. Learn to get over the impulse to do it all yourself and trust in your colleagues enough to delegate some of your work. Everyone has different strengths and weaknesses, and great teams know how to balance these differences. Even though Miles Davis was a great trumpeter, his greatest performances were always collaborations with other greats, or at least with a backing band. Great musicians inspire each other to do their best and try to remove all creative hindrances. This hyper-creative environment just isn’t possible to replicate in isolation.

When we got a new metadata librarian here at FSU, I had been making my own MODS records for a few months and was uncomfortable with giving up control over this aspect of my workflow. I’ve since learned that this is his specialty and not mine, and I trust in his expertise. As a result, our projects now have better metadata, I have more time to work on other things that I do have expertise in, and I have learned a lot more about metadata than I ever could have working alone.


Learn to play backup

Everyone wants to play the solo. It’s the fun part, and all the attention is on you. There’s nothing wrong with wanting to shine, but if everyone solos at the same time it defeats the purpose and devolves into noise. Good jazz musicians may be known for their solos, but the greats know how to play in a way that supports others when it’s their turn to solo, too. They are more concerned with the sound of the band as a whole instead of selfishly focusing on their own sound.

A big part of trusting your “bandmates” is staying out of their way when it’s their turn to “solo”. Can you imagine trying to play music on stage while someone who doesn’t even play your instrument yells instructions at you about how you should be playing? That would be pretty distracting, but the office equivalent happens all the time. A micromanaging teammate can kill project morale quickly without even being aware of it. Sometimes projects have bottlenecks where no one can move forward until a specific thing gets done, and this is just a fact of life. If you are waiting for a team member to get something done so you can start on your part of the project, politely let them know that you are available if they need help or advice, and only provide help and advice if they ask. If they don’t need help, then politely stay out of their way.


Communication is key

Jazz musicians aren’t mind readers, but you might think they were after a great performance. It’s unbelievable how some bands can improvise in the midst of such complex patterns without getting lost. This is because improvisation requires a great deal of communication. Musicians communicate to each other using a variety of cues, either musical (one might drop in volume to signal the end of a solo), physical (one might step towards the center of the group to signal the start of a solo and then step away to signal the end), or visual (one might nod, wink or shift their foot as a signal to the rest of the group). These cue systems are all specific to the context of people performing on stage, but we can imagine a different set of cues for a team project that work just as well.

Like jazz performances, team projects can be incredibly complex, and a successful project requires all team members to be aware of their context. It is essential that everyone knows exactly where a project is on its timeline so that they can act accordingly, and this information can be expressed in a variety of ways. Email is a popular choice, as it leaves a written record of who said what that can be consulted later. Email is great at communicating small, specific bits of information, but it is always helpful to have a “30,000 foot view” of the project as well so the team can see the big picture. Fellow LITA blogger Leo Stezano wrote a post about different ways to keep track of a project’s high-level progress, covering the use of software, spreadsheets, and the classic “post-it notes on a whiteboard” approach. I prefer to use Trello since it combines the simplicity of post-it notes on a wall with the flexibility of software, but there are a lot of options. The best option is whatever works for your team.

Equally important to finding good ways to communicate and sticking with them is uncovering harmful methods of communication and stopping them. Don’t send emails about a project to the rest of your team outside of working hours; it sends the wrong message about work-life balance. Try to eliminate unnecessary meetings and replace them with emails if you can. Emails are asynchronous and team members can respond when it is convenient for them, but meetings pollute our schedules and are productivity kryptonite. Finally, don’t drop into someone’s office unannounced (I do this all the time). Send an email or schedule a short meeting instead. Random office drop-ins derail the victim’s train of thought and send the signal that whatever they were working on isn’t as important as you are. Can you imagine Miles Davis tapping John Coltrane on the shoulder during a solo to ask what song they should play next? I didn’t think so. Being considerate with your communication is an underrated skill that may be the secret sauce that makes your project run more smoothly.

Brown University Library Digital Technologies Projects: Announcing a Researchers @ Brown data service

planet code4lib - Mon, 2015-04-06 13:59

Campus developers might want to use data from Researchers@Brown (R@B) in other websites. The R@B team has developed a JSON web service that allows for this.  We think it will satisfy many uses on campus. Please give it a try and send feedback to

Main types/resources
  • faculty
  • organizational units (departments, centers, programs, institutes, etc.)
  • research topics
Requesting data

To request data, begin with an identifier.  Let’s use Prof. Diane Lipscombe as an example:


Looking through the response, you will notice affiliations and topics from Prof. Lipscombe’s profile. You can make additional requests for information about those types by following the “more” link in the response.


Following the affiliations links from a faculty data profile will return information about the Department of Neuroscience, of which Prof. Lipscombe is a member.


Looking up this topic will return more information about the research topic “molecular biology”, including other faculty who have identified this as a research interest.
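
The walkthrough above can be sketched in code. This is a hypothetical client, not the real service: the field names ("affiliations", "topics", "more") follow the description above, but the exact response shapes and paths are assumptions, and the in-memory dictionary stands in for HTTP GETs against the service.

```python
# Sketch of walking "more" links in R@B-style JSON responses.
# Response shapes and paths below are assumptions for illustration;
# a real client would GET each "more" URL over HTTP.

faculty = {
    "first_name": "Diane",
    "last_name": "Lipscombe",
    "affiliations": [{"name": "Department of Neuroscience",
                      "more": "/api/org/neuroscience"}],
    "topics": [{"name": "molecular biology",
                "more": "/api/topic/molecular-biology"}],
}

# A stand-in for HTTP GETs against the service.
fake_service = {
    "/api/org/neuroscience": {"name": "Department of Neuroscience",
                              "faculty": [{"name": "Diane Lipscombe"}]},
    "/api/topic/molecular-biology": {"name": "molecular biology",
                                     "faculty": [{"name": "Diane Lipscombe"}]},
}

def follow_more(record: dict, key: str) -> list:
    """Resolve each entry under `key` by following its 'more' link."""
    return [fake_service[entry["more"]] for entry in record.get(key, [])]

for org in follow_more(faculty, "affiliations"):
    print(org["name"])  # Department of Neuroscience
```

The same `follow_more` call with `"topics"` would return the topic record, including the other faculty who share that research interest.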

Responses

Faculty
  • first name
  • last name
  • middle
  • title
  • Brown email
  • url (R@B)
  • thumbnail
  • image – original image uploaded
  • affiliations – list with lookups
  • overview – this is HTML and may contain links or other formatting
  • topics – list with lookups

Research topics
  • name
  • image (if available)
  • url (to R@B)
  • affiliations – list with lookups

Organizational units
  • name
  • url (to R@B)
  • faculty – list with lookups
Technical Details
  • Requests are cached for 18 hours.
  • CORS support for embedding in other sites with JavaScript.
  • JSONP for use in browsers that don’t support CORS.
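
To make the JSONP fallback concrete, here is a sketch of what the padding and unwrapping look like. The callback name `handleFaculty` and the payload are made up for illustration; in a browser the unwrapping happens implicitly when the returned script executes.

```python
import json
import re

# Sketch: JSONP wraps a JSON body in a caller-named function call so it
# can be loaded via a <script> tag in browsers without CORS support.
# The callback name "handleFaculty" is hypothetical.

def to_jsonp(data: dict, callback: str) -> str:
    """Wrap a JSON-serializable object in callback padding."""
    return f"{callback}({json.dumps(data)});"

def from_jsonp(payload: str) -> dict:
    """Strip the callback padding and parse the JSON inside."""
    match = re.fullmatch(r"\s*[\w.$]+\((.*)\);?\s*", payload, re.DOTALL)
    if not match:
        raise ValueError("not a JSONP payload")
    return json.loads(match.group(1))

wrapped = to_jsonp({"first_name": "Diane"}, "handleFaculty")
print(wrapped)              # handleFaculty({"first_name": "Diane"});
print(from_jsonp(wrapped))  # {'first_name': 'Diane'}
```

With CORS available, none of this is needed; the padding exists only so older browsers can receive cross-origin data as an executable script.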
Example implementation

As an example, we have prepared a demonstration of using the R@B data service with JavaScript and the React framework.

David Rosenthal: The Mystery of the Missing Dataset

planet code4lib - Sun, 2015-04-05 19:00
I was interviewed for an upcoming news article in Nature about the problem of link rot in scientific publications, based on the recent Klein et al paper in PLoS One. The paper is full of great statistical data but, as would be expected in a scientific paper, lacks the personal stories that would improve a news article.

I mentioned the interview over dinner with my step-daughter, who was featured in the very first post to this blog when she was a grad student. She immediately said that her current work is hamstrung by precisely the kind of link rot Klein et al investigated. She is frustrated because the dataset from a widely cited paper has vanished from the Web. Below the fold, a working post that I will update as the search for this dataset continues.

My step-daughter works on sustainability and life-cycle analysis. Here is her account of the background to her search:
The data was originally recommended to me by one of our scientific advisors at [a previous company] for use in the software we were developing and for our use in our consulting work. On their recommendation I googled "impact2002+" and found my way to the download page. I originally downloaded it in summer 2011.

It is a model for characterizing environmental flows into impacts. This is incredibly useful when looking at hundreds of pollutants and resource uses across a supply chain to understand how they roll-up into impacts to human health, ecosystem quality, and resources. For example it estimates the disability adjusted life years (impact to human life expectancy) associated with a release of various pollutants to air/land/soil. Another example is the estimate of the ecosystem quality loss (biodiversity loss) associated with various chemical emissions. Another example is the estimate of the future energy required to extract an incremental amount of additional minerals or energy resources (e.g. coal).

I looked for it again in summer 2014 when I noticed it was gone. I always assumed that by just searching "Impact2002+" I'd be able to find the data again - how wrong I was!

I reached out to the webmaster listed on the University of Michigan site and actually got a response, but after a couple of emails requesting the data with no luck I stopped pursuing that path. I ended up purchasing a dataset that has some of the Impact2002+ data embedded in it, but there are still some pieces of my analysis that are limited by not having the original dataset.

Here is where the search starts. In 2003, Olivier Jolliet et al published IMPACT 2002+: A new life cycle impact assessment methodology:
The new IMPACT 2002+ life cycle impact assessment methodology proposes a feasible implementation of a combined midpoint/damage approach, linking all types of life cycle inventory results (elementary flows and other interventions) via 14 midpoint categories to four damage categories. ... The IMPACT 2002+ method presently provides characterization factors for almost 1500 different LCI-results, which can be downloaded at …

In its field, this is an extremely important paper. Google Scholar finds 810 citations to it. Unfortunately, this isn't a paper for which Springer provides article-level metrics. The International Journal of Life Cycle Assessment, in which the paper was published, is ranked 8th in the Sustainable Development field by Google's Scholar Metrics. Its h5-median index is 54, so a paper with 810 citations is vastly more cited than the papers it typically publishes.

The authors very creditably provided their data, the 1500 characterization factors, for download from the specified URL. That link now redirects to a page that returns a 404 Not Found error, so it has unambiguously rotted. The Wayback Machine does not have that page, although it has over 1000 other URLs from the site; nor does the Memento Time Travel service. So not merely has the link rotted, but there don't appear to be any archived versions of the data supporting the paper.
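When chasing a rotted link like this, the Internet Archive's Availability API is the programmatic equivalent of checking the Wayback Machine by hand. Here is a minimal sketch in Python; the endpoint is real, but the helper names and example URL are illustrative:

```python
from urllib.parse import urlencode

# Wayback Machine Availability API: reports the closest archived snapshot
# of a URL, if any. (For this dataset it would report nothing, since the
# form-gated download page was never captured.)
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

def availability_query(url, timestamp=None):
    """Build the API query URL; timestamp (YYYYMMDDhhmmss) is optional."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)

def closest_snapshot(response):
    """Pull the closest snapshot URL out of a decoded JSON response,
    or return None when the page was never archived."""
    snap = response.get("archived_snapshots", {}).get("closest")
    if snap and snap.get("available"):
        return snap["url"]
    return None
```

Fetching the query URL with any HTTP client and passing the decoded JSON to `closest_snapshot` either yields a web.archive.org URL or confirms, as in this case, that no archived copy exists.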

The bookmark my step-daughter had for the dataset was a page that links to a second page, which redirects to the broken URL.

The Wayback Machine has 11 captures of the page between February 11, 2002 and July 7, 2014. The most recent is actually a capture of the page it redirected to at Michigan's School of Public Health, which now returns 404. That page said:
In order to access the IMPACT 2002+ model we ask that you provide us with your name, affiliation and email address at the bottom of this page. You do not have to be affiliated with the Center for Risk Science and Communication or the University of Michigan to access the IMPACT 2002 model. Your information will only be used to notify you of any updates concerning the model. Your data will be kept strictly confidential.

This is the explanation for the lack of any archived versions of the dataset. Web crawlers, such as the Internet Archive's Heritrix, are unable to fill out Web forms without site-specific knowledge, which in this case was obviously not available.

Similarly, in 2005 the Internet Archive captured pages from the EPFL site before the move to Michigan. They included this page describing the IMPACT2002+ method, which used a form to ask for:
your name, affiliation and your email-address, which will enable us to keep you informed about important updates from time to time. None of your data will be transmitted to anyone else. Then you can download the following files concerning the IMPACT 2002+ method ... Your data are not used to control or restrict the download, but will help us to keep you informed about updates concerning the IMPACT 2002+ methodology.

Again, archiving of the freely downloadable data was prevented.

One obvious lesson from this is that authors should be strongly discouraged from forcing researchers to supply information, such as names and e-mail addresses, before they can download data that has been made freely available. As this case shows, the likely result is that, with the ravages of time, the data becomes totally unavailable. It seems likely that this dataset became unavailable as a side-effect of the Risk Science Center migrating to its own website rather than remaining part of the School of Public Health's website.

Another lesson is the completely inadequate state of Institutional Repositories. The University of Michigan's IR, Deep Blue, contains only 6 of the 76 "Selected Publications" from Olivier Jolliet's Michigan home page, though it has PDFs of their full text. Infoscience, the EPFL IR, lists 58 publications with Olivier Jolliet as an author, including the paper in question, but for that it says:

There is no available fulltext. Please contact the lab or the authors.

and:

The IMPACT 2002+ method presently provides characterization factors for almost 1500 different LCI-results, which can be downloaded at the specified URL.

This is no longer the case. Note that ResearchGate claims to know about 177 publications from Olivier Jolliet.

Patrick Hochstenbach: Penguins Are Back

planet code4lib - Sun, 2015-04-05 09:02
Filed under: Doodles Tagged: aprilfools, cartoon, comic, easter, Penguin

Galen Charlton: Three tales regarding a decrease in the number of catalogers

planet code4lib - Sat, 2015-04-04 20:25

Discussions on Twitter today – see the timelines of @cm_harlow and @erinaleach for entry points – got me thinking.

In 1991, the Library of Congress had 745 staff in its Cataloging Directorate.[1] By the end of FY 2004, the LC Bibliographic Access Divisions had between 506[1] and 561[2] staff.

What about now? As of 2014, the Acquisitions and Bibliographic Access unit has 238 staff.[3]

While I’m sure one could quibble about the details (counting FTE vs. counting humans, accounting for the reorganizations, and so forth), the trend is clear: there has been a precipitous drop in the number of cataloging staff employed by the Library of Congress.
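As a back-of-the-envelope check on that trend (a sketch only: the headcounts come from the figures quoted above, and the FY 2004 range is represented here by its upper bound):

```python
# LC cataloging-related staff counts quoted above; FY 2004 upper bound used.
staff = {1991: 745, 2004: 561, 2014: 238}

def percent_decline(start, end):
    """Percentage drop in headcount between two years, rounded."""
    return round(100 * (staff[start] - staff[end]) / staff[start])

# 1991 -> 2014 works out to roughly a two-thirds reduction.
```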

I’ll blithely ignore factors such as shifts in the political climate in the U.S. and how they affect civil service. Instead, I’ll focus on library technology, and spin three tales.

The tale of the library technologists

The decrease in the number of cataloging staff is one consequence of a triumph of library automation. The tools that we library technologists have written allow catalogers to work more efficiently. Sure, there are fewer of them, but that’s mostly been due to retirements. Not only that, the ones who are left are now free to work on more intellectually interesting tasks.

If we, the library technologists, can but slip the bonds of legacy cruft like the MARC record, we can make further gains in the expressiveness of our tools and the efficiencies they can achieve. We will be able to take advantage of metadata produced by other institutions and people for their own ends, enabling library metadata specialists to concern themselves with larger-scale issues.

Moreover, once our data is out there – who knows what others, including our patrons, can achieve with it?

This will of course be pretty disruptive, but as traditional library catalogers retire, we’ll reach buy-in. The library administrators have been pushing us to make more efficient systems, though we wish that they would invest more money in the systems departments.

We find that the catalogers are quite nice to work with one-on-one, but we don’t understand why they seem so attached to an ancient format that was only meant for record interchange.

The tale of the catalogers

The decrease in the number of cataloging staff reflects a success of library administration in their efforts to save money – but why is it always at our expense? We firmly believe that our work with the library catalog/metadata services counts as a public service, and we wish more of our public services colleagues knew how to use the catalog better.  We know for a fact that what doesn’t get catalogued may as well not exist in the library.

We also know that what gets catalogued badly or inconsistently can cause real problems for patrons trying to use the library’s collection.  We’ve seen what vendor cataloging can be like – and while sometimes it’s very good, often it’s terrible.

We are not just a cost center. We desperately want better tools, but we also don’t think that it’s possible to completely remove humans from the process of building and improving our metadata. 

We find that the library technologists are quite nice to work with one-on-one – but it is quite rare that we get to actually speak with a programmer.  We wish that the ILS vendors would listen to us more.

The tale of the library directors

The decrease in the number of cataloging staff at the Library of Congress is only partially relevant to the libraries we run, but hopefully somebody has figured out how to do cataloging more cheaply. We’re trying to make do with the money we’re allocated. Sometimes we’re fortunate enough to get a library funding initiative passed, but more often we’re trying to make do with less: sometimes to the point where flu season makes us super-nervous about our ability to keep all of the branches open.

We’re concerned not only with how much of our budgets are going into electronic resources, but with how nigh-impossible it is to predict increases in fees for ejournal subscriptions/ fees for ebook services.

We find that the catalogers and the library technologists are pleasant enough to talk to, but we’re not sure how well they see the big picture – and we dearly wish they could clearly articulate how yet another cataloging standard / yet another systems migration will make our budgets any more manageable.

Each of these tales is true. Each of these tales is a lie. Many other tales could be told. Fuzziness abounds.

However, there is one thing that seems clear: conversations about the future of library data and library systems involve people with radically different points of view. These differences do not mean that any of the people engaged in the conversations are villains, or do not care about library users, or are unwilling to learn new things.

The differences do mean that it can be all too easy for conversations to fall apart or get derailed.

We need to practice listening.

1. From testimony by the president of the Library of Congress Professional Guild to Congress on 6 March 2015.
2. From the BA FY 2004 report. This includes 32 staff from the Cataloging Distribution Service, which had been merged into BA and had not been part of the Cataloging Directorate.
3. From testimony by the president of the Library of Congress Professional Guild to Congress on 6 March 2015.

Cynthia Ng: Musing: Playing Around with the NNELS Logo

planet code4lib - Sat, 2015-04-04 02:51
It’s come up recently that we might consider revising our logo. I saw a coworker playing around with it and thought I’d give it a try. The thinking behind it is simple. Transpose the letters into Braille, and then try to match the Braille version to a hexagonal grid. Turns out the hardest is the … Continue reading Musing: Playing Around with the NNELS Logo

HangingTogether: The Semi-Finals

planet code4lib - Fri, 2015-04-03 18:19

OCLC Research Collective Collections Tournament


Thirty-two conferences started this journey, and now only two remain. The OCLC Research Collective Collection tournament is just one step away from crowning a Champion. Throw your brackets away and buckle your seat belts, because the tournament semi-finals are over and the finals are next!

[Tournament bracket image; click to enlarge]

How many languages does your conference collective collection speak? Competition in the semi-finals centered around the number of languages represented in each conference’s collective collection.* In the first semi-finals match-up, Conference USA cruised to an easy victory over Summit League, 366 languages to 265 languages. In the second match-up, Atlantic 10 also had little trouble with its opponent, moving past Missouri Valley 374 languages to 289 languages. So Conference USA and Atlantic 10 will square off in the tournament finals, with the honor and glory of the title “2015 Collective Collections Tournament Champion” at stake!

As the results of the semi-finals competition show, conference collective collections are very multilingual. Atlantic 10 had the most languages of any competitor in this round, with more than 370. But even the conference with the fewest languages – Summit League – had 265 languages in its collective collection! Suppose that an average book is 1.25 inches thick. If Summit League stacked up one book for every language represented in its collection, the resulting pile would be almost 28 feet tall! If Atlantic 10 did it, the stack would be nearly 40 feet tall!
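The stack arithmetic is easy to verify (a sketch; the 1.25-inch average thickness is the text's own assumption):

```python
BOOK_THICKNESS_INCHES = 1.25  # average thickness assumed above

def stack_height_feet(num_languages):
    """Height of a one-book-per-language stack, in feet."""
    return num_languages * BOOK_THICKNESS_INCHES / 12

# Summit League, 265 languages -> about 27.6 feet ("almost 28 feet")
# Atlantic 10,   374 languages -> about 39 feet   ("nearly 40 feet")
```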

The mega-collective-collection of all libraries – as represented in the WorldCat bibliographic database – contains publications in 481 different languages. English is the most common language in WorldCat; here’s a look at the top 50 most frequently-found languages other than English:

[Word cloud of the 50 most frequent non-English languages in WorldCat; click to enlarge]

After English, the most common languages in WorldCat are German, French, Spanish, and Chinese. Despite the high number of English-language materials, more than half of the materials in WorldCat are non-English! And as we’ve seen, many of these non-English-language publications have found their way into the collective collections of our tournament semi-finalists! So are you interested in reading something in Urdu? Atlantic 10 has nearly 2,300 Urdu-language publications to choose from. How about Welsh? Conference USA can furnish you with nearly 1,400 publications in Welsh. No matter what language you’re interested in, these collective collections likely have something for you – they speak a lot of languages!

Bracket competition participants: Remember, even if the conference you chose is not in the Finals, hope still flickers! If no one picked the tournament Champion, all entrants will be part of a random drawing for the big prize!

Get set for the Tournament Finals! Results will be posted April 6.


*Number of languages represented in language-based (text or spoken) publications comprising each conference collective collection. Data is current as of January 2015.

More information:

Introducing the 2015 OCLC Research Collective Collections Tournament! Madness!

OCLC Research Collective Collections Tournament: Round of 32 Bracket Revealed!

Round of 32: Blow-outs, buzzer-beaters, and upsets!

Round of 16: The plot thickens … and so do the books

Round of 8: Peaches and Pumpkins

About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.

Mail | Web | LinkedIn | More Posts (13)

FOSS4Lib Updated Packages: The Great Reading Adventure

planet code4lib - Fri, 2015-04-03 16:59

Last updated April 3, 2015. Created by Jim Craner on April 3, 2015.
Log in to edit this page.

From The Great Reading Adventure website:

"The Great Reading Adventure is a robust, open source software designed to manage library reading programs. It is currently in its second version... The Great Reading Adventure was developed by the Maricopa County Library District with support by the Arizona State Library, Archives and Public Records, a division of the Secretary of State, with federal funds from the Institute of Museum and Library Services."

The Great Reading Adventure lets libraries and library consortia set up a full online summer reading program for patrons. Features include reporting, customization per library, digital badges, avatars, reading lists, and much more.

The software runs on a Windows IIS/MSSQL server.

License: MIT License
Development Status: Production/Stable
Operating System: Windows
Database: MSSQL

John Miedema: Lila “tears down” old categories and suggests new ways of looking at content. Word concreteness is a good candidate.

planet code4lib - Fri, 2015-04-03 14:22

Many of the good things we love about language are essentially hierarchical. Narrative is linear: a beginning, middle, and end. Order shapes the story. Hierarchy gives a bird’s eye view, a table of contents, a summary that allows a reader to consider a work as a whole.

Lila will compute hierarchy by comparing passages on word qualities that suggest order. Concreteness is considered a good candidate. Passages with more abstract words express ideas and concepts, whereas passages with more concrete words express examples. Of the views that Lila can suggest, it is useful to have a view that presents abstract concepts first and concrete examples second. I have listed four candidate qualities here, but I will focus in the posts that follow on concreteness.

1. Abstract vs. Concrete
  • Abstract: Intangible qualities, ideas and concepts. Different than frequency of word usage; both academic terms and colorful prose can have low word frequency. Examples: freedom (227*), justice (307), love (311)
  • Concrete: Tangible examples, illustrations and sensory experience. Examples: grasshopper (660*), tomato (662), milk (670)

2. General vs. Specific
  • General: Categories and groupings. Similar to 1, but 1 is more dichotomous and this one is more of a range. Example: furniture
  • Specific: Particular instances. Example: La-Z-Boy rocker-recliner

3. Logical vs. Emotional/Sentimental
  • Logical: Analytical thinking, understatement and fact. Note the conflict with 1 and 2: facts are both logical and concrete. Example: "The fastest land-dwelling creature is the cheetah."
  • Emotional/Sentimental: Feeling, emphasis, opinion. Can take advantage of the vast amount of sentiment measures available. Example: "The ugliest sea creature is the manatee."

4. Static vs. Dynamic
  • Static: Constancy and passivity. Example: "It was earlier demonstrated that heart attacks can be caused by high stress."
  • Dynamic: Change and activity; energy. Example: "Researchers earlier showed that high stress can cause heart attacks."

* Concreteness index. MRC Psycholinguistic database. Grasshopper is a more concrete word than freedom. Indexes like the MRC can be used to compute concreteness for passages.
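A minimal sketch of how such an index could drive Lila's abstract-first ordering. The six ratings are the examples from the table; everything else (function names, whitespace tokenization) is illustrative, and a real implementation would load the full MRC database:

```python
# Illustrative MRC-style concreteness ratings, taken from the table above.
CONCRETENESS = {
    "freedom": 227, "justice": 307, "love": 311,
    "grasshopper": 660, "tomato": 662, "milk": 670,
}

def passage_concreteness(text):
    """Mean concreteness over rated words; None if no word is rated."""
    scores = [CONCRETENESS[w] for w in text.lower().split() if w in CONCRETENESS]
    return sum(scores) / len(scores) if scores else None

def abstract_first(passages):
    """Order passages so abstract concepts precede concrete examples."""
    return sorted(
        (p for p in passages if passage_concreteness(p) is not None),
        key=passage_concreteness,
    )
```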

Lila can compute hierarchy for passages, and for groups of passages. Together, it builds a hierarchy, a view of how the content can be organized. Think of what this offers a writer. A writer stuck in his or her manually produced categories and view can ask Lila for alternate views. Lila “tears down” the old categories and suggests a new way of looking at the content. It is unlikely that the writer will stick exactly to Lila’s view, but it could provide a fresh start or give new insight. And Lila can compute new views dynamically, on demand, as the content changes.

LITA: Agile Development: Tracking Progress

planet code4lib - Fri, 2015-04-03 14:00

Image courtesy of Wikipedia (Jeff Iasovski)

In my last post, I discussed effort estimation and scheduling, which leads into the beginning of actual development. But first, you need to decide how you’re going to track progress. Here are some commonly used methods:

The Big Board

In keeping with Agile philosophy, you should choose the simplest tool that gives you the functionality you need. If your team does all of its development work in the same physical space, you could get by with post-it notes on a big white board. There’s a lot to be said for a tangible object: it communicates the independent nature of each task or story in a way that software may not. It provides the team with a ready-made meeting point: if you want to see how the project is going, you have to go stand in front of the big board. A board can also help to keep projects lean and simple, because there’s only so much available space on it. There are no multiple screens or pages to hide complexity.

Sticky notes, however, are ephemeral in nature. You can lose your entire project plan to an overzealous janitor; more importantly, unless you periodically take pictures of your board, there’s no way to trace user story evolution. Personally, I like to use this method in the initial stages of planning; the board is a very useful anchor for user story definition and prioritization. Once we move into the development process, I find that moving into the virtual realm adds crucial flexibility and tracking functionality.


Spreadsheets

If the scope of the project is limited, it may be possible to track it using a basic office productivity suite like MS Office. MS Excel and similar spreadsheet tools are fairly easy to use, and they’re ubiquitous, which means your team will likely face a lower learning curve. Remember that in Agile the business side of the organization is an integral part of the development effort, and it may not make sense to spend time and effort to train sales and management staff on a complex tracking tool.

If you choose to go the spreadsheet route, however, you are giving up some functionality: it’s easy enough to create and maintain spreadsheets that give you project snapshots and track current progress, but this type of software is not designed to accurately measure long term progress and productivity, which helps you upgrade your processes and increase your team’s efficiency. There are ways to track Agile metrics using Excel, but if you find that you need to do that you may just want to switch to dedicated software anyway.
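The metrics themselves are simple; what dedicated tools add is the bookkeeping. A hypothetical sketch of the two most common ones (the numbers in the comments and tests are invented):

```python
def burndown(committed_points, completed_per_day):
    """Story points remaining at the end of each sprint day."""
    remaining, series = committed_points, []
    for done in completed_per_day:
        remaining -= done
        series.append(remaining)
    return series

def velocity(points_per_sprint):
    """Average points completed per sprint; feeds future capacity estimates."""
    return sum(points_per_sprint) / len(points_per_sprint)
```

A flat day on the burndown (no points completed) shows up immediately as a plateau, which is exactly the signal a spreadsheet chart or a tool like JIRA surfaces for you.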

Tracking Software

There are several tracking tools out there that can help manage Agile projects, although my personal experience so far has been limited to JIRA and its companion GreenHopper. JIRA is a fairly simple issue-tracking tool: you can create issues (manually or directly from a reporting form), add a description, estimate effort, prioritize, and assign to a team member for completion. You can also track an issue through the various stages of development, adding comments at each step of the way and preserving meaningful conversations about its progress and evolution. As you can see in this article comparing similar tools, JIRA’s main advantage is the lack of unnecessary UI complexity, which makes it easier to master. Its main shortcoming is the lack of sprint management functionality, which is what GreenHopper provides. With the add-on, users can create sprints, assign tickets to them, and track sprint progress.

Can all of this functionality be replicated using spreadsheets? Yes, although maintenance and authentication can become problematic as the complexity of the project increases. At some point a tool like JIRA starts to pay for itself in terms of increased efficiency, and most if not all of these products are web-based and offer some sort of free trial or small-enterprise pricing. My advice is to analyze your operations to determine whether you need to go the tracking-tool route, and then do some basic research to identify popular options and their pros and cons. Once you’ve identified one or two options that seem to fit your needs, give them a try to see if they’re what you’re looking for.

Again, which method you go with will depend on how much effort you will need to spend up front (in training and adapting new software) versus later on (added maintenance and decreased efficiency).

How do you track user story progress? What are the big advantages/disadvantages of your chosen method? JIRA in particular seems to elicit strong feelings in users, positive or negative; what are your thoughts on it?

DuraSpace News: OR2015 Conference Stands Behind Commitment to Ensure All Participants are Treated With Respect

planet code4lib - Fri, 2015-04-03 00:00

Indianapolis, IN  The Open Repositories 2015 conference will take place June 8-11 in Indianapolis and is wholly committed to creating an open and inclusive conference environment. As expressed in its Code of Conduct, OR is dedicated to providing a welcoming and positive experience for everyone and to having an environment in which all colleagues are treated with dignity and respect.

LITA: Let’s Hack a Collaborative Library Website!

planet code4lib - Thu, 2015-04-02 16:02
A LITA Preconference at 2015 ALA Annual

Register online for the ALA Annual Conference and add a LITA Preconference

Friday, June 26, 2015, 8:30am – 4:00pm

In this hackathon attendees will learn to use the Bootstrap front-end framework and the Git version control system to create, modify and share code for a new library website. Expect a friendly atmosphere and a creative hands-on experience that will introduce you to web literacy for the 21st century librarian. The morning will consist of in-depth introductions to the tools, while the afternoon will see participants split into working groups to build a single collaborative library website.

What is Bootstrap

Bootstrap is an open-source, responsively designed front-end web framework that can be used for everything from complete website redesigns to rapid prototyping. It is useful for many library web applications, such as customizing LibGuides (version 2) or creating responsive sites. This workshop will give attendees a crash course in the basics of what Bootstrap can do and how to code it. Attendees can work individually or in teams.

What is Git

Git is an open-source software tool that allows you to manage drafts and collaboratively work on projects – whether you’re building a library app, writing a paper, or organizing a talk. We will also talk about GitHub, a massively popular website that hosts git projects and has built-in features like issue tracking and simple web page hosting.

Additional resources

Bootstrap, LibGuides, & Potential Web Domination  – Discussion of the use of Bootstrap at the Van Library, University of St. Francis

Libraries using Bootstrap example:
Bradford County Public Library

Library Code Year Interest Group

This program was put together by the ALCTS/LITA Library Code Year Interest Group which is devoted to supporting members who want to improve their computer programming skills. Find out more here.


Kate Bronstad, Web Developer, Tisch Library, Tufts University
Kate is a librarian-turned-web developer for Tufts University’s Tisch Library. She works with git on a daily basis and teaches classes on git for the Boston chapter of Girl Develop It. Kate is originally from Austin, TX and has a MSIS from UT-Austin.

Heather J Klish, Systems Librarian, Tufts University
Heather is the Systems Librarian in University Library Technology at Tufts University. Heather has an MLS from Simmons College.


Junior Tidal, New York City College of Technology
Junior is the Multimedia and Web Services Librarian and Assistant Professor for the Ursula C. Schwerin Library at the New York City College of Technology, City University of New York. His research interests include mobile web development, usability, web metrics, and information architecture. He has published in the Journal of Web Librarianship, OCLC Systems & Services, Computers in Libraries, and code4Lib Journal. He has written a LITA guide entitled Usability and the Mobile Web published by ALA TechSource. Originally from Whitesburg, Kentucky, he has earned a MLS and a Master’s in Information Science from Indiana University.



  • LITA Member $235 (coupon code: LITA2015)
  • ALA Member $350
  • Non-Member $380


To register for any of these events, you can include them with your initial conference registration or add them later using the unique link in your email confirmation. If you don’t have your registration confirmation handy, you can request a copy by email. You also have the option of registering for a preconference only. To receive the LITA member pricing during the registration process, enter the discount promotional code LITA2015 on the Personal Information page.

Register online for the ALA Annual Conference and add a LITA Preconference
Call ALA Registration at 1-800-974-3084
Onsite registration will also be accepted in San Francisco.

Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4269 or Mark Beatty.

pinboard: NE Code4Lib- Eventbrite

planet code4lib - Thu, 2015-04-02 15:29
Registered! #NEC4L15 #code4lib


Subscribe to code4lib aggregator