Planet Code4Lib
Updated: 8 hours 53 min ago

LITA: 3D Printing Partnerships: Tales Of Collaboration, Prototyping, And Just Plain Panic

Fri, 2015-09-04 14:00


Photo taken from Flickr, CC Attribution license.

Many institutions have seen the rise of makerspaces within their libraries, but it’s still difficult to get a sense of how embedded they truly are within the academic fabric of their campuses and how they contribute to student learning. Libraries have undergone significant changes in the last five years, shifting from repositories to learning spaces, from places to experiences. It is within these new directions that the makerspace movement has risen to the forefront and begun to pave the way for truly transformative thinking and doing. Educause defines a makerspace as “a physical location where people gather to share resources and knowledge, work on projects, network, and build” (ELI 2013). These types of spaces are being embraced by the arts as well as the sciences and are quickly being adopted by the academic community because “much of the value of a makerspace lies in its informal character and its appeal to the spirit of invention” as students take control of their own learning (ELI 2013).

Nowhere is this spirit more alive than in entrepreneurship, where creativity and innovation are the norm. The Oklahoma State University Library recently established a formal partnership with the School of Entrepreneurship to embed 3D printing into two pilot sections of its EEE 3023 course, with the idea that, if successful, all sections of this course would include a making component that could involve more advanced equipment down the road. Students in this class work in teams to develop an original product from idea, to design, to marketing. The library provides training on the design process and use of the equipment, as well as technical assistance for each team. In addition, this partnership includes outreach activities such as featuring the printers at entrepreneurship career fairs, startup weekends, and poster pitch sessions. We have not yet started working with the classes, so much of this will likely change as we learn from our mistakes and apply what worked well to future iterations of this project.

This is all well and good, but how did we arrive at this stage of the process? The library first approached the School of Entrepreneurship with an idea for collaboration, but as we discovered, simply saying we wanted to partner would not be enough. We didn’t have a clear idea in mind, and the discussions ended without a concrete action plan. Fast forward to the summer, when the library was approached and asked about something that had been mentioned in that meeting: a makerspace. Were we interested in splitting the cost and piloting a project with a course? The answer was a resounding yes.

We quickly met several times to discuss exactly what we meant by “makerspace,” and we decided that 3D printing would be a good place to start. We drafted an outline of the equipment needed: three MakerBot Replicator (5th generation) printers and one larger Replicator Z18, along with the accompanying accessories and warranties. This information was gathered from the collective experience of the group, along with a few quick website searches to establish what other institutions were doing.

Next, we turned our attention to the curriculum. While creating learning outcomes for making is certainly part of the equation, we had a very short time frame, so we opted for two sets of workshops with homework in between, culminating in a certification that enables each team to work on its product. The first workshop will walk students through using Blender to create a basic original design; the second will have them try out the printers themselves. In between workshops, they will watch videos and have access to a book to help them learn as they go. The certification at the end will consist of each team coming in and printing something (small) on their own, after which they will be cleared to work on their own products. Drop-in assistance as well as consultation assistance will also be available, and we are determining the best way to queue requests as they come in, knowing that some jobs might print overnight while others may come in at the very last minute.

Although, as mentioned, we have just started on this project, we’ve already learned several valuable lessons worth sharing. They may sound obvious, but they are still important to highlight:

  1. Be flexible! Nothing spells disaster like a rigid plan that cannot be changed at the last minute. We wanted a website for the project, but we didn’t have time to create one. We had to wait until we received the printers to train ourselves on how they worked so that we could turn around and train the students. We are adapting as we go!
  2. Start small. Even two sections are proving to be a challenge with 40+ students all descending on a small space with limited printers. We hope they won’t come to blows, but we may have to play referee as much as consultant. There are well over 30 sections of this course that will present a much bigger challenge should we decide to incorporate this model into all of them.
  3. Have a plan in place, even if you end up changing it. We are now realizing that there are three main components to this collaboration, all of which need a point person and support structure: tech support, curriculum, and outreach. Four separate departments in the library (Research and Learning Services, Access Services, Communications, and IT) are working together to make this a successful experience for all involved, not to mention our external partners.

Oh yes, and there’s the nagging thought at the end of each day: please, please, let this work. Fingers crossed!

Hydra Project: ActiveFedora 9.4.0 released

Fri, 2015-09-04 09:38

We are pleased to announce the release of ActiveFedora 9.4.0.

This release adds hash URIs for sub-resources, stops using InboundRelationConnection for speed, and refactors some existing code.

Release notes can be found here:

SearchHub: Using Thoth as a Real-Time Solr Monitor and Search Analysis Engine

Fri, 2015-09-04 08:00
As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Damiano Braga and Praneet Mhatre’s session on how Trulia uses Thoth and Solr for real-time monitoring and analysis.

Managing a large and diversified Solr search infrastructure can be challenging, and there is still a lack of good tools that can help monitor the entire system and help the scaling process. This session will cover Thoth: an open source real-time Solr monitor and search analysis engine that we wrote and currently use at Trulia. We will talk about how Thoth was designed, why we chose Solr to analyze Solr, and the challenges that we encountered while building and scaling the system. Then we will talk about some useful Thoth features, like integration with Apache ActiveMQ and Nagios for real-time paging, generation of reports on query volume, latency, and time period comparisons, and the Thoth dashboard. Following that, we will summarize our application of machine learning algorithms to query analysis and pattern recognition, and the results. Then we will talk about the future directions of Thoth, opportunities to expand the project with new plug-ins, and integration with SolrCloud.

Damiano is part of the search team at Trulia, where he also helps manage the search infrastructure and create internal tools to help the scaling process. Prior to Trulia, he studied and worked at the University of Ferrara (Italy), where he completed his Master’s degree in Computer Science Engineering. Praneet works as a Data Mining Engineer on Trulia’s Algorithms team. He works on property data handling algorithms, stats and trends generation, comparable homes, and other data-driven projects at Trulia. Before Trulia, he got his Bachelor’s degree in Computer Engineering from VJTI, India and his Master’s in Computer Science from the University of California, Irvine.

Thoth – Real-time Solr Monitor and Search Analysis Engine: Presented by Damiano Braga & Praneet Mhatre, Trulia from Lucidworks

Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr, on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Using Thoth as a Real-Time Solr Monitor and Search Analysis Engine appeared first on Lucidworks.

SearchHub: Lucene Revolution Presents, Inside Austin(‘s) City Limits: Stump The Chump!

Fri, 2015-09-04 00:35

It’s that time of year again folks…

Six weeks from today, Stump The Chump will be coming to Austin, Texas at Lucene/Solr Revolution 2015.

If you are not familiar with “Stump the Chump” it’s a Q&A style session where “The Chump” (that’s me) is put on the spot with tough, challenging, unusual questions about Solr & Lucene — live, on stage, in front of hundreds of rowdy convention goers, with judges (who have all had a chance to review and think about the questions in advance) taking the opportunity to mock The Chump (still me) and award prizes to people whose questions do the best job of “Stumping The Chump”.

If that sounds kind of insane, it’s because it kind of is.

You can see for yourself by checking out the videos from past events like Lucene/Solr Revolution Dublin 2013 and Lucene/Solr Revolution 2013 in San Diego, CA. (Unfortunately no video of Stump The Chump is available from Lucene/Solr Revolution 2014: D.C. due to audio problems.)

Information on how to submit questions is available on the conference website.

I’ll be posting more details as we get closer to the conference, but until then you can subscribe to this blog (or just the “Chump” tag) to stay informed.

The post Lucene Revolution Presents, Inside Austin(‘s) City Limits: Stump The Chump! appeared first on Lucidworks.

Jonathan Rochkind: bento_search 1.4 released

Thu, 2015-09-03 19:36

bento_search is a Ruby gem that provides a standardized Ruby API and other support for querying external search engines with HTTP APIs, retrieving results, and displaying them in Rails. It’s focused on search engines that return scholarly articles or citations.

I just released version 1.4.

The main new feature is a round-trippable JSON serialization of any BentoSearch::Results or Items. This serialization captures internal state suitable for a round trip, such that if you’ve changed configuration related to an engine between dump and load, you get the new configuration after load.  Its main use case is a consumer that is also Ruby software using bento_search. It is not really suitable for use as an API for external clients, since it doesn’t capture full semantics, just internal state sufficient to restore a Ruby object with full semantics. (bento_search does already provide a tool that supports an Atom serialization intended for external client API use.)

It’s interesting that once you start getting into serialization, you realize there’s no one true serialization; it depends on the use case. I needed a serialization that really was just of internal state, for a round trip back to Ruby.
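
To make the “internal state only” idea concrete, here is a minimal sketch of the pattern in Python. This is not bento_search’s actual Ruby API; the registry and function names are invented for illustration. The point is that only the engine identifier and item data are dumped, and configuration is looked up again at load time, so a configuration change made between dump and load shows up after load.

```python
import json

# Invented stand-in for engine configuration that can change between dump and
# load; the real gem keeps this kind of thing in application config, not in
# the serialized payload.
ENGINE_CONFIG = {"example_engine": {"base_url": "https://api.example.org/v1"}}

def dump_results(engine_id, items):
    """Serialize only internal state: which engine produced the results and
    the item data itself. Configuration is deliberately left out."""
    return json.dumps({"engine_id": engine_id, "items": items})

def load_results(payload):
    """Re-attach whatever configuration is current at load time, so a config
    change made between dump and load is picked up automatically."""
    state = json.loads(payload)
    return {"config": ENGINE_CONFIG[state["engine_id"]], "items": state["items"]}

saved = dump_results("example_engine", [{"title": "An article", "year": 2015}])
ENGINE_CONFIG["example_engine"]["base_url"] = "https://api.example.org/v2"
print(load_results(saved)["config"]["base_url"])  # prints the new v2 URL
```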

bento_search 1.4 also includes some improvements to make the specialty JournalTocsForJournal adapter a bit more robust. I am working on an implementation of JournalTocs fetching that needed the JSON round-trippable serialization too, for an Umlaut plug-in. Stay tuned.

Filed under: General

Harvard Library Innovation Lab: Link roundup September 3, 2015

Thu, 2015-09-03 19:10

Goodbye summer

You can now buy Star Wars’ adorable BB-8 droid and let it patrol your home | The Verge

If only overdue fines could be put toward a BB-8 to cruise around every library.

World Airports Voronoi

I want a World Airports Library map.

Stephen Colbert on Making The Late Show His Own | GQ

Amazing, deep interview with Stephen Colbert

See What Happens When Competing Brands Swap Colors | Mental Floss

See competing brands swap logo colors

The Website MLB Couldn’t Buy

Major League Baseball has worked hard to buy team domains, but it doesn’t own this one. It’s owned by two humans.

Zotero: Studying the Altmetrics of Zotero Data

Thu, 2015-09-03 18:26

In April of last year, we announced a partnership with the University of Montreal and Indiana University, funded by a grant from the Alfred P. Sloan Foundation, to examine the readership of reference sources across a range of platforms and to expand the Zotero API to enable bibliometric research on Zotero data.

The first part of this grant involved aggregating anonymized data from Zotero libraries. The initial dataset was limited to items with DOIs, and it included library counts and the months that items were added. For items in public libraries, the data also included titles, creators, and years, as well as links to the public libraries containing the items. We have been analyzing this anonymized, aggregated data with our research partners in Montreal, and now are beginning the process of making that data freely and publicly available, beginning with Impactstory and Altmetric, who have offered to conduct preliminary analysis (we’ll discuss Impactstory’s experience in a future post).

In our correspondence with Altmetric over the years, they have repeatedly shown interest in Zotero data, and we reached out to them to see if they would partner with us in examining the data. The Altmetric team that analyzed the data consists of about twenty people with backgrounds in English literature and computer science, including former researchers and librarians. Altmetric is interested in any communication that involves the use or spread of research outputs, so in addition to analyzing the initial dataset, they’re eager to add the upcoming API to their workflow.

The Altmetric team parsed the aggregated data and checked it against the set of documents known to have been mentioned or saved elsewhere, such as on blogs and social media. Their analysis revealed that approximately 60% of the items in their database that had been mentioned in at least one other place, such as on social media or news sites, had at least one save in Zotero. The Altmetric team was pleased to find such high coverage, which points to the diversity of Zotero usage, though further research will be needed to determine the distribution of items across disciplines.

The next step forward for the Altmetric team involves applying the data to other projects and tools such as the Altmetric bookmarklet. The data will be useful in understanding the impact of scholarly communication, because conjectures about reference manager data can be confirmed or denied, and this information can be studied in order to gain a greater comprehension of what such data represents and the best ways to interpret it.

Based on this initial collaboration, Zotero developers are verifying and refining the aggregation process in preparation for the release of a public API and dataset of anonymized, aggregated data, which will allow bibliometric data to be highlighted across the Zotero ecosystem and enable other researchers to study the readership of Zotero data.

Thom Hickey: Matching names to VIAF

Thu, 2015-09-03 18:11

The Virtual International Authority File (VIAF) currently has about 28 million entities created by a merge of three dozen authority files from around the world.  Here at OCLC we are finding it very useful in controlling names in records.  In the linked data world we are beginning to experience, 'controlling' means assigning URIs (or at least identifiers that can easily be converted to URIs) to the entities.  Because of ambiguities in VIAF and the bibliographic records we are matching it to, the process is a bit more complicated than you might imagine. In fact, our first naive attempts at matching were barely usable.  Since we know others are attempting to match VIAF to their files, we thought a description of how we go about it would be welcome (of course, if your file consists of bibliographic records and they are already in WorldCat, then we've already done the matching).  While a number of people have been involved in refining this process, most of the analysis and code was done by Jenny Toves here in OCLC Research over the last few years.

First some numbers: The 28 million entities in VIAF were derived from 53 million source records and 111 million bibliographic records. Although we do matching to other entities in VIAF, this post is about matching against VIAF's 24 million corporate and personal entities.  The file we are matching it to (WorldCat) consists of about 400 million bibliographic records (at least nominally in MARC-21), each of which has been assigned a work identifier before the matching described below. Of the 430 million names in author/contributor (1XX/7XX) fields in WorldCat we are able to match 356 million (or 83%).  If those headings were weighted by how many holdings are associated with them, the percentage controlled would be even higher, as names in the more popular records are more likely to have been subjected to authority control somewhere in the world.

It is important to understand the issues raised when pulling together the source files that VIAF is based on.  While we claim that better than 99% of the 54 million links that VIAF makes between source records are correct, that does not mean that the resulting clusters are 99% perfect.  In fact, many of the more common entities represented in VIAF will have not only a 'main' VIAF cluster, but one or more smaller clusters derived from authority records that we were unable to bring into the main cluster because of missing, duplicated or ambiguous information.  Another thing to keep in mind is that any relatively common name that has one or more famous people associated with it can be expected to have some misattributed titles (this is true for even the most carefully curated authority files of any size).

WorldCat has many headings with subfield 0's ($0s) that associate an identifier with the heading. This is very common in records loaded into WorldCat by some national libraries, such as French and German, so one of the first things we do in our matching is look for identifiers in $0's which can be mapped to VIAF.  When those mappings are unambiguous we use that VIAF identifier and are done.

The rest of this post is a description of what we do with the names that do not already have a usable identifier associated with them.  The main difficulties arise when either there are multiple VIAF clusters that look like good matches or we lack enough information to make a good match (e.g. no title or date match).  Since a poor link is often worse than no link at all, we do not make a link unless we are reasonably confident of it.

First we extract information about each name of interest in each of the bibliographic records:

  • Normalized name key:
    • Extract subfields $a, $q, and $j
    • Expand $a with $q when appropriate
    • Perform enhanced NACO normalization on the name
  • $b, $c's, $d, $0's, LCCNs, DDC class numbers, titles, language of cataloging, work identifier

The normalized name key does not include the dates ($d) because they are often not included in the headings in bibliographic records. The $b and $c are so variable, especially across languages, that they are also ignored at this point.  The goal is to have a key which will bring together variant forms of the name without pulling too many different entities together. After preliminary matching we do matching with more precision, and $b, $c and $d are used for that.

Similar normalized name keys are generated from the names in VIAF clusters.
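
As a rough illustration, here is a minimal Python sketch of this kind of key building. It assumes headings have already been parsed into simple subfield dicts, and the normalization function is a crude stand-in for the enhanced NACO rules, which are considerably more involved.

```python
import re
import unicodedata

def naco_like_normalize(text):
    """Crude stand-in for enhanced NACO normalization: strip diacritics,
    drop punctuation, collapse whitespace, and uppercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks)
    return re.sub(r"\s+", " ", cleaned).strip().upper()

def name_key(heading):
    """Build a normalized name key from a heading's subfields (a plain dict
    here, e.g. {'a': 'Verne, Jules,', 'd': '1828-1905'}). As described above,
    $d, $b and $c are deliberately left out of the key."""
    parts = [heading.get(code, "") for code in ("a", "q", "j")]
    return naco_like_normalize(" ".join(p for p in parts if p))

# Two variant headings for the same person collapse to one key:
assert name_key({"a": "Verne, Jules,", "d": "1828-1905"}) == \
       name_key({"a": "Verne, Jules"})
```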

When evaluating matches we have a routine that scores the match based on criteria about the names (a minimal sketch of such a scorer follows the list):

  • Start out with '0'
    • A negative value implies the names do not match
    • A 0 implies the names are compatible (nothing to indicate they can't represent the same entity), but nothing beyond that
    • Increasing positive values imply increasing confidence in the match
  • -1 if dates conflict*
  • +1 if a begin or end date matches
  • +1 if both begin and end dates match
  • +1 if begin and end dates are birth and death dates (as opposed to circa or flourished)
  • +1 if there is at least one title match
  • +1 if there is at least one LCCN match
  • -3 if $b's do not match
  • +1 if $c's match
  • +1 if DDCs match
  • +1 if the match is against a preferred form
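
A minimal sketch of such a scorer, assuming the heading and the candidate VIAF cluster have already been reduced to plain dicts of extracted values (the key names here are invented for illustration):

```python
def score_match(bib, cluster):
    """Score a bibliographic heading against a VIAF cluster using the criteria
    listed above: negative means conflicting, 0 means merely compatible,
    and higher values mean more confidence in the match."""
    def conflict(a, b):
        return bool(a) and bool(b) and a != b

    score = 0
    if conflict(bib.get("begin"), cluster.get("begin")) or \
       conflict(bib.get("end"), cluster.get("end")):
        score -= 1                                    # dates conflict
    matched = [k for k in ("begin", "end")
               if bib.get(k) and bib.get(k) == cluster.get(k)]
    if matched:
        score += 1                                    # a begin or end date matches
    if len(matched) == 2:
        score += 1                                    # both dates match
        if cluster.get("dates_are_birth_death"):
            score += 1                                # not circa/flourished dates
    if set(bib.get("titles", ())) & set(cluster.get("titles", ())):
        score += 1
    if set(bib.get("lccns", ())) & set(cluster.get("lccns", ())):
        score += 1
    if conflict(bib.get("b"), cluster.get("b")):
        score -= 3                                    # $b mismatch is a strong negative
    if bib.get("c") and bib.get("c") == cluster.get("c"):
        score += 1
    if set(bib.get("ddcs", ())) & set(cluster.get("ddcs", ())):
        score += 1
    if bib.get("name_key") == cluster.get("preferred_form_key"):
        score += 1
    return score
```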

Here are the stages we go through (a compact sketch of the cascade follows the list).  At each stage, proceed to the next if the criteria are not met:

  • If only one VIAF cluster has the normalized name from the bibliographic record, use that VIAF identifier
  • Collapse bibliographic information based on the associated work identifiers so that they can share name dates, $b and $c, LCCN, DDC
    • Try to detect fathers/sons in the same bibliographic record so that we don’t link them to the same VIAF cluster
  • If a single best VIAF cluster (better than all others) exists – use it
    • Uses dates, $b, $c, titles, preferred form of name to determine best match as described above
  • Try the previous rule again adding LCC and DDC class numbers in addition to the other match points (as matches were made in the previous step, data was collected to make this easier)
    • If there is a single best candidate, use it
    • If more than one best candidate exists – sort candidate clusters based on the number of source records in the clusters. If there is one cluster that has 5 or more sources and the next largest cluster has 2 or fewer sources, use the larger cluster
  • Consider clusters where the names are compatible, but not exact name matches
    • Candidate clusters include those where dates and/or enumeration do not exist either in the bibliographic record or the cluster
    • Select the cluster based on the number of sources as described above
  • If only one cluster has an LC authority record in it, use that one
  • No link is made
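
A compact Python sketch of that cascade, reusing the scorer sketched earlier. It is deliberately simplified: the LCC/DDC re-try and the "compatible but not exact" stage are omitted, and cluster data is again assumed to be plain dicts.

```python
def resolve_heading(bib, candidates, score_match):
    """Return a VIAF cluster id for a heading, or None when no confident link
    can be made. `candidates` are the clusters sharing the heading's
    normalized name key; `score_match` is a scorer like the one above."""
    if len(candidates) == 1:
        return candidates[0]["id"]                  # name key is unambiguous

    scored = sorted(((score_match(bib, c), c) for c in candidates),
                    key=lambda pair: pair[0], reverse=True)
    scored = [(s, c) for s, c in scored if s >= 0]  # drop conflicting clusters
    if not scored:
        return None
    if len(scored) == 1 or scored[0][0] > scored[1][0]:
        return scored[0][1]["id"]                   # a single best cluster

    # Tie on score: prefer a clearly larger cluster (by source record count).
    tied = sorted((c for s, c in scored if s == scored[0][0]),
                  key=lambda c: c.get("source_count", 0), reverse=True)
    if tied[0].get("source_count", 0) >= 5 and (
            len(tied) == 1 or tied[1].get("source_count", 0) <= 2):
        return tied[0]["id"]

    # Last resort: a single candidate backed by an LC authority record.
    with_lc = [c for c in tied if c.get("has_lc_authority")]
    if len(with_lc) == 1:
        return with_lc[0]["id"]
    return None                                     # no link is made
```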

Fuzzy Title Matching

Since this process is mainly about matching names, and titles are used only to resolve ambiguity, the process described here depends on a separate title matching process.  As part of OCLC’s FRBR matching (which happens after the name matching described here) we pull bibliographic records into work clusters, and each bibliographic record in WorldCat has a work identifier associated with it based on these clusters.  Once we can associate a work identifier with a VIAF identifier, that work identifier can be used to pull in otherwise ambiguous missed matches on a name.  Here is a simple example:

Record 1:

    Author: Smith, John

    Title: Title with work ID #1

Record 2:

    Author: Smith, John

    Title: Another title with work ID #1

Record 3:

    Author: Smith, John

    Title: Title with work ID #2

In this case, if we were able to associate the John Smith in record #1 to a VIAF identifier, we could also assign the same VIAF identifier to the John Smith in record #2 (even though we do not have a direct match on title), but not to the author of record #3. This lets us use all the variant titles we have associated with a work to help sort out the author/contributor names.
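
A small sketch of that propagation step, assuming each record has already been reduced to a dict carrying a normalized name key, a work identifier, and any VIAF identifier resolved so far:

```python
from collections import defaultdict

def propagate_viaf_ids(records):
    """Spread a VIAF id resolved for one record to other records that share
    both the author's normalized name key and the work identifier."""
    resolved = defaultdict(set)
    for rec in records:
        if rec.get("viaf_id"):
            resolved[(rec["name_key"], rec["work_id"])].add(rec["viaf_id"])

    for rec in records:
        ids = resolved.get((rec["name_key"], rec["work_id"]), set())
        # Only propagate when the (name, work) pair maps to exactly one VIAF id;
        # two John Smiths on the same work would make it ambiguous.
        if not rec.get("viaf_id") and len(ids) == 1:
            rec["viaf_id"] = next(iter(ids))
    return records

records = propagate_viaf_ids([
    {"name_key": "SMITH JOHN", "work_id": 1, "viaf_id": "viaf/123"},
    {"name_key": "SMITH JOHN", "work_id": 1},   # gets viaf/123
    {"name_key": "SMITH JOHN", "work_id": 2},   # stays unlinked
])
```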

Of course this is not perfect.  There could be two different John Smiths associated with a work (e.g. father and son), so occasionally titles (even those that appear to be properly grouped in a work) can lead us astray.

That's a sketch of how the name matching process operates.  Currently WorldCat is updated with this information once per month and it is visible in the various linked data views of WorldCat.

--Th & JT

*If you want to understand more about how dates are processed, our code{4}lib article about Parsing and Matching Dates in VIAF describes that in detail.

Library of Congress: The Signal: Seeking Comment on Migration Checklist

Thu, 2015-09-03 15:39

The NDSA Infrastructure Working Group’s goals are to identify and share emerging practices around the development and maintenance of tools and systems for the curation, preservation, storage, hosting, migration, and similar activities supporting the long term preservation of digital content. One of the ways the IWG strives to achieve their goals is to collaboratively develop and publish technical guidance documents about core digital preservation activities. The NDSA Levels of Digital Preservation and the Fixity document are examples of this.

Birds. Ducks in pen. (Photo by Theodor Horydczak, 1920. Source: Horydczak Collection, Library of Congress Prints and Photographs Division.)

The latest addition to this guidance is a migration checklist. The IWG would like to share a draft of the checklist with the larger community in order to gather comments and feedback that will ultimately make this a better and more useful document. We expect to formally publish a version of this checklist later in the Fall, so please review the draft below and let us know by October 15, 2015 in the comments below or in email via ndsa at loc dot gov if you have anything to add that will improve the checklist.

Thanks, in advance, from your IWG co-chairs Sibyl Schaefer from University of California, San Diego, Nick Krabbenhoeft from Educopia, and Abbey Potter from Library of Congress. Another thank you to the former IWG co-chairs Trevor Owens from IMLS and Karen Cariani from WGBH, who led the work to initially develop this checklist.

Good Migrations: A Checklist for Moving from One Digital Preservation Stack to Another

The goal of this document is to provide a checklist of things you will want to do or think through before and after moving digital materials and metadata forward to new digital preservation systems/infrastructures. This could entail switching from one system to another system in your digital preservation and storage architecture (various layers of hardware, software, databases, etc.). This is a relatively expansive notion of system. In some cases, organizations have adopted turn-key solutions whereby the requirements for ensuring long term access to digital objects are taken care of by a single system or application. However, in many cases, organizations make use of a range of built and bought applications and core functions of interfaces to storage media that collectively serve the function of a preservation system. This document is intended to be useful for migrations between comprehensive systems as well as for situations where one is swapping out individual components in a larger preservation system architecture.

Issues around normalization of data or of moving content or metadata from one format to another are out of scope for this document. This document is strictly focused on checking through issues related to moving fixed digital materials and metadata forward to new systems/infrastructures.

Before you Move:

  1. Review the state of data in the current system, clean up any data inconsistencies or issues that are likely to create problems on migration and identify and document key information (database naming conventions, nuances and idiosyncrasies in system/data structures, use metrics, etc.).
  2. Make sure you have fixity information for your objects and make sure you have a plan for how to bring that fixity information over into your new system. Note that different systems may use different algorithms/instruments for documenting fixity information, so check to make sure you are comparing the same kinds of outputs.
  3. Make sure you know where all your metadata/records for your objects are stored and that, if you are moving that information, you have plans in place to ensure its integrity.
  4. Check/validate additional copies of your content stored in other systems; you may need to rely on some of those copies for repair if you run into migration issues.
  5. Identify any dependent systems using API calls into your system or other interfaces which will need to be updated and make plans to update, retire, or otherwise notify users of changes.
  6. Document feature parity and differences between the new and old system and make plans to change/revise and refine workflows and processes.
  7. Develop new documentation and/or training for users to transition from the old to the new system.
  8. Notify users of the date and time the system will be down and not accepting new records or objects. If the process will take some time, provide users with a plan for expectations on what level of service will be provided at what point and take the necessary steps to protect the data you are moving forward during that downtime.
  9. Have a place/plan for where to put items that need ingestion while doing the migration.  You may not be able to tell people to just stop and wait.
  10. Decide on what to do with your old storage media/systems. You might want to keep them for a period just in case, reuse them for some other purpose or destroy them. In any event it should be a deliberate, documented decision.
  11. Create documentation recording what you did and how you approached the migration (any issues, failures, or issues that arose) to provide provenance information about the migration of the materials.
  12. Test migration workflow to make sure it works – both single records and bulk batches of varying sizes to see if there are any issues.

After you Migrate

  1. Check your fixity information to ensure that your new system has all your objects intact (see the sketch after this list).
  2. If any objects did not come across correctly, as identified by comparing fixity values, then repair or replace the objects via copies in other systems. Ideally, log this kind of information as events for your records.
  3. Check to make sure all your metadata has come across, and spot-check to make sure it hasn’t been mangled.
  4. Notify your users of the change and again provide them with new or revised user documentation.
  5. Record what is done with the old storage media/systems after migration.
  6. Assemble all documentation generated and keep with other system information for future migrations.
  7. Establish timeline and process for reevaluating when future migrations should be planned for (if relevant).
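
As a concrete illustration of the fixity checks in both lists, here is a minimal Python sketch that streams each migrated object through SHA-256 and compares it against a pre-migration manifest. The manifest format ("<checksum>  <relative path>" per line, as BagIt-style tools commonly produce) and the file paths are assumptions for the sketch, not a prescribed tool.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large objects don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_manifest(manifest_path, storage_root):
    """Compare a pre-migration checksum manifest against the migrated store.
    Returns lists of missing and altered objects, which can then be repaired
    or replaced from copies held in other systems."""
    missing, altered = [], []
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        target = Path(storage_root) / rel_path.strip()
        if not target.exists():
            missing.append(rel_path)
        elif sha256_of(target) != expected:
            altered.append(rel_path)
    return missing, altered

# Example (placeholder paths):
# missing, altered = verify_manifest("manifest-sha256.txt", "/new/storage/root")
# print(len(missing), "missing;", len(altered), "altered")
```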

Relevant resources and tools:

This post was updated 9/3/2015 to fix formatting and add email information.

District Dispatch: Last chance to support libraries at SXSW

Thu, 2015-09-03 14:34


A couple of weeks ago, the ALA Washington Office urged support for library programs at South by Southwest (SXSW). The library community’s footprint at this annual set of conferences and activities has expanded in recent years, and we must keep this trend going! Now is your last chance to do your part, as public voting on panel proposals will end at 11:59 pm (CDT) this Friday, September 4th [Update: Now Monday, September 7th]. SXSW received more than 4,000 submissions this year—an all-time record—so we need your help more than ever to make library community submissions stand out. You can read about, comment on, and vote for, the full slate of proposed panels involving the Washington Office here.

Also, the SXSW library “team” that connects through the lib*interactive Facebook group and #liblove has compiled a list of library programs that have been proposed for all four SXSW gatherings. Please show your support for all of them. Thanks!

The post Last chance to support libraries at SXSW appeared first on District Dispatch.

LITA: Get Involved in the National Digital Platform for Libraries

Thu, 2015-09-03 13:00

Editor’s note: This is a guest post by Emily Reynolds and Trevor Owens.

Recently IMLS has increased its focus on funding digital library projects through the lens of our National Digital Platform strategic priority area. The National Digital Platform is the combination of software applications, social and technical infrastructure, and staff expertise that provides library content and services to all users in the U.S… in other words, it’s the work many LITA members are already doing!

Participants at IMLS Focus: The National Digital Platform

As libraries increasingly use digital infrastructure to provide access to digital content and resources, there are more and more opportunities for collaboration around the tools and services that they use to meet their users’ needs. It is possible for each library in the country to leverage and benefit from the work of other libraries in shared digital services, systems, and infrastructure. We’re looking at ways to maximize the impact of our funds by encouraging collaboration, interoperability, and staff training. We are excited to have this chance to engage with and invite participation from the librarians involved in LITA in helping to develop and sustain this national digital platform for libraries.

National Digital Platform convening report

Earlier this year, IMLS held a meeting at the DC Public Library to convene stakeholders from across the country to identify opportunities and gaps in existing digital library infrastructure nationwide. Recordings of those sessions are now available online, as is a summary report published by OCLC Research. Key themes include:


Engaging, Mobilizing and Connecting Communities

  • Engaging users in national digital platform projects through crowdsourcing and other approaches
  • Establishing radical and systematic collaborations across sectors of the library, archives, and museum communities, as well as with other allied institutions
  • Championing diversity and inclusion by ensuring that the national digital platform serves and represents a wide range of communities

Establishing and Refining Tools and Infrastructure

  • Leveraging linked open data to connect content across institutions and amplify impact
  • Focusing on documentation and system interoperability across digital library software projects
  • Researching and developing tools and services that leverage computational methods to increase accessibility and scale practice across individual projects

Cultivating the Digital Library Workforce

  • Shifting to continuous professional learning as part of library professional practice
  • Focusing on hands-on training to develop computational literacy in formal library education programs
  • Educating librarians and archivists to meet the emerging digital needs of libraries and archives, including cross-training in technical and other skills

We’re looking to support these areas of work with the IMLS grant programs available to library applicants.

IMLS Funding Opportunities

IMLS has three major competitive grant programs for libraries, and we encourage the submission of proposals related to the National Digital Platform priority to all three. Those programs are:

  • National Leadership Grants for Libraries (NLG): The NLG program is specifically focused on supporting our two strategic priorities, the National Digital Platform and Learning in Libraries. The most competitive proposals will advance some area of library practice on a national scale, with new tools, research findings, alliances, or similar outcomes. The NLG program makes awards up to $2,000,000, with funds available for both project and planning grants.
  • Laura Bush 21st Century Librarian Program (LB21): The LB21 program supports professional development, graduate education and continuing education for librarians and archivists. The LB21 program makes awards up to $500,000, and like NLG supports planning as well as project grants.
  • Sparks! Ignition Grants for Libraries: Sparks! grants support the development, testing, and evaluation of promising new tools, products, services, and practices. They often balance broad potential impact with an element of risk or innovation. The Sparks! program makes awards up to $25,000.

These programs can fund a wide range of activities. NLG and LB21 grants support projects, research, planning, and national forums (where grantees can hold meetings to gather stakeholders around a particular topic). The LB21 program also has a specific category for supporting early career LIS faculty research.

Application Process and Deadlines

Over the past year, IMLS piloted an exciting new model for our grant application process, which this year will be in place for both the NLG and LB21 programs. Rather than requiring a full application from every applicant, only a two-page preliminary proposal is due at the deadline. After a first round of peer review, a small subset of applicants will be invited to submit full proposals, and will have the benefit of the peer reviewers’ comments to assist in constructing the proposal. The full proposals will be reviewed by a second panel of peer reviewers before funding decisions are made. The Sparks! program goes through a single round of peer review, and requires the submission of a full proposal from all applicants.

The LB21 and NLG programs will both have a preliminary proposal application deadline on October 1, 2015, as well as an additional application deadline in February, 2016.

Are you considering applying for an IMLS grant for your digital library project? Do you want to discuss which program might be the best fit for your proposal? We’re always happy to chat, and love hearing your project ideas, so please email us at (Emily) and (Trevor).

SearchHub: How Bloomberg Executes Search Analytics with Apache Solr

Thu, 2015-09-03 08:00
As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Steven Bower’s session on how Bloomberg uses Solr for search analytics.

Search at Bloomberg is not just about text, it’s about numbers, lots of numbers. In order for our clients to research, measure and drive decisions from those numbers we must provide flexible, accurate and timely analytics tools. We decided to build these tools using Solr, as Solr provides the indexing performance, filtering and faceting capabilities needed to achieve the flexibility and timeliness required by the tools. To perform the analytics required, we developed an Analytics Component for Solr. This talk will cover the Analytics Component that we built at Bloomberg, some use cases that drove it, and then dive into the features/functionality it provides.

Steven Bower has worked for 15 years in the web/enterprise search industry, first as part of the R&D and Services teams at FAST Search and Transfer, Inc. and then as a principal engineer at Attivio, Inc. He has participated in or led the delivery of hundreds of search applications and now leads the search infrastructure team at Bloomberg LP, providing a search-as-a-service platform for 80+ applications.

Search Analytics Component: Presented by Steven Bower, Bloomberg L.P. from Lucidworks

Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr, on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post How Bloomberg Executes Search Analytics with Apache Solr appeared first on Lucidworks.

William Denton: Access testimonial

Thu, 2015-09-03 02:05

I submitted a testimonial about the annual Access conference about libraries and technology:

The first time I went to Access was 2006 in Ottawa. I was one year out of library school. I was unemployed. I paid my own way. I didn’t want to miss it. Everyone I admired in the library technology world was going to be there. They were excited about it, and said how much they loved the conference every year. When I got there, the first morning, I thought, “These are my people.” I left admiring a lot of new acquaintances. Every year I go, I feel the same way and the same thing happens.

All true, every word. (That conference was where Dan Chudnov and I chatted over a glass of wine, which made it all the better.)

Here’s more reason why I like Access: the 2015 conference is in Toronto next week, and I’m running a hackfest about turning data into music. This is what my proposal looked like:

I emailed them a JPEG. They accepted the proposal. That’s my kind of conference.

I also have to mention the talk Adam Taves and I did in Winnipeg at Access 2010: After Launching Search and Discovery, Who Is Mission Control?. “A Tragicomedy in 8 or 9 Acts.” It’s a rare conference where you can mix systems librarianship with performance art.

But of all the write-ups I’ve done of anything Access-related, I think 2009’s DIG: Hackfest, done about Cory Doctorow when I was channelling James Ellroy, is the best:

Signs: “Hackfest.” We follow. People point. We get to the room. There are people there. They have laptops. They sit. They mill around. They stand. They talk. “Haven’t seen you since last year. How’ve you been?”

Vibe: GEEK.

Cory Doctorow giving a talk. No talks at Hackfest before. People uncertain. What’s going on? Cory sitting in chair. Cory working on laptop. Cory gets up. Paper with notes scribbled on it. He talks about copyright. He talks about freedom. He talks about how copyright law could affect US.

He vibes geek. He vibes cool.

DuraSpace News: DuraSpace at Innovatics—The Fifth International Congress on Technological Innovation

Thu, 2015-09-03 00:00

A Chilean academic library community comes together to share information and best practices.

Terry Reese: MarcEdit: Build New Field Tool

Wed, 2015-09-02 22:13

I’m not sure how I’ve missed creating something like this for so long, but it took a question from a cataloger to help me crystalize the need for a new feature.  Here was the question:

Add an 856 url to all records that uses the OCLC number from the 035 field, the title from 245 $a, and the ISSN, if present. This will populate an ILLiad form with the publication title, ISSN, call number (same for all records), and OCLC number. Although I haven’t worked it in yet, the link in our catalog will also include instructions to “click for document delivery form” or something like that.

In essence, the user was looking to generate a link within some records – however, the link would need to be made up of data pulled from different parts of the MARC record.  It’s a question that comes up all the time, and in many cases, the answer I generally give points users to the Swap Field function – a tool designed around moving data between fields.  For fields that are to be assembled from data in multiple fields, multiple swap field operations would need to be run.  The difference here was how the data from the various MARC fields listed above needed to be presented.  The swap field tool moves data from one subfield to another – whereas this user was looking to pull data from various fields and reassemble that data using a specific data pattern.  And in thinking about how I would answer this question – it kind of clicked – we need a new tool.

Build New Field:

The build new field tool is the newest global editing tool being added to the MarcEditor tool kit.  The tool will be available in the Tools menu:

And will be supported in the Task Automation list.  The tool is designed around this notion of data patterns – the idea that rather than moving data between fields, some field data needs to be created via a specific set of data patterns.  The example provided by the user asking this question was:

  •[Title from 245$a]&&ISSN=[ISSN from 022$a]&CallNumber=[CallNumber from 099$a]&ESPNumber=[oclc number from the 035]

While the swap field could move all this data around, the tool isn’t designed to do this level of data integration when generating a new field.  In fact, none of MarcEdit’s present global editing tasks are configured for this work.  To address this gap, I’ve introduced the Build New Field tool:

The Build New Field tool utilizes data patterns to construct a new MARC field.  Using the example above, a user could create a new 856 by utilizing the following pattern:

=856  41$u{245$a}&ISSN={022$a}&CallNumber={099$a}&ESPNumber={035$a(OcLC)}

Do you see the pattern?  This tool allows users to construct their field, replacing the variable data to be extracted from their MARC records using the mnemonic structure: {field$subfield}.  Additionally, in the ESPNumber tag, you can see that in addition to the field and subfield, qualifying information was also included.  The tool allows users to provide this information, which is particularly useful when utilizing fields like the 035 to extract control numbers.
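
MarcEdit handles this pattern expansion internally, but the idea is easy to sketch outside the tool. Here is a rough Python approximation over a toy record structure (plain dicts rather than real MARC parsing); the URL, field choices, and function name are invented for illustration and are not MarcEdit code.

```python
import re
from urllib.parse import quote

def build_field(record, pattern, escape=True):
    """Expand {field$subfield} and {field$subfield(qualifier)} placeholders in
    a MarcEdit-style pattern against a toy record: a dict mapping tags to
    lists of subfield dicts."""
    def lookup(match):
        tag, code, qualifier = match.group(1), match.group(2), match.group(3)
        for field in record.get(tag, []):
            value = field.get(code, "")
            if qualifier and qualifier not in value:
                continue                    # e.g. only an (OCoLC)-prefixed 035
            return quote(value) if escape else value  # mirrors "Escape data from URL"
        return ""
    return re.sub(r"\{(\d{3})\$(\w)(?:\((\w+)\))?\}", lookup, pattern)

record = {
    "022": [{"a": "0028-0836"}],
    "035": [{"a": "(OCoLC)01586310"}],
    "245": [{"a": "Nature."}],
}
print(build_field(
    record,
    "=856  41$uhttp://example.org/request?Title={245$a}"
    "&ISSN={022$a}&ESPNumber={035$a(OCoLC)}"))
```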

Finally, the new tool provides two additional options.  For items like proxy development, data extracted from the MARC record will need to be URL encoded.  By checking the “Escape data from URL” option, all MARC data extracted and utilized within the data pattern will be URL encoded.  Leaving this item unchecked will allow the tool to capture the data as presented within the record. 

The second option, “Replace Existing Field or Add New one if not Present,” tells the tool what to do if the field exists.  If left unchecked, the tool will always create a new field (whether the field defined in the pattern exists or not).  If you check this option, the tool will replace any existing field data, or create a new field if one doesn’t exist, for the field defined by your pattern.

Does that make sense?  This function will be part of the next MarcEdit release, so if you have questions or comments, let me know.


Nicole Engard: Bookmarks for September 2, 2015

Wed, 2015-09-02 20:30

Today I found the following resources and bookmarked them on Delicious.

  • Thimble by Mozilla Thimble is an online code editor that makes it easy to create and publish your own web pages while learning HTML, CSS & JavaScript.
  • Google Coder a simple way to make web stuff on Raspberry Pi

Digest powered by RSS Digest

The post Bookmarks for September 2, 2015 appeared first on What I Learned Today....


District Dispatch: ALA urges FCC to include internet in the Lifeline program

Wed, 2015-09-02 20:01

FCC Building in Washington, D.C.

This week the American Library Association (ALA) submitted comments with the Federal Communications Commission in its Lifeline modernization proceeding. As it has done with its other universal service programs, including most recently with the E-rate program, the Commission sought input from a wide variety of stakeholders on how best to transition a 20th century program to one that meets the 21st century needs of, in this case, low-income consumers.

Lifeline was established in 1985 to help make phone service more affordable for low-income consumers, and the program has received little attention with respect to today’s most pressing communication need: access to broadband. ALA’s comments wholeheartedly agree with the Commission that broadband is no longer a “nice-to-have,” but a necessity to fully participate in civic society. We are clearly on record with the Commission describing the myriad library services (which may be characterized by The E’s of Libraries®) that are not only dependent themselves on access to broadband, but that provide patrons with access to the wealth of digital resources so that libraries may indeed transform communities. We well understand the urgency of making sure everyone, regardless of geographic location or economic circumstances, has access to broadband and the internet as well as the ability to use it.

In addition to making broadband an eligible service in the Lifeline program, the Commission asks questions related to addressing the “homework gap,” which refers to families with school-age children who do not have home internet access, leaving these kids with extra challenges to school success. Other areas the Commission is investigating include whether the program should adopt minimum standards of service (for telephone and internet), whether it should be capped at a specific funding level, and how to encourage more service providers to participate in the program.

Our Lifeline comments reiterate the important role libraries have in connecting (and transforming) communities across the country and call on the Commission to:

  • Address the homework gap as well as similar hurdles for vulnerable populations, including people with disabilities;
  • Consider service standards that are reasonably comparable to the consumer marketplace, are regularly evaluated and updated, and to the extent possible fashioned to anticipate trends in technology;
  • Allow libraries that provide WiFi devices to Lifeline-eligible patrons to be eligible for financial support for the connectivity of those devices; and
  • Address the affordability barrier to broadband access through the Lifeline program, but continue to identify ways it can also promote broadband adoption.

We also reiterate the principles (pdf) outlined by The Leadership Conference on Civil and Human Rights and supported by ALA that call for universality of service for eligible households, program excellence, choice and competition, innovation, and efficiency, transparency and accountability.

Now that the comments are filed, we will mine the public comment system to read through other stakeholder comments and consult with other national groups in preparing reply comments (we get an opportunity to respond to other commenters as well as add details or more information on our own proposals). Reply comments are due to the Commission September 30. So as always with the Commission, there is more to come, which includes in-person meetings if warranted. Also, as always, many thanks to the librarians in the field and those who are also members of ALA committees who provided input and advice.

The post ALA urges FCC to include internet in the Lifeline program appeared first on District Dispatch.

Coral Sheldon-Hess: Libraries’ tech pipeline problem

Wed, 2015-09-02 17:17

“We’ve got a pipeline problem, so let’s build a better pipeline.” –Bess Sadler, Code4Lib 2014 Conference (the link goes to the video)

I’ve been thinking hard (for two years, judging by the draft date on this post) about how to grow as a programmer, when one is also a librarian. I’m talking not so much about teaching/learning the basics of coding, which is something a lot of people are working really hard on, but more about getting from “OK, I finished yet another Python/Rails/JavaScript/whatever workshop” or “OK, I’ve been through all of Code Academy/edX/whatever”—or from where I am, “OK, I can Do Interesting Things™ with code, but there are huge gaps in my tech knowledge and vocabulary”—to the point where one could get a full-time librarian-coder position.

I should add, right here: I’m no longer trying to get a librarian-coder position*. This post isn’t about me, although it is, of course, from my perspective and informed by my experiences. This post is about a field I love, which is currently shooting itself in the foot, which frustrates me.

Bess is right: libraries need 1) more developers and 2) more diversity among them. Libraries are hamstrung by expensive, insufficient vendor “solutions.” (I’m not hating on the vendors, here; libraries’ problems are complex, and fragmentation and a number of other issues make it difficult for vendors to provide really good solutions.) Libraries and librarians could be so much more effective if we had good software, with interoperable APIs, designed specifically to fill modern libraries’ needs.

Please, don’t get me wrong: I know some libraries are working on this. But they’re too few, and their developers’ demographics do not represent the demographics of libraries at large, let alone our patron bases. I argue that the dearth and the demographic skew will continue and probably worsen, unless we make a radical change to our hiring practices and training options for technical talent.

Building technical skills among librarians

The biggest issue I see is that we offer a fair number of very basic learn-to-code workshops, but we don’t offer a realistic path from there to writing code as a job. To put a finer point on it, we do not offer “junior developer” positions in libraries; we write job ads asking for unicorns, with expert- or near-expert-level skills in at least two areas (I’ve seen ones that wanted strong skills in development, user experience, and devops, for instance).

This is unfortunate, because developing real fluency with any skill, including coding, requires practicing it regularly. In the case of software development, there are things you can really only learn on the job, working with other developers (ask me about Git, sometime); only, nobody seems willing to hire for that. And, yes, I understand that there are lots of single-person teams in libraries—far more than there should be—but many open source software projects can fill in a lot of that group learning and mentoring experience, if a lone developer is allowed to participate in them on work time. (OSS is how I am planning to fill in those skills, myself.)

From what I can tell, if you’re a librarian who wants to learn to code, you generally have two really bad options: 1) learn in your spare time, somehow; or 2) quit libraries and work somewhere else until your skills are built up. I’ve been down both of those roads, and as a result I no longer have “be a [paid] librarian-developer” on my goals list.

Option one: Learn in your spare time

This option is clown shoes. It isn’t sustainable for anybody, really, but it’s especially not sustainable for people in caretaker roles (e.g. single parents), people with certain disabilities (who have less energy and free time to start with), people who need to work more than one job, etc.—that is, people from marginalized groups. Frankly, it’s oppressive, and it’s absolutely a contributing factor to libtech’s largely male, white, middle to upper-middle class, able-bodied demographics—in contrast to the demographics of the field at large (which is also most of those things, but certainly not predominantly male).

“I’ve never bought this ‘do it in your spare time’ stuff. And it turns out that doing it in your spare time is terribly discriminatory, because … a prominent aspect of oppression is that you have more to do in less spare time.” – Valerie Aurora, during her keynote interview for Code4Lib 2014 (the link goes to the video)

“It’s become the norm in many technology shops to expect that people will take care of skills upgrading on their own time. But that’s just not a sustainable model. Even people who adore late night, just-for-fun hacking sessions during the legendary ‘larval phase’ of discovering software development can come to feel differently in a later part of their lives.” – Bess Sadler, same talk as above

I tried to make it work, in my last library job, by taking one day off every other week** to work on my development skills. I did make some headway—a lot, arguably—but one day every two weeks is not enough to build real fluency, just as fiddling around alone did not help me build the skills that a project with a team would have. Not only do most people not have the privilege of dropping to 90% of their work time, but even if you do, that’s not an effective route to learning enough!

And, here, you might think of the coding bootcamps (at more than $10k per) or the (free, but you have to live in NYC) Recurse Center (which sits on my bucket list, unvisited), but, again: most people can’t afford to take three months away from work, like that. And the Recurse Center isn’t so much a school (hence the name change away from “Hacker School”) as it is a place to get away from the pressures of daily life and just code; realistically, you have to be at a certain level to get in. My point, though, is that the people for whom these are realistic options tend to be among the least marginalized in other ways. So, I argue that they are not solutions and not something we should expect people to do.

Option two: go work in tech

If you can’t get the training you need within libraries or in your spare time, it kind of makes sense to go find a job with some tech company, work there for a few years, build up your skills, and then come back. I thought so, anyway. It turns out, this plan was clown shoes, too.

Every woman I’ve talked to who has taken this approach has had a terrible experience. (I also know of a few women who’ve tried this approach and haven’t reported back, at least to me. So my data is incomplete, here. Still, tech’s horror stories are numerous, so go with me here.) I have a theory that library vendors are a safer bet and may be open to hiring newer developers than libraries currently are, but I don’t have enough data (or anecdata) to back it up, so I’m going to talk about tech-tech.

Frankly, if we expect members of any marginalized group to go work in tech in order to build up the skills necessary for a librarian-developer job, we are throwing them to the wolves. In tech, even able-bodied straight cisgender middle class white women are a badly marginalized group, and heaven help you if you’re on any other axis of oppression.

And, sure, yeah. Not all tech. I’ll agree that there are non-terrible jobs for people from marginalized groups in tech, but you have to be skilled enough to get to be that choosy, which people in the scenario we’re discussing are not. I think my story is a pretty good illustration of how even a promising-looking tech job can still turn out horrible. (TLDR: I found a company that could talk about basic inclusivity and diversity in a knowledgeable way and seemed to want to build a healthy culture. It did not have a healthy culture.)

We just can’t outsource that skill-building period to non-library tech. It isn’t right. We stand to lose good people that way.

We need to develop our own techies—I’m talking code, here, because it’s what I know, but most of my argument expands to all of libtech and possibly even to library leadership—or continue offering our patrons sub-par software built within vendor silos and patched together by a small, privileged subset of our field. I don’t have to tell you what that looks like; we live with it, already.

What to do?

I’m going to focus on what you, as an individual, an organization, or a leader within an organization, can do to help; I acknowledge that there are systemic issues at play, beyond what my relatively small suggestions can reach, and I hope this post gets people talking and thinking about them (and not just waving their hands, sighing, and complaining that “there isn’t enough money,” because doomsaying is boring and not helpful).

First of all, when you’re looking at adding to the tech talent in your organization, look within your organization. Is there a cataloger who knows some scripting and might want to learn more? (Ask around! Find out!) What about your web content manager, UX person, etc.? (Offer!) You’ll probably be tempted to look at men, first, because society has programmed us all in evil ways (seriously), so acknowledge that impulse and look harder. The same goes for race and disability and having the MLIS, which is too often a stand-in for socioeconomic class; actively resist those biases (and we all have those biases).

If you need tech talent and can’t grow it from within your organization, sit down and figure out what you really need, on day one, versus what might be nice to have, but could realistically wait. Don’t put a single nice-to-have on your requirements list, and don’t you dare lose sight of what is and isn’t necessary when evaluating candidates.

Recruit in diverse and non-traditional spaces for tech folks — dashing off an email to Code4Lib is not good enough (although, sure, do that too; they’re nice folks). LibTechWomen is an obvious choice, as are the Spectrum Scholars, but you might also look at the cataloging listservs or the UX listservs, just to name two options. Maybe see who tweets about #libtechgender and #critlib (and possibly #lismicroaggressions?), and invite those folks to apply and to share your linted job opening with their networks.

Don’t use whiteboard interviews! They are useless and unnecessarily intimidating! They screen for “confidence,” not technical ability. Pair-programming exercises, where interviewer and candidate actually take turns at the keyboard, are a good alternative. Talking through scenarios is also a good alternative.

Don’t give candidates technology vocabulary tests. Not only is it nearly useless as an evaluation tool (and a little insulting); it actively discriminates against people without formal CS education (or, cough, people with CS minors from more than a decade ago). You want to know that they can approach a problem in an organized manner, not that they can define a term that’s easily Googled.
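
To make the scale concrete: a pairing or talk-through exercise doesn’t need to be any fancier than the hypothetical snippet below, built together in twenty or thirty minutes, taking turns at the keyboard, with the conversation about edge cases (empty list? null query? diacritics?) and tests doing the real evaluating. The class and method names here are made up purely for illustration.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Locale;
    import java.util.stream.Collectors;

    public class PairingExercise {

        // Return the titles that contain the query, ignoring case.
        static List<String> findTitles(List<String> titles, String query) {
            String needle = query.toLowerCase(Locale.ROOT);
            return titles.stream()
                    .filter(title -> title.toLowerCase(Locale.ROOT).contains(needle))
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            List<String> titles = Arrays.asList(
                    "Lucene in Action", "Cataloging Basics", "Learning Python");
            // Prints [Lucene in Action]
            System.out.println(findTitles(titles, "lucene"));
        }
    }

The code itself is almost beside the point; what you learn is how the candidate breaks a problem down and communicates while doing it.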

Do some reading about impostor syndrome, stereotype threat, and responsible tech hiring. Model View Culture’s a good place to start; here is their hiring issue.

(I have a whole slew of comments about hiring, and I’ll make those—and probably repeat the list above—in another post.)

Once you have someone in a position, or (better) you’re growing someone into a position, be sure to set reasonable expectations and deadlines. There will be some training time for any tech person; you want this, because something built with enough forethought and research will be better than something hurriedly duct-taped (figuratively, you hope) together.

Give people access to mentorship, in whatever form you can. If you can’t give them access to a team within your organization, give them dedicated time to contribute to relevant OSS projects. Send them to—just to name two really inclusive and helpful conferences/communities—Code4Lib (which has regional meetings, too) and/or Open Source Bridge.


So… that’s what I’ve got. What have I missed? What else should we be doing to help fix this gap?


* In truth, as excited as I am about starting my own business, I wouldn’t turn down an interview for a librarian-coder position local to Pittsburgh, but 1) it doesn’t feel like the wind is blowing that way, here, and 2) I’m in the midst of a whole slew of posts that may make me unemployable, anyway ;) (back to the text)

** To be fair, I did get to do some development on the clock, there. Unfortunately, because I wore so many hats, and other hats grew more quickly, it was not a large part of my work. Still, I got most of my PHP experience there, and I’m glad I had the opportunity. (back to the text)


SearchHub: How Twitter Uses Apache Lucene for Real-Time Search

Wed, 2015-09-02 16:44
As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Michael Busch’s session on how Twitter executes real-time search with Apache Lucene.

Twitter’s search engine serves billions of queries per day from different Lucene indexes while appending hundreds of millions of tweets per day in real time. This session gives an overview of Twitter’s search architecture and recent changes and improvements, focusing on how Lucene is used and on the modifications made to it to support Twitter’s unique performance requirements.

Michael Busch is an architect in Twitter’s Search & Content organization. He designed and implemented Twitter’s current search index, which is based on Apache Lucene and optimized for real-time search. Prior to Twitter, Michael worked at IBM on search and eDiscovery applications. He has been a Lucene committer and Apache member for many years.

Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr, on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…
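
Twitter’s engine is heavily customized, but if you want a feel for the underlying mechanism, here is a minimal sketch of stock Lucene’s near-real-time (NRT) search pattern: an IndexWriter paired with a SearcherManager, so newly appended documents become searchable after a cheap refresh rather than a full commit. This is generic Lucene usage written against Lucene 5.x-era APIs (current when this post was published), not Twitter’s actual implementation; the class and field names are illustrative.

    import java.nio.file.Files;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.SearcherFactory;
    import org.apache.lucene.search.SearcherManager;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class NrtSearchSketch {
        public static void main(String[] args) throws Exception {
            // Throwaway on-disk index, just for this sketch.
            Directory dir = FSDirectory.open(Files.createTempDirectory("nrt-sketch"));
            IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

            // SearcherManager hands out near-real-time readers over the writer,
            // so recently added documents are searchable without a full commit.
            SearcherManager manager = new SearcherManager(writer, true, new SearcherFactory());

            // "Append" a tweet-like document.
            Document doc = new Document();
            doc.add(new TextField("text", "real-time search with lucene", Field.Store.YES));
            writer.addDocument(doc);

            // Refresh the managed reader so the new document becomes visible.
            manager.maybeRefreshBlocking();

            IndexSearcher searcher = manager.acquire();
            try {
                TopDocs hits = searcher.search(new TermQuery(new Term("text", "lucene")), 10);
                System.out.println("matching docs: " + hits.totalHits);
            } finally {
                manager.release(searcher);
            }

            manager.close();
            writer.close();
            dir.close();
        }
    }

The talk describes the modifications Twitter made to Lucene to push this basic pattern much further than the stock APIs are designed for.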

The post How Twitter Uses Apache Lucene for Real-Time Search appeared first on Lucidworks.