FOSS4Lib Recent Releases: Siegfried - 1.2.0

planet code4lib - Fri, 2015-07-31 12:53

Last updated July 31, 2015. Created by Peter Murray on July 31, 2015.

Package: Siegfried
Release Date: Friday, July 31, 2015

DPLA: Seeking Balance in Copyright and Access

planet code4lib - Thu, 2015-07-30 17:25

The most important word in discussions around copyright in the United States is balance. Although there are many, often strong disagreements between copyright holders and those who wish to provide greater access to our cultural heritage, few dispute that the goal is to balance the interests of the public with those of writers, artists, and other creators.

Since the public is diffuse and understandably pays little attention to debates about seemingly abstract topics like copyright, it has been hard to balance their interests with those of rightsholders, especially corporations, who have much more concentrated attention and financial incentives to tilt the scale. (Also, lawyers.) Unsurprisingly, therefore, the history of copyright is one of a repeated lengthening of copyright terms and greater restrictions on public use.

The U.S. Copyright Office has spent the last few years looking at possible changes to the Copyright Act given that we are now a quarter-century into the age of the web, and its new forms of access to culture enabled by mass digitization. Most recently, the Office issued a report with recommendations about what to do about orphan works and the mass digitization of copyrighted works. The Office has requested feedback on its proposal, as well as on other specific questions regarding copyright and visual works and a proposed “making available” right (something that DPLA has already responded to). Each of these studies and proposals impacts the Digital Public Library of America and our 1,600 contributing institutions, as well as many other libraries, archives, and museums that seek to bring their extensive collections online.

We greatly appreciate that the Office is trying to tackle these complex issues, given how difficult it is to ascertain the copyright status of many works created in the last century. As the production of books, photographs, audio, and other types of culture exploded, often by orders of magnitude, and as rights no longer had to be registered, often changed hands in corporate deals, and passed to estates (since copyright terms now long outlast the creators), we inherited an enormous problem of unclear rights and “orphan works” where rightsholders cannot easily—or ever—be found. This problem will only worsen now that digital production has given the means to billions of people to become creators, and not just consumers, of culture.

Although we understand the complexity and many competing interests that the Office has tried to address in the report, we do not believe their recommendations achieve that critical principle of balance. In our view, the recommendations unfortunately put too many burdens on the library community, and thus too many restrictions on public access. The report seeks to establish a lengthy vetting process for scanned items that is simply unworkable and extraordinarily expensive for institutions that are funded by, and serve, the public.

Last week, with the help of DPLA’s Legal Advisory Committee co-chair Dave Hansen, we filed a response to one of the Office’s recent inquiries, focusing on how the copyright system can be improved for visual works like photographs. As our filing details, DPLA’s vast archive of photographs from our many partners reveals how difficult it would be for cultural heritage institutions to vet the rights status of millions of personal, home, and amateur photographs, as well as millions of similar items in the many local collections contained in DPLA.

These works can provide candid insights into our shared cultural history…[but] identifying owners and obtaining permissions is nearly impossible for many personal photographs and candid snapshots…Even if creators are identifiable by name, they are often not locatable. Many are dead, raising complicated questions about whether rights were transferred to heirs, or perhaps escheated to the state. Because creators of many of these works never thought about the rights that they acquired in their visual works, they never made formal plans for succession of ownership.

Thus, as the Office undertakes this review, we urge it to consider whether creators, cultural heritage institutions, and the public at large would be better served by a system of protection that explicitly seeks to address the needs, expectations, and motivations of the incredibly large number of creators of these personal, home and amateur visual works, while appropriately accommodating those creators for whom copyright incentives do matter and for whom licensing and monetization are important.

Rather than placing burdens on libraries and archives for clearing use of visual works, we recommend that the Copyright Office focus on the creation of better copyright status and ownership information by encouraging rightsholders, who are in the best position to provide that information, to step forward. You can read more about our position in the full filing.

When we launched in 2013, one of the most gratifying responses we received was an emotional email from an Australian who found a photograph of his grandmother, digitized by an archive in Utah and made discoverable through DPLA. It’s hard to put a price on such a discovery, but surely we must factor such moments into any discussion of copyright and access. We should place greater value on the public’s access to our digitized record, and find balanced ways for institutions to provide such access.

Library of Congress: The Signal: Mapping Libraries: Creating Real-time Maps of Global Information

planet code4lib - Thu, 2015-07-30 13:43

The following is a guest post by Kalev Hannes Leetaru, a data scientist and Senior Fellow at the George Washington University Center for Cyber & Homeland Security. In a previous post, he introduced us to the GDELT Project, a platform that monitors the news media, and described how mass translation of the world’s information offers libraries enormous possibilities for broadening access. In this post, he writes about re-imagining information geographically.

Why might geography matter to the future of libraries?

Information occurs against a rich backdrop of geography: every document is created in a location, intended for an audience in the same or other locations, and may discuss yet other locations. The importance of geography in how humans understand and organize the world (PDF) is underscored by its prevalence in the news media: a location is mentioned every 200-300 words in the typical newspaper article of the last 60 years. Social media embraced location a decade ago through transparent geotagging, with Twitter proclaiming in 2009 that the rise of spatial search would fundamentally alter how we discovered information online. Yet the news media has steadfastly resisted this cartographic revolution, continuing to organize itself primarily through coarse editorially-assigned topical sections and eschewing the live maps that have redefined our ability to understand global reaction to major events. Using journalism as a case study, what does the future of mass-scale mapping of information look like and what might we learn of the future potential for libraries?

What would it look like to literally map the world’s information as it happens? What if we could reach across the world’s news media each day in real time and put a dot on a map for every mention in every article, in every language of any location on earth, along with the people, organizations, topics, and emotions associated with each place? For the past two years this has been the focus of the GDELT Project and through a new collaboration with online mapping platform CartoDB, we are making it possible to create rich interactive real-time maps of the world’s journalistic output across 65 languages.

Leveraging more than a decade of work on mapping the geography of text, GDELT monitors local news media from throughout the globe, live translates it, and performs “full-text geocoding” in which it identifies, disambiguates, and converts textual descriptions of location into mappable geographic coordinates. The result is a real-time multilingual geographic index over the world’s news that reflects the actual locations being talked about in the news, not just the bylines of where articles were filed. Using this platform, this geographic index is transformed into interactive animated maps that support spatial interaction with the news.
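To make “full-text geocoding” a little more concrete, here is a minimal sketch in Python of the basic idea: spot place-name mentions in text and resolve them to coordinates via a gazetteer. The gazetteer entries, function name, and single-word matching are invented for illustration; GDELT’s actual pipeline works across 65 languages and handles multi-word names and disambiguation between places that share a name.

    # Illustrative sketch only, not GDELT's implementation: a toy full-text
    # geocoder that matches words against a tiny, invented gazetteer.
    import re

    # Hypothetical gazetteer: lower-cased place name -> (latitude, longitude)
    GAZETTEER = {
        "kathmandu": (27.7172, 85.3240),
        "nepal": (28.3949, 84.1240),
        "brazil": (-14.2350, -51.9253),
    }

    def geocode_text(text):
        """Return (place, lat, lon) for every gazetteer match in the text."""
        mentions = []
        for token in re.findall(r"[A-Za-z]+", text):
            coords = GAZETTEER.get(token.lower())
            if coords:
                mentions.append((token, coords[0], coords[1]))
        return mentions

    print(geocode_text("Aid groups reached villages outside Kathmandu, Nepal."))
    # [('Kathmandu', 27.7172, 85.324), ('Nepal', 28.3949, 84.124)]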

What becomes possible when the world’s news is arranged geographically? At the most basic level, it allows organizing search results on a map. The GDELT Geographic News Search allows a user to search by person, organization, theme, news outlet, or language (or any combination therein) and instantly view a map of every location discussed in context with that query, updated every hour. An animation layer shows how coverage has changed over the last 24 hours and a clickable layer displays a list of all matching coverage mentioning each location over the past hour.

Figure 1 – GDELT’s Geographic News Search showing geography of Portuguese-language news coverage during a given 24 hour period

Selecting a specific news outlet as the query yields an instant geographic search interface to that outlet’s coverage, which can be embedded on any website. Imagine if every news website included a map like this on its homepage that allowed readers to browse spatially and find its latest coverage of rural Brazil, for example. The ability to filter news at the sub-national level is especially important when triaging rapidly-developing international stories. A first responder assisting in Nepal is likely more interested in the first glimmers of information emerging from its remote rural areas than the latest on the Western tourists trapped on Mount Everest.

Coupling CartoDB with Google’s BigQuery database platform, it becomes possible to visualize large-scale geographic patterns in coverage. The map below visualizes all of the locations mentioned in news monitored by GDELT from February to May 2015 relating to wildlife crime. Using the metaphor of a map, this list of 30,000 articles in 65 languages becomes an intuitive clickable map.
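For readers who want to experiment, the query behind a map like this has roughly the following shape. This is a sketch only: it assumes GDELT’s public BigQuery dataset with a GKG-style table exposing theme and location columns (the exact dataset, table, column names, and theme codes should be checked against current GDELT documentation), and it uses the google-cloud-bigquery Python client.

    # Sketch: pull locations mentioned alongside a wildlife-related theme from
    # a GDELT-style table in BigQuery, e.g. for plotting in CartoDB.
    # The table and column names below are assumptions, not a verified schema.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default Google Cloud credentials

    sql = """
        SELECT DocumentIdentifier, V2Locations
        FROM `gdelt-bq.gdeltv2.gkg`
        WHERE V2Themes LIKE '%WILDLIFE%'
          AND DATE BETWEEN 20150201000000 AND 20150531235959
        LIMIT 1000
    """

    for row in client.query(sql).result():
        # V2Locations is a delimited string whose entries include a place name
        # and the latitude/longitude that can be dropped onto a map.
        print(row.DocumentIdentifier, row.V2Locations)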

Figure 2 – Global discussion of wildlife crime

Exploring how the news changes over time, it becomes possible to chart the cumulative geographic focus of a news outlet, or to compare two outlets. Alternatively, looking across global coverage holistically, it becomes possible to instantly identify the world’s happiest and saddest news, or to determine the primary language of news coverage focusing on a given location. By arraying emotion on a map it becomes possible to instantly spot sudden bursts of negativity that reflect breaking news of violence or unrest. Organizing by language, it becomes possible to identify the outlets and languages most relevant to a given location, helping a reader find relevant sources about events in that area. Even the connections among locations in terms of how they are mentioned together in the news yields insights into geographic contextualization. Finally, by breaking the world into a geographic grid and computing the topics trending in each location, it becomes possible to create new ways of visualizing the world’s narratives.

Figure 3 – All locations mentioned in the New York Times (green) and BBC (yellow/orange) during the month of March 2015

Figure 4 – Click to see a live animated map of the average “happy/sad” tone of worldwide news coverage over the last 24 hours mentioning each location

Figure 5 – Click to see a live animated map of the primary language of worldwide news coverage over the last 24 hours mentioning each location

Figure 6 – Interactive visualization of how countries are grouped together in the news media

Turning from global news to domestic television news, these same approaches can be applied to television closed captioning, making it possible to click on a location and view the portion of each news broadcast mentioning events at that location.

Figure 7 – Mapping the locations mentioned in American television news

Turning back to the question that opened this post – why might geography matter to the future of libraries? As news outlets increasingly cede control over the distribution of their content, they do so not only to reach a broader audience, but to leverage more advanced delivery platforms and interfaces. Libraries are increasingly facing identical pressures as patrons turn towards services (PDF) like Google Scholar, Google Books, and Google News instead of library search portals. If libraries embraced new forms of access to their content, such as the kinds of geographic search capabilities outlined in this post, users might find those interfaces more compelling than those of non-library platforms. The ability of ordinary citizens to create their own live-updating “geographic mashups” of library holdings opens the door to engaging with patrons in ways that demonstrate the value of libraries beyond that of a museum of physical artifacts, and to connecting individuals across national or international lines. As more and more library holdings, from academic literature to the open web itself, are geographically indexed, libraries stand poised to lead the cartographic revolution, opening the geography of their vast collections to search and visualization, and making it possible for the first time to quite literally map our world’s libraries.

State Library of Denmark: Sampling methods for heuristic faceting

planet code4lib - Thu, 2015-07-30 10:25

Initial experiments with heuristic faceting in Solr were encouraging: Using just a sample of the result set, it was possible to get correct facet results for large result sets, reducing processing time by an order of magnitude. Alas, further experimentation unearthed that the sampling method was vulnerable to clustering. While heuristic faceting worked extremely well for most of the queries, it failed equally hard for a few of the queries.

The problem

Abstractly, faceting on Strings is a function that turns a collection of documents into a list of top-X terms plus the number of occurrences of these terms. In Solr the collection of documents is represented with a bitmap: One bit per document; if the bit is set, the document is part of the result set. The result set of 13 hits for an index with 64 documents could look like this:

00001100 01010111 00000000 01111110

Normally the faceting code would iterate all the bits, get the terms for the ones that are set and update the counts for those terms. The iteration of the bits is quite fast (1 second for 100M bits), but getting the terms (technically the term ordinals) and updating the counters takes more time (100 seconds for 100M documents).
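As a rough sketch of that exhaustive counting loop (illustrative Python rather than Solr’s actual Java internals; the data-structure names are invented), assuming we already have the result bitmap and, for every document, the ordinals of its terms:

    # Sketch of full facet counting: visit every set bit in the result bitmap,
    # look up that document's term ordinals, and bump a counter for each one.
    from heapq import nlargest

    def full_facet_count(result_bits, doc_term_ordinals, num_terms, top_x):
        counts = [0] * num_terms
        for doc_id, is_hit in enumerate(result_bits):
            if not is_hit:
                continue                      # scanning the bitmap is the cheap part
            for ordinal in doc_term_ordinals[doc_id]:
                counts[ordinal] += 1          # the counter updates are the costly part
        # return the top-X term ordinals by count
        return nlargest(top_x, range(num_terms), key=counts.__getitem__)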

Initial attempt: Sample the full document bitmap

The initial sampling was done by dividing the result set into chunks and only visiting some of those chunks. If we wanted to sample 50% of our result set and wanted to use 4 chunks, the parts of the result set to visit could be the ones marked with red:

4 chunks: 00001100 01111110 00000000 01010111

As can be counted, the sampling hit 5 documents out of 13. Had we used 2 chunks, the result could be

2 chunks: 00001100 01111110 00000000 01010111

Only 2 hits out of 13 and not very representative. A high chunk count is needed: For 100M documents, 100K chunks worked fairly well. The law of large numbers helps a lot, but in case of document clusters (a group of very similar documents indexed at the same time) we still need both a lot of chunks and a high sampling percentage to have a high chance of hitting them. This sampling is prone to completely missing or over-representing clusters.
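A minimal sketch of that chunked approach (again illustrative Python, not the Solr code): divide the bitmap into chunks and collect hits only from the chunks the sampling fraction tells us to visit. A cluster of similar documents that falls entirely inside a skipped chunk never contributes to the counts, which is exactly the weakness described above.

    # Sketch of bitmap-chunk sampling: only documents inside visited chunks
    # contribute to the facet counts.
    def chunk_sample(result_bits, num_chunks, sample_fraction):
        chunk_size = max(1, len(result_bits) // num_chunks)
        visit_every = max(1, round(1 / sample_fraction))   # e.g. 50% -> every 2nd chunk
        sampled_doc_ids = []
        for chunk_no in range(0, num_chunks, visit_every):
            start = chunk_no * chunk_size
            for doc_id in range(start, min(start + chunk_size, len(result_bits))):
                if result_bits[doc_id]:
                    sampled_doc_ids.append(doc_id)
        return sampled_doc_ids   # only these documents' terms would be counted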

Current solution: Sample the hits

Remember that iterating over the result bitmap itself is relatively fast. Instead of processing chunks of the bitmap and skipping between them, we iterate over all the hits and only update counts for some of them.

If the sampling rate is 50%, the bits marked with red would be used as sample:

50% sampling: 00001100 01111110 00000000 01010111

If the sampling rate is 33%, the bits for the sample documents would be

33% sampling: 00001100 01111110 00000000 01010111

This way of sampling is a bit slower than sampling on the full document bitmap as all bits must be visited, but it means that the distribution of the sampling points is as fine-grained as possible. It turns out that the better distribution gives better results, which means that the size of the sample can be lowered. Lower sample rate = higher speed.
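A sketch of the hit-sampling variant (illustrative Python): every bit is still visited, but only every Nth hit has its terms counted, so the sample is spread evenly across the whole result set.

    # Sketch of sampling on the hits: scanning the bitmap stays cheap, and only
    # a fraction of the hits pay the expensive counter updates.
    def hit_sample(result_bits, sample_rate):
        count_every = max(1, round(1 / sample_rate))   # e.g. 0.33 -> every 3rd hit
        sampled_doc_ids, hit_no = [], 0
        for doc_id, is_hit in enumerate(result_bits):
            if not is_hit:
                continue
            if hit_no % count_every == 0:
                sampled_doc_ids.append(doc_id)         # this hit's terms get counted
            hit_no += 1
        return sampled_doc_ids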

Testing validity

A single shard from the Net Archive Search was used for testing. The shard was 900GB with 250M documents. Faceting was performed on the field links, which contains all outgoing links from indexed webpages. There are 600M unique values in that field and each document in the index contains an average of 25 links. For a full search on *:* that means 6 billion updates of the counter structure.

For this test, we look for the top-25 links. To get the baseline, a full facet count was issued for the top-50 links for a set of queries. A heuristic facet call was issued for the same queries, also for the top-50. The number of lines until the first discrepancy was counted for all the pairs. The ones with a count beneath 25 were considered faulty. The reason for the over provisioning was to raise the probability of correct results, which of course comes with a performance penalty.
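The comparison itself is straightforward; a sketch of the “lines until first discrepancy” measure (illustrative Python):

    # Sketch of the validity measure: count how many leading entries of the
    # sampled top-50 agree with the full top-50; fewer than 25 counts as faulty.
    def first_discrepancy(full_top, sampled_top):
        agree = 0
        for full_term, sampled_term in zip(full_top, sampled_top):
            if full_term != sampled_term:
                break
            agree += 1
        return agree

    def is_faulty(full_top50, sampled_top50, required=25):
        return first_discrepancy(full_top50, sampled_top50) < required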

The sampling size was set to 1/1000 the number of documents or roughly 200K hits. Only result set sizes above 1M are relevant for validity, as those below take roughly the same time to calculate with and without sampling.

Heuristic validity for top 25/50

While the result looks messy, the number of faulty results was only 6 out of 116 for result set sizes above 1M. For the other 110 searches, the top-25 terms were correct. Raising the over provisioning to top-100 imposes a larger performance hit, but reduces the number of faulty results to 0 for this test.

Heuristic validity for top 25/100

Testing performance

The response times for full count faceting and heuristic faceting on the links field with over provisioning of 50 are as follows:

Heuristic speed for top 25/50

Switching from linear to logarithmic plotting for the y-axis makes the difference immediately visible:

Heuristic speed for top 25/50, logarithmic y-axis

It can be seen that full counting time rises linearly with result size, while sampling time is near-constant. This makes sense, as the sampling was done by updating counts for a fixed number of documents. Other strategies, such as making the sampling rate a fraction of the result size, should be explored further, but as the validity plot shows, the fixed strategy works quite well.

The performance chart for over provisioning of 100 looks very much like the one for 50, only with slightly higher response times for sampling. As the amount of non-valid results is markedly lower for an over provisioning of 100, this seems like the best speed/validity trade off for our concrete setup.

Heuristic speed for top 25/100, logarithmic y-axis


Heuristic faceting with sampling on hits gives a high probability of correct results. The speed up relative to full facet counting rises with result set size as sampling has near-constant response times. Using over provisioning allows for fine-grained tweaking between performance and chance of correct results. Heuristic faceting is expected to be the default for interactive use with the links field. Viability of heuristic faceting for smaller fields is currently being investigated.

As always, there is full source code and a drop-in sparse faceting Solr 4.10 WAR at GitHub.

Terry Reese: MarcEdit 6 Updates

planet code4lib - Thu, 2015-07-30 04:39

I hadn’t planned on putting together an update for the Windows version of MarcEdit this week, but I’ve been working with someone putting the Linked Data tools through their paces and came across instances where some of the linked data services were not sending back valid XML data – and I wasn’t validating it.  So, I took some time and added some validation.  However, because the users are processing over a million items through the linked data tool, I also wanted to provide a more user friendly option that doesn’t require opening the MarcEditor – so I’ve added the linked data tools to the command line version of MarcEdit as well. 

Linked Data Command Line Options:

The command line tool is probably one of those under-used and unknown parts of MarcEdit.  The tool is a shim over the code libraries – exposing functionality from the command line, and making it easy to integrate with scripts written for automation purposes.  The tool has a wide range of options available to it – and for users unfamiliar with the command line tool – they can get information about the functionality offered by querying help.  For those using the command line tool – you’ll likely want to create an environmental variable pointing to the MarcEdit application directory so that you can call the program without needing to navigate to the directory.  For example, on my computer, I have an environmental variable called: %MARCEDIT_PATH% which points to the MarcEdit app directory.  This means that if I wanted to run the help from my command line for the MarcEdit Command Line tool, I’d run the following and get the following results:

C:\Users\reese.2179>%MARCEDIT_PATH%\cmarcedit -help
***************************************************************
*  MarcEdit 6.1 Console Application
*  By Terry Reese
*  email:
*  Modified: 2015/7/29
***************************************************************
Arguments:
-s:          Path to file to be processed. If calling the join utility, source must be files delimited by the ";" character
-d:          Path to destination file. If calling the split utility, dest should specify a folder where split files will be saved. If this folder doesn't exist, one will be created.
-rules:      Rules file for the MARC Validator.
-mxslt:      Path to the MARCXML XSLT file.
-xslt:       Path to the XML XSLT file.
-batch:      Specifies Batch Processing Mode
-character:  Specifies character conversion mode.
-break:      Specifies MarcBreaker algorithm
-make:       Specifies MarcMaker algorithm
-marcxml:    Specifies MARCXML algorithm
-xmlmarc:    Specifies the MARCXML to MARC algorithm
-marctoxml:  Specifies MARC to XML algorithm
-xmltomarc:  Specifies XML to MARC algorithm
-xml:        Specifies the XML to XML algorithm
-validate:   Specifies the MARCValidator algorithm
-join:       Specifies join MARC File algorithm
-split:      Specifies split MARC File algorithm
-records:    Specifies number of records per file [used with split command].
-raw:        [Optional] Turns off mnemonic processing (returns raw data)
-utf8:       [Optional] Turns on UTF-8 processing
-marc8:      [Optional] Turns on MARC-8 processing
-pd:         [Optional] When a Malformed record is encountered, it will modify the process from a stop process to one where an error is simply noted and a stub note is added to the result file.
-buildlinks: Specifies the Semantic Linking algorithm. This function needs to be paired with the -options parameter
-options:    Specifies linking options to use:
             example: lcid,viaf:lc,oclcworkid,autodetect
             lcid: utilizes to link 1xx/7xx data
             autodetect: autodetects subjects and links to known values
             oclcworkid: inserts link to oclc work id if present
             viaf: linking 1xx/7xx using viaf. Specify index after colon. If no index is provided, lc is assumed.
             VIAF Index Values:
               all -- all of viaf
               nla -- Australia's national index
               vlacc -- Belgium's Flemish file
               lac -- Canadian national file
               bnc -- Catalunya
               nsk -- Croatia
               nkc -- Czech.
               dbc -- Denmark (dbc)
               egaxa -- Egypt
               bnf -- France (BNF)
               sudoc -- France (SUDOC)
               dnb -- Germany
               jpg -- Getty (ULAN)
               bnc+bne -- Hispanica
               nszl -- Hungary
               isni -- ISNI
               ndl -- Japan (NDL)
               nli -- Israel
               iccu -- Italy
               LNB -- Latvia
               LNL -- Lebanon
               lc -- LC (NACO)
               nta -- Netherlands
               bibsys -- Norway
               perseus -- Perseus
               nlp -- Polish National Library
               nukat -- Poland (Nukat)
               ptbnp -- Portugal
               nlb -- Singapore
               bne -- Spain
               selibr -- Sweden
               swnl -- Swiss National Library
               srp -- Syriac
               rero -- Swiss RERO
               rsl -- Russian
               bav -- Vatican
               wkp -- Wikipedia
-help:       Returns usage information

The linked data option uses the following pattern: cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options [linkoptions]

As noted above in the list, -options is a comma-delimited list that includes the values that the linking tool should query. For a user looking, for example, to generate workids and uris on the 1xx and 7xx fields, the command would look like:

<< cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid

Users interested in building all available linkages (using viaf, autodetecting subjects, etc.) would use:

<< cmarcedit.exe -s [sourcefile] -d [destfile] -buildlinks -options oclcworkid,lcid,autodetect,viaf:lc

Notice the last option – viaf. This tells the tool to utilize viaf as a linking option in the 1xx and the 7xx – the data after the colon identifies the index to utilize when building links.  The indexes are found in the help (see above).

Download information:

The update can be found on the downloads page or by using the automated update tool within MarcEdit. Direct links:

Mac Port Update:

Part of the reason I hadn’t planned on doing a Windows update of MarcEdit this week is that I’ve been heads down making changes to the Mac Port.  I’ve gotten good feedback from folks letting me know that so far, so good.  Over the past few weeks, I’ve been integrating missing features from the MarcEditor into the Port, as well as working on the Delimited Text Translation.  I’ll now have to go back and make a couple of changes to support some of the update work in the Linked Data tool – but I’m hoping that by Aug. 2nd, I’ll have a new Mac Port Preview that will be pretty close to completing (and expanding) the initial port sprint. 

Questions, let me know.


District Dispatch: FASTR zooms out of Senate Committee

planet code4lib - Wed, 2015-07-29 20:24

Today, after many years of effort by our members and the open access community, the Fair Access to Science and Technology Research Act of 2015 (FASTR) was approved by unanimous voice vote of the Senate Committee on Homeland Security and Governmental Affairs. It now goes to the full Senate for consideration as early as this September. ALA thanks Committee Chair Ron Johnson (R-WI) and his staff for their hard work and wishes again to express its deep gratitude to Senator John Cornyn (R-TX) for his leadership and his staff’s tireless efforts toward ensuring that tax-payer funded research be and remain accessible to the public.

Photo by Andreas Levers

As ALA’s press release states, “FASTR would require federal departments and agencies with an annual extramural research budget of $100 million to develop a policy to ensure that researchers submit an electronic copy of the final manuscript accepted for publication in a peer-reviewed journal. Additionally, the bill would also require that each taxpayer-funded manuscript be made available to the public online and without cost, no later than twelve months after the article has been published in a peer-reviewed journal.”

While this may seem a small step, it is a critical, momentum-generating advance and the most meaningful legislative movement on FASTR that has ever occurred. Please stay tuned as we continue to monitor this issue and to go into overdrive if ongoing efforts to accelerate a vote on S.779 by the Senate in this calendar year gain traction.

Congratulations to all of you who helped “move FASTR” today and thanks for being ready to join ALA again when it’s time to tell Congress to floor it!

The post FASTR zooms out of Senate Committee appeared first on District Dispatch.

District Dispatch: Even though it is summer, CopyTalk webinars continue!

planet code4lib - Wed, 2015-07-29 19:28

From Lotus Head

Higher education institutions and their libraries provide copyright information to the members of their community in different ways. Join us on CopyTalk this month to hear three universities describe the services they offer regarding copyright to their faculty, staff, and students. Our presenters will include Sandra Enimil, Program Director, University Libraries Copyright Resources Center from the Ohio State University, Pia Hunter, Visiting Assistant Professor and Copyright and Reserve Librarian from the University of Illinois at Chicago, and Cindy Kristof, Head of Copyright and Document Services from Kent State University.

CopyTalk will take place on August 6th at 11am Pacific/2pm Eastern time. After a brief introduction of our presenters, our speakers will present for 45 minutes, and we will end with a Q&A session (questions will be collected during the presentations).

Please join us at the webinar URL. Enter as a guest, no password required.

We are limited on the number of concurrent viewers we can have, so we ask you to watch with others at your institution if at all possible. The presentations are recorded and will be available online soon after the presentation. Oh yeah – it’s free!

The post Even though it is summer, CopyTalk webinars continue! appeared first on District Dispatch.

Jonathan Rochkind: III report: “WE LOVE THE LIBRARY, BUT WE LIVE ON THE WEB.”

planet code4lib - Wed, 2015-07-29 14:19

ILS Vendor III has released a report based on a survey of patrons at 7 UK academic libraries:

“WE LOVE THE LIBRARY, BUT WE LIVE ON THE WEB.” Findings around how academic library users view online resources and services (You have to register to download)

Some of the summary of findings from the report:

  • “User behaviours are increasingly pervasive, cutting across age, experience, and subject areas”
  • “Online anywhere, on any device, is the default access setting”
  • “Almost without exception, users are selecting different discovery tools to meet different requirements, ranging from known item searches to broad investigation of a new topic. Perhaps with some credit due to recent ‘discovery layer’ developments, the specialist library search is very much of interest in this bag of tools, alongside global search engines and more particular entry points such as Google Scholar and Wikipedia.”
  • Library Search is under informed scrutiny. Given a user base that is increasingly aware of the possibilities for discovery and subsequent access, there are frustrations regarding a lack of unified coverage of the library content, the failure to deliver core purposes well (notably, known item searches and uninterrupted flow-through to access), and unfavourable comparisons with global search engines in general and Google Scholar in particular. We note:
    • Global Search Engines – Whilst specialised tools are valued, the global search engines (and especially Google) are the benchmark.
    • Unified Search – Local collection search needs to be unified, not only across print and electronic, but also across curatorial silos (archives, museums, special collections, repositories, and research data stores).
    • Search Confidence – As well as finding known items reliably and ordering results accordingly, library search needs to be flexible and intelligent, not obstructively fussy and inexplicably random.

I think this supports some of the directions we’ve been trying to take here. We’ve tried to make our system play well with Google Scholar (both directing users to Google Scholar as an option where appropriate, and using Umlaut to provide as good a landing page as possible when users come from Google Scholar and want access to licensed copies, physically held copies, or ILL services for items discovered).  We’ve tried to move toward a unified search in our homegrown-from-open-source-components catalog.

And most especially we’ve tried to focus on “uninterrupted flow-through to access”, again with the Umlaut tool.

We definitely have a ways to go in all these areas; it’s an uphill struggle in many ways, as discussed in my previous comments on the Ithaka report on Streamlining Access to Scholarly Resources.

But I think we’ve at least been chasing the right goals.

Another thing noted in the report:

  • “Electronic course readings are crucial (Sections 8, 12). Clearly, the greatest single issue raised in qualitative feedback is the plea for mandated / recommended course readings— and, ideally, textbooks—to be universally available as digital downloads,”

We’ve done less work locally in this direction, on course reserves in general, and I think we probably ought to. This is one area where I’d especially wonder if UK users may not be representative of U.S. users — but I still have no doubt that our undergraduate patrons spend enough time with course readings to justify more of our time than we’ve been spending on analyzing what they need in electronic systems and improving them.

The report makes a few recommendations:

  • “The local collection needs to be surfaced in the wider ecosystem.”
  • “Libraries should consider how to encompass non-text resources.”
  • “Electronic resources demand electronic workflows.”
  • “Libraries should empower users like any modern digital service. Increasing expectations exist across all user categories—likely derived from experiences with other services—that the library should provide ‘Apps’ geared to just-in-time support on the fly (ranging from paying a fine to finding a shelf) and should also support interactions for registered returning users with transaction histories, saved items, and profile-enabled automated recommendations.”
  • “Social is becoming the norm”

Other findings suggest that ‘known item searches’ are still the most popular use of the “general Library search”, although “carry out an initial subject search” is still present as well.  And that when it comes to ebooks, “There is notably strong support to be able to download content to use on any device at any time.”  (Something we are largely failing at, although we can blame our vendors).


Islandora: Meet Your Developer: QA Dan

planet code4lib - Wed, 2015-07-29 13:35

With the Islandora Conference coming up, we thought it would be a good time to Meet some Developers, especially those who will be leading workshops. Kicking it off is Daniel Aitken, better known in the Islandora community as QA Dan, master of testing. Despite the name, Dan now works for discoverygarden, Inc as a developer, although he maintains ceremonial duties as Lord Regent of the QA Department. He's known for thorough troubleshooting on the listserv, some very handy custom modules, and will be leading workshops on How to Tuque and Solution Packs (Experts) at the upcoming Islandora Conference. Here's QA Dan in his own words:

Please tell us a little about yourself. What do you do when you’re not at work?
Hmm … when I’m not working, and I decide to do something more interesting than sitting on the couch, I’m probably baking. Pies, biscuits, cookies … currently I’m working on making fishcakes from scratch. Batch one was less than stellar. I think I accidentally cooked the starch out of the potatoes.

How long have you been working with Islandora? How did you get started?
I’ve been with discoverygarden for about three years now. I kind of randomly fell into it! I didn’t really know what to expect, but the team here is fantastic, and I’ve gotten the opportunity to work on so many fascinating projects that it’s been a blast.

Sum up your area of expertise in three words:
Uh … code base security?

What are you working on right now?
Right this second? Looking at a fix to the basic solr config that should prevent GSearch/Fedora from spinning its wheels in an unusual case where certain fields end in whitespace. Once I actually get some free time here in the QA department? Updating our Travis-CI .yaml scripts to use caching between builds so that hopefully we don’t have 45 minute-plus build processing times.

What contribution to Islandora are you most proud of?
The testing back-ends! Y’know, all this stuff. To be fair, a bare-bones version was there before I got my hands on it, but it’s been updated to the point where it’s almost indistinguishable from its first form. It separates testing utilities from the actual test base class so that it can be shoehorned into other frameworks, like the woefully-underused, or even in included frameworks like the basically-magical datastream validation stuff! That way, no matter what you’re doing, you can manipulate Fedora objects during tests! It’s also been made easy to extend, hint hint.

What new feature or improvement would you most like to see?
Does ‘consolidated documentation’ count? I feel like that’s what slips a lot of people up. I know we’ve been working on improving it - we have a whole interest group devoted to it - but it’s a multi-tendriled beast that needs to be tamed. We have appendices living in multiple places, API documentation that only lives in individual modules’ api.php files, and just … all kinds of other stuff. As a kind-of-but-not-really-an-end-user sort, I only really make improvements to technical documents like Working With Fedora Objects Programmatically via Tuque, and half the time these are ones I made myself because there was a desperate gap in the knowledgebase.

What’s the one tool/software/resource you cannot live without?
Ten Million with a Hat! I don’t know what I did before I had it. Actually, I do; I flushed all my time down the toilet manually creating all the objects I use in my regular testing. So I made a thing with the concept of ‘just batch ingest a bunch of random objects, and modify each one via hooks’. Then, I started working on the hooks - things to add OBJs and generate derivatives and randomly construct MODS datastreams and do DC crosswalking and add things to bookmarks and whatnot - whatever fits the case for whatever I’m testing. Now I’ve gone from ingesting objects I need manually like some kind of chump to having Islandora take care of it for me. The moral of the story is that if you think you couldn’t live without a tool, probably just make it? Code is magic. Tuque is also magic.

If you could leave the community with one message from reading this interview, what would it be?
Write tests and Travis integration for your contributed modules! I know it’s a time investment, but I’ve put a lot of work into making it easier for you! There’s even a guideline here. It’ll tell you all about how to poke at things inside Islandora and make assertions about what comes back, like whether or not objects are objects and whether or not the datastreams exist and are well-formed (they tend to rely on the actual contents of the binary, and never on extensions or mime types). Well-written high-level tests can tell you if you’ve broken something that you didn’t expect to, and Travis can tell you all sorts of things about the quality of your code per-commit. A tiny weight is lifted off my shoulder every time I see a project I’ve never encountered before that has a ‘tests’ folder and a ‘.travis.yml’ and a big green ‘PASSING’ in the README on GitHub.

Happy Islandora-ing!

In the Library, With the Lead Pipe: Why Diversity Matters: A Roundtable Discussion on Racial and Ethnic Diversity in Librarianship

planet code4lib - Wed, 2015-07-29 13:00

Image by Flickr User webtreats (CC-BY 2.0)

In Brief: 

After presenting together at ACRL 2015 to share research we conducted on race, identity, and diversity in academic librarianship, we reconvene panelists Ione T. Damasco, Cataloger Librarian at the University of Dayton, Isabel Gonzalez-Smith, Undergraduate Experience Librarian at the University of Illinois, Chicago, Dracine Hodges, Head of Acquisitions at Ohio State University, Todd Honma, Assistant Professor of Asian American Studies at Pitzer College, Juleah Swanson, Head of Acquisition Services at the University of Colorado Boulder, and Azusa Tanaka, Japanese Studies Librarian at the University of Washington in a virtual roundtable discussion. Resuming the conversation that started at ACRL, we discuss why diversity really matters to academic libraries, librarians, and the profession, and where to go from here. We conclude this article with a series of questions for readers to consider, share, and discuss among colleagues to continue and advance the conversation on diversity in libraries.


Earlier this year, at the Association of College and Research Libraries (ACRL) 2015 conference, the authors of this article participated in a panel discussion entitled “From the Individual to the Institution: Exploring the Experiences of Academic Librarians of Color”1 which covered research the panelists had conducted on institutional racism, structures of privilege and power, and racial and ethnic identity theory in academic libraries and among academic librarians. The hour-long, standing-room only session scraped the surface of conversations that are needed among academic librarians on issues of diversity, institutional racism, microaggressions, identity, and intersectionality. It was our intent with the ACRL panel to plant the seeds for these conversations and for critical thought in these areas to further germinate. We saw these conversations begin to take shape during and after the panel discussion on Twitter, and overheard in the halls of the Oregon Convention Center. As Pho and Masland write in the final chapter of The Librarian Stereotype, “we are now at a point where discussions about the intersectionality of gender, sexuality, race, and ethnicity in librarianship are happening among a wider audience . . . These difficult conversations about diversity are the first steps toward a plan of action” (2015, p. 277). These conversations must continue to grow.

The discussion of racial and ethnic diversity in libraries is a subset of the larger discussion of race in the United States. For anyone participating in these discussions, the experience can be difficult and uncomfortable. Such discussions can be academic in nature, but very often they are personal and subjective. In the United States, our long history of avoiding difficult and meaningful conversations about race has made it challenging for some people to perceive or comprehend disparities in representation and privilege. Fear often plays a significant role as a barrier to engaging in these conversations. Fear of the unknown, fear of rejection, fear of change, and the perceived possibility of losing control can complicate these discussions. Participants in these conversations have to be willing to concede a certain amount of vulnerability in order to move the discussion forward, but vulnerability makes many people uncomfortable, which in turn makes it easy to just avoid the discussion altogether.

What follows is a virtual roundtable discussion where we speak openly about why diversity really matters, what actions can be taken, and suggest questions for readers to consider, share, and discuss in honest and open conversations with colleagues. At times, authors reveal the very real struggle to articulate or grapple with the questions, just as one might encounter in a face-to-face conversation. But, ultimately, by continuing this conversation we work to advance our profession’s understanding of the complexity of race and ethnic diversity in librarianship, and to strive toward creating sustainable collaborations and lasting change in a profession that continues to face significant challenges in maintaining race and ethnic diversity.

Before launching into the roundtable discussion, we acknowledge that an additional challenge when talking about race is the use of terminology and language that intellectualizes some of the real-world experiences and feelings we face. Terminology is useful due to its ability to create precision in meaning, but it also can alienate and turn away readers who use different language or terms to express similar experiences, feelings, or concepts. Yet in order to have a critical discussion of race and diversity, it is important that we engage in the use of particular terms that help us to identify, explain, and analyze issues and experiences that will help us to advance the conversation in deeper and more meaningful ways. In this article we do use terms that draw from a common critical lexicon, and we have made an effort to define and/or footnote many of these terms for readers who might be unfamiliar with these terms.

Why does diversity matter?

Juleah: Why does diversity matter? This question was posed to the audience at the end of our ACRL panel (Swanson, et al., 2015), as something to reflect upon. For our virtual roundtable, I’m re-asking this question, because it warrants meaningful discussion. Let’s go around the “table” and start with Ione.

Ione: When the question was first posed to us, I struggled with articulating a response that was more than just an intuitive reaction. My first thought was that diversity matters because we don’t live and work in a vacuum of homogeneity. But I realize that’s both a naïve and inaccurate answer, as there are many places where people still live in segregated areas in terms of race, and that there are work environments that for many reasons, tend to have a homogeneous pool of employees. It’s not enough to say that diversity matters because the world is diverse.

Isabel: Ione’s initial comment about wanting to respond beyond her intuition reminds me of Isabel Espinal’s “A New Vocabulary for Inclusive Librarianship: Applying Whiteness Theory to our profession” piece where she discusses Sensate Theory, an anthropology framework in discussing whiteness. I agree with Ione’s reaction of wanting to articulate why racial and ethnic diversity is important, how painful prejudice and discrimination can feel, and the need for acknowledgement of the disparities that exist in different communities’ experiences and history due to race/ethnicity. Discovering Espinal’s exploration of sensate theory was thrilling for me because she says that the theory emphasizes gut reactions – emotion and the senses (Espinal, 145). Librarians of color may react with “a very angry or very tearful reaction or both…the experience of encountering whiteness in the library setting is one that is felt in the body; it is more than an intellectual abstraction.” (145) This really resonated with me because I consider myself an intelligent, composed person but when my colleagues or I experience discrimination due to our race/ethnicity, I can’t help but feel an initial overwhelmingness. This is then immediately followed by a process of checking my emotions to find ways to articulate myself in an intellectual way as a means to be acknowledged and understood. As a person of color, this is what discussing the relevance and meaning behind diversity means to me – a struggle between gut reaction and articulation.

Dracine: This question is a challenge. Nevertheless, most people who come into this profession want to be of service directly or indirectly to others. Libraries of every variety exist to serve their respective constituents through access to information and spaces for collaboration.
With that in mind, I think diversity matters in relation to the relevance of services being provided to meet practical and extraordinary needs. Needs that are diverse not only because of ethnicity and race, but also because of religion, gender, socioeconomic status, physical ability, etc.

With recent headlines related to racism and violence, it is easy to see the connectivity of libraries in the pursuit of social justice ideals. So much of the conversation we’ve been having pertains to administrative and cultural constructs that frustrate diversity. These are large and lofty issues in scope. I often think their enormity makes us dismissive of the tangible impacts of diversity in the commonplace work performed in libraries every day.

I’ve heard many anecdotal stories from colleagues, both of color and white, who were able to customize or enhance instruction for an individual or group because of personal insights and experiences related to issues like English as a Foreign Language and format accessibility. Perhaps mountains were not moved, but to the individuals who benefitted hills were climbed.

Isabel: Dracine’s example of instructors tailoring their sessions for a particular class of students based on factors like language is a great example of how librarians are tuning into the identity aspect of the communities they serve. Juleah, Azusa, and I have been using identity theory to think about diversity initiatives from an angle that takes into account the individual experience at a more fundamental level. Because identity is so dynamic and in constant flux, it is often constructed from the internal sense of self as well as the external, social level. Consider the messages we internalize from what we see on TV, what we read in history books, who holds roles of authority in our institutions, and who sits at the reference desk. It makes sense that your colleagues customize their instruction because we intuitively sense that people respond positively to another person who is like themselves. That’s why the library ethnic caucuses are important – they provide some individuals with a sense of community and belonging. Ethnic identity theory helps us understand this phenomenon.

Azusa: As Dracine says above, diversity matters because libraries must accommodate diverse user groups as well as a diverse librarian population. Ione mentioned during our panel how the field of Library and Information Science (LIS), and higher education in general, views diversity as a problem to be solved (Swanson, et al., 2015). Diversity in race, ethnicity, sexuality, age, social background, and more will bring strength to libraries, where balanced views and a wide range of possibilities are essential for successful research and teaching. Diversity is not a problem, but an asset for the institution.

Juleah: When we talk about diversity and why it matters in academic libraries, I think what we’re really trying to get at are two different concepts: 1) diversity in relation to the library profession’s role in social justice (Morales, Knowles, Bourg, 2014) and 2) diversity in relation to organizational culture within libraries.

To be honest, I think our profession, librarians as a whole, but more specifically academic librarians, are in the midst of a professional culture crisis. I think this stems from the homogeneity within our professional ranks. What we get to do as academic librarians today is incredible, from pushing our campuses into open access models for research output to being active participants in conversations about managing massive amounts of data. But are we proud of the homogeneity and the stagnant racial and ethnic diversity within the profession? I don’t think we are.

I think diversity matters because, right now, it allows us the opportunity to reinvent our organizational and professional culture into something that is not reliant on homogeneity of people and ideas, but rather looks toward what we bring to the future of higher education.

Ione: The two concepts of diversity in academic libraries that Juleah describes are actually intertwined, and are worth exploring at the same time. I think her first point about libraries and social justice poses difficult questions for us as a profession—how far do we take social responsibility as academic libraries? As academic librarians? How do we reconcile social responsibility with the missions of our institutions, and what do we do when they are out of alignment? Connecting these to her second point, internally, how far do we take a social justice concept of diversity in terms of our daily work as librarians? Can we even agree upon a definition of social justice in terms of diversity? I think Todd raised an important question during the panel (Swanson, et al., 2015) when he said, “The question is, is diversity a social justice? Is racial equity part of an institutional mission? If it isn’t, then we have to interrogate that.”

If we think of our libraries as microcosms of the world around us, I don’t think we can ignore the fact that oppressive structures of power which exist in our culture are reproduced within the structures that exist in higher education, in our universities and colleges, and in our academic libraries, often unknowingly and sometimes with the best of intentions. Numbers aren’t everything, but the lack of positive movement in terms of racial demographics in our field is a cause for concern. And just adding more people of “diverse” backgrounds does nothing to address structural problems with an institution. I think as we move as a society to undo oppression of marginalized identities, libraries, as places that serve larger communities, do bear a responsibility to undo their own oppressive structures and question why things have stayed the same over the years in our profession.

Isabel: You’re right, Ione. Like I said at our panel at ACRL, you can’t just hire a person of color and call it diversity (Swanson, et al., 2015). If we’re going to pursue diversity initiatives at the student and professional level, we need to identify what long-term success looks like for our field and what resonates with individuals. What Juleah, Azusa, and I found in our research was that racial and ethnic identity theory helps us understand why librarians of color may respond well to ethnic caucuses or liaising for students of color groups, and how they may feel a sense of loneliness in a predominately white institution or perceive that their race/ethnicity is used to pigeon-hole their professional responsibilities.

Diversity matters because we all play a part in the messages we disseminate, regardless of how we identify. Librarians contribute towards the preservation and accessibility of information, representations of authority in the intellectual sphere, and advocating against censorship. What is the message that our collections, library staff representation, research, or programming gives to the communities we serve? And what are we doing to serve our patrons in ways that take into account their race and/or ethnicity?

Todd: To add to what Isabel said about the librarian’s role in the preservation and accessibility of information, I think at a profound foundational level, libraries are involved in an epistemological project. In other words, as institutions that collect, preserve, and distribute information, libraries serve the function of helping to create and circulate knowledge in our society. How institutions construct and curate information, and how users access and synthesize that information, are not outside the realm of the political. Especially in the case of academic libraries, which encompass a scholarly mission of furthering intellectual growth and scholarly communication, thinking carefully and deeply about the types of knowledge that are both included and excluded is crucial to the mission of the library and its relation to broader society.

Isabel: NPR recently featured Michelle Obama’s commencement speech to the predominantly African-American class of Martin Luther King Jr. Preparatory High School on the south side of Chicago, where she mentions how the famous American author Richard Wright was not allowed to check out books at the public library because he was black (Obama, 2015). I instantly thought of Todd’s point when I heard it on the radio – that the American library was once a place of exclusion, and that it still remains political. The First Lady’s point was to inspire the graduating class to persevere beyond their struggles towards achieving greatness – a message intended to resonate with the students because it was coming from an accomplished, powerful, fellow South Sider of Chicago.

Todd: That example also reminds me of how E.J. Josey, writing in 1972, identified academic libraries as having a unique role to play in the black liberation movement. Even today, as higher education continues to be a site of privilege for some and exclusion for others, diversity and educational equity is something that we still need to work on. Thus, in relationship to libraries and higher education, diversity is important to consider in how we think about all aspects of the ‘life cycle of information,’ particularly when it comes to the ways in which historically underrepresented groups and historically underrepresented forms of knowledge and practices have not been included in – and at times, systematically excluded from – collection building and user services.

Ione: Many of us who work in academic libraries have encountered “diversity training” at one point or another, and in the course of that training, we may have been presented with statistics from both business and higher education that demonstrate the value of diversity in specific ways. For example, many businesses highlight the importance of being able to work effectively in a global market, and higher education has followed that line of thinking in terms of promoting diversity as a way of building student competence in intercultural interactions as a key component of their college education. Another reason diversity is often touted as a component of an effective workplace is that studies have shown that more often than not, more diverse work teams have proven to be highly productive. But I find these market-driven motivations for promoting diversity to be very superficial and highly problematic.

Todd: The approach to diversity that Ione describes is part of a growing concern regarding the “neoliberalization of the library” (Hill, 2010; Pateman, 2003), including increased privatization, a shrinking public sphere, and a market-driven approach to issues like diversity. Failure to think about how diverse communities have been and continue to be impacted by such trends, and about the implicit race and class privileges those trends perpetuate, will only lead to the further homogenization and privatization of places, practices, and services.
When considering issues of race and racial representation in the library, I think it’s important that we move beyond an additive model and think about the epistemological. People of color (as well as other disenfranchised groups) are more than just laboring bodies, more than just token representatives of a diverse workforce under the conditions of capitalism; they also possess, practice, and embody different ways of understanding and inhabiting the world, which, as Juleah points out, can help to reinvent the culture of the library, and higher education, more generally. It is this possibility of transformation that I think is why diversity matters.

Juleah: This has been a captivating discussion so far, addressing themes of homogeneity in the profession, organizational culture, race and identity, and issues of social justice, and ultimately critically examining our role as librarians to the communities we serve. We could spend more time on this question, but similar to a time limit in a real world discussion, we have a word count. So, let’s move on to the next question.

Where do we go from here?

Juleah: Oftentimes, after engaging in critical discourse, when the conversation ends, we are left wondering what to do next. Rather than leaving this for the reader to consider after finishing this article, let’s address this issue here. Now that we have touched on why diversity matters, where do we go from here?

Ione: Participating in the ACRL panel really challenged me to think about my own approaches to researching diversity, which had previously been focused on understanding the experiences of individuals of color. However, as Todd had pointed out during the panel (Swanson, et al., 2015), I think we all need to be more versed in critical perspectives around identity (and intersectionality)2 in order to have more effective conversations about how racism and other forms of oppression continue to be produced and reproduced in our organizations. Listening to the experiences of those who have been marginalized3 may motivate us to move towards a more socially just world, but developing critical competencies and deepening our knowledge base in critical theory can give us the tools to actually dismantle those structures that have marginalized them in the first place.

Dracine: During the panel, I made a comment regarding my own relief upon hearing my director say diversity was not my issue (Swanson, et al., 2015). For me this was important because even as a librarian of color my professional expertise is not diversity. However, if you want to talk about getting Arabic language books through U.S. customs, then sure, I might have some thoughts. I care about diversity for the very reasons that have been discussed and definitely want to leave the profession better than I found it. I think it’s important to acknowledge that how that happens may look different for each individual. The biggest takeaway for me was the obvious need for a reset or a refresh on the question of diversity in libraries. We’ve begun to have what feels like genuine conversations that will hopefully combat the diversity fatigue felt by both librarians of color and perhaps our white counterparts.

Ione: Arm yourself with knowledge, and then have the courage to use that knowledge to start dialogues with your colleagues, administrators, faculty, and staff, not just in your library but across your campuses to examine existing policies and practices that have left far too much room for discrimination (both implicit and explicit) to occur. And I mention courage because these are not easy conversations to have, or even to initiate. It’s easy for defensiveness to arise in these conversations, and for emotions to get rather heated, but I think it is possible to move through those communication barriers and get to a place of actual growth.

Juleah: When talking about diversity in academic libraries with colleagues of varying racial and ethnic backgrounds, acknowledging that institutional racism4 does exist, regardless of intent and well-meaning, can, in fact, be very freeing in a conversation, because institutional racism is not about us-versus-them, or you-versus-me, but instead it’s a collective outcome to be analyzed and critiqued collectively by an organization. The question becomes not, “What are we doing wrong?” but instead, “How can we change our outcomes?”

Ione: Another thing I would recommend is seeking out other campus partners with expertise in mediating these types of conversations. For example, a few years ago, our campus hosted a series of “Dialogues on Diversity” that brought together small cohorts of faculty and staff from different units to attend a series of dialogue sessions mediated by trained facilitators to try to build a better sense of community across differences. It was a very small step, and it did not transform our campus culture overall, but I do think it helped create a network of people across the university who obviously cared about bridging differences in order to improve our overall campus climate. Through that program, I met people with whom I have since worked on initiatives and programs related to diversity.

Isabel: Great suggestions, Ione. My institution did diversity dialogues in collaboration with campus partners, and the sessions included perspectives from people of different experiences and backgrounds. It’s a productive way to navigate through the uncomfortable tension between the personal and the systemic contributions towards diversity. I would also suggest that librarians, regardless of race/ethnicity or hierarchy in their institutions, pay attention to recent discussions in our profession regarding microaggressions, which are often unintentional comments “that convey rudeness, insensitivity and demean a person’s racial heritage or identity” (Sue et al., 2007). The LIS Microaggressions Tumblr project reminds us that we are all capable of demeaning someone despite our best intentions, but we also have the opportunity to truly listen when we are being called out, to be humbled by the experience, and to learn from it. At a personal level, this is one thing we can and must all do – listen.

Todd: One of the important points that was discussed at the panel and that we continue to discuss here is trying to come up with ways to transform both the profession and the various institutions that we work at. Crucial to such a consideration is identifying where power lies. Of course, we all exercise power in different ways. The key is to figure out how to exercise our power to make lasting, sustainable change at the structural level. And we can’t just be acting alone. We need to create movements and build alliances, and this often entails creative forms of coalition building. (Although I suppose all forms of coalition are creative.)

Ruth Wilson Gilmore (2007) makes a point of stressing that we need to identify both likely and unlikely allies. We need to be better about doing that in the LIS field. At the ACRL panel, one of the audience members noted that ALA is 98% white (Swanson, et al., 2015). Obviously, change in terms of the percentages of people of color in ALA, or the LIS field in general, is not going to happen overnight, so how do we work with that 98% so that we are creating coalitions with people who can be good allies?

A helpful way of thinking about institutional alliances is what Scott Frickel (2011) calls “shadow mobilizations,” which entails creating informal networks of activism among diverse stakeholders within the constraints of the institution. I think such a strategy can be effective in building alliances within and between different constituent groups in the LIS fields. One of the points that I raised in the ACRL panel was that we need to recognize the complexity of people’s identity, how our positionalities encompass intersectional identities and affiliations that are not always immediately visible and legible (Swanson, et al., 2015). So even though ALA or the profession is predominantly white, that whiteness is not monolithic. It is inflected through categories such as class, gender, sexuality, religion, ability, etc. By understanding diversity, including racial diversity, through a framework that is sensitive to how it is always already constituted through these other intersections, we can forge multiple coalitions in ways that are complex, nuanced, and durable. Ultimately, this would mean that we are constructing a movement based on a diversity politics that is founded on a quest for social justice and social transformation rather than token representation or inclusion.

Ione: In terms of higher education and academic libraries, I think we really need to question hiring practices, and tenure and promotion practices. As I mentioned during the ACRL panel (Swanson, et al., 2015) back in March, the idea of “organizational fit” is a problematic concept in terms of search committee discussions. While it is never an official criterion for an applicant, I think search committees reinforce the status quo when they use language to deny an applicant a position because of their perceived inability to fit the existing organizational culture. I think we also need to take a closer look at how we write our position descriptions, how we write our mission statements, essentially, what do we convey about ourselves as organizations to potential applicants?

Todd: This requires all of us to take a critical, self-reflexive look at our complicity in maintaining the status quo and our roles in facilitating the goals of social change. For example, we can take some lessons from those working in other fields—like the STEM (science, technology, engineering, math) fields—that are also struggling to recruit and retain historically underrepresented groups. Attention is being given to how to make STEM more culturally relevant to people of color and other marginalized groups so that there are alternative pathways to pursue it in terms of scholarship and profession (Basu & Barton, 2007; Lee & Buxton, 2010; Lyon, Jafri, & St Louis, 2012). As we continue to build on efforts to diversify the LIS field, I think it is important to look at other strategies, interrogate the current field and its practices, and ask how we can make LIS more culturally relevant and what alternative pathways can be developed to increase the recruitment and retention of people of color and other marginalized groups.

Azusa: The ACRL Diversity Committee’s Diversity Standards: Cultural Competency for Academic Libraries may be a good guide for some libraries to develop local approaches to diversifying their populations and to recruiting and maintaining a diverse library workforce. The University of Washington Bothell and Cascadia Community College Campus Library Diversity Team was formed around the guidelines in the Diversity Standards and adapted some of its eleven standards to develop training sessions in cultural awareness and cross-cultural communication (Lazzaro, Mills, Garrard, Ferguson, Watson, & Ellenwood, 2014). The outcome was quite positive, and their assessments indicate that a structured opportunity to think and learn about diversity and cultural differences by sharing and hearing personal experiences from colleagues, which can otherwise feel awkward, was particularly helpful. If your institution has staff members from different cultures, developing cultural awareness of each other is one good way to start.

Questions for our readers

Juleah: As emphasized throughout this article, a continued conversation on diversity, particularly racial and ethnic diversity in the profession, is needed. As we conclude this roundtable discussion, what questions do you offer to readers that will carry this conversation forward?

Todd: As many people have noted, there is a very noticeable racial disparity in the LIS profession, and this has been something that has been talked about for a while now (Espinal, 2001; Galvan, 2015; Honma, 2005; Peterson, 1996). I think a useful way of framing it so that we move beyond the “deficit model” that targets individuals or communities, is to flip the question and ask:

  • Is there a particular deficit in the LIS profession itself that makes it unattractive for people of color to pursue?
  • Are there ways that the LIS field (and all of us who work in that field, whether as librarians, faculty, administrators, etc.) promotes, intentionally or unintentionally, structures and cultures that may be deemed exclusionary to those who have been historically marginalized and underrepresented?
  • How can we (as individuals, coalitions, institutions) create change?

Ione: We need to start asking some big questions in LIS education and higher education in general.
In terms of LIS education:

  • Do current curricular offerings at ALA-accredited library schools address critical theories of identity and how they intersect with theories of information and the practice of librarianship?
  • How do we encourage faculty teaching in LIS to develop coursework that addresses these issues?
  • For LIS students who plan to pursue academic librarianship as a career path, are tenure and promotion issues raised in their courses so that these new librarians come into their academic workplaces prepared to take on the challenges of earning tenure?

In terms of higher education:

  • If we truly value diversity in all its forms, are we doing everything we can to really show that?
  • Do we talk about valuing different leadership styles, different communication styles, or innovative ways of looking at existing practices?

Azusa: Other questions I would like to ask the readers are:

  • Why does diversity in LIS matter, particularly for academic libraries?
  • How is it related to many academic libraries’ vision and mission—supporting faculty and students’ teaching and learning?
  • Is it because diversity among librarians encourages users to approach us?
  • Is it because diversity encourages users to think outside the box, which is fundamental to researching, teaching, and learning?

Dracine: Ever practical, I would ask readers to contemplate the context of their environment and remember the difficulty we all have with engaging this topic. Discussions about diversity should be diverse. Diversity urgencies may be different from one institution to the next. With that in mind, I think it is important to consider:

  • What is the signal to noise ratio? A discussion about diversity could fill an ocean, and after a while it becomes white noise. However, a meaningful discussion should start by focusing on aspects that are critical and tangible to your specific community/organization.
  • Also, what are the rules of engagement? This seems like a mundane question, but it is a rather important one in terms of creating the space for real and penetrating dialogue.

Juleah: A great deal of what we’ve discussed are learned concepts, either through reading and research, or through lived experiences. Yet, these concepts are complex and cannot simply be conveyed through a sound bite of information.

  • What innovative ways can we educate and teach colleagues and students about complex issues like microaggressions, institutional racism, and privilege, reflecting both traditional means of teaching such as lectures and readings, and through learned experiences?


  • Evaluate the culture at your organization/institution. To what degree is the issue of diversity upheld at your institution, and how does it differ from that of your library?
  • If your institution’s mission actively values diversity, what is the campus or community doing about it? Who are the key players and how can you partner with them?
  • From your personal experience, what are the biggest stumbling blocks in the discussions pertaining to diversity? How does it impact how you are able (or not) to dialogue with someone of a different experience than yours?
  • Change can occur at every level – personal, institutional, and professional. As a librarian, where do you feel most empowered to enact change? Where do you find the greatest obstacles?

Thank you to our external reviewer Frans Albarillo, internal reviewers Ellie Collier and Cecily Walker and publishing editor Annie Pho. Your insights and guidance helped us shape and reshape, and reshape some more, our article.

Works Cited:

Basu, S. J., & Barton, A. C. (2007). Developing a sustained interest in science among urban minority youth. Journal of Research in Science Teaching, 44(3), 466–489.

Cohen, C. J. (1999). The boundaries of blackness: AIDS and the breakdown of Black politics. Chicago: University of Chicago Press.

Crenshaw, K. (1991). Mapping the margins: Intersectionality, identity politics, and violence against women of color. Stanford Law Review, 43(6), 1241-1299.

Espinal, I. (2001). A new vocabulary for inclusive librarianship: applying whiteness theory to our profession. In L. Castillo-Speed, (Ed.), The power of language/El poder de la palabra: selected Papers from the Second REFORMA National Conference (pp. 131–49). Englewood, CO: Libraries Unlimited.

Frickel, S. (2011). Who are the experts of environmental health justice? In G. Ottinger & B. R. Cohen (Eds.), Technoscience and environmental justice: expert cultures in a grassroots movement (pp. 21-40). Cambridge, Mass.: MIT Press.

Galvan, A. (2015). Soliciting performance, hiding bias: whiteness and librarianship. In the Library with the Lead Pipe. Retrieved from

Garibay, J. C., (2014). Diversity in the Classroom. Los Angeles, CA: UCLA Diversity & Faculty Development. Retrieved from

Gilmore, R. W. (2007). In the shadow of the shadow state. In Incite! Women of Color Against Violence (Ed.), The revolution will not be funded: beyond the non-profit industrial complex (pp.41-52). Cambridge, Mass.: South End Press.

Hill, D. (2010). Class, capital and education in this neoliberal and neoconservative period. In S. Macrine, P. Maclaren, and D. Hill (Eds.), Revolutionizing pedagogy: education for social justice within and beyond global neo-liberalism (pp. 119–144). New York: Palgrave Macmillan.

Honma, T. (2005). Trippin’ over the color line: The invisibility of race in library and information studies. InterActions: UCLA Journal of Education and Information Studies, 1(2), 1-26. Retrieved from

Institutional racism. (2014). In Scott, J.(Ed.), A Dictionary of Sociology. Retrieved from

Josey, E. J. (1972). Libraries, reading, and the liberation of black people. The Library scene, 1(1), 4-7.

Lazzaro, A. E., Mills, S., Garrard, T., Ferguson, E., Watson, M., & Ellenwood, D. (2014). Cultural competency on campus: Applying ACRL’s Diversity Standards. College and Research Libraries News, 75(6), 332-335. Retrieved from

Lee, O., & Buxton, C. A. (2010). Diversity and equity in science education: Research, policy, and practice. New York: Teachers College Press.

Lyon, G. H., Jafri, J., & St. Louis, K. (2012). Beyond the pipeline: STEM pathways for youth development. Afterschool Matters, 16, 48–57.

Morales, M., Knowles, E. C., & Bourg, C. (2014). Diversity, Social Justice, and the Future of Libraries. portal: Libraries and the Academy, 14(3), 439-451. DOI: 10.1353/pla.2014.0017

Obama, M., (2015, June 9). Remarks by the First Lady at Martin Luther King Jr. Preparatory High School Commencement Address. Speech presented at Martin Luther King Jr. Preparatory High School Commencement, Chicago, IL. Retrieved from

Pateman, J. (2003). Libraries contribution to solidarity and social justice in a world of neo-liberal globalisation. Information for Social Change, 18. Retrieved from

Peterson, L. (1996). Alternative perspectives in library and information science: Issues of race. Journal of Education for Library and Information Science, 37(2), 163–174.

Pho, A., & Masland, T. (2014). The revolution will not be stereotyped: Changing perceptions through diversity. In N. Pagowsky & M. Rigby (Eds.), The librarian stereotype: Deconstructing perceptions & presentations of information work (pp. 257-282). Chicago: Association of College and Research Libraries. Retrieved from

Ridley, C., & Kelly, S. (2006). Institutional racism. In Y. Jackson (Ed.), Encyclopedia of multicultural psychology. (pp. 256-258). Thousand Oaks, CA: SAGE Publications, Inc. doi:

Solorzano, D., & Huber, L. (2012). Microaggressions, racial. In J. Banks (Ed.), Encyclopedia of diversity in education. (pp. 1489-1492). Thousand Oaks, CA: SAGE Publications, Inc. Retrieved from

Sue, D. W., Capodilupo, C.M., Torino, G.C., Bucceri, J.M., Holder, A.M.B., Nadal, K.L., & Esquilin, M. (2007). Racial Microaggressions in everyday life: Implications for clinical practice. American Psychologist, 62 (4), 271-286.

Swanson, J., Tanaka, A., Gonzalez-Smith, I., Damasco, I.T., Hodges, D., Honma, T., & Espinal, I. (2015, March 26). From the Individual to the Institution: Exploring the Experiences of Academic Librarians of Color [Audio recording]. Retrieved from

  1. A recorded slidecast presentation, including full audio of the ACRL 2015 panel discussion “From the Individual to the Institution: Exploring the Experiences of Academic Librarians of Color” is freely available.
    Users who do not already have access will need to establish an account in order to view this and other ACRL 2015 recorded slidecast presentations.
  2. Intersectionality is a concept developed by critical race scholar Kimberle Crenshaw (1991) that seeks to examine the “multiple grounds of identity” that shape our social world. This theory recognizes that categories such as race, class, gender, sexuality, etc. are not mutually exclusive but, rather, are interconnected and co-constituted and therefore cannot be examined independently of each other. It also recognizes the interconnectedness of systems of oppression that shape the structural, political, and representational aspects of identity.
  3. Marginalization refers to the way in which the dominant group uses institutions, laws, ideologies, and cultural norms to disempower, control, and oppress minority groups (Cohen, 1999). Marginalization can occur in various realms, including but not limited to the political, economic, and social, and can include being excluded from decision-making processes and institutions, denied access to resources, segregation and stigmatization based on perceived identity.
  4. Institutional racism is the sometimes intentional, but more often unintentional policies, practices, or customs, that prevent or exclude racial groups from equal participation in an institution (Ridley & Kelly, 2006; Dictionary of Sociology, 2014).

Christina Harlow: Walkthrough Of Geonames Recon Service

planet code4lib - Wed, 2015-07-29 00:00

This came out of documentation I was writing up for staff here at UTK. I apologize if it is too UTK-workflow specific.

I’m working currently on migrating a lot of our non-MARC metadata collections from older platforms using a kind of simple Dublin Core to MODS/XML (version 3.5, we’re currently looking at 3.6) that will be ingested into Islandora. That ‘kind of simple Dublin Core’ should be taken as: there were varying levels of metadata oversight over the years, and folks creating the metadata had different interpretations of the Dublin Core schema - a well-documented and well-known issue/consideration for working with such a general/flexible schema. Yes, there are guidelines from DCMI, but for on-the-ground work, if there is no overarching metadata application profile to guide and nobody with some metadata expertise (or investment) to verify that institution-wide, descriptive (or any type, for that matter) metadata fields are being used consistently, it is no surprise that folks will interpret metadata fields in different ways with an eye to their own collection/context. This issue increases when metadata collections grow over time, are created with little to no documentation, and a lot of the metadata creation is handed off to content specialists, who might then hand it off to their student workers. If you are actually reading my thoughts right now, well thanks, but also you probably know the situation I’m describing well.

Regardless, I’m not here to talk about why I think my job is important, but rather about a very particular but useful procedure and tool that make up my general migration/remediation work, which also happens to be something I’m using and documenting right now for UTK cataloger reskilling purposes. I have been working with some of the traditional MARC catalogers to help with this migration process, and so far the workflow is something like this:

  1. I pull the original DC (or other) data, either from a csv file stored somewhere, or, preferably, from an existing OAI-PMH DC/XML feed for collections in (soon to be legacy) platforms. This data is stored in a GitHub repository [See note below] as the original data for both version control and “But we didn’t write this” verification purposes.
  2. A cleaned data directory is made in that GitHub repo, where I put a remediation files subdirectory. I will review the original data, see if an existing, documented mapping makes sense (unfortunately, each collection usually requires separate mapping/handling), and pull the project into OpenRefine. In OpenRefine, I’ll do a preliminary ‘mapping’ (rename columns, review the data to verify my mapping as best I can without looking at the digitized objects due to time constraints). At this point, I will also note what work needs to be done in particular for that dataset. I’ll export that OpenRefine project and put it into the GitHub repo remediation files subdirectory, and also create or update the existing wiki documentation page for that collection.
  3. At this point, I will hand off the OpenRefine project to one of the catalogers currently working on this metadata migration project. They are learning OpenRefine from scratch but doing a great job of getting the hang of both the tool and the mindset for batch metadata work. I will tell them some of the particular points they need to work on for that dataset, but also they are trained to check that the mapping holds according to the UTK master MODS data dictionary and MAP, as well as that controlled access points have appropriate terms taken from the selected vocabularies/ontologies/etc. that we use. With each collection they complete, I’m able to give them a bit more to handle with the remediation work, which has been great.
  4. Once the catalogers are done with their remediation work/data verification, I’ll take that OpenRefine project they worked on, bring it back into OpenRefine on my computer, and run some of the reconciliation services for pulling in URIs/other related information we are currently capturing in our MODS/XML. One of the catalogers is starting to run some of these recon services herself, but it is something I’m handing over slowly because there is a lot of nuance/massaging to some of these services, and the catalogers working on this project only currently do so about 1 day a week (so it takes longer to get a feeling for this).
  5. I review, do some reconciliation stuff, pull together the complex fields that need to be combined for the transform, then export as simple XML, take that simple XML and use my UTK-standard OpenRefine XML to MODS/XML XSLT to generate MODS/XML, then run encoding/well-formed/MODS validation checks on that set of MODS/XML files.
  6. Then comes the re-ingest to Islandora part, but this is already beyond the scope of what I meant this post to be.

GitHub Note: I can hear someone now: ‘Git repositories/GitHub is not made for data storage!’ Yes, yes, I know, I know. It’s a cheat. But I’m putting these things under version control for my own verification purposes, as well as using GitHub because it has a nice public interface I can point to whenever a question comes up about ‘What happened to this datapoint’ (and those questions do come up). I don’t currently, but I have had really good luck with using the Issues component of GitHub too for guiding/centralizing discussion about a dataset. Using GitHub also has had the unintended but helpful consequence of highlighting to content specialists who are creating the metadata just why we need metadata version control, and why the metadata updates get frozen during the review, enhancement and ingest process (and after that, metadata edits can only happen in the platform). But, yes, GitHub was not made for this, I know. Maybe we need dataHub. Maybe there is something else I *should* be using. Holla if you know what that is.
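To make step 1 above a bit more concrete, here is a minimal sketch of harvesting the original oai_dc records from a repository’s OAI-PMH feed. The endpoint URL here is a placeholder (not our actual feed), and the details obviously vary by source platform:

import requests
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://example.org/oai"  # placeholder OAI-PMH endpoint

def harvest(base_url, metadata_prefix="oai_dc", set_spec=None):
    # First request: ListRecords with the metadata prefix (and set, if any).
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    while True:
        root = ET.fromstring(requests.get(base_url, params=params).content)
        for record in root.iter(OAI_NS + "record"):
            yield record
        # Follow-up pages are requested with only the verb and the resumptionToken.
        token = root.find(".//" + OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

Each yielded record can then be written out as-is and committed to the collection’s GitHub repo as the frozen ‘original data’ snapshot.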

Okay, so I’m in step 4 right now, with a dataset that was a particular pain to remediate/migrate because the folks who did the grouping/digitization pulled together a lot of different physical objects into one digital object. This is basically the digital equivalent of ‘bound-withs’. However, the cataloger who did some of the remediation did a great job of finding, among other datapoints, the subject_geographic terms, getting them to subject_geographic, and normalizing the datapoint to a LCNAF/LCSH heading where possible. I’m about to take this and run my OpenRefine Geonames recon service against it to pull in coordinates for these geographic headings where possible. As folks seem to be interested in that recon service, I’m going to walk through that process here and now with this real life dataset.


So here is that ready-for-step-4 dataset in LODRefine (Linked Open Data Refine, or OpenRefine with some Linked Data extensions baked in; I need to write more about that later):

You can see from that portion a bit of what work is going on here. What I’m going to target in on right now is the subject_geographic column, which has multiple values per record (records in this instance are made up of a number of rows; this helps centralize the reconciliation work, but will need to be changed back to 1 record = 1 row before pulling the data out for XML transformations). Here is the column, along with a text facet view to see the values we will be reconciling against Geonames:

Look at those wonderfully consistent geographic terms, thanks to the cataloger’s work! But, some have LoC records and URIs, some don’t, some maybe have Geonames records (and so coordinates), some might not… so let’s go ahead and reconcile with Geonames first. To use the Geonames service, I already have a copy of the Geonames Recon Service on my computer, and I have updated my local machine’s code to have my own private Geonames API name. See more here:

I’m then going to a CLI (on my work computer, just plain old Mac Terminal),

change to the directory where I have my local Geonames recon service code stored,

then type in the command ‘python --debug’. The Geonames endpoint should fire up on your computer now. You may get some warning notes like I have below, which means I need to do some updating to this recon service or to my computer’s dependencies installation (but am going to ignore for the time being while the recon service still works because, well, time is at a premium).

Note, during all of this, I already have LODRefine running in a separate terminal and the LODRefine GUI in my browser.

Alright, with all that running, let’s hop back to our web browser window where the LODRefine GUI is running with my dataset up. I’ve already added this Geonames as a reconciliation service, but in case you haven’t, you would still go first to the dropdown arrow for any column (I’m using the column I want to reconcile here, subject_geographic), then to Reconcile > Start Reconciling.

A dialog box like this should pop up:

I’ve already got the GeoNames Reconciliation Service added, but if you don’t, click on ‘Add Standard Service’ (in the bottom left corner), then add the localhost URL that the Geonames python flask app you started up in the Terminal before is running on (for me and most standard setups, this will be the local URL that Flask prints when it starts up).
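If you’re curious what LODRefine actually finds at that localhost URL, here is a rough sketch of the shape of an OpenRefine ‘standard reconciliation service’ endpoint. This is not the actual code of my recon service (which also handles things like JSONP callbacks and the Geonames-specific matching and scoring), and the route and names below are just placeholders; the point is that a bare request returns service metadata, and a request carrying a queries parameter returns candidate matches:

import json
from flask import Flask, request, jsonify

app = Flask(__name__)

# What OpenRefine/LODRefine asks for when you "Add Standard Service":
# a bit of JSON metadata describing the service and its types.
METADATA = {
    "name": "GeoNames Reconciliation Service (sketch)",
    "defaultTypes": [{"id": "/geonames/all", "name": "geonames/all"}],
}

def search_geonames(term, recon_type):
    # Placeholder: the real service queries the GeoNames API here and
    # builds candidates with a name, an id (URI), a score, and a match flag.
    return []

@app.route("/reconcile", methods=["GET", "POST"])
def reconcile():
    queries = request.values.get("queries")
    if queries:
        # Reconciliation requests arrive as a JSON object of queries;
        # each key gets back a list of candidate results.
        response = {}
        for key, q in json.loads(queries).items():
            response[key] = {"result": search_geonames(q["query"], q.get("type"))}
        return jsonify(response)
    # No queries parameter: OpenRefine just wants the service metadata.
    return jsonify(METADATA)

if __name__ == "__main__":
    app.run(debug=True)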

I will cancel out of that because I already have it running, and then click on the existing GeoNames Reconciliation Service in the ‘Reconcile column’ dialog box. If you just added the service, you should have the same thing as me showing now upon adding the service:

There are a few type options to choose from:

  • geonames/name = search for the cells’ text just in the names field in a Geonames record
  • geonames/name_startWith = search for Geonames records where the label starts with the cells’ text
  • geonames/name_equals = search for an exact match between the Geonames records and the cells’ text
  • geonames/all = just do keyword search of Geonames records with our cells’ text.

Depending on the original data you are working with, the middle two options can return much more accurate results for your reconciliation work. However, because these are LoC-styled headings (with the mismatching of headings style with Geonames I’ve described recently in other posts as well as in the documentation for this Geonames Recon code), I’m going to go with geonames/all. If you haven’t read those other thoughts, basically, the Geonames name for Richmond, Virginia is just ‘Richmond’, with Virginia, United States, etc. noted instead in the hierarchy portion of the record. This makes sense but makes for bad matching with LoC-styled headings. Additionally, the fact that a lot of these geographic headings refer to archaeological dig sites and not cities/towns/other geopolitical entities also means a keyword search will return better results (in that it will return results at all).
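To make the difference between those type options a bit more concrete, here is a rough sketch of how they might translate into calls against the public GeoNames search web service. The parameter names come from the GeoNames searchJSON API; the username is a placeholder, and the recon service’s own internal mapping may differ slightly:

import requests

GEONAMES_USERNAME = "your_geonames_username_here"  # placeholder

def geonames_search(term, recon_type="geonames/all", max_rows=5):
    params = {"username": GEONAMES_USERNAME, "maxRows": max_rows}
    if recon_type == "geonames/name":
        params["name"] = term             # search just the name fields
    elif recon_type == "geonames/name_startWith":
        params["name_startsWith"] = term  # names beginning with the term
    elif recon_type == "geonames/name_equals":
        params["name_equals"] = term      # exact name match only
    else:
        params["q"] = term                # keyword search over the whole record
    resp = requests.get("http://api.geonames.org/searchJSON", params=params)
    resp.raise_for_status()
    return resp.json().get("geonames", [])

So a heading like ‘Richmond (Va.)’ run through geonames/name_equals comes back empty (no Geonames record is named exactly that), while geonames/all at least surfaces the Richmond candidates to choose from.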

Sidenote: See that ‘Also use relevant details from other columns’ part? This is something I’d love to use for future enhancements to this recon service (maybe refer to hierarchical elements there?) as well as part of a better names (either LCNAF or VIAF) recon service I’m wanting to work more on. Name reconciliation, personal names in particular, is a real nightmare right now.

Alright, so I select ‘geonames/all’ then I click on the ‘Start reconciling’ button in the bottom right corner. Up should pop a yellow notice that reconciliation is happening. Unfortunately, you can’t do more LODRefine work while that is occurring, and depending on your dataset size, it might take a while. However, one of the benefits of using this reconciliation service (versus a few other ways that exist for reconciliation against an API in LODRefine) is speed.

Once the reconciliation work is done, up should pop a few more facet boxes in LODRefine - the judgement and the best candidate’s score boxes, as well as the matches found as hyperlinked options below each cell in the column. Any cell with a value considered a high score match to a Geonames value will be associated with and hyperlinked to that Geonames value automatically.

Before going through matches and choosing the correct ones where needed, I recommend you change the LODRefine view from rows to records - as long as the column you are reconciling in is not the first column. Changing from records to rows then editing the first column means that, once you go back to records view, the record groupings may have changed and no longer be what you intend. But for any other column, the groupings remain intact.

Also, take a second to look at the Terminal again where Geonames is running. You should see a bunch of lines showing the API query URLs used for each call, as well as a 200 response when a match is found (I’m not going to show you this on my computer as each API call has my personal Geonames API name/key in it). Just cool to see this, I think.

Back to the work in LODRefine, I’m going to first select to facet the results with judgement:none and then unselect ‘error’ in the best candidate’s score facet box.

If you’re looking at this and thinking ‘1 match? that is not really good’, well… 1. yes, there are definite further improvements needed to have Geonames and LoC-styled headings work better together, but… 2. library data has a much, much higher bar for this sort of batch work and the resultant quality/accuracy expected, so 1 auto-determined match in a set of geographic names focused on perhaps not well known archaeological sites is okay with me. Plus, the Geonames recon service is not done helping us yet.

Now you should have a list of cells with linked options below each value:

What I do now is review the options, and choose the double check box for what is the correct Geonames record to reconcile against. The double check box means that what I choose for this cell value will also be applied to all other cells in LODRefine that have that same value.

If I’m uncertain, I can also click on any of the options, and the Geonames record for that option will show up for my review. Also, for each option you select as the correct one for that cell, those relevant cells should then disappear from the visible set due to our facet choices.

Using these functionalities, I can go through the possible matches fairly quickly - much more quickly, all other work included, than doing this matching entirely manually. Due to the constraints of library data’s expected quality, this sort of semi-automated, enhanced-manual reconciliation is really where a lot of this work will occur for many (but not all) institutions.

If, in reviewing the matches, there is no good match presented, you can choose ‘create new topic’ to pass through the heading as found, unreconciled with Geonames.

Now that I’m done with my review (which took about 5 minutes for this set), I can see that I have moved from 1 matched heading to 106 matched headings (I deselected the ‘None’ facet in the judgment box and closed the ‘best match facet’ box).

However, there are still 134 headings that were matched to nothing in Geonames. Clicking on that ‘none’ facet in the judgment box, and leaving the subject_geographic column text facet box up, I can do a quick perusal of what didn’t find a match, as well as check on headings that seem like they should have had a match in Geonames. However, for this dataset, I see a lot of these are archaeological dig sites, which probably aren’t in Geonames, so the service worked fairly well so far. This is also how you’ll find some typos or other errors, as well as any historical changes in names that may have occurred. For the facet values that I do find in Geonames, I click to edit the facet value and go ahead and add the coordinates, which is what I pull from Geonames currently (we opt to choose the LoC URI for these headings at present, but this is under debate).

Note: datasets with more standard geographic names (cities, states, etc) will have much better results doing the above described work. However, I want to show here a real life example of something I want to pull in coordinates from Geonames for, like archaeological or historical sites.

I end up adding coordinates for 5 values which weren’t matched to Geonames, either because of typos or because the site is on the border of 2 states (a situation LoC and Geonames handle differently). I fixed 7 typos as well in this review.

Now that I’m done reconciling, I want to capture the Geonames coordinates in my final value. First I close all the open facet boxes in LODRefine. Then, on that subject_geographic column, I am going to click on the column header triangle/arrow and choose Edit Cells > Transform.

In the Custom text transform on column subject_geographic box that appears, in the Expression text area, I will put in the following:

if(isNonBlank(cell.recon.match.name), value + substring(cell.recon.match.name, indexOf(cell.recon.match.name, " | ")), value)

Let’s break this out a bit:

  • value = the cell’s original value that was then matched against Geonames.
  • cell.recon.match.name = the name (and coordinates, because we’re using the Geonames recon service I cobbled together) of the value we choose as a match in the reconciliation process.
  • cell.recon.match.id = the URI for that matched value from the reconciliation process.
  • Why isn’t there cell.recon.match.coords? Yes, I tried that, but it involves hacking the core OpenRefine recon service backend more than I’m willing to do right now
  • if(test, do this, otherwise do that) = not all of the cells had a match in Geonames, so I don’t want to change those unmatched cells. The if statements then says “if there is a reconciliation match, then pull in that custom bit, otherwise leave the cell value as is.”
  • substring(cell.recon.match.name, indexOf(cell.recon.match.name, " | ")) = means I just want to pull everything in that matched value from the pipe onward - namely, the coordinates. I am leaving the name values as is because they are currently matched against LoC for our metadata.
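If we did want to keep the Geonames URI as well, a small variant of the expression would do it. This is just a sketch, not what we actually run here, since we stick with the LoC URIs for these headings:

if(isNonBlank(cell.recon.match.id), value + " | " + cell.recon.match.id, value)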

Why do we need to run this transform? Because although we have done reconciliation work in LODRefine, if I was to pull this data out now (say export as CSV), the reconciliation data would not come with it. LODRefine is still storing the original cell values in the cells, with the reconciliation data laid over top of it. This transform will change the underlying cell values to the reconciled values I want, where applicable.

After running the transform, you can remove the reconciliation data to see exactly what the underlying values now look like. And remember there is always the Undo tab in LODRefine if you need to go back.

What our cell values look like now:

And, ta-da! Hooray! At this point, I can shut down the Geonames reconciliation python flask app by going to the Terminal window it is running in and typing ctrl + C. Back in LODRefine, remember to change back from rows view to records view (links to do this are in the top left corner).

Thoughts on this process

Some may think this process seems a bit extreme for pulling in just coordinates. However…

  1. Remember that this seems extreme for the first few times or when you are writing up documentation explaining it (especially if you are as verbose as I am). In practice, this takes me maybe at most 20 minutes for a dataset of this size (73 complex MODS records with 98 unique subject_geographic headings). It gets faster and easier, is definitely more efficient than completely manual updates, and remains far more accurate than completely automated reconciliation options.
  2. If so moved, I could pull in Geonames URIs as part of this work, which would be even better. However, because of how we handle our MODS at present, we don’t. But the retrieval of URIs and other identifiers for such datapoints is a key benefit.
  3. For datasets larger than 100 quasi-complex records, this is really the only way to go at present and considering our workflows. I don’t want to give these datasets to the catalogers and ask them to add coordinates, URIs, or other reconciled information because they need to focus on the batch work on this process - checking the mappings, getting values in appropriate formats or encodings - and not manually searching each controlled access point in a record or row then copy and pasting that information from some authority source. But this is all a balancing act.
  4. This process also has the added benefit of making typos and other such mistakes very apparent. Unfortunately, I’m not as aware in my quick blog ramblings.

Hope this is helpful for others.

Harvard Library Innovation Lab: Link roundup July 28, 2015

planet code4lib - Tue, 2015-07-28 16:48

I see a theme here — computers are entertainers, directors, performers.

A Sort of Joy

“How can a database be performed?” What a wonderful question to ask as MoMA releases its object collection metadata.

Editor by NYTLabs

Fine-grained annotation and tagging as you type

Genius E-Ink Parking Signs Change Based on the Day | Mental Floss

Another fantastic use of E-Ink

GIFs of Japanese Life

“gorgeously illustrated 8-bit animations, which beautifully capture daily life”

The Next Wave

What big thing is next? Materials science and augmented reality are prime.

Islandora: Islandora Community Stories

planet code4lib - Tue, 2015-07-28 15:46

During the Islandora Conference, we hope to collect stories from community members about how and why they got started with Islandora. Alex Kent from the Conference Planning Team will arrange casual in-person video interviews with those who are willing, taking about 10-15 minutes of your time. We'll be using iPhones/iPads to record the interviews. To participate, seek out Alex at the conference or drop an email to set up a time.

After the conference we'll compile the interviews and make them available on the Islandora site as a way to highlight community members and Islandora's value.

If you do not wish to be interviewed in person, you can take the survey online here.

Those who participate at the conference will be entered in a drawing to win one of five Islandora tuques.

LibUX: A useful function for making querySelectorAll() more like jQuery

planet code4lib - Tue, 2015-07-28 15:44

Through LibUX I try to evangelize the importance of speed — or the perception of speed — to the net value of the user experience. People care.

Of the many tweaks we can make to improve web performance, we might try to wean our code from javascript libraries where it’s unnecessary. Doing so removes bloat in a couple of ways: first, by literally reducing the number of bytes required to render or add functionality to a site or app; second — and, more importantly — scripts just process faster in the browser if they have fewer methods to refer to.

As I write this I am weaning myself from jQuery, and even though newer utilities like querySelector (MDN) do the trick by using jQuery-like syntax, they’re not quite the Coca-Cola mouth-watering sweetness of $( selector ).doSomething().

The difference between document.querySelectorAll( '.pie' ) (MDN) and $( '.pie' ) is that the object returned by the former is an array-like-but-not-an-array NodeList that doesn’t give you the immediate access to manipulate each instance of that element in the document. With jQuery, to add cream to every slice of pie you might write

$( '.pie' ).addClass( 'cream' );

The no-jQuery way requires that you deal with the NodeList yourself. This example is only three additional lines — but it’s enough to make me whine a little.

var pie = document.querySelectorAll( '.pie' );

for ( var i = 0; i < pie.length; i++ ) {
  pie[i].classList.add( 'cream' );
}

A useful helper function

The following wrapper allows for use of a jQuery-like dollar-sign selector that lets you iterate through these elements as a simple array: $$( selector ).forEach( function( el ) { doSomething() });. I have adopted this from seeing its use in some of Lea Verou’s projects.

function $$(selector, context) {
  context = context || document;
  var elements = context.querySelectorAll(selector);
  return elements );
}

The array-like NodeList is turned into a regular array with elements ) (MDN), which can add convenience and otherwise mitigate some of the withdrawal we in Generation jQuery feel when iterating through the DOM.

$$( '.pie' ).forEach( function( pie ) { pie.classList.add( 'cream' ); });
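And since the wrapper takes an optional second context argument, the same pattern can scope a query to just part of the document. For example (the .dessert-menu selector here is made up for illustration):

$$( '.pie', document.querySelector( '.dessert-menu' ) ).forEach( function( pie ) { pie.classList.add( 'cream' ); });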

The post A useful function for making querySelectorAll() more like jQuery appeared first on LibUX.

William Denton: Updated Westlake footnote

planet code4lib - Tue, 2015-07-28 15:43

I updated the list of fictional footnotes with more information on Don’t Ask (1993) by Donald E. Westlake, which I just read (it’s a Dortmunder):

Two chapter headings have footnotes that identify them as “Optional—historical aside—not for credit.” Chapter six mentions a street with “a whole block of taxpayers.” This is footnoted: “A temporary structure, commonly one story in height and containing shops of the most ephemeral sort. Constructed by owners of the land when a delay is anticipated, sometimes of several decades’ duration, between the razing of the previous unwanted edifice and the erection of the new blight on the landscape. Called a ‘taxpayer’ because that’s what it does.+” The second footnote, indented under the first, says, “Didn’t expect a footnote in a novel, did you? And a real informative one, too. Pays to keep on your toes.”

The t.p. verso of my 1994 Mysterious Press paperback edition has this:

Enjoy lively book discussion online with CompuServe. To become a member of CompuServe call 1-800-848-8199 and ask for the Time Warner Trade Publishing forum. (Current members: GO:TWEP.)

I called the number but got a fast busy.

FOSS4Lib Recent Releases: ArchivesSpace - 1.3.0

planet code4lib - Tue, 2015-07-28 14:23

Last updated July 28, 2015. Created by cdibella on July 28, 2015.
Log in to edit this page.

Package: ArchivesSpace
Release Date: Tuesday, June 30, 2015

