Planet Code4Lib

LITA: Understanding Creative Commons Licensing

Fri, 2015-09-25 14:00

Creative Commons (CC) is a family of public copyright licenses. What does this mean? It means a creator can allow free distribution of a work that would otherwise be under copyright, providing open access to users. Creative Commons licensing provides both gratis OA licensing and libre OA licensing (terms coined by Peter Suber): gratis OA is free to use, while libre OA is free to use and free to modify.

How does CC licensing benefit the artist? Well, it gives them more flexibility in deciding what they will allow others to do with their work. How does it benefit the user? As a user, you are protected from copyright infringement, as long as you follow the CC license conditions.

CC licenses: in a nutshell with examples

BY – attribution | SA – share alike | NC – non-commercial | ND – no derivs

CC0 – The Creative Commons Zero license means the work is in the public domain and you can do whatever you want with it. No attribution is required. This is the easiest license to work with. (example of a CC0 license: Unsplash)

BY – This license means that you can do as you wish with the work, as long as you provide attribution to the original creator. Works with this type of license can be expanded on and used commercially, if the user wishes, as long as attribution is given to the original creator. (example of a CC-BY license: Figshare; data sets at Figshare are CC0; PLOS)

BY-SA – This license combines attribution and share alike, meaning that all new works based on the original must carry the same license and credit the original creator. (example of a CC-BY-SA license: Arduino)

BY-NC – This license is another attribution license, but the user does not have to retain the same licensing terms as the original work. The catch: the user must use the work non-commercially. (example of a BY-NC license: Ely Ivy from the Free Music Archive)

BY-ND – This license means the work can be shared, commercially or non-commercially, but without change to the original work and attribution/credit must be given. (example of a BY-ND license: Free Software Foundation)

BY-NC-SA – This license combines share alike and non-commercial with an attribution requirement, meaning the work can be used (with attribution/credit) only for non-commercial purposes, and any and all new works must carry the same BY-NC-SA license. (example of a CC BY-NC-SA license: Nursing Clio (see footer) or MIT OpenCourseWare)

BY-NC-ND – This license combines the non-commercial and non-derivative conditions with an attribution requirement, meaning you can use works under this license only with attribution/credit, only for non-commercial purposes, and without changing the original work. (example of a BY-NC-ND license: TED Talk videos)

DuraSpace News: The ACRL Toolkit–An Open Access Week Preparation Assistant

Fri, 2015-09-25 00:00

Washington, DC  Let ACRL’s Scholarly Communication Toolkit help you prepare to lead events on your campus during Open Access Week, October 19-25, 2015. Open Access Week, a global event now entering its eighth year, is an opportunity for the academic and research community to continue to learn about the potential benefits of Open Access, to share what they’ve learned with colleagues, and to help inspire wider participation in helping to make Open Access a new norm in scholarship and research.

DuraSpace News: Cineca Releases Version 5.3.0 of DSpace-CRIS

Fri, 2015-09-25 00:00

From Michele Mennielli, Cineca

As announced in August’s DuraSpace Digest, just a few days after the release of version 5.2.1, Cineca released Version 5.3.0 of DSpace-CRIS on August 25, 2015. The new version is aligned with the latest DSpace 5 release and includes a new widget that adds dynamic properties to CRIS objects, supporting hierarchical classifications such as ERC Sectors and MSC (Mathematics Subject Classification).

District Dispatch: You might as well have stayed in DC

Thu, 2015-09-24 19:48

Nashville was the site of the U.S. House Judiciary Committee listening tour; photo courtesy of Matthew Percy, Flickr photo share.

At this stage of the copyright reform effort, the U.S. House Judiciary Committee is meeting with stakeholders for “listening sessions,” which give concerned rights holders or users of content an opportunity to make their case for a copyright fix. To reach a broader audience, the Committee is going on the road to reach individuals and groups around the country, and one would think, to hear a range of opinions from the community. So, on September 22, they went to Nashville, a music mecca, to hold a listening session regarding music copyright reform.

Music, perhaps more than any other form of creative expression, needs its copyright rules re-examined. New business models for digital streaming, fair royalty rights, and requests for transparency have all created a need for clarity on who gets paid for what in the music business. We need policy that answers this question in a way that’s fair to everyone. One thing has been agreed on by copyright stakeholders thus far—people should be compensated for their intellectual and creative work. Wonderful.

But lo and behold—the same industry and trade group lobbyists that always get a chance to meet with Congressional representatives and staff in DC turned out to be nearly the only music stakeholder groups invited to speak. What gives?

It looks like the House merely gathered the usual suspects, a “who do we know (already)?” list, to the table. Given that they met in Nashville, it would have been simple for the Committee to convene a wide gamut of music stakeholders to paint a full picture of the state of the music industry. Ultimately, however, other key stakeholders (Out of the Box, Sorted Noise, community radio, music educators, librarians, archivists, and consumers) were not heard, and only one (older) version of the state of the music industry, one the Committee already knows about, took center stage.
So, why go to Nashville?

Don’t get me wrong. It is a good thing that the Committee wants to hear from all stakeholders and it is thoughtful to hold listening sessions in geographically diverse locations, but you have to give people you don’t already know an opportunity to speak. That’s the only way to learn about new business models and how best to cultivate music creators of tomorrow—to truly understand how the creativity ecosystem can thrive in the future and then what legislative changes are needed to realize that future.

The post You might as well have stayed in DC appeared first on District Dispatch.

District Dispatch: CopyTalk on international trade treaty

Thu, 2015-09-24 19:34

By trophygeek

What does a trade agreement have to do with libraries and copyright? Expert Krista Cox, who has traveled the world promoting better policies for the intellectual property chapter of the Trans-Pacific Partnership Agreement (TPP), will enlighten us at our next CopyTalk webinar.

There is no need to pre-register for this free webinar! Just go to the webinar link on October 1, 2015 at 2 p.m. EST/11 a.m. PST.

Note that the webinar is limited to 100 seats so watch with colleagues if possible. An archived copy will be available after the webinar.

The Trans-Pacific Partnership Agreement (TPP) is a large regional trade agreement currently being negotiated between twelve countries: Australia, Brunei, Canada, Chile, Japan, Malaysia, Mexico, New Zealand, Peru, Singapore, the United States and Vietnam. The agreement has been negotiated behind closed doors, but due to various leaks of the text it is apparent that the TPP will include a comprehensive chapter on intellectual property, including specific provisions governing copyright and enforcement. In addition to requiring other countries to change their laws, the final agreement could lock-in controversial provisions of US law and prevent reform in certain areas.

Krista Cox is Director of Public Policy Initiatives at the Association of Research Libraries (ARL). In this role, she advocates for the policy priorities of the Association and executes strategies to implement these priorities. She monitors legislative trends and participates in ARL’s outreach to the Executive Branch and the US Congress.

Prior to joining ARL, Krista worked as the staff attorney for Knowledge Ecology International (KEI) where she focused on access to knowledge issues as well as TPP. Krista received her JD from the University of Notre Dame and her BA in English from the University of California, Santa Barbara. She is licensed to practice before the Supreme Court of the United States, the Court of Appeals for the Federal Circuit, and the State Bar of California.

The post CopyTalk on international trade treaty appeared first on District Dispatch.

DPLA: Supporting National History Day Researchers

Thu, 2015-09-24 15:14

In 2015, DPLA piloted a National History Day partnership with National History Day in Missouri, thanks to the initiative of community rep Brent Schondelmeyer. For 2016, DPLA will be partnering with NHDMO and two new state programs: National History Day – California and National History Day in South Carolina. For each program, DPLA designs research guides based on state and national topics related to the contest theme, acts as an official sponsor, and offers a prize for the best project that extensively incorporates DPLA resources.

In this post, NHDMO Coordinator Maggie Mayhan describes the value of DPLA as a resource for NHD student researchers. To learn more about DPLA and National History Day partnerships, please email

Show-Me History

Each year more than 3,500 Missouri students take part in National History Day (NHD), a unique opportunity for sixth- through twelfth-grade students to explore the past in a creative, hands-on way. While producing a documentary, exhibit, paper, performance, or website, they become an expert on the topic of their choosing.

In following NHD rules, students quickly learn that the primary sources they are required to use in their projects also help them to tell their stories effectively. But where do they start their search for those sources? How can it be manageable and meaningful?

Enter Digital Public Library of America (DPLA). Collecting and curating digital sources from libraries, museums, and archives, the DPLA portal connects students and teachers with the resources that they need. For students who cannot easily visit specialized repositories to work with primary sources, DPLA may even be the connection that enables them to participate in National History Day.

National History Day in Missouri loves how DPLA actively works to fuse history and technology, encouraging students to use modern media to access and share history. Knowing how to use new technologies to find online archives, databases, and other history sources is important for future leaders seeking to explore the past.

Seeing the potential for a meaningful collaboration in which students uncover history through the DPLA collections and put their own stamp on it through National History Day projects, the Digital Public Library of America became a major program sponsor in 2015.

Additionally, DPLA sponsors a special prize at the National History Day in Missouri state contest, awarded to the best documentary or website that extensively incorporates DPLA resources. The 2015 prize winners, Keturah Gadson and Daniela Hinojosa from Pattonville High School in St. Louis, pointed out that DPLA access was important for their award-winning website about civil rights activist Thurgood Marshall:

We found that the sources on the Digital Public Library of America fit amazingly into our research and boosted it where we were lacking… the detail we gained from looking directly at the primary sources was unmatched…DPLA sources completed our research wonderfully.

National History Day in Missouri is excited to continue this partnership in 2016, and we look forward to seeing what resources students will discover as they focus on the 2016 contest theme, Exploration, Encounter, Exchange in History.

LITA: LITA Forum early bird rates end soon

Thu, 2015-09-24 14:00
LITA and LLAMA Members

There’s still time to register for the 2015 LITA Forum at the early bird rate and save $50
Minneapolis, MN
November 12-15, 2015


LITA Forum early bird rates end September 30, 2015
Register Now!

Join us in Minneapolis, Minnesota, at the Hyatt Regency Minneapolis for the 2015 LITA Forum, a three-day education and networking event featuring 2 preconferences, 3 keynote sessions, more than 55 concurrent sessions and 15 poster presentations. This year’s Forum includes content and planning collaboration with LLAMA.

Why attend the LITA Forum

Check out the report from Melissa Johnson. It details her experience as an attendee, a volunteer, and a presenter. This year, she’s on the planning committee and attending. Melissa says what most people don’t know is how action-packed and seriously awesome this year’s LITA Forum is going to be. Register now to receive the LITA and LLAMA member early bird discount:

  • LITA and LLAMA member early bird rate: $340
  • LITA and LLAMA member regular rate: $390

The LITA Forum is a gathering for technology-minded information professionals, where you can meet with your colleagues involved in new and leading edge technologies in the library and information technology field. Attendees can take advantage of the informal Friday evening reception, networking dinners and other social opportunities to get to know colleagues and speakers and experience the important networking advantages of a smaller conference.

Keynote Speakers:

  • Mx A. Matienzo, Director of Technology for the Digital Public Library of America
  • Carson Block, Carson Block Consulting Inc.
  • Lisa Welchman, President of Digital Governance Solutions at ActiveStandards.

The Preconference Workshops:

  • So You Want to Make a Makerspace: Strategic Leadership to support the Integration of new and disruptive technologies into Libraries: Practical Tips, Tricks, Strategies, and Solutions for bringing making, fabrication and content creation to your library.
  • Beyond Web Page Analytics: Using Google tools to assess searcher behavior across web properties.

Comments from past attendees:

“Best conference I’ve been to in terms of practical, usable ideas that I can implement at my library.”
“I get so inspired by the presentations and conversations with colleagues who are dealing with the same sorts of issues that I am.”
“After LITA I return to my institution excited to implement solutions I find here.”
“This is always the most informative conference! It inspires me to develop new programs and plan initiatives.”

Forum Sponsors:

EBSCO, Ex Libris, Optimal Workshop, OCLC, Innovative, BiblioCommons, Springshare, A Book Apart, Rosenfeld Media and Double Robotics.

Get all the details, register and book a hotel room at the 2015 Forum Web site.

See you in Minneapolis.

LITA: September Library Tech Roundup

Thu, 2015-09-24 14:00
Image courtesy of Flickr user kalexanderson (CC BY).

Each month, the LITA bloggers share selected library tech links, resources, and ideas that resonated with us. Enjoy – and don’t hesitate to tell us what piqued your interest recently in the comments section!

Brianna M.

Cinthya I.

I’m mixing things up this month and have been reading a lot on…

John K.

Hopefully this isn’t all stuff you’ve all seen already:

Whitni Watkins

These are all over the place, as I’ve been bouncing back and forth between the multiple interests I’ve been dipping my fingers into.

LibUX: On the User Experience of Ebooks

Thu, 2015-09-24 01:57

So, when it comes to ebooks I am in the minority: I prefer them to the real thing. The aesthetic, the whats-it about the musty trappings of paper and ink or the looming, space-sapping towers of shelving, just doesn’t capture my fancy. But these are precisely the go-to attributes people wax poetic about — and you can’t deny there’s something to it.

In fact, beyond convenience ebooks don’t have much of an upshot. They are certainly not as convenient as they could be. All the storytelling power of the web is lost on such a stubbornly static industry where print – where it should be most advantageous – drags its feet. Write in the gloss on, but not in an ebook; embellish a narrative with animation at the New York Times (a newspaper), but not in an ebook; share, borrow, copy, paste, link-to anything but an ebook.

Note what is lacking when it comes to ebooks’ advantages: the user experience. True, some people certainly prefer an e-reader (or their phone or tablet), but a physical book has its advantages as well: relative indestructibility, and little regret if it is destroyed or lost; tangibility, both in regard to feel and in the ability to notate; the ability to share or borrow; and, of course, the fact that a book is an escape from the screens we look at nearly constantly. At the very best the user experience comparison (excluding the convenience factor) is a push; I’d argue it tilts towards physical books.


All things being equal, where the ebook falls short could be made up for by the zero cost of its distribution, but the rarely discounted price of the ebook is often more expensive for those of us in libraries or higher ed – if not substantially so, then at least subjectively so, given that readers neither own their ebook-as-licensed-software nor can legally migrate it to a device, medium, or format where the user experience can be improved.

This aligns with findings which show that while access to ebooks improves (phones, etc.), ebook reading doesn’t meaningfully pull away from the reading of print books.

The recent hullabaloo over the ebookalypse may be a misreading that ignores data from sales of ebooks without ISBNs (those loathed self-publishers), a market Amazon dominates because of the ubiquity of the Kindle and its superior bookstore. There, big-publisher books are forced to a fixed price in an Amazon-controlled interface where authors can easily publish good content on the cheap. We are again reminded that investing in even a slightly better user experience than everyone else is good business:

  • the prices of ebooks are competitively low – or even free;
  • ebooks, through Kindles or the Kindle app, can be painlessly downloaded and, while largely encumbered by DRM, don’t require inconvenient additional software or – worse – having to be read on a computer;
  • and features like WhisperSync enhance the reading experience in a way that isn’t available in print.

Other vendors, particularly those available to libraries, have so far been able to provide only a just-fine user experience that doesn’t do much for their desirability for either party.

The post On the User Experience of Ebooks appeared first on LibUX.

District Dispatch: ALA Congratulates Dr. Kathryn Matthew

Wed, 2015-09-23 22:35

Dr. Kathryn Matthew, Director, Institute of Museum and Library Services.

U.S. Senate confirms Matthew as Director of the Institute of Museum and Library Services

Washington, DC— In a statement, American Library Association (ALA) President Sari Feldman commented on the United States Senate’s confirmation of Dr. Kathryn K. Matthew as director of the Institute of Museum and Library Services (IMLS).

“We commend President Obama on Dr. Matthew’s appointment and the U.S. Senate for her confirmation. Communities across the nation will greatly benefit from her experience in bringing museums and libraries and the sciences together as resources readily accessible to families, students and others in our society.”

The Institute, an independent United States government agency, is the primary source of federal support for the nation’s 123,000 libraries and 35,000 museums.

“I am honored to have been nominated by President Barack Obama and to have received the confidence from the Senate through their confirmation process. I look forward to being appointed to serve as the fifth Director of the Institute of Museum and Library Services,”  Dr. Matthew said. “I am eager to begin my work at IMLS to help to sustain strong libraries and museums that convene our communities around heritage and culture, advance critical thinking skills, and connect families, researchers, students, and job seekers to information.”

Dr. Matthew will serve a four-year term as the Director of the Institute. The directorship of the Institute alternates between individuals from the museum and library communities.

ALA appreciates the exemplary service of Maura Marx, who has served as IMLS Acting Director since January 19, 2015, following the departure of IMLS Director Susan H. Hildreth at the conclusion of her four-year term. Marx is currently the deputy director for library services. ALA has enjoyed a good, close and collaborative relationship with Hildreth and with Anne-Imelda Radice, who served as IMLS Director from 2006-2010, and looks forward to a similarly strong and cooperative relationship with Dr. Matthew.

Dr. Matthew’s career interests have centered on supporting and coaching museums and other nonprofits, large and small, that are focused on propelling their programs, communications, events, and fundraising offerings to a higher level of success. Dr. Matthew’s professional experience spans the breadth of the diverse museum field. Through her many different leadership positions, she brings to the agency a deep knowledge of the educational and public service roles of museums, libraries, and related nonprofits.

Trained as a scientist, Dr. Matthew’s 30-year museum career began in curatorial, collections management, and research roles at the Academy of Natural Sciences in Philadelphia and Cranbrook Institute of Science. She worked with a variety of collections including ornithology, paleontology, fine arts, and anthropology. She then moved into management, exhibits and educational programs development, and fundraising and marketing roles, working at the Santa Barbara Museum of Natural History, the Virginia Museum of Natural History, The Nature Conservancy, the Historic Charleston Foundation, and The Children’s Museum of Indianapolis. She was also a science advisor for the IMAX film “Tropical Rainforest,” produced by the Science Museum of Minnesota.

In addition she was Executive Director of the New Mexico Museum of Natural History and Science, a state-funded museum. In that role she worked with corporations, federal agencies, public schools, and Hispanic and Native American communities to offer STEM-based programs. “Proyecto Futuro” was a nationally-recognized program that began during her tenure.

Dr. Matthew has worked on three museum expansion projects involving historic buildings: Science City at Union Station in Kansas City, Missouri, and the Please Touch Museum at Memorial Hall and The Chemical Heritage Foundation, both in Philadelphia.

Over her 30-year career, she has been active as a volunteer for smaller nonprofits, a board member, and an award-winning peer reviewer for the American Alliance of Museums’ Accreditation and Museum Assessment Programs. Her board service has included two children’s museums, a wildlife rehabilitation center, and a ballet company.

The post ALA Congratulates Dr. Kathryn Matthew appeared first on District Dispatch.

District Dispatch: Six takeaways from new broadband report

Wed, 2015-09-23 21:52

ALA participated in a White House roundtable on new federal broadband recommendations (photo by www.GlynLowe.com via Flickr)

On Monday the inter-agency Broadband Opportunity Council (BOC) released its report and recommendations on actions the federal government can take to improve broadband networks and bring broadband to more Americans. Twenty-five agencies, departments and offices took part in the Council, which also took public comments from groups like the ALA.

The wide-ranging effort opened the door to address outdated program rules as well as think bigger and more systemically about how to more efficiently build and maximize more robust broadband networks.

Here are six things that struck me in reading and hearing from other local, state and national stakeholders during a White House roundtable in which ALA participated earlier this week:

  1. It’s a big deal. The report looks across the federal government through a single lens of what opportunities for and barriers to broadband exist that it may address. Council members (including from the Institute of Museum and Library Services) met weekly, developed and contributed action plans, and approved the substance of the report. That’s a big job—and one that points to the growing understanding that a networked world demands networked solutions. Broadband (fixed and mobile) is everyone’s business, and this report hopefully begins the process of institutionalizing attention to broadband across sectors.
  2. It’s still a report…a first step toward action. There’s no new money, but some action items will increase access to federal programs valued at $10 billion to support broadband deployment and adoption. The US Department of Agriculture (USDA), for instance, will develop and promote new funding guidance making broadband projects eligible for the Rural Development Community Facility Program and will expand broadband eligibility for the RUS Telecommunications Program. Both of these changes could benefit rural libraries.
  3. It’s a roadmap. Because the report outlines who will do what and when, it provides a path to consider next steps. Options range from taking advantage of new resources to advising on new broadband research to increasing awareness of new opportunities among community partners and residents.
  4. “Promote adoption and meaningful use” is a key principle. ALA argued that broadband deployment and adoption should be “married” to drive digital opportunity, and libraries can and should be leveraged to empower and engage communities. Among the actions here is that the General Services Administration (GSA) will modernize government donation, excess and surplus programs to make devices available to schools, libraries and educational non-profits through the Computers for Learning program, and the Small Business Administration (SBA) will develop and deploy new digital empowerment training for small businesses.
  5. IMLS is called out. It is implicated in seven action items and is the lead on two: funding projects that will provide libraries with tools to assess and manage broadband networks, and expanding technical support for E-rate-funded public library Wi-Fi and connectivity expansions. IMLS also will work with the National Science Foundation and others to develop a national broadband research agenda. The activity includes reviewing existing research and resources and considering possible research questions related to innovation, adoption and impacts (to name a few).
  6. A community connectivity index is in the offing. It is intended to help community leaders understand where their strengths lie and where they need to improve, and to promote innovative community policies and programs. I can think of a few digital inclusion indicators for consideration—how about you?

National Telecommunications and Information Administration (NTIA) Chief Lawrence Strickling noted that the report is “greater than the sum of its parts” in that it increased awareness of broadband issues across the government and brought together diverse stakeholders for input and action. I agree and am glad the Council built on the impactful work already completed through NTIA’s Broadband Technology Opportunities Program (BTOP). As with libraries and the Policy Revolution! initiative, we must play to our strengths, but also think differently and more holistically to create meaningful change. It’s now up to all of us to decide what to do next to advance digital opportunity.

The post Six takeaways from new broadband report appeared first on District Dispatch.

Jonathan Rochkind: bento_search 1.5, with multi-field queries

Wed, 2015-09-23 20:25

bento_search is a gem that lets you search third party search engine APIs with standardized, simple, natural ruby API. It’s focused on ‘scholarly’ sources and use cases.

Version 1.5, just released, includes support for multi-field searching:

# Reconstructed from the flattened original; the Scopus engine class follows the bento_search API.
searcher = BentoSearch::ScopusEngine.new(:api_key => ENV['SCOPUS_API_KEY'])

results = searcher.search(:query => {
  :title  => '"Mystical Anarchism"',
  :author => "Critchley",
  :issn   => "14409917"
})

Multi-field searches are always AND’d together, title=X AND author=Y; because that was the only use case I had and seems like mostly what you’d want. (On our existing Blacklight-powered Catalog, we eliminated “All” or “Any” choices for multi-field searches, because our research showed nobody ever wanted “Any”).

As with everything in bento_search, you can use the same API across search engines, whether you are searching Scopus or Google Books or Summon or EBSCOHost, you use the same ruby code to query and get back results of the same classes.

Except, well, multi-field search is not yet supported for Summon or Primo, because I do not have access to those proprietary projects or documentation to make sure I have the implementation right and test it. I’m pretty sure the feature could be added pretty easily to both, by someone who has access (or wants to share it with me as an unpaid ‘contractor’ to add it for you).

What for multi-field querying?

You certainly could expose this feature to end-users in an application using a bento_search powered interactive search. And I have gotten some requests for supporting multi-field search in our bento_search powered ‘articles’ search in our discovery layer; it might be implemented at some point based on this feature.

(I confess I’m still confused why users want to enter text in separate ‘author’ and ‘title’ fields, instead of just entering the author’s name and title in one ‘all fields’ search box, Google-style. As far as I can tell, all bento_search engines perform pretty well with author and title words entered in the general search box. Are users finding differently? Do they just assume it won’t, and want the security, along with the more work, of entering in multiple fields? I dunno).

But I’m actually more interested in this feature for other users than directly exposed interactive search.

It opens up a bunch of possibilities for a under-the-hood known-item identification in various external databases.

Let’s say you have an institutional repository with pre-prints of articles, but it’s only got author and title metadata, and maybe the name of the publication it was eventually published in, but not volume/issue/start-page, which you really want for better citation display and export, analytics, or generation of a more useful OpenURL.

So you take the metadata you do have, and search a large aggregating database to see if you can find a good match, and enhance the metadata with what that external database knows about the article.
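A minimal sketch of that enhancement, reusing the Scopus engine from the example above; the local_record object and its fields are hypothetical, and the result-item attributes used here are the standard bento_search metadata fields:

# Known-item lookup: search by what we have, fill in what we're missing.
searcher = BentoSearch::ScopusEngine.new(:api_key => ENV['SCOPUS_API_KEY'])

results = searcher.search(:query => {
  :title  => local_record.title,   # hypothetical local metadata object
  :author => local_record.author
})

if (match = results.first)
  local_record.volume     ||= match.volume
  local_record.issue      ||= match.issue
  local_record.start_page ||= match.start_page
  local_record.doi        ||= match.doi
end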

Similarly, citations sometimes come into my OpenURL resolver (powered by Umlaut) that lack sufficient metadata for good coverage analysis and outgoing link generation, for which we generally need year/volume/issue/start-page too. Same deal.

Or in the other direction, maybe you have an ISSN/volume/issue/start-page, but don’t have an author and title. Which happens occasionally at the OpenURL link resolver, maybe other places. Again, search a large aggregating database to enhance the metadata, no problem:

# Reconstructed from the flattened original, using the same searcher as above.
results = searcher.search(:query => {
  :issn       => "14409917",
  :volume     => "10",
  :issue      => "2",
  :start_page => "272"
})

Or maybe you have a bunch of metadata, but not a DOI — you could use a large citation aggregating database that has DOI information as a reverse-DOI lookup. (Which makes me wonder if CrossRef or another part of the DOI infrastructure might have an API I should write a BentoSearch engine for…)

Or you want to look up an abstract. Or you want to see if a particular citation exists in a particular database for value-added services that database might offer (look inside from Google Books; citation chaining from Scopus, etc).

With multi-field search in bento_search 1.5, you can do a known-item ‘reverse’ lookup in any database supported by bento_search, for these sorts of enhancements and more.
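Building on the example just above, the existence-check and abstract-lookup cases might look like this (total_items, title, doi, and abstract are standard bento_search result attributes; the rest is a sketch):

# Treat the reverse lookup as an existence check plus a metadata harvest.
if results.total_items > 0
  item = results.first
  puts "Found: #{item.title} (DOI: #{item.doi})"
  puts item.abstract if item.abstract
else
  puts "No match in this database."
end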

In my next post, I’ll discuss this in terms of DOAJ, a new search engine added to bento_search in 1.5.

Filed under: General

LITA: Jobs in Information Technology: September 23, 2015

Wed, 2015-09-23 18:39

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week:

Systems and Web Services Librarian, Concordia College, Moorhead, MN

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

SearchHub: How Bloomberg Scales Apache Solr in a Multi-tenant Environment

Wed, 2015-09-23 17:09
As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Bloomberg engineer Harry Hight’s session on scaling Solr in a multi-tenant environment.

Bloomberg Vault is a hosted communications archive and search solution, with over 2.5 billion documents in a 45TB Solr index. This talk will cover some of the challenges we encountered during the development of our Solr search backend, and the steps we took to overcome them, with emphasis on security and scalability. Basic security always starts with different users having access to subsets of the documents, but gets more interesting when users only have access to a subset of the data within a given document, and their search results must reflect that restriction to avoid revealing information. Scaling Solr to such extreme sizes presents some interesting challenges. We will cover some of the techniques we used to reduce hardware requirements while still maintaining fast response times.

Harry Hight is a software engineer for Bloomberg Vault. He has been working with Solr/Lucene for the last 3 years building, extending, and maintaining a communications archive/e-discovery search back-end.

Efficient Scalable Search in a Multi-Tenant Environment: Presented by Harry Hight, Bloomberg L.P. (slides from Lucidworks)

Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr, on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post How Bloomberg Scales Apache Solr in a Multi-tenant Environment appeared first on Lucidworks.

Library of Congress: The Signal: Improving Technical Options for Audiovisual Collections Through the PREFORMA Project

Wed, 2015-09-23 16:03

The digital preservation community is a connected and collaborative one. I first heard about the Europe-based PREFORMA project last summer at a Federal Agencies Digitization Guidelines Initiative meeting when we were discussing the Digital File Formats for Videotape Reformatting comparison matrix. My interest was piqued because I heard about their incorporation of FFV1 and Matroska, both included in our matrix but not yet well adopted within the federal community. I was drawn first to PREFORMA’s format standardization efforts – Disclosure and Adoption are two of the sustainability factors we use to evaluate digital formats on the Sustainability of Digital Formats website – but the wider goals of the project are equally interesting.

In this interview, I was excited to learn more about the PREFORMA project from MediaConch’s Project Manager Dave Rice and Archivist Ashley Blewer.

Kate: Tell me about the goals of the PREFORMA project and how you both got involved. What are your specific roles?

MediaConch Project Manager Dave Rice. Photo courtesy of Dave Rice

Dave: The goals of the PREFORMA project are best summarized by their foundational document called the PREFORMA Challenge Brief (PDF). The Brief describes an objective to “establish a set of tools and procedures for gaining full control over the technical properties of digital content intended for long-term preservation by memory institutions”. The brief recognizes that although memory institutions have honed decades of expertise for the preservation of specific materials, we need additional tools and knowledge to achieve the same level of preservation control with digital audiovisual files.

For initial work, the PREFORMA consortium selected several file formats including TIFF, PDF/A, lossless FFV1 video, the Matroska container, and PCM audio. After a comprehensive proposal process, three suppliers were selected to move forward with development. A project called VeraPDF, focusing on PDF/A, is led by a consortium made up of the Open Preservation Foundation, the PDF Association, the Digital Preservation Coalition, Dual Lab, and KEEP SOLUTIONS. The TIFF format is addressed by DPF Manager, led by Easy Innova. Ashley and I work as part of the MediaArea team, led by Jérôme Martinez, the originator and principal developer of MediaInfo. Our project is called MediaConch and focuses on the selected audiovisual formats: Matroska, FFV1, and PCM.

MediaConch Archivist Ashley Blewer. Photo courtesy of Ashley Blewer.

Ashley: Dave and Jérôme have collaborated in the past on open source software projects such as BWF MetaEdit (developed by AudioVisual Preservation Solutions as part of a FADGI initiative to support embedded metadata) and QCTools. QCTools, developed by BAVC with support from the National Endowment for the Humanities, was profiled in a blog post last year. Dave had also brought me in to do some work on the documentation and design of QCTools. When QCTools development was wrapping up, we submitted a proposal to PREFORMA and were accepted into the initial design phase. During that phase, we competed with other teams to deliver the software structure and design. We were then invited to continue to Phase II of the project: the development prototyping stage. We are currently in month seven (out of 22) of this second phase.

The majority of the work happens in Europe, which is where the software development team is based. Jérôme Martinez is the technical lead of the project. Guillaume Roques works on MediaConchOnline, database management, and performance optimization. Florent Tribouilloy develops the graphical user interface, reporting, and metadata extraction.

Here in the U.S., Dave Rice works as project manager and leads the team in optimizations for archival practice, system OAIS compliance, and format standardization. Erik Piil focuses on technical writing, creation of test files, and file analysis. Tessa Fallon leads community outreach and standards organization, mostly involving our plans to improve the standards documentation for both the Matroska and FFV1 formats through the Internet Engineering Task Force. I work on documentation, design and user experience, as well as some web development. Our roles are somewhat fluid, and often we will each contribute to tasks such as analyzing bitstream trace outputs to writing press releases for the latest software features.

PREFORMA: PREservation FORMAts for culture information/e-archives

Kate: The standardization of digital formats is a key piece in the PREFORMA puzzle as well as being something we consider when evaluating the Disclosure factor in the Sustainability of Digital Formats website. What’s behind the decision to pursue standardization through the Internet Engineering Task Force instead of an organization like the Society of Motion Picture and Television Engineers? What’s the process like and where are you now in the sequence of events? From the PREFORMA perspective, what’s to be gained through standardization?

Dave: A central aspect of the PREFORMA project is to create a conformance checker that would be able to process files and report on the state to which they deviate or conform to their associated specification. Early in the development of our proposal for Matroska and FFV1, we realized that the state of the specification compromised how effectively and precisely we could create a conformance checker. Additionally as we interviewed many archives that were using FFV1 and/or Matroska for preservation we found that the state of the standardization of these formats was the most shared concern. This research led us to include efforts towards facilitating the further standardization of both FFV1 and Matroska through an open standards body into our proposal. After reaching agreement from the FFmpeg and Matroska communities, we developed a standardization plan (PDF), which was included in our overall proposal.

As several standards organizations were considered, it was important to gain feedback on the process from several stakeholder communities. These discussions informed our decision to approach the IETF, which appeared the most appropriate for the project needs as well as the needs of our communities. The PREFORMA project is designed with significant emphasis and mandate on an open source approach, including not only the licensing requirements of the results, but also a working environment that promotes disclosure, transparency, participation, and oversight. The IETF subscribes to these same ideals; the standards documents are freely and easily available without restrictive licensing and much of the procedure behind the standardization is open to research and review.

The IETF also strives to promote involvement and participation; their recent conferences include IRC channels, audio stream, video streams per meeting and an assigned IRC channel representative to facilitate communication between the room and virtual attendees. In addition to these attributes, the format communities involved (Matroska, FFmpeg, and libav) were already familiar with the IETF from earlier and ongoing efforts to standardize open audiovisual formats such as Opus and Daala. Through an early discovery process we gathered the requirements and qualities needed in a successful standardization process for Matroska and FFV1 from memory institutions, format authors, format implementation communities, and related technical communities. From here we assessed standards bodies according to traits such as disclosure, transparency, open participation, and freedom in licensing, confirming that IETF is the most appropriate venue for standardizing Matroska and FFV1 for preservation use.

At this stage of the process we presented our proposal for standardization of Matroska and FFV1 standardization at the July 2015 IETF93 conference. After soliciting additional input and feedback from IETF members and the development communities, we have a proposed working group charter under consideration that encompasses FFV1, Matroska, and FLAC. If accepted, this will provide a venue for the ongoing standardization work on these formats towards the specific goals of the charter.

I should point out that other PREFORMA projects are involved in standardization efforts as well. The Easy Innova team are working on furthering TIFF standardization in their TIFF/A initiative.

Kate: Let’s talk about two formats of interest for this project, FFV1 and Matroska. What are some of the unique features of these formats that make them viable for preservation use and for the goals of PREFORMA?

Initial draft of MediaConch IETF process.

Dave: FFV1 is a very efficient lossless video codec from the FFmpeg project that is designed in a manner responsive to the requirements of digital preservation. A number of archivists participated in and reviewed efforts to design, standardize, and test FFV1 version 3. The new features in FFV1 version 3 include more self-descriptive properties, so the codec stores its own information regarding field dominance, aspect ratio, and colorspace and is not reliant on a container format to store this information. Codecs that rely heavily on their containers for technical description often face interoperability challenges. FFV1 version 3 also facilitates storage of cyclic redundancy checks in frame headers to allow verification of the encoded data, and stores error status messages. FFV1 version 3 is also a very flexible codec, allowing adjustments to the encoding process based on different priorities such as size efficiency, data resilience, or encoding speed. For the past year or two, FFV1 has been at a tipping point for preservation use. Its speed, accessibility, and digital preservation features make it an increasingly attractive option for lossless video encoding that can be found in more and more large-scale projects; the standardization of FFV1 through an open standards organization certainly plays a significant role in the consideration of FFV1 as a preservation option.

Matroska is an open-licensed audiovisual container format with extensive and flexible features and an active user community. The format is supported by a set of core utilities for manipulating and assessing Matroska files, such as mkvtoolnix and mkvalidator. Matroska is based on EBML, the Extensible Binary Meta Language. An EBML file is made up of a sequence of defined “Elements”. Each element is comprised of an identifier, a value that notes the size of the element’s data payload, and the data payload itself. Matroska integrates a flexible and semantically comprehensive hierarchical metadata structure as well as digital preservation features such as the ability to provide CRC checksums internally per selected elements. Because of its ability to use internal, regional CRC protection it is possible to update a Matroska file to log OAIS events without any compromise to the fixity of its audiovisual payload. Standardization efforts are currently renewed with an initial focus on Matroska’s underlying EBML format. For those who would like to participate I’d recommend contributing to the EBML specification GitHub repository or joining the matroska-devel mailing list.
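To make that element structure a bit more concrete, here is a minimal sketch in plain Ruby (written for this post rather than taken from MediaConch, and simplified to ignore unknown-size elements) that reads the identifier and declared payload size of the first top-level element of a Matroska file:

# Usage (hypothetical): ruby ebml_peek.rb file.mkv
def read_vint(io, keep_marker: false)
  first = io.readbyte
  # The position of the first set bit in the first byte gives the field's total length in bytes.
  length = 1
  length += 1 while length <= 8 && (first & (0x80 >> (length - 1))) == 0
  value = keep_marker ? first : first & (0xFF >> length)
  (length - 1).times { value = (value << 8) | io.readbyte }
  value
end

File.open(ARGV[0], "rb") do |f|
  id   = read_vint(f, keep_marker: true)  # element IDs keep their length-marker bits
  size = read_vint(f)                     # payload sizes drop the marker bit
  printf("element id: 0x%X, payload size: %d bytes\n", id, size)
  # A Matroska file starts with the EBML header element, id 0x1A45DFA3.
end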

Ashley: Matroska is especially appealing to me as a former cataloger and someone who has migrated data between metadata management systems because of its inherent ability to store a large breadth of descriptive metadata within the file itself. Archivists can integrate content descriptions directly into files. In the event of a metadata management software sunsetting or potential loss occurring during the file’s lifetime of duplication and migration, the file itself can still harbor all the necessary intellectual details required to understand the content.

MediaConch’s plan to integrate into OAIS workflows.

It’s great to have those self-checking mechanisms for setting and verifying fixity built into a file format’s infrastructure, instead of requiring an archivist to do supplemental work on top by storing technical requirements, checksums, and descriptive metadata alongside a file for preservation purposes. By using Matroska and FFV1 together, an archivist can get full coverage of every aspect of the file. And if fixity fails, the point where that failure occurs can be easily pinpointed. This level of precision is ideal for preservation and a boon for the archivists of the future. Since error warnings can be frame/slice-level specific, assessing problems becomes much easier. It’s like being able to use a microscope to analyze a record instead of being limited to plain eyesight. It avoids the problem of “I have a file, it’s not validating against a checksum that represents the entirety of the file, and it’s a 2-hour-long video. Where do I begin in diagnosing this problem?”

Kate: What communities are currently using them? Would it be fair to say that ffv1 and Matroska are still emerging formats in terms of adoption in the US?

Ashley: Indiana University has embarked upon a project to digitally preserve all of its significant audio and video recordings in the next four years. Mike Casey, director of technical operations for the Media Preservation Initiative project confirmed in a personal email that “after careful examination of the available options for video digitization formats, we have selected FFV1 in combination with Matroska for our video preservation master files.”

Dave: The Wikipedia page for FFV1 has an initial list of institutions using or considering FFV1. Naturally users do not need to announce publicly that they use it, but there’s been an increase in messages to related community forums.

Plan to integrate into the open source community/outreach strategy

Kate: Do you expect that the IETF standardization process will likely help increase adoption?

Ashley: I think a lot of people are unsure of these formats because they aren’t currently backed by a standards body. Matroska has been around for a long time and is a sturdy open source format. Open source software can have great community support but getting institutional support isn’t usually a priority. We have been investing time into clarifying the Matroska technical specifications in anticipation of a future release.

The harder case to be made regarding adoption in libraries and archives is with FFV1, as this codec is relatively new, less familiar, and has yet to be fully standardized. Creating FFV1-encoded files is currently limited to people with a lot of technical knowledge.

Kate: One of my favorite parts of my job is playing format detective in which I use a set of specialized tools to determine what the file is – the file extension isn’t always a reliable or specific enough marker – and if the file has been produced according to the specifications of a standard file format. But the digital preservation community needs more flexible and more accurate format identification and conformance toolsets. How will PREFORMA contribute to the toolset canon?

Ashley: The initial development of MediaConch began with creating an extension of MediaInfo, which is already heavily integrated into many institutions in the public and private sectors as a microservice to gather information about media files. The MediaConch software will go beyond just providing useful information about a file: it will help ensure that the file is what it says it is, and the file can continually be checked through routine services to ensure its integrity far into the future.
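As a rough illustration of the “MediaInfo as a microservice” pattern mentioned above, here is a minimal sketch, assuming only that the mediainfo command-line tool is installed (the wrapper function and file name are hypothetical):

require 'open3'

# Shell out to the MediaInfo CLI and return its XML report for one file.
def mediainfo_xml(path)
  stdout, stderr, status = Open3.capture3("mediainfo", "--Output=XML", path)
  raise "mediainfo failed: #{stderr}" unless status.success?
  stdout  # XML describing the container and its video/audio streams
end

puts mediainfo_xml("preservation_master.mkv")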

MediaConch GUI with policy editor displaying parameters.

A major goal for PREFORMA is the extensibility of the software being developed — working across all computer platforms, working to check files at the item level or in batches, and cross-comparability between the different formats. We collaborate with Easy Innova and veraPDF to discover and implement compatible methods of file checking. The intent is to avoid creating a tool that exists within a silo. Even though we are three teams working on different formats, we can, in the end, be compatible through API endpoints, not just for the three funded teams but to other specialized tools or archival management programs like Archivematica. Keeping the software open source for future accessibility and development is not optional — it’s required by the PREFORMA tender.

Dave: Determining if a file has been produced according to the specifications of a standard file format is a central issue for PREFORMA, and unfortunately there are not nearly enough tools to do so. I credit Matroska for developing a utility, mkvalidator, alongside the development of their format specifications, but having this type of conformance utility accompany the specification is unfortunately a rarity.

Our current role in the PREFORMA project is fairly specific to certain formats but there are some components of the project which contribute to file format investigation. Already we have released a new technical metadata report, MediaTrace, which may be generated via MediaInfo or MediaConch. The MediaTrace report will help with advanced ‘format detective’ investigations as it presents the entire structure of an audiovisual file in an orderly way. The report may be used directly, but within our PREFORMA project it plays a crucial role in supporting conformance checks of Matroska. MediaConch is additionally able to display the structure of Matroska files and will eventually allow metadata fixes and repairs to both Matroska and FFV1.

MediaArea seeks input and feedback on the standard, specifications and future of each format for future development of the preservation-standard conformance checker software. If you work with these formats and are interested in contributing your requirements and/or test files, please contact us at

David Rosenthal: Canadian Government Documents

Wed, 2015-09-23 15:00
Eight years ago, in the sixth post to this blog, I was writing about the importance of getting copies of government information out of the hands of the government:
Winston Smith in "1984" was "a clerk for the Ministry of Truth, where his job is to rewrite historical documents so that they match the current party line". George Orwell wasn't a prophet. Throughout history, governments of all stripes have found the need to employ Winston Smiths and the US government is no exception. Government documents are routinely recalled from the FDLP, and some are re-issued after alteration.

Anne Kingston at Maclean's has a terrifying article, Vanishing Canada: Why we’re all losers in Ottawa’s war on data, about the Harper administration's crusade to prevent anyone from finding out what is happening as they strip-mine the nation. They don't even bother rewriting, they just delete, and prevent further information from being gathered. The article mentions the desperate struggle Canadian government documents librarians have been waging for the last three years, using the LOCKSS technology to stay ahead of the destruction. They won this year's CLA/OCLC Award for Innovative Technology, and details of the network are here.

Read the article and weep.

LITA: A Linked Data Journey: Introduction

Wed, 2015-09-23 14:00

retrieved from Wikipedia, created by Anja Jentzsch and Richard Cyganiak


Linked data. It’s one of the hottest topics in the library community. But what is it really? What does it look like? How will it help? In this series I will seek to demystify the concept and present practical examples and use-cases. Some of the topics I will touch on are:

  • The basics
  • Tools for implementing linked data
  • Interviews with linked data practitioners
  • What can you do to prepare?

In this first part of the series I will give a brief explanation of linked data; then I will attempt to capture your interest by highlighting how linked data can enhance a variety of library services, including cataloging, digital libraries, scholarly data, and reference.

What is Linked Data?

I’m not going to go into the technical detail of linked data, as that isn’t the purpose of this post. If you’re interested in specifics, please, please contact me.

At its core, linked data is an idea. It’s a way of explicitly linking “things” together, particularly on the web. As Tim Berners-Lee put it:

The Semantic Web isn’t just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.

The Resource Description Framework (RDF) is a framework for realizing linked data. It does so by employing triples, which are fundamentally simple (though RDF can become insanely complex), and by uniquely identifying “things” via URIs/URLs when possible. Here is a quick example:

Jacob Shelby schema:worksFor Iowa State University

Behind each of those three “things” is a URL. Graph-wise this comes out to be:

courtesy of W3C’s RDF Validator

This is the basic principle behind linked data. In practice there are a variety of machine-readable serializations that can express the RDF model, among them XML (RDF/XML), JSON-LD, Turtle (TTL), and N-Triples. I won’t go into any specifics, but I encourage you to explore these if you are technologically curious.
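For the technologically curious, here is a minimal sketch of the triple above expressed with the ruby-rdf gem and serialized as N-Triples; the gem choice and the example URIs are illustrative assumptions, not part of the original post:

require 'rdf'
require 'rdf/ntriples'

# An ad-hoc schema.org vocabulary, so schema.worksFor resolves to http://schema.org/worksFor.
schema = RDF::Vocabulary.new("http://schema.org/")

graph = RDF::Graph.new
graph << [
  RDF::URI.new("http://example.org/people/jacob-shelby"),            # subject ("thing")
  schema.worksFor,                                                   # predicate (the link)
  RDF::URI.new("http://dbpedia.org/resource/Iowa_State_University")  # object ("thing")
]

puts graph.dump(:ntriples)
# => <http://example.org/people/jacob-shelby> <http://schema.org/worksFor> <http://dbpedia.org/resource/Iowa_State_University> .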

What will it be able to do for you?

So, the whole idea of linked data is fine and dandy. But what can it do for you?  Why even bother with it? I am now going to toss around some ways linked data will be able to enhance library services. Linked data isn’t at full capacity yet, but it is rapidly becoming flesh and bone. The more the library community “buys into” linked data and prepares for it, the quicker and more powerful linked data will become. Anywho, here we go.

I should clarify that all of these examples conform to the concept of linked open data. There is such a thing as linked “closed” (private) data.


Cataloging

Right now the traditional cataloging world is full of metadata with textual values (strings) and closed, siloed databases. With linked data it can become a world full of uniquely identified resources (things) and openly available data.

With linked data, catalogers will be able to link to linked data vocabularies (there is already a plethora of them out there, including the Library of Congress authorities and the Getty vocabularies). For users this will add clarity to personal names and subject headings. For catalogers it will eliminate the need to update authorities locally when a name or label changes. It will also help reduce redundant duplication of data.
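
As a rough sketch of the “things instead of strings” idea, the snippet below describes a bibliographic record whose subject is an authority URI rather than a text heading. Both the record URI and the LCSH identifier are hypothetical placeholders; the point is that the preferred label lives with the authority service, not in every local record.

    # A minimal sketch; the bib and authority URIs are hypothetical placeholders.
    from rdflib import Graph, URIRef
    from rdflib.namespace import DCTERMS

    g = Graph()
    g.add((
        URIRef("http://example.org/bib/123"),                         # a local bibliographic record
        DCTERMS.subject,
        URIRef("http://id.loc.gov/authorities/subjects/sh0000000"),   # an LCSH authority URI (placeholder)
    ))

    # If the authority's preferred label changes, nothing in the local record
    # needs to change: it still points at the same URI.
    print(g.serialize(format="turtle"))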

Digital Libraries

The “things instead of strings” concept noted above rings true for non-MARC metadata for digital libraries. Digital library staff will be able to link to semantic vocabularies.

Another interesting prospect is that institutions will be able to link their metadata to other institutions’ metadata. Why would you do this? Maybe another institution has a digital resource that is closely related to one of yours. Linked data allows this to be done without having to upload another institution’s metadata into a local database; it also allows metadata provenance to be kept intact (linked data explicitly points back to the resource being described).
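
A sketch of what such a cross-institutional link might look like, with both item URIs invented for illustration: the local record simply points at the other institution’s resource, so nothing is copied into the local database and the description stays with the institution that maintains it.

    # A minimal sketch; both item URIs are hypothetical.
    from rdflib import Graph, URIRef
    from rdflib.namespace import RDFS

    g = Graph()
    g.add((
        URIRef("http://digital.example.edu/items/42"),         # a local digital object
        RDFS.seeAlso,
        URIRef("http://collections.other.example.org/obj/7"),  # a closely related object held elsewhere
    ))
    print(g.serialize(format="turtle"))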

Scholarly Data

Linked data will help scholarly data practitioners more easily keep works and data connected to researchers. This can be done by pointing to a researcher’s ORCID ID or VIVO ID as the “creator”. It will also be possible to pull in researcher profile information from linked data services (I believe VIVO is one; I’m not sure about ORCID).
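
For instance, a dataset’s creator can be recorded as an ORCID URI rather than a name string. In this sketch the dataset URI is hypothetical and the ORCID shown is only an illustrative pattern, not a real researcher’s identifier.

    # A minimal sketch; the dataset URI is hypothetical and the ORCID is illustrative only.
    from rdflib import Graph, URIRef
    from rdflib.namespace import DCTERMS

    g = Graph()
    g.add((
        URIRef("http://data.example.edu/datasets/99"),    # a local dataset record
        DCTERMS.creator,
        URIRef("https://orcid.org/0000-0000-0000-0000"),  # the researcher's ORCID URI (placeholder)
    ))
    print(g.serialize(format="turtle"))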


Reference

Two words: semantic LibGuides. With linked data, reference librarians would be able to pull in data from other linked data sources such as Wikipedia (actually, DBpedia). This would allow for automatic updates when the source content changes, keeping information up-to-date with little effort on the librarian’s part.
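
As a rough illustration of pulling data from DBpedia (assuming the SPARQLWrapper Python library and DBpedia’s public SPARQL endpoint; this is just the data-fetching half, not a LibGuides integration), a guide could look up the current English abstract for a topic each time it is rendered:

    # A minimal sketch using SPARQLWrapper against DBpedia's public endpoint.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setReturnFormat(JSON)
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?abstract WHERE {
          <http://dbpedia.org/resource/Linked_data> dbo:abstract ?abstract .
          FILTER (lang(?abstract) = "en")
        }
    """)

    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["abstract"]["value"])  # always reflects the current DBpedia content

Because only the query (or the DBpedia URI) is stored, the displayed text changes whenever DBpedia’s description does.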

To take this idea to the extreme: what about a consortial LibGuide knowledge base? Institutions could create, share, and reuse LibGuide data that is openly and freely available. The knowledge base would be maintained and developed by the library community, for the public. I recently came across an institution’s LibGuides that are provided via a vendor. To gain access to the LibGuides you had to log in because of vendor restrictions. How lame is that?


Maybe I’m being a little too capricious, but given time, I believe these are all possible. I look forward to continuing this journey in future posts. If you have any questions, ideas, or corrections, feel free to leave them in a comment or contact me directly. Until next time!

District Dispatch: Shutdown could threaten library funding

Wed, 2015-09-23 13:04

With just a few calendar days, and even fewer legislative days, left before the end of the fiscal year at midnight on September 30th, Congressional leaders are struggling to avert a Federal government shutdown by acting to fund the government as of October 1. The options available to leaders are few, however, and several roadblocks stand in the way: a bloc of conservative members is demanding that Congress defund Planned Parenthood, while other Members are calling for dramatic cuts in non-Defense programs without compromising with Democrats, who are seeking reductions in defense spending to increase funds for some domestic priorities.

Government shutdown threatens library funding (istock photos)

As a result, Congress may be forced to adopt a series of Continuing Resolutions: short-term stopgap measures to keep the government’s doors open while leaders seek to resolve controversial issues and differences in spending priorities that have split the parties as well as reportedly created fissures within the Republican Party.

For library community priorities, and the education community in general, the appropriations process has been a mixed bag to date. The Library Services and Technology Act (LSTA) received funding of $180.9 million in the FY15 Omnibus funding bill passed last December. The Obama Administration’s FY 2016 budget proposal requested that Congress increase that sum to $186.6 million. The House Appropriations Committee, however, approved its funding bill with a smaller increase, to $181.1 million, while the Senate Committee provided $181.8 million. Innovative Approaches to Literacy (IAL) would receive level funding of $25 million under the Senate’s bill, although no funds were requested for the program by the Administration or the House.

Final appropriations for LSTA and IAL — included in the Labor, Health and Human Services, Education, and Related Agencies Appropriations bill — have yet to be considered on the Floor of the House or Senate, and the likelihood that any such individual appropriations bill will be considered as a stand-alone measure is small and diminishing at this late stage of Congress’ funding cycle.

Funding for education priorities in general is also facing rough seas, with significant cuts proposed by both House and Senate appropriators. Overall education funding would be cut $2.7 billion by the House and $1.7 billion by the Senate. The Obama Administration had proposed an education funding increase of $3.6 billion.

The pathways forward for FY16 funding are uncertain. It is virtually guaranteed, however, that between now and the end of this month you will be hearing and reading the terms government shutdown, continuing resolution, Omnibus package, Planned Parenthood, and defense versus domestic spending more and more every day.

Might want to keep those earplugs and eyeshades handy!

The post Shutdown could threaten library funding appeared first on District Dispatch.

In the Library, With the Lead Pipe: Editorial: Summer Reading 2015

Wed, 2015-09-23 13:00

Photo by Flickr user Moyan_Brenn (CC BY 2.0)

Editors from In The Library With The Lead Pipe are taking a break from our regular schedule to share our summer reading favs. Tell us what you’ve been reading these last few months in the comments!


It is, of course, winter where I live. This makes it a great time to curl up with a nice big fat tome, and I have been spending the long winter nights reading Peter Watson’s The Great Divide: History and Human Nature in the Old World and the New. Watson takes the reader on a journey from 15,000 BC through the Great Flood, following the first humans across Eurasia and through the rise and fall of empires until the Conquistadors appeared. He explains how, according to our best guesses, humans came to be in the Americas, when it happened, and why the new world they found there led the Americans in such a different direction to the ‘old world’ of Eurasia. This is a fascinating book in many ways. Covering archaeology, religion, botany, geology and very ancient history, Watson attempts to explain why the pre-Columbian Americas had such comparatively short-lived civilisations, bloody religions, and localised cultures.


I’ve recently joined a mini book club with my brother and a few of our friends who are also Marvel Unlimited subscribers. Our rough plan is to choose arcs that are completed in 6 issues. Several of us have a tendency to spiral off though. So far we’ve done Longshot Saves the Marvel Universe (2013), Rogue (2004), Young Avengers (2005), and Deathlok (2014). Young Avengers in particular spiraled off into the whole 12 issue run as well as Truth: Red, White & Black (2003) which linked up nicely but came from a separate Twitter recommendation.

I’ve also nostalgically torn through everything Rogue and Gambit, which led to meeting Pete Wisdom and exploring New Excalibur (2005). And I’m devouring the recent spate of female leads: Captain Marvel (2014), Ms Marvel (2014), Thor (2014), and The Unbeatable Squirrel Girl (2015). The metadata in Marvel Unlimited is awkward and the coverage can be spotty, which leads to lots of online searching to figure out what to read next. I’m grateful for friends who can share tips such as, “For further reading I suggest Avengers Children’s Crusade which is basically issues 13+. Issue 4 has its own separate entry on Marvel Unlimited for no reason, and Avengers Children’s Crusade Young Avengers is by the same author and comes between issues 4 and 5.”


The summer here has flown by pretty quickly. I’m beginning a research project for an upcoming book, The Feminist Reference Desk, edited by Maria Accardi, so I’ve been doing a lot of reading to prep for that. The article that I’d really recommend everyone read is  Library Feminism and Library Women’s History: Activism and Scholarship, Equity and Culture by Suzanne Hildenbrand. It’s really insightful and explains how gender roles impact our profession.

For lighter reading, I have been enjoying Modern Romance, the new book by Aziz Ansari and Eric Klinenberg. The research is interesting and Aziz is pretty funny. Last, I totally judged a book by its cover and started reading The Woman Destroyed by Simone de Beauvoir. So far I’ve only read the first story, but I’m not sure how I feel about it. I hope everyone had a good summer!


This summer I haven’t done my due diligence in the reading department. I always have a mental list of books to read based on people’s recommendations, but there are never enough hours in the day to get through everyone’s great suggestions! During my road trip to the Indiana University Libraries Information Literacy Colloquium in August, I listened to a book on CD, which is my absolute favorite way to read. It was Jodi Picoult’s gripping novel Change of Heart. Picoult always explores controversial topics through fiction, and things unravel at the beginning of this novel when a woman’s husband and child are brutally murdered. The plot thickens as the death row inmate who committed the murders seeks to bequeath his heart to a little girl in need of a heart transplant. Not just any girl, but the biological daughter and sister of the murdered victims. The novel forced me to consider a new twist on an already uncomfortable subject.


For the first time since I was a kid, I’m using the public library to borrow books again. I know that’s such a bad librarian thing to admit, but since I worked in academic libraries, I got everything I needed through the school. Using my public library has been… interesting. It’s fascinating to see things from the strictly-patron side again (and that’s a whollllle different Lead Pipe article)! I’ve been devouring books lately. Some of my favorites have been Miguel Street by V.S. Naipaul, All My Puny Sorrows by Miriam Toews, and Geek Love by Katherine Dunn. I flew through The Paris Wife by Paula McLain (yeah, I’m a couple years late on that one) and I was totally surprised/disturbed by Sarah Waters’ Fingersmith. I’m in the middle of CA Conrad’s Ecodeviance and I’m about to start Emily Gould’s Friendship. I’ve also been reading some books about Minnesota history/culture and rocks/minerals specific to Lake Superior, since that’s where I’m living now!


Discussing summer reading makes me a little nervous. Since I joined Twitter about a year and a half ago, my book reading has markedly declined. At the same time my eyesight deteriorated and my child became a toddler. Whatever combination of factors may have prompted it, the fact is that I read fewer books than ever before. My overall reading, however, has not been reduced, principally because Twitter (for me this means mainly ‘library Twitter’) is directing me daily to various news articles, journal articles, blog posts, websites, and other shorter-form writing that absorbs most of my reading time on any given day.

Yet I did find the time for at least ONE book this summer! I am working on a project tracing the careers of German academic librarians through the turbulent decades of the mid-20th century. For this research I read a book of essays about Austrian women librarians who confronted persecution under the Nazis and were either forced into emigration, imprisoned, persecuted, tortured, or even murdered. Entitled “Austrian Women Librarians on the Run: Persecuted, Suppressed, Forgotten?” [Ilse Korotin, ed., Österreichische Bibliothekarinnen auf der Flucht: verfolgt, verdrängt, vergessen? Wien: Praesens Verlag, 2007.], this small book contains a rich trove of remarkable and often harrowing stories of women librarians who found themselves forced out of their jobs and their homes before and during World War II. Of particular interest to me was how the political commitments of many women were strengthened by the experiences of persecution, expulsion, and exile. Many of those who survived the war and the Holocaust saw librarianship as an integral part of their struggles against racism and sexism, and their commitments to social justice.


Over the last couple of years, I’ve gotten back into reading real-life books, and it has been wonderful. Some distant friends started a sci-fi/fantasy book club, and it’s gotten me back into reading. We used to meet monthly via Google Hangouts from our varying locations (the power of technology!), and I’ve recently joined up with another book club that’s in-the-flesh. In the meantime, I’ve been enjoying some easy escapist reading. Most recently, I finished The Curious Incident of the Dog in the Night-Time by Mark Haddon. Finding his neighbor’s dog dead in the front yard (and this is how the book opens, y’all), young Chris decides to do some detecting in order to find out whodunnit. Mayhem ensues, of course.

My favorite book of the summer, though, has definitely been Magonia by Maria Dahvana Headley. I became obsessed with the unique otherworldliness of this book. Reading sci-fi and fantasy, I sometimes find it difficult to escape recurring themes and common tropes. What I liked most about this one was that it blew all of that out of the water. I’d never experienced a world like Magonia before, and I loved that.

State Library of Denmark: Light validation of Solr configuration

Wed, 2015-09-23 10:17

This week we were once again visited by the Edismax field alias bug in Solr: Searches with boosts, such as foo^2.5, stopped working. The problem arises when an alias referencing one or more non-existing fields is defined in solrconfig.xml, and it is tedious to track down, as one needs to check that every referenced field exists.

We have 10+ different Solr setups and we use aliases in most of them. So a quick script was whipped together, which (…wait for it…) validates Solr configs. Nothing fancy, and it tends to report false positives when things are commented out in the XML files. Still, it does check that

  • all fields in schema.xml reference existing field types
  • all copyFields in schema.xml reference existing fields
  • all fields referenced in solrconfig.xml are defined in schema.xml
  • no alias in solrconfig.xml has the same name as a field in schema.xml

Some of these problems, such as referencing a non-existing field in mlt.fl or pf in solrconfig.xml, are silent and hard to track down: Solr does not complain and searches seem to work. But in the case of misspelled field names, the result is poorer-quality searches, as the intended functionality is not activated.

Cross-validation of fields used in solrconfig.xml and schema.xml would be nice to have as part of Solr core startup, but until then the script might be of use. Get it at GitHub.
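
For the curious, here is a minimal sketch of the kind of cross-check described above; it is not the actual script linked from GitHub. It assumes classic schema.xml and solrconfig.xml files in the working directory, compares exact field names only (so dynamic fields and wildcards would trigger false positives), and inspects only a few common parameters (qf, pf, mlt.fl).

    # A minimal sketch of cross-checking schema.xml and solrconfig.xml;
    # NOT the State Library of Denmark's actual validation script.
    import re
    import xml.etree.ElementTree as ET

    schema = ET.parse("schema.xml").getroot()

    field_types = {ft.get("name") for ft in schema.iter("fieldType")}
    fields = {f.get("name") for f in schema.iter("field")}

    # 1) every field references an existing field type
    for f in schema.iter("field"):
        if f.get("type") not in field_types:
            print("field '%s' uses undefined type '%s'" % (f.get("name"), f.get("type")))

    # 2) every copyField points at defined fields (wildcards not handled here)
    for cf in schema.iter("copyField"):
        for attr in ("source", "dest"):
            if cf.get(attr) not in fields:
                print("copyField %s '%s' is not a defined field" % (attr, cf.get(attr)))

    # 3) fields listed in common solrconfig.xml parameters (qf, pf, mlt.fl) exist in schema.xml
    config_text = open("solrconfig.xml", encoding="utf-8").read()
    for params in re.findall(r'<str name="(?:qf|pf|mlt\.fl)">([^<]+)</str>', config_text):
        for name in re.split(r"[,\s]+", params.strip()):
            name = name.split("^")[0]  # strip boosts such as foo^2.5
            if name and name not in fields:
                print("solrconfig.xml references unknown field '%s'" % name)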