Feed aggregator

Eric Hellman: "Free" can help a book do its job

planet code4lib - Thu, 2015-02-26 18:32

(Note: I wrote this article for NZCommons, based on my presentation at the 2015 PSP Annual Conference in February.)

Every book has a job to do. For many books, that job is to make money for its creators. But a lot of books have other jobs to do. Sometimes the fact that people pay for books helps that job, but other times the book would be able to do its job better if it was free for everyone.

That's why Creative Commons licensing is so important. But while CC addresses the licensing problem nicely, free ebooks face many challenges that make it difficult for them to do their jobs.

Let's look at some examples.

When Oral Literature in Africa was first published in 1970, its number one job was to earn tenure for the author, a rising academic. It succeeded, and then some. The book became a classic, elevating an obscure topic and creating an entire field of scholarly inquiry in cultural anthropology. But in 2012, it was failing to do any job at all. The book was out of print and unavailable to young scholars on the very continent whose culture it documented. Ruth Finnegan, the author, considered it her life's work and hoped it would continue to stimulate original research and new insights. To accomplish that, the book needed to be free. It needed to be translatable, it needed to be extendable.

Nga Reanga Youth Development: Maori Styles, an Open Access book by Josie Keelan, is another example of an academic book with important jobs to do. While its primary job is a local one, the advancement of understanding and practice in Maori youth development, it has another job, a global one. Being free helps it speak to scholars and researchers around the world.

Leanne Brown's Good and Cheap is a very different book. It's a cookbook. But the job she wanted it to do made it more than your usual cookbook. She wanted to improve the lives of people who receive "nutrition assistance" (food stamps) by providing recipes for nutritious and healthy meals that can be made without spending much money. By being free, Good and Cheap helps more people in need eat well.

My last example is Casey Fiesler's Barbie™ I Can Be A Computer Engineer The Remix! Now With Less Sexism! The job of this book is to poke fun at the original Barbie™ I Can Be A Computer Engineer, in which Barbie needs boys to do the actual computer coding. But because Fiesler uses material from the original under "fair use", anything other than free, non-commercial distribution isn't legal. Barbie, remixed can ONLY be a free ebook.

But there's a problem with free ebooks. The book industry runs on a highly evolved and optimized cradle-to-grave supply chain, comprising publishers, printers, production houses, distributors, wholesalers, retailers, aggregators, libraries, publicists, developers, cataloguers, database suppliers, reviewers, used-book dealers, even pulpers. And each entity in this supply chain takes its percentage. The entire chain stops functioning when an ebook is free. Even libraries (most of them) lack the processes that would enable them to include free ebooks in their collections.

At, we ran smack into this problem when we set out to bring books into the creative commons. We helped Open Book Publishers crowdfund a new ebook edition of Oral Literature in Africa. The ebook was then freely available, but it wasn't easy to make it free on Amazon, which dominates the ebook market. We couldn't get the big ebook aggregators that serve libraries to add it to their platforms. We realized that someone had to do the work that the supply chain didn't want to do.

Over the past year, we've worked to turn into a "bookstore for free books". The transformation isn't done yet, but we've built a database of over 1200 downloadable ebooks, licensed under Creative Commons or other free licenses. We have a long way to go, but we're distributing over 10,000 ebooks per month. We're providing syndication feeds, developing relationships with distributors, improving metadata, and promoting wonderful books that happen to be free.

The creators of these books still need to find support. To help them, we've developed three revenue programs. For books that already have free licenses, we help the creators ask for financial support in the one place where readers are most appreciative of their work: inside the books themselves. We call this "thanks for ungluing".

For books that exist as ebooks but need to recoup production costs, we offer "buy-to-unglue". We'll sell these books until they reach a revenue target, after which they'll become open access. For books that exist in print but need funding for conversion to an open access ebook, we offer "pledge-to-unglue", which is a way of crowdfunding the conversion.

After a book has finished its job, it can look forward to a lengthy retirement. There's no need for books to die anymore, but we can help them enjoy retirement, and maybe even enjoy a second life. Project Gutenberg has over 50,000 books that have "retired" into the public domain. We're starting to think about the care these books need. Formats change along with the people that use them, and the book industry's supply chain does its best to turn them back into money-earners to pay for that care.

Recently we received a grant from the Knight Foundation to work on ways to provide the long-term care that these books need to be productive AND free in their retirements. GITenberg, a collaboration between the folks at and ebook technologist Seth Woodward, is exploring the use of Github for free ebook maintenance. Github is a website that supports collaborative software development with source control and workflow tools. Our hope is that the ingredients that have made Github wildly successful in the open source software world will prove to be similarly effective in supporting ebooks.

It wasn't so long ago that printing costs made free ebooks impossible. So it's no wonder that free ebooks haven't realized their full potential. But with cooperation and collaboration, we can really make wonderful things happen.

State Library of Denmark: Long tail packed counters for faceting

planet code4lib - Thu, 2015-02-26 18:14

Our home brew Sparse Faceting for Solr is all about counting: When calculating a traditional String facet such as

Author
  • H.C.Andersen (120)
  • Brothers Grimm (90)
  • Arabian Nights (7)
  • Carlo Collodi (5)
  • Ronald Reagan (2)

the core principle is to have a counter for each potential term (author name in this example) and update that counter by 1 for each document with that author. There are different ways of handling such counters.
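As a minimal sketch of that principle (plain Python, not Solr internals), author names are mapped to term ordinals, as Lucene stores terms, and faceting is just one array update per matching document:

```python
# Toy corpus: author names mapped to term ordinals, as Lucene stores terms.
terms = ["Arabian Nights", "Brothers Grimm", "Carlo Collodi", "H.C.Andersen"]
docs = [3, 3, 1, 3, 2, 1]      # the author ordinal of each matching document

counters = [0] * len(terms)    # one counter per potential term
for ordinal in docs:
    counters[ordinal] += 1     # +1 for each document with that author

print(counters)  # [0, 2, 1, 3]
```

The interesting question, which the rest of the post explores, is how to represent `counters` compactly when there are billions of terms.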

Level 0: int[]

Stock Solr uses an int[] to keep track of the term counts, meaning that each unique term takes up 32 bits or 4 bytes of memory for counting. Normally that is not an issue, but with 6 billion terms (divided between 25 shards) in our Net Archive Search, this means a whopping 24GB of memory for each concurrent search request.

Level 1: PackedInts

Sparse Faceting tries to be clever about this. An int can count from 0 to 2 billion, but if the maximum number of documents for any term is 3000, there will be a lot of wasted space. 2^12 = 4096, so in the case of maxCount=3000, we only need 12 bits/term to keep track of it all. Currently this is handled by using Lucene’s PackedInts to hold the counters. With the 6 billion terms, this means 9GB of memory. Quite an improvement on the 24GB from before.
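The bits-per-value arithmetic above can be checked directly. This is only a sketch of the calculation (Lucene's PackedInts chooses the bits per value in essentially this way), using decimal GB as the post does:

```python
import math

def bits_needed(max_count):
    # Smallest number of bits that can represent the values 0..max_count.
    return max(1, math.ceil(math.log2(max_count + 1)))

TERMS = 6_000_000_000
print(bits_needed(3000))                    # 12 (2^12 = 4096 >= 3001)
print(TERMS * 32 / 8 / 1e9)                 # 24.0 GB with int[]
print(TERMS * bits_needed(3000) / 8 / 1e9)  # 9.0 GB with 12-bit PackedInts
```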

Level 2: Long tail PackedInts with fallback

Packing counters has a problem: The size of all the counters is dictated by the maxCount. Just a single heavily used term can nullify the gains: If all documents share one common term, the size of the individual counters will be log2(docCount) bits. With a few hundred million documents, that puts the size at 27-29 bits/term, very close to the int[] representation’s 32 bits.

Looking at the Author-example at the top of this page, it seems clear that the counts for the authors are far from equal: The top 2 authors have counts a lot higher than the bottom 3. This is called a long tail and it is a very common pattern. It means that the overall maxCount for the terms is likely to be a lot higher than the maxCount for the vast majority of the terms.

While I was loudly lamenting all the wasted bits, Mads Villadsen came by and solved the problem: What if we keep track of the terms with high maxCount in one structure and the ones with a lower maxCount in another structure? Easy enough to do with a lot of processing overhead, but tricky to do efficiently. Fortunately Mads also solved that (my role as primary bit-fiddler is in serious jeopardy). The numbers in the following explanation are just illustrative and should not be seen as the final numbers.

The counter structure

We have 200M unique terms in a shard. The terms are long tail-distributed, with the most common ones having maxCount in the thousands and the vast majority with maxCount below 100.

We locate the top-128 terms and see that their maxCounts range from 2921 down to 87. We create an int[128] to keep track of their counts and call it head.

head

Bit         31 30 29 …  2  1  0
Term h_0     0  0  0 …  0  0  0
Term h_1     0  0  0 …  0  0  0
…            0  0  0 …  0  0  0
Term h_126   0  0  0 …  0  0  0
Term h_127   0  0  0 …  0  0  0

The maxCount for the terms below the 128 largest ones is 85. 2^7=128, so we need 7 bits to hold each of those. We allocate a PackedInt structure with 200M entries of 7+1 = 8 bits and call it tail.

tail

Bit               7* 6 5 4 3 2 1 0
Term 0             0 0 0 0 0 0 0 0
Term 1             0 0 0 0 0 0 0 0
…                  0 0 0 0 0 0 0 0
Term 199,999,998   0 0 0 0 0 0 0 0
Term 199,999,999   0 0 0 0 0 0 0 0

The tail has an entry for all terms, including those in head. For each of the large terms in head, we locate its position in tail. At that tail-counter, we set the value to the term’s index in the head counter structure and set the highest bit to 1.

Let’s say that head entry term h_0 is located at position 1 in tail, h_126 is located at position 199,999,998 and h_127 is located at position 199,999,999. After marking the head entries in the tail structure, it would look like this:

tail with marked heads

Bit               7* 6 5 4 3 2 1 0
Term 0             0 0 0 0 0 0 0 0
Term 1             1 0 0 0 0 0 0 0
…                  0 0 0 0 0 0 0 0
Term 199,999,998   1 1 1 1 1 1 1 0
Term 199,999,999   1 1 1 1 1 1 1 1
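A quick sketch of how those marker values come about (illustrative Python; the tail positions and head indices follow the example above):

```python
HEAD_BIT = 1 << 7   # the "7*" bit: marks a tail entry as a pointer into head

# tail position -> index of that term's counter in head (from the example)
marks = {1: 0, 199_999_998: 126, 199_999_999: 127}

tail_values = {pos: HEAD_BIT | head_index for pos, head_index in marks.items()}
for pos, value in tail_values.items():
    print(pos, format(value, "08b"))
# 1 10000000
# 199999998 11111110
# 199999999 11111111
```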

Hanging on so far? Good.

Incrementing the counters
  1. Read the counter value from the tail structure: count = tail.get(ordinal)
  2. Check if bit 7 is set: if (count & 128 == 128)
  3. If bit 7 is set, increment the head counter: & 127)
  4. If bit 7 is not set, increment the tail counter: tail.set(ordinal, count+1)
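The four steps can be sketched as follows (illustrative Python; plain lists stand in for the PackedInts tail and the int[] head):

```python
HEAD_BIT = 128

def increment(ordinal, tail, head):
    count = tail[ordinal]               # 1. read the tail entry
    if count & HEAD_BIT == HEAD_BIT:    # 2. is bit 7 set?
        head[count & 127] += 1          # 3. yes: bump the head counter it points to
    else:
        tail[ordinal] = count + 1       # 4. no: bump the packed counter in place

tail = [0, HEAD_BIT | 0, 0]   # ordinal 1 redirects to head[0]
head = [0]

for _ in range(3):
    increment(1, tail, head)  # updates land in head
increment(0, tail, head)      # updates stay in tail

print(head[0], tail[0])  # 3 1
```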
Pros and cons

In this example, the counters take up 6 billion * 8 bits + 25 * 128 * 32 bits = 5.7GB. The performance overhead, compared to the PackedInts version, is tiny: Whenever a head bit is encountered, there will be an extra read to get the old head value before writing the value+1. As head will statistically be heavily used, it is likely to be in Level 2/3 cache.

This is just an example, but it should be quite realistic as approximate values from the URL field in our Net Archive Index have been used. Nevertheless, it must be stressed that the memory gains from long tail PackedInts are highly dictated by the shape of the long tail curve.


It is possible to avoid the extra bit in tail by treating the large terms as any other term, until their tail-counters reach their maximum (127 in the example above). When a counter’s max has been reached, the head-counter can then be located using a lookup mechanism, such as a small HashMap or maybe just a linear scan through a short array with the ordinals and counts for the large terms. This would reduce the memory requirements to approximately 6 billion * 7 bits = 5.3GB. Whether this memory/speed trade-off is better or worse is hard to guess and depends on result set size.

Implementation afterthought

The long tail PackedInts could implement the PackedInts-interface itself, making it usable elsewhere. Its constructor could take another PackedInts filled with maxCounts or a histogram with maxbit requirements.

Heureka update 2015-02-27

There is no need to mark the head terms in the tail structure up front. All the entries in tail act as standard counters until the highest bit is set. At that moment the bits used for counting switch to being a pointer into the next available entry in the head counters. The update workflow thus becomes

  1. Read the counter value from the tail structure: count = tail.get(ordinal)
  2. Check if bit 7 is set: if (count & 128 == 128)
  3. If bit 7 is set, increment the head counter: & 127)
  4. If bit 7 is not set, increment the tail counter: tail.set(ordinal, count+1)
  5. If the counter reaches bit 7, change the counter-bits to be pointer-bits:
    if ((count+1) == 128) { head.set(headpos, 128); tail.set(ordinal, 128 | headpos++) }
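A sketch of this lazy scheme (illustrative Python): nothing is marked up front, and a tail counter is only converted to a head pointer the first time it overflows:

```python
HEAD_BIT = 128                # bit 7: entry is a pointer, not a counter

tail = [0] * 4                # all zeros up front; no marking pass needed
head = []                     # head counters allocated on first overflow

def increment(ordinal):
    count = tail[ordinal]
    if count & HEAD_BIT == HEAD_BIT:   # pointer: bump the head counter
        head[count & 127] += 1
    elif count + 1 == HEAD_BIT:        # counter reaches bit 7: convert to pointer
        head.append(count + 1)         # carry the reached count into head
        tail[ordinal] = HEAD_BIT | (len(head) - 1)
    else:
        tail[ordinal] = count + 1      # ordinary packed-counter update

for _ in range(130):
    increment(0)
print(head[0], tail[0])  # 130 128
```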
Pros and cons

The new logic means that initialization and resetting of the structure is simply a matter of filling them with 0. Update performance will be on par with the current PackedInts implementation for all counters whose value is within the cutoff. After that, the penalty of an extra read is paid, but only for the overflowing values.

The memory overhead is unchanged from the long tail PackedInts implementation and still suffers from the extra bit used for signalling count vs. pointer.

Real numbers 2015-02-28

The store-pointers-as-values scheme has the limitation that there can only be as many head counters as the maxCount for tail allows. Running the numbers on the URL field for three of the shards in our net archive index resulted in bad news: The tip of the long tail shape was not very pointy and it is only possible to shave 8% off the counter size, far less than the estimated 30%. The Packed64 in the table below is the current structure used by sparse faceting.

Shard 1 URL: 228M unique terms, Packed64 size: 371MB

tail BPV  required memory  saved      head size
11        342MB            29MB / 8%  106
12        371MB            0MB / 0%   6

However, we are in the process of experimenting with faceting on links, which has a markedly pointier long-tail shape. From a nearly fully built test shard we have:

8/9 built shard links: 519M unique terms, Packed64 size: 1427MB

tail BPV  required memory  saved        head size
15        1038MB           389MB / 27%  14132
16        1103MB           324MB / 23%  5936
17        1168MB           260MB / 18%  3129
18        1233MB           195MB / 14%  1374
19        1298MB           130MB / 9%   909
20        1362MB           65MB / 5%    369
21        1427MB           0MB / 0%     58
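The "required memory" figures can be reproduced by noting that each tail entry spends one extra flag bit (count vs. pointer) on top of its BPV. A sketch of the arithmetic, in decimal MB:

```python
def tail_memory_mb(terms, tail_bpv):
    # Each entry costs tail_bpv bits for the count plus 1 flag bit.
    return terms * (tail_bpv + 1) / 8 / 1e6

print(round(tail_memory_mb(228_000_000, 11)))   # 342  (Shard 1 URL, BPV 11)
print(round(tail_memory_mb(519_000_000, 15)))   # 1038 (links shard, BPV 15)
print(round(tail_memory_mb(519_000_000, 21)))   # 1427 (links shard, BPV 21)
```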

For links, the space saving was 27% or 389MB for the nearly-finished shard. To zoom out a bit: Doing faceting on links for our full corpus with stock Solr would take 50GB. Standard sparse faceting would use 35GB and long tail would need 25GB.

Due to sparse faceting, response time for small result sets is expected to be a few seconds for the links-facet. Larger result sets, not to mention the dreaded *:* query, would take several minutes, with worst-case (qualified guess) around 10 minutes.

Three-level long tail 2015-02-28

To recap, the two techniques described so far each have their trade-offs:

  • pointer-bit: Letting the values in tail switch to pointers when they reach maximum has the benefit of very little performance overhead, with the downside of taking up an extra bit and limiting the size of head.
  • lookup-signal: Letting the values in tail signal “find the counter in head” when they reach maximum, has the downside that a sparse lookup-mechanism, such as a HashMap, is needed for head, making lookups comparatively slow.

New idea: Mix the two techniques. Use the pointer-bit principle until there is no more room in head. head-counters beyond that point all get the same pointer (all value bits set) in tail and their position in head is determined by a sparse lookup-mechanism ord2head.

This means that

  • All low-value counters will be in tail (very fast).
  • The most used high-value counters will be in head and will be referenced directly from tail (fast).
  • The less used high-value counters will be in head and will require a sparse lookup in ord2head (slow).

Extending the pseudo-code from before:

value = tail.get(ordinal)
if (value == 255) {              // indirect pointer signal
} else if (value & 128 == 128) { // pointer-bit set & 127)
} else {                         // term-count = value
  value++;
  if (value != 128) {            // tail-value ok
    tail.set(ordinal, value)
  } else {                       // tail-value overflow: move the count to head
    head.set(headpos, value)
    if (headpos < 127) {         // direct pointer
      tail.set(ordinal, 128 | headpos++)
    } else {                     // indirect pointer
      tail.set(ordinal, 255)
      ord2head.put(ordinal, headpos++)
    }
  }
}
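A runnable sketch of the same three-level logic (illustrative Python; a plain dict stands in for the sparse ord2head lookup-mechanism):

```python
HEAD_BIT = 128     # direct-pointer flag
INDIRECT = 255     # all value bits set: position must be found via ord2head

class ThreeLevelCounters:
    def __init__(self, terms):
        self.tail = [0] * terms   # 8-bit entries in the real structure
        self.head = []            # int counters for the overflowed terms
        self.ord2head = {}        # sparse lookup for the less used high-value terms

    def increment(self, ordinal):
        value = self.tail[ordinal]
        if value == INDIRECT:                  # slow path: indirect pointer
            self.head[self.ord2head[ordinal]] += 1
        elif value & HEAD_BIT:                 # fast path: direct pointer
            self.head[value & 127] += 1
        elif value + 1 < HEAD_BIT:             # very fast path: plain counter
            self.tail[ordinal] = value + 1
        else:                                  # overflow: move the count to head
            headpos = len(self.head)
            self.head.append(value + 1)
            if headpos < 127:                  # room for a direct pointer
                self.tail[ordinal] = HEAD_BIT | headpos
            else:                              # head full: fall back to ord2head
                self.tail[ordinal] = INDIRECT
                self.ord2head[ordinal] = headpos

c = ThreeLevelCounters(terms=3)
for _ in range(200):
    c.increment(0)
print(c.head[0])  # 200
```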

Raffaele Messuti: SKOS Nuovo Soggettario, api e autocomplete

planet code4lib - Thu, 2015-02-26 17:00

How to create an API for a form with autocompletion using terms from the Nuovo Soggettario, with Redis Sorted Sets and Nginx+Lua.

District Dispatch: Boots on the ground advocacy

planet code4lib - Thu, 2015-02-26 15:27

I don’t think I need to overemphasize for you the important roles that libraries play in our communities. The epicenter of progress and knowledge, libraries have evolved to meet the educational and technological needs of their patrons.

These institutions of learning and creation need to be protected and improved at all costs. And in the wake of the sweeping changes to both the House and the Senate in the 2014 Congressional elections, it is more important than ever that we speak up on behalf of libraries and the communities they serve.

Your firsthand library experience – from behind the reference desk or as a patron – is an invaluable part of helping legislators to understand the impact that libraries have in the day to day lives of their constituents. Without you, they may not realize what happens to a community when library budgets get cut and staff are let go, let alone how legislation on net neutrality, copyright, or privacy can involve libraries too. We need to urge Members of Congress to think about how the policy and legislation they are working on could harm or help libraries.  To do that, we need boots on the ground here in Washington, D.C. – we need library advocates. That’s why we’re inviting YOU to National Library Legislative Day 2015!

This two-day advocacy event brings hundreds of librarians, trustees, library supporters, and patrons to Washington, D.C. to meet with their Members of Congress to rally support for libraries issues and policies. This year, National Library Legislative Day will be held May 4-5, 2015. Participants will receive advocacy tips and training, along with important issues briefings prior to their meetings.

Registration information and hotel booking information are available on the ALA Washington Office website.

The post Boots on the ground advocacy appeared first on District Dispatch.

Hydra Project: Hydra-Head 8.0.0 released

planet code4lib - Thu, 2015-02-26 13:26

We are pleased to announce the final release of hydra-head version 8.0.0!

Hydra-head 8.x is planned to be the final major version of the software to support Fedora Commons Repository version 3.x.

Release notes are here:

DuraSpace News: Layne Johnson to Leave Position as VIVO Project Director

planet code4lib - Thu, 2015-02-26 00:00

Winchester, MA  Dr. Layne Johnson will leave his position as VIVO Project Director effective February 28, 2015.  Layne has worked hard in the interests of VIVO's growth and sustainability, and we've seen some excellent progress. After reaching the important milestone of documenting a strategic plan for the project, Layne has decided to pass the role of leading the community through its implementation to a successor.  

HangingTogether: Transcription vs. Transliteration

planet code4lib - Wed, 2015-02-25 20:44

This post is co-authored by Karen Coombs, OCLC Senior Product Analyst

Our virtual dialog began with Karen C’s tweet:

But Karen S-Y couldn’t respond in just the 140 characters Twitter allows. Instead she sent an email:

“Transcribing transliteration” from a piece is almost an oxymoron. It rarely occurs. Transliteration by definition is converting one writing system (e.g., Chinese characters) into another writing system (e.g., Latin-script characters, or romanization). Catalogers in Anglo-American countries will transliterate non-Latin titles using ALA/LC romanization for the writing system on the piece; other countries may use other transliteration schemes.

You will generally find transliterated titles whenever there is a non-Latin title (in MARC, stored in the 880-245 field). But OCLC doesn’t support all scripts, and not everyone takes advantage of the scripts OCLC does support – e.g., we support Cyrillic but only 10% of all Russian-language titles in WorldCat have the Cyrillic that appears on the piece.  The ALA/LC romanization for Cyrillic is distinctly different from the ISO standard used by almost everyone else, so where we rely only on the romanized strings, the same title in Cyrillic may be represented by different clusters using different transliteration schemes. (In the graphic that precedes this entry, two romanizations are shown for the Russian “War and Peace”.)

In general, it’s better to rely on the non-Latin script title if we have it than on any transliteration that may also be in the record. The non-Latin script titles will be transcribed from the piece and any transliteration will be supplied by a cataloger, which may or may not match the transliteration supplied by another cataloger…

Karen C. wrote back: 

I think you answered the question the user was asking when you said that “The ALA/LC romanization for Cyrillic is distinctly different from the ISO standard used by almost everyone else, so where we rely only on the romanized strings, the same title (with the same Cyrillic string) will be represented by different clusters using different transliteration schemes.”

The user asked, “Your API returns texts in Russian in a strange transliteration format. As I see, it’s not ISO-9. For example, this text: “Oni vernulis? na rodnui?u? planetu, gde za vremi?a? mezhzve?znogo pole?ta proshlo bol?she sta let i vse? tak izmenilos?, chto Zemli?a? stala chuzhoi? im”. Please, can you tell me, how to convert this format into correct Cyrillic?”

At least I understand the why now.

Karen S-Y commented:

It also happens to be the case where there is almost a one-to-one correspondence between romanized Russian and its Cyrillic counterpart. That is why most libraries didn’t bother adding the Cyrillic. Since the system requires that if you put in non-Latin script you also enter the romanization, it represents “double work.”

This prompted Karen C. to ask:

Does the MARC record have any way to tell you if a title was romanized?

Karen S-Y answered:

By inference, yes.

If the language code is for a language not written in Latin characters, and there is no 880 in the MARC record, then the non-English information in the record is by definition all romanized (non-English information if the language of cataloging is English).
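That inference can be sketched in a few lines (illustrative Python with a toy record structure; real code would use a MARC parser, and the language codes shown are only a sample, not the full MARC code list):

```python
# Sample MARC language codes for languages written in non-Latin scripts
# (assumption: a partial, illustrative set).
NON_LATIN_LANGS = {"rus", "chi", "jpn", "kor", "ara", "heb", "hin"}

def romanized_only(record):
    """True if the language implies a non-Latin script but the record
    carries no 880 (Alternate Graphic Representation) field."""
    return record["lang"] in NON_LATIN_LANGS and "880" not in record["fields"]

war_and_peace = {"lang": "rus", "fields": {"245": "Voina i mir"}}
print(romanized_only(war_and_peace))  # True
```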

The following table shows, for the top 15 languages written in non-Latin scripts that WorldCat supports, the percentage of WorldCat records represented by the original script (transcribed from the piece) versus by transliteration only (supplied by the cataloger). Most records for languages written in Cyrillic and Indic scripts contain transliterations only.

 Top 15 languages in WorldCat written in non-Latin character sets

About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.


SearchHub: Infographic: 15 Years of The Apache Software Foundation

planet code4lib - Wed, 2015-02-25 18:43
As the commercial stewards of Apache Solr, we know how crucial the open source software movement is to organizations all over the world and how important the Apache Software Foundation is in governing dozens of projects used by thousands of companies. Let’s take a look at how it all got started:

The post Infographic: 15 Years of The Apache Software Foundation appeared first on Lucidworks.

LITA: 2015 Election Slate

planet code4lib - Wed, 2015-02-25 16:10

The LITA Board is pleased to announce the following slate of candidates for the 2015 spring election as follows:

Candidates for Vice-President/President-elect

  • Nancy Coylar
  • Aimee Fifarek

Candidates for Directors at Large, 2 elected for 3 year terms

  • Frank Cervone
  • Martin Kalfatovic
  • Susan Sharpless Smith
  • Ken Varnum

See candidate bios and statements for more information; voting in the 2015 ALA election will begin at 9 a.m. Central Time on March 24, 2015. Ballots will close at 11:59 p.m. Central Time on May 1. Election results will be announced on May 8. Check here for information about the general ALA election.

The slate was recommended by the Nominating Committee. Karen G. Schneider is chair of the committee and Pat Ensor, Adriene Lim, and Chris Evjy are the committee members. The Board thanks the Nominating Committee for all their work. Be sure to thank these candidates for agreeing to serve, and the Nominating Committee for developing the slate. Best wishes to all.

District Dispatch: Gearing up for South by Southwest

planet code4lib - Wed, 2015-02-25 15:42

“What the Future?” is not the usual interpretation of its acronym, but the question is at the heart of the upcoming South by Southwest (SXSW) Interactive and EDU technology conferences.

Participant enthusiasm is evident in “Libraries: The Ultimate Playground,” one of the Core Conversations that have become a hallmark of SXSW.

One answer to this question for public libraries, entrepreneurs and our communities is “Co-working, Collaboration and Creation @ your library.” This also is the title of our SXSW panel (Sunday, March 15, 9:30 a.m.) which brings together ALA’s national perspective and research, a dynamic example from the District of Columbia Public Library’s Dream Lab, and one of the “lab” partners, MapStory.

Public libraries have always been places for “makers” to connect and collaborate, but HOW this is happening in libraries continues to shift and expand. Hundreds of libraries now support co-work and mobile work spaces, as well as hosting maker programming and resources—together leveraging tech and social networks, specialized content and staff, and convenient locations, according to new data from the national Digital Inclusion Survey.

Starting in mid-2013, the Martin Luther King Jr. Memorial Library in Washington D.C. embarked on a journey to transform itself into a place for residents of diverse backgrounds and interests to connect, learn and “make” ideas into working realities. The space is called the “Dream Lab,” and the library already has attracted more than 56 local entrepreneurial projects. One of these projects is MapStory, an online social cartographic platform that empowers anyone to map historical change over time using open data.

In exchange for membership in the Dream Lab, each project commits to offering one hour of programming per month to engage the public in their efforts and ideas—extending the social network and empowering the community in a transformative way.

Through these collaborations, libraries serve as catalysts for learning and action, and help build a stronger knowledge economy more accessible to all people. I can’t wait to join Jon Marino and Nicholas Kerelchuk to talk about how we can build the future together with enterprising partners.

I’m also excited to again be part of a larger Libraries, Archives and Museums presence at SXSW. Together we believe that SXSW is a key forum for library professionals to position libraries and connect with other innovators and community connectors; pull ideas from other industries to feed our innovation at the intersection of science, technology, art, commerce and the public good; and to participate in emerging policy conversations directly relevant to our professional values. Check out the lib*interactive at SXSW guide to sessions here.

You can also catch school and public library folks presenting at EDU (March 9-12), including “Schools and Libraries: Rethinking Learning Together”, “Designing and Deploying for Adult Learners”, “Schools’ Vortex: Innovative Library Makerspaces”, and  “Connected Learning Networks in Austin.”

I hope I’ll see many of you there! If you can’t make it to SXSW, let me take your story with me. What is your library doing to support co-working, collaboration and/or creation? Comment here on the blog or email me directly at

The post Gearing up for South by Southwest appeared first on District Dispatch.

In the Library, With the Lead Pipe: Beyond the Threshold: Conformity, Resistance, and the ACRL Information Literacy Framework for Higher Education

planet code4lib - Wed, 2015-02-25 13:30

Photo by Flickr user laroyo (CC BY-NC 2.0)

In Brief: The recently adopted ACRL Framework for Information Literacy for Higher Education has generated much critique and discussion, including many important reflections on the nature of information literacy and librarianship itself. This article provides a brief consideration of some of these responses and as well a critique of the Framework from the perspective of critical information literacy. It argues that although the Framework demonstrably opens up possibilities for an information literacy instruction that encourages students to question the power structures and relations that shape information production and consumption, it nonetheless rests on a theoretical foundation at odds with that goal. It urges librarians to embrace the Framework yet also resist it, in the tradition of critical librarians who have practiced resistance to the instrumentalization of the library for neoliberal ends.

The ACRL Framework: Off to a Running Start

By design and in its implementation, the process of drafting the ACRL Information Literacy Framework for Higher Education invited intense scrutiny and critique. The Task Force charged with creating the Framework began its work in March 2013 and released the first draft for public scrutiny in February of the following year.1 Through three drafts to the final version, which was made public in January 2015 and officially “filed” (approved by the ACRL Board of Directors) shortly thereafter, the effort toward transparency included many opportunities for input that will help the Framework earn a strong measure of democratic consent and broad participation. But a successful launch and general adoption are by no means assured, as the resistance to plans to scrap the existing Standards has been and may continue to be strong.2 Less easy for the Task Force to control or to even keep track of, however, were reactions on blogs, Twitter, other social media, and in informal conversations, and these arenas continue to produce new and sometimes unexpected reactions to the proposed Framework.

It should have been expected that the lively debates among librarians have included searching, systematic, and thoroughgoing critiques of both the fundamental assumptions and the theory underlying the Framework and even its reason for existing at all.3 The draft process has provided an opportunity for many people to talk about the meaning and purpose of information literacy instruction, information literacy in general, and even librarianship itself. As Barbara Fister stated in one of her early commentaries on the Framework, “it is an opportunity for us to rethink how we do this and what kind of learning really matters.” (Fister 2014a) I consider the Framework already to be a success because the debate it has generated contributes to our thinking about and practice of librarianship in invigorating, productive, and necessary ways.

In this article I will review and compare some of the critiques of the Framework voiced thus far. I will also offer a critique of my own that attempts to read the Framework from the perspective of critical information literacy and critical librarianship. Librarians who identify with these labels, generally speaking, seek to anchor information literacy practice and librarianship as a whole to a commitment to both principles of social justice and a systematic critique of the power relations within which our field operates.4 But first it is important to cite some of the ample evidence that a wide range of reflective librarians are embracing aspects of the Framework and running with it, with exciting results and prospects. This has happened because of the Framework’s flexibility. Troy Swanson, a member of the Task Force, points out that the Framework “can enable us to get to real student learning because it can be adapted to align with your goals as a teacher.” (Swanson 2015) Megan Oakleaf has echoed this sentiment, stating: “Essentially, librarians can use the Framework as inspiration to focus on concepts, rather than exclusively on tools and techniques, and those concepts can be added or subtracted as student and faculty needs change.” (Oakleaf 2014) Andy Burkhart has described a library lesson for an ethnography class for which he utilized the threshold concept “Research as Inquiry.” His conclusion is that “Using these threshold concepts may not work for everyone, but I can see them being exceedingly helpful to frame lessons and curricula. They help you focus on what is really important as opposed to getting stuck in what you think you are supposed to be teaching. Instead of just teaching a lesson about doing ethnographic research I taught a lesson about inquiry and asking increasingly sophisticated questions.” (Burkhart, 2014) Arguing along similar lines more recently, Lori Townsend, Silvia Lu, Amy Hofer, and Korey Brunetti have defended threshold concepts for being flexible and versatile while still rooted in common problem areas specific to academic disciplines, even while allowing that the specific threshold concepts in the Framework are necessarily only provisional and may not work for everyone: “If you consider your content with the threshold concepts criteria in mind, it helps identify some things that might prove problematic for students and stall their learning, yet that are needed in order to move forward in their understanding.” (Townsend, et al 2015)

Of more importance to my argument in this article is the evidence that librarians interested in critical information literacy and critical pedagogy are also inspired by the Framework. At an upcoming LOEX 2015 session, for example, Eamon Tewell and Kate Angell will share how the Frame “Authority is Constructed and Contextual” “emboldened” them to construct “new ways to empower learners and discuss authority’s role in evaluating resources.” (“Elevating” 2015) Kevin Seeber has shown how the frame “Format as Process” allowed him to teach critical thinking skills in the context of web-scale discovery (Seeber 2015). Examples such as these (which seem to be multiplying daily) demonstrate the pedagogical value and potential of the Framework. Although several critical librarians have found threshold concepts and/or the Framework5 wanting in one or more areas, and although I am also critical of both, I don’t believe that these critiques invalidate the Framework. In fact, despite the reservations that I will outline below, the Framework does not contradict or undermine the possibility of critical information literacy instruction or critical pedagogy, but may very well encourage it, which is a vital point that librarians should remember. Many librarians who are committed to critical librarianship seem to share this view since they see the Framework as more liberating pedagogically than it is constricting.6

A Variety of Responses

Before exploring the responses from the perspectives of critical information literacy and critical librarianship, I would like to provide an overview of some of the other critiques of the Framework’s drafts. These critiques have been diverse, ranging from stunned incomprehension to almost utopian celebration.

One subset of responses to the Framework has made suggestions for improvement or requested clarification. These criticisms generally accept the Framework on its own terms and are concerned with its practicality, implementation, adaptability, and accessibility. People ask how the Frames or the threshold concepts upon which they are based will work in practice, what challenges will be posed by adopting the Framework, how different it will be from the Standards in this respect, and how librarians, faculty, and administrators will be convinced to replace explicit Standards with a set of guidelines that are less prescriptive.7

Many librarians are also anxious about potentially forfeiting the gains in information literacy instruction that have been achieved in the years since 2000, and which were at least partially premised on the implementation and ‘selling’ of the Standards. This has already proven to be one of the Framework’s major stumbling blocks; at least some librarians have voiced alarm that losing the Standards spells trouble for their IL programs and their libraries.8

Related to the concerns about the Framework’s accessibility and saleability to stakeholders have been complaints about the drafts’ alleged ‘jargon.’ Even though much of the language that earned this epithet has been removed as a result of these complaints, some voices continue to describe the Framework’s use of language borrowed from threshold concept theory as jargon.9 This concern about language is partly tied to the anxieties about strained relationships with stakeholders, but it also reflects a certain resistance among academic librarians to theory imported from other disciplines into library practice or even into LIS scholarship.10 Even someone sympathetic to the ideas behind the Framework and to threshold concepts themselves might object to their explicit inclusion in a document which is intended to be used as a guide for establishing cross-disciplinary and inter-administrative relationships.11

But I would caution that although the Framework’s reliance on threshold concepts may put off some faculty members and administrators with whom we want to collaborate, at the other extreme we run the risk of offering something with only limited persuasive power as a set of ideas, which could jeopardize understanding and ultimately collaboration with non-library faculty members across the institution. Librarians, as members of the academic community, must be prepared to engage with the scholarship and research of our peers if we wish them to engage with ours. And the most serious evidence of such engagement is to find specifically library-related applications of theoretical approaches from such fields as education, psychology, and anthropology. To embrace theory from other disciplines will inevitably require us to learn to adapt concepts and language from those fields. In other words, it will require the introduction of novel concepts and ideas, reflected in new vocabulary. But rather than be afraid of such importations, we should engage them to test their foundations as well as their usefulness.

Another set of critiques has dissected the theoretical approach of the Framework, and while not complaining so much about jargon, still finds it flawed, often fatally so. These critiques have been thorough. They tend to focus on the theory of threshold concepts and its application in the frames themselves and subject it to interrogation and detailed analysis. Lane Wilkinson (a former member of the Task Force) has provided perhaps the most exhaustive analysis and deconstruction of threshold concept theory and its application to information literacy in the Framework. He set out to demolish much of the theory and language of the Framework in several detailed blog posts over the summer of 2014. (Wilkinson 2014a-g) Wilkinson’s contentions are varied but his principal focus is mostly on the conceptual (in)coherence and contradictions of threshold concepts. There is less attention given to considerations of the ways that political, social, economic, and cultural power structures and relations are reflected by or are challenged by this approach to information literacy (although his discussion does not entirely exclude these concerns).12

A Critical Information Literacy Perspective on Threshold Concept Theory

It is a main tenet of critical information literacy that information literacy instruction should resist the tendency to reinforce and reproduce hegemonic knowledge, and instead nurture students’ understandings of how information and knowledge are formed by unequal power relations based on class, race, gender, and sexuality. Threshold concept theory, both as it was originally formulated and as it is applied in the Framework, can be seen as a reification of privileged knowledge that is historically and culturally contingent.13 Threshold concepts attempt to align information literacy goals with the way that knowledge functions in our existing information system. Threshold concepts were elaborated specifically to better enable students to master the difficult specialized fields of knowledge that define the various academic and professional disciplines. But they may end up functioning as the means to merely reinforce disciplinary boundaries and institutional hierarchies.

Morgan has noted how the Framework’s effort to present threshold concepts in this way has produced an elision of their origins and contexts: “threshold concepts are treated as immanent entities, unique to specific disciplines, and not as essentially contingent.” (Morgan 2012, 7) Fister also cautions, in referring to the first draft: “…we need to bear in mind how these thresholds we define are cultural constructs and avoid assuming upper-middle-class white American experiences that might seem hostile or exclusionary to those who don’t fit that assumed identity.” (Fister 2014a) If threshold concepts are cultural constructs, then a critical information literacy must move beyond them somehow. While threshold concepts may have an important place in the process of learning, information literacy must demand that the concepts themselves be questioned as part of the critique of the structure of knowledge that a critical pedagogy encourages.

It is possible to see threshold concepts as an efficient way of getting students to become expert practitioners of existing disciplines. They do this, in a sense, by learning the rules. Threshold concepts can be viewed as the habits of mind that one must have in order to make sense within a given intellectual community. Wilkinson has noted that threshold concept theory has oversimplified or even misrepresented the true nature of academic disciplines, whose competing discourses reveal the opposite of what the theory claims: “The entire theory of threshold concepts has a funny way of oversimplifying the very real distinctions and difficulties that are inherent in a body of knowledge.” (Wilkinson 2014a)  I would add to this point that teaching students how to function within an academic discourse can be perilously close to teaching students how to conform, how to get along, how to succeed. We want our students to succeed, but do we want the system that will enable their success to succeed as well? Some may, but many librarians committed to critical librarianship do not. For the latter group the question is, how can we encourage student success without supporting the underlying structure of the system within which that success will take place?

Much of the rhetoric of information literacy, including that of the Framework, represents the world of information (the Framework refers to it as the “information ecosystem”) as something that must be mastered by individual students making their own ways through an educational institution out into the world. Information literacy instruction is intended, positively and even progressively, to empower those individuals to succeed on their own terms to the greatest extent possible. It does this by inculcating habits of thinking and working that are most often described under the heading ‘critical thinking.’ But the problem with even some progressive information literacy rhetoric is that it does not question the fundamental units with which it is working: the individual information consumer/producer on the one hand, and the system of information on the other hand. The Framework, despite its (debatable) greater theoretical sophistication, its great flexibility as a tool for enabling dynamic and creative information literacy instruction, and its emphasis on collaborative learning, still posits as its goal an individual student who has become a master or expert of our system of information. And even though it seeks to empower that individual, who could potentially work to change the conditions of information production and dissemination that exist today, the Framework necessarily concentrates its efforts on the solitary mastery of the existing system.

Some critics have found the Framework too narrowly focused on library-centered activities and skills, and they have questioned whether the specific threshold concepts in the Frames are uniquely characteristic of the ‘field’ of librarianship or information literacy. Fister states that as a librarian she isn’t particularly “interested in helping students think like librarians, but rather as curious people who understand how information works so that they can be curious effectively and maybe change the world while they’re at it.” (Fister 2014b) Nicole Pagowsky has also expressed this sentiment in a blog post reacting to the first draft, referring specifically to the frame ‘Format as Process’ (renamed ‘Information Creation as a Process’ in the final draft). The frames pay insufficient attention to the factors beyond academia that shape students’ consumption and production of information: “I was hoping to see a discussion on marginalized groups and whose voices get to be heard in traditional publishing and media (and why). These are important conversations to have with students, and particularly so when we are encouraging them to be creators of information, joining the conversation themselves. What impact might avenues of publishing have on their ability to be vocal when considering their perspective and identity? How is privilege intertwined in format and volume?” (Pagowsky 2014) These observations indicate that instruction librarians interested in integrating an understanding of these larger issues into information literacy will need to supplement and/or alter the frames’ more restricted purview.

But even in its narrow focus, the Framework rests on questionable assumptions. The frame ‘Scholarship as Conversation’14 tends to idealize or even naturalize the process of knowledge production in disciplinary fields. It presents scholarly research as a largely honorable pursuit, viewed in isolation from the forces operating around (and within) it: “Research in scholarly and professional fields is a discursive practice in which ideas are formulated, debated, and weighed against one another over extended periods of time.” As described, it does not pay sufficient attention to the ways that some voices are suppressed, silenced, and marginalized because they do not fit the prescribed boundaries of that field – which are, in the end, determined by a consensus of practitioners whose professional reputations and livelihoods often depend on the preservation of these boundaries and conventions. In other words, threshold concepts describe knowledge creation in a decontextualized manner, even though the Framework tries to acknowledge the academic context of knowledge creation.

While one might not share Wilkinson’s denial that scholarship is in fact a ‘conversation,’ one can’t ignore how politics and power play a decisive role in the production of knowledge. (Wilkinson 2014b) It is a common complaint within many academic fields that conformity, uniformity, predictability, and consensus are all-too-common features of scholarship – are these the results of a ‘conversation’ or something else? Is it possible to build into the Framework an acknowledgement that scholarship and research themselves are always functioning within particular economic, social, and political systems that help determine the features and structure of the ‘scholarly conversation?’ Or must information literacy instruction move beyond threshold concepts altogether, even if it begins with them as a way of entering into and to some extent identifying with the existing structure of knowledge and expertise? These questions speak to one of the basic conundrums of critical librarianship and critical information literacy, namely: how does one teach students to understand and make the best use of existing systems of knowledge while at the same time prompting them to question the validity and structure of those systems? It’s a similar conundrum faced by all scholars and teachers in academia who see themselves as committed to radical social change: how can one be a part of the system of oppression yet claim to be fighting against it?15

To better appreciate the perils of relying on threshold concepts, it may help to consider the needs for which the theory was originally developed. They were proposed by educational theorists Erik Meyer and Ray Land with reference to teaching concepts in economics. That discipline, at least as it is practiced in the ‘western world’ today, functions largely as a closed field based on a broad consensus about the universal validity (at least in the abstract) of the so-called ‘free market’, in other words, the universality and inevitability of capitalism. Economics, as an academic field, tends to naturalize capitalism and works to maintain the belief that the rules/laws of that system are simply the rules/laws of economics as such (even economists like Thomas Piketty who dare to challenge some of the field’s pieties still share this core faith16). It is very difficult for an economist who questions the fundamental assumptions of capitalism or denies its “laws” to succeed in or even enter the field, and the refusal to accept the field’s central concepts prevents communication at a basic level with the vast majority of its practitioners. It is likely that a typical economist – someone who would be considered an ‘expert’ or ‘authority’  – would judge a person making such a challenge not only ineligible to participate in the field, but perhaps even a threat to it.

Meyer and Land do not pay attention to the limitations posed by established fields of knowledge, but rather to the challenges that outsiders, or learners, face when trying to enter into productive learning, or ‘conversation’, within the field. Their insight was to identify certain seemingly universal characteristics of knowledge within disciplines that can be treated as concepts that one has to master in order to function successfully as a practitioner in that field. These ‘troublesome concepts’, once grasped, allow the learner to readily understand the assumptions and terms of debate in a field. But I would argue that at this point the learner has in some sense reached the starting point, not the end point, of learning on a deeper level. Now the task is to question what one has just learned – and this is where the question of information literacy’s ultimate goal returns.

What is the Purpose of Information Literacy Instruction?

From a critical information literacy perspective, the Framework’s larger assumptions pose perhaps the biggest problems. One has to do with the term ‘information literacy’ and its complicated history. Critical information literacy has sought to increase awareness of how much the information literacy agenda has been set and supported by broader structural forces in academia and the world at large that may in fact be at odds with the core values of librarianship, progressive learning, and radical social change. Christine Pawley, in a trenchant and erudite critique of what she calls “Information Literacy Ideology”, states that information literacy “has contributed to the decontextualization of information, obscuring the specific conditions of its production.” (Pawley 2003, 425) This decontextualization allows people to forget, or not to learn, that “Information never stands alone – it is always produced and used in ways that represent social relationships,” and that those relationships “reflect the underlying patterns that structure society.” (Pawley 2003, 433) Pawley has informed us that “information literacy” was elaborated at the beginning of the digital age and was intended largely to recuperate forms and markers of authority from the age of print that were feared to be slipping away from librarians’ control: “…institutional practices of information literacy have the effect of reestablishing relations of authority and authenticity that developed over three centuries for the print production of commodified information.” (Pawley 2003, 440)

Much of information literacy instruction, yesterday and today, is focused on preparing students to succeed in both academia and the world beyond it – which more often than not amounts to teaching them skills of research and thinking that will enable them to function as productive independent minds in a competitive, rapidly changing economic environment. In other words, information literacy is designed to improve students’ chances at getting jobs and succeeding in their chosen professions.17 No one who teaches and cares about students would object to that goal. But a critical information literacy expects more than this (and wants more for students); it pushes information literacy instruction, in various ways, not to be limited by this goal. Moreover, critical information literacy even looks beyond ‘lifelong learning’, since the question should always be asked: what actually is ‘lifelong learning’ and what is its purpose? As Cathy Eisenhower and Dolsy Smith have argued, lifelong learning and critical thinking fall within the realm of neoliberal rationality, which pushes the learner “toward a perpetual anxiety of regulation, of adjustment, of optimization—and toward reason’s perpetual self-improvement.” (Eisenhower and Smith 2009)

Information Literacy Instruction is also About Resistance

Chris Bourg, in a 2014 address at Duke University Libraries, insisted that despite the fact that “neoliberalism is toxic for higher education…research libraries can & should be sites of resistance.” (Bourg 2014) Critical librarianship is always at pains to show that the existing information system mirrors the larger social and political order, which is characterized by a radically asymmetrical distribution of power, and is shot through, systematically and structurally, by racism, sexism, homophobia, militarism, and class oppression. An advocacy of progressive literacy of any kind within this system and environment requires resistance on the part of the librarian: resistance to existing regimes of knowledge, as institutionalized by academic disciplines and departments (and enforced by academic rules and administrative bureaucracies), resistance to the commodification of knowledge, and even resistance to the stated goals of higher education as they are commonly promoted, especially by administrators, politicians, bureaucrats, and educational reformers. Failing to resist all too easily provides reinforcement to the existing system, and helps reproduce it.

Joy James and Edmund T. Gordon have described the problematic position that a radical or activist intellectual necessarily assumes within academia. Their observations are relevant to the aspirations of critical information literacy and the basic dilemma that questions around the Framework have called attention to. They claim that academic institutions “are at best liberal-reformist in their institutional policies and at worst complicit with the global military-industrial and consumer-commercial complex that enforces and/or regulates the marginalization and impoverishment of the majority of the world.” They note that “Institutions of higher education have a vested interest in keeping scholarship ‘objective’ (mystifying), ‘nonpolitical’ (nonsubversive) and ‘academic’ (elitist) and in continuing to reserve the most advanced technical training for that small portion of the world’s population who will manage the rest, as well as consume or control its resources and political economies.” With such an important mandate, anyone who works within academia (i.e. who is engaged in the ‘scholarly conversation’) is subjected to intense pressures to conform: “…incentives offered by the academy reward those whose knowledge production contributes to elite power…That same system diminishes the production of potentially transgressive political knowledge by questioning its ‘objective’ status or ‘scientific’ value.” The participation of radical intellectuals in academic institutions actually “strengthens [those institutions] by allowing them to make hegemonic claims to fostering ‘academic freedom,’ a ‘marketplace of ideas,’ and rational neutrality…” (James and Gordon 2008, 367-9). This perilous position, James and Gordon argue, can only be remedied by exiting the academy and establishing solidarities with oppressed peoples organizing against the system (even if one keeps one’s ‘day job’ as a teacher and researcher within the academy).
Whether or not one accepts their conclusion, one can take their description of the situation of the activist scholar and apply it to critical information literacy, whose practitioners should always be aware of the reifying and recuperative functions of information literacy in the academy. Eisenhower and Smith have argued along similar lines, as I indicated above, but they believe that librarians may be in a position from which to exercise a greater freedom of action vis-à-vis the pressures to conform, by virtue of their marginal or liminal position within the academy. Although librarians’ status varies widely across academia, and it is therefore difficult to make such generalizations, the opening they suggest is nonetheless one that all librarians should seek, whatever their situations may be. (Eisenhower and Smith 2009)

As long as we recognize the structural function of information and knowledge in our pedagogy, we can help bridge the gap between academia and the struggle for social justice. With respect to this goal, using the term ‘information ecosystem’, as the Framework does, is not helpful. I recognize that the term has entered our daily vocabulary, but whether one intends to or not, the term works to reify information, despite the first frame’s title, “Authority is Constructed and Contextual.” And even though the term stresses the rapidly changing nature of that system, it does not emphasize its artificiality and arbitrariness: that it is a reflection of a specific distribution of power. To describe natural processes requires a comprehension of complex and often rapid changes. But changes in knowledge are anything but natural. In the pages of this journal Joshua Beatty has pointed out the Framework’s “neoliberal underpinnings” (something it shares with the Standards). He usefully traces the use of ecological language such as ‘information ecosystem’ to describe social forces back to the business world of the 1990s, when today’s neoliberal order took shape. Beatty convincingly links this naturalized language to a revived social Darwinism in which only the fittest survive in a cutthroat world of brutal competition. We are competing with others to acquire and produce the best information possible, and it is up to us (and our helpers, teachers and librarians) to acquire the necessary skills and smarts to do this. (Beatty 2014, 10-11) When we unwittingly adopt this language to describe the learning processes that we wish to encourage, we may be leaving fundamental neoliberal premises unquestioned.

While the Framework does an admirable job of showing how threshold concepts can help shift information literacy toward a pedagogy that stresses the development of self-critical and self-conscious learning in the student, it does not state as its goal the formation of possible solidarities for the student to help change the information system itself, nor the hierarchies of knowledge and status within academia. Furthermore, by continuing to stress the individual learner, it obscures the fact that any real change would actually require collective understanding and action rather than individualized learning. In this way the Framework continues to do the work that the Standards were doing all along. But the vital difference between the two, perhaps, is the enhanced opportunities for critical interventions that the Framework provides and even encourages.18

From a critical information literacy perspective, then, it appears that the specific type of information literacy advocated by the Framework is one which accepts the existence of a particular regime of knowledge, and demands that we as librarians focus our energies on making students and faculty competent citizens of that regime, even if dynamic, critical, and progressive ones. Here again we are faced with the dilemma outlined above: students have immediate needs to be met – they are working on research papers, projects, reports, and theses. They not only need information and sources and instruction in how to conduct research; they also need to master the conceptual frameworks that will enable them to make effective and convincing arguments. All of this very sophisticated and complex instruction needs to be done in a short space of time. Librarians have to help so many of them, with insufficient resources and not enough time. Where can an information literacy that raises awareness of the contingent and arbitrary nature of the information system fit in? When does it take place? Can something like ACRL’s Framework possibly incorporate such a vision without undermining itself?

The answers to these questions are varied and complex, and they are being explored by the many librarians who theorize and practice critical information literacy. They have taught us that we must assume a position of resistance rather than conformity to the existing information regime, if we wish to see it changed at all. Part of the solution has to do with the content of library instruction. For instance, in teaching specific research or searching skills, the examples that we use in the classroom and at the reference desk can provide opportunities to question information regimes in more systematic ways.19 Another part has to do with our everyday practice as librarians, inside and outside of the classroom. We can find a long tradition of resistance on the part of librarians, not only against the banning of books or spying by the government, but also against the very structure of information and knowledge that they are supposed to be the guides for unlocking.20 Resistance is shown by librarians who take proactive measures in pushing for open access, calling out or refusing rapacious vendor contracts, or finding ways to actually make our profession more diverse, just to name a few areas. But what does resistance in information literacy instruction look like? I think we will see more creative examples in the coming years, thanks, ironically perhaps, to the Framework, which, as I stated at the outset, has opened up the possibilities for action and maneuver on the part of instruction librarians, despite its ideological baggage. In this sense it is a progressive document, but it will require librarians to resist it in order for it to become a radical one.

My thanks go to Ellie Collier and Emily Drabinski for their many trenchant comments and suggestions and also for helpfully encouraging me to emphasize my own voice in the editing of this article. Thanks to Robert Farrell, Barbara Bonus-Smit, and Julia Furay for inviting me to give an earlier version of this paper at ACRL/NY-LILAC’s panel on the Framework in October, 2014 at Barnard College. Thanks also to Donna Witek for her brilliant reflections and inspiring efforts toward making the Framework understood for critical librarians as well as for her encouragement in my own efforts in this regard. Special thanks to Rory Litwin for pointing me to some of the rich history of librarian activism in the (near) past, and some of the written record of that history. Thanks finally to all my interlocutors over the last several months online and in person who have grappled with the Framework, especially to the #critlib community that generates a treasure trove of wisdom and practical insight every fortnight.


Works Cited

Accardi, Maria T., Emily Drabinski, and Alana Kumbier, eds. (2009) Critical Library Instruction: Theories and Methods. Library Juice Press.

ACRL Board of Directors. (2015) “Action Form: ACRL MW15 Doc 4.0.” January 16, 2015.

Adler, Kate. (2013) “Radical Purpose: The Critical Reference Dialogue at a Progressive Urban College.” Urban Library Journal, 19, 1.

Beatty, Joshua. (2014) “Locating Information Literacy within Institutional Oppression.” In the Library with the Lead Pipe. September 24, 2014.

Berg, Jacob. (2014a) “The Draft Framework for Information Literacy for Higher Education: Some Initial Thoughts.” BeerBrarian. 25 Feb. 2014.

—. (2014b) “The (Second) Draft for Information Literacy for Higher Education: My Thoughts.” BeerBrarian. 11 Jul. 2014.

Bourg, Chris. (2014) “The Neoliberal Library: Resistance is not Futile.” Feral Librarian. 16 Jan. 2014.

Burkhardt, Andy. (2014) “Threshold Concepts in Practice: An Example from the Classroom.” Information Tyrannosaur. 4 Mar. 2014.

Drabinski, Emily. (2014) “Toward a Kairos of Library Instruction.” The Journal of Academic Librarianship 40: 480-485.

Eisenhower, Cathy and Dolsy Smith. (2009) “The Library as ‘Stuck Place’: Critical Pedagogy in the Corporate University.” In Critical Library Instruction: Theories and Methods, edited by Maria T. Accardi, Emily Drabinski, and Alana Kumbier. Duluth, MN: Library Juice Press, 305-318.

“Elevating Source Evaluation: Teaching and Un-teaching Authority in the Critical Library Classroom.” (2015) LOEX 2015 – Sessions. Loex. Web. 18 February 2015.

Fister, Barbara. (2014a) “On the Draft Framework for Information Literacy.” Library Babel Fish. Inside Higher Ed, 27 Feb. 2014.

—. (2014b) “Crossing Thresholds and Learning in Libraries.” Library Babel Fish. Inside Higher Ed, 22 May 2014.

—. (2015) “The Information Literacy Standards/Framework Debate.” Library Babel Fish. Inside Higher Ed, 22 Jan. 2015.

Gregory, Lua, and Shana Higgins. (2013) Information Literacy and Social Justice: Radical Professional Praxis. Duluth, MN: Library Juice Press.

Hicks, Alison. (2013) “Cultural Shifts: Putting Critical Information Literacy into Practice.” Communications in Information Literacy, 7 Aug. 2013.

Hofer, Amy R, Lori Townsend, and Korey Brunetti. (2012) “Troublesome Concepts and Information Literacy: Investigating Threshold Concepts for IL Instruction.” portal: Libraries in the Academy 12(2): 387-405.

Hofer, Amy R, Lori Townsend, and Korey Brunetti. (2011) “Threshold Concepts and Information Literacy.” portal: Libraries in the Academy 11(3): 853-869.

James, Joy and Edmund T. Gordon. (2008) “Afterword: Activist Scholars or Radical Subjects?” In Charles Hale, ed., Engaging Contradictions: Theory, Politics, and Methods of Activist Scholarship. Berkeley, CA: University of California Press. 367-73.

Kagan, Alfred. (2015) Progressive Library Organizations. A Worldwide History. Jefferson, NC: McFarland & Company, Inc., Publishers.

Klipfel, Kevin Michael. (2014) “This I Overheard…Threshold Concepts Getting Laughed Out of the Room” Rule Number One: A Library Blog. 3 Nov. 2014.

Matthews, Brian. (2011) “What Can You Do to Help With Troublesome Knowledge? Librarians and Threshold Concepts.” The Ubiquitous Librarian, Chronicle of Higher Education Blog Network, 3 Aug. 2011.

Morgan, Patrick. (2015) “Pausing at the Threshold.” portal: Libraries and the Academy. 15(1).

Oakleaf, Megan. (2014) “A Roadmap for Assessing Student Learning Using the New Framework for Information Literacy for Higher Education.” Journal of Academic Librarianship. Preprint.

Olson, Hope. (2001)  “The Power to Name: Representation in Library Catalogs.” Signs 26(3): 639-668.

Pagowsky, Nicole. (2014) “Thoughts on ACRL’s New Draft Framework for ILCSHE.” Nicole Pagowsky (pumpedlibrarian). 2 Mar. 2014.

Pawley, Christine. (2003) “Information Literacy: A Contradictory Coupling.” Library Quarterly 73(4): 422-452.

Samek, Toni. (2001) Intellectual Freedom and Social Responsibility in American Librarianship, 1967-1974. Jefferson, NC: McFarland & Company, Inc., Publishers.

Seeber, Kevin. (2015) “Teaching ‘Format as a Process’ in an Era of Web-scale Discovery.” Reference Services Review 43:1: 19-30.

Swanson, Troy. (2015) “The IL Framework and IL Standards Cannot Coexist.” Tame The Web. 12 Jan. 2015.

Tewell, Eamon. (2014) “Tying Television Comedies to Information Literacy: Mixed-Methods Investigation.” Journal of Academic Librarianship 40: 134-141.

Townsend, Lori, Silvia Lu, Amy R. Hofer, and Korey Brunetti. (2015) “What’s Wrong with Threshold Concepts?” ACRlog. 30 Jan. 2015.

Wilkinson, Lane. (2013) “Information Literacy: Standards, Skills, and Virtues.” Sense and Reference. 5 Jun. 2013.

—. (2014a) “The Problem with Threshold Concepts.” Sense and Reference. 19 Jun. 2014.

—. (2014b) “Is Scholarship a Conversation?” Sense and Reference. 10 Jul. 2014.

—. (2014c) “Is Research Inquiry?” Sense and Reference. 15 Jul. 2014.

—. (2014d) “Is Authority Constructed and Contextual?” Sense and Reference. 22 Jul. 2014.

—. (2014e) “Is Format a Process?” Sense and Reference. 25 Jul. 2014.

—. (2014f) “Is Searching Exploration?” Sense and Reference. 29 Jul. 2014.

—. (2014g) “Does Information Have Value?” Sense and Reference. 5 Aug. 2014.

  1. See the ACRL Board of Directors Action Form from January 15, 2015 for background and details about the process of drafting the Framework.
  2. See for example the Open Letter of New Jersey Librarians criticizing the Framework and protesting the sunsetting of the ACRL Standards, which indicates that there will be opposition not only to the replacement of the Standards, but also the wholesale adoption of the Framework that is meant to replace it. The recent acceptance of the Framework by the ACRL Board of Directors was accompanied by a temporary cancellation of plans to sunset the Standards.
  3. For space reasons, I refer the reader to Lane Wilkinson’s helpful list of blog posts, which is now out of date, but provides a good starting point for reading a variety of responses to the Framework’s early drafts.
  4. The literature on critical information literacy and critical librarianship (which are closely related and overlap quite a bit, but should not be equated) is rapidly expanding. See especially the contributions in Accardi, Drabinski, and Kumbier 2009; and Gregory and Higgins 2013. Probably the most extensive (though not exhaustive) bibliography of critical information literacy we have is to be found in the recent dissertation by B. A. McDonough, Critical Information Literacy in Practice, 2014.
  5. As Townsend, et al, have indicated in their recent ACRLog post, the Framework contains only one possible collection and interpretation of threshold concepts among many: potentially there might be other valid threshold concepts applicable to information literacy instruction. (Townsend, et al 2015)
  6. Intriguingly, Fister has decided to embrace the Framework because it returns to the original promise of the ACRL Standards when they were established in 2000: they “were meant as a starting place, that each library should adapt them to fit their local cultures and needs. They weren’t set in stone.” (Fister 2015)
  7. The Task Force’s report attached to the final draft notes that in its tally of the 206 comment forms submitted following the release of the third draft, 67.4% of respondents supported the Framework. (ACRL MW15 DOC 4.0) Some librarians have already embarked on projects to help librarians map the Standards to the Framework in order to help them understand the continuities and make the transition easier in redesigning instruction programs. See, for example, Amanda Hovious’ “Alignment Charts for ACRL Standards and Proposed Framework (work in progress)” and her related blog post.
  8. Troy Swanson, in a recent response to some of these concerns, convincingly defends the Framework on the following grounds: “if we believe that information literacy matters in the lives of our students and see information literacy as a form of empowerment for our students, the idea that we should write standards because that’s what everyone else is doing feels hollow.” And, even more boldly, he suggests that “our profession has the opportunity to take the lead in moving away from the mechanistic bureaucracy of standards-based education. I do not know many faculty members who honestly think that more standards and more standardization will improve teaching and learning.” (Swanson 2015) But Emily Drabinski urges us to pause in this enthusiasm, alerting us to how the Framework’s ACRL imprimatur ensures that it will function in much the same way the Standards has, only now libraries will have to shoulder the burden of spelling out the specific learning outcomes to which these guidelines may point. She suggests as an alternative a kairos of library instruction that does not rely on a universal, fixed document of any kind and that instead always responds to the specific time and place in which the instruction takes place. (Drabinski 2014) Critical advocates of the Framework, myself included, would claim in response that it actually encourages kairos by not prescribing any specific learning outcomes or benchmarks.
  9. Andy Burkhardt, in a blog post, reported librarians at a discussion at 2014 ALA Midwinter referring to threshold concepts themselves as ‘jargon.’ (Burkhardt 2014)
  10. A distrust of theory from other fields of course is not limited to LIS or librarians. Academic history, for example, has maintained a stubborn resistance to theoretical approaches and many of its practitioners and gatekeepers are on the lookout for ‘jargon’ in an effort to maintain clarity and rigorous thinking. The writing and teaching of history is, like the practice of librarianship, self-consciously a ‘popular,’ or public-oriented undertaking, and the assumption is that it should always only utilize language immediately familiar to the widest possible audience. However well-intentioned and even democratic this impulse is, it is often used as an excuse for avoiding an engagement with developments in other fields, or for simply dismissing them. For a brief and spirited plea for the importance of theory for history, see Joan Wallach Scott, “Wishful Thinking,” Perspectives on History (December 2012)
  11. Kevin Michael Klipfel disapproves of threshold concept theory not so much because it has been borrowed from another field, but because it has not been ‘scientifically’ validated and is therefore not legitimate in the eyes of his colleagues in other disciplines. See his blog post, “This I Overheard…Threshold Concepts Getting Laughed Out of the Room,” Rule Number One: A Library Blog, 3 Nov. 2014. But even long before the Framework committee went to work revising the ACRL Standards, some librarians embraced threshold concepts precisely because there were positive signs of acceptance and enthusiasm from colleagues. See, for example, Brian Matthews, “What Can You Do to Help With Troublesome Knowledge? Librarians and Threshold Concepts,” The Ubiquitous Librarian, Chronicle of Higher Education Blog Network, 3 Aug. 2011.
  12. In a similar vein to Wilkinson, Patrick Morgan, in a recent article, has described the Framework as ‘inchoate.’ (Morgan 2015) And Jacob Berg, in his blog, has echoed and elaborated on some of Wilkinson’s critique, but his objections, too, I would classify more as linguistic-philosophical. (Berg 2014a-b)
  13. For an explanation of how threshold concept theory has been applied to information literacy instruction, see the foundational articles by Hofer, Townsend, and Brunetti 2011, 2012.
  14. This is the frame’s name in the final draft of the Framework.
  15. This is a question answered brilliantly by Joy James and Edmund T. Gordon, who conclude that in some sense there is no such thing as ‘radical’ scholarship and teaching, since true activism actually cannot take place within the academy. (James and Gordon 2008) Cathy Eisenhower and Dolsy Smith offer a different view that sees more room for action for librarians in this regard because of their more liminal position in the academy. (Eisenhower and Smith 2009) See more on this in the section below.
  16. Slavoj Zizek and David Harvey each have noted that Piketty is best described as a utopian capitalist: for him there is no questioning the assumption that it is the only system that ‘works.’ See David Harvey, “Afterthoughts on Piketty’s Capital”; Slavoj Zizek, “Towards a Materialist Theory of Subjectivity,” lecture, The Birkbeck Institute for the Humanities, University of London, May 22, 2014.
  17. Drabinski shows how this justification became the “procrustean bed” into which IL instruction was placed by the Standards, and she argues convincingly that the Framework inevitably is driven by the same mandate, which has only intensified since 2000. (Drabinski 2014)
  18. Beatty, in fact, is another critical librarian who acknowledges that “there are many ways in which the Framework significantly improves on the Standards.” (Beatty 2014, 4)
  19. For examples of this see Hicks 2013; Adler 2013; Tewell 2014.
  20. I cite here as likely the most well-known example the legendary Sanford Berman. A general history of librarians and resistance, not only in the United States but globally, has yet to be written. Such a narrative would be useful for our present struggles, since sometimes it seems that librarians believe that only recently has the library been challenged as a supposedly ‘neutral’ ground for the discovery of knowledge. But librarians have been mounting such challenges since at least the 1930s. There are of course some excellent studies that are focused in scope but very instructive for this purpose: see especially Samek 2001; Olson 2001; Kagan 2015. Also helpful is the archive of the ALA Social Responsibilities Roundtable Newsletter and the archive of the blog Library Juice.

Mark E. Phillips: DPLA Metadata Analysis: Part 2 – Beyond basic stats

planet code4lib - Wed, 2015-02-25 13:00
More stats for subjects

In my previous post I displayed some of the statistics that are readily available from Solr as part of its StatsComponent functionality (if you haven’t used this part of Solr yet you really should). There are a few other things that we could collect to get a more complete picture of a metadata field.

So far we have min, max, number of records, total number of subjects, sumOfSquares, mean, and standard deviation. The other values I think we should take a look at are the following.

Records without Subjects – Number of records without subjects.

Percent of records without Subjects – Percentage of the Hub’s records that don’t have subjects.

Mode – Number of subjects-per-record that is the most common for a specific Hub.

Unique Subjects – Number of unique subject strings present for a specific Hub.

Hub Unique Subjects – Number of subjects that are unique to that Hub.

Entropy of the field – A measure of the uncertainty in the metadata field; for our purposes it is a good way to understand how subjects are distributed across a Hub’s records.

Below is a table that contains the fields listed above, plus some relevant fields from the previous post. Each Hub has a row in this table.

| Hub Name | Records | Records Without Subjects | % Without Subjects | Avg. Subjects per Record | Subject Count Mode | Unique Subjects | # of Subjects Unique to Hub | Entropy |
|---|---|---|---|---|---|---|---|---|
| ARTstor | 56,342 | 6,586 | 11.7 | 3.5 | 3 | 9,560 | 4,941 | 0.73 |
| Biodiversity Heritage Library | 138,288 | 10,326 | 7.5 | 3.3 | 2 | 22,004 | 9,136 | 0.65 |
| David Rumsey | 48,132 | 30,167 | 62.7 | 0.5 | 0 | 123 | 30 | 0.76 |
| Digital Commonwealth | 124,804 | 6,040 | 4.8 | 2.4 | 1 | 41,704 | 31,094 | 0.77 |
| Digital Library of Georgia | 259,640 | 3,216 | 1.2 | 4.4 | 2 | 132,160 | 114,689 | 0.67 |
| Harvard Library | 10,568 | 167 | 1.6 | 2.5 | 2 | 9,257 | 7,204 | 0.76 |
| HathiTrust | 1,915,159 | 525,874 | 27.5 | 1.4 | 1 | 685,733 | 570,292 | 0.88 |
| Internet Archive | 208,953 | 44,872 | 21.5 | 1.8 | 1 | 56,911 | 28,978 | 0.80 |
| J. Paul Getty Trust | 92,681 | 73,978 | 79.8 | 0.4 | 0 | 2,777 | 1,852 | 0.60 |
| Kentucky Digital Library | 127,755 | 117,790 | 92.2 | 0.2 | 0 | 1,972 | 1,337 | 0.62 |
| Minnesota Digital Library | 40,533 | 0 | 0 | 5 | 4 | 24,472 | 17,545 | 0.74 |
| Missouri Hub | 41,557 | 11,451 | 27.6 | 2.3 | 0 | 6,893 | 4,338 | 0.69 |
| Mountain West Digital Library | 867,538 | 49,473 | 5.7 | 3 | 1 | 227,755 | 192,501 | 0.68 |
| National Archives and Records Administration | 700,952 | 619,212 | 88.3 | 0.3 | 0 | 7,086 | 3,589 | 0.63 |
| North Carolina Digital Heritage Center | 260,709 | 41,323 | 15.9 | 3.3 | 2 | 99,258 | 84,203 | 0.66 |
| Smithsonian Institution | 897,196 | 29,452 | 3.3 | 6.4 | 7 | 348,302 | 325,878 | 0.62 |
| South Carolina Digital Library | 76,001 | 7,460 | 9.8 | 3 | 2 | 23,842 | 18,110 | 0.72 |
| The New York Public Library | 1,169,576 | 208,472 | 17.8 | 1.7 | 1 | 69,210 | 52,002 | 0.62 |
| The Portal to Texas History | 477,639 | 58 | 0 | 11 | 10 | 104,566 | 87,076 | 0.49 |
| United States Government Printing Office (GPO) | 148,715 | 1,794 | 1.2 | 3.1 | 2 | 174,067 | 105,389 | 0.92 |
| University of Illinois at Urbana-Champaign | 18,103 | 4,221 | 23.3 | 3.8 | 0 | 6,183 | 3,076 | 0.63 |
| University of Southern California. Libraries | 301,325 | 35,106 | 11.7 | 2.9 | 2 | 65,958 | 51,822 | 0.59 |
| University of Virginia Library | 30,188 | 229 | 0.8 | 3.2 | 1 | 3,736 | 2,425 | 0.60 |

In looking at the row for The Portal to Texas History we can see that of the 477,639 records in the dataset, 58 of them do not have any subjects, which is a very small percentage (0.01214306202% to be exact). From there we can go to the average of 11 subjects per record with a mode of 10; nothing earth-shaking here, just more info. There are 104,566 unique subjects in the Portal’s dataset, with 87,076 of those being unique to only the Portal. Finally, the entropy for the Portal’s subject field is 0.49; compared to GPO’s, which is 0.92, you can interpret this to mean that the subject values are more “clumpy” for the Portal (a smaller number of subjects is used across a larger number of records) than for GPO (a larger number of subjects is spread across records).

The following two tables further illustrate the entropy values for the Portal’s and GPO’s subjects. The first table shows the top ten subjects, and the number of records carrying each, from GPO’s dataset.

| Subject | Records |
|---|---|
| National security–United States | 1,138 |
| United States. Congress. House–Rules and practice | 748 |
| Terrorism–United States–Prevention | 718 |
| United States. Department of Defense–Appropriations and expenditures | 631 |
| United States | 536 |
| Social security–United States–Periodicals | 487 |
| Emergency management–United States | 485 |
| Medicare | 441 |
| Consumer protection–United States | 417 |
| Wisconsin–Maps | 406 |

Now take a look at the top ten subjects and their counts for the Portal.

| Subject | Records |
|---|---|
| Places | 310,404 |
| United States | 306,597 |
| Texas | 305,551 |
| Business, Economics and Finance | 248,455 |
| Communications | 223,783 |
| Newspapers | 221,422 |
| Advertising | 218,527 |
| Journalism | 217,737 |
| Landscape and Nature | 76,308 |
| Geography and Maps | 70,742 |

So with the entropy value, you can read a lower number as more like the Portal’s subjects and a higher number as more like GPO’s. At the extremes, a value of 1.0 would mean that every subject is used by exactly one record, and a value of 0 would mean that there is only one subject, with all of the records using that subject.
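For readers who want to experiment with a measure like this, here is a minimal sketch: the Shannon entropy of the subject-frequency distribution, normalized by its maximum (the log of the number of unique subjects). This matches the 0-to-1 behavior described above, but it is my own reconstruction, not necessarily the exact formula used to build the tables.

```python
import math

def normalized_subject_entropy(counts):
    """counts: mapping of subject string -> number of records using it.

    Returns a value in [0, 1]: near 1.0 when every subject is used
    about equally often (GPO-like), approaching 0 when a few subjects
    dominate the records (Portal-like).
    """
    total = sum(counts.values())
    n = len(counts)
    if n <= 1 or total == 0:
        # A single subject (or no data) carries no uncertainty.
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n)  # divide by max possible entropy, log(n)
```

For example, a Hub where four subjects each appear once scores 1.0, while one where a single subject covers 97% of uses scores far lower.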

Shared Subjects

In creating the table above I had to work out the number of subjects that each Hub holds uniquely. In doing so I went ahead and calculated this number for the whole dataset to find out how much subject overlap occurs.

The table below displays the breakdown of how subjects are distributed across Hub collections. For example, if two Hubs have the subject “Laws of Texas,” then that subject is said to be shared by two Hubs. The breakdown for the metadata in the DPLA is as follows.

| # of Hubs with Subject | Count |
|---|---|
| 1 | 1,717,512 |
| 2 | 114,047 |
| 3 | 21,126 |
| 4 | 8,013 |
| 5 | 3,905 |
| 6 | 2,187 |
| 7 | 1,330 |
| 8 | 970 |
| 9 | 689 |
| 10 | 494 |
| 11 | 405 |
| 12 | 302 |
| 13 | 245 |
| 14 | 199 |
| 15 | 152 |
| 16 | 117 |
| 17 | 63 |
| 18 | 62 |
| 19 | 32 |
| 20 | 20 |
| 21 | 7 |
| 22 | 7 |

Most of the subjects, 1,717,512 to be exact, occur in only one Hub’s collection.
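The overlap tally itself is straightforward; a minimal sketch (my own reconstruction, not the author’s code), assuming the input is an iterable of (hub name, subject list) pairs:

```python
from collections import Counter, defaultdict

def subject_spread(records):
    """records: iterable of (hub_name, [subject, ...]) pairs.

    Returns a Counter mapping "number of Hubs a subject occurs in"
    to "how many subjects have that spread" -- the two columns of
    the table above.
    """
    hubs_per_subject = defaultdict(set)
    for hub, subjects in records:
        for subject in subjects:
            hubs_per_subject[subject].add(hub)
    # Tally: for each subject, how many distinct Hubs use it?
    return Counter(len(hubs) for hubs in hubs_per_subject.values())
```

Filtering `hubs_per_subject` for entries with exactly one Hub also yields each Hub’s count of subjects unique to it.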

There are seven different subjects that are common across 22 of the 23 Hubs in the DPLA metadata dataset; if you are curious, these subjects are the following:

There should be one final post in this series where I can hopefully suggest what we should do with this data.

Again, if you want to chat about this post,  hit me up on Twitter.


District Dispatch: House Committee holds hearing on network neutrality

planet code4lib - Wed, 2015-02-25 09:05

On the eve of the FCC’s expected vote in favor of strong net neutrality rules, a House committee provided a preview of the challenges ahead in defending the open Internet. Republicans on the House Energy and Commerce Committee expressed concerns today at the committee’s hearing, The Uncertain Future of the Internet, held ahead of the FCC’s anticipated net neutrality vote. The FCC is expected tomorrow to approve an Order reclassifying Internet service providers as telecommunications services subject to a higher level of regulation.

Anyone who tuned in to today’s House Energy and Commerce Committee hearing would have come away with two vastly divergent views of where the expected FCC ruling on net neutrality will take the internet. Congress, let alone the public, has yet to learn the details of what the FCC will be voting on, forcing today’s hearing to focus on broad net neutrality principles.

ALA has urged the FCC to approve reclassification as the only means of ensuring a fair and open internet.

In his opening statement, Chairman Greg Walden (R-OR) noted that the FCC adoption “may not ultimately provide net neutrality protections for American consumers; that might lay the groundwork for future regulation of the Internet; that may raise rates for the American Internet users; and that could stymie Internet adoption, innovation, and investment.”

Several Republicans on the Committee echoed the Chairman’s concern, calling on the FCC to delay its vote and allow Congress, which they argue should have the authority to make reclassification decisions over broadband providers, to enact legislation. Republicans in the House and Senate are circulating draft legislation intended to supersede FCC action, but to date they have not attracted any Democrats willing to co-sponsor it.

Democrats on the Committee, led by ranking member Anna Eshoo (D-CA), counter that the FCC should proceed with regulating the internet providers as the only means of ensuring network neutrality.

One thing is certain with network neutrality: Congress will be talking about this issue for some time. Republicans in the House and Senate have hinted that they intend to maintain pressure on the FCC through legislation, oversight and budget hearings, and possible investigations into White House actions appearing to pressure the FCC, an independent agency, towards adopting a particular course of action by approving reclassification.

The post House Committee holds hearing on network neutrality appeared first on District Dispatch.

DuraSpace News: Universidad de la Sabana: Colombia Evolves Its Institutional Repository

planet code4lib - Wed, 2015-02-25 00:00

Chía, Colombia: Universidad de la Sabana’s institutional repository provides services to the university’s researchers, students, and the academic community of this prestigious Colombian university. Arvo Consultores has helped the institution upgrade its repository to DSpace 4.2. Improvements include a new adaptive and responsive interface, the first Mirage2 interface at a Colombian institution.

HangingTogether: New MARC Usage Data Available

planet code4lib - Tue, 2015-02-24 21:40

I just finished updating my “MARC Usage in WorldCat” web site that summarizes and reports on how MARC elements have been used in the 333,518,928 MARC records in WorldCat as of 1 Jan 2015.

Not surprisingly, the totals for new fields such as 336, 337, and 338 have shot up, from some 9-10 million occurrences in January 2014 to 40-50 million occurrences in January 2015.

Also, it appears that well over 33 million records have come in through the Digital Collection Gateway.

As always, if you wish to see the summarized contents of any subfield just let me know. And don’t forget about the visualizations (pictured).

About Roy Tennant

Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.


FOSS4Lib Upcoming Events: CollectionSpace Walkthrough March 2015

planet code4lib - Tue, 2015-02-24 20:53
Date: Friday, March 27, 2015 - 12:00 to 13:00
Supports: CollectionSpace

Last updated February 24, 2015. Created by Peter Murray on February 24, 2015.

From the announcement:

Curious about CollectionSpace?

Join Megan Forbes, Community Outreach Manager, for the first in a series of bi-monthly walkthroughs. More than just a demo of features and functionality, the walkthrough will:

FOSS4Lib Upcoming Events: CollectionSpace Open House Feb 2015

planet code4lib - Tue, 2015-02-24 20:49
Date: Friday, February 27, 2015 - 12:00 to 13:00
Supports: CollectionSpace

Last updated February 24, 2015. Created by Peter Murray on February 24, 2015.

From the announcement:

Curious about CollectionSpace?
Join the program staff, leadership, functionality, and technical working group members, implementers, and special guests for a bi-monthly open house. Bring your questions (functional, technical, operational), your ideas, and your projects to this free-ranging conversation.

Mark E. Phillips: DPLA Metadata Analysis: Part 1 – Basic stats on subjects

planet code4lib - Tue, 2015-02-24 18:01

On a recent long flight (from Dubai back to Dallas) I spent some time working with the metadata dataset that the Digital Public Library of America (DPLA) provides on its site.

I was interested in finding out the following pieces of information.

  1. What is the average number and standard deviation of subjects-per-record in the DPLA?
  2. How does this number compare across the partners?
  3. Is there any difference that we can notice between Service-Hubs and Content-Hubs in the DPLA in relation to subject field usage?
Building the Dataset

The DPLA makes the full dataset of their metadata available for download as a single file, and I grabbed a copy before I left the US because I knew it was going to be a long flight.

With a little work I was able to parse all of the metadata records and extract some information I was interested in working with, specifically the subjects for records.

So after parsing through the records to get a list of subjects per record and the Service-Hub or Content-Hub that each record belongs to, I loaded this information into Solr to use for analysis. We are using Solr for another research project related to metadata analysis at the UNT Libraries (in addition to our normal use of Solr for a variety of search tasks), so I wanted to work on some code that I could use for a few different projects.
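A rough sketch of that parsing step, assuming the bulk download has been converted to one JSON record per line; the field paths used here (`provider.name`, `sourceResource.subject[].name`) follow the DPLA Metadata Application Profile, but a given bulk export may lay records out differently, so treat them as assumptions:

```python
import json

def iter_hub_subjects(path):
    """Yield (hub_name, [subject, ...]) for each record in a
    line-delimited JSON dump of DPLA metadata.

    Field paths are assumptions based on the DPLA metadata
    application profile; adjust to match the actual export.
    """
    with open(path) as fh:
        for line in fh:
            record = json.loads(line)
            hub = record.get("provider", {}).get("name", "unknown")
            subjects = [
                s["name"]
                for s in record.get("sourceResource", {}).get("subject", [])
                if s.get("name")
            ]
            yield hub, subjects
```

Each (hub, subjects) pair can then be indexed into Solr as one document with a numeric subjects-per-record field.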

Loading the records into the Solr index took quite a while (loading ~1,000 documents per second into Solr).

So after a few hours of processing I had my dataset, and I was able to answer my first question pretty easily using Solr’s built-in StatsComponent functionality. For a description of this component, view the documentation on Solr’s documentation site.

Answering the questions

The average number of subjects per record in the DPLA = 2.99 with a standard deviation of 3.90. There are records with 0 subjects (1,827,276) and records with as many as 1,476 subjects (this record btw).

Answering question number two involved a small script to create a table for us; you will find that table below.

| Hub Name | min | max | count | sum | sumOfSquares | mean | stddev |
|---|---|---|---|---|---|---|---|
| ARTstor | 0 | 71 | 56,342 | 194,948 | 1351826 | 3.460083064 | 3.467168662 |
| Biodiversity Heritage Library | 0 | 118 | 138,288 | 454,624 | 3100134 | 3.287515909 | 3.407385646 |
| David Rumsey | 0 | 4 | 48,132 | 22,976 | 33822 | 0.477353943 | 0.689083212 |
| Digital Commonwealth | 0 | 199 | 124,804 | 295,778 | 1767426 | 2.369940066 | 2.923194479 |
| Digital Library of Georgia | 0 | 161 | 259,640 | 1,151,369 | 8621935 | 4.43448236 | 3.680038874 |
| Harvard Library | 0 | 17 | 10,568 | 26,641 | 88155 | 2.520912188 | 1.409567895 |
| HathiTrust | 0 | 92 | 1,915,159 | 2,614,199 | 6951217 | 1.365003637 | 1.329038361 |
| Internet Archive | 0 | 68 | 208,953 | 385,732 | 1520200 | 1.84602279 | 1.966605872 |
| J. Paul Getty Trust | 0 | 36 | 92,681 | 32,999 | 146491 | 0.356049244 | 1.20575216 |
| Kentucky Digital Library | 0 | 13 | 127,755 | 26,009 | 82269 | 0.203584987 | 0.776219692 |
| Minnesota Digital Library | 1 | 78 | 40,533 | 202,484 | 1298712 | 4.995534503 | 2.661891328 |
| Missouri Hub | 0 | 139 | 41,557 | 97,115 | 606761 | 2.336910749 | 3.023203782 |
| Mountain West Digital Library | 0 | 129 | 867,538 | 2,641,065 | 17734515 | 3.044321978 | 3.34282307 |
| National Archives and Records Administration | 0 | 103 | 700,952 | 231,513 | 1143343 | 0.330283671 | 1.233711342 |
| North Carolina Digital Heritage Center | 0 | 1,476 | 260,709 | 869,203 | 8394791 | 3.333996908 | 4.591774892 |
| Smithsonian Institution | 0 | 548 | 897,196 | 5,763,459 | 56446687 | 6.423857217 | 4.652809633 |
| South Carolina Digital Library | 0 | 40 | 76,001 | 231,270 | 1125030 | 3.042986277 | 2.354387181 |
| The New York Public Library | 0 | 31 | 1,169,576 | 1,996,483 | 6585169 | 1.707014337 | 1.648179106 |
| The Portal to Texas History | 0 | 1,035 | 477,639 | 5,257,702 | 69662410 | 11.00768991 | 4.96771802 |
| United States Government Printing Office (GPO) | 0 | 30 | 148,715 | 457,097 | 1860297 | 3.073644219 | 1.749820977 |
| University of Illinois at Urbana-Champaign | 0 | 22 | 18,103 | 67,955 | 404383 | 3.753797713 | 2.871821391 |
| University of Southern California. Libraries | 0 | 119 | 301,325 | 863,535 | 4626989 | 2.865792749 | 2.672589058 |
| University of Virginia Library | 0 | 15 | 30,188 | 95,328 | 465286 | 3.157811051 | 2.332671249 |

The columns are min, which is the minimum number of subjects per record for a given Hub; Minnesota Digital Library stands out here as the only Hub that has at least one subject for each of its 40,533 items. The column max shows the highest number of subjects in a single record. Two groups, The Portal to Texas History and North Carolina Digital Heritage Center, have at least one record with over 1,000 subject headings. The column count is the number of records that each Hub had when the analysis was performed. The column sum is the total number of subject values for a given Hub; note this is not the number of unique subjects, as that information is not present in this dataset. The column mean shows the average number of subjects per record for each Hub, and stddev is the standard deviation from this number. The Portal to Texas History is at the top end of the averages with 11.01 subjects per record, and the Kentucky Digital Library is on the low end with 0.20 subjects per record.
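As a sanity check, the mean and stddev columns can be recomputed from the count, sum, and sumOfSquares aggregates that Solr’s StatsComponent returns (Solr reports the sample standard deviation). A small sketch, using the Portal’s row from the table above:

```python
import math

def mean_stddev(count, total, sum_of_squares):
    """Recover mean and sample standard deviation from the count,
    sum, and sumOfSquares aggregates reported by Solr's StatsComponent."""
    mean = total / count
    # Sample variance: (sum of squares - (sum^2 / n)) / (n - 1)
    variance = (sum_of_squares - total * total / count) / (count - 1)
    return mean, math.sqrt(variance)

# The Portal to Texas History row: count, sum, sumOfSquares
mean, stddev = mean_stddev(477639, 5257702, 69662410)
# mean is about 11.0077 and stddev about 4.9677, matching the table
```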

The final question was whether there were differences between the Service-Hubs and the Content-Hubs; that breakdown is in the table below.

| Hub Type | min | max | count | sum | sumOfSquares | mean | stddev |
|---|---|---|---|---|---|---|---|
| Content-Hub | 0 | 548 | 5,736,178 | 13,207,489 | 84723999 | 2.302489393 | 3.077118385 |
| Service-Hub | 0 | 1,476 | 2,276,176 | 10,771,995 | 109293849 | 4.73249652 | 5.061612337 |

It appears that there is a higher number of subjects per record for the Service-Hubs than for the Content-Hubs, more than double: 4.73 for Service-Hubs versus 2.30 for Content-Hubs.

Another interesting number is that 1,590,456 of the records contributed by Content-Hubs, or 28% of that collection, do not have subjects, compared to 236,811 records, or 10%, contributed by Service-Hubs.

I think individually we can come up with reasons that these numbers differ the way they do. There are explanations for all of this: where did the records come from? Were they generated as digital resource metadata records initially, or using an existing set of practices such as AACR2 in the MARC format? How does that change the numbers? Are there things that the DPLA is doing to the subjects when it normalizes them that change the way they are represented and calculated? I know that for The Portal to Texas History some of our subject strings are being split into multiple headings in order to improve retrieval within the DPLA, which inflates our numbers a bit in the tables above. I’d be interested to chat with anyone interested in this topic who has some “here’s why” explanations for the numbers above.

Hit me up on Twitter if you want to chat about this.

