Feed aggregator

Eric Lease Morgan: What is old is new again

planet code4lib - Thu, 2015-10-22 10:40

The “hows” of librarianship are changing, but not the “whats”.

(This is an outline for my presentation given at the ADLUG Annual Meeting in Rome (October 21, 2015). Included here are also the one-page handout and slides, both in the form of PDF documents.)

Linked Data

Linked Data is a method of describing objects, and these objects can be the objects in a library. In this way, Linked Data is a type of bibliographic description.

Linked Data is a manifestation of the Semantic Web. It is an interconnection of virtual sentences known as triples. Triples are rudimentary data structures, and as the name implies, they are made of three parts: 1) subjects, 2) predicates, and 3) objects. Subjects always take the form of a URI (think “URL”), and they point to things real or imaginary. Objects can take the form of a URI or a literal (think “word”, “phrase” or “number”). Predicates also take the form of a URI, and they establish relationships between subjects and objects. Sets of predicates are called ontologies or vocabularies and they present the languages of Linked Data.
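
To make the triple structure concrete, here is a minimal sketch using Python and the rdflib library; the URIs, and the use of Dublin Core terms as the predicate vocabulary, are only illustrative.

from rdflib import Graph, Literal, Namespace, URIRef

# Illustrative URIs only; Dublin Core terms stand in for any predicate vocabulary.
DCTERMS = Namespace("http://purl.org/dc/terms/")

g = Graph()
book = URIRef("http://example.org/books/moby-dick")  # subject: a URI pointing to a thing

# Two triples: one object is another URI, the other is a literal.
g.add((book, DCTERMS.creator, URIRef("http://example.org/people/melville")))
g.add((book, DCTERMS.title, Literal("Moby Dick")))

print(g.serialize(format="turtle"))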

Through the curation of sets of triples, and through the re-use of URIs, it is often possible to make implicit information explicit and to uncover new knowledge.

There is an increasing number of applications enabling libraries to transform and convert their bibliographic data into Linked Data. One such application is called ALIADA.

When & if the intellectual content of libraries, archives, and museums is manifested as Linked Data, then new relationships between resources will be uncovered and discovered. Consequently, one of the purposes of cultural heritage institutions will be realized. Thus, Linked Data is a newer, more timely method of describing collections; what is old is new again.

Curation of digital objects

The curation of collections, especially in libraries, does not have to be limited to physical objects. Increasingly, the curation of digital objects represents a new opportunity and growth area.
With the advent of the Internet there exists an abundance of full-text digital objects just waiting to be harvested, collected, and cached. It is not good enough to link and point to such objects because links break and institutions (websites) dissolve.

Curating digital objects is not easy, and it requires the application of traditional library principles of preservation in order to be fulfilled. It also requires systematic organization and evaluation in order to be useful.

Done properly, there are many advantages to the curation of such digital collections: long-term access, analysis & evaluation, use & re-use, and relationship building. Examples include: the creation of institutional repositories, the creation of bibliographic indexes made up of similar open access journals, and the complete works of an author of interest.

In the recent past I have created “browsers” used to do “distant reading” against curated collections of materials from the HathiTrust, the EEBO-TCP, and JSTOR. Given a curated list of identifiers, each of the browsers locally caches the full text of each digital object, creates a “catalog” of the collection, does full text indexing against the whole collection, and generates a set of reports based on the principles of text mining. The result is a set of both HTML files and simple tab-delimited text files enabling the reader to get an overview of the collection, query the collection, and provide the means for closer reading.
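
A very rough Python sketch of the general shape of such a pipeline is below; the identifiers and texts are invented placeholders, and the real browsers do considerably more (full text indexing, HTML reports, and so on).

import collections
import csv
import pathlib

# Hypothetical stand-in for harvesting full text by identifier; a real "browser"
# would fetch the texts from the HathiTrust, EEBO-TCP, or JSTOR instead.
SAMPLE_TEXTS = {
    "emerson-nature": "nature is a language and every new fact we learn is a new word",
    "emerson-self-reliance": "trust thyself every heart vibrates to that iron string",
}

cache = pathlib.Path("cache")
cache.mkdir(exist_ok=True)

catalog = []
for identifier, text in SAMPLE_TEXTS.items():
    (cache / f"{identifier}.txt").write_text(text)        # locally cache the full text
    words = text.split()
    top = collections.Counter(words).most_common(3)       # a tiny text-mining "report"
    catalog.append([identifier, len(words), ", ".join(w for w, _ in top)])

# Write a simple tab-delimited "catalog" of the collection.
with open("catalog.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["identifier", "word count", "top words"])
    writer.writerows(catalog)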

How can these tools be used? A reader could first identify the complete works of a specific author from the HathiTrust, say, Ralph Waldo Emerson. They could then identify all of the journal articles in JSTOR written about Ralph Waldo Emerson. Finally the reader could use the HathiTrust and JSTOR browsers to curate the full text of all the identified content to verify previously established knowledge or discover new knowledge. On a broader level, a reader could articulate a research question such as “What are some of the characteristics of early American literature, and how might some of its authors be compared & contrasted?” or “What are some of the definitions of a ‘great’ man, and how have these definitions changed over time?”

The traditional principles of librarianship (collection, organization, preservation, and dissemination) are alive and well in this digital age. Such are the “whats” of librarianship. It is the “hows” of librarianship that need to evolve in order for the profession to remain relevant. What is old is new again.

William Denton: Metal librarianship

planet code4lib - Thu, 2015-10-22 00:47

A short pointer to Netanel Ganin’s blog post Some help for the Iron Maiden researchers, which gives MARC-formatted LCSH subject headings for Iron Maiden songs. Metal librarianship! Enough with the cats and cardigans, people. Throw the horns.

District Dispatch: Getting down to business on Policy Revolution!

planet code4lib - Wed, 2015-10-21 20:11

George Washington University School of Business, Washington, D.C. — Source: Wikimedia

In November 2013, the ALA Office for Information Technology Policy – rabble rousers that we are – started a revolution…a policy revolution, that is. Realizing that perceptions of what libraries do among decision makers and the public do not always reflect reality, we spearheaded an effort by that very name (Policy Revolution!) to increase the library community’s visibility and capacity for engagement in national policymaking. One of the most significant products to come from this Gates-funded initiative to date is a public policy agenda outlining national priorities for the library community across numerous policy arenas.

Now that the agenda is public, we’re focused on advancing its priorities through purposeful, organized advocacy and thoughtful collaboration with decision makers and influencers across all sectors. One of the first steps we have taken to affix rubber to road on Policy Revolution! is to zero in on areas of policy we feel the library community is best positioned to impact in the coming years.

One such area is small business and entrepreneurship. In a recent post celebrating National Start-Up Day across America, I outlined a number of the ways in which library small business and entrepreneurship services advance the innovation economy. The post describes library activities and programs that help entrepreneurs and aspiring entrepreneurs gain access to capital; access critical information about starting a business; prototype ideas for new products; and more. To advance the Policy Revolution! initiative, we must figure out how to leverage and expand upon the great work that’s being done in libraries on small business and entrepreneurship. It is with this goal in mind that I attended the 6th annual October Entrepreneurship Research & Policy Conference at George Washington University last week.

Delivering one of the keynote speeches, Winslow Sargeant, the Small Business Administration’s Chief Counsel for Advocacy from August 2010 until January 2015, noted that there are nearly 28.5 million small businesses – employing over 56 million people – in the United States. He described research institutions as critical “centers of knowledge” that catalyze the innovation these small business firms drive forward.

After his talk, Dr. Sargeant assured me that libraries fall within his definition of “centers of knowledge.” He touted libraries as repositories of information that entrepreneurs use to create innovative products and services. The longer I work on small business and entrepreneurship issues for ALA, the more convinced I become that libraries – school, public and academic alike – are the best kind of “knowledge centers” for advancing the innovation economy.

Dr. Sargeant asserted: “Research is the transformation of money into knowledge, and innovation is the transformation of knowledge into money.” Libraries democratize this proposition. With nothing more than your library card, your drive and your imagination, you can create knowledge that breeds innovation. In short, libraries replace “money” with “information,” thus: “Research is the transformation of information into knowledge, and innovation is the transformation of knowledge into money.”

This seems a solid premise from which to start our Policy Revolution! work to raise the library community’s profile in the entrepreneurship/small business arena. In the coming months, we will trumpet library leadership in this arena and work with leaders across the public, private and non-profit sectors to identify projects and activities that will expand the library community’s capacity to help all people participate in the innovation economy.

Read more about Policy Revolution on District Dispatch here.

The post Getting down to business on Policy Revolution! appeared first on District Dispatch.

Jonathan Rochkind: Blacklight Community Survey Results

planet code4lib - Wed, 2015-10-21 19:19

On August 20th I announced a Blacklight Community Survey to the blacklight and code4lib listservs, and it was also forwarded on to the hydra listserv by a member.

Between August 20th and September 2nd, I received 18 responses. After another week of no responses, I shut off the survey. It’s taken me until now to report the results, sorry!

The Survey was implemented using Google Docs. You can see the survey instrument here, access the summary of results from Google Docs here, and the complete spreadsheet of responses here.  The survey was intentionally anonymous.

My own summary with limited discussion follows below. 

Note: The summary of results incorrectly reports 24 responses rather than 18; I accidentally didn’t delete some test data before releasing the survey, and had no way to update the summary count. However, the spreadsheet is accurate; and the summaries for individual questions are accurate (you’ll see they each add up to 18 responses or fewer), except for the Blacklight version questions which have a couple test answers in the summary version. Sorry!

I am not sure if 18 responses should be considered a lot or a little, or what percentage of Blacklight implementations it represents. It should definitely not be considered a valid statistical sample; I think of it more like getting together people who happen to be at a conference to talk about their experiences with Blacklight, but I think such a view into Blacklight experiences is still useful.

I do suspect that Hydra gets more use than these results would indicate, and Hydra users of Blacklight are under-represented. I’m not sure why, but some guesses might be that Hydra implementations of blacklight are disproportionately done by vendors/contractors, or are more likely to be “release and forget about it” implementations — in either case meaning the host institutions are less likely to maintain a relationship to the Blacklight community, and find out about or care to respond to the survey.

Institutional Demographics and Applications

The majority of respondents (12 out of 18) are Academic Libraries, along with one public library, one museum, one vendor/contractor, two national libraries or consortiums, and one ‘other’.

I was unsurprised to see that the majority of use of Blacklight is for “special collection” or “institutional repository” type use. Only 1/3rd of respondents use Blacklight for a “Library catalog/discovery” application, with the rest “A Single special-purpose collection” (5 of 18), “Institutional/Digital collections repository (multiple collections)” (11, the majority of 18 respondents), or “Other” (4).

At my place of work, when we first adopted Blacklight, the primary use case for existing implementations and developers was library catalog/discovery, but I had seen development efforts mostly focusing on other use cases lately, and it makes sense to see a shift to majority “repository” or “special-purpose collection” uses along with that.

A majority (10 of 18) of respondents run more than 1 Blacklight application, which I did find a bit surprising, but may go along with “repository” type use, where each repo or collection gets its own BL app?  6 respondents run only one BL app, and 2 respondents are working on BL app(s) in development not yet in production.

Only 3 respondents (including myself) use Blacklight to host “No digital content, just metadata records”; 3 more just digital content, and the remaining 12 (the majority) some of each.

A full 8 of 18 include at least some MARC-origin metadata in their apps, 2 more than the number reporting using their app for “Library catalog/discovery”. Not quite a majority, but it seems MARC is definitely not dead in BL-land. “Dublin Core” and “Content from a Fedora Repository”, at 9 respondents each, only barely beat out MARC.

With 9 respondents reporting using “Content from a Fedora Repo”, and 11 reporting “Institutional/Digital collections repository”, I expected this would mean lots of Hydra use. But in a question we’ll examine in more detail later, only 4 respondents reported using “hydra-head (hydra framework)” in their app, which I find surprising. I don’t know if this is accurate, or respondents missed or didn’t understand the checkbox at that later question.

Versions of Blacklight in Use, and Experience with Upgrading

Two respondents are actually still deploying an app with Blacklight 3.x.

Two more are still on Blacklight 4.x — one of those runs multiple apps with some of them already on 5.x but at least one not yet upgraded; the other runs only one app on BL 4.30.

The rest of the respondents are all on Blacklight 5.x, but they are on diverse 5.x releases from 5.5 to 5.14.  At the time the survey data was collected, only four of 18 respondents had crossed the BL 5.12 boundary, where lots of deprecations and refactorings were introduced. 5.12 had been released for about 5 months at that point.  That is, many months after a given BL version was released, most BL implementations (at least in this sample) still had not upgraded to it.

Just over half of respondents, 10 of 18, have never actually upgraded a Blacklight app across a major version (e.g. 3.x to 4.x or 4.x to 5.x); the other 8 have.

Oddly, the two respondents reporting themselves to be still running at least one BL 3.x app also said they did have experience upgrading a BL app across major versions. Makes me wonder why some of their apps are still on 3.x. None of the respondents still deploying 4.x said they had experience upgrading a BL app across a major version.

It seems that BL apps are in general not being quickly upgraded to keep up with BL releases. Live production BL deployments in the wild use a variety of BL versions, even across major versions, and some may have never been upgraded since install.

Solr Versions In Use

Only 16 of 18 respondents reported the version of Solr they are using (actually we asked for the lowest version of Solr they were using, if they had multiple Solrs used with BL).

A full 14 of these 16 are using some variety of Solr 4.x, with a large variety of 4.x Solrs in use from 4.0 to 4.10.

No respondents were still running Solr 3.x, but one poor soul is still running Solr 1.4. And only one respondent was running a Solr 5.x. It sounds like it may be possible for BL to drop support for Solr 3.x (or has that already happened), but requiring Solr 5.x would probably be premature.

I’m curious how many people have upgraded their Solr, and how often; it may be that the preponderance of Solr 4.x indicates that most installations were first deployed when Solr was in 4.x.

Rails Versions in Use

Four of 18 respondents are still using Rails 3.x, the rest have upgraded to 4.x — although not all to 4.2.

Those using Rails 3.x also tended to be the ones still reporting old BL versions in use, including BL 3.x.  I suspect this means that a lot of installations get deployed and never have any dependencies upgraded. Recall 10 of 18 respondents have never upgraded BL across a major version.  Although many of the people reporting running old Rails and old BL have upgraded BL across a major version at some point (I don’t know if this means they used to be running even older versions of BL, or that they’ve upgraded some apps but not others).

“If it isn’t broke, don’t fix it” might sometimes work, for a “deploy and done” project that never receives any new features or development. But I suspect a lot of these institutions are going to find themselves in trouble when they realize they are eventually running old unsupported versions of Rails, ruby, or BL, especially if a security vulnerability is discovered.  Even if a backport security patch is released for an old unsupported Rails or ruby version they are using (no guarantee), they may lack local expertise to actually apply those upgrades; or upgrading Rails may require upgrading BL as well to work with later Rails, which can be a very challenging task.

Local Blacklight Development Practices and Dependencies

A full 16 of 18 respondents report apps that include locally-developed custom features. One more respondent didn’t answer, and only one said their app(s) did not.

I was surprised to see that only 2 respondents said they had hired a third-party vendor to install, configure, or develop a BL app; 2 more had hired a contractor; and 2 more said they were vendors/contractors for others.

I know there are people doing a healthy business in Blacklight consulting, especially Hydra; I am guessing that most of their clients are not enough involved in the BL community to see and/or want to answer this survey. (And I’m guessing many of those installations, unless the vendor/contractor has a maintenance contract, were also “deploy and ignore” installations which have not been upgraded since release).

So almost everyone is doing local implementation of features, but not by hiring a vendor/contractor; they are actually doing the work in-house.

I tried to list every Blacklight plugin gem I could find distributed, and ask respondents which they used. The leaders were blacklight_advanced_search (53%) and blacklight_range_limit (39%).  Next were geoblacklight and hydra-head, each with 4 respondents (31%) claiming use. Again, I’m mystified how so few respondents can be using hydra-head when so many report IR/fedora uses. No other plugin got more than 3 respondents claiming use. I was surprised that only one respondent claimed sufia use.

Blacklight Satisfaction and Evaluation

Asked how satisfied they are with Blacklight, on a scale of 1 (least) to 5 (most), respondents gave a median score of 4, which is pretty respectable.

Now let’s look at the free-form answers about what people like, don’t like, or want from Blacklight.

A major trend in what people like is Blacklight’s flexibility, customizability, and extensibility:

  • “The easily extendable and overridable features make developing on top of Blacklight a pleasure.”
  • “…Easy to configure faceting and fields.”
  • “…the ability to reuse other community plugins.”
  • “The large number of plugins that enhance the search experience…”
  • “We have MARC plus lots of completely randomly-organized bespoke cataloging systems. Blacklight committed from the start to be agnostic as to the source of records, and that was exactly what we needed. The ability to grow with Blacklight’s feature set from back when I started using it, that was great…”
  • “Easily configurable, Easily customizable, Ability to tap into the search params logic, Format specific partial rendering”

The major trend in what people don’t like or find most challenging about Blacklight is the difficulty of upgrading BL:

  • “When we have heavily customized Blacklight applications, upgrading across major versions is a significant stumbling block.”
  • “Being bound together with a specific Bootstrap causes enormous headaches with updating”
  • “Upgrades and breaking of backwards compatibility. Porting changes back into overridden partials because much customization relies on overriding partials. Building custom, complicated, special purpose searches using Blacklight-provided methods [is a challenge].”
  • “Upgrading is obviously a pain-point; although many of the features in newer versions of Blacklight are desirable, we haven’t prioritized upgrading our internal applications to use the latest and greatest.”
  • “Varied support for plugins over versions [is a challenge].”
  • “And doing blacklight upgrades, which usually means rewriting everything.”
  • “Rapid pace of development. New versions are released very quickly, and staying up to date with the latest version is challenging at times. Also, sometimes it seems that major changes to Blacklight (for example, move from Bootstrap 2 to Bootstrap 3) are quasi-dictated by needs of one (or a handful) of particular institutions, rather than by consensus of a wider group of adopters/implementors. Also, certain Blacklight plugins get neglected and start to become less and less compatible with newer versions of Blacklight, or don’t use the latest methods/patterns, which makes it more of a challenge to maintain one’s app.”
  • “Getting ready for the upgrade to 6.0. We’ve done a lot of local customizations and overrides to Blacklight and some plugins that are deprecated.”

As well as difficulty in understanding the Blacklight codebase:

  • “Steep learning curve coming from vanilla rails MVC. Issues well expressed by B Armintor here:”
  • “Code churn in technical approach (often I knew how something was done but find out it has changed since the last time I looked). Can sometimes be difficult to debug the source given the layers of abstraction (probably a necessary evil however).”
  • “Too much dinking around and mucking through lengthy instructions and config files is required to do simple things. BL requires someone with substantial systems skills to spend a lot of time to use — a luxury most organizations don’t have. Skinning BL is much more painful than it needs to be as is making modifications to BL behaviors. BL requires far more time to get running and has more technical/skill dependencies than other things we maintain. In all honesty, what people here seem to like best about BL is actually functionality delivered by solr.”
  • “Figuring out how to alter blacklight to do our custom development.”
  • “Understanding and comprehension of how it fits together and how to customise initially.”
  • “Less. Simplicity instead of more indirection and magic. While the easy things have stayed easy anything more has seemed to be getting harder and more complicated. Search inside indexing patterns and plugin. Better, updated, maintained analytics plugin.”
  • “A more active and transparent Blacklight development process. We would be happy to contribute more, but it’s difficult to know a longer-term vision of the community.”

What does it mean?

I’ve separated my own lengthy interpretation, analysis, and evaluation based on my own personal judgement into a subsequent blog post. 


David Rosenthal: ISO review of OAIS

planet code4lib - Wed, 2015-10-21 17:22

ISO standards are regularly reviewed. In 2017, the OAIS standard ISO14721 will be reviewed. The DPC is spearheading a praiseworthy effort to involve the digital preservation community in the process of providing input to this review, via this Wiki.

I've been critical of OAIS over the years, not so much of the standard itself, but of the way it was frequently mis-used. Its title is Reference Model for an Open Archival Information System (OAIS), but it is often treated as if it were entitled The Definition of Digital Preservation, and used as a way to denigrate digital preservation systems that work in ways the speaker doesn't like by claiming that the offending system "doesn't conform to OAIS". OAIS is a reference model and, as such, defines concepts and terminology. It is the concepts and terminology used to describe a system that can be said to conform to OAIS.

Actual systems are audited for conformance to a set of OAIS-based criteria, defined currently by ISO16363. The CLOCKSS Archive passed such an audit last year with flying colors. Based on this experience, we identified a set of areas in which the concepts and terminology of OAIS were inadequate to describe current digital preservation systems such as the CLOCKSS Archive.

I was therefore asked to inaugurate the DPC's OAIS review Wiki with a post that I entitled The case for a revision of OAIS. My goal was to encourage others to post their thoughts. Please read my post and do so.

DPLA: New DPLA browser search options

planet code4lib - Wed, 2015-10-21 14:55

Thanks to the excellent work of DPLA Community Rep Shaun Akhtar (thanks, Shaun!), Firefox and Internet Explorer users can make use of a new OpenSearch plugin that will add the DPLA as one of your browser’s known search providers. Firefox users may also install it directly through the Mozilla Add-ons site. This is valuable because it gets you to DPLA content faster and more often.

Install the DPLA OpenSearch plugin to search our catalog right from your browser’s search tool (pictured: Firefox)

There is also a new route for users of DuckDuckGo, a privacy-friendly search engine. DuckDuckGo’s “bang” (“!”) operators let you directly search particular sites – for example, the query “!w Jesse Owens” will directly search Wikipedia for “Jesse Owens”. A new operator has been registered for DPLA, “!dpla”, and you can try it out in DuckDuckGo with “!dpla Jesse Owens” or “!dpla [the query of your choice]”.

To check out Shaun on GitHub, click here.

To learn more about how you can develop using DPLA’s open API or data sets, check out our Developers section.
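
For those wondering what a first call against the open API might look like, here is a minimal Python sketch. The endpoint, parameter names, and response fields shown here are assumptions based on the v2 API, so treat the Developers section as the authoritative reference.

import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"  # keys are issued through the DPLA developer documentation
query = "Jesse Owens"

# Endpoint and parameter names are assumptions; check the Developers section for details.
url = "https://api.dp.la/v2/items?" + urllib.parse.urlencode({"q": query, "api_key": API_KEY})

with urllib.request.urlopen(url) as response:
    results = json.load(response)

# Print the title of each matching record, assuming a "docs" list in the response.
for doc in results.get("docs", []):
    print(doc.get("sourceResource", {}).get("title"))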

Lastly, if you’ve developed an app or tool that makes use of DPLA data in some way, drop us a line! We’d love to hear about it (and maybe add it to our App Library).

Harvard Library Innovation Lab: Link roundup October 21, 2015

planet code4lib - Wed, 2015-10-21 14:44

A little late to the party, but happy to be using the taco emoji!

How Taco Emoji—and Hittite Hieroglyphs—Get to Your Screen | Mental Floss

How emoji come to be

The Internet’s Dark Ages

the web … was intended to be a messaging system, not a library

An Error Leads to a New Way to Draw, and Erase, Computing Circuits

Etch A Sketch for circuits. Draw a circuit with light, then erase it (with light) if you want to draw a new path.

Searching the world for original Pizza Hut buildings

Any Library Huts out there?

Will digital books ever replace print?

Come on digital book world! You have so much potential. Keep innovating.

Conal Tuohy: Bridging the conceptual gap: Museum Victoria’s collections API and the CIDOC Conceptual Reference Model

planet code4lib - Wed, 2015-10-21 14:44

A Museum Victoria LOD graph about a teacup, shown using the LODLive visualizer.

This is the third in a series of posts about an experimental Linked Open Data (LOD) publication based on the web API of Museum Victoria.

The first post gave an introduction and overview of the architecture of the publication software, and the second dealt quite specifically with how names and identifiers work in the LOD publication software.

In this post I’ll cover how the publication software takes the data published by Museum Victoria’s API and reshapes it to fit a common conceptual model for museum data, the “Conceptual Reference Model” published by the documentation committee of the International Council of Museums. I’m not going to exhaustively describe the translation process (you can read the source code if you want the full story), but I’ll include examples to illustrate the typical issues that arise in such a translation.

The CIDOC Conceptual Reference Model

The CIDOC CRM, as it’s usually called, is a system of concepts for analysing and describing the content of museum collections. It is not intended to be a replacement for the Collection Management Systems which museums use to store their data; it is rather intended to function as a kind of lingua franca, through which content from a variety of systems can be expressed in a generally intelligible way.

The Conceptual Reference Model covers a wide range of museological concerns: items can be described in terms of their materials and mode of construction, as well as by who made them, where and when, and for what purpose.

The CRM also provides a framework to describe the events in which objects are broken into pieces, or joined to other objects, damaged or repaired, created or utterly destroyed. Objects can be described in terms of the symbolic and intellectual content which they embody, which are themselves treated as “intellectual objects”. The lineage of intellectual influence can be described, either speculatively, in a high-level way, or by explicitly tracing and documenting the influences that were known to have taken place at particular times and locations. The legal history of objects can also be traced through transfer of ownership and custody, commission, sale and purchase, theft and looting, loss and discovery. Where the people involved in these histories are known, they too can be named and described and their histories interwoven with those of other people, objects, and ideas.

Core concepts and additional classification schemes

The CRM framework is quite high level. Only a fairly small number of very general types of thing are defined in the CRM: only concepts general enough to be useful for any kind of museum, whether a museum of computer games or of classical antiquity. Each of these concepts is identified by an alphanumeric code and an English-language name. In addition, the CRM framework allows for arbitrary typologies to be added on, to be used for further classifying pretty much anything. This is to allow all the terms from any classification system used in a museum to be exported directly into a CRM-based dataset, simply by describing each term as an “E55 Type”. In short, the CRM consists of a fairly fixed common core, supplemented by a potentially infinite number of custom vocabularies which can be used to make fine distinctions of whatever kind are needed.

Therefore, a dataset based on the CRM will generally be directly comparable with another dataset only in terms of the core CRM-defined entities. The different classification schemes used by different datasets remain “local” vocabularies. To achieve full interoperability between datasets, these distinct typologies would additionally need to be aligned, by defining a “mapping” table which lists the equivalences or inequivalences between the terms in the two vocabularies. For instance, such a table might say that the term “moulded” used in Museum Victoria’s collection is more or less the same classification as “molding (forming)” in the Getty Art and Architecture thesaurus.
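
One way such a mapping table might be expressed in Linked Data is with SKOS mapping properties. The sketch below uses Python’s rdflib; both URIs are placeholders rather than the real Museum Victoria or Getty identifiers.

from rdflib import Graph, URIRef
from rdflib.namespace import SKOS

g = Graph()

# Placeholder URIs for a local Museum Victoria technique term and a Getty AAT concept.
mv_moulded = URIRef("http://example.org/museum-victoria/technique/moulded")
aat_molding = URIRef("http://vocab.getty.edu/aat/example-molding-forming")

# Record that the two terms are close (but perhaps not exact) equivalents.
g.add((mv_moulded, SKOS.closeMatch, aat_molding))

print(g.serialize(format="turtle"))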

Change happens through “events”

To model how things change through time, the CRM uses the notion of an “event”. The production of a physical object, for instance, is modelled as an E12 Production event (NB concepts in the CRM are all identified by an alphanumeric code). This production event is linked to the object which it produced, as well as to the person or persons who played particular creative roles in that event. The event may also have a date and place associated with it, and may be linked to the materials and to the method used in the event.

On a somewhat philosophical note, this focus on discrete events is justified by the fact that not all of history is continuously documented, and we necessarily have a fragmentary knowledge of the history of any given object. Often a museum object will have passed through many hands, or will have been modified many times, and not all of this history is known in any detail. If we know that person A created an object for person B, and that X years later the object turned up in the hands of person C, we can’t assume that the object remained in person B’s hands all those X years. A data model which treated “ownership” as a property of an object would be liable to making such inflated claims to knowledge which is simply not there. Person C may have acquired it at any point during that period, and indeed there may have been many owners in between person B and person C. This is why it makes sense to document an object’s history in terms of the particular events which are known and attested to.

Museum Victoria’s API

How does Museum Victoria’s data fit in terms of the CIDOC model?

In general the model works pretty well for Museum Victoria, though there are also things in MV’s data which are not so easy to express in the CRM.


Museum Victoria describes items as “Things made and used by people”. These correspond exactly to the notion of E22 Man-Made Object in the CIDOC CRM (if you can excuse the sexist language), described as comprising “physical objects purposely created by human activity.”

Every MV item is therefore expressed as an E22 Man-Made Object.


Museum Victoria’s objects have an objectName property which is a simple piece of text; a name or title. In the CIDOC CRM, the name of an object is something more complex; it’s an entity in its own right, called an E41 Appellation. The reason why a name is treated as more than just a simple property of an object is that in the CRM, it must be possible to treat an object’s name as an historical phenomenon; after all, it will have been applied to an object by a particular person (the person who created the object, perhaps, or an archaeologist who dug it out of the ground, or a curator or historian), at some historical point in time. An object may have a number of different names, each given it by different people, and used by different people at different times.

However, because the Museum Victoria names are simple (a single label) we can ignore most of that complexity. We only need to define an E41 Appellation whose value is the name, and link the E41 Appellation to the E22 Man-Made Object using a P1 is identified by association.
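
In RDF terms that might look something like the following rdflib sketch; the item URI is invented, and attaching the literal value with rdfs:label is just one convenient simplification.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# Class and property spellings follow the CIDOC CRM RDFS encoding, but the namespace
# choice, the item URI, and the use of rdfs:label are illustrative simplifications.
CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")

g = Graph()
cup = URIRef("http://example.org/items/teacup-1")
name = URIRef("http://example.org/items/teacup-1#name")

g.add((cup, RDF.type, CRM["E22_Man-Made_Object"]))
g.add((name, RDF.type, CRM["E41_Appellation"]))
g.add((cup, CRM["P1_is_identified_by"], name))                       # link object to its name
g.add((name, RDFS.label, Literal("Tea cup, 'Moss Rose' pattern")))   # the simple text value

print(g.serialize(format="turtle"))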

Articles, Items and their relationships

The MV API provides access to a number of “articles” which are documents related to the Museum’s collection. For each article, the API shows a list of the related collection items; and for each item, you can get the corresponding list of related articles. Although the exact nature of the relationship isn’t made explicit, it’s reasonable to assume that an item is in some way documented by the articles that are related to it. In the CIDOC CRM, such an article is considered an E31 Document, and it bears a P70 documents relationship to the item which it’s about.

If the relationship between an item and an article is easy to guess, there are a couple of other relationships which are a little less obvious: an article also has a list of related articles, and each item also has a list of related items. What is the nature of those relationships? In what way exactly does article X relate to article Y, or item A to item B? The MV API’s documentation doesn’t say, and it wouldn’t surprise me if the Museum’s collection management system leaves this question up to the curators’ judgement.

A bit of empirical research seemed called for. I checked a few of the related items and the examples I found seemed to fall into two categories:

  • One item is a photograph depicting another item (the specific relationship here is really “depicts”)
  • Two items are both photographs of the same subject (the relationship is “has the same subject as”).

Obviously there are two different kinds of relationship here in the Museum’s collection, both of them presented (through the API) in the same way. As a human, I can tell them apart, but my proxy software is not going to be able to. So I need to find a more general relationship which subsumes both the relationships above, and fortunately, the CIDOC CRM includes such a relationship, namely P130 shows features of.

This property generalises the notions of “copy of” and “similar to” into a dynamic, asymmetric relationship, where the domain expresses the derivative, if such a direction can be established. Otherwise, the relationship is symmetric. It is a shortcut of P15 was influenced by (influenced) in a creation or production, if such a reason for the similarity can be verified. Moreover it expresses similarity in cases that can be stated between two objects only, without historical knowledge about its reasons.

For example, I have a photograph of a piece of computer hardware (which is the relatedItem), and the photo is therefore a kind of derivative of the hardware (though the Museum Victoria API doesn’t tell me which of the objects was the original and which the derivative). In another example I have two photos of the same house; here there’s a similarity which is not due to one of the photos being derived from the other.

Ideally, it would be preferable to be able to represent these kinds of relationships more precisely; for instance, in the case of the two photos of the house, one could generate a resource that denotes the actual physical house itself, and link that to the photographs, but because the underlying data doesn’t include this information in a machine-readable form, the best we can do is to say that the two photos are similar.

Production techniques

Some of the items in the Museum’s collection are recorded as having been produced using a certain “technique”. For instance, archaeological artefacts in the MV collection have a property called archeologyTechnique, which contains the name of a general technique, such as moulded, in the case of certain ceramic items.

This corresponds to the CRM concept P32 used general technique, which is described like so:

This property identifies the technique or method that was employed in an activity. These techniques should be drawn from an external E55 Type hierarchy of consistent terminology of general techniques or methods such as embroidery, oil-painting, carbon dating, etc.

Note that in CIDOC this “general technique” used to manufacture an object is not a property of the object itself; it’s a property of the activity which produced the object (i.e. the whole process in which the potter pressed clay into a mould, glazed the cup, and fired it in a kiln).

Note also that, for the CIDOC CRM, the production technique used in making these tea-cups is not the text string “moulded”; it is actually an abstract concept identified by a URI. The string “moulded” is just a human-readable name attached as a property of that concept. That same concept might very well have a number of other names in other languages, or even in English there’s the American variant spelling “molded”, and synonyms such as “cast” that could all be alternative names for the same concept.

Translating a Museum Victoria item with a technique into the CRM therefore involves identifying three entities:

  • the object itself (an E22 Man-Made Object);
  • the production of the object (an E12 Production activity);
  • the technique used in the course of that activity to produce the object (an E55 Type of technique)

These three entities are then linked together:

  • The production event “P32 used general technique" of the technique; and
  • The production event “P94 has created" the object itself.
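
As a hedged sketch of how those two links might be expressed with rdflib (the URIs are invented, and the property identifiers simply follow the names used above):

from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")  # illustrative namespace and URIs

g = Graph()
cup = URIRef("http://example.org/items/teacup-1")                     # the object itself
production = URIRef("http://example.org/items/teacup-1#production")   # its production event
moulded = URIRef("http://example.org/techniques/moulded")             # the technique concept

g.add((cup, RDF.type, CRM["E22_Man-Made_Object"]))
g.add((production, RDF.type, CRM["E12_Production"]))
g.add((moulded, RDF.type, CRM["E55_Type"]))

# Link the three entities together, using the property names given above.
g.add((production, CRM["P32_used_general_technique"], moulded))
g.add((production, CRM["P94_has_created"], cup))

print(g.serialize(format="turtle"))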

The items, articles, specimens and species in the Museum’s API are all already first-class objects and can be easily represented as concepts in Linked Data. The archeologyTechnique field also has a fairly restricted range of values, and each of those values (such as “moulded”) can be represented as a Linked Data concept as well. But there are a number of other fields in the Museum’s API which are in the form of relatively long pieces of descriptive text. For example, an object’s objectSummary field contains a long piece of text which describes the object in context. For example, here’s the objectSummary of one our moulded tea cups:

This reconstructed cup was excavated at the Commonwealth Block site between 1988 and 2003. There is a matching saucer that was found with it. The pattern is known as 'Moss Rose' and was made between 1850 and 1851 by Charles Meigh, Son & Pankhurst in Hanley, Staffordshire, England.

Numerous crockery pieces were found all over the Little Lon site. Crockery gives us a glimpse of everyday life in Melbourne in the 1880s. In the houses around Little Lon, residents used decorated crockery. Most pieces were cheap earthenware or stoneware, yet provided colour and cheer. Only a few could afford to buy matching sets, and most china was probably acquired second-hand. Some were once expensive pieces. Householders mixed and matched their crockery from the great range of mass-produced designs available. 'Blue and white' and the 'willow' pattern, was the most popular choice and was produced by English potteries from 1790.

It’s not quite as long as an “article” but it’s not far off it. Another textual property is called physicalDescription, and has a narrower focus on the physical nature of the item:

This is a glazed earthenware teacup which has been reconstructed. It is decorated with a blue or black vine and leaf design around outside and inside of the cup which is known as 'Moss Rose' pattern.

The CIDOC CRM does include concepts related to the historical context and the physical nature of items, but it’s not at all easy to extract that detailed information from the descriptive prose of these, and similar fields. Because the information is stored in a long narrative form, it can’t be easily mapped to the denser data structure of a Linked Data graph. The best we can hope to do with these fields is to treat them as notes attached to the item.

The CIDOC CRM includes a concept for attaching a note: P3 has note. But to represent these two different types of note, it’s necessary to extend the CRM by creating two new, specialized versions (“sub-properties”) of the property called P3 has note, which I’ve called P3.1 objectSummary and P3.1 physicalDescription.
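
One way of declaring those sub-properties is sketched below with rdflib; the extension namespace and property URIs are invented stand-ins for whatever the publication software actually uses.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
LOCAL = Namespace("http://example.org/terms/")  # invented namespace for the extension

g = Graph()

# Declare the two specialised versions of P3 has note.
g.add((LOCAL.objectSummary, RDFS.subPropertyOf, CRM["P3_has_note"]))
g.add((LOCAL.physicalDescription, RDFS.subPropertyOf, CRM["P3_has_note"]))

# Attach the two kinds of note to an item.
cup = URIRef("http://example.org/items/teacup-1")
g.add((cup, LOCAL.objectSummary, Literal("This reconstructed cup was excavated ...")))
g.add((cup, LOCAL.physicalDescription, Literal("This is a glazed earthenware teacup ...")))

print(g.serialize(format="turtle"))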


It’s possible to recognise three distinct patterns in mapping an API such as Museum Victoria’s to a Linked Data model like the CIDOC CRM.

  1. Where the API provides access to a set of complex data objects of a particular type, these can be mapped straightforwardly to a corresponding class of Linked Data resources (e.g. the items, species, specimens, and articles in MV’s API).
  2. Where the API exposes a simple data property, it can be straightforwardly converted to a Linked Data property (e.g. the two types of notes, in the example above).
  3. Where the API exposes a simple data property whose values come from a fairly limited range (a “vocabulary”), then those individual property values can be assigned identifiers of their own, and effectively promoted from simple data properties to full-blown object properties (e.g. the production techniques in Museum Victoria’s API).

It’s been an interesting experiment, to generate Linked Open Data from an open API using a simple proxy: I think it shows that the technique is a very viable mechanism for institutions to break into the LOD cloud and contribute their collection in a standardised manner, without necessarily having to make any changes to their existing systems or invest in substantial software development work. To my mind, making that first step is a significant barrier that holds institutions and individuals back from realising the potential in their data. Once you have a system for publishing LOD, you are opening up a world of possibilities for external developers, data aggregators, and humanities researchers. If your data is of interest to those external groups, you have the possibility of generating some significant returns on your investment, and of “harvesting” some of that work back into your institution’s own web presence in the form of better visualizations, discovery interfaces, and better understanding of your own collection.

Before the end of the year I hope to explore some further possibilities in the area of user interfaces based on Linked Data, to show some of the value that these Linked Data publishing systems can support.

In the Library, With the Lead Pipe: A Critical Take on OER Practices: Interrogating Commercialization, Colonialism, and Content

planet code4lib - Wed, 2015-10-21 10:30

Photo by Flickr user arbyreed (CC BY NC 2.0)

In Brief

Both Open Educational Resources (OER) and Open Access (OA) are becoming more central to many librarians’ work and the core mission of librarianship, in part because of the perceived relationship between openness and social justice. However, in our excitement about the new opportunities afforded by open movements, we might overlook structural inequalities present within these movements. In this article, I utilize some of the useful critiques OA has generated to inform the discussion of OER creation and practice. I then hone in on the conversation around OER specifically to suggest starting points for how librarians and other LIS professionals can construct more thoughtful OER practices.


This spring, the Association of College and Research Libraries (ACRL) held their 2015 biennial conference in Portland. While I attended multiple sessions and poster presentations on Open Access (OA) and Open Educational Resources (OER), Heather Joseph’s invited paper session, “Open Expansion: Connecting the Open Access, Open Data and OER Dots,” left the most lasting impression on me. Joseph’s presentation focused on the different embodiments of openness and how collaboration between the efforts could be transformative. While explaining the Open Data front, Joseph’s presentation stopped on a photo of an oil rig. A few slides later, she summarized politicians’ take on open data, explaining that while President Obama had called data a “valuable national commodity,” Dutch politician Neelie Kroes had gone a step further and named data “the new oil for the digital age” (Joseph, 2015; Kroes, 2012). Joseph (2015) went on to explain that Kroes’ assertion was that “national economies and national destinies [were] going to rise and fall on understanding how to get the most value from data.”

Right before I listened to Kroes’ words, which seemed so profoundly nationalistic and exploitative to me in that moment, I saw the photo of the rig and thought about western conquest and our pursuit of other nations’ natural resources. This sparked a deep realization within me. I found that all of the discussions I had engaged in about openness—including Joseph’s presentation—were about shared goals or shared politics. The shared risks were often left unaddressed. I started to consider how openness, when disconnected from its political underpinnings, could become as exploitative as the traditional system it had replaced. I began to reflect on the ways in which I had used, or experienced others’ use of, openness as a solution for poverty or development—often in a way that was disconnected from an understanding of systemic inequality.

This article, which is an intentional critique of OER praxis, has given me the space to explore these questions. OER are digital learning objects that are shared under “an intellectual property license that permits their free use and re-purposing by others.” Under this definition, learning objects can mean almost anything used as educational material, including tutorials, videos, guides, lesson plans, and syllabi. The Open Education movement is different than the OA movement, which is focused on the free and unrestricted use of research materials and literature. However, like Open Education, OA works to enable deeper unrestricted analysis so that scholars can read articles but also “crawl them for indexing, pass them as data to software, or use them for any other lawful purpose” (Chan, et. al, 2015, para 3).

This article uses a critique of OER creation and practice as a proxy for the open movement in LIS generally. Thus, it utilizes some of the useful critiques OA has engendered to inform the discussion of OER, which is less developed. While the intention is not to conflate OA or critiques of OA with OER, it is worth noting that both evoke a similar rhetoric of openness and, as such, share similarities that enable us to apply lessons learned in one domain into the other.

The first section will explore critiques of OER and openness in relation to commercialization, colonialism, and content. While not exhaustive, these critiques address issues of labor, the corporatization of higher education, oppressive learning formats, imperialism, and technocratic discourse around development and the information poor. This broad overview will provide a useful framework for understanding how openness generally—and Open Education specifically—can be improved.

I will then offer tangible suggestions for how librarians and other LIS professionals can construct more thoughtful OER practices. These include thinking critically about the language we use when engaging stakeholders; moving beyond cost and marketing for our institutions and focusing on open pedagogy and student-centered learning; using OER creation as an opportunity to talk to students about labor and knowledge production; and challenging whose knowledge matters globally. These are not meant to be “solutions” but instead starting points. I do not provide a suggestion for every critique but instead advocate for the use of open, critical pedagogy as a method for engaging with several of the critiques mentioned, as it can make our practices more deliberate and authentically engage students in issues of openness.

I believe that OER have value. I believe that equitable access to research and the data that accompanies that research is imperative and a goal our profession should continue working toward. But I also believe that it is worth our time to be intentional, to be cognizant of our position within increasingly corporatized institutions and consider how we might be furthering the goals of those institutions, to think seriously about how we can be actively dismantling power structures instead of perpetuating them, and to remind ourselves why we think open is worth fighting for in the first place. In explaining the difference between critique and criticism, author and screenwriter Balogun Ojetade (2012) writes, “Critique is not in service of a single ‘truth’…Critique opens questioning and makes single-truths unstable so as to be more inclusive of difference” (para 5). Our professional conversation around openness risks being in service of a single truth. My hope is that nuanced critique can help us move these conversations forward in a thoughtful way.

Critiques of OER & Openness

Labor & the Commercialization of Higher Education

Academic labor is currently structured around tenure. In other words, tenure-track faculty members do not have to rely solely on dividends from their research output because their institution compensates them for doing research. However, as higher education increasingly relies on adjunct labor, this model is compromised. In order to offer more classes for less money, adjuncts are compensated by the number of courses they teach instead of their research output. As money is taken away from educators, how is the relationship between openness and labor changed (Drabinski, et. al, 2015)? Or, in more pointed terms, how does openness exacerbate labor issues? Do institutions expect adjuncts to continue to create the same level of output a faculty member would, including OER creation?

One of the major critiques OA has received is that it can make labor become more invisible (Roh, Drabinski, & Inefuku, 2015). The invisibility of the labor required to do the actual work behind making a publication OA is often “distant” from the rhetoric behind why OA is important, creating a disconnect between values and practice (Drabinski, et. al, 2015). Further, less “academic” work that is fundamental to maintaining OA publications (metadata creation, for example) becomes devalued (Roh, Drabinski, & Inefuku, 2015). Matthew Cheney (2015) argues that we do open systems, including OER, a great disservice if we do not talk about the labor and technology structures needed to make them possible.

Thus, there are two important labor issues related to OER creation. The first is that OER creation is not rewarded in the current tenure system. Faculty members are often granted tenure because of their research impact, which might relate to OA but not OER. Further, beyond compensation, tenure provides (or has historically provided) some level of protection to take professional risks. As the concept of tenure becomes compromised and the number of positions having tenure-level protection decreases in the United States, the incentive for faculty to devote time to exploring OER creation is also compromised. The second is that adjuncts might be expected to create learning objects and even deposit them as OER but the current system does not reward them monetarily for the extra labor involved in doing so. If both parties continue to create OER, their labor might become unrecognized and devalued.

The way in which this academic labor is applied at an institutional level is also worth discussing. OA advocates have started to realize that OA, separated from its political underpinnings, can quickly become a governmental and commercial source of revenue (Lawson, 2015; Watters, 2014). In the case of OER creation specifically, openness can also become a source of branding and marketing for universities (Huijser, Bedford, & Bull, 2008). Librarians should continually question who benefits from supporting openness. We should then recognize that any open movement that happens within a neoliberal institution might further politics or initiatives that do not align with our values.  

Roxanne Shirazi (2015) recently wrote about librarians’ relationships with their employers, particularly as boosters of their university’s brand. While her post is focused on scholarly communication, labor, and copyright more broadly, Shirazi asserts that institutions are often more than willing to promote prestigious or interesting projects but “when it comes to financially and structurally supporting the sustained work of the individuals behind them” it is a different story (para 4). This applies to OER creation and application. Institutions might be willing to publicize lower costs for their students but what steps are they taking to rectify the labor issues described above for adjuncts?

OER projects also obviously require labor beyond the creation of the actual learning object. OER repositories have to be maintained and updated. OER have to be organized and assigned metadata for discovery to be effective. We must also continue to think about how this labor is funded. One funding model is for a for-profit company to pursue this work. One example is Lumen, which has worked closely with several colleges and universities to implement OER.

Another funding model is for a repository or institution to find donor support. MIT is a leader in OER creation and the pioneer of OpenCourseWare (OCW) production. d’Oliveira and Lerman found that MIT received $1,836,000 in philanthropic funding and donations to support the OCW initiative in 2009 alone, which covered about 51 percent of that year’s annual operating costs (as cited in Winn, 2012, p. 142). We should consider what it means for donors to underwrite the sustainability of our institutions’ projects (Winn, 2012) and how making more sustainable change might be compromised by this funding model (Kanwar, Kodhandaraman, and Umar, 2010).

In short, we must recognize that the changing labor system and the continued commercialization of higher education are not disconnected from our work with OER. Joss Winn (2012) challenges open advocates to apply the Marxist view of social wealth to openness, stressing that being open does not offer an alternative to “the capitalist form of social domination” (p. 134). He contends that OER, under capitalism, ensure that “employees are as productive as possible within the limits of time and space” by creating an object that can defy these constraints to create continuous institutional value and promotion (p. 141). We must think critically about whether our open work is doing the social justice, political work we envision it doing. If we fail to ask these questions, we risk endorsing programs that align more with profit than with access.

Colonialism & Imperialistic Practices

In “Beyond the ‘Information Rich and Poor’: Future Understandings of Inequality in Globalising Informational Economies,” Ingrid Burkett (2000) identifies five assumptions that have been historically made about the role of information in international development:

  1. Give the poor a computer and they will move from being information poor to information rich.
  2. Information inequality is a North/South issue.
  3. Access to more information enriches people’s lives.
  4. The ‘information society’ will be more democratic and participatory.
  5. Given enough information we can solve all the world’s problems. (p. 680)

Burkett (2000) asserts that these five assumptions egregiously simplify both economic and social global inequality. Every librarian should consider how any of these myths might be embodied in their current language around the need for openness. For example, in trying to explain why OA is important to stakeholders, I have sometimes defaulted to talking about the need to share information with developing nations. Yet, understanding inequality through the lens of these narrow “truths” should give us pause.

A dichotomy of superior/inferior ways of knowing has been established within these discourses and the assumptions that were made to employ this rhetoric. The first assumption is that the Global South will remain ignorant and underdeveloped until it has access to the West’s knowledge, which is an idea that is historically grounded in presidential conceptions of development (Haider & Bawden, 2006). The second assumption is that the West should focus on the spread of its information instead of facilitating a true knowledge exchange, which illustrates what type of information is valued. Burkett (2000) finds that even asserting that some are “information poor” overlooks the types of information that might be important to a specific community. She states, “people may be ‘poor’ in terms of the information they can retrieve from the Internet but be rich in ways which could never be calculated in the Western scientific paradigm—in terms of sustainability, social relationships, community and cultural traditions” (p. 690).

The assumption that is most relevant to the discussion of OER here is that access to more information—which is different than access to knowledge (Burkett, 2000)—will alter exploitative colonialist histories and deeply rooted structural oppression. We see these assumptions being made in conversations surrounding the digital divide (Watters, 2015) and in the implementation of programs like One Laptop Per Child1 where access to technology—often technology that is not sustainable or integrated into the lives of the people supposed to be using it in a meaningful way (Burkett, 2000)—is seen as a viable opportunity for development and progress, often in a manner that is blind to an understanding of structural issues. Unfortunately, some research has found that these beliefs are well represented in LIS literature. In 2006, Haider and Bawden conducted an interpretive analysis of 35 English articles published between 1995 and 2005 in Library and Information Science journals, found by searching “information poverty OR poor.” They find that the “‘information poor’ are positioned as the legitimate target of professional practice” in LIS (p. 373). Many of the close readings they did identified language that connected a country or region’s educational inequality with a lack of professional librarians in that area, creating rhetoric that ignores the complexities of why inequality exists and positioning the librarian as savior (Haider & Bawden, 2006).

OER has also been connected to development and is often cited in conversations about global rights, specifically the right to education.2 Western universities sometimes use the need for global access to educational materials as an explanation for their commitment to OER creation.3 These explanations, while possibly well meaning, are destructive. They overestimate what OER can reasonably accomplish and use OER as a legitimate “solution” for larger inequalities. OER are only one piece of the solution and are not a substitute for an adequately funded and staffed education system (Bates, 2015).

When we consider who leads the Open Education movement, it is clear that these assumptions are in some ways also actively practiced within the movement. Right now, many OER aggregators function as somewhere to “dump” content or lessons already created in the hope that someone somewhere will be able to use it (Huijser, Bedford, & Bull, 2008). This is a problem because context is what makes an OER transferrable (Huijser, Bedford, & Bull, 2008). It is also a problem because “content creation (including educational content) on the Web is currently heavily dominated by the developed and English-speaking world” (Huijser, Bedford, & Bull, 2008, para 9). For example, Wiki Educator’s “Exemplary Collection of Open eLearning Content Repositories,” which has been cited as an important list of repositories (Atenas, 2012; Watters, 2012), is composed of primarily American and European-based repositories. Javiera Atenas’ list, which includes data from OER Research Hub, contains more global OER initiatives; still, over half of the repositories listed are Western. The creation of OER by Western institutions is not in itself a bad thing. However, it becomes troubling when these institutions promise that their OER will be useful or applicable to all learners globally for educational purposes. It is also disconcerting when access to content is touted as the educational solution when in reality affordable, sustainable “access to programs leading to credentials” is the real barrier (Bates, 2011, para 27).

Kim Christen (2012), an anthropologist at Washington State University, researches openness—specifically the openness of cultural heritage objects—and its connection to colonialism. She asserts that the “collecting history of Western nations is comfortably forgotten in the celebration of freedom and openness” (p. 2876). Her work rejects the argument that “information wants to be free” and instead asserts that information wants to be contextualized (Christen, 2012). She has done important work to provide that context to cultural heritage objects by creating licenses and a CMS that give power and autonomy back to indigenous communities. By using these tools, the community is able to decide if objects should be open, closed to the community, or open to a specific community or during a particular time based on the historical sharing of objects by season, status, or gender.

I believe that her assertions create a valuable framework for understanding OER advocacy. A learning object with relevant context, an application that is not culture-specific, and the capacity to be truly localized and understood is more important than a learning object that is simply free. In addition, while moving beyond a North-South information flow and developing a mechanism for reciprocal sharing is the goal, librarians should be cognizant of what risks other nations face in sharing their educational materials. We might find that having a conversation about these risks and contexts is more important than complete openness.

Content, Format, & Audience

In addition to how OER are used and discussed, the form of the OER itself has been critiqued. The term Open Educational Resources (OER) is sometimes used synonymously with textbooks or traditional learning objects like worksheets and lesson plans. However, OER, when defined broadly, can also include wikis, LibGuides, tutorials, syllabi, apps, and websites. This divide between what the term usually refers to and what it can include illustrates an important underlying assumption made about OER. We often think that OER are created in the academy for the academy. Because OER are often presented as a response to the exponentially increasing price of educational resources, their potential use is sometimes stunted. OER can also be used outside of traditional academic settings for self-learning purposes.

How, then, do OER continue to reproduce the academy, even if they are used for other purposes, both in format and in content? Many scholars have critiqued textbooks as a stagnant, oppressive format. Shaffer (2014) defines the traditional textbook as a “physically and legally fixed expression of ideas from a scholar outside [the class] learning community” (para 3). Wiggins & McTighe (2005), the authors of Understanding by Design, state that textbooks “can easily hide from students (and teachers) the true nature of the subject and the world of scholarship. Like an encyclopedia, few textbooks help students understand the inquiries, arguments, and judgments behind the summaries” (p. 230). Drabinski, et al. (2015) find textbooks “historically contingent” and the reproduction of them unrevolutionary. Why, then, are open textbooks often used as an example (if not the example) of OER? Why are there such extensive efforts to create more open textbooks?4 Further, how do textbooks, as the primary form of OER shared, limit self-learners outside of the academy? For example, when the goal is to present historically linear “truths” about a subject, more iterative and active forms of self-learning might be hindered.

This applies to content as well as format. If self-learners or even other instructors are going to use content meaningfully, OER have to move past the content “dump” (Huijser, Bedford, & Bull, 2008) toward context and an understanding of how and why the OER was made. Audrey Watters (2015) contends that ed-tech is coded with “[p]rivileges, ideologies, expectations, [and] values” (para 46). The same is true of OER. When learning objects are stripped of their environment, learning from them becomes more challenging (Bates, 2011).  Localization—going beyond simply translating an object and instead truly situating it in culture, values, and educational need (Pullin, Hassin, & Mora, 2007)—is vital, particularly as a large amount of Western OER continue to be created. Librarians can start by teaching others the importance of metadata and documentation in order to make OER more localizable.

Suggestions for OER Praxis

The following section builds upon the previous critiques of openness to provide starting points for more thoughtful, intentional OER practices within librarianship.

Use Realistic Language

After Haider (2007) performed a close reading of international OA documents, including mission statements and declarations like the Budapest Open Access Initiative, she found that OA was discussed alongside concepts “such as humanity, poverty, cultural heritage, or equity, which are all highly charged notions entangled with strong connotations and related to various agendas” (p. 454). Like OA, Open Education can sometimes be discussed in highly-charged terms. It is also often presented as a solution, not only for the rising costs of textbooks and other learning materials, but also for fixing education globally (see footnote 2). First and foremost, librarians need to be honest with stakeholders about what OER can accomplish. While sharing educational materials with other nations can foster learning, it is not that simple. OER should not be presented as the answer to structural inequality or used to disregard or replace serious funding issues in other nations’ higher education systems.

Librarians can situate OER within historical, economic, and cultural practices that make their capacity more clear. In other words, when we talk to stakeholders we can complicate access instead of simplifying it. We should continually stress that OERs are “important in helping to widen access to learning opportunities, but ultimately…are enhancements rather than a replacement for a well-funded public education system, which remains the core foundation for enabling equal access to educational opportunities” (Bates, 2015, key takeaway 6).

Interrogate Whose Knowledge Matters Globally

When talking to stakeholders, librarians might also move beyond the rhetoric of access to discuss reciprocal sharing. Even if it is free for “developing” nations to read papers (or access OER), it may still be too expensive for some scholars to publish these objects, further limiting the amount of reciprocal sharing happening and making research from other nations less visible (Bonaccorso, et al., 2014; Czerniewicz, 2013). Librarians can use language that problematizes access as a value, making the idea of true “access” more complex than simply giving other nations the ability to view Western content.

Move Past Dumping Toward Possible Localization (Or, Do Outreach Beyond the Learning Object)

Librarians should assert that the paywall is just one obstacle of many that learners in other nations face when utilizing an OER. Technology, language, and applicability are also important factors. What does it take for an OER to not just be translated but truly localized, truly applicable to others’ educational needs and prior understanding?

We can start by focusing on teaching instructors and OER creators how to design OER that are “easily adaptable to local needs” and can be easily translated, situated, and expanded upon (Huijser, Bedford, & Bull, 2008). Thus, our outreach to faculty about OER creation is shortsighted if it only discusses the actual learning object. We should be proactive about teaching faculty how to create documentation and supply metadata that gives meaning to their OER and makes it more discoverable. We should also teach instructors about technical standards and technological infrastructure required for accessing OER, especially videos and other objects that require a high bandwidth to view, and how this might exclude specific audiences (Pullin, Hassin, & Mora, 2007).

Move Beyond Cost

Librarians must acknowledge that while their institutions might be concerned with global education at some level, the marketization of OER might play a role in how OER work is funded, sustained, and prioritized. Quite simply, OER and OCW create “potentially beneficial marketing opportunities for universities and, by extension, a potential supply of future fee-paying students” (Huijser, Bedford, & Bull, 2008). This is not just a distraction but also a conflict of interest.

The price of textbooks increased 812 percent between 1978 and 2012 (Moxley, 2013), and this phenomenon affects students’ ability to engage in class in very real ways. Increasing access to educational materials, especially for students of lower socioeconomic status, is important work. Still, David Wiley (2013) has found that there are “much bigger victories to be won with openness” than cost (para 1). This is because we, as educators, can utilize OER in ways that are more meaningful than just making content free.

Robin DeRosa (2015) argues that there are a lot of ways that institutions could potentially save students money, including changing class sizes and closing facilities. She calls on educators to advocate for OER use not because of “the health of the institution” but instead for “the empowerment of the learner” (DeRosa, 2015). When librarians advocate for OER creation and use, they should go beyond rhetoric about cost or access and also explain how OER can be used to improve pedagogy. Librarians should also continually consider their role in furthering their institution’s goals and whether they might help shape those goals.

Use Open Pedagogy

Giroux (2002) writes that higher education cannot be viewed “merely as [a site] of commercial investment” because it is a public good where students gain a public voice and come to terms with their own power and agency (p. 432). The previous section challenged librarians to think beyond OER’s value in saving students’ money and instead apply OER to student learning. There are at least two ways that this can happen. The first is by incorporating the tenets of open pedagogy into library instruction sessions. The second is by using student OER creation as a springboard for important conversations about knowledge production. Librarians can also be active in helping other instructors, including faculty, learn how to do this in their classroom.

David Wiley (2015) has claimed that there is “nothing about OER adoption that forces innovative teaching practices on educators” (para 13). OER use becomes more meaningful in the classroom when it is combined with critical pedagogy, which fosters student agency and nurtures reflection and growth (Stommel, 2014). Robin DeRosa (2015) defines open pedagogy as instruction that:

  • Prioritizes community and collaboration instead of content
  • Connects the academy with the wider public
  • Is skeptical of end-points, final products, gatekeeping, and experts

Librarians can start by working toward instructional practices that embody these values. But it is naïve not to recognize that librarians face obstacles in doing so, particularly in having autonomy and power over what their instruction sessions will cover because of faculty members’ limited understanding of our work (Accardi, 2015; Wallis, 2015). Thus, if faculty on campus are not integrating open pedagogy into their classroom, it can be difficult for librarians to do so as well.

I would challenge us to think about our impact more broadly. While we might not have control over whether a class’ final research assignment is open or collaborative, we can start these conversations on campus. If we do outreach about openness or OER, it should cover the mechanics (like repositories and licensing) as well as how OER might be integrated into the classroom through open pedagogy. Librarians that do instruction can also use these tenets in their sessions or for-credit classes. We can spark interest by presenting research as a continuous community endeavor for students. If there is an opportunity to teach a for-credit course, we should explore how students might become producers of OER and other open content.

As an example, my institution is currently discussing how faculty might move away from assigning the traditional research paper and instead craft research assignments that empower students to create. Any consultation my team has with instructors about their research assignments should not only discuss the potential use of OER but also OER creation as an option for giving students agency over their learning. These conversations should continue to define OER broadly to include public-facing, hackable, iterative learning objects like wikis and blogs, instead of focusing solely on textbooks.

Teach Critical Openness & Labor

As students engage with OER, how can librarians help them understand knowledge production, intellectual property, and the privacy issues inherent in their project? Further, how can librarians leverage students’ experience creating OER as an opportunity to teach issues of labor as a response to the corporatization of higher education?

As students develop understanding in an area and are asked to create an open research project, they should also develop an understanding of how complex information creation is. The goal is for them to grasp that information is a social, public process instead of a final product (Lawson, Sanders, and Smith, 2015). First and foremost, students should be asked to reflect on this process. Librarians should advocate for continued reflection so that students can meaningfully consider the challenges inherent in creating instead of merely focusing on what was created.

One of the most important conversations librarians might have about knowledge production is about unseen labor. This conversation about labor can spark larger conversations about funding cuts, the adjunctification of higher education, and faculty reward systems. Cheney (2015) recommends being transparent with students about how funding in the higher education system works so that OER can be created. He proposes that instructors explain how tuition dollars fund faculty salaries, which support faculty research and instructional activities (Cheney, 2015). These funds, in addition to endowments or donations, enable faculty to create OER at no charge because they do not depend on revenue from OER for income. I would propose that we also push students by asking, “but what if the tenure track model is eliminated and faculty are suddenly supported by a wage that directly corresponds only with the number of classes they teach?” As students consider how much time it takes to complete their project and create an OER, librarians can facilitate these conversations.

As a disclaimer, while asking students to create OER in order to explore these issues firsthand is a great first step, this practice can become coercive or uncomfortable for students. If we ask them to create OER we cannot do so in order to take advantage of free labor to create more useful learning objects. We must also remember that some open practices might be based on behaviors that students are not comfortable with (Weller, 2014), including publishing their work in open, online venues. David Wiley (2013) proposes that educators build a place of trust with students when adopting open pedagogy. This happens by being transparent about why each activity is useful for learning and giving tangible examples of what a successful open project might look like (Wiley, 2013). This might also include asking students to think critically about whether or not they would like their project to be open, instead of requiring it to be. The conversation around why they might consider openness is much more valuable than simply making it a requirement.


To borrow language from Audrey Watters (2015), I believe that OER do not “magically flatten hierarchies” (slide 9). They are produced, used, and shaped by important historical and cultural contexts. Free and unrestricted access to OER is just one step in improving education, not the primary solution.

Librarians are well suited to do the integral work of reframing and complicating the OER movement. Our extensive understanding of copyright, instructional design, and discovery, combined with our interest in social justice, makes us natural leaders for helping others understand why Open Education matters. However, entertaining uncritical conceptions of development, the “information poor,” and the marketization of OER actually compromises our ability to do the work that we claim to value. The politics of our campuses or leadership can (and do) limit how loudly our voices carry within our institutions (Accardi, 2015; Wallis, 2015). Still, our critical perspective is needed now more than ever.


Many thanks to the In the Library with the Lead Pipe team for guiding me through my first peer-review publication process! I’d like to specifically thank my internal reviewer, Erin Dorney, and my publishing editor, Hugh Rundle, for their guidance and support throughout this journey. A huge thank you to my external editor, Robin DeRosa, who gave me the inspiration, confidence, and footing to write this article. Thanks for making both my writing and my ideas stronger.

Thanks also to Kyle Shockey, Heidi Johnson, Mattias Darrow, and Cara Evanson for their valuable insights on earlier drafts of this article. I couldn’t have done it without you! Thanks to Sveta Stoytcheva for convincing me that this idea was worth submitting and pushing me to stick with and trust this process. I so appreciate that even over 4,000 miles away, you’re still empowering me to be the best librarian I can be. Finally, thanks to everyone who supported me during this project, either professionally or personally.

References & Further Reading

Accardi, M. (2015, May 14). I do not think that the Framework is our oxygen mask. Retrieved from

Anjiah, L. (2006). Open access: Is it a futile option for developing countries? Proceedings from the Coady International Institute: The Open Access Movement and Information for Development. Retrieved from

Atenas, J. (2012, Oct 22). Directory of OER repositories. Retrieved from

Bates, T. (2015). The implications of ‘open’ for course and program design: Towards a paradigm shift? Retrieved from

Bates, T. (2011, Feb 6). OERs: The good, the bad and the ugly. Retrieved from

Bonaccorso, E., et al. (2014). Bottlenecks in the open-access system: Voices from around the globe. Journal of Librarianship and Scholarly Communication 2(2): eP1126.

Burkett, I. (2000). Beyond the ‘information rich and poor’: Future understandings of inequality in globalizing informational economies. Futures, 32(7), 679-694.

Chan, L. & Costa, S. (2005). Participation in the global knowledge commons: Challenges and opportunities for research dissemination in developing countries. New Library World, 106(3/4), 141 – 163.

Chan, et al. (2002). Budapest Open Access initiative. Retrieved from

Cheney, M. (2015, July 10). Gratis or libre, or, who pays for your bandwidth? Retrieved from

Christen, K. (2012). Does information really want to be free? Indigenous knowledge systems and the question of openness. International Journal of Communication, 6, 2870-2893.

Czerniewicz, L. (2013, April 29). Inequitable power dynamics of global knowledge production and exchange must be confronted head on. Retrieved from

d’Oliveira, C. & Lerman, S. (2009). OpenCourseWare: Working through financial challenges. Retrieved from

DeRosa, R. (2015). Beyond the buck: An expanded vision for Open Access. Talk presented at the University System of New Hampshire’s 2015 Open Educational Resources Unconference. Retrieved from

DeRosa, R. (2015, May 28). The open syllabus: A practical guide to open pedagogy in your course. Retrieved from

Drabinski, E., et al. (2015, Mar 25). Notes from open access, labor, and knowledge production. Retrieved from

Giroux, H. (2002). Neoliberalism, corporate culture, and the promise of higher education: The university as a democratic public sphere. Harvard Educational Review 72(4), 425-464.

Haider, J. (2007). Of the rich and the poor and other curious minds: On open access and ‘development’. Aslib Proceedings, 59 (4/5), 449-461.

Haider, J. & Bawden, D. (2006). Pairing information with poverty: Traces of development discourse in LIS. New Library World, 107(9/10), 371-385.

Huijser, H., Bedford, T., & Bull, D. (2008). OpenCourseWare, global access and the right to education: Real access or marketing ploy? The International Review of Research in Open and Distributed Learning, 9(1).

Joseph, H. (2015, March). Open expansion: Connecting the open access, open data, and OER dots. Invited talk presented at the Association of College and Research Libraries (ACRL) Conference, Portland, OR. Retrieved from

Kansa, E. (2014, Jan 27). It’s the Neoliberalism, stupid: Why instrumentalist arguments for Open Access, Open Data, and Open Science are not enough. Retrieved from

Kanwar, A., Kodhandaraman, B., & Umar, A. (2010). Toward sustainable open education resources: A perspective from the global south. American Journal of Distance Education, 24(2), 65-80.

Kember, S. (2014). Opening out from open access: Writing and publishing in response to neoliberalism. Ada: A Journal of Gender, New Media, and Technology, 4.

Kraft, T. (2015). On labor, learning conditions, and affordable education. Hybrid Pedagogy.

Kroes, N. (2012). Digital agenda and open data. Retrieved from

Lawson, S. (2015, June 15). Financial transparency and the political influence of commercial publishing. Retrieved from

Lawson, S. (2015, Oct 21). The politics of open access. Retrieved from

Lawson, S., Sanders, K., & Smith, L. (2015). Commodification of the information profession: A critique of higher education under neoliberalism. Journal of Librarianship and Scholarly Communication, 3(1), eP1182.

Moxley, J. (2013). Open textbook publishing. Retrieved from

Oblinger, D., & Lombardi, M. (2008). Common knowledge: Openness in higher education. In T. Iiyoshi and M.S. Vijay Kumar (Eds.), Opening up education: The collective advancement of education through open technology, open content, and open knowledge (pp. 389–400). Cambridge, MA: MIT Press.

Ojetade, B. (2012, March 6). A critic critiques criticism critically. Retrieved from

Powell, A. (2015, May 21). Availability does not equal access. Scholarly Kitchen. Retrieved from

Pullin, A., Hassin, K. & Mora, M. (2007, Nov). Conference report: Open Education 2007. Retrieved from

Roh, C., Drabinski, E., & Inefuku, H. (2015). Scholarly communication as a tool for social justice and diversity. Paper presented at the Association of College and Research Libraries (ACRL) Conference, Portland, OR. Retrieved from

Reich, J. (2011). Open educational resources expand educational inequalities. Retrieved from

Rosen, J., & Smale, M. (2015). Open digital pedagogy=Critical pedagogy. Hybrid Pedagogy.

Salaita, S. (2015, Oct 6). Why I was fired. Chronicle of Higher Education. Retrieved from

Shaffer, K. (2014). The critical textbook. Hybrid Pedagogy.

Shirazi, R. (2015, August 11). Work for hire: Library publishing, scholarly communication, and academic freedom. Retrieved from

Stommel, J. (2014). Critical digital pedagogy: A definition. Hybrid Pedagogy.

Vandegrift, M. (2014, June 2). The miseducation of scholarly communication: Beyond binaries and toward a transparent, information-rich publishing system. Retrieved from

Wallis, L. (2015, May 12). Smash all the gates, part 2: Professional silenc*. Retrieved from

Watters, A. (2012, Aug 22). OER repositories & directories. Retrieved from

Watters, A. (2014, Nov 16). From “open” to justice #OpenCon2014. Retrieved from

Watters, A. (2015, April 8). Ed-tech’s inequalities. Retrieved from

Weller, M. (2014). The battle for open: How openness won and why it doesn’t feel like a victory. London, UK: Ubiquity Press.

Wiggins, G. & McTighe, J. (2005). Understanding by design (2nd ed.). Alexandria, VA: Association for Supervision and Curriculum Development.

Wiley, D. (2015, Jan 31). Open pedagogy: The importance of getting in the air. Retrieved from

Wiley, D. (2013, Oct 21). What is open pedagogy? Retrieved from

Winn, J. (2012). Open education: From the freedom of things to the freedom of people. In L. Bell, H. Stevenson, & M. Neary (Eds.), Towards teaching in public: Reshaping the modern university (133-183). New York, NY: Continuum International Publishing Group.


  1. While Burkett alludes to how technology can exacerbate inequalities (p. 684), there are more tangible examples of how this discourse is specifically used with One Laptop per Child. In 2012, Audrey Watters summarized the failures of the OLPC initiative. Within her summary, she maintains that Nicholas Negroponte, the head of the foundation, truly believes that “children can learn (and teach each other) on their own. Children are naturally inquisitive; they are ingenious. Access to an Internet-enabled computing device is sufficient” (para 13). Another example is a Guardian article from 2005 where Negroponte states “Poverty can only be eliminated through education” (para 6). This rhetoric, combined with inadequate teacher training and the failure of the program, illustrates how dropping technology into a community, without context or purpose, is not meaningful.
  2. The twenty-sixth article of the Universal Declaration of Human Rights states that “everyone has the right to education” and that education “shall promote understanding, tolerance and friendship among nations” (article 26). These ideas are often cited and developed in conversations around OER. One example is the Cape Town Open Education Declaration, which states, “[OER] constitute a wise investment in teaching and learning for the 21st century… They will help teachers excel in their work and provide new opportunities for visibility and global impact. They will accelerate innovation in teaching. They will give more control over learning to the learners themselves. These are strategies that make sense for everyone” (para 10) and “we have an opportunity to dramatically improve the lives of hundreds of millions of people around the world through freely available, high-quality, locally relevant educational and learning opportunities” (para 11). Another document that employs this language is the 2012 Paris OER Declaration, which was created by UNESCO. It is important to note that the language that situates OER as a solution stems from rhetoric used about education as a solution more generally. One example includes remarks from US Secretary of Education, Arne Duncan: “[e]ducation is still the key to eliminating gender inequities, to reducing poverty, to creating a sustainable planet, and to fostering peace. And in a knowledge economy, education is the new currency by which nations maintain economic competitiveness and global prosperity…Closing the achievement gap and closing the opportunity gap is the civil rights issue of our generation” (as cited in Watters, 2015, para 2).
  3. This language is usually present on the institution’s repository or webpage. Examples include MIT’s OCW site, which states “educators improve courses and curricula, making their schools more effective; students find additional resources to help them succeed; and independent learners enrich their lives and use the content to tackle some of our world’s most difficult challenges, including sustainable development, climate change, and cancer eradication” (para 2) and Open Michigan’s site, which notes that the initiative will “dramatically [expand] the University’s global impact and influence and strengthening it as a point of reference for learning and teaching materials for educators and learners worldwide” (para 2).
  4. Some current examples include the University of Minnesota’s Open Textbook Library, Kansas State University’s Open/ Alternative Textbook Initiative, and Portland State University’s Open Access Textbook Initiative. These are not necessarily examples of linear or oppressive learning objects but instead examples of how we continue to replicate textbooks in an open environment.

Library of Congress: The Signal: Digital Library Federation to Host National Digital Stewardship Alliance

planet code4lib - Tue, 2015-10-20 21:11

The National Digital Stewardship Alliance announced that it has selected the Digital Library Federation (DLF), a program of the Council on Library and Information Resources (CLIR), to serve as NDSA’s institutional home starting in January 2016. The selection and announcement follow a nationwide search and evaluation of cultural heritage, membership, and technical service organizations, in consultation with NDSA working groups, their members, and external advisors.

Launched in 2010 by the Library of Congress as a part of the National Digital Information Infrastructure and Preservation Program with over 50 founding members, the NDSA works to establish, maintain, and advance the capacity to preserve our nation’s digital resources for the benefit of present and future generations. For an inaugural four-year term, the Library of Congress provided secretariat and membership management support to the NDSA, contributing working group leadership, expertise, and administrative support. Today, the NDSA has 165 members, including universities, government and nonprofit organizations, commercial businesses, and professional associations.

CLIR and DLF have, respectively, a 60- and 20-year track record of dedication to preservation and digital stewardship, with access to diverse communities of researchers, administrators, developers, funders, and practitioners in higher education, government, science, commerce, and the cultural heritage sector.

“We are delighted at this opportunity to support the important work of the NDSA and collaborate more closely with its leadership and vibrant community,” said DLF Director Bethany Nowviskie. “DLF shares in NDSA’s core values of stewardship, collaboration, inclusiveness, and open exchange. We’re grateful for the strong foundation laid for the organization by the Library of Congress, and look forward to helping NDSA enter a new period of imagination, engagement, and growth.”

CLIR President Chuck Henry added, “The partnership between NDSA and DLF should prove of significant mutual benefit and national import: both organizations provide exemplary leadership by promoting the highest standards of preservation of and access to our digital cultural heritage. Together they will guide us wisely and astutely further into the 21st century.”

The mission and structure of the NDSA will remain largely unchanged and it will be a distinct organization within CLIR and DLF, with all organizations benefiting from the pursuit of common goals while leveraging shared resources. “The Library of Congress fully supports the selection of DLF as the next NDSA host and looks forward to working with NDSA in the future,” said Acting Librarian of Congress David Mao. “The talent and commitment from NDSA members coupled with DLF’s deep experience in supporting collaborative work and piloting innovative digital programs will ensure that NDSA continues its excellent leadership in the digital stewardship community.”

“The Library of Congress showed great vision and public spirit in launching the NDSA. And with the Library’s support and guidance, NDSA has grown to embrace a broad community of information stewards,” said Micah Altman, chair of the NDSA Coordinating Committee. “With the support and leadership of CLIR and DLF we aspire to broaden and catalyze the information stewardship community to safeguard permanent access to the world’s scientific evidence base, cultural heritage, and public record.”

CLIR is an independent, nonprofit organization that forges strategies to enhance research, teaching, and learning environments in collaboration with libraries, cultural institutions, and communities of higher learning. It aims to promote forward-looking collaborative solutions that transcend disciplinary, institutional, professional, and geographic boundaries in support of the public good. CLIR’s 186 sponsoring institutions include colleges, universities, public libraries, and businesses.

The Digital Library Federation, founded in 1995, is a robust and diverse community of practice, advancing research, learning, and the public good through digital library technologies. DLF connects its parent organization, CLIR, to an active practitioner network, consisting of 139 member institutions, including colleges, universities, public libraries, museums, labs, agencies, and consortia. Among DLF’s NDSA-related initiatives are the eResearch Network, focused on data stewardship across disciplines, and the CLIR/DLF Postdoctoral Fellows program, with postdocs in data curation for medieval, early modern, visual studies, scientific, and social science data, and in software curation.

Jonathan Rochkind: Blacklight Strengths, Weaknesses, Health, and Future

planet code4lib - Tue, 2015-10-20 20:49
My Own Personal Opinion Analysis of Blacklight Strengths, Weaknesses, Health, and Future

My reflections on the Blacklight Community Survey results, and my own experiences with BL.

What people like about BL is its flexibility; what people don’t like is its complexity and backwards/forwards compatibility issues.

Developing any software, especially shared library/gem software, it is difficult to create a package which is on the one hand very flexible, extensible, and customizable; and on the other maintains a simple and consistent codebase, backwards compatibility with easy upgrades, and simple installation with a shallow learning curve for common use cases.

In my software engineering career, I see these tensions as one of the fundamental challenges in developing shared software. It’s not unique to Blacklight, but I think Blacklight is having some difficulties in weathering that challenge.

I think the diversity of Blacklight versions in use is a negative indicator for community health. People on old unsupported versions of BL (or Rails) can run into bugs which nobody can fix for them; and even if they put in work on debugging and fixing them themselves, it’s less likely to lead to a patch that can be of use to the larger BL community, since they’re working on an old version. It reduces the potential size of our collaborative development community. And it puts those running old versions of BL (or Rails) in a difficult spot eventually after much deferred upgrading, when they find themselves on unmaintained software with a very challenging upgrade path across many versions.

Also, if, when a new BL release is dropped, it’s not actually put into production by anyone (not even core committers?) for many months, that increases the chances that severe bugs are present but not yet found in even months-old releases (we have seen this happen), which can become a vicious circle that makes people even more reluctant to upgrade.

And we have some idea why BL applications aren’t being upgraded: even though only a bare minority of respondents reported going through a major BL upgrade, difficulty of upgrading is a heavily represented theme in the reported biggest challenges with Blacklight. I know I personally have found that maintaining a BL app responsibly (which to me means keeping up with Rails and BL releases without too much lag) has had a much higher “total cost of ownership” than I expected or would like; you can maybe guess that part of my motivation in releasing this survey was to see if I was alone, and I see I am not.

I think these pain points are likely to get worse: many existing BL deployments may have been originally written for BL 5.x and not yet had to deal with them but will; and many people currently using a “release and forget and never upgrade” practice may come to realize this is untenable. (“Software is a growing organism”, Ranganathan’s fifth law. Wait, Ranganathan wasn’t talking about software?)

To be fair, Blacklight core developers have gotten much better at backwards compatibility — in BL 4.x and especially 5.x — in the sense that backwards-incompatible changes within a major BL version are, with much success, kept minimal to non-existent (in keeping with semver’s requirements for release labelling). This is a pretty major accomplishment.
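Concretely, that labelling discipline is what lets an application opt in to new minor releases without fearing breakage. Here is a minimal Gemfile sketch of how an implementer might express that trust; the version numbers are illustrative only:

```ruby
# Gemfile (illustrative version numbers only)
source "https://rubygems.org"

# Semver-style release labelling makes a pessimistic constraint safe:
# "~> 5.14" picks up later 5.x bug-fix and feature releases automatically,
# but never a potentially breaking 6.0.
gem "rails",      "~> 4.2"
gem "blacklight", "~> 5.14"
```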

But the backwards compatibility is not accomplished by minimizing code or architectural churn or change. Rather the changes are still pretty fast and furious, but the old behavior is left in and marked deprecated. Ironically, this has the effect of making the BL codebase even more complicated and hard to understand, with multiple duplicative or incompatible architectural elements co-existing and sometimes never fully disappearing. (More tensions between different software quality values, inherent challenges to any large software project.)  In BL 5.x, the focus on maintaining backwards compat was fierce — but we sometimes got deprecated behavior in one 5.x release, with suggested new behavior, where that suggested new behavior was sometimes itself deprecated in a future 5.x release in favor of yet newer behavior.  Backwards compatibility is strictly enforced, but the developer’s burden of keeping up with churn may not be as lightened as one would expect.
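To make “leave the old behavior in and mark it deprecated” concrete, here is a minimal, hypothetical Ruby sketch of the general pattern; the class and method names are illustrative and this is not actual Blacklight code:

```ruby
require "active_support/deprecation"

class SearchService
  # New, preferred method introduced by a refactoring.
  def search_results(user_params)
    # ... build the query and return the response (omitted in this sketch) ...
    { params: user_params }
  end

  # Old method kept in place and marked deprecated so existing apps keep
  # working within the current major version; it simply delegates.
  def get_search_results(user_params)
    ActiveSupport::Deprecation.warn(
      "get_search_results is deprecated; call #search_results instead"
    )
    search_results(user_params)
  end
end
```

The old name keeps working (and nags its callers) until the next major version finally removes it, which is exactly how a codebase accumulates the duplicative, co-existing layers described above.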

Don’t get me wrong, I think some of the 5.x changes are great designs. I like the new SearchBuilder architecture. But it was dropped in bits and pieces over multiple 5.x releases, without much documentation for much of it, making it hard to keep up with as a non-core developer not participating in writing it.  And the current implementation still has, to my mind, some inconsistencies or non-optimal choices (like `!` on the end of a method or lack thereof being inconsistently used to signal a method mutates the receiver vs returns-a-dup) — which now that they are in a release, need to be maintained for backwards compatibility (or if changed in a major version drop, still cause backwards compat challenges for existing app maintainers; just labeling it a major version doesn’t reduce these challenges, only reducing the velocity of such changes does).
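For readers unfamiliar with the Ruby convention being referred to, here is a small illustrative sketch of the usual bang/non-bang distinction; this is not Blacklight’s actual SearchBuilder API:

```ruby
# Convention: a bang method mutates the receiver; its bang-less counterpart
# returns a modified copy and leaves the receiver untouched.
class QueryParams
  attr_reader :params

  def initialize(params = {})
    @params = params
  end

  # Non-bang: returns a new object; the receiver is unchanged.
  def with_facet(field, value)
    self.class.new(params.merge(field => value))
  end

  # Bang: modifies the receiver in place and returns it.
  def with_facet!(field, value)
    params[field] = value
    self
  end
end

base = QueryParams.new("q" => "maps")
copy = base.with_facet("format", "Book")  # base.params is unchanged
base.with_facet!("format", "Book")        # base.params now includes the facet
```

When a codebase applies this convention inconsistently, callers cannot tell from a method’s name whether the object is safe to share, which is the kind of small design wart that becomes hard to change once it ships in a release.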

In my own personal opinion, Blacklight’s biggest weakness and biggest challenge for continued and increased success is figuring out ways to maintain the flexibility, while significantly reducing code complexity, architectural complexity, code churn, and backwards incompatibility/deprecation velocity.

What can be done (in my own personal opinion)?

These challenges are not unique to Blacklight; they are tensions that, in my observation/opinion/experience, exist in nearly any shared non-trivial codebase. But Blacklight can, perhaps, choose to take a different tack in approaching them, focus on different priorities in code evolution, and think of practices to adopt to strike a better balance.

The first step is consensus on the nature of the problem (which we may not have, this is just my own opinion on the nature of the problem; I’m hoping this survey can help people think about BL’s strengths and weaknesses and build consensus).

In own brainstorming about possible approaches, I come up with a few, tentative, brainstorm-quality, possibilities:

  • Require documentation for major architectural components. We’ve built a culture in BL (and much of the open source world) that a feature isn’t done and ready to merge until it’s covered by tests; I think we should have a similar culture around documentation: a feature isn’t done and ready to merge until it’s documented. That culture is currently missing in BL, and in much of the open source world. But this adds a challenge: in a codebase with high churn, you now have to make sure to update the docs lest they become out of date and inaccurate too (something BL also hasn’t always kept up with)…
  • Accept less refactoring of internal architecture to make the code cleaner and more elegant. Sometimes you’ve just got to stick with what you’ve got, for longer, even if a change would improve the code architecture, as many of the recent changes have. There’s an irony here. Often the motivation for an internal architectural refactoring is to better support something one wants to do. You can do that in the current codebase, but in a hacky, not really supported way that’s likely to break in future BL versions. You want to introduce the architecture that will let you do what you want in a ‘safer’ way, for forwards compatibility. But the irony is that the constant refactorings to introduce these better architectures actually reduce forwards compatibility on net, as they are always breaking some existing code.
  • Be cautious of the desire to expel functionality to external plugins. External BL plugins generally receive less attention, they are likely not to be up to date with current BL, and it has been difficult to figure out what version of an external plugin actually is compatible with what version of BL. If you’re always on the bleeding edge, you don’t notice, but if you have an older version of BL and are trying to upgrade to a newer one, figuring out plugin compatibility in BL can be a major nightmare. Expelling code to plugins makes core BL easier to maintain, but at the cost of making the plugins harder to maintain, less likely to receive maintenance, and harder to use for BL installers. If the plugin code is an edge case not used by many people, that may make sense. But I continue to worry about the expulsion of MARC support to a plugin. MARC is not used by as many BL implementers as it used to be, but “library catalog/discovery” is still the BL use case for a third of survey respondents, and MARC is still used by nearly half.
  • Do major refactorings in segregated branches, only merging into master (and including in releases) when they are “fully baked”. What does fully baked mean? I guess it means understanding the use cases that will need to be supported; having a “story” about how to use the architecture for those use cases; and having a few people actually look over the code, try it out, and give feedback. In BL 5.x, there were a couple of major architectural refactorings, but they were released in dribs and drabs over multiple BL releases, sometimes reversing themselves, sometimes after realizing there were important use cases which couldn’t be supported. This adds TCO/maintenance burden to BL implementers, and adds a backwards-compat-maintaining burden to BL core developers when they realize something already released should have been done differently. If I understand right, the primary motivation for some of the major 5.x-6.0 architectural refactorings was to support ElasticSearch as an alternate back-end. But while these refactorings have already been released, there has actually been no demonstration of Blacklight running on ElasticSearch; it’s not done yet. Without such demonstrations trying and testing the architecture, how confident can we be that these refactorings will actually be sufficient or the right direction for the goal? Yet more redesigns may be needed before we get there.

When I brought up this last point with a core BL developer, he said that it was unrealistic to expect this could be possible, because of the limited developer resources available to BL.

It’s true that there are very few developers making non-trivial commits to BL, and that BL does function in an environment of limited developer resources, which is a challenge. However, studies have shown that most successful open source projects have the vast majority of commits contributed by only 1-3 developers. (Darnit, I can’t find the cite now.)

I wonder if, beyond developer resources as a ‘quantity’, the nature of the developers and their external constraints matters. Are many of the core BL developers working for vendors, where most hours need to be billable to clients on particular projects which need to be completed as quickly as practical? Or working for universities that take a similar ‘entrepreneurial’ approach, where most developer hours are spent on ‘sprints’ for particular features on particular projects?

Is anyone given time to steward the overall direction and architectural soundness of Blacklight?  If nobody really has such time, it’s actually a significant accomplishment that BL’s architecture has continued to evolve and improve regardless. But it’s not a surprise that it’s done so in a fairly chaotic and high-churn way, where people need to get just enough to accomplish the project in front of them into BL, and into a release asap.

I suspect that BL may, at this point in its development, need a bit more formality and transparency in who makes major decisions. (E.g., who decided that supporting ElasticSearch was a top priority, and how?) (And I say this as someone who, five years ago at the beginning of BL, didn’t think we needed any more formality there than a bunch of involved developers reaching consensus on a weekly phone call (that I don’t think happens anymore?). But I’ve learned from experience, and BL is at a different point in its life cycle now.)

In software projects where I do have some say (I haven’t made major commits to BL in at least 2-3 years), and which are expected to have long lives, I’ve come to push for a sort of “slow programming” (compare to ‘slow food’, etc.) approach. Consider changes carefully, even at the cost of reducing the velocity of improvements; release nothing into master before its time; prioritize backwards compatibility over time (not just over major releases, but actual calendar time). Treat your code like a bonsai tree, not a last-minute term paper. But sometimes you can get away with this and sometimes you can’t, sometimes your stakeholders will let you and sometimes they won’t, and sometimes it isn’t really the right decision.

Software design is hard!


pinboard: The Code4Lib Journal – Editorial Introduction: It’s All About Data, Except When It’s Not.

planet code4lib - Tue, 2015-10-20 16:46
RT @yo_bj: #code4lib Journal now out! It’s All About Data, Except When It’s Not: #mashcat #libtech #metadata

District Dispatch: Digital Inclusion Survey documents libraries transforming

planet code4lib - Tue, 2015-10-20 15:30

This week marks the release of one of the most powerful tools ALA and public libraries have to make our case in the digital age—the 2014 Digital Inclusion Survey. Not only does it provide the most current and granular data available on library technology and programming resources, but the Information Policy & Access Center team at the University of Maryland is doing more with the data than ever before (more on that later).

So what did we learn this year?

  • Helping people identify health insurance resources was the top health and wellness program offering from public libraries at 59.4 percent. Programs related to helping patrons locate and evaluate free health information (57.7%) and use subscription health and wellness resources (56.2%) were right behind, and roughly 20 percent of libraries now offer fitness classes and bring in healthcare providers for screening services at the library.

Connecticut State Librarian Ken Wiggin isn’t surprised. “What we are hearing from our health exchange is that in addition to assisting individuals to register, many of those who have registered lack an understanding of how to utilize their health insurance,” he said. The state library will hold a health library fair for librarians across Connecticut. A panel of health information experts from the National Network of Libraries of Medicine, our state department of Public Health, and other agencies will discuss the resources that their agencies provide. There also will be information tables from these and many other health agencies.

  • Digital content offerings continue to climb, with more than 90 percent of public libraries offering e-books, online homework assistance (95%), and online language learning (56%), to name a few. Recent data from library ebook supplier OverDrive finds that more than 120 million e-books and audiobooks were borrowed from libraries it supplies in the first nine months of 2015, representing year-over-year growth of almost 20 percent.

  • For the first time, the survey also looked at the age of library buildings and found 1970 (!) was the average year that library locations opened. The report also finds a correlation between building renovations and increased service offerings. The biggest gaps can be seen in libraries offering afterschool programming and STEAM events, in which 52% and 48% of renovated libraries, respectively, offered these services compared with 33% and 31% for libraries without renovations in the past five years. “This new analysis points to an outsize impact on community services in cases where the physical space is not able to keep pace with modern technology needs,” said John Bertot, survey lead researcher.

While these findings may not surprise m/any of us in the profession, they can be startling for policymakers and even local community members. As the Pew Research Center (and others) have found time and again, many people are unaware of the services modern libraries offer. Because many people in positions of power do not yet recognize the extent to which libraries can be catalysts for individual opportunity and community progress, the nation underinvests in libraries. To reverse this trend, library allies must unite around shared policy goals and work together to educate and influence decision makers. In fact, this is a driving force behind the Policy Revolution! initiative.

This year’s research, and the nearly 20 years of Public Libraries and the Internet data before it, help us inform policymakers, researchers, the media, and the general public. What a mark John Bertot and his many collaborators have made in tracking our transitions!

Funded by the Institute of Museum and Library Services and managed by the ALA Office for Research & Statistics and the Information Policy and Access Center at the University of Maryland, the Digital Inclusion Study provides national- and state-level data. The International City/County Management Association and ALA Office for Information Technology Policy are partners in the research effort, and we have a great advisory committee.

Future blog posts will share more on the data tools and uses. Stay tuned!


Open Knowledge Foundation: Seeking a Chief Operating Officer

planet code4lib - Tue, 2015-10-20 15:24

The mission of Open Knowledge International is to open up all essential public interest information and see it utilized to create insight that drives change. To this end we work to create a global movement for open knowledge, supporting a network of leaders and local groups around the world; we facilitate coordination and knowledge sharing within the movement; we build collaboration with other change-making organisations both within our space and outside; and, finally, we prototype and provide a home for pioneering products.

A decade after its foundation, Open Knowledge International is ready for its next phase of development. We started as an organisation that led the quest for the opening up of existing data sets – and in today’s world most of the big data portals run on CKAN, an open source software product developed first by us.

Today, it is not only about opening up data; it is about making sure that this data is usable, useful and – most importantly – used to improve people’s lives. Our current projects (OpenSpending, OpenTrials, School of Data, and many more) all aim towards giving people access to data, the knowledge to understand it, and the power to use it in their everyday lives.

With this development comes a new organisational structure, new processes that support this mission, and new ways of working together. Therefore, for the first time in our history, we are now looking for a dedicated COO to support us in developing and sustaining a world-class organisation.

Chief Operating Officer

(flexible location, 30 hours to full time)

Here is what we need you to do:

  • Develop and implement a lean project management model that supports the diverse project portfolio of Open Knowledge International, and that enables staff and contractors to plan, execute, and deliver high-impact projects;
  • Design strategies, policies and practices so that they fit our needs as a distributed organisation that spans many countries and timezones. This includes – but is not limited to – internal communications tools, ways to collaborate effectively in teams, and methods to assess and report on progress and impact;
  • Being a virtual organisation (without a central office) challenges us to have great and supportive HR processes in place, which take people’s experience and expectations into account, and support development of individual staff and of the organisation as a whole. You will be responsible for leading this, as well as helping us find and retain great talent;
  • Work with our Chief of Finance on all financial processes around budget planning, tracking, and reporting.

To be able to fulfill this role, you will need extensive experience in running the internal processes of a mid-sized organisation, preferably including a (partially) virtual one. You will have a proven track record of project management skills, both in running projects yourself and in implementing methodology. You can show that you have implemented a variety of processes in organisations, and that these organisations performed better afterwards. Demonstrable experience in dealing with legal matters is required, as well as a solid understanding of Human Resources, especially regarding the professional and personal development of staff, both in terms of high-level strategy and day-to-day operations.

Personally, you have a demonstrated commitment to working collaboratively, with respect and a focus on results over credit.

You are comfortable working with people from different cultural, social and ethnic backgrounds. You are happy to share your knowledge with others, and you find working in transparent and highly visible environments interesting and fun.

Rather than your formal education, we believe that your track record over the last 10 years speaks most clearly to your abilities. You communicate in English like a native.

We demand a lot, but we offer a great opportunity as well: together with the CEO and the Portfolio Director, the COO forms the Senior Management Team of Open Knowledge International. You will be at the heart of the development of Open Knowledge International, able to make a huge impact and shape our future.

We also encourage people who are looking to re-enter the workplace to apply, and are willing to adjust working hours to suit.

You should be based somewhere between the time zones UTC-1 and UTC+3. You can work from home, with flexibility offered and required. You will be compensated with a market salary, in line with the parameters of a non-profit organisation.

Interested? Then send us a motivational letter and a one page CV via Please indicate your current country of residence, as well as your salary expectations (in GBP) and your earliest availability.

If you have any questions, please direct them to Naomi Lillie, via mail naomi.lillie [at]

David Rosenthal: Storage Technology Roadmaps

planet code4lib - Tue, 2015-10-20 15:00
At the recent Library of Congress Storage Architecture workshop, Robert Fontana of IBM gave an excellent overview of the roadmaps for tape, disk, optical and NAND flash (PDF) storage technologies in terms of bit density and thus media capacity. His slides are well worth studying, but here are his highlights for each technology:
  • Tape has a very credible roadmap out to LTO10 with 48TB/cartridge somewhere around 2022.
  • Optical's roadmap shows increases from the current 100GB/disk to 200, 300, 500 and 1000GB/disk, but there are no dates on them. At least two of those increases will encounter severe difficulties making the physics work.
  • The hard disk roadmap shows the slow increase in density that has prevailed for the last 4 years continuing until 2017, when it accelerates to 30%/yr. The idea is that in 2017 Heat Assisted Magnetic Recording (HAMR) will be combined with shingling, and then in 2021 Bit Patterned Media (BPM) will take over, and shortly after be combined with HAMR.
  • The roadmap for NAND flash is for density to increase in the near term by 2-3X and over the next 6-8 years by 6-8X. This will require significant improvements in processing technology but "processing is a core expertise of the semiconductor industry so success will follow".
Below the fold, my comments.
  • As I've written before, tape recording technology tends to lag hard disk by about 8 years, so the tape roadmap out to 2022 is very credible. But even though 48TB/cartridge is impressive, it may not matter. Tape is losing market share in cold storage for reasons other than raw cartridge capacity, and with media vendors dropping out, and system vendors down to three, customers have to be increasingly concerned about its long-term viability.
  • The optical roadmap seems less credible. There are no dates, there are significant physical problems to overcome, and, unless the roadmap extends through the next decade, it implies a significant acceleration over the historical 12%/yr density increase (see the short calculation after this list).
  • My friend Dr. Pangloss would enjoy the hard disk roadmap. HAMR was supposed to ship in 2009. Six years later, it has yet to ship in volume. Everyone understands that the HAMR-BPM transition will be even harder than PMR-HAMR. The idea that, after being six years late, HAMR will be in the market for only six years before being supplanted by BPM only six years from the proof-of-concept stage is positively Panglossian.
  • The cross-section of 3D NAND flash in the slides is mind-boggling. On the other hand, Fontana is right that betting against the semiconductor industry's ability to get process technology working has a poor track record. So this roadmap is fairly credible.
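To put the optical comment in perspective, here is a minimal back-of-the-envelope sketch in Python. The 12%/yr figure is the historical rate mentioned above and the capacity steps are the ones on the roadmap; the rest is just compound-growth arithmetic, not anything taken from Fontana's slides:

    import math

    historical_rate = 0.12                  # ~12%/yr historical optical density growth
    current_gb = 100                        # current capacity per disk, in GB
    roadmap_steps = [200, 300, 500, 1000]   # undated capacity steps on the roadmap, in GB

    # Years of compound 12%/yr growth needed to reach each roadmap step
    for target in roadmap_steps:
        years = math.log(target / current_gb) / math.log(1 + historical_rate)
        print(f"{current_gb} GB -> {target} GB at 12%/yr: about {years:.1f} years")

At the historical rate the 1000GB step is roughly two decades away, which is why an undated roadmap implying anything faster deserves skepticism.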
So far, all this is in terms of bits per square inch. But what we're really interested in is $/GB. Daniel Rosenthal's research suggests that, historically, the increase in bits per square inch for hard disk accounted for about 3/4 of the decrease in $/GB. There are reasons to believe that the factors accounting for the remaining 1/4 are now much less effective, so $/GB should track density more closely.

One fascinating slide in Fontana's presentation shows, for tape, flash and hard disk, the total revenue for a year divided by the number of gigabytes shipped. Extracting these numbers from the slide and dividing the $/GB for flash by the $/GB for hard disk gives us the cost ratio shown in the following table:
  Year    Cost Ratio
  2008    12.2
  2009    13.1
  2010    17.7
  2011    11.6
  2012    7.8
  2013    8.7
  2014    8.4
As you can see, despite the disk industry's troubles with floods and new technology, the $/GB of flash has remained about an order of magnitude greater than that of hard disk. It is a more valuable product, so even if there were no supply constraints, it would command a higher price. But there are supply constraints, as I pointed out in Another good prediction. So hard disk will continue to be the medium on which bulk data lives and dies.
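
For anyone who wants to redo the arithmetic behind the table, here is a minimal sketch. The revenue and shipment figures below are placeholders invented for illustration, not Fontana's actual numbers; only the method (annual revenue divided by gigabytes shipped, then the flash figure divided by the disk figure) comes from the slide:

    # Placeholder annual figures; substitute the real values from Fontana's slide.
    flash_revenue_usd = 30e9     # total flash revenue for the year (assumed)
    flash_gb_shipped  = 75e9     # gigabytes of flash shipped (assumed)
    disk_revenue_usd  = 33e9     # total hard disk revenue for the year (assumed)
    disk_gb_shipped   = 700e9    # gigabytes of hard disk shipped (assumed)

    flash_per_gb = flash_revenue_usd / flash_gb_shipped   # $/GB for flash
    disk_per_gb  = disk_revenue_usd / disk_gb_shipped     # $/GB for hard disk
    cost_ratio   = flash_per_gb / disk_per_gb

    print(f"flash ${flash_per_gb:.2f}/GB, disk ${disk_per_gb:.3f}/GB, ratio {cost_ratio:.1f}")

With these made-up inputs the ratio comes out around 8.5, in the same ballpark as the recent years in the table; plugging in the real numbers from the slide should reproduce the table's values.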

The roadmaps for post-flash solid-state technologies such as 3D XPoint are necessarily speculative, since they are still some way from shipping in volume. But by analogy with flash we can see the broad outlines. They are a better technology than flash: 1000 times faster than NAND, with 1000 times the endurance, and 100 times the density. So even if the manufacturing cost were the same, they would command a price premium. The manufacturing cost will initially be much higher because of low volumes, and will take time to ramp down.

Jonathan Rochkind: “Agile Failure Patterns In Organizations”

planet code4lib - Tue, 2015-10-20 13:39

An interesting essay showed up on Hacker News, called “Agile Failure Patterns In Organizations”.

Where I am, we’ve made some efforts to move to a more small-a agile, iterative and incremental development approach in different ways, and I think it’s been successful in some ways and less successful in others. (Really, I would say we’ve been trying to do this since before we’d even heard the word “agile”.)

Parts of the essay seem a bit too Scrum-focused to me (I’m sold on the general principle of agile development; I’m less sold on Scrum(tm)), and I’m not sure about the list of “Agile Failures at a Team Level”, but the list of “Agile Failures at Organizational Level”… rings some bells for me, loudly.

Agile Failure At Organizational Level:
  • Not having a (product) vision in the first place: if you don’t know where you are going, any road will take you there.
  • The fallacy of “We know what we need to build”: there is no need for product discovery or hypothesis testing; senior management can define what is relevant for the product backlog.
  • A perceived loss of control at management level leads to micro-management.
  • The organization is not transparent with regard to vision and strategy, hence the teams are hindered from becoming self-organizing.
  • There is no culture of failure: Teams therefore do not move out of their comfort zones, but instead play safe.
  • The organization is not optimized for a rapid build-test-learn culture and thus departments are moving at different speeds. The resulting friction is likely to cancel out previous Agile gains.
  • Senior management is not participating in Agile processes, e.g. sprint demos, despite being expected to act as a role model. But they do expect a different form of (push) reporting.
  • Not making organizational flaws visible: The good thing about Agile is that it will identify all organizational problems sooner or later. „When you put problem in a computer, box hide answer. Problem must be visible!“ Hideshi Yokoi, former President of the Toyota Production System Support Center in Erlanger, Kentucky, USA
  • Product management is not perceived as the “problem solver and domain expert” within the organization, but as the guys who turn requirements into deliverables, aka “Jira monkeys”.
  • Other departments fail to involve product management from the start. A typical behavior in larger organizations is a kind of silo thinking, characterized by local optimization efforts without regard to the overall company strategy, often driven by individual incentives, e.g. bonuses. (Personal agendas are not always aligned with the company strategy.)
  • Core responsibilities of product management are covered by other departments, e.g. tracking, thus leaving product dependent on others for data-driven decisions.
  • Product managers w/o a dedicated team can be a problem, particularly if the product management team is oversized relative to the engineering team.

How about you? Do some of those ring so true that they make you wonder if the author has been studying your organization?

Filed under: General

Code4Lib Journal: Editorial Introduction: It’s All About Data, Except When It’s Not.

planet code4lib - Tue, 2015-10-20 13:18
Data capture and use is not new to libraries. We know data isn't everything, but it is ubiquitous in our work, enabling myriads of new ideas and projects. Articles in this issue reflect the expansion of data creation, capture, use, and analysis in library systems and services.

