Planet Code4Lib - http://planet.code4lib.org

Jonathan Rochkind: “Gates Foundation to require immediate free access for journal articles”

Mon, 2014-11-24 22:25

http://news.sciencemag.org/funding/2014/11/gates-foundation-require-immediate-free-access-journal-articles

Gates Foundation to require immediate free access for journal articles

By Jocelyn Kaiser 21 November 2014 1:30 pm

Breaking new ground for the open-access movement, the Bill & Melinda Gates Foundation, a major funder of global health research, plans to require that the researchers it funds publish only in immediate open-access journals.

The policy doesn’t kick in until January 2017; until then, grantees can publish in subscription-based journals as long as their paper is freely available within 12 months. But after that, the journal must be open access, meaning papers are free for anyone to read immediately upon publication. Articles must also be published with a license that allows anyone to freely reuse and distribute the material. And the underlying data must be freely available.

 

Is this going to work? Will researchers be able to comply with these requirements without harm to their careers?  Does the Gates Foundation fund enough research that new open access venues will open up to publish this research (and if so how will their operation be funded?), or do sufficient venues already exist? Will Gates Foundation grants include funding for “gold” open access fees?

I am interested to find out. I hope this article is accurate about what they’re doing, and if so, I am glad they are doing it.

The Gates Foundation’s own announcement appears to be here, and their policy, which doesn’t answer very many questions but does seem to be bold and without wiggle-room, is here.

I note that the policy mentions “including any underlying data sets.”  Do they really mean to be saying that underlying data sets used for all publications “funded, in whole or in part, by the foundation” must be published? I hope so.  Requiring “underlying data sets” to be available at all is in some ways just as big as, or bigger than, requiring them to be available open access.


Filed under: General

FOSS4Lib Upcoming Events: BitCurator Users Forum

Mon, 2014-11-24 21:55
Date: Friday, January 9, 2015 - 08:00 to 17:00
Supports: BitCurator

Last updated November 24, 2014. Created by Peter Murray on November 24, 2014.

Join BitCurator users from around the globe for a hands-on day focused on current use and future development of the BitCurator digital software environment. Hosted by the BitCurator Consortium (BCC), this event will be grounded in the practical, boots-on-the-ground experiences of digital archivists and curators. Come wrestle with current challenges—engage in disc image format debates, investigate emerging BitCurator integrations and workflows, and discuss the “now what” of handling your digital forensics outputs.

HangingTogether: What languages do public library collections speak?

Mon, 2014-11-24 21:04

Slate recently published a series of maps illustrating the languages other than English spoken in each of the fifty US states. In nearly every state, the most commonly spoken non-English language was Spanish. But when Spanish is excluded as well as English, a much more diverse – and sometimes surprising – landscape of languages is revealed, including Tagalog in California, Vietnamese in Oklahoma, and Portuguese in Massachusetts.

Public library collections often reflect the attributes and interests of the communities in which they are embedded. So we might expect that public library collections in a given state will include relatively high quantities of materials published in the languages most commonly spoken by residents of the state. We can put this hypothesis to the test by examining data from WorldCat, the world’s largest bibliographic database.

WorldCat contains bibliographic data on more than 300 million titles held by thousands of libraries worldwide. For our purposes, we can filter WorldCat down to the materials held by US public libraries, which can then be divided into fifty “buckets” representing the materials held by public libraries in each state. By examining the contents of each bucket, we can determine the most common language other than English found within the collections of public libraries in each state:

MAP 1: Most common language other than English found in public library collections, by state
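For readers who want to experiment with this kind of analysis, here is a minimal sketch in Python of the bucketing step described above. It assumes a hypothetical extract file, holdings.csv, with one row per state/language pair (WorldCat itself is not available as a public download, so the file name and columns are invented for illustration).

```python
import csv
from collections import Counter, defaultdict

# state -> Counter mapping language -> total public-library holdings
buckets = defaultdict(Counter)

with open("holdings.csv", newline="") as f:
    for row in csv.DictReader(f):  # hypothetical columns: state, language, holdings
        buckets[row["state"]][row["language"]] += int(row["holdings"])

EXCLUDE = {"English"}  # add "Spanish" here to reproduce the Map 2 view

for state in sorted(buckets):
    filtered = Counter({lang: n for lang, n in buckets[state].items()
                        if lang not in EXCLUDE})
    if not filtered:
        continue
    language, count = filtered.most_common(1)[0]
    print(f"{state}: {language} ({count} items)")
```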

As with the Slate findings regarding spoken languages, we find that in nearly every state, the most common non-English language in public library collections is Spanish. There are exceptions: French is the most common non-English language in public library collections in Massachusetts, Maine, Rhode Island, and Vermont, while German prevails in Ohio. The results for Maine and Vermont complement Slate’s finding that French is the most commonly spoken non-English language in those states – probably a consequence of Maine and Vermont’s shared borders with French-speaking Canada. The prominence of German-language materials in Ohio public libraries correlates with the fact that Ohio’s largest ancestry group is German, accounting for more than a quarter of the state’s population.

Following Slate’s example, we can look for more diverse language patterns by identifying the most common language other than English and Spanish in each state’s public library collections:

MAP 2: Most common language other than English and Spanish found in public library collections, by state

Excluding both English- and Spanish-language materials reveals a more diverse distribution of languages across the states. But only a bit more diverse: French now predominates, representing the most common language other than English and Spanish in public library collections in 32 of the 50 states. Moreover, we find only limited correlation with Slate’s findings regarding spoken languages. In some states, the most common non-English, non-Spanish spoken language does match the most common non-English, non-Spanish language in public library collections – for example, Polish in Illinois, Chinese in New York, and German in Wisconsin. But only about a quarter of the states (12) match in this way; the majority do not. Why is this so? Perhaps materials published in certain languages have low availability in the US, are costly to acquire, or both. Maybe other priorities drive collecting activity in non-English materials – for example, a need to collect materials in languages that are commonly taught in primary, secondary, and post-secondary education, such as French, Spanish, or German.

Or perhaps a ranking of languages by simple counts of materials is not the right metric. Another way to assess if a state’s public libraries tailor their collections to the languages commonly spoken by state residents is to compare collections across states. If a language is commonly spoken among residents of a particular state, we might expect that public libraries in that state will collect more materials in that language compared to other states, even if the sum total of that collecting activity is not sufficient to rank the language among the state’s most commonly collected languages (for reasons such as those mentioned above). And indeed, for a handful of states, this metric works well: for example, the most commonly spoken language in Florida after English and Spanish is French Creole, which ranks as the 38th most common language collected by public libraries in the state. But Florida ranks first among all states in the total number of French Creole-language materials held by public libraries.

But here we run into another problem: the great disparity in size, population, and ultimately, number of public libraries, across the states. While a state’s public libraries may collect heavily in a particular language relative to other languages, this may not be enough to earn a high national ranking in terms of the raw number of materials collected in that language. A large, populous state, by sheer weight of numbers, may eclipse a small state’s collecting activity in a particular language, even if the large state’s holdings in the language are proportionately less compared to the smaller state. For example, California – the largest state in the US by population – ranks first in total public library holdings of Tagalog-language materials; Tagalog is California’s most commonly spoken language after English and Spanish. But surveying the languages appearing in Map 2 (that is, those that are the most commonly spoken language other than English and Spanish in at least one state), it turns out that California also ranks first in total public library holdings for Arabic, Chinese, Dakota, French, Italian, Korean, Portuguese, Russian, and Vietnamese.

To control for this “large state problem”, we can abandon absolute totals as a benchmark, and instead compare the ranking of a particular language in the collections of a state’s public libraries to the average ranking for that language across all states (more specifically, those states that have public library holdings in that language). We would expect that states with a significant population speaking the language in question would have a state-wide ranking for that language that exceeds the national average. For example, Vietnamese is the most commonly spoken language in Texas other than English and Spanish. Vietnamese ranks fourth (by total number of materials) among all languages appearing in Texas public library collections; the average ranking for Vietnamese across all states that have collected materials in that language is thirteen. As we noted above, California has the most Vietnamese-language materials in its public library collections, but Vietnamese ranks only eighth in that state.
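Continuing with the hypothetical buckets structure from the earlier sketch, the rank comparison could be computed along these lines (the Texas and Vietnamese figures in the comments come from the text above, not from live data):

```python
from collections import Counter

def language_rank(langs, language):
    """1-based rank of `language` by holdings within one state's bucket."""
    ranked = [lang for lang, _ in langs.most_common()]
    return ranked.index(language) + 1 if language in ranked else None

def average_rank(buckets, language):
    """Average rank across all states that hold the language at all."""
    ranks = [language_rank(langs, language) for langs in buckets.values()
             if language in langs]
    return sum(ranks) / len(ranks)

# With real data we would expect something like:
# language_rank(buckets["Texas"], "Vietnamese") -> 4
# average_rank(buckets, "Vietnamese")           -> ~13
```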

Map 3 shows the comparison of the state-wide ranking with the national average for the most commonly spoken language other than English and Spanish in each state:

MAP 3: Comparison of state-wide ranking with national average for most commonly spoken language other than English and Spanish

Now it appears we have stronger evidence that public libraries tend to collect heavily in languages commonly spoken by state residents. In thirty-eight states (colored green), the state-wide ranking of the most commonly spoken language other than English and Spanish in public library collections exceeds – often substantially – the average ranking for that language across all states. For example, the most commonly spoken non-English, non-Spanish language in Alaska – Yupik – is only the 10th most common language found in the collections of Alaska’s public libraries. However, this ranking is well above the national average for Yupik (182nd). In other words, Yupik is considerably more prominent in the materials held by Alaskan public libraries than in the nation at large – in the same way that Yupik is relatively more common as a spoken language in Alaska than elsewhere.

As Map 3 shows, six states (colored orange) exhibit a ranking equal to the national average; in all of these cases the language in question is French or German, languages that tend to be highly collected everywhere (the average ranking for French is four, and for German, five). Five states (colored red) exhibit a ranking that is below the national average; in four of the five cases, the state ranking is only one notch below the national average.

The high correlation between languages commonly spoken in a state, and the languages commonly found within that state’s public library collections suggests that public libraries are not homogenous, but in many ways reflect the characteristics and interests of local communities. It also highlights the important service public libraries provide in facilitating information access to community members who may not speak or read English fluently. Finally, public libraries’ collecting activity across a wide range of non-English language materials suggests the importance of these collections in the context of the broader system-wide library resource. Some non-English language materials in public library collections – perhaps the French Creole-language materials in Florida’s public libraries, or the Yupik-language materials in Alaska’s public libraries – could be rare and potentially valuable items that are not readily available in other parts of the country.

Visit your local public library … you may find some unexpected languages on the shelf.

Acknowledgement: Thanks to OCLC Research colleague JD Shipengrover for creating the maps.

Note on data: Data used in this analysis represent public library collections as they are cataloged in WorldCat. Data is current as of July 2013. Reported results may be impacted by WorldCat’s coverage of public libraries in a particular state.

 

About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.


Karen Coyle: Multi-Entity Models.... Baker, Coyle, Petiya

Mon, 2014-11-24 19:23
Multi-Entity Models of Resource Description in the Semantic Web: A comparison of FRBR, RDA, and BIBFRAME
by Tom Baker, Karen Coyle, Sean Petiya
Published in: Library Hi Tech, v. 32, n. 4, 2014, pp. 562-582. DOI: 10.1108/LHT-08-2014-0081
Open Access Preprint

The above article was just published in Library Hi Tech. However, because the article is a bit dense, as journal articles tend to be, here is a short description of the topic covered, plus a chance to reply to the article.

We now have a number of multi-level views of bibliographic data. There is the traditional "unit card" view, reflected in MARC, that treats all bibliographic data as a single unit. There is the FRBR four-level model that describes a single "real" item, and three levels of abstraction: manifestation, expression, and work. This is also the view taken by RDA, although employing a different set of properties to define instances of the FRBR classes. Then there is the BIBFRAME model, which has two bibliographic levels, work and instance, with the physical item as an annotation on the instance.

In support of these views we have three RDF-based vocabularies:

FRBRer (using OWL)
RDA (using RDFS)
BIBFRAME (using RDFS)

The vocabularies use varying degrees of specification. FRBRer is the most detailed and strict, using OWL to define cardinality, domains and ranges, and disjointness between classes and between properties. There are, however, no sub-classes or sub-properties. BIBFRAME properties are all defined in terms of domains (classes), and there are some sub-class and sub-property relationships. RDA has a single set of classes that are derived from the FRBR entities, and each property has the domain of a single class. RDA also has a parallel vocabulary that defines no class relationships; thus, no properties in that vocabulary result in a class entailment. [1]
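To make “class entailment” concrete, here is a small sketch, assuming the Python rdflib and owlrl packages; this is not the paper’s actual test setup, and the namespace and property names are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from owlrl import DeductiveClosure, RDFS_Semantics

EX = Namespace("http://example.org/")
g = Graph()

# A property declared with rdfs:domain ex:Work, used on some resource:
g.add((EX.titleOfWork, RDFS.domain, EX.Work))
g.add((EX.thing1, EX.titleOfWork, Literal("Hamlet")))

# A "parallel" property with no declared domain:
g.add((EX.thing2, EX.title, Literal("Hamlet")))

# Compute the RDFS closure (applies the rdfs:domain entailment rule).
DeductiveClosure(RDFS_Semantics).expand(g)

print((EX.thing1, RDF.type, EX.Work) in g)  # True: domain entails a class
print((EX.thing2, RDF.type, EX.Work) in g)  # False: no entailment occurs
```

The first property behaves like a domain-bearing property: merely using it entails that the subject is a Work. The second behaves like the parallel, domain-free vocabulary: no class membership follows.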

As I talked about in the previous blog post on classes, the meaning of classes in RDF is often misunderstood, and that is just the beginning of the confusion that surrounds these new technologies. Recently, Bernard Vatant, who is a creator of the Linked Open Vocabularies site that does a statistical analysis of the existing linked open data vocabularies and how they relate to each other, said this on the LOV Google+ group:
"...it seems that many vocabularies in LOV are either built or used (or both) as constraint and validation vocabularies in closed worlds. Which means often in radical contradiction with their declared semantics."What Vatant is saying here is that many vocabularies that he observes use RDF in the "wrong way." One of the common "wrong ways" is to interpret the axioms that you can define in RDFS or OWL the same way you would interpret them in, say, XSD, or in a relational database design. In fact, the action of the OWL rules (originally called "constraints," which seems to have contributed to the confusion, now called "axioms") can be entirely counter-intuitive to anyone whose view of data is not formed by something called "description logic (DL)."

A simple demonstration of this, which we use in the article, is the OWL axiom for "maximum cardinality." In a non-DL programming world, you often state that a certain element in your data is limited in the number of times it can be used, such as saying that in a MARC record you can have only one 100 (main author) field. The maximum cardinality of that field is therefore "1". In your non-DL environment, a data creation application will not let you create more than one 100 field; if an application receiving data encounters a record with more than one 100 field, it will signal an error.

The semantic web, in its DL mode, draws an entirely different conclusion. The semantic web has two key principles: open world, and non-unique name. Open world means that whatever the state of the data on the web today, it may be incomplete; there can be unknowns. Therefore, you may say that you MUST have a title for every book, but if a look at your data reveals a book without a title, then your book still has a title, it is just an unknown title. That's pretty startling, but what about that 100 field? You've said that there can only be one, so what happens if there are 2 or 3 or more of them for a book? That's no problem, says OWL: the rule is that there is only one, but the non-unique name rule says that for any "thing" there can be more than one name for it. So when an OWL program [2] encounters multiple author 100 fields, it concludes that these are all different names for the same one thing, as defined by the combination of the non-unique name assumption and the maximum cardinality rule: "There can only be one, so these three must really be different names for that one." It's a bit like Alice in Wonderland, but there's science behind it.
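Here is a minimal sketch of that behaviour, again assuming rdflib plus owlrl (an OWL RL rule engine standing in for any OWL reasoner); the IRIs are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace, BNode, RDF, RDFS, OWL, XSD
from owlrl import DeductiveClosure, OWLRL_Semantics

EX = Namespace("http://example.org/")
g = Graph()

# ex:Book is a subclass of "things with at most ONE ex:mainAuthor".
restriction = BNode()
g.add((restriction, RDF.type, OWL.Restriction))
g.add((restriction, OWL.onProperty, EX.mainAuthor))
g.add((restriction, OWL.maxCardinality,
       Literal(1, datatype=XSD.nonNegativeInteger)))
g.add((EX.Book, RDFS.subClassOf, restriction))

# A book with two mainAuthor values -- a validation error in a closed world.
g.add((EX.book1, RDF.type, EX.Book))
g.add((EX.book1, EX.mainAuthor, EX.authorA))
g.add((EX.book1, EX.mainAuthor, EX.authorB))

DeductiveClosure(OWLRL_Semantics).expand(g)

# No error is raised. The reasoner instead concludes that the two "names"
# must denote one and the same individual.
print((EX.authorA, OWL.sameAs, EX.authorB) in g)  # True
```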

What you have in your database today is a closed world, where you define what is right and wrong; where you can enforce the rule that required elements absolutely HAVE TO be there; where the forbidden is not allowed to happen. The semantic web standards are designed for the open world of the web where no one has that kind of control. Think of it this way: what if you put a document onto the open web for anyone to read, but wanted to prevent anyone from linking to it? You can't. The links that others create are beyond your control. The semantic web was developed around the idea of a web (aka a giant graph) of data. You can put your data up there or not, but once it's there it is subject to the open functionality of the web. And the standards of RDFS and OWL, which are the current standards that one uses to define semantic web data, are designed specifically for that rather chaotic information ecosystem, where, as the third main principle of the semantic web states, "anyone can say anything about anything."

I have a lot of thoughts about this conflict between the open world of the semantic web and the need for closed world controls over data; in particular, whether it really makes sense to use the same technology for both, since there is such a strong incompatibility in the underlying logic of these two premises. As Vatant implies, many people creating RDF data are doing so with their minds firmly set in closed world rules, such that the actual result of applying the axioms of OWL and RDF to this data on the open web will not yield the expected closed world results.

This is what Baker, Petiya and I address in our paper, as we create examples from FRBRer, RDA in RDF, and BIBFRAME. Some of the results there will probably surprise you. If you doubt our conclusions, visit the site http://lod-lam.slis.kent.edu/wemi-rdf/ that gives more information about the tests, the data and the test results.

[1] "Entailment" means that the property does not carry with it any "classness" that would thus indicate that the resource is an instance of that class.

[2] Programs that interpret the OWL axioms are called "reasoners". There are a number of different reasoner programs available that you can call from your software, such as Pellet, Hermit, and others built into software packages like TopBraid.

LITA: Top Tech Trends: Call For Panelists

Mon, 2014-11-24 18:10

What technology are you watching on the horizon? Have you seen brilliant ideas that need exposing? Do you really like sharing with your LITA colleagues?

The LITA Top Tech Trends Committee is trying a new process this year and issuing a Call for Panelists. Answer the short questionnaire by 12/10 to be considered. Fresh faces and diverse panelists are especially encouraged to respond. Past presentations can be viewed at http://www.ala.org/lita/ttt.

Here’s the link:
https://docs.google.com/forms/d/1JH6qJItEAtQS_ChCcFKpS9xqPsFEUz52wQxwieBMC9w/viewform

If you have additional questions check with Emily Morton-Owens, Chair of the Top Tech Trends committee: emily.morton.owens@gmail.com

District Dispatch: Opportunity knocks: Take the HHI 2014 National Collections Care Survey

Mon, 2014-11-24 17:54

Help preserve our shared heritage, increase funding for conservation, and strengthen collections care by completing the Heritage Health Information (HHI) 2014 National Collections Care Survey. The HHI 2014 is a national survey on the condition of collections held by archives, libraries, historical societies, museums, scientific research collections, and archaeological repositories. It is the only comprehensive survey to collect data on the condition and preservation needs of our nation’s collections.

The deadline for the Heritage Health Information 2014: A National Collections Care Survey is December 19, 2014. In October, the Heritage Health Information sent invitations to the directors of over 14,000 collecting institutions across the country to participate in the survey. These invitations included personalized login information, which may be entered at hhi2014.com.

Questions about the survey may be directed to hhi2014survey [at] heritagepreservation [dot] org or 202-233-0824.

The post Opportunity knocks: Take the HHI 2014 National Collections Care Survey appeared first on District Dispatch.

District Dispatch: Archive webinar available: Giving legal advice to patrons

Mon, 2014-11-24 17:34

Reference librarian assisting readers. Photo by the Library of Congress.

An archive of the free webinar “Lib2Gov.org: Connecting Patrons with Legal Information” is now available. Hosted jointly by the American Library Association (ALA) and iPAC, the webinar was designed to help library reference staff build confidence in responding to legal inquiries.

The session offers information on laws, legal resources and legal reference practices. Participants will learn how to handle a law reference interview, including where to draw the line between information and advice, key legal vocabulary and citation formats. During the webinar, leaders offer tips on how to assess and choose legal resources for patrons.

Speaker:

Catherine McGuire is the head of Reference and Outreach at the Maryland State Law Library. McGuire currently plans and presents educational programs to Judiciary staff, local attorneys, public library staff and members of the public on subjects related to legal research and reference. She serves as Vice Chair of the Conference of Maryland Court Law Library Directors and the co-chair of the Education Committee of the Legal Information Services to the Public Special Interest Section (LISP-SIS) of the American Association of Law Libraries (AALL).

Watch the webinar

The post Archive webinar available: Giving legal advice to patrons appeared first on District Dispatch.

Islandora: Islandora Show and Tell: Fundación Juan March

Mon, 2014-11-24 15:16

A couple of weeks ago we kicked off Islandora Show and Tell by looking at a newly launched site: Barnard Digital Collection. This week, we're going to take a look at a long-standing Islandora site that has been one of our standard answers when someone asks "What's a great Islandora site?" - Fundación Juan March, which will, to our great fortune, be the host of the next European Islandora Camp, set for May 27 - 29, 2015.

It was a foregone conclusion that once we launched this series, we would be featuring FJM sooner rather than later, but it happens that we're visiting them just as they have launched a new collection: La saga Fernández-Shaw y el teatro lírico, containing three archives from a family of Spanish playwrights. This collection is also a great example of why we love this site: innovative browsing tools such as a timeline viewer, carefully curated collections spanning a wide variety of object types living side-by-side (the Knowledge Portal approach really makes this work), and seamless multi-language support.

FJM was also highlighted by D-LIB Magazine this month as their Featured Digital Collection, a well-deserved honour that explores their collections and past projects in greater depth.

But are there cats? There are. Of course, when running my standard generic Islandora repo search term, it helps to acknowledge that this is a collection of Spanish cultural works and go looking for gatos, which leads to Venta de los gatos (Sale of Cats), Orientação dos gatos (Orientation of Cats), and Todos los gatos son pardos (All Cats Are Grey).

Curious about the code behind this repo? FJM has been kind enough to share the details of a number of their initial collections on GitHub. Since they take the approach of using .NET for the web interface instead of using Drupal, the FJM .Net Library may also prove useful to anyone exploring alternate front-ends for their own collections.

Our Show and Tell interview was completed by Luis Martínez Uribe, who will be joining us at Islandora Camp in Madrid as an instructor in the Admin Track in May 2015.

What is the primary purpose of your repository? Who is the intended audience?

We have always said that more than a technical system, the FJM digital repository tries to bring in a new working culture. Since the Islandora deployment, the repository has been instrumental in transforming the way in which data is generated and looked after across the organization. Thus the main purpose behind our repository philosophy is to take an active approach to ensure that our organizational data is managed using appropriate standards, made available via knowledge portals and preserved for future access.

The contents are highly heterogeneous, with materials from the departments of Art, Music, Conferences, a Library of Spanish Music and Theatre, as well as various outputs from scientific centres and scholarships. Therefore the audience ranges from the general public interested in particular art exhibitions, concerts or lectures to highly specialised researchers in fields such as theatre, sociology or biology.

Why did you choose Islandora?

Back in 2010 the FJM was looking for a robust and flexible repository framework to manage an increasing volume of interrelated digital materials. With preservation in mind, the other most important aspect was the capacity to create complex models to accommodate relations between diverse types of content from multiple sources such as databases, the library catalogue, etc. Islandora provided the flexibility of Fedora plus easy customization powered by Drupal. Furthermore, discoverygarden could kick-start us with their services, and having Mark Leggott leading the project gave us confidence that our library needs and setting would be well understood.

Which modules or solution packs are most important to your repository?

In our latest collections we mostly use Drupal for prototyping. For this reason modules such as the Islandora Solr Client, the PDF Solution Pack or the Book Module are rather useful components to help us test and correct our collections once ingested and before the web layer is deployed.

What feature of your repository are you most proud of?

We like to be able to present the information through easy-to-grasp visualizations and have used timelines and maps in the past. In addition, we have started exploring the use of recommendation systems that, once an object is selected, suggest other materials of interest. This has been used in production in “All our art catalogues since 1973”.

Who built/developed/designed your repository (i.e, who was on the team?)

Driven by the FJM Library, Islandora was initially set up at FJM with help from discoverygarden, and the first four collections (CLAMOR, CEACS IR, Archive of Joaquín Turina, Archive of Antonia Mercé) were developed in the first year.

After that, the Library and IT Services undertook the development of a small and simple collection of essays, then moved on to a more complex product, the Personal Library of Cortázar, which required more advanced work from web programmers and designers.

In the last year, we have developed a .NET library that allows us to interact with the Islandora components such as Fedora, Solr or RISearch. Since then we have undertaken more complex interdepartmental ventures like the collection “All our art catalogues since 1973”, where the Library, IT and the web team have worked with colleagues in other departments such as digitisation, art and design.

In addition to this we have also kept working on Library collections with help from IT like Sim Sala Bim Library of Illusionism or our latest collection “La Saga de los Fernández Shaw” which merges three different archives with information managed in Archivist Toolkit.

Do you have plans to expand your site in the future?

The knowledge portals developed using Islandora have been well received both internally and externally with many visitors. We plan to expand the collections with many more materials as well as using the repository to host the authority index and the thesaurus collections for the FJM. This will continue our work to ensure that the FJM digital materials are managed, connected and preserved.

What is your favourite object in your collection to show off?

This is a hard one, but if we have to choose our favourite object, we would probably choose a resource like The Avant-Garde Applied (1890-1950) art catalogue. The catalogue is presented with different photos of the spine and back cover, along with other editions and related catalogues, in a responsive web design with a multi-device progressive loading viewer.

Our thanks to Luis and to FJM for agreeing to this feature. To learn more about their approach to Islandora, you can query the source by attending Islandora Camp EU2.

LITA: 5 Tech Tools to be Thankful For

Mon, 2014-11-24 11:00

In honor of Thanksgiving, I’d like to give thanks for 5 tech tools that make life as a librarian much easier.


Google Drive
On any given day I work on at least 6 different computers and tablets. That means I need instant access to my documents wherever I go and without cloud storage I’d be lost. While there are plenty of other free file hosting services, I like Drive the most because it offers 15GB of free storage and it’s incredibly easy to use. When I’m working with patrons who already have a Gmail account, setting up Drive is just a click away.


Libib
I dabbled in Goodreads for a bit, but I must say, Libib has won me over. Libib lets you catalog your personal library and share your favorite media with others. While it doesn’t handle images quite as well as Goodreads, I much prefer Libib’s sleek and modern interface. Instead of cataloging books that I own, I’m currently using Libib to create a list of my favorite children’s books to recommend to patrons.


Hopscotch
Hopscotch is my favorite iOS app right now. With Hopscotch, you can learn the fundamentals of coding through play. The app is marketed towards kids, but I think the bubbly characters and lighthearted nature appeal to adults too. I’m using Hopscotch in an upcoming adult program at the library to show that coding can be quirky and fun. If you want to use Hopscotch at your library, check out their resources for teachers. They’ve got fantastic ready-made lesson plans for the taking.


Adobe Illustrator
My love affair with Photoshop started many years ago, but as I’ve gotten older, Illustrator and I have become a much better match. I use Illustrator to create flyers, posters, and templates for computer class handouts. The best thing about Illustrator is that it’s designed for working with vector graphics. That means I can easily translate a design for a 6-inch bookmark into a 6-foot poster without losing image quality.


Twitter
Twitter is hands-down my social network of choice. My account is purely for library-related stuff and I know I can count on Twitter to pick me up and get me inspired when I’m running out of steam. Thanks to all the libraries and librarians who keep me going!

What tech tools are you thankful for? Please share in the comments!

DPLA: The mob that feeds

Mon, 2014-11-24 06:00

When Boston Public Library first designed its statewide digitization service plan as an LSTA-funded grant project in 2010, we offered free imaging to any institution that agreed to make their digitized collections available through the Digital Commonwealth repository and portal system. We hoped and suggested that money not spent by our partners on scanning might then be invested in the other side of any good digital object – descriptive metadata. We envisioned a resurgence of special collections cataloging in libraries, archives, and historical societies across Massachusetts.

After a couple of years, reality set in. Most of our partners did not have the resources to generate good descriptive records structured well enough to fit into our MODS application profile without major oversight and intervention on our part. What we did find, however, were some very dedicated and knowledgeable local historians, librarians, and archivists who maintained a variety of documentation that could be best described as “pre-metadata.”  Their local landscapes included inventories, spreadsheets, caption files, finding aids, catalog cards, sleeve inscriptions, dusty three-ring binders – the rich soil from which good metadata grows.

We understood it was now our job to cultivate and harvest metadata from these local sources. And thus the “Metadata Mob” was born. It is a fun and creative type of mob — less roughneck and more spontaneous dance routine. Except, instead of wildly cavorting to Do-Re-Mi in train stations, we cut-and-paste, we transcribe, we script, we spell check, we authorize, we regularize, we refine, we edit, and we enhance. It is a highly customized, hands-on process that differs slightly (or significantly) from collection to collection, institution to institution.

In many ways, the work Boston Public Library does has come to resemble the locally-sourced food movement in that we focus on how each community understands and represents their collections in their own unique way. Free-range metadata, so to speak, that we unearth after plowing through the annals of our partners.

Randall Harrow, 1870-1900. Boston Public Library via Digital Commonwealth.

We don’t impose our structures or processes on anyone beyond offering advice on some standard information science principles – the three major “food groups” of metadata, as it were – well-defined schemas, authority control, and content standard compliance. We encourage our partners to maintain their local practices.

We then carefully nurture their information into healthy, juicy, and delicious metadata records that we can ingest into the Digital Commonwealth repository. We have all encountered online resources with weak and frail frames — malnourished with a few inconsistently used Dublin Core fields and factory-farmed values imported blindly from collection records or poorly conceived legacy projects. Our mob members eschew this technique. They are craftsmen, artisans, information viticulturists. If digital library systems are nourished by the metadata they ingest, then ours will be kept vigorous and healthy with the rich diet they have produced.

 

Thanks to SEMAP for the use of their logo in the header image. Check out SEMAP’s very informative website at semaponline.org. Buy Fresh, Buy Local! Photo credit: Lori De Santis.

All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

DuraSpace News: Tutorial: Set Up Your Own DSpace Development Environment

Mon, 2014-11-24 00:00

From Bram Luyten, @mire

With the DSpace 5 release coming up, we wanted to make it easier for aspiring developers to get up and running with DSpace development. In our experience, starting off on the right foot with a proven set of tools and practices can reduce someone’s learning curve and help in quickly getting to initial results. IntelliJ IDEA 13, the integrated development environment from JetBrains, can make a developer’s life a lot easier thanks to a truckload of features that are not included in your run-of-the-mill text editor.

DuraSpace News: DSpace-CRIS at the euroCRIS Strategic Membership Meeting

Mon, 2014-11-24 00:00

By Michele Mennielli, International Relations, Cineca

Bologna, Italy – During the recent euroCRIS Strategic Membership Meeting, held in Amsterdam November 11-13, Cineca had the opportunity to present a new version of DSpace-CRIS with DSpace 4.2. This version of DSpace-CRIS will be released in the next few days.

DuraSpace News: Open Repository and Practical Action Repository Launch

Mon, 2014-11-24 00:00

From James Evans, Product Manager, Open Repository

Eric Hellman: NJ Gov. Christie Vetoes Reader Privacy Act, Asks for Stronger, Narrower Law

Sun, 2014-11-23 01:15
According to New Jersey Governor Chris Christie's conditional veto statement, "Citizens of this State should be permitted to read what they choose without unnecessary government intrusion." It's hard to argue with that! Personally, I think we should also be permitted to read what we choose without corporate surveillance.

As previously reported in The Digital Reader, the bill passed in September by wide margins in both houses of the New Jersey State Legislature and would have codified the right to read ebooks without letting the government and everybody else knowing about it.

I wrote about some problems I saw with the bill. Based on a California law focused on law enforcement, the proposed NJ law imposed civil penalties on booksellers who disclosed the personal information of users without a court order. As I understood it, the bill could have prevented online booksellers from participating in ad networks (they all do!).

Governor Christie's veto statement pointed out more problems. The proposed law didn't explicitly prevent the government from asking for personal reading data, it just made it against the law for a bookseller to comply. So, for example, a local sheriff could still ask Amazon for a list of people in his town reading an incriminating book. If Amazon answered, somehow the reader would have to:
  1. find out that Amazon had provided the information
  2. sue Amazon for $500.
Another problem identified by Christie was that the proposed law imposed stronger privacy burdens on booksellers than on libraries. Under another law, library records in New Jersey are subject to subpoena, but bookseller records wouldn’t be. That’s just bizarre.

In New Jersey, a governor can issue a "Conditional Veto". In doing so, the governor outlines changes in a bill that would allow it to become law. Christie's revisions to the Reader Privacy Act make the following changes:
  1. The civil penalties are stripped out of the bill. This allows Gov. Christie to position himself and NJ as "business-friendly".
  2. A requirement is added preventing the government from asking for reader information without a court order or subpoena. Christie gets to be on the side of liberty. Yay!
  3. It's made clear that the law applies only to government snooping, and not to promiscuous data sharing with ad networks. Christie avoids the ire of rich ad network moguls.
  4. Child porn is carved out of the definition of "books". Being tough on child pornography is one of those politically courageous positions that all politicians love.
The resulting bill, which was quickly reintroduced in the State Assembly, is stronger but narrower. It wouldn't apply in situations like the recent Adobe Digital Editions privacy breach, but it should be more effective at stopping "unnecessary government intrusion". I expect it will quickly pass the Legislature and be signed into law. A law that properly addresses the surveillance of ebook reading by private companies will be much more complicated and difficult to achieve.

I'm not a fan of his by any means, but Chris Christie's version of the Reader Privacy Act is a solid step in the right direction and would be an excellent model for other states. We could use a law like it on the national level as well.

(Guest posted at The Digital Reader)

Galen Charlton: Crossing the country

Sat, 2014-11-22 23:19

As some of you already know, Marlene and I are moving from Seattle to Atlanta in December. We’ve moved many (too many?) times before, so we’ve got most of the logistics down pat. Movers: hired! New house: rented! Mail forwarding: set up! Physical books: still too dang many!

We could do it in our sleep! (And the scary thing is, perhaps we have in the past.)

One thing that is different this time is that we’ll be driving across the country, visiting friends along the way.  3,650 miles, one car, two drivers, one Keurig, two suitcases, two sets of electronic paraphernalia, and three cats.

Who wants to lay odds on how many miles it will take each day for the cats to lose their voices?

Fortunately Sophia is already testing the cats’ accommodations.

I will miss the friends we made in Seattle, the summer weather, the great restaurants, being able to walk down to the water, and decent public transportation. I will also miss the drives up to Vancouver for conferences with a great bunch of librarians; I’m looking forward to attending Code4Lib BC next week, but I’m sorry that our personal tradition of American Thanksgiving in British Columbia is coming to an end.

As far as Atlanta is concerned, I am looking forward to being back in MPOW’s office, having better access to a variety of good barbecue, the winter weather, and living in an area with less de facto segregation.

It’s been a good two years in the Pacific Northwest, but much to my surprise, I’ve found that the prospect of moving back to Atlanta feels a bit like a homecoming. So, onward!

PeerLibrary: PeerLibrary participated at OpenCon 2014, the student and early...

Sat, 2014-11-22 10:52


PeerLibrary participated at OpenCon 2014, the student and early career researcher conference on Open Access, Open Education, and Open Data.

It was held November 15-17, 2014 in Washington, DC, and recordings of all sessions are available online. We presented PeerLibrary at the beginning of the Project Presentations 1 session (slides).

Photo: Aloysius Wilfred Raj Arokiaraj

Nicole Engard: Bookmarks for November 21, 2014

Fri, 2014-11-21 20:30

Today I found the following resources and bookmarked them.

Digest powered by RSS Digest

The post Bookmarks for November 21, 2014 appeared first on What I Learned Today....

Related posts:

  1. Compare RSS Readers
  2. Share your code
  3. Planning a party or event?

District Dispatch: Free webinar: The latest on Ebola

Fri, 2014-11-21 20:26

Photo by Phil Moyer

As the Ebola outbreak continues, the public must sort through all of the information being disseminated via the news media and social media. In this rapidly evolving environment, librarians are providing valuable services to their communities as they assist their users in finding credible information sources on Ebola, as well as other infectious diseases.

On Friday, December 12, 2014, library leaders from the U.S. National Library of Medicine will host the free webinar “Ebola and Other Infectious Diseases: The Latest Information from the National Library of Medicine.” As a follow-up to the webinar they presented in October, librarians from the U.S. National Library of Medicine will be discussing how to provide effective services in this environment, as well as providing an update on information sources that can be of assistance to librarians.

Speakers
  • Siobhan Champ-Blackwell is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center. Champ-Blackwell selects material to be added to the NLM disaster medicine grey literature database and is responsible for the Center’s social media efforts. Champ-Blackwell has over 10 years of experience in providing training on NLM products and resources.
  • Elizabeth Norton is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center where she has been working to improve online access to disaster health information for the disaster medicine and public health workforce. Norton has presented on this topic at national and international association meetings and has provided training on disaster health information resources to first responders, educators, and librarians working with the disaster response and public health preparedness communities.

Date: December 12, 2014
Time: 2:00 PM–3:00 PM Eastern
Register for the free event

If you cannot attend this live session, a recorded archive will be available to view at your convenience. To view past webinars also done in collaboration with iPAC, please visit Lib2Gov.org.

The post Free webinar: The latest on Ebola appeared first on District Dispatch.

M. Ryan Hess: Library as Digital Consultancy

Fri, 2014-11-21 20:14

As faculty and students delve into digital scholarly works, they are tripping over the kinds of challenges that libraries specialize in overcoming, such as questions regarding digital project planning, improving discovery or using quality metadata. Indeed, nobody is better suited at helping scholars with their decisions regarding how to organize and deliver their digital works than librarians.

At my institution, we have not marketed our expertise in any meaningful way (yet), but we receive regular requests for help from faculty and campus organizations who are struggling with publishing digital scholarship. For example, a few years ago a team of librarians at my library helped researchers from the National University of Ireland, Galway migrate and restructure their online collection of annotations from the Vatican Archive to a more stable home on Omeka.net. Our expertise in metadata standards, OAI harvesting, digital collection platforms and digital project planning turned out to be invaluable in saving their dying collection and giving it a stable, long-term home. You can read more in my Saved by the Cloud post.
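Such migrations typically begin by harvesting the existing records over OAI-PMH. Here is a minimal harvesting sketch using only the Python standard library; the endpoint URL is hypothetical, though ListRecords with the oai_dc prefix is standard for any OAI-PMH provider.

```python
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical endpoint; ListRecords + oai_dc are defined by the OAI-PMH spec.
url = "https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc"

with urllib.request.urlopen(url) as resp:
    tree = ET.parse(resp)

for record in tree.iter(f"{OAI}record"):
    identifier = record.find(f"{OAI}header/{OAI}identifier")
    titles = [t.text for t in record.iter(f"{DC}title")]
    print(identifier.text if identifier is not None else "?", titles)
```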

These kinds of requests have continued since. In recognition of this growing need, we are poised to launch a digital consultancy service on our campus.

Digital Project Planning

A core component of our jobs is planning digital projects. Over the past year, in fact, we’ve developed a standard project planning template that we apply to each digital project that comes our way. This has done wonders at keeping us all up to date on what stage each project is in and who is up next in terms of the workflow.

Researchers are often experts at planning out their papers, but they don’t normally have much experience with planning a digital project. For example, because metadata and preservation are things that normally don’t come up for them, they overlook planning around these aspects. And more generally, I’ve found that just having a template to work with can help them understand how the experts do digital projects and give them a sense of the issues they need to consider when planning their own projects, whether that’s building an online exhibit or organizing their selected works in ways that will reap the biggest bang for the buck.

We intend to begin formally offering project planning help to faculty very soon.

Platform Selection

It’s also our job to keep abreast of the various technologies available for distributing digital content, whether that is harvesting protocols, web content management systems, new plugins for WordPress or digital humanities exhibit platforms. Sometimes researchers know about some of these, but in my experience, their first choice is not necessarily the best for what they want to do.

It is fairly common for me to meet with campus partners that have an existing collection online, but which has been published in a platform that is ill-suited for what they are trying to accomplish. Currently, we have many departments moving old content based in SQL databases to plain HTML pages with no database behind them whatsoever. When I show them some of the other options, such as our Digital Commons-based institutional repository or Omeka.net, they often state they had no idea that such options existed and are very excited to work with us.

Metadata

I think people in general are becoming more aware of metadata, but there are still lots of technical considerations that your typical researcher may not be aware of. At our library, we have helped out with all aspects of metadata. We have helped researchers clean up their data to conform to authorized terms and standard vocabularies. We have explained Dublin Core. We have helped re-encode their data so that diacritics display online. We have done crosswalking and harvesting. It’s a deep area of knowledge and one that few people outside of libraries know on a suitably deep level.
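As one concrete illustration, a crosswalk from a local inventory spreadsheet to simple Dublin Core can be as small as the following sketch (the column names and mapping here are invented; a real project would map to an application profile such as MODS):

```python
import csv
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical local column name -> Dublin Core element
CROSSWALK = {"Item Title": "title", "Photographer": "creator",
             "Year Taken": "date", "Notes": "description"}

def row_to_dc(row):
    """Map one spreadsheet row onto a Dublin Core record element."""
    record = ET.Element("record")
    for column, element in CROSSWALK.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty cells rather than emit empty elements
            ET.SubElement(record, f"{{{DC_NS}}}{element}").text = value
    return record

with open("inventory.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(ET.tostring(row_to_dc(row), encoding="unicode"))
```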

One recommendation for any budding metadata consultants that I would share is that you really need to be the Carl Sagan of metadata. This is pretty technical stuff and most people don’t need all the details. Stick to discussing the final outcome rather than the technical details and your help will be far better understood and appreciated. For example, I once presented to a room of researchers on all the technical fixes we made to a database to enhance and standardize the metadata, but this went over terribly. People later came up to me and joked that whatever it was we did, they were sure it was important, and thanked us for being there. I guess that was a good outcome since they acknowledged our contribution. But it would have been better had they understood the practical benefits for the collection and the users of that content.

SEO

Search Engine Optimization is not hard, but it is likely that few people outside of the online marketing and web design world know what it is. I often find people can understand it very quickly if you simply define it as “helping Google understand your content so it can help people find you.” Simple SEO tricks, like defining and then using keywords in your headers, will do wonders for your collection’s visibility in the major search engines. But you can go deep with this stuff too, so I like to gauge my audience’s appetite and provide as much detail as I think they can digest.
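As a toy illustration of the “keywords in your headers” advice, here is a sketch using only the Python standard library to check that chosen keywords actually appear in a page’s headings (the sample HTML and keywords are made up):

```python
from html.parser import HTMLParser

class HeadingExtractor(HTMLParser):
    """Collect the text content of h1/h2 elements."""
    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self.in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2"):
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.headings[-1] += data

parser = HeadingExtractor()
parser.feed("<h1>Civil War Photographs</h1><p>intro</p>"
            "<h2>The Collection</h2>")
keywords = ["civil war", "photographs"]
text = " ".join(parser.headings).lower()
print({kw: kw in text for kw in keywords})
```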

Discovery

It’s a sad statement on the state of libraries, but the real discovery game is in the major search engines… not in our siloed, boutique search interfaces. Most people begin their searches (whether academic or not) in Google, and this is really bad news for our digital collections since, by and large, library collections sit in the deep web, beyond the reach of the search robots.

I recently tried a search in Google.com for the title of a digital image in one of our collections and found it. Yeah! Then I tried the same search in Google Images. No dice.

More librarians are coming to terms with this discovery problem now and we need to share this with digital scholars as they begin considering their own online collections so that they don’t make the mistakes libraries made (and continue to make…sigh) with our own collections.

We had one department at my institution that was sitting on a print journal that they were considering putting online. Behind this was a desire to bring the publication back to life, since they had been told by one researcher in Europe that she thought the journal had been discontinued years ago. In fact, it was still being published; it just wasn’t being indexed in Google. We offered our repository as an excellent place to publish it, especially because it would increase their visibility worldwide. Unfortunately, they opted for a very small, non-profit online publisher whose content we demonstrated was not surfacing in Google or Google Scholar. Well, you can lead a horse to water…

Still, I think this kind of understanding of the discovery universe does resonate with many. Going back to our somewhat invisible digital images, we will be pushing many to social media like Flickr with the expectation that this will boost visibility in the image search engines (and social networks) and drive more traffic to our digital collections.

Usability

This one is a tough one because people often come with pre-conceived notions of how they want their content organized or the site designed. For this reason, sometimes usability advice does not go over well. But for those instances when our experiences with user studies and information architecture can influence a digital scholarship project, it’s time well spent. In fact, I often hear people remark that they “never thought of it that way” and they’re willing to try some of the expert advice that we have to share.

Such advice includes things like:

  • Best practices for writing for the web
  • Principles of information architecture
  • Responsive design
  • Accessibility support
  • User Experience design

Marketing

It’s fitting to end on marketing. This is usually the final step in any digital project and one that often gets dropped. And yet, why do all the work of creating a digital collection only to let it go unnoticed? As digital project experts, librarians are familiar with the various channels available to promote collections and build followers, using tools like social networking sites, blogs and the like.

With our own digital projects, we discuss marketing at the very beginning so we are sure all the hooks, timing and planning considerations are understood by everyone. In fact, marketing strategy will impact some of the features of your exhibit, your choice of keywords used to help SEO, the ultimate deadlines that you set for completion and the staffing time you know you’ll need post launch to keep the buzz buzzing.

Most importantly, though, marketing plans can greatly influence the decision for which platform to use. For example, one of the benefits of Omeka.net (rather than self-hosted Omeka) is that any collection hosted with them becomes part of a network of other digital collections, boosting the potential for serendipitous discovery. I often urge faculty to opt for our Digital Commons repository over, say, their personal website, because anything they place in DC gets aggregated into the larger DC universe and has built-in marketing tools like email subscriptions and RSS feeds.

The bottom line here is that marketing is an area where librarians can shine. Online marketing of digital collections really pulls together all of the other forms of expertise that we can offer (our understanding of metadata, web technology and social networks) to fulfill the aim of every digital project: to reach other people and teach them something.

