Today I found the following resources and bookmarked them on Delicious.
- eval.in: Paste and execute code online.
Digest powered by RSS Digest
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
Science uses the art of observation to unearth truth. Sometimes the observation is minutely focused on a small constituent of a much larger ecosystem. By doing this, it can be possible to detect larger truths from such minutely focused observation. This brings me to my latest metadata investigation, which is about as minutely focused within the library metadata world as it is possible to be.
I decided to look at the life of a single MARC subfield, in this case the lowly 034 $2. The 034 field, “Coded Cartographic Mathematical Data”, was proposed and adopted in 2006. The $2 subfield is where one can record the source of the data in the 034, with values to come from a specified list of potential values.
From my “MARC Usage in WorldCat” work, I already knew that as of last January there were about 2.4 million records with an 034 field. I also knew that the $2 subfield of the 034 only appeared 1,976 times. Of course a year had passed so that figure was likely low.
So the first thing I did was to grab all of the 034 $2 subfields and count how many times each source code had been used. Since the point of my exercise was not to show errors, I combined entries that had typos with what they should have been, and counted as “errors” only entries that were clearly in the wrong place in the field:

3868 bound
2539 gooearth
1069 geoapn
215 geonet
157 geonames
129 pnosa2011
46 other
26 gnis
26 ERRORS
17 cga
5 local
3 gnrnsw
3 aadcg
1 wikiped
1 gettytgn
1 geoapn geonames
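The tallying step itself is straightforward. Here is a minimal Python sketch of the counting, where the sample values are hypothetical stand-ins for the real extracted subfields (in practice they would come from scanning each record's 034 fields, e.g. with a MARC-parsing library):

```python
from collections import Counter

# Hypothetical sample of extracted 034 $2 values -- stand-ins for the
# real subfield data pulled from the full record set.
extracted_subfields = [
    "bound", "gooearth", "bound", "geoapn", "gooearth", "bound",
]

# Tally how many times each source code appears, most frequent first.
counts = Counter(extracted_subfields)
for code, n in counts.most_common():
    print(n, code)
```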
I then wanted to find out who was using this subfield, so I ran a job to extract the 040 $a, the “original cataloging agency” and totaled the occurrences. It turns out the vast majority come from five institutions:
2471 National Library of Israel (J9U)
1632 Libraries Australia (AU@)
1076 British Library (UKMGB)
885 Pennsylvania State University (UPM)
799 Cambridge University (UkCU)
Then it drops off rather precipitously from there:
213 Agency for the Legal Deposit Libraries (Scotland) (StEdALDL)
206 New York Public Library (NYP)
117 Commonwealth Libraries, Bureau of State Library, Pennsylvania (PHA)
101 Yale University, Beinecke Rare Book and Manuscript Library (CtY-BR)
Curious about how the main user of this element was using it, I contacted the National Library of Israel. They were kind enough to reply to my odd query:
We have added geographic coordinates to records that describe ketubot, Jewish marriage contracts. The contracts almost always include the geographic location where the wedding takes place.
Using Google Earth ($2 gooearth), we added the coordinates with the intention of enabling the display of a Google map on this website.
I don’t believe that the site is fully functional as to their intended goal, but you can at least start to get an idea of how this data is going to be used. So even a lowly subfield can have higher aspirations for impact than may seem warranted at first.

About Roy Tennant
Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.
Schema.org is basically a simple vocabulary for describing stuff on the web. Embed it in your HTML and the search engines will pick it up as they crawl and add it to their structured data knowledge graphs. They even give you three formats to choose from — Microdata, RDFa, and JSON-LD — when doing the embedding. I’m assuming, for this post, that the benefits of being part of the knowledge graphs that underpin so-called Semantic Search, and hopefully triggering some Rich Snippet-enhanced results display as a side benefit, are self-evident.
The vocabulary itself is comparatively easy to apply once you get your head around it — find the appropriate Type (Person, CreativeWork, Place, Organization, etc.) for the thing you are describing, check out the properties in the documentation and code up the ones you have values for. Ideally provide a URI (URL in Schema.org) for a property that references another thing, but if you don’t have one a simple string will do.
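As a concrete illustration of that pattern (pick a Type, fill in the properties you have, use a URL where you can), here is a small Python sketch that builds a Schema.org Person description and serializes it as JSON-LD. The name, URL, and job title are invented examples; in a real page the output would be embedded in a `<script type="application/ld+json">` element:

```python
import json

# A minimal Schema.org description of a Person as JSON-LD.
# All values here are made up for illustration.
person = {
    "@context": "http://schema.org",
    "@type": "Person",
    "name": "Jane Example",
    "url": "http://example.com/people/jane",  # a URI identifying the thing
    "jobTitle": "Metadata Librarian",         # a simple string will do too
}

print(json.dumps(person, indent=2))
```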
There are a few strangenesses that hit you when you first delve into using the vocabulary. For example, there is no problem in describing something that is of multiple types — a LocalBusiness is both an Organization and a Place. This post is about another unusual, but very useful, aspect of the vocabulary — the Role type.
At first look at the documentation, Role looks like a very simple type with a handful of properties. On closer inspection, however, it doesn’t seem to fit in with the rest of the vocabulary. That is because it is capable of fitting almost anywhere. Anywhere there is a relationship between one type and another, that is. It is a special case type that allows a relationship, say between a Person and an Organization, to be given extra attributes. Some might term this as a form of annotation.
So what need is this satisfying, you may ask? It must be a significant need to justify the creation of a special case in the vocabulary. Let me walk through a case, used in a Schema.org blog post, to explain a need scenario and how Role satisfies that need.

Starting With American Football
Say you are describing members of an American Football team. Firstly you would describe the team using the SportsOrganization type, giving it a name, sport, etc. Using RDFa:

<div vocab="http://schema.org/" typeof="SportsOrganization" resource="http://example.com/teams/tlg">
  <span property="name">Touchline Gods</span>
  <span property="sport">American Football</span>
</div>
Then describe a player using a Person type, providing name, gender, etc.:

<div vocab="http://schema.org/" typeof="Person" resource="http://example.com/folks/chucker">
  <span property="name">Chucker Roberts</span>
  <span property="birthDate">1989</span>
</div>
Now let’s relate them together by adding an athlete relationship to the Person description:

<div vocab="http://schema.org/" typeof="SportsOrganization">
  <span property="name">Touchline Gods</span>
  <span property="sport">American Football</span>
  <span property="athlete" typeof="Person" src="http://example.com/folks/chucker">
    <span property="name">Chucker Roberts</span>
    <span property="birthDate">1989</span>
  </span>
</div>
Let’s take a look at the data structure we have created using Turtle – not an HTML markup syntax, but an excellent way to visualise the data structures isolated from the HTML:

@prefix schema: <http://schema.org/> .

<http://example.com/teams/tlg> a schema:SportsOrganization;
    schema:name "Touchline Gods";
    schema:sport "American Football";
    schema:athlete <http://example.com/folks/chucker> .

<http://example.com/folks/chucker> a schema:Person;
    schema:name "Chucker Roberts";
    schema:birthDate "1989" .
So we now have Chucker Roberts described as an athlete on the Touchline Gods team. The obvious question then is how do we describe the position he plays in the team. We could have extended the SportsOrganization type with a property for every position, but scaling that across every position for every team sport type would have soon ended up with far more properties than would have been sensible, and beyond the maintenance scope of a generic vocabulary such as Schema.org.
This is where Role comes in handy. Regardless of the range defined for any property in Schema.org, it is acceptable to provide a Role as a value. The convention is then to use a property with the same name as the one the Role is a value for, to remake the connection to the referenced thing (in this case the Person). In simple terms, we have just inserted a Role type between the original two descriptions.
You might initially think this indirection has not added much, but Role has some properties of its own (startDate, endDate, roleName) that can help us qualify the relationship between the SportsOrganization and the athlete (Person). For organizations there is a subtype of Role (OrganizationRole) which allows the relationship to be qualified slightly more.
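Since Schema.org also accepts JSON-LD, the same Role-insertion pattern can be sketched as a nested JSON-LD structure, built here with Python for clarity (a rough translation using the example values from this walkthrough, not markup taken from the Schema.org blog post). Note how the athlete property appears twice: once pointing at the Role, and once inside the Role pointing at the Person:

```python
import json

# The Role-insertion pattern in JSON-LD: an OrganizationRole sits
# between the SportsOrganization and the Person, carrying the
# qualifying properties (roleName, startDate, number).
team = {
    "@context": "http://schema.org",
    "@type": "SportsOrganization",
    "@id": "http://example.com/teams/tlg",
    "name": "Touchline Gods",
    "sport": "American Football",
    "athlete": {
        "@type": "OrganizationRole",
        "roleName": "Quarterback",
        "startDate": "01072014",
        "number": "11",
        "athlete": {
            "@type": "Person",
            "@id": "http://example.com/folks/chucker",
            "name": "Chucker Roberts",
            "birthDate": "1989",
        },
    },
}

print(json.dumps(team, indent=2))
```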
RDFa:

<div vocab="http://schema.org/" typeof="SportsOrganization" resource="http://example.com/teams/tlg">
  <span property="name">Touchline Gods</span>
  <span property="sport">American Football</span>
  <span property="athlete" typeof="OrganizationRole">
    <span property="startDate">01072014</span>
    <span property="roleName">Quarterback</span>
    <span property="number">11</span>
    <span property="athlete" typeof="Person" src="http://example.com/folks/chucker">
      <span property="name">Chucker Roberts</span>
      <span property="birthDate">1989</span>
    </span>
  </span>
</div>
and in Turtle:

@prefix schema: <http://schema.org/> .

<http://example.com/teams/tlg> a schema:SportsOrganization;
    schema:name "Touchline Gods";
    schema:sport "American Football";
    schema:athlete [
        a schema:OrganizationRole;
        schema:roleName "Quarterback";
        schema:startDate "01072014";
        schema:number "11";
        schema:athlete <http://example.com/folks/chucker>
    ] .

<http://example.com/folks/chucker> a schema:Person;
    schema:name "Chucker Roberts";
    schema:birthDate "1989" .

Beyond American Football
So far I have just been stepping through the example provided in the Schema.org blog post on this. Let’s take a look at an example from another domain – the one I spend my life immersed in – libraries.
There are many relationships between creative works that libraries curate and describe (books, articles, theses, manuscripts, etc.) and people & organisations that are not covered adequately by the properties available (author, illustrator, contributor, publisher, character, etc.) in CreativeWork and its subtypes. By using Role, in the same way as in the sports example above, we have the flexibility to describe what is needed.
Take a book (How to be Orange: an alternative Dutch assimilation course) authored by Gregory Scott Shapiro, that has a preface written by Floor de Goede. As there is no writerOfPreface property we can use, the best we could do is to put Floor de Goede in as a contributor. However, by using Role we can qualify the contribution he made as that of writer of preface.
In Turtle:

@prefix schema: <http://schema.org/> .
@prefix relators: <http://id.loc.gov/vocabulary/relators/> .
@prefix viaf: <http://viaf.org/viaf/> .

<http://www.worldcat.org/oclc/859406554> a schema:Book;
    schema:name "How to be orange : an alternative Dutch assimilation course";
    schema:author viaf:305830120;  # Gregory Scott Shapiro
    schema:exampleOfWork <http://worldcat.org/entity/work/id/1404771725>;
    schema:contributor [
        a schema:Role;
        schema:roleName relators:wpr;  # Writer of preface
        schema:contributor viaf:283191359  # Floor de Goede
    ] .
and in RDFa:

<div vocab="http://schema.org/" typeof="Book" resource="http://www.worldcat.org/oclc/859406554">
  <span property="name">How to be orange : an alternative Dutch assimilation course</span>
  <span property="author" src="http://viaf.org/viaf/305830120">Gregory Scott Shapiro</span>
  <span property="exampleOfWork" src="http://worldcat.org/entity/work/id/1404771725"></span>
  <span property="contributor" typeof="Role">
    <span property="roleName" src="http://id.loc.gov/vocabulary/relators/wpr">Writer of preface</span>
    <span property="contributor" src="http://viaf.org/viaf/283191359">Floor de Goede</span>
  </span>
</div>
You will note in this example I have made use of URLs to external resources – VIAF for defining the Persons, and the Library of Congress relator codes – instead of defining them myself as strings. I have also linked the book to its Work definition so that someone exploring the data can discover other editions of the same work.
Do I always use Role?
In the above example I relate a book to two people, the author and the writer of preface. I could have linked to the author via another role with the roleName being ‘Author’ or <http://id.loc.gov/vocabulary/relators/aut>. Although possible, it is not a recommended approach. Wherever possible use the properties defined for a type. This is what data consumers such as search engines are going to be initially looking for.
To demonstrate the flexibility of using the Role type, here is the markup that shows a small diversion in my early career:

@prefix schema: <http://schema.org/> .

<http://www.wikidata.org/entity/Q943241> a schema:PerformingGroup;
    schema:name "Gentle Giant";
    schema:employee [
        a schema:Role;
        schema:roleName "Keyboards Roadie";
        schema:startDate "1975";
        schema:endDate "1976";
        schema:employee [
            a schema:Person;
            schema:name "Richard Wallis"
        ]
    ] .
This demonstrates the ability of Role to provide added information about most relationships between entities, in this case the employee relationship. Often Role itself is sufficient, and the vocabulary can be extended with subtypes of Role to provide further use-case-specific properties.
Whenever possible use URLs for roleName
In the above example, it is exceedingly unlikely that there is a citeable definition on the web that I could link to for the roleName, so it is perfectly acceptable to just use the string “Keyboards Roadie”. However, to help the search engines understand unambiguously what role you are describing, it is always better to use a URL. If you can’t find one, for example in the Library of Congress Relator Codes or in Wikidata, consider creating one yourself in Wikipedia or Wikidata for others to share. Another spin-off benefit of using URIs (URLs) is that they are language-independent: regardless of the language of the labels in the data, the URI always means the same thing. Sources like Wikidata often have names and descriptions for things defined in multiple languages, which can be useful in itself.
This very flexible mechanism has many potential uses when describing your resources in Schema.org. But there is always a danger in overusing useful techniques such as this. Before reaching for Role, be sure that there is not already a way to do what you need within Schema.org, or one worth proposing to those who look after the vocabulary.
Good luck in your role in describing your resources and the relationships between them using Schema.org.
Last week, the American Library Association (ALA) Washington Office hosted librarians from Thailand who are visiting the United States to learn about library practices and futures. Our visitors, Supawan Ardkhla and Nusila Yumaso, are participants in the U.S. State Department’s International Visitor Leadership Program. Through short-term visits to the U.S., foreign leaders in a variety of fields experience our country firsthand and cultivate professional relationships. They were accompanied by interpreter Montanee Anusas-amornkul.
The visitors’ agenda was wide-ranging. Topics included ebooks, digital literacy, libraries as place, employment and entrepreneurship, and many more. After Washington, the Thai librarians visited libraries in several other U.S. cities.
ALA Washington Office Executive Director Emily Sheketoff and I represented ALA. Hosting visitors from abroad is a regular responsibility of the Office, and we’ve met with librarians from many other countries around the world, from Lebanon to Colombia.
Those who have been paying attention to the cutting edge of digital libraries no doubt know about the Hydra project headed up by Stanford. Hydra is a digital repository system that is built using Ruby and is designed to accept the full range of digital object types that a large research library must manage. Built on top of Fedora and Solr, with Blacklight as the default front-end, one doesn’t normally associate ease of installation with a stack like that. Heck, you could spend a week just getting all of the dependencies installed, configured, and up and running.
So color me surprised when the Digital Public Library of America, Stanford University, and the DuraSpace organization announced that IMLS had awarded them a $2 million National Leadership Grant to develop “Hydra-in-a-Box”. Just as it sounds, the goal is to “build, bundle, and promote a feature-complete, robust digital repository that is easy to install, configure, and maintain—in short, a next-generation digital repository that will work for institutions large and small, and is capable of running as a hosted service.”
That is no small goal, and a laudable one at that. But…gosh. What a distance there is to travel to get there. The project has it pegged at 30 months, so two and a half years. That sounds about right, and so far Tom Cramer has built one of the most broad-based coalitions I’ve seen in academic libraries around Hydra, so you won’t find me betting against him. Especially since he just landed $2 million to help him build out his pet project. So as much as it pains this Cal Bear to say it, Go Stanford!
Summer is right around the corner, and a long-held tradition in the public library community is the summer reading program. Though synonymous with youth and young adult services, summer reading is worth revisiting for adults.
Science fiction is a gateway
I believe there is a positive correlation between reading science fiction novels and genuine interest in emerging technology. When I was younger, I loved science fiction and fantasy. My interests ranged from A Princess of Mars to The Hitchhiker’s Guide to the Galaxy, and The Twilight Zone was a mark of my childhood. What I read and watched informed my psyche and furthered my interest in futuristic technology that modern humans could only dream of. The bottom line is that these books sparked an interest. Almost all tech heads I know love science fiction and fantasy. Not everyone is into books, but most science fiction films are based on alternate worlds created by authors like Isaac Asimov and Philip K. Dick. Authors of science fiction and fantasy push the envelope on physics, technology, psychology, and history. These novels take place in the “future” or a fictional past, or serve as social commentary. They can be cautionary tales, or an impetus for the reader to become proactive in current affairs. I’m sure no one wants to live in a world similar to Pat Frank’s Alas, Babylon.
A few suggestions for your reading list
In 2011 NPR published a fan-selected list of the top 100 science-fiction and fantasy books for summer reading. While fans may argue fiercely over which science fiction/fantasy book is the best of all time, trying to settle the question is impractical anyway.
I went ahead and selected my favorites from NPR’s list as suggestions for summer reading. There are a few that are on my personal reading wish list and many are on my re-read wish list. Which eager reader doesn’t have a wish list?
If you went to high school in the United States, you were probably forced to read these, and probably had to analyze their themes, tone, characters, etc. As a result, the mere mention of them may feel trite, but they more than deserve their place on this list.
1984 by George Orwell
Fahrenheit 451 by Ray Bradbury
Brave New World by Aldous Huxley
Slaughterhouse-Five by Kurt Vonnegut
Frankenstein by Mary Shelley
Some of the best science-fiction/fantasy books are set in sprawling universes, requiring reader commitment and the ability to lift a ten-pound book. Though your eyes may grow weary, you won’t be at a loss for the possibilities illuminated through the text.
The Lord of the Rings by J.R.R. Tolkien
Dune by Frank Herbert
Foundation by Isaac Asimov
A Game of Thrones by George R.R. Martin
The Giver by Lois Lowry (not on NPR’s list)
A Princess of Mars by Edgar Rice Burroughs (not on NPR’s list)
Do Androids Dream of Electric Sheep? by Philip K. Dick
The Andromeda Strain by Michael Crichton
The Gunslinger (The Dark Tower Series) by Stephen King
Outlander by Diana Gabaldon
1632 by Eric Flint
The Body Snatchers by Jack Finney
Now that I’ve performed my reader’s advisory, what’s on your summer reading list? If you have any recommendations, reply to this post to share with others.
Here's a contribution from Jeff Young, who manages the RDF aspects of VIAF:
Since Wikidata’s introduction to the Linked Data Web in 2014 and subsequent integration of Freebase, it has become a premier example of how to publish and manage Linked Data. Like VIAF, Wikidata uses Schema.org as its core RDF vocabulary and both datasets publish using Linked Data best practices. This consistency should allow applications to treat both datasets as complementary. The main difference will be in the coverage of entities/information, based on their respective sources.
The VIAF RDF changes outlined on the Developer Network blog are intended to further enrich and align the common purpose. Some of the VIAF changes provide additional information to help disambiguate entities, such as schema:location and schema:description. Where possible, schema:names are now language tagged, which should make it easier for applications to select a language-appropriate label for display.
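To see why language tags help, here is a small Python sketch of picking a display label that matches a user's preferred language. The labels and tags are invented examples in the spirit of RDF literals like "Tokyo"@en, not actual VIAF data:

```python
# Hypothetical language-tagged labels for a single entity, as
# (value, language-tag) pairs. Stand-in values for illustration only.
labels = [
    ("Tokyo", "en"),
    ("Tokio", "de"),
    ("Tōkyō", "ja-Latn"),
]

def pick_label(labels, preferred):
    """Return the first label whose tag matches the preferred language,
    allowing subtag matches (e.g. 'ja' matches 'ja-Latn'); otherwise
    fall back to the first label in the list."""
    for value, lang in labels:
        if lang == preferred or lang.startswith(preferred + "-"):
            return value
    return labels[0][0]

print(pick_label(labels, "de"))  # Tokio
print(pick_label(labels, "ja"))  # Tōkyō (matches the ja-Latn tag)
```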
The biggest change, though, is in the “shape of the data” that gets returned via Linked Data requests. Previously, this was a record-oriented view rather than a concise description of the entity. Like Wikidata, the new response will focus on the entity itself and depend on the related entities to describe themselves.
Alignment with Wikidata is a major step in the evolution of VIAF, which started with RDF/XML representations of name authority clusters in 2009 and transitioned to “primary entities” in 2011. The introduction of VIAF as Schema.org in 2014 extends the audience and integration with Wikidata further strengthens industry standard practices. These steps should help ensure that VIAF remains an authoritative source of entity identifiers and information in the linked web of data.
Note: We expect these RDF changes to be visible on viaf.org April 16, 2015. The bulk distribution will follow shortly after that.
Boston, MA – The Digital Public Library of America (DPLA), Stanford University, and the DuraSpace organization are pleased to announce that their joint initiative has been awarded a $2M National Leadership Grant from the Institute of Museum and Library Services (IMLS). Nicknamed Hydra-in-a-Box, the project aims to foster a new national library network through a community-based repository system, enabling discovery, interoperability, and reuse of digital resources by people from this country and around the world.
This transformative network is based on advanced repositories that not only empower local institutions with new asset management capabilities, but also interconnect their data and collections through a shared platform.
“At the core of the Digital Public Library of America is our national network of hubs, and they need the systems envisioned by this project,” said Dan Cohen, DPLA’s executive director. “By combining contemporary technologies for aggregating, storing, enhancing, and serving cultural heritage content, we expect this new stack will be a huge boon to DPLA and to the broader digital library community. In addition, I’m thrilled that the project brings together the expertise of DuraSpace, Stanford, and DPLA.”
Each of the partners will fulfill specific roles in the joint initiative. Stanford will use its existing leadership in the Hydra Project to develop core components, in concert with the broader Hydra community. DPLA will focus on the connective tissue between hubs, mapping, and crosswalks to DPLA’s metadata application profile, and infrastructure to support metadata enhancement and remediation. DuraSpace will use its expertise in building and serving repositories, and doing so at scale, to construct the back-end systems for Hydra hosting.
“DuraSpace is excited to provide the infrastructure for this project,” said Debra Hanken Kurtz, DuraSpace CEO. “It aligns perfectly with our mission to steward the scholarly and cultural heritage records and make them accessible for current and future generations. We look forward to working with DPLA and Stanford to support their work and that of the community to ensure a robust and sustainable future for Hydra-in-a-Box.”
Over the project’s 30-month time frame, the partners will engage with libraries, archives, and museums nationwide, especially current and prospective DPLA hubs and the Hydra community, to systematically capture the needs for a next-generation, open source, digital repository. They will collaboratively extend the existing Hydra project codebase to build, bundle, and promote a feature-complete, robust digital repository that is easy to install, configure, and maintain—in short, a next-generation digital repository that will work for institutions large and small, and is capable of running as a hosted service. Finally, starting with DPLA’s own metadata aggregation services, the partners will work to ensure that these repositories have the necessary affordances to support networked aggregation, discovery, management and access to these resources, producing a shared, sustainable, nationwide platform.
“The Hydra Project has already demonstrated enormous traction and value as a best-in-class digital repository for institutions like Stanford,” said Tom Cramer, Chief Technology Strategist at the Stanford University Libraries. “And yet there is so much more to do. This grant will provide the means to rapidly accelerate Hydra’s rate of development and adoption–expanding its community, features and value all at once.”
To find out more about the Hydra-in-a-Box initiative contact Dan Cohen (firstname.lastname@example.org), Tom Cramer (email@example.com) or Debra Hanken Kurtz (firstname.lastname@example.org). An information page is available here: https://wiki.duraspace.org/display/hydra/Hydra+in+a+Box.
The Digital Public Library of America (http://dp.la) strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science. Since launching in April 2013, it has aggregated over 8.5 million items from over 1,700 institutions. The DPLA is a registered 501(c)(3) non-profit.
DuraSpace (http://duraspace.org) is an independent 501(c)(3) not-for-profit organization providing leadership and innovation for open technologies that promote durable, persistent access to digital data. We collaborate with academic, scientific, cultural, and technology communities by supporting projects (DSpace, Fedora, VIVO) and creating services (DuraCloud, DSpaceDirect, ArchivesDirect) to help ensure that current and future generations have access to our collective digital heritage. Our values are expressed in our organizational byline, “Committed to our digital future.”
About Stanford University Libraries
The Stanford University Libraries (http://library.stanford.edu) is internationally recognized as a leader among research libraries, and in leveraging digital technology to support scholarship in the age of information. It is a founder of both the Hydra Project and the Fedora 4 repository effort, and a leading institution in the International Image Interoperability Framework (IIIF) (http://iiif.io).
About the Hydra Project
The Hydra Project (http://projecthydra.org) is both an open source community and a suite of software that provides a flexible and robust framework for managing, preserving, and providing access to digital assets. The project motto, “One body, many heads,” speaks to the flexibility provided by Hydra’s modern, modular architecture, and the power of combining a robust repository backend (the “body”) with flexible, tailored user interfaces (“heads”). Co-designed and developed in concert with Fedora 4, the extensible, durable, and widely used repository software, the Hydra/Fedora stack is the centerpiece of a thriving and rapidly expanding open source community poised to deliver an easy-to-implement solution.