news aggregator

Bisson, Casey: What makes us special?

planet code4lib - Mon, 2014-02-17 22:37

In Daily Kos this weekend: A Common Thread Among Young-Earth Creationists, Gun Enthusiasts, Marriage Exclusivists, and the 1%. The key point is that groups identify by what makes them “feel special.” Distilled, here are the four groups:

  • Creationists: being created by god makes humans special
  • Gun enthusiasts: their role in protecting liberty makes them special
  • Marriage exclusivists: making marriage exclusive to straight people makes them special
  • One percenters: their accumulated wealth makes them special

I was interested in seeing the author’s evaluation of what may be a motivation for (some) members of the identified groups. Is there a word for this political philosophy or interpretation? “feeling special-ism?”

The argument rather misses the point, however, and this is best seen in the treatment of “gun enthusiasts” (the term is the author’s own). Winkler’s history of gun control reveals that guns have been recognized as more than simple property throughout much of American history. My own feelings on gun control took a sharp turn more than a decade ago, and I now see the issue as a huge political distraction.

Putting history and my political views aside, imagine the second bullet of my summary read as follows:

  • First Amendment absolutists: their role in protecting liberty makes them special

Does replacing “gun enthusiasts” with “First Amendment absolutists” change your feelings about it? Would the meaning of the original text be largely the same if the group were called “Second Amendment absolutists” instead of “gun enthusiasts?”

  • Second Amendment absolutists: their role in protecting liberty makes them special

Does that connote something different?

This is all worth considering seriously because the Constitution and Bill of Rights offer specific protections that First Amendment absolutists and “gun enthusiasts” have both committed themselves to defending. Those documents offer no specific recognition for the positions of creationists, marriage exclusivists, or one percenters.

More significantly, however: of the four named groups, three are identified by the equality they deny others. The fourth is identified by the rights they insist are shared by all. The right and wrong of these positions is best interpreted by natural law.

Two special extras:

  1. First and Second Amendment absolutists might actually gather here.
  2. Josh Cooley’s art for Movies R Fun makes me feel special for knowing most of the references.

This scene from Josh Cooley’s Movies R Fun is pretty much perfect for a discussion of pride and gun control.

Morgan, Eric Lease: CrossRef’s Prospect API

planet code4lib - Mon, 2014-02-17 18:11

This is the tiniest of blog postings outlining my experiences with a fledgling API called Prospect.

Prospect is an API being developed by CrossRef. I learned about it through word of mouth as well as a blog posting by Eileen Clancy called “Easy access to data for text mining”. In a nutshell, given a CrossRef DOI via content negotiation, the API will return both the DOI’s bibliographic information and URL(s) pointing to the location of full text instances of the article. The purpose of the API is to provide a straightforward method for acquiring full text content without the need for screen scraping.

I wrote a simple, almost brain-dead Perl subroutine implementing the API. For a good time, I put the subroutine into action in a CGI script. Enter a simple query, and the script will search CrossRef for full text articles and return a list of no more than five titles along with the associated URLs where you can get them in a number of formats.


screen shot of CrossRef Prospect API in action

The API is pretty straightforward, but the URLs pointing to the full text are stuffed into a “Links” HTTP header, and the value of the header is not as easily parseable as one might desire. Still, this can be put to good use in my slowly growing stock of text mining tools. Get DOI. Feed to one of my tools. Get data. Do analysis.
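For what it’s worth, a subroutine of that sort might look something like the sketch below. It assumes the DOI resolver honors content negotiation as described above and that the full text locations come back in a link-style header; the header names, the Accept value, and the example DOI are illustrative guesses rather than documented facts.

#!/usr/bin/perl
# prospect-links.pl - sketch: given a DOI, look for full text URLs via content negotiation
# (header names, media types, and the sample DOI are assumptions, not documentation)
use strict;
use warnings;
use LWP::UserAgent;

sub full_text_links {
	my $doi = shift;
	my $ua  = LWP::UserAgent->new( max_redirect => 5 );
	# ask the DOI resolver for machine-readable citation data
	my $response = $ua->get( "http://dx.doi.org/$doi", 'Accept' => 'application/vnd.citationstyles.csl+json' );
	return () unless $response->is_success;
	# the full text locations are said to live in a link-style header; pull out anything in angle brackets
	my $header = $response->header( 'Link' ) || $response->header( 'Links' ) || '';
	my @links  = ();
	while ( $header =~ /<([^>]+)>/g ) { push @links, $1 }
	return @links;
}

# for example
print join( "\n", full_text_links( '10.5555/12345678' ) ), "\n";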

Fun with HTTP.

Rochkind, Jonathan: A Proquest platform API

planet code4lib - Mon, 2014-02-17 17:29

We subscribe to a number of databases via Proquest.

I wanted an API for having my software execute fielded searches against a Proquest database — specifically Dissertations and Theses in my current use case — and get back structured, machine-interpretable results.

I had vaguely remembered hearing about such an API, but was having trouble finding any info about it.

It turns out that, while you’ll have trouble finding any documentation about it (or even any evidence on the web that it exists), and you’ll have trouble getting information about it from Proquest support too, such an API does exist. Hooray.

You may occasionally see it called the “XML Gateway” in some Proquest documentation materials (although Proquest support doesn’t necessarily know this term). And it was probably intended for and used by federated search products — which makes me realize, oh yeah, if I have any database that’s used by a federated search product, then it’s probably got some kind of API.

And it’s an SRU endpoint.

(Proquest may also support Z39.50, but at least some Proquest docs recommend you transition to the “XML Gateway” instead of Z39.50, and I personally find it easier to work with than Z39.50.)

Here’s an example query:

http://fedsearch.proquest.com/search/sru/pqdtft?operation=searchRetrieve&version=1.2&maximumRecords=30&startRecord=1&query=title%3D%22global%20warming%22%20AND%20author%3DCastet

For me, coming from an IP address recognized as ‘on campus’ for our general Proquest access, no additional authentication is required to use this API. I’m not sure if we at some point prior had them activate the “XML Gateway” for us, likely for a federated search product, or if it’s just this way for everyone.

The path component after “/sru”, “pqdtft”, is the database code for Proquest Dissertations and Theses. I’m not sure where you find a list of these database codes in general; if you’ve made a successful API request to that endpoint, there will be a <diagnosticMessage> element near the end of the response listing all the database codes you have access to (but without corresponding full English names, so you kind of have to guess).

The value of the ‘query’ parameter is a valid CQL query, as usual for SRU. It can be a bit tricky figuring out how to express what you want in CQL, but the CQL standard docs are decent, if you spend a bit of time with them to learn CQL.

Unfortunately, there seems to be no SRU “explain” response available from Proquest to tell you what fields/operators are available. But guessing often works: “title”, “author”, and “date” are all available. I’m not sure exactly how ‘date’ works and need to experiment more, although doing things like `date > 1990 AND date <= 2010` appears initially to work.

The CQL query param above un-escaped is:

title="global warming" AND author=Castet

Responses seem to be in MARCXML, and that seems to be the only option.
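To make the request concrete, here is a rough sketch of how it might be put together in Perl. The endpoint and the pqdtft database code are taken from the example above; the rest is ordinary URL escaping, so treat it as a sketch rather than a definitive client.

#!/usr/bin/perl
# sru-search.pl - sketch: run a fielded (CQL) search against the Proquest SRU endpoint
use strict;
use warnings;
use LWP::UserAgent;
use URI::Escape;

my $endpoint = 'http://fedsearch.proquest.com/search/sru/pqdtft';   # pqdtft = Dissertations and Theses
my $cql      = 'title="global warming" AND author=Castet';

# assemble the searchRetrieve URL, escaping the CQL query
my $url = $endpoint
        . '?operation=searchRetrieve&version=1.2&maximumRecords=30&startRecord=1'
        . '&query=' . uri_escape( $cql );

my $response = LWP::UserAgent->new->get( $url );
die 'Request failed: ' . $response->status_line . "\n" unless $response->is_success;
print $response->decoded_content;   # dump the raw response (MARCXML)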

It looks like you can tell whether full text is available (on the Proquest platform) for a given item based on whether there’s an 856 field with the second indicator set to “0” — that will be a URL to the full text. I think; it looks that way.
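Assuming the response really is MARCXML wrapped in an SRU envelope, something like the following sketch will pull the records out and report that 856 link when present. MARC::File::XML and XML::LibXML do the heavy lifting; the indicator convention is only my reading of the responses, so treat it as a guess.

#!/usr/bin/perl
# parse-sru.pl - sketch: read a saved SRU (MARCXML) response and report titles plus full text links
use strict;
use warnings;
use XML::LibXML;
use MARC::Record;
use MARC::File::XML;   # gives MARC::Record a new_from_xml constructor

my $file = shift or die "Usage: $0 <sru-response.xml>\n";

# the MARCXML records are wrapped inside the SRU response, so fish them out by namespace
my $dom = XML::LibXML->load_xml( location => $file );
my $xpc = XML::LibXML::XPathContext->new( $dom );
$xpc->registerNs( 'marc', 'http://www.loc.gov/MARC21/slim' );

foreach my $node ( $xpc->findnodes( '//marc:record' ) ) {
	my $record = MARC::Record->new_from_xml( $node->toString, 'UTF-8' );
	my $title  = $record->title || '[untitled]';
	my $docid  = $record->field( '001' ) ? $record->field( '001' )->data : '';
	# an 856 with second indicator '0' seems to carry a link to the full text
	my ( $fulltext ) = map  { $_->subfield( 'u' ) }
	                   grep { $_->indicator( 2 ) eq '0' }
	                   $record->field( '856' );
	print "$title\n  id: $docid\n  full text: ", ( $fulltext || 'none found' ), "\n\n";
}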

Did I mention if there are docs for any of this, I don’t have them?

So, there you go, a Proquest search API!

I also posted this to the code4lib listserv, and got some more useful details and hints from Andrew Anderson.

Oh, and if you want to link to a document you found this way, one way that seems to work is to take the Proquest document ID from the MARC 001 field in the response and construct a URL like `http://search.proquest.com/pqdtft/docview/$DOCID$`. That links to the full text if it’s available, otherwise to a citation page. Note the `pqdtft` code in the URL, again meaning ‘Proquest Dissertations and Theses’ — the same db I was searching to find the doc id.
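In code, that linking recipe is only a few lines; this little helper rides on the same assumptions as the sketches above.

# given a MARC::Record from a pqdtft response, guess at a durable Proquest link
sub docview_url {
	my $record = shift;
	my $field  = $record->field( '001' ) or return;
	my $docid  = $field->data;
	return "http://search.proquest.com/pqdtft/docview/$docid";
}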


Filed under: General

Morgan, Eric Lease: Analyzing search results using JSTOR’s Data For Research

planet code4lib - Mon, 2014-02-17 15:58
Introduction

Data For Research (DFR) is an alternative interface to JSTOR enabling the reader to download statistical information describing JSTOR search results. For example, using DFR a person can create a graph illustrating when sets of citations were written, create a word cloud illustrating the most frequently used words in a journal article, or classify sets of JSTOR articles according to a set of broad subject headings. More advanced features enable the reader to extract frequently used phrases in a text as well as list statistically significant keywords. JSTOR’s DFR is a powerful tool enabling the reader to look for trends in large sets of articles as well as drill down into the specifics of individual articles. This hands-on workshop leads the student through a set of exercises demonstrating these techniques.

Faceted searching

DFR supports an easy-to-use search interface. Enter one or two words into the search box and submit your query. Alternatively, you can do some field searching using the advanced search options. The search results are then displayed and sortable by date, relevance, or a citation rank. More importantly, facets are displayed alongside the search results, and searches can be limited by selecting one or more of the facet terms. Limiting by years, language, subjects, and disciplines proves to be the most useful.


search results screen

Publication trends over time

By downloading the number of citations from multiple search results, it is possible to illustrate publication trends over time.

In the upper right-hand corner of every search result is a “charts view” link. Once selected, it will display a line graph illustrating the number of citations fitting your query over time. It also displays a bar chart illustrating the broad subject areas of your search results. Just as importantly, there is a link at the bottom of the page — “Download data for year chart” — allowing you to download a comma-separated values (CSV) file of publication counts and years. This file is easily importable into your favorite spreadsheet program and chartable. If you do multiple searches and download multiple CSV files, then you can compare publication trends. For example, the following chart compares the number of times the phrases “Henry Wadsworth Longfellow”, “Henry David Thoreau”, and “Ralph Waldo Emerson” have appeared in the JSTOR literature between 1950 and 2000. From the chart we can see that Emerson was consistently mentioned more often than both Longfellow and Thoreau. It would be interesting to compare the JSTOR results with the results from the Google Books Ngram Viewer, which offers a similar service against its collection of digitized books.


chart view screen shot


publication trends for Emerson, Thoreau, and Longfellow
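Combining those downloads is mostly bookkeeping. The sketch below merges several of the downloaded files into a single year-by-search table ready for a spreadsheet; it assumes each file is a plain two-column year,count CSV, so adjust the split if your downloads look different.

#!/usr/bin/perl
# merge-years.pl - sketch: combine several DFR "year chart" CSV downloads into one table
# assumes each input file holds two columns per line: year,count
use strict;
use warnings;

my @files = @ARGV or die "Usage: $0 emerson.csv thoreau.csv longfellow.csv\n";
my ( %counts, %years );

foreach my $file ( @files ) {
	open my $fh, '<', $file or die "$file: $!";
	while ( <$fh> ) {
		chomp;
		my ( $year, $count ) = split /,/;
		next unless defined $count and $year =~ /^\d{4}$/;   # skip headers and junk
		$counts{ $file }{ $year } = $count;
		$years{ $year }++;
	}
	close $fh;
}

# print a tab-delimited table: one row per year, one column per input file
print join( "\t", 'year', @files ), "\n";
foreach my $year ( sort keys %years ) {
	print join( "\t", $year, map { $counts{ $_ }{ $year } || 0 } @files ), "\n";
}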

Key word analysis

DFR counts and tabulates frequently used words and statistically significant key words. These tabulations can be used to illustrate characteristics of search results.

Each search result item comes complete with title, author, citation, subject, and key terms information. The subjects and key terms are computed values — words and phrases determined by frequency and statistical analysis. Each search result item also comes with a “More Info” link which returns lists of the item’s most frequently used words, phrases, and keyword terms. Unfortunately, these lists often include stop words like “the”, “of”, “that”, etc., making the results not as meaningful as they could be. Still, these lists are somewhat informative. They allude to the “aboutness” of the selected article.

Key terms are also facets. You can expand the Key terms facet to get a small word cloud illustrating the frequency of each term across the entire search result. Clicking on one of the key terms limits the search results accordingly. You can also click on the Export button to download a CSV file of key terms and their frequencies. This information can then be fed to any number of applications for creating word clouds. For example, download the CSV file. Use your text editor to open the CSV file, and find/replace the commas with colons. Copy the entire result, and paste it into Wordle’s advanced interface. This process can be done multiple times for different searches, and the results can be compared & contrasted. Word clouds for Longfellow, Thoreau, and Emerson are depicted below, and from the results you can quickly see both similarities and differences between each writer.


Ralph Waldo Emerson key terms


Henry David Thoreau key terms


Henry Wadsworth Longfellow key terms
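The find/replace step mentioned above is easily scripted, too. A few lines like the following turn an exported key-terms CSV into Wordle’s weighted “term:count” format, assuming the export has two columns per line, term then frequency.

#!/usr/bin/perl
# terms2wordle.pl - sketch: turn a DFR key-terms CSV into Wordle's "term:weight" lines
# assumes two columns per line: term,frequency
use strict;
use warnings;

while ( <> ) {
	chomp;
	my ( $term, $frequency ) = split /,/;
	next unless defined $frequency and $frequency =~ /^\d+$/;   # skip headers and junk
	print "$term:$frequency\n";
}

Run it as `perl terms2wordle.pl keyterms.csv > wordle.txt` and paste the result into Wordle’s advanced box.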

Downloading complete data sets

If you create a DFR account, and if you limit your search results to 1,000 items or less, then you can download a data set describing your search results.

In the upper right-hand corner of the search results screen is a pull-down menu option for submitting data set requests. The resulting screen presents you with options for downloading a number of different types of data (citations, word counts, phrases, and key terms) in two different formats (CSV and XML). The CSV format is inherently easier to use, but the XML format seems to be more complete, especially when it comes to citation information. After submitting your data set request you will have to wait for an email message from DFR because it takes a while (anywhere from a few minutes to a couple of hours) for it to be compiled.


data set request page

After downloading a data set you can do additional analysis against it. For example, it is possible to create a timeline illustrating when individual articles were written. It would not be too difficult to create word clouds from titles or author names. If you have programming experience, then you might be able to track ideas over time or identify the originators of specific ideas. Concordances — keyword in context search engines — can be implemented. Some of this functionality, but certainly not all, is being slowly implemented in a Web-based application called JSTOR Tool.
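As a taste of that last idea, here is a bare-bones keyword-in-context routine. It knows nothing about the DFR data formats; it simply slides a window across whatever plain text you feed it, which is roughly all a concordance needs to do.

#!/usr/bin/perl
# kwic.pl - sketch: a tiny keyword-in-context (concordance) routine for plain text
use strict;
use warnings;

my ( $keyword, $file ) = @ARGV;
die "Usage: $0 <keyword> <plain-text-file>\n" unless $keyword and $file;

open my $fh, '<', $file or die "$file: $!";
my $text = do { local $/; <$fh> };
close $fh;
$text =~ s/\s+/ /g;   # flatten whitespace so the context windows read cleanly

my $width = 30;        # characters of context on either side of the keyword
while ( $text =~ /(\Q$keyword\E)/gi ) {
	my $match = $1;
	my $start = pos( $text ) - length( $match );
	my $left  = $start < $width
	          ? substr( $text, 0, $start )
	          : substr( $text, $start - $width, $width );
	my $right = substr( $text, pos( $text ), $width );
	printf "%${width}s [%s] %s\n", $left, $match, $right;
}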

Summary

As the written word is increasingly manifested in digital form, so grows the ability to evaluate the written word quantitatively. JSTOR’s DFR is one example of how this can be exploited for the purposes of academic research.

Note

A .zip file containing some sample data as well as the briefest of instructions on how to use it is linked from this document.

Morgan, Eric Lease: LiAM source code: Perl poetry

planet code4lib - Mon, 2014-02-17 04:40

#!/usr/bin/perl # Liam Guidebook Source Code; Perl poetry, sort of # Eric Lease Morgan <emorgan@nd.edu> # February 16, 2014 # done exit;

#!/usr/bin/perl # marc2rdf.pl – make MARC records accessible via linked data # Eric Lease Morgan <eric_morgan@infomotions.com> # December 5, 2013 – first cut; # configure use constant ROOT => ‘/disk01/www/html/main/sandbox/liam’; use constant MARC => ROOT . ‘/src/marc/’; use constant DATA => ROOT . ‘/data/’; use constant PAGES => ROOT . ‘/pages/’; use constant MARC2HTML => ROOT . ‘/etc/MARC21slim2HTML.xsl’; use constant MARC2MODS => ROOT . ‘/etc/MARC21slim2MODS3.xsl’; use constant MODS2RDF => ROOT . ‘/etc/mods2rdf.xsl’; use constant MAXINDEX => 100; # require use IO::File; use MARC::Batch; use MARC::File::XML; use strict; use XML::LibXML; use XML::LibXSLT; # initialize my $parser = XML::LibXML->new; my $xslt = XML::LibXSLT->new; # process each record in the MARC directory my @files = glob MARC . “*.marc”; for ( 0 .. $#files ) { # re-initialize my $marc = $files[ $_ ]; my $handle = IO::File->new( $marc ); binmode( STDOUT, ‘:utf8′ ); binmode( $handle, ‘:bytes’ ); my $batch = MARC::Batch->new( ‘USMARC’, $handle ); $batch->warnings_off; $batch->strict_off; my $index = 0; # process each record in the batch while ( my $record = $batch->next ) { # get marcxml my $marcxml = $record->as_xml_record; my $_001 = $record->field( ’001′ )->as_string; $_001 =~ s/_//; $_001 =~ s/ +//; $_001 =~ s/-+//; print ” marc: $marc\n”; print ” identifier: $_001\n”; print ” URI: http://infomotions.com/sandbox/liam/id/$_001\n”; # re-initialize and sanity check my $output = PAGES . “$_001.html”; if ( ! -e $output or -s $output == 0 ) { # transform marcxml into html print ” HTML: $output\n”; my $source = $parser->parse_string( $marcxml ) or warn $!; my $style = $parser->parse_file( MARC2HTML ) or warn $!; my $stylesheet = $xslt->parse_stylesheet( $style ) or warn $!; my $results = $stylesheet->transform( $source ) or warn $!; my $html = $stylesheet->output_string( $results ); &save( $output, $html ); } else { print ” HTML: skipping\n” } # re-initialize and sanity check my $output = DATA . “$_001.rdf”; if ( ! -e $output or -s $output == 0 ) { # transform marcxml into mods my $source = $parser->parse_string( $marcxml ) or warn $!; my $style = $parser->parse_file( MARC2MODS ) or warn $!; my $stylesheet = $xslt->parse_stylesheet( $style ) or warn $!; my $results = $stylesheet->transform( $source ) or warn $!; my $mods = $stylesheet->output_string( $results ); # transform mods into rdf print ” RDF: $output\n”; $source = $parser->parse_string( $mods ) or warn $!; my $style = $parser->parse_file( MODS2RDF ) or warn $!; my $stylesheet = $xslt->parse_stylesheet( $style ) or warn $!; my $results = $stylesheet->transform( $source ) or warn $!; my $rdf = $stylesheet->output_string( $results ); &save( $output, $rdf ); } else { print ” RDF: skipping\n” } # prettify print “\n”; # increment and check $index++; last if ( $index > MAXINDEX ) } } # done exit; sub save { open F, ‘ > ‘ . shift or die $!; binmode( F, ‘:utf8′ ); print F shift; close F; return; }

#!/usr/bin/perl # ead2rdf.pl – make EAD files accessible via linked data # Eric Lease Morgan <eric_morgan@infomotions.com> # December 6, 2013 – based on marc2linkedata.pl # configure use constant ROOT => ‘/disk01/www/html/main/sandbox/liam’; use constant EAD => ROOT . ‘/src/ead/’; use constant DATA => ROOT . ‘/data/’; use constant PAGES => ROOT . ‘/pages/’; use constant EAD2HTML => ROOT . ‘/etc/ead2html.xsl’; use constant EAD2RDF => ROOT . ‘/etc/ead2rdf.xsl’; use constant SAXON => ‘java -jar /disk01/www/html/main/sandbox/liam/bin/saxon.jar -s:##SOURCE## -xsl:##XSL## -o:##OUTPUT##’; # require use strict; use XML::XPath; use XML::LibXML; use XML::LibXSLT; # initialize my $saxon = ”; my $xsl = ”; my $parser = XML::LibXML->new; my $xslt = XML::LibXSLT->new; # process each record in the EAD directory my @files = glob EAD . “*.xml”; for ( 0 .. $#files ) { # re-initialize my $ead = $files[ $_ ]; print ” EAD: $ead\n”; # get the identifier my $xpath = XML::XPath->new( filename => $ead ); my $identifier = $xpath->findvalue( ‘/ead/eadheader/eadid’ ); $identifier =~ s/[^\w ]//g; print ” identifier: $identifier\n”; print ” URI: http://infomotions.com/sandbox/liam/id/$identifier\n”; # re-initialize and sanity check my $output = PAGES . “$identifier.html”; if ( ! -e $output or -s $output == 0 ) { # transform marcxml into html print ” HTML: $output\n”; my $source = $parser->parse_file( $ead ) or warn $!; my $style = $parser->parse_file( EAD2HTML ) or warn $!; my $stylesheet = $xslt->parse_stylesheet( $style ) or warn $!; my $results = $stylesheet->transform( $source ) or warn $!; my $html = $stylesheet->output_string( $results ); &save( $output, $html ); } else { print ” HTML: skipping\n” } # re-initialize and sanity check my $output = DATA . “$identifier.rdf”; if ( ! -e $output or -s $output == 0 ) { # create saxon command, and save rdf print ” RDF: $output\n”; $saxon = SAXON; $xsl = EAD2RDF; $saxon =~ s/##SOURCE##/$ead/e; $saxon =~ s/##XSL##/$xsl/e; $saxon =~ s/##OUTPUT##/$output/e; system $saxon; } else { print ” RDF: skipping\n” } # prettify print “\n”; } # done exit; sub save { open F, ‘ > ‘ . shift or die $!; binmode( F, ‘:utf8′ ); print F shift; close F; return; }

#!/usr/bin/perl # store-make.pl – simply initialize an RDF triple store # Eric Lease Morgan <eric_morgan@infomotions.com> # # December 14, 2013 – after wrestling with wilson for most of the day # configure use constant ETC => ‘/disk01/www/html/main/sandbox/liam/etc/’; # require use strict; use RDF::Redland; # sanity check my $db = $ARGV[ 0 ]; if ( ! $db ) { print “Usage: $0 <db>\n”; exit; } # do the work; brain-dead my $etc = ETC; my $store = RDF::Redland::Storage->new( ‘hashes’, $db, “new=’yes’, hash-type=’bdb’, dir=’$etc’” ); die “Unable to create store ($!)” unless $store; my $model = RDF::Redland::Model->new( $store, ” ); die “Unable to create model ($!)” unless $model; # “save” $store = undef; $model = undef; # done exit;

#!/usr/bin/perl # store-add.pl – add items to an RDF triple store # Eric Lease Morgan <eric_morgan@infomotions.com> # # December 14, 2013 – after wrestling with wilson for most of the day # configure use constant ETC => ‘/disk01/www/html/main/sandbox/liam/etc/’; # require use strict; use RDF::Redland; # sanity check #1 – command line arguments my $db = $ARGV[ 0 ]; my $file = $ARGV[ 1 ]; if ( ! $db or ! $file ) { print “Usage: $0 <db> <file>\n”; exit; } # sanity check #2 – store exists die “Error: po2s file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-po2s.db’ ); die “Error: so2p file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-so2p.db’ ); die “Error: sp2o file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-sp2o.db’ ); # open the store my $etc = ETC; my $store = RDF::Redland::Storage->new( ‘hashes’, $db, “new=’no’, hash-type=’bdb’, dir=’$etc’” ); die “Error: Unable to open store ($!)” unless $store; my $model = RDF::Redland::Model->new( $store, ” ); die “Error: Unable to create model ($!)” unless $model; # sanity check #3 – file exists die “Error: $file not found.\n” if ( ! -e $file ); # parse a file and add it to the store my $uri = RDF::Redland::URI->new( “file:$file” ); my $parser = RDF::Redland::Parser->new( ‘rdfxml’, ‘application/rdf+xml’ ); die “Error: Failed to find parser ($!)\n” if ( ! $parser ); my $stream = $parser->parse_as_stream( $uri, $uri ); my $count = 0; while ( ! $stream->end ) { $model->add_statement( $stream->current ); $count++; $stream->next; } # echo the result warn “Namespaces:\n”; my %namespaces = $parser->namespaces_seen; while ( my ( $prefix, $uri ) = each %namespaces ) { warn ” prefix: $prefix\n”; warn ‘ uri: ‘ . $uri->as_string . “\n”; warn “\n”; } warn “Added $count statements\n”; # “save” $store = undef; $model = undef; # done exit;

#!/usr/bin/perl # store-search.pl – query a triple store # Eric Lease Morgan <eric_morgan@infomotions.com> # December 14, 2013 – after wrestling with wilson for most of the day # configure use constant ETC => ‘/disk01/www/html/main/sandbox/liam/etc/’; my %namespaces = ( “crm” => “http://erlangen-crm.org/current/”, “dc” => “http://purl.org/dc/elements/1.1/”, “dcterms” => “http://purl.org/dc/terms/”, “event” => “http://purl.org/NET/c4dm/event.owl#”, “foaf” => “http://xmlns.com/foaf/0.1/”, “lode” => “http://linkedevents.org/ontology/”, “lvont” => “http://lexvo.org/ontology#”, “modsrdf” => “http://simile.mit.edu/2006/01/ontologies/mods3#”, “ore” => “http://www.openarchives.org/ore/terms/”, “owl” => “http://www.w3.org/2002/07/owl#”, “rdf” => “http://www.w3.org/1999/02/22-rdf-syntax-ns#”, “rdfs” => “http://www.w3.org/2000/01/rdf-schema#”, “role” => “http://simile.mit.edu/2006/01/roles#”, “skos” => “http://www.w3.org/2004/02/skos/core#”, “time” => “http://www.w3.org/2006/time#”, “timeline” => “http://purl.org/NET/c4dm/timeline.owl#”, “wgs84_pos” => “http://www.w3.org/2003/01/geo/wgs84_pos#” ); # require use strict; use RDF::Redland; # sanity check #1 – command line arguments my $db = $ARGV[ 0 ]; my $query = $ARGV[ 1 ]; if ( ! $db or ! $query ) { print “Usage: $0 <db> <query>\n”; exit; } # sanity check #2 – store exists die “Error: po2s file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-po2s.db’ ); die “Error: so2p file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-so2p.db’ ); die “Error: sp2o file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-sp2o.db’ ); # open the store my $etc = ETC; my $store = RDF::Redland::Storage->new( ‘hashes’, $db, “new=’no’, hash-type=’bdb’, dir=’$etc’” ); die “Error: Unable to open store ($!)” unless $store; my $model = RDF::Redland::Model->new( $store, ” ); die “Error: Unable to create model ($!)” unless $model; # search #my $sparql = RDF::Redland::Query->new( “CONSTRUCT { ?a ?b ?c } WHERE { ?a ?b ?c }”, undef, undef, “sparql” ); my $sparql = RDF::Redland::Query->new( “PREFIX modsrdf: <http://simile.mit.edu/2006/01/ontologies/mods3#>\nSELECT ?a ?b ?c WHERE { ?a modsrdf:$query ?c }”, undef, undef, ‘sparql’ ); my $results = $model->query_execute( $sparql ); print $results->to_string; # done exit;

#!/usr/bin/perl # store-dump.pl – output the content of store as RDF/XML # Eric Lease Morgan <eric_morgan@infomotions.com> # # December 14, 2013 – after wrestling with wilson for most of the day # configure use constant ETC => ‘/disk01/www/html/main/sandbox/liam/etc/’; # require use strict; use RDF::Redland; # sanity check #1 – command line arguments my $db = $ARGV[ 0 ]; my $uri = $ARGV[ 1 ]; if ( ! $db ) { print “Usage: $0 <db> <uri>\n”; exit; } # sanity check #2 – store exists die “Error: po2s file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-po2s.db’ ); die “Error: so2p file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-so2p.db’ ); die “Error: sp2o file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-sp2o.db’ ); # open the store my $etc = ETC; my $store = RDF::Redland::Storage->new( ‘hashes’, $db, “new=’no’, hash-type=’bdb’, dir=’$etc’” ); die “Error: Unable to open store ($!)” unless $store; my $model = RDF::Redland::Model->new( $store, ” ); die “Error: Unable to create model ($!)” unless $model; # do the work my $serializer = RDF::Redland::Serializer->new; print $serializer->serialize_model_to_string( RDF::Redland::URI->new, $model ); # done exit;

#!/usr/bin/perl # sparql.pl – a brain-dead, half-baked SPARQL endpoint # Eric Lease Morgan <eric_morgan@infomotions.com> # December 15, 2013 – first investigations # require use CGI; use CGI::Carp qw( fatalsToBrowser ); use RDF::Redland; use strict; # initialize my $cgi = CGI->new; my $query = $cgi->param( ‘query’ ); if ( ! $query ) { print $cgi->header; print &home } else { # open the store for business my $store = RDF::Redland::Storage->new( ‘hashes’, ‘store’, “new=’no’, hash-type=’bdb’, dir=’/disk01/www/html/main/sandbox/liam/etc’” ); my $model = RDF::Redland::Model->new( $store, ” ); # search my $results = $model->query_execute( RDF::Redland::Query->new( $query, undef, undef, ‘sparql’ ) ); # return the results print $cgi->header( -type => ‘application/xml’ ); print $results->to_string; } # done exit; sub home { # create a list namespaces my $namespaces = &namespaces; my $list = ”; foreach my $prefix ( sort keys $namespaces ) { my $uri = $$namespaces{ $prefix }; $list .= $cgi->li( “$prefix – ” . $cgi->a( { href=> $uri, target => ‘_blank’ }, $uri ) ); } $list = $cgi->ol( $list ); # return a home page return <<EOF <html> <head> <title>LiAM SPARQL Endpoint</title> </head> <body style=’margin: 7%’> <h1>LiAM SPARQL Endpoint</h1> <p>This is a brain-dead and half-baked SPARQL endpoint to a subset of LiAM linked data. Enter a query, but there is the disclaimer. Errors will probably happen because of SPARQL syntax errors. Remember, the interface is brain-dead. Your milage <em>will</em> vary.</p> <form method=’GET’ action=’./’> <textarea style=’font-size: large’ rows=’5′ cols=’65′ name=’query’ /> PREFIX hub:<http://data.archiveshub.ac.uk/def/> SELECT ?uri WHERE { ?uri ?o hub:FindingAid } </textarea><br /> <input type=’submit’ value=’Search’ /> </form> <p>Here are a few sample queries:</p> <ul> <li>Find all triples with RDF Schema labels – <code><a href=”http://infomotions.com/sandbox/liam/sparql/?query=PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0ASELECT+*+WHERE+%7B+%3Fs+rdf%3Alabel+%3Fo+%7D%0D%0A”>PREFIX rdf:<http://www.w3.org/2000/01/rdf-schema#> SELECT * WHERE { ?s rdf:label ?o }</a></code></li> <li>Find all items with MODS subjects – <code><a href=’http://infomotions.com/sandbox/liam/sparql/?query=PREFIX+mods%3A%3Chttp%3A%2F%2Fsimile.mit.edu%2F2006%2F01%2Fontologies%2Fmods3%23%3E%0D%0ASELECT+*+WHERE+%7B+%3Fs+mods%3Asubject+%3Fo+%7D’>PREFIX mods:<http://simile.mit.edu/2006/01/ontologies/mods3#> SELECT * WHERE { ?s mods:subject ?o }</a></code></li> <li>Find every unique predicate – <code><a href=”http://infomotions.com/sandbox/liam/sparql/?query=SELECT+DISTINCT+%3Fp+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D”>SELECT DISTINCT ?p WHERE { ?s ?p ?o }</a></code></li> <li>Find everything – <code><a href=”http://infomotions.com/sandbox/liam/sparql/?query=SELECT+*+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D”>SELECT * WHERE { ?s ?p ?o }</a></code></li> <li>Find all classes – <code><a href=”http://infomotions.com/sandbox/liam/sparql/?query=SELECT+DISTINCT+%3Fclass+WHERE+%7B+%5B%5D+a+%3Fclass+%7D+ORDER+BY+%3Fclass”>SELECT DISTINCT ?class WHERE { [] a ?class } ORDER BY ?class</a></code></li> <li>Find all properties – <code><a href=”http://infomotions.com/sandbox/liam/sparql/?query=SELECT+DISTINCT+%3Fproperty%0D%0AWHERE+%7B+%5B%5D+%3Fproperty+%5B%5D+%7D%0D%0AORDER+BY+%3Fproperty”>SELECT DISTINCT ?property WHERE { [] ?property [] } ORDER BY ?property</a></code></li> <li>Find URIs of all finding aids – <code><a 
href=”http://infomotions.com/sandbox/liam/sparql/?query=PREFIX+hub%3A%3Chttp%3A%2F%2Fdata.archiveshub.ac.uk%2Fdef%2F%3E+SELECT+%3Furi+WHERE+%7B+%3Furi+%3Fo+hub%3AFindingAid+%7D”>PREFIX hub:<http://data.archiveshub.ac.uk/def/> SELECT ?uri WHERE { ?uri ?o hub:FindingAid }</a></code></li> <li>Find URIs of all MARC records – <code><a href=”http://infomotions.com/sandbox/liam/sparql/?query=PREFIX+mods%3A%3Chttp%3A%2F%2Fsimile.mit.edu%2F2006%2F01%2Fontologies%2Fmods3%23%3E+SELECT+%3Furi+WHERE+%7B+%3Furi+%3Fo+mods%3ARecord+%7D%0D%0A%0D%0A%0D%0A”>PREFIX mods:<http://simile.mit.edu/2006/01/ontologies/mods3#> SELECT ?uri WHERE { ?uri ?o mods:Record }</a></code></li> <li>Find all URIs of all collections – <code><a href=”http://infomotions.com/sandbox/liam/sparql/?query=PREFIX+mods%3A%3Chttp%3A%2F%2Fsimile.mit.edu%2F2006%2F01%2Fontologies%2Fmods3%23%3E%0D%0APREFIX+hub%3A%3Chttp%3A%2F%2Fdata.archiveshub.ac.uk%2Fdef%2F%3E%0D%0ASELECT+%3Furi+WHERE+%7B+%7B+%3Furi+%3Fo+hub%3AFindingAid+%7D+UNION+%7B+%3Furi+%3Fo+mods%3ARecord+%7D+%7D%0D%0AORDER+BY+%3Furi%0D%0A”>PREFIX mods:<http://simile.mit.edu/2006/01/ontologies/mods3#> PREFIX hub:<http://data.archiveshub.ac.uk/def/> SELECT ?uri WHERE { { ?uri ?o hub:FindingAid } UNION { ?uri ?o mods:Record } } ORDER BY ?uri</a></code></li> </ul> <p>This is a list of ontologies (namespaces) used in the triple store as predicates:</p> $list <p>For more information about SPARQL, see:</p> <ol> <li><a href=”http://www.w3.org/TR/rdf-sparql-query/” target=”_blank”>SPARQL Query Language for RDF</a> from the W3C</li> <li><a href=”http://en.wikipedia.org/wiki/SPARQL” target=”_blank”>SPARQL</a> from Wikipedia</li> </ol> <p>Source code — <a href=”http://infomotions.com/sandbox/liam/bin/sparql.pl”>sparql.pl</a> — is available online.</p> <hr /> <p> <a href=”mailto:eric_morgan\@infomotions.com”>Eric Lease Morgan <eric_morgan\@infomotions.com></a><br /> January 6, 2014 </p> </body> </html> EOF } sub namespaces { my %namespaces = ( “crm” => “http://erlangen-crm.org/current/”, “dc” => “http://purl.org/dc/elements/1.1/”, “dcterms” => “http://purl.org/dc/terms/”, “event” => “http://purl.org/NET/c4dm/event.owl#”, “foaf” => “http://xmlns.com/foaf/0.1/”, “lode” => “http://linkedevents.org/ontology/”, “lvont” => “http://lexvo.org/ontology#”, “modsrdf” => “http://simile.mit.edu/2006/01/ontologies/mods3#”, “ore” => “http://www.openarchives.org/ore/terms/”, “owl” => “http://www.w3.org/2002/07/owl#”, “rdf” => “http://www.w3.org/1999/02/22-rdf-syntax-ns#”, “rdfs” => “http://www.w3.org/2000/01/rdf-schema#”, “role” => “http://simile.mit.edu/2006/01/roles#”, “skos” => “http://www.w3.org/2004/02/skos/core#”, “time” => “http://www.w3.org/2006/time#”, “timeline” => “http://purl.org/NET/c4dm/timeline.owl#”, “wgs84_pos” => “http://www.w3.org/2003/01/geo/wgs84_pos#” ); return \%namespaces; }

# package Apache2::LiAM::Dereference; # Dereference.pm – Redirect user-agents based on value of URI. # Eric Lease Morgan <eric_morgan@infomotions.com> # December 7, 2013 – first investigations; based on Apache2::Alex::Dereference # configure use constant PAGES => ‘http://infomotions.com/sandbox/liam/pages/’; use constant DATA => ‘http://infomotions.com/sandbox/liam/data/’; # require use Apache2::Const -compile => qw( OK ); use CGI; use strict; # main sub handler { # initialize my $r = shift; my $cgi = CGI->new; my $id = substr( $r->uri, length $r->location ); # wants RDF if ( $cgi->Accept( ‘text/html’ )) { print $cgi->header( -status => ’303 See Other’, -Location => PAGES . $id . ‘.html’, -Vary => ‘Accept’ ) } # give them RDF else { print $cgi->header( -status => ’303 See Other’, -Location => DATA . $id . ‘.rdf’, -Vary => ‘Accept’, “Content-Type” => ‘application/rdf+xml’ ) } # done return Apache2::Const::OK; } 1; # return true or die

Morgan, Eric Lease: LiAM source code: Perl poetry

planet code4lib - Mon, 2014-02-17 04:40

#!/usr/bin/perl # Liam Guidebook Source Code; Perl poetry, sort of # Eric Lease Morgan <emorgan@nd.edu> # February 16, 2014 # done exit;

#!/usr/bin/perl # marc2rdf.pl – make MARC records accessible via linked data # Eric Lease Morgan <eric_morgan@infomotions.com> # December 5, 2013 – first cut; # configure use constant ROOT => ‘/disk01/www/html/main/sandbox/liam’; use constant MARC => ROOT . ‘/src/marc/’; use constant DATA => ROOT . ‘/data/’; use constant PAGES => ROOT . ‘/pages/’; use constant MARC2HTML => ROOT . ‘/etc/MARC21slim2HTML.xsl’; use constant MARC2MODS => ROOT . ‘/etc/MARC21slim2MODS3.xsl’; use constant MODS2RDF => ROOT . ‘/etc/mods2rdf.xsl’; use constant MAXINDEX => 100; # require use IO::File; use MARC::Batch; use MARC::File::XML; use strict; use XML::LibXML; use XML::LibXSLT; # initialize my $parser = XML::LibXML->new; my $xslt = XML::LibXSLT->new; # process each record in the MARC directory my @files = glob MARC . “*.marc”; for ( 0 .. $#files ) { # re-initialize my $marc = $files[ $_ ]; my $handle = IO::File->new( $marc ); binmode( STDOUT, ‘:utf8′ ); binmode( $handle, ‘:bytes’ ); my $batch = MARC::Batch->new( ‘USMARC’, $handle ); $batch->warnings_off; $batch->strict_off; my $index = 0; # process each record in the batch while ( my $record = $batch->next ) { # get marcxml my $marcxml = $record->as_xml_record; my $_001 = $record->field( ’001′ )->as_string; $_001 =~ s/_//; $_001 =~ s/ +//; $_001 =~ s/-+//; print ” marc: $marc\n”; print ” identifier: $_001\n”; print ” URI: http://infomotions.com/sandbox/liam/id/$_001\n”; # re-initialize and sanity check my $output = PAGES . “$_001.html”; if ( ! -e $output or -s $output == 0 ) { # transform marcxml into html print ” HTML: $output\n”; my $source = $parser->parse_string( $marcxml ) or warn $!; my $style = $parser->parse_file( MARC2HTML ) or warn $!; my $stylesheet = $xslt->parse_stylesheet( $style ) or warn $!; my $results = $stylesheet->transform( $source ) or warn $!; my $html = $stylesheet->output_string( $results ); &save( $output, $html ); } else { print ” HTML: skipping\n” } # re-initialize and sanity check my $output = DATA . “$_001.rdf”; if ( ! -e $output or -s $output == 0 ) { # transform marcxml into mods my $source = $parser->parse_string( $marcxml ) or warn $!; my $style = $parser->parse_file( MARC2MODS ) or warn $!; my $stylesheet = $xslt->parse_stylesheet( $style ) or warn $!; my $results = $stylesheet->transform( $source ) or warn $!; my $mods = $stylesheet->output_string( $results ); # transform mods into rdf print ” RDF: $output\n”; $source = $parser->parse_string( $mods ) or warn $!; my $style = $parser->parse_file( MODS2RDF ) or warn $!; my $stylesheet = $xslt->parse_stylesheet( $style ) or warn $!; my $results = $stylesheet->transform( $source ) or warn $!; my $rdf = $stylesheet->output_string( $results ); &save( $output, $rdf ); } else { print ” RDF: skipping\n” } # prettify print “\n”; # increment and check $index++; last if ( $index > MAXINDEX ) } } # done exit; sub save { open F, ‘ > ‘ . shift or die $!; binmode( F, ‘:utf8′ ); print F shift; close F; return; }

#!/usr/bin/perl # ead2rdf.pl – make EAD files accessible via linked data # Eric Lease Morgan <eric_morgan@infomotions.com> # December 6, 2013 – based on marc2linkedata.pl # configure use constant ROOT => ‘/disk01/www/html/main/sandbox/liam’; use constant EAD => ROOT . ‘/src/ead/’; use constant DATA => ROOT . ‘/data/’; use constant PAGES => ROOT . ‘/pages/’; use constant EAD2HTML => ROOT . ‘/etc/ead2html.xsl’; use constant EAD2RDF => ROOT . ‘/etc/ead2rdf.xsl’; use constant SAXON => ‘java -jar /disk01/www/html/main/sandbox/liam/bin/saxon.jar -s:##SOURCE## -xsl:##XSL## -o:##OUTPUT##’; # require use strict; use XML::XPath; use XML::LibXML; use XML::LibXSLT; # initialize my $saxon = ”; my $xsl = ”; my $parser = XML::LibXML->new; my $xslt = XML::LibXSLT->new; # process each record in the EAD directory my @files = glob EAD . “*.xml”; for ( 0 .. $#files ) { # re-initialize my $ead = $files[ $_ ]; print ” EAD: $ead\n”; # get the identifier my $xpath = XML::XPath->new( filename => $ead ); my $identifier = $xpath->findvalue( ‘/ead/eadheader/eadid’ ); $identifier =~ s/[^\w ]//g; print ” identifier: $identifier\n”; print ” URI: http://infomotions.com/sandbox/liam/id/$identifier\n”; # re-initialize and sanity check my $output = PAGES . “$identifier.html”; if ( ! -e $output or -s $output == 0 ) { # transform marcxml into html print ” HTML: $output\n”; my $source = $parser->parse_file( $ead ) or warn $!; my $style = $parser->parse_file( EAD2HTML ) or warn $!; my $stylesheet = $xslt->parse_stylesheet( $style ) or warn $!; my $results = $stylesheet->transform( $source ) or warn $!; my $html = $stylesheet->output_string( $results ); &save( $output, $html ); } else { print ” HTML: skipping\n” } # re-initialize and sanity check my $output = DATA . “$identifier.rdf”; if ( ! -e $output or -s $output == 0 ) { # create saxon command, and save rdf print ” RDF: $output\n”; $saxon = SAXON; $xsl = EAD2RDF; $saxon =~ s/##SOURCE##/$ead/e; $saxon =~ s/##XSL##/$xsl/e; $saxon =~ s/##OUTPUT##/$output/e; system $saxon; } else { print ” RDF: skipping\n” } # prettify print “\n”; } # done exit; sub save { open F, ‘ > ‘ . shift or die $!; binmode( F, ‘:utf8′ ); print F shift; close F; return; }

#!/usr/bin/perl # store-make.pl – simply initialize an RDF triple store # Eric Lease Morgan <eric_morgan@infomotions.com> # # December 14, 2013 – after wrestling with wilson for most of the day # configure use constant ETC => ‘/disk01/www/html/main/sandbox/liam/etc/’; # require use strict; use RDF::Redland; # sanity check my $db = $ARGV[ 0 ]; if ( ! $db ) { print “Usage: $0 <db>\n”; exit; } # do the work; brain-dead my $etc = ETC; my $store = RDF::Redland::Storage->new( ‘hashes’, $db, “new=’yes’, hash-type=’bdb’, dir=’$etc’” ); die “Unable to create store ($!)” unless $store; my $model = RDF::Redland::Model->new( $store, ” ); die “Unable to create model ($!)” unless $model; # “save” $store = undef; $model = undef; # done exit;

#!/user/bin/perl # store-add.pl – add items to an RDF triple store # Eric Lease Morgan <eric_morgan@infomotions.com> # # December 14, 2013 – after wrestling with wilson for most of the day # configure use constant ETC => ‘/disk01/www/html/main/sandbox/liam/etc/’; # require use strict; use RDF::Redland; # sanity check #1 – command line arguments my $db = $ARGV[ 0 ]; my $file = $ARGV[ 1 ]; if ( ! $db or ! $file ) { print “Usage: $0 <db> <file>\n”; exit; } # sanity check #2 – store exists die “Error: po2s file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-po2s.db’ ); die “Error: so2p file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-so2p.db’ ); die “Error: sp2o file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-sp2o.db’ ); # open the store my $etc = ETC; my $store = RDF::Redland::Storage->new( ‘hashes’, $db, “new=’no’, hash-type=’bdb’, dir=’$etc’” ); die “Error: Unable to open store ($!)” unless $store; my $model = RDF::Redland::Model->new( $store, ” ); die “Error: Unable to create model ($!)” unless $model; # sanity check #3 – file exists die “Error: $file not found.\n” if ( ! -e $file ); # parse a file and add it to the store my $uri = RDF::Redland::URI->new( “file:$file” ); my $parser = RDF::Redland::Parser->new( ‘rdfxml’, ‘application/rdf+xml’ ); die “Error: Failed to find parser ($!)\n” if ( ! $parser ); my $stream = $parser->parse_as_stream( $uri, $uri ); my $count = 0; while ( ! $stream->end ) { $model->add_statement( $stream->current ); $count++; $stream->next; } # echo the result warn “Namespaces:\n”; my %namespaces = $parser->namespaces_seen; while ( my ( $prefix, $uri ) = each %namespaces ) { warn ” prefix: $prefix\n”; warn ‘ uri: ‘ . $uri->as_string . “\n”; warn “\n”; } warn “Added $count statements\n”; # “save” $store = undef; $model = undef; # done exit; 10.5 store-search.pl – query a triple store # Eric Lease Morgan <eric_morgan@infomotions.com> # December 14, 2013 – after wrestling with wilson for most of the day # configure use constant ETC => ‘/disk01/www/html/main/sandbox/liam/etc/’; my %namespaces = ( “crm” => “http://erlangen-crm.org/current/”, “dc” => “http://purl.org/dc/elements/1.1/”, “dcterms” => “http://purl.org/dc/terms/”, “event” => “http://purl.org/NET/c4dm/event.owl#”, “foaf” => “http://xmlns.com/foaf/0.1/”, “lode” => “http://linkedevents.org/ontology/”, “lvont” => “http://lexvo.org/ontology#”, “modsrdf” => “http://simile.mit.edu/2006/01/ontologies/mods3#”, “ore” => “http://www.openarchives.org/ore/terms/”, “owl” => “http://www.w3.org/2002/07/owl#”, “rdf” => “http://www.w3.org/1999/02/22-rdf-syntax-ns#”, “rdfs” => “http://www.w3.org/2000/01/rdf-schema#”, “role” => “http://simile.mit.edu/2006/01/roles#”, “skos” => “http://www.w3.org/2004/02/skos/core#”, “time” => “http://www.w3.org/2006/time#”, “timeline” => “http://purl.org/NET/c4dm/timeline.owl#”, “wgs84_pos” => “http://www.w3.org/2003/01/geo/wgs84_pos#” ); # require use strict; use RDF::Redland; # sanity check #1 – command line arguments my $db = $ARGV[ 0 ]; my $query = $ARGV[ 1 ]; if ( ! $db or ! $query ) { print “Usage: $0 <db> <query>\n”; exit; } # sanity check #2 – store exists die “Error: po2s file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-po2s.db’ ); die “Error: so2p file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-so2p.db’ ); die “Error: sp2o file not found. Make a store?\n” if ( ! -e ETC . $db . 
‘-sp2o.db’ ); # open the store my $etc = ETC; my $store = RDF::Redland::Storage->new( ‘hashes’, $db, “new=’no’, hash-type=’bdb’, dir=’$etc’” ); die “Error: Unable to open store ($!)” unless $store; my $model = RDF::Redland::Model->new( $store, ” ); die “Error: Unable to create model ($!)” unless $model; # search #my $sparql = RDF::Redland::Query->new( “CONSTRUCT { ?a ?b ?c } WHERE { ?a ?b ?c }”, undef, undef, “sparql” ); my $sparql = RDF::Redland::Query->new( “PREFIX modsrdf: <http://simile.mit.edu/2006/01/ontologies/mods3#>\nSELECT ?a ?b ?c WHERE { ?a modsrdf:$query ?c }”, undef, undef, ‘sparql’ ); my $results = $model->query_execute( $sparql ); print $results->to_string; # done exit;

#!/usr/bin/perl # store-dump.pl – output the content of store as RDF/XML # Eric Lease Morgan <eric_morgan@infomotions.com> # # December 14, 2013 – after wrestling with wilson for most of the day # configure use constant ETC => ‘/disk01/www/html/main/sandbox/liam/etc/’; # require use strict; use RDF::Redland; # sanity check #1 – command line arguments my $db = $ARGV[ 0 ]; my $uri = $ARGV[ 1 ]; if ( ! $db ) { print “Usage: $0 <db> <uri>\n”; exit; } # sanity check #2 – store exists die “Error: po2s file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-po2s.db’ ); die “Error: so2p file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-so2p.db’ ); die “Error: sp2o file not found. Make a store?\n” if ( ! -e ETC . $db . ‘-sp2o.db’ ); # open the store my $etc = ETC; my $store = RDF::Redland::Storage->new( ‘hashes’, $db, “new=’no’, hash-type=’bdb’, dir=’$etc’” ); die “Error: Unable to open store ($!)” unless $store; my $model = RDF::Redland::Model->new( $store, ” ); die “Error: Unable to create model ($!)” unless $model; # do the work my $serializer = RDF::Redland::Serializer->new; print $serializer->serialize_model_to_string( RDF::Redland::URI->new, $model ); # done exit;

#!/usr/bin/perl

# sparql.pl – a brain-dead, half-baked SPARQL endpoint
# Eric Lease Morgan <eric_morgan@infomotions.com>
# December 15, 2013 – first investigations

# require
use CGI;
use CGI::Carp qw( fatalsToBrowser );
use RDF::Redland;
use strict;

# initialize
my $cgi   = CGI->new;
my $query = $cgi->param( 'query' );

if ( ! $query ) { print $cgi->header; print &home }
else {

	# open the store for business
	my $store = RDF::Redland::Storage->new( 'hashes', 'store', "new='no', hash-type='bdb', dir='/disk01/www/html/main/sandbox/liam/etc'" );
	my $model = RDF::Redland::Model->new( $store, '' );

	# search
	my $results = $model->query_execute( RDF::Redland::Query->new( $query, undef, undef, 'sparql' ) );

	# return the results
	print $cgi->header( -type => 'application/xml' );
	print $results->to_string;

}

# done
exit;


sub home {

	# create a list of namespaces
	my $namespaces = &namespaces;
	my $list       = '';
	foreach my $prefix ( sort keys %{ $namespaces } ) {
		my $uri = $$namespaces{ $prefix };
		$list .= $cgi->li( "$prefix – " . $cgi->a( { href => $uri, target => '_blank' }, $uri ) );
	}
	$list = $cgi->ol( $list );

	# return a home page
	return <<EOF;
<html>
<head>
	<title>LiAM SPARQL Endpoint</title>
</head>
<body style='margin: 7%'>
<h1>LiAM SPARQL Endpoint</h1>
<p>This is a brain-dead and half-baked SPARQL endpoint to a subset of LiAM linked data. Enter a query, but there is the disclaimer. Errors will probably happen because of SPARQL syntax errors. Remember, the interface is brain-dead. Your mileage <em>will</em> vary.</p>
<form method='GET' action='./'>
<textarea style='font-size: large' rows='5' cols='65' name='query' />
PREFIX hub:<http://data.archiveshub.ac.uk/def/>
SELECT ?uri WHERE { ?uri ?o hub:FindingAid }
</textarea><br />
<input type='submit' value='Search' />
</form>
<p>Here are a few sample queries:</p>
<ul>
	<li>Find all triples with RDF Schema labels – <code><a href="http://infomotions.com/sandbox/liam/sparql/?query=PREFIX+rdf%3A%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0ASELECT+*+WHERE+%7B+%3Fs+rdf%3Alabel+%3Fo+%7D%0D%0A">PREFIX rdf:<http://www.w3.org/2000/01/rdf-schema#> SELECT * WHERE { ?s rdf:label ?o }</a></code></li>
	<li>Find all items with MODS subjects – <code><a href='http://infomotions.com/sandbox/liam/sparql/?query=PREFIX+mods%3A%3Chttp%3A%2F%2Fsimile.mit.edu%2F2006%2F01%2Fontologies%2Fmods3%23%3E%0D%0ASELECT+*+WHERE+%7B+%3Fs+mods%3Asubject+%3Fo+%7D'>PREFIX mods:<http://simile.mit.edu/2006/01/ontologies/mods3#> SELECT * WHERE { ?s mods:subject ?o }</a></code></li>
	<li>Find every unique predicate – <code><a href="http://infomotions.com/sandbox/liam/sparql/?query=SELECT+DISTINCT+%3Fp+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D">SELECT DISTINCT ?p WHERE { ?s ?p ?o }</a></code></li>
	<li>Find everything – <code><a href="http://infomotions.com/sandbox/liam/sparql/?query=SELECT+*+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D">SELECT * WHERE { ?s ?p ?o }</a></code></li>
	<li>Find all classes – <code><a href="http://infomotions.com/sandbox/liam/sparql/?query=SELECT+DISTINCT+%3Fclass+WHERE+%7B+%5B%5D+a+%3Fclass+%7D+ORDER+BY+%3Fclass">SELECT DISTINCT ?class WHERE { [] a ?class } ORDER BY ?class</a></code></li>
	<li>Find all properties – <code><a href="http://infomotions.com/sandbox/liam/sparql/?query=SELECT+DISTINCT+%3Fproperty%0D%0AWHERE+%7B+%5B%5D+%3Fproperty+%5B%5D+%7D%0D%0AORDER+BY+%3Fproperty">SELECT DISTINCT ?property WHERE { [] ?property [] } ORDER BY ?property</a></code></li>
	<li>Find URIs of all finding aids – <code><a href="http://infomotions.com/sandbox/liam/sparql/?query=PREFIX+hub%3A%3Chttp%3A%2F%2Fdata.archiveshub.ac.uk%2Fdef%2F%3E+SELECT+%3Furi+WHERE+%7B+%3Furi+%3Fo+hub%3AFindingAid+%7D">PREFIX hub:<http://data.archiveshub.ac.uk/def/> SELECT ?uri WHERE { ?uri ?o hub:FindingAid }</a></code></li>
	<li>Find URIs of all MARC records – <code><a href="http://infomotions.com/sandbox/liam/sparql/?query=PREFIX+mods%3A%3Chttp%3A%2F%2Fsimile.mit.edu%2F2006%2F01%2Fontologies%2Fmods3%23%3E+SELECT+%3Furi+WHERE+%7B+%3Furi+%3Fo+mods%3ARecord+%7D%0D%0A%0D%0A%0D%0A">PREFIX mods:<http://simile.mit.edu/2006/01/ontologies/mods3#> SELECT ?uri WHERE { ?uri ?o mods:Record }</a></code></li>
	<li>Find all URIs of all collections – <code><a href="http://infomotions.com/sandbox/liam/sparql/?query=PREFIX+mods%3A%3Chttp%3A%2F%2Fsimile.mit.edu%2F2006%2F01%2Fontologies%2Fmods3%23%3E%0D%0APREFIX+hub%3A%3Chttp%3A%2F%2Fdata.archiveshub.ac.uk%2Fdef%2F%3E%0D%0ASELECT+%3Furi+WHERE+%7B+%7B+%3Furi+%3Fo+hub%3AFindingAid+%7D+UNION+%7B+%3Furi+%3Fo+mods%3ARecord+%7D+%7D%0D%0AORDER+BY+%3Furi%0D%0A">PREFIX mods:<http://simile.mit.edu/2006/01/ontologies/mods3#> PREFIX hub:<http://data.archiveshub.ac.uk/def/> SELECT ?uri WHERE { { ?uri ?o hub:FindingAid } UNION { ?uri ?o mods:Record } } ORDER BY ?uri</a></code></li>
</ul>
<p>This is a list of ontologies (namespaces) used in the triple store as predicates:</p>
$list
<p>For more information about SPARQL, see:</p>
<ol>
	<li><a href="http://www.w3.org/TR/rdf-sparql-query/" target="_blank">SPARQL Query Language for RDF</a> from the W3C</li>
	<li><a href="http://en.wikipedia.org/wiki/SPARQL" target="_blank">SPARQL</a> from Wikipedia</li>
</ol>
<p>Source code — <a href="http://infomotions.com/sandbox/liam/bin/sparql.pl">sparql.pl</a> — is available online.</p>
<hr />
<p>
	<a href="mailto:eric_morgan\@infomotions.com">Eric Lease Morgan <eric_morgan\@infomotions.com></a><br />
	January 6, 2014
</p>
</body>
</html>
EOF

}


sub namespaces {

	my %namespaces = (
		"crm"       => "http://erlangen-crm.org/current/",
		"dc"        => "http://purl.org/dc/elements/1.1/",
		"dcterms"   => "http://purl.org/dc/terms/",
		"event"     => "http://purl.org/NET/c4dm/event.owl#",
		"foaf"      => "http://xmlns.com/foaf/0.1/",
		"lode"      => "http://linkedevents.org/ontology/",
		"lvont"     => "http://lexvo.org/ontology#",
		"modsrdf"   => "http://simile.mit.edu/2006/01/ontologies/mods3#",
		"ore"       => "http://www.openarchives.org/ore/terms/",
		"owl"       => "http://www.w3.org/2002/07/owl#",
		"rdf"       => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
		"rdfs"      => "http://www.w3.org/2000/01/rdf-schema#",
		"role"      => "http://simile.mit.edu/2006/01/roles#",
		"skos"      => "http://www.w3.org/2004/02/skos/core#",
		"time"      => "http://www.w3.org/2006/time#",
		"timeline"  => "http://purl.org/NET/c4dm/timeline.owl#",
		"wgs84_pos" => "http://www.w3.org/2003/01/geo/wgs84_pos#"
	);

	return \%namespaces;

}

# package Apache2::LiAM::Dereference;

# Dereference.pm – Redirect user-agents based on value of URI.
# Eric Lease Morgan <eric_morgan@infomotions.com>
# December 7, 2013 – first investigations; based on Apache2::Alex::Dereference

# configure
use constant PAGES => 'http://infomotions.com/sandbox/liam/pages/';
use constant DATA  => 'http://infomotions.com/sandbox/liam/data/';

# require
use Apache2::Const -compile => qw( OK );
use CGI;
use strict;

# main
sub handler {

	# initialize
	my $r   = shift;
	my $cgi = CGI->new;
	my $id  = substr( $r->uri, length $r->location );

	# wants HTML, so give them the HTML page
	if ( $cgi->Accept( 'text/html' )) { print $cgi->header( -status => '303 See Other', -Location => PAGES . $id . '.html', -Vary => 'Accept' ) }

	# otherwise, give them RDF
	else { print $cgi->header( -status => '303 See Other', -Location => DATA . $id . '.rdf', -Vary => 'Accept', "Content-Type" => 'application/rdf+xml' ) }

	# done
	return Apache2::Const::OK;

}

1; # return true or die

Hess, M Ryan: Screen Shot 2014-02-13 at 12.52.53 PM

planet code4lib - Fri, 2014-02-14 19:00

Image by Surian Soosay

Okay, so it is likely impossible to actually “use” the Internet without it “using” you back. I get that. Terms of service get changed without clear explanation, cookies get saved, NSA snoops do what NSA snoops do. The whole business model of the Interwebs is set up to trade your info for access.

I’m under no illusions.

But, after the Great Target Hack and Edward Snowden’s revelations regarding the NSA (I think we were all waiting for these things to happen), I’m finding myself rethinking the trade offs I made concerning privacy and online anonymity for online convenience (and laziness).

There was a time when I used to block cookies and obsess over terms of service agreements. Hell, I even used Tor from time to time.

But, after a while, it just became easier to stop worrying and learn to accept a level of personally sanctioned data breach. Now, with all the stories of identity theft, commercialization of your personal info and multi-governmental and corporate sweeps of such data…it’s time for a little reflection…and retreat.

So, I’ve decided to experiment with reducing my digital footprint, and I’ll post updates from time to time on how it’s going, in addition to my occasional posts on library projects.

Among my experiments, I’m planning on moving out of Googlelandia as much as possible, starting with changing the default search in my browser and moving back to Firefox. I’ll cover the Firefox post next time, but for now, let’s look at life without Google Search.

Most people online probably don’t remember a world before Google and those that do, don’t want to remember. Needless to say, Google’s initial search algorithm was so good, that it rapidly conquered the search market to the point that Yahoo! handed over its search to Microsoft and the dozens of smaller search engines were quickly forgotten. Anyone remember Web Crawler? Exactly!

Aside from Bing (hack!) and the Bing-lite Yahoo! search, there really aren’t many alternatives worth turning to when one needs anonymity. That is, except for DuckDuckGo, a search engine that uses secure HTTPS, does not use cookies by default and generally does not collect any data linked to you (see their privacy statement for more info).

And the search results are not that bad.

But they aren’t great.

Life on DuckDuckGo will be very reminiscent of the best old-school search tools from the pre-Google 90s. Gone will be the kinds of results that require an analysis of your personal search history, online social habits and cookies. Often you’ll get exactly what you’re after, but just as often, you’ll get it a few results lower on the page, just below some commercial sites that are using keyword tricks to rise to the top.

For example, I’m thinking about what color scheme I want to go with for my new flat and used DuckDuckGo to find sites that could help me with that. So I did a search for something like: “paint interior design color tools.” The first result led to a 404 page. The second result was not too bad, a Benjamin Moore paint selecting tool for professional painters. Other results were somewhere between these two extremes, with many of them going to pages that were slightly relevant but failed in the “authoritative” category.

Google expends a lot of effort at weeding out, or drowning out, pages with low street cred, and you’ll probably hardly ever get to a 404 page thanks to their very busy and persistent robots. Something else that will be hard to find in Google is nothing. In Google, the dreaded “Sorry. No results were found” message would be an amazing and rare feat of your talents for obscurity. Not so in DuckDuckGo…these come up from time to time.

DuckDuckGo also lacks image and video search functionality. For this, it provides a dropdown that lets you search via Google or Bing.

I’d also add that I’m using DuckDuckGo in a Firefox omnibar plugin, so as I type, I get suggested hits. These are not as accurate or relevant as Google’s, but I’ve also limited them by not preserving any search history in Firefox.

After a few days of trying this out, I do like DuckDuckGo enough to keep using it, but I have had several lapses of risky searches on Google. This is especially true for professional work, where Google knows my work interests quite well and serves up exactly what I need. But for general searches, DuckDuckGo is a good tradeoff for privacy wonks.

Stay tuned for more journeys off the grid including my return to Firefox and experiments with thumb drive applications…


ALA Equitable Access to Electronic Content: ALA joins WifiForward initiative

planet code4lib - Fri, 2014-02-14 18:59

Ten years ago, only about 18 percent of public libraries offered free public access to Wi-Fi. Now it’s nearly ubiquitous in communities of all sizes. Wireless access not only enables our library patrons to bring their own devices to access the internet and digital content (sometimes from our sidewalks and parking lots), but it also enables libraries to improve and expand our technology services through mobile laptop labs, self-checkout, and even new pilots experimenting with using TV white space to extend our reach further into our communities.

Our school and higher education libraries also are meeting students and educators “where they are” – increasingly on their smart phones and other mobile devices. Wi-Fi is increasingly critical for enabling easy access to our collections and services across all types of libraries. In fact, more Internet traffic is carried over Wi-Fi than any other path in the United States.

Wi-Fi runs on unlicensed spectrum, parts of the radio frequencies that anyone can use as long as the technical rules established by the Federal Communications Commission (FCC) are followed. Recent analyses indicate that Wi-Fi in our homes, businesses, libraries and schools is becoming congested by a deluge of data from more devices, applications and services connecting to the Internet without wires. Cisco predicts that by 2017, Wi-Fi will handle a majority of all data consumers’ access from the Internet.
This is why the American Library Association (ALA) has joined a new coalition calling on policymakers to unleash unlicensed spectrum for Wi-Fi and other uses.

WifiForward is an ad hoc group of companies, organizations and public sector institutions working to alleviate the Wi-Fi spectrum crunch. In addition to ALA, members include the Schools, Health and Libraries Broadband (SHLB) Coalition, Google, Comcast, Microsoft, the Consumer Electronics Association, and the International Association of Venue Managers.

The coalition will marshal support to:

  • Protect and strengthen existing unlicensed spectrum designations,
  • Free up new spectrum for unlicensed use, and
  • Establish investment-friendly, transparent and predictable unlicensed rules that encourage growth and deployment.

The FCC has several opportunities to make more spectrum available: the auction of TV spectrum, determining how to share and apportion parts of the 3.5 GHz spectrum, and an ongoing proceeding in the 5 GHz band. The WifiForward coalition will work to raise public awareness about the importance of unlicensed spectrum to support the next-generation of technologies and the emerging “Internet of Things.” This work also aligns with the focus of the ALA’s Policy Revolution! initiative to expand and deepen its policy engagement and outreach to key stakeholders.

Providing and leveraging Wi-Fi is an increasingly important part of ensuring equitable access to information in the United States, and the ALA is pleased to be a part of this effort.

The post ALA joins WifiForward initiative appeared first on District Dispatch.

ALA Equitable Access to Electronic Content: ALA, ARL and EDUCAUSE re-engage FCC on network neutrality

planet code4lib - Fri, 2014-02-14 00:27

Photo by Moore Library via flickr.

One of my first projects when I joined the ALA Office for Information Technology Policy was to work with colleagues in the Office of Government Relations to advocate around the development of what would become the Federal Communications Commission’s (FCC) Open Internet Order.

Not only were we speaking to the important principles of network neutrality, but we successfully argued to ensure libraries and higher education institutions were included (along with residential and business customers) in network neutrality provisions for the public internet.

My colleagues and partners in this work were already “old hands” (dating back to 2006) on this issue, as the library and higher-education community had spoken out early on the importance of a neutral and free internet for our students, faculty, the general public and our staff. Among our arguments, we asserted that:

  • Libraries, colleges and universities depend on the intellectual freedom afforded by the Open Internet to develop content and applications that serve the public interest;
  • Libraries and higher education institutions are prolific providers of content, services and applications on the Open Internet;
  • Research libraries and institutions rely on the Open Internet as end-users to collaborate with and obtain content and services from outside sources; and
  • The ability to access library, college and university services should not depend on location.

While we felt the FCC’s Open Internet Order fell short in some areas, particularly with regard to mobile wireless services, ALA was pleased the order established a precedent that ISPs must keep the Internet open to library users and library content.

Unfortunately, the U.S. Court of Appeals has made the old new again with its ruling to strike down most of the Open Internet Order on January 14. The court’s decision gives commercial companies the astounding legal authority to block Internet traffic, give preferential treatment to certain Internet services or applications, and steer users to or away from certain web sites based on their own commercial interests. At the same time, however, the court did recognize the FCC’s legal authority to protect the public’s access to Internet services.

FCC Chairman Wheeler has said that the court “invited the Commission to act to preserve a free and open Internet,” and he will soon release his plan to move forward.

This is good news for all of us who believe that preserving an open Internet is essential to our nation’s freedom of speech, educational achievement, and economic growth. In a letter to Chairman Wheeler today, ALA, together with ARL and EDUCAUSE, seeks to work with the FCC in developing new policies that preserve network neutrality and incorporate the essential roles our institutions play in this area.

The arguments we made several years ago are only more true today in terms of ensuring equitable access to educational digital content. While digital learning was already well underway in 2010, it has become a far more important force in learning, particularly with the recent emergence of massive open online courses (MOOCs). Digital collections have been “kicked up a notch” with the emergence of the Digital Public Library of America. And more public libraries are enabling digital creation and distribution through their hands-on learning labs. We must ensure that these creative and research resources are not relegated to any Internet “slow lane” while others with deeper pockets are able to cut deals with ISPs to prioritize access to their offerings.

It’s time to restore the Open Internet!

The post ALA, ARL and EDUCAUSE re-engage FCC on network neutrality appeared first on District Dispatch.

Dempsey, Lorcan: Roses are red .... the top love stories?

planet code4lib - Thu, 2014-02-13 22:00

One of the nice things about WorldCat is that it has sufficient scale to be a good proxy for a large part of the scholarly and cultural record. The aggregate holdings of thousands of libraries contain not just books, but movies, music, and so on. It is not complete, but it gives good results.

In honor of St Valentine's Day, my colleagues JD Shipengrover and Diane Vizine-Goetz have produced a list of the most widely held love stories in libraries - the most widely held books and the most widely held movies.

Here are the lists ... It is interesting seeing the similarities between the two.

Books


  1. Pride and Prejudice
  2. Jane Eyre
  3. Wuthering Heights
  4. Emma
  5. Sense and Sensibility
  6. The Great Gatsby
  7. Anna Karenina
  8. The Return of the Native
  9. The Portrait of a Lady
  10. Mansfield Park


Movies


  1. Jane Eyre
  2. Gone With the Wind
  3. Pride and Prejudice
  4. Emma
  5. West Side Story
  6. Sense and Sensibility
  7. The Sound of Music
  8. Romeo & Juliet
  9. Titanic
  10. The Princess Bride

These lists are based on a fiction (books and movies) subset of WorldCat, using the genre data in the records. The data is clustered at the work level to consolidate editions and so on. A little more context is at the list page.

Open Knowledge Foundation: River level data must be open

planet code4lib - Thu, 2014-02-13 12:09

My home – as you can see – is flooded, for the second time in a month. The mighty Thames is reclaiming its flood-plains, and making humans – especially the UK government’s Environment Agency – look puny and irrelevant. As I wade to and fro, putting sandbags around the doors, carrying valuables upstairs, and adding bricks to the stacks that prop up the heirloom piano, I occasionally check the river level data at the Agency website, and try to estimate how high the water will rise, and when.

There are thousands of river monitoring stations across the UK, recording water levels every few minutes. The Agency publishes the resulting data on its website, in pages like this. For each station it shows a graph of the level over the last 24 hours (actually, the 24 hours up to the last reported data: my local station stopped reporting three days ago, presumably overwhelmed by the water), and has some running text giving the current level in metres above a local datum. There’s a small amount of station metadata, and that’s all. No older data, and no tabular data. I can’t:

  • See the levels over the course of a previous flood;
  • Measure how quickly the river typically rises, or how long it typically takes to go down;
  • Compare today’s flood to that four weeks ago (or those in 2011 or 2003);
  • Easily navigate to the data for neighbouring stations up and down river;
  • Get a chart showing the river level, or river level anomalies, along the length of the Thames;
  • Get a chart comparing that longitudinal view of the flood with the situation at any previous time;
  • Make a maps mash-up showing river level anomalies across the Thames catchment;
  • Make a personalised chart by adding my own observations, or critical values (‘electrics cut out’, ‘front garden floods’, ‘water comes into house’, …);
  • Make a crowd-sourced flooding community site combining river level data, maps, pictures, observations, and advice (‘sandbags are now available at the village hall’);
  • Make a mash-up combining river level data with precipitation records;
  • Make a flood forecasting tool by combining historical river level, ground-water, and precipitation records with precipitation forecasts.

Most of these things (not the last!) would be a small matter of programming, if the data were available. The Thames Valley is teeming with programmers who would be interested in bashing together a quick web app; or taking part in a larger open-source project to deliver more detailed, more accessible, and more useful flood data. But if we want to do any of those things, we have to pay a licence fee to access the data, and the licence would apparently then require us to get pre-approval from the Environment Agency before releasing any ‘product’. All this for data which is gathered, curated, and managed by a part of the UK government, nominally for the benefit of all.
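
To give a flavour of how small a matter of programming this could be, here is a minimal sketch in Perl. It assumes, purely hypothetically, that a station’s readings were published as plain CSV files of “timestamp,level-in-metres” rows (exactly the sort of thing the Agency does not currently provide), so the file names and column layout below are invented for illustration.

#!/usr/bin/perl

# flood-compare.pl - a hypothetical sketch; it presumes open, per-station CSV
# files of "timestamp,level" rows, a format the Environment Agency does not
# actually publish
use strict;
use warnings;
use List::Util qw( max );

my ( $current_file, $previous_file ) = @ARGV;
die "Usage: $0 <current.csv> <previous-flood.csv>\n" unless $current_file and $previous_file;

# read a CSV of "timestamp,level" lines and return the list of numeric levels
sub read_levels {

	my $file = shift;
	open my $fh, '<', $file or die "Cannot open $file: $!";
	my @levels;
	while ( <$fh> ) {
		chomp;
		my ( $timestamp, $level ) = split /,/;
		push @levels, $level if defined $level and $level =~ /^-?\d+(\.\d+)?$/;
	}
	close $fh;
	return @levels;

}

my @now  = read_levels( $current_file );
my @then = read_levels( $previous_file );
die "No usable readings found\n" unless @now and @then;

# how fast is the river rising, and how does today compare with the earlier flood?
my $latest = $now[ -1 ];
printf "Latest level: %.2f m (a rise of %.2f m over the period in %s)\n", $latest, $latest - $now[ 0 ], $current_file;
printf "Peak of the previous flood: %.2f m\n", max( @then );
printf "Difference from that peak: %+.2f m\n", $latest - max( @then );

Pointed at two such files, this would answer the second and third bullets above in a couple of dozen lines; most of the other items on the list are a similarly modest step up, given the data.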

Admittedly I couldn’t do any of those things this week anyway – too many boxes to carry, too much furniture to prop up. But surely this is a prime example of the need for open data.

Király, Péter: Solr query facets in Europeana

planet code4lib - Wed, 2014-02-12 23:56

In Europeana we use Apache Solr for searching. Our data model is called EDM (Europeana Data Model), in which a real record* has two main parts: the metadata object, containing information about an object stored in one of the 2400 cultural heritage institutions all over Europe, and the contextual entities, which store information about the agents, places, concepts and timespans occurring in that particular metadata object. This model has almost 200 fields, and in Solr we index all of them. We also have some special fields for facets, and some aggregated fields which combine other fields; for example, the ”who” field contains the metadata object's dc:creator and dc:contributor, together with the agent object's skos:prefLabel, skos:altLabel, and foaf:name fields, in order to provide the user with a single field for searching personal names. For more information please consult our EDM and Europeana API documentation.

One of Europeana's important aims is to make the rights statements of records clear and straightforward. As you can imagine, the 2400 partners have different approaches to licensing their objects, and right now the database holds 60+ different licence types; in other words, the RIGHTS facet has 60+ individual values. Some of them are language or version variations of the same CC licence. It turned out that most users don't want to select from that range of options. The thing is, we can group these rights statements under 3 main categories:

  • freely reusable with attribution (CC0, CC BY, CC BY SA)
  • reusable with some restrictions (CC BY NC, CC BY NC SA, CC BY NC ND, CC BY ND, OOC NC)
  • reusable only with permission (licences of the Europeana Rights Framework)

What we wanted to achieve was to form a new facet from these options, but the most straightforward solution, i.e. creating a new field in Solr, was not easily implementable because it would require a full reindexing (why that was not possible would be another blog entry), so we had to search for another solution. Counting the numbers belonging to the individual rights statements in the RIGHTS facet would work, but that is only good for display and doesn't cover the problem of user interaction. Using the RIGHTS field for search turned out to be risky, because it interferes with the RIGHTS facet, so that did not work either. Finally we came up with a fake facet, which has two sides: one on the display side, and one on the search side.

Facets including the new reusability (”Can I use it?”) facet in Europeana.eu

To count the numbers we use a special Solr facet type: the query facet. It is a simple and at the same time powerful solution. Unlike a normal facet, it doesn't give you a list of existing field values, each with a number telling you how many records have that term given the main query. In the query facet the input is a query, and the returned value is a number which tells you how many records fit the combination of the main query and the query specified in the facet's parameter. Since we don't need to know the list of items in the categories, that's enough for us. We defined three queries:

  • RIGHTS:("CC0" OR "CC BY" OR "CC BY SA")
  • RIGHTS:("CC BY NC" OR "CC BY NC SA" OR "CC BY NC ND" OR "CC BY ND" OR "OOC NC")
  • RIGHTS:(NOT(
          "CC0" OR "CC BY" OR "CC BY SA"
    OR "CC BY NC" OR "CC BY NC SA" OR "CC BY NC ND" OR "CC BY ND" OR "OOC NC"))

In reality we use URLs, not string literals, in the database, but the logic is the same. At the end of the blog entry I'll show you the real queries as well. There is a not-well-known gem in Solr: you can tag your parameters, and those tags will be in the return value. Some tags have predefined meanings, but you can also add custom tags, which are operationally ignored by Solr, so they won't affect the search itself. We use two attributes in our tag, id and ex:

&facet.query={!id=REUSABILITY:open ex=REUSABILITY}RIGHTS:("CC0" OR "CC BY" OR "CC BY SA")
  • ex - this is a standard tag and stands for excluding. It means that this query will exclude the filter tagged as REUSABILITY. This makes it possible, when the user filters on one of these 3 categories, to still see the numbers for all of them correctly.
  • id - a custom tag we use as an identifier. It helps us to identify the query when we retrieve the result, and it is easier to spot than the quite complicated Solr query itself. With a simple regex we can parse the query facets in the response and link each number to the category it belongs to, as the sketch below shows.
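
Something along these lines would do that parsing step. It is only a minimal sketch, not Europeana's actual portal code: it assumes the Solr response was requested as JSON (wt=json) and is piped in on standard input, and it uses the JSON module from CPAN.

#!/usr/bin/perl

# parse-reusability.pl - a minimal sketch, not Europeana's portal code; pipe a
# Solr JSON response (wt=json) into it on STDIN
use strict;
use warnings;
use JSON;

# slurp and decode the raw response body
my $raw      = do { local $/; <STDIN> };
my $response = decode_json( $raw );

# facet.query results live under facet_counts/facet_queries, keyed by the full query string
my $facet_queries = $response->{ facet_counts }{ facet_queries };

my %reusability;
foreach my $query ( keys %{ $facet_queries } ) {

	# pull our custom id (e.g. "REUSABILITY:open") out of the local parameters
	if ( $query =~ /\{!id=(REUSABILITY:\w+)[\s}]/ ) { $reusability{ $1 } = $facet_queries->{ $query } }

}

# %reusability now maps REUSABILITY:open, REUSABILITY:restricted and
# REUSABILITY:permission to their record counts
foreach my $id ( sort keys %reusability ) { print "$id\t$reusability{ $id }\n" }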

When the user selects an item in this reusability facet, the same query runs, but now as a filter. It affects the whole result set: the number of records and the real facets. Its format is something like this:

&fq={!tag=REUSABILITY}RIGHTS:("CC0" OR "CC BY" OR "CC BY SA")
  • tag has the same role as id in the query facet. (The difference is that tag is a standard Solr parameter, while id is our custom solution; unfortunately the query facet doesn't support the tag attribute, so we had to find a custom one.) Here we identify this filter, and the filter will be ignored by those queries which refer to it via the ex attribute.

All these Solr parameters run in the background. On the Europeana portal we use a fake facet called ”REUSABILITY”, and we use it in our filtering parameter (&qf) as REUSABILITY:open, REUSABILITY:restricted or REUSABILITY:permission. It is a shortcut for the lengthy query, and it keeps the interface (and the URL) clean. In the API we introduced the ”reusability” parameter with the same options as in the portal: "open", "restricted" and "permission" denote the above-mentioned categories:

http://europeana.eu/api/v2/search.json?wskey=[YOUR API KEY]&query=*:*&reusability=open
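
To show the parameter in action, here is a small sketch that calls the search API and reports how many openly reusable records it finds. It is illustrative only (not part of the Europeana code base); it assumes the JSON response carries a totalResults field and expects your API key in the EUROPEANA_API_KEY environment variable.

#!/usr/bin/perl

# count-open.pl - an illustrative sketch; assumes the response includes a
# totalResults field and that EUROPEANA_API_KEY holds your API key
use strict;
use warnings;
use LWP::UserAgent;
use JSON;

my $key = $ENV{ EUROPEANA_API_KEY } or die "Set EUROPEANA_API_KEY first\n";
my $url = "http://europeana.eu/api/v2/search.json?wskey=$key&query=*:*&reusability=open";

# call the API and check for HTTP-level problems
my $ua       = LWP::UserAgent->new;
my $response = $ua->get( $url );
die 'Error: ' . $response->status_line . "\n" unless $response->is_success;

# decode the JSON and report the count
my $data = decode_json( $response->decoded_content );
print 'Openly reusable records: ' . $data->{ totalResults } . "\n";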

For those who are interested, here is a real Solr query (slightly formatted for the sake of readability):

q=*:*
&fq={!tag=REUSABILITY}RIGHTS:(
     http\:\/\/creativecommons.org\/licenses\/by-nc\/*
  OR http\:\/\/creativecommons.org\/licenses\/by-nc-sa\/*
  OR http\:\/\/creativecommons.org\/licenses\/by-nc-nd\/*
  OR http\:\/\/creativecommons.org\/licenses\/by-nd\/*
  OR http\:\/\/www.europeana.eu\/rights\/out-of-copyright-non-commercial\/*)
&rows=12
&start=0
&sort=score desc
&timeAllowed=30000
&facet.mincount=1
&facet=true
&facet.field=UGC
&facet.field=LANGUAGE
&facet.field=TYPE
&facet.field=YEAR
&facet.field=PROVIDER
&facet.field=DATA_PROVIDER
&facet.field=COUNTRY
&facet.field=RIGHTS
&facet.limit=750
&facet.query={!id=REUSABILITY:open ex=REUSABILITY}RIGHTS:(
     http\:\/\/creativecommons.org\/publicdomain\/mark\/*
  OR http\:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/*
  OR http\:\/\/creativecommons.org\/licenses\/by\/*
  OR http\:\/\/creativecommons.org\/licenses\/by-sa\/*)
&facet.query={!id=REUSABILITY:restricted ex=REUSABILITY}RIGHTS:(
     http\:\/\/creativecommons.org\/licenses\/by-nc\/*
  OR http\:\/\/creativecommons.org\/licenses\/by-nc-sa\/*
  OR http\:\/\/creativecommons.org\/licenses\/by-nc-nd\/*
  OR http\:\/\/creativecommons.org\/licenses\/by-nd\/*
  OR http\:\/\/www.europeana.eu\/rights\/out-of-copyright-non-commercial\/*)
&facet.query={!id=REUSABILITY:permission ex=REUSABILITY}RIGHTS:(
  NOT(
        http\:\/\/creativecommons.org\/publicdomain\/mark\/*
     OR http\:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/*
     OR http\:\/\/creativecommons.org\/licenses\/by\/*
     OR http\:\/\/creativecommons.org\/licenses\/by-sa\/*
     OR http\:\/\/creativecommons.org\/licenses\/by-nc\/*
     OR http\:\/\/creativecommons.org\/licenses\/by-nc-sa\/*
     OR http\:\/\/creativecommons.org\/licenses\/by-nc-nd\/*
     OR http\:\/\/creativecommons.org\/licenses\/by-nd\/*
     OR http\:\/\/www.europeana.eu\/rights\/out-of-copyright-non-commercial\/*))

See it in action at Europeana.eu.

Notes

* Strictly speaking, EDM is based on the linked data paradigm, so we don't have records in the same way as in a relational database. Each item is rather a named graph, but that's too technical, so we refer to it as a ”record” or ”object”.


Rochkind, Jonathan: Job in Systems department here at JHU

planet code4lib - Wed, 2014-02-12 20:32

We have a job open where I work. The position will support ILL (ILLiad), reserves (Ares), and EZproxy software, as well as do programming to integrate and improve UX for those areas of library workflow and others.

Johns Hopkins University has an immediate opening for a Software Engineer position in the Sheridan Libraries and Museums.  This exciting opportunity is located at the Homewood Campus in Baltimore, Maryland.  The incumbent will primarily be responsible for administering, developing and maintaining library systems to support three main services for all of the Johns Hopkins libraries: electronic reserves, inter-library loan and access to licensed resources.  The incumbent integrates supported library systems, such as ILLiad, Ares and EZproxy, with other systems at the university (ie. JHED directory, Shibboleth and Blackboard), in the library (ie. Library management system, Horizon) and with 3rd party licensed resources (ie. Ebscohost and JSTOR).  The incumbent works as a member of the enterprise applications team in the Library Systems department.

For additional information about the position and to apply, visit http://jobs.jhu.edu .  Locate Job # 60195  and click “Apply.”  To be considered for this position, you must complete an online application.

Qualifications:
- Bachelor’s degree required
- Five years of related work experience with computer systems and applications.
- Experience with Windows Server, IIS, MS SQL Server
- Progressive experience with programming language
- Knowledge of library systems, such as ILLiad, Ares, EZproxy, Horizon, etc.

Johns Hopkins University is an equal opportunity/affirmative action employer committed to recruiting, supporting, and fostering a diverse community of outstanding faculty, staff, and students.  All applicants who share this goal are encouraged to apply.


Filed under: General

Open Knowledge Foundation: Who are you? Community Survey Results (Part 1)

planet code4lib - Wed, 2014-02-12 14:50

You are incredibly diverse and passionate. Last fall over 320 of you participated in our first OKF community-wide survey. You gave us an incredible view into who you are, your needs and how we at OKF can better support you. This is the first of three posts to show you: who you are, some analysis of your responses and, most importantly, how we are working to act on your feedback. Responses came from around the globe: Argentina to Indonesia to Norway to South Africa to the USA.

Today’s post offers a few shiny examples to show you more about you. Without the community, OKF is just a green logo. We hope that you will enjoy this window into your OKF:

How would you describe your role in the open knowledge / open data world?

Why are you involved with or interested in the Open Knowledge Foundation? Do you work for, or closely with, any other organisation in the open data / open knowledge space?

How you define Open Knowledge:

Antti Poikola (Finland) defines Open Knowledge as: open data + open content + open collaborative ways to work/act share and develop shared knowledge

Why are you involved with or interested in the Open Knowledge Foundation?

Tune in for the next post all about your feedback and what you think is critical or needs improvement.

Thanks again to everyone who responded, and to all of you who continue to make a difference in the open world.

Ribaric, Tim: The Day We Fight Back Fallout

planet code4lib - Wed, 2014-02-12 14:24

 

Yesterday was The Day We Fight Back! So how did it go?

read more

Syndicate content