
planet code4lib

Planet Code4Lib - http://planet.code4lib.org

Evergreen ILS: Evergreen 2.9.6 and 2.10.5 released

Thu, 2016-06-16 03:21

We are pleased to announce the release of Evergreen 2.9.6 and 2.10.5, both bugfix releases.

Evergreen 2.9.6 fixes the following issues:

  • Emails sent using the Action Trigger SendEmail reactor now always MIME-encode the From, To, Subject, Bcc, Cc, Reply-To, and Sender headers. As a consequence, non-ASCII characters in those fields are more likely to be displayed correctly in email clients (see the encoding sketch after this list).
  • Fixes the responsive view of the My Account Items Out screen so that Title and Author are now in separate columns.
  • Fixes an incorrect link for the MVF field definition and adds a new link to BRE in fm_IDL.xml.
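The header encoding in question is the RFC 2047 “encoded-word” mechanism. As a rough illustration in Python (not Evergreen’s own code), this is the kind of transformation a non-ASCII header value goes through:

    from email.header import Header

    # Encode a non-ASCII Subject value as an RFC 2047 encoded-word.
    # Purely illustrative; not Evergreen's implementation.
    subject = Header("Överdue items – Café branch", "utf-8").encode()
    print(subject)
    # => something like '=?utf-8?b?...?=' -- an ASCII-safe form that mail
    #    clients decode back to the original characters.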

Evergreen 2.10.5 fixes the following issues:

  • Fixes SIP2 failures with patron information messages when a patron has one or more blocking penalties that are not otherwise ignored.
  • Recovers a previously existing activity log entry that logged the username, authtoken, and workstation (when available) for successful logins.
  • Fixes an error that occurred when the system attempted to display a translated string for the “Has Local Copy” hold placement error message.
  • Fixes an issue where the Show More/Show Fewer Details button didn’t work in catalogs that default to showing more details.
  • Removes Social Security Number as a stock patron identification type for new installations. This fix does not change patron identification types for existing Evergreen systems.
  • Adds two missing link fields (patron profile and patron home library) to the fm_idl.xml for the Combined Active and Aged Circulations (combcirc) reporter source.
  • Adds a performance improvement for the “Clear Holds Shelf” checkin modifier.

Please visit the downloads page to retrieve the server software and staff clients

Cynthia Ng: Accessibility June Meetup (Vancouver) Notes

Thu, 2016-06-16 02:38
Notes from the June Accessibility Meetup presentations. AT-BC (Accessible Technology of BC) Providing assistive technology resources to make learning and working environments usable for people with disabilities. Examples of technology: * “handshake” mouse * microphone with direct to headphones setup * microphone with sound amplification/speaker behind audience. Tend to be more relaxed by decreasing stress … Continue reading Accessibility June Meetup (Vancouver) Notes

Terry Reese: MarcEdit Update

Wed, 2016-06-15 21:36

Last night, I posted an update squashing a couple bugs and adding some new features.  Here’s the change log:

* Bug Fix: Merge Records Tool: If the user defined field is a title, the merge doesn’t process correctly.
* Bug Fix: Z39.50 Batch Processing: If the source server provides data in UTF8, characters from multi-byte languages may be flattened.
* Bug Fix: ILS Integration..Local: In the previous version, one of the library versions didn’t get updated, and early beta testers had some trouble.
* Enhancement: Join Records — option added to process subdirectories.
* Enhancement: Batch Processing Tool — option added to process subdirectories
* Enhancement: Extract Selected Records — Allowing regular expressions as an option when processing file data.
* Enhancement: Alma Integration UI Improvements

Downloads can be picked up via the automated updating tool or via the downloads (http://marcedit.reeset.net/downloads) page.

 

–tr

LITA: Jobs in Information Technology: June 15, 2016

Wed, 2016-06-15 19:35

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Midwestern University, Library Manager, Glendale, AZ

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

David Rosenthal: What took so long?

Wed, 2016-06-15 15:00
More than ten months ago I wrote Be Careful What You Wish For which, among other topics, discussed the deal between Elsevier and the University of Florida:
And those public-spirited authors who take the trouble to deposit their work in their institution's repository are likely to find that it has been outsourced to, wait for it, Elsevier! The ... University of Florida, is spearheading this surrender to the big publishers.

Only now is the library community starting to notice that this deal is part of a consistent strategy by Elsevier and other major publishers to ensure that they, and only they, control the accessible copies of academic publications. Writing on this recently we have:
Barbara Fister writes:
librarians need to move quickly to collectively fund and/or build serious alternatives to corporate openwashing. It will take our time and money. It will require taking risks. It means educating ourselves about solutions while figuring out how to put our values into practice. It will mean making tradeoffs such as giving up immediate access for a few who might complain loudly about it in order to put real money and time into long-term solutions that may not work the first time around. It means treating equitable access to knowledge as our primary job, not as a frill to be worked on when we aren’t too busy with our “real” work of negotiating licenses, fixing broken link resolvers, and training students in the use of systems that will be unavailable to them once they graduate.

Amen to all that, even if it is 10 months late. If librarians want to stop being Elsevier's minions they need to pay close, timely attention to what Elsevier is doing. Such as buying SSRN. How much would arXiv.org cost them?

DPLA: Reflections on Community Currents at #DPLAfest

Wed, 2016-06-15 14:33

This guest post was written by T-Kay Sangwand, Librarian for Digital Collection Development, Digital Library Program, UCLA and DPLA + DLF ‘Cross-Pollinator.’ (Twitter: @tttkay)

As an information professional committed to social justice and employing a critical lens to examine the impact of our work, I always look forward to seeing how these principles and issues of diversity and representation of the profession and historical record are more widely discussed in national forums. In my new role as Librarian for Digital Collection Development at UCLA’s Digital Library Program, I grapple with how our work as a digital library can serve our predominantly people of color campus community within the larger Los Angeles context, a city also predominantly comprised of people of color. As a first time attendee to DPLAfest, I was particularly interested in how DPLA frames itself as a national digital library for a country that is projected to have a majority person of color population by 2060. I observed that the DPLAfest leadership did not yet reflect the country’s changing demographics. The opening panel featured eight speakers yet there was only one woman and two people of color.

The opening panel of DPLAfest was filled with many impressive statistics – over 13 million items in DPLA, over 1900 contributors, over 30 partners, over 100 primary source sets, with all 50 states represented by the collections. While these accomplishments merit celebration, I appreciated Dr. Kim Christen Withey’s Twitter comment that encourages us to consider alternate frameworks of success:

#DPLAfest lots of talk of numbers–presumably the bigger the better–how else can we think about success? esp in the digital content realm?

— Kim Christen Withey (@mukurtu) April 14, 2016

“Tech Trends in Libraries” panelists Carson Block, Alison Macrina, and John Resig discuss ‘big data’ and libraries. Photo by Jason Dixson

While the amount of materials or information we have access to is frequently used as a measure of success, several panels such as The People’s Archives: Communities and Documentation Strategy, Wax Works in the Age of Digital Reproduction: The Futures of Sharing Native/First Nations Cultural Heritage, and Technology Trends in Libraries encouraged nuanced discussions of success through its discussions around the complexities of access. The conversation between Alison Macrina of Library Freedom Project and John Resig of Khan Academy critically interrogated the celebration of big data. Macrina reminds libraries to ask the questions: Who owns big data? What is the potential for exploitation? Who has access? How do we negotiate questions of privacy for individuals yet not allow institutions to escape accountability?

The complexities of access and privacy were further explored in the community archives sessions. Community archivists Carol Steiner and Keith Wilson from the People’s Archive of Police Violence in Cleveland spoke on storytelling as a form of justice in the face of impunity but also the real concerns of retribution for archiving citizen stories of police abuse. Dr. Kim Christen Withey spoke on traditional knowledge labels and the Mukurtu content management system that privileges indigenous knowledge about their own communities and enables a continuum of access instead of a binary open/closed model of access. In both of these cases, exercising control over one’s self and community representation constitutes a form of agency in the face of symbolic annihilation that traditional archives and record keeping have historically wreaked on marginalized communities. Additionally, community investment in these documentation projects outside traditional library and archive spaces have been key to their sustainability. In light of this, Bergis Jules raised the important question of “what is or should be the role of large scale digital libraries, such as DPLA, in relation to community archives?” First and foremost, I think our role as information professionals is to listen to communities’ vision(s) for their historical materials; it’s only then that we may be able contribute to and support communities’ agency in documentation and representation. I’m grateful that participants created space within DPLA to have these nuanced discussions and I’m hopeful that community driven development can be a guiding principle in DPLA’s mission.

For a closer read of the aforementioned panels, see my Storify: Community Archives @ DPLAfest.

Special thanks to the Digital Library Federation for making the DPLAfest Cross-Pollinator grant possible.

Open Knowledge Foundation: Introducing The New Proposed Global Open Data Index Survey

Wed, 2016-06-15 11:00

The Global Open Data Index (GODI) is one of the core projects of Open Knowledge International. Originally launched in 2013, it has quickly grown and now measures open data publication in 122 countries. GODI is a community tool, and throughout the years the open data community have taken an active role in shaping it by reporting problems, discussing issues on GitHub and in our forums as well as sharing success stories. We welcome this feedback with open arms and in 2016, it has proved invaluable in helping us produce an updated set of survey questions.

In this blogpost we are sharing the first draft of the revised GODI survey. Our main objective in updating the survey this year has been to improve the clarity of the questions and provide better guidance to submitters in order to ensure that contributors understand what datasets they should be evaluating and what they should be looking for in those datasets. Furthermore, we hope the updated survey will help us to highlight some of the tangible challenges to data publication and reuse by paying closer attention to the contents of datasets.

Our aim is to adopt this new survey structure for future editions of GODI as well as the Local Open Data Index and we would love to hear your feedback! We are aware that some changes might affect the comparability with older editions of GODI and it’s for this reason that your feedback is critical. We are especially curious to hear the opinion of the Local Open Data Index community. What do you find positive? Where do you see issues with your local index? Where could we improve?

In the following we would like to present our ideas behind the new survey. You will find a detailed comparison of old and new questions in this table.

A brief overview of the proposed changes:

  • Better measure and document how easy it is to find government data online
  • Enhance our understanding of the data we measure
  • Improve the robustness of our analysis

 

  1. Better measure and document how easy or difficult it is to find government data online

Even if governments are publishing data, if potential users cannot find it, then it goes without saying that they will not be able to use it. In our revised version of the survey, we ask submitters to document where they found a given dataset as well as how much time they needed to find it. We recognise this to be an imperfect measure, as different users are likely to vary in their capacity to find government data online. However, we hope that this question will help us to extract critical information around the challenges related to usability that are not easily captured by a legal and technical analysis of a given dataset, even if it would be difficult to quantify the results and therefore use them in the scoring.

  2. Enhance our understanding of the data we measure

It is common for governments to publish datasets in separate files and places. Contributors might find department spending data scattered across different department websites or, even when made available in one place such as a portal, the data could be split up into multiple files. Some portion of this data might be openly licensed, another portion machine-readable, while others are in PDFs. Sometimes non-machine-readable data is available without charge, while machine-readable files are available for a fee. In the past, this has proven to be an enormous challenge for the Index, as submitters are forced to decide what data should be evaluated (see this discussion in our forum).

The inconsistent publication of government data leads to confusion among our submitters and negatively impacts the reliability of the Index as an assessment tool. Furthermore, we think it is safe to say that if open data experts are struggling to find or evaluate datasets, potential users will face similar challenges, and as such, the inconsistent and sporadic data publication policies of governments are likely to affect data uptake and reuse. In order to ensure that we are comparing like with like, GODI assesses the openness of clearly defined datasets. These dataset definitions are what we have determined, in collaboration with experts in the field, to be essential government data – data that contains crucial information for society at large. If a submitter only finds parts of this information in a file, or scattered across different files, then rather than assessing the openness of key datasets, we end up assessing a partial snapshot that is unlikely to be representative. There is more at stake than our ability to assess the “right” datasets – incoherent data publication significantly limits the capacity of civil society to tap into the full value of government data.

  3. Improve the robustness of our analysis

In the updated survey, we will determine whether datasets are available from one URL by asking “Are all the data downloadable from one URL at once?” (formerly “Available in bulk?”). To respond in the affirmative, submitters would have to be able to demonstrate that all required data characteristics are made available in one file. If the data cannot be downloaded from one URL, or if submitters find multiple files on one URL, they will be asked to select one dataset, from one URL, which meets the most requirements and is available free of charge. Submitters will document why they’ve chosen this dataset and data source in order to help reviewers understand the rationale for choosing a given dataset and to aid in verifying sources.

The subsequent question, “Which of these characteristics are included in the downloadable file?”, will help us verify that the dataset submitted does indeed contain all the requisite characteristics. Submitters will assess the dataset by selecting each individual characteristic contained within it. Not only will this prompt contributors to verify that all the established characteristics are met, it will also allow us to gain a better understanding of the components that are commonly missing when governments publish data, thus giving civil society a better foundation to advocate for publishing the crucial data. In our results we will more explicitly flag which elements are missing and declare only those datasets fully open that match all of our dataset requirements.

 

This year, we are committed to improving the clarity of the survey questions: 

  1. “Does the data exist?” – The first question in previous versions of the Index was often confusing for submitters and has been reformulated to ask: “Is the data published by government (or a third-party related to government)?” If the response is no, contributors will be asked to justify their response. For example, does the collection, and subsequent publication, of this data fall under the remit of a different level of government? Or perhaps the data is collected and published (or not) by a private company? There are a number of legal, social, technical and political reasons that might mean that the data we are assessing simply does not exist, and the aim of this question is to help open data activists advocate for coherent policies around data production and publication (see past issues with this question here and here).
  2. “Is data in digital form?” – The objective of this question was to cover cases where governments provided large data on DVDs, for example. However, users have commented that we should not ask for features that do not make data more open. Ultimately, we have concluded that if data is going to be usable for everyone, it should be online. We have therefore deleted this question.
  3. “Publicly Available?” – We merged “Publicly available?” with “Is the data available online?”. The reason is that we only want to reward data that is publicly accessible online without mandatory registrations (see for instance discussions here and here).
  4. “Is the data machine-readable?” – There have been a number of illuminating discussions in regards to what counts as a machine-readable format (see for example discussions here and here). We found that the question “Is the data machine-readable?” was overly technical. Now we simply ask users “In which file formats are the data?”. When submitters enter the format, our system automatically recognises whether the format is machine-readable and in an open format (a toy sketch of such a lookup follows this list).
  5. “Openly licensed” – Some people argued that the question “Openly licensed?” does not adequately take into account the fact that some government data are in the public domain and not under the protection of copyright. As such, we have expanded the question to “Is the data openly licensed/in the public domain?”. If data are not under the protection of copyright, they do not necessarily need to be openly licensed; however, a clear disclaimer must be provided informing users about their copyright status (which can be in the form of an open licence). This change is in line with the Open Definition 2.1. (See discussions here and here).
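To make the format-recognition step concrete, it can be thought of as a lookup keyed on the reported file format. The Python sketch below is purely illustrative and is not the Index’s actual code; the format names and flag values are assumptions for the sake of the example.

    # Hypothetical mapping from a reported file format to the two flags the
    # survey cares about; NOT the actual Global Open Data Index code.
    FORMAT_FLAGS = {
        # format: (machine_readable, open_format) -- illustrative values only
        "CSV":  (True,  True),
        "JSON": (True,  True),
        "XML":  (True,  True),
        "XLS":  (True,  False),   # machine-readable, but a proprietary format
        "PDF":  (False, False),
        "HTML": (False, False),
    }

    def classify(fmt):
        """Return (machine_readable, open_format) for a reported format."""
        return FORMAT_FLAGS.get(fmt.strip().upper(), (False, False))

    print(classify("csv"))   # (True, True)
    print(classify("pdf"))   # (False, False)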

Looking forward to hearing your thoughts on the forum or in the comments on this post!

Islandora: The Islandora Long Tail is now Awesome

Wed, 2016-06-15 10:03

I've been posting about the Long Tail of Islandora for a while now, putting a spotlight on Islandora modules developed and shared by members of our community. It's a good way to find new tools and modules that might answer a need you have on your site (so you don't have to build your own from scratch). We've also kept an annotated list of community developed modules in our Resources section, but it had a tendency to get a little stale and sometimes miss great work that wasn't happening in places we expect.

Enter the concept of the Awesome List, a curated list of awesome lists, complete with helpful guidelines and policies that we could crib from to make our own list of all that is awesome for Islandora. It now lives in our Islandora Labs GitHub organization, and new contributions are very welcome. You can share your own work, your colleagues' work, or any public Islandora resource that you think other Islandorians might find useful. If you have something to add, please put in a pull request or email me.

Awesome Islandora

pinboard: Google Groups

Wed, 2016-06-15 02:48
Hey #Code4Lib Southeastern folk - #C4LSE is reopening a regional dialogue. Join us?

DuraSpace News: Running Effective Institutional Repositories: A Look at Best Practices

Wed, 2016-06-15 00:00

From Sarah Tanksalvala, Thomson Reuters

Institutional repositories are an increasingly common feature of universities, creating a database of scholarly and educational work produced by a university’s faculty and students. Done right, they can create a showcase for researchers and students hoping to demonstrate their scholarship, at the same time showcasing the university’s achievements as a whole.

Peter Sefton: Open Repositories 2016: Demo: A repository before breakfast

Tue, 2016-06-14 22:00

I have just returned from the Open Repositories 2016 conference in Dublin where I did a demo in the Developer Track, curated by my colleagues Claire Knowles and Adam Field. The demo went OK, despite being interrupted by a fire alarm.

Here’s my abstract:

Presented by Peter Sefton, University of Technology, Sydney peter.sefton@uts.edu.au

In this session I’d like to show off the technical side of the open source platform, Ozmeka (based on Omeka) which was presented at OR2015.

In the demo I will:

  • Spin up a fresh instance of a repository using a vagrant script my team prepared earlier.

  • Show how to populate the repository via a CSV file, complete with multiple different item types (people, creative works, that sort of thing) with relations between them.

  • Demonstrate that this is a Linked-data-ish system, with relations between items in the repo, and external authorities and talk about why this is better than using string-based metadata which is still the default in most repository systems.

  • Talk about why it is worth considering Omeka/Ozmeka for small-to-medium repository and website development.

To which I added:

Demo loading the same data into a Fedora 4 repository.

The spreadsheet format I demoed is still a work in progress, which I will document in the GitHub project; I think it shows promise as a way of creating simple websites from data, including multiple types of object and nested collections. I took the First Fleet maps data and munged it a little to create a linked-data set for this demo. As downloaded, the data is a list of map images. I added a couple of extra rows:

  • Two collections, one for the maps and one for people
  • Entries for the people mentioned in the metadata as the creators of the maps

And extra columns for relationships:

  • Collection membership via a pcdm:Collection column.
  • A REL:dc:creator column for the Dublin Core creator relationship.
A sample Omeka page

What is this?

I presented a paper last year co-authored with Sharyn Wise about an Omeka-based project we did at UTS, building a cross-disciplinary research repository, Dharmae. This time I just wanted to do a quick demo for the developer track showing how easy it is to get started with a dev version of Omeka, and also show some early work on a Python API for Fedora 4.

Audience

This is for developers who can run Python and understand virtual environments. NOTE: These instructions have not been independently tested; you will probably need to do some problem solving to get this to run, including, but not limited to running both python 3 and python 2.

Get the dependencies up and running
  1. Get & run Omeka via this vagrant script, put together by Thom McIntyre.
  • Get an API Key via http://localhost:8080/admin/users/api-keys/1

  • Install the item relations plugin (it’s there, you just need to activate it via the install button) http://localhost:8080/admin/plugins

  2. Get the One-click-run Fedora Application from the Fedora downloads page.
Import some data into Ozmeka

Assuming Omeka is running, as per the instructions above.

NOTE: This is a Python 2 script.

  1. Check out the Ozmeka Python Utils.

  2. Follow the instructions on how to upload some sample data to Omeka from a CSV file.

Remember your API key, and to install the Item Relations plugin.
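Under the hood, loading a row of the CSV into Omeka amounts to a POST against the Omeka API. The following Python sketch is only a rough illustration of that idea, not the Ozmeka utility itself; the base URL, API key placeholder, and element ID are assumptions (element IDs vary per installation and can be listed via the API’s elements resource).

    import requests

    # Assumptions for illustration: Omeka from the vagrant box at the web root,
    # with an API key created at /admin/users/api-keys/1 as described above.
    OMEKA = "http://localhost:8080/api"
    KEY = "your-api-key"

    # Hypothetical element ID for the Dublin Core "Title" element; look up the
    # real value for your installation via GET /api/elements.
    TITLE_ELEMENT_ID = 50

    item = {
        "public": True,
        "element_texts": [
            {"element": {"id": TITLE_ELEMENT_ID},
             "html": False,
             "text": "Chart of part of New South Wales"},   # example title
        ],
    }

    resp = requests.post("{}/items".format(OMEKA), json=item,
                         params={"key": KEY})
    resp.raise_for_status()
    print(resp.json()["id"])   # id of the newly created item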

Import the same data into Fedora 4

NOTE: this is a Python 3 Script.

Also, note that Fedora 4 doesn’t come with a web interface - you’ll just be putting data into it in a raw form like this:

Data in Fedora 4
  1. Start Fedora by running the Jar file (try double-clicking it).
  2. Select port 8081
  3. Click Start
  4. Install our experimental Fedora api client for Python 3.
  5. Follow the instructions to import csv data into Fedora.

Thanks to Mike Lynch for the Fedora API code.
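For a rough sense of what the CSV-to-Fedora import boils down to, here is a minimal Python sketch that creates one Fedora 4 container per CSV row over the REST API. It is not the experimental client mentioned above: the base path, CSV column names, and use of the requests library are assumptions made for illustration (adjust the URL to whatever your Fedora instance reports).

    import csv
    import requests

    # Assumed base URL for the one-click-run Fedora started on port 8081.
    FEDORA = "http://localhost:8081/rest"

    with open("items.csv", newline="") as f:       # hypothetical input file
        for row in csv.DictReader(f):
            # Describe the item as Turtle; dc:title / dc:creator are examples
            # only, and no string escaping is done, for brevity.
            turtle = (
                '@prefix dc: <http://purl.org/dc/elements/1.1/> .\n'
                '<> dc:title "{title}" ; dc:creator "{creator}" .\n'
            ).format(title=row["title"], creator=row["creator"])

            # PUT creates (or replaces) a container at a predictable path.
            resp = requests.put(
                "{}/maps/{}".format(FEDORA, row["id"]),
                data=turtle.encode("utf-8"),
                headers={"Content-Type": "text/turtle"},
            )
            resp.raise_for_status()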

District Dispatch: A “Library for All” around the world and volunteer opportunity

Tue, 2016-06-14 19:45

June 8, 2016 meeting of the Library for All Board of Directors, Advisory Board, and Young Professional Board in New York City.

Last week, I was in New York City for a board meeting for Library For All (LFA), a nonprofit organization that has built a digital library platform to address the lack of access to quality educational materials in developing countries. Among other things, I learned about the latest LFA success—in Cambodia, where the kids’ demand for ebooks came to exceed the supply, at least temporarily.

Designed for low-bandwidth environments, the LFA digital library is a customizable, user-friendly digital platform that delivers ebooks to low cost devices such as mobile phones, tablets and computers. The collection is filled with content that is culturally relevant and available in local and international languages. The Library currently reaches readers in Haiti, Democratic Republic of Congo, Rwanda, Cambodia, and Mongolia.

The Volunteer Opportunity:  Country Curators

LFA has a particular need for curators of specialized collections for their country libraries. Some of the topical areas include girls, early grade literacy, adult literacy, and health—but other topics are of interest as well.

Responsibilities of the volunteer curator may include:

  • Identify titles that will make up a specific collection
  • Research and have a broad understanding of the collection you’re curating
  • You will be researching existing open source content as well as evaluating what publishers / NGOs already have available
  • Work with the Content Manager to reach out to existing publishers who may have suitable content
  • You will work with the Content Manager and take ownership of the curation and implementation of the collection
  • By the end of your time you will have a specialized collection uploading and being read in the digital library
  • Reach out to publishers and NGOs to see if we can use the content on our library platform
  • Add metadata to the content and upload books onto our Digital Assets Management

The specific tasks and timetable for a given volunteer will vary and are flexible, though generally LFA seeks those who can provide a block of time over a couple of months rather than a lesser engagement over many months. Fluency in English is required; fluency in French or one of the current LFA local languages (Haitian Creole, Khmer, or Mongolian) is a plus but not essential.

For Further Information about the Opportunity or LFA

Those who are interested in learning more should contact Georgia Tyndale at Georgia@libraryforall.org. Also, note that Rebecca McDonald, CEO of Library For All, will be at the upcoming ALA Annual Conference in Orlando. Those interested in learning more about this volunteer opportunity or about LFA generally are invited to meet with her there. To arrange a meeting, contact Rebecca at rebeccam@libraryforall.org.

The post A “Library for All” around the world and volunteer opportunity appeared first on District Dispatch.

Jonathan Rochkind: Handy introspection for debugging Rails routes

Tue, 2016-06-14 15:44

I always forget how to do this, so I’m leaving this here partly as a note to myself. From Zobie’s Blog and Mike Blyth’s Stack Overflow answer:

 

routes = Rails.application.routes

# figure out what route a path maps to:
routes.recognize_path "/station/index/42.html"
# => {:controller=>"station", :action=>"index", :format=>"html", :id=>"42"}
# or get an ActionController::RoutingError

# figure out what url is generated for params, what url corresponds
# to certain controller/action/parameters...
routes.generate :controller => :station, :action => :index, :id => 42

If you have an isolated Rails engine mounted, its paths seem not to be accessible from the `Rails.application.routes` router. You may need to try that specific engine’s router, like `Spree::Core::Engine.routes`.

It seems to me there’s got to be a way to get the actual ‘master’ router that’s actually used for recognizing incoming urls, since there’s got to be one that sends to the mounted engine routes as appropriate based on paths. But I haven’t figured out how to do that.


Filed under: General

David Rosenthal: Decentralized Web Summit

Tue, 2016-06-14 15:00
Brad Shirakawa/Internet Archive

This is a quick report from the Decentralized Web Summit. First, Brewster Kahle, Wendy Hanamura and the Internet Archive staff deserve great praise for assembling an amazing group of people and running an inspiring and informative meeting. It was great to see so many different but related efforts to build a less centralized Web.

Pictures and videos are up here. You should definitely take the time to watch at least the talks on the second day by:
and the panel moderated by Kevin Marks, in particular this contribution from Zooko Wilcox. He provides an alternative view on my concerns about Economies of Scale in Peer-to-Peer Networks.

I am working on a post about my reactions to the first two days (I couldn't attend the third) but it requires a good deal of thought, so it'll take a while.

Mark E. Phillips: Comparing Web Archives: EOT2008 and EOT2012 – What

Tue, 2016-06-14 14:30

This post carries on from where the previous post in this series ended.

A very quick recap: this series is trying to better understand the EOT2008 and the EOT2012 web archives. The goal is to see how they are similar, how they are different, and if there is anything that can be learned that will help us with the upcoming EOT2016 project.

What

The CDX files we are using have a column that contains the Media Type (MIME Type) for the different URIs in the WARC files. A list of the assigned Media Types is available from the Internet Assigned Numbers Authority (IANA) in their Media Type Registry.

This field is inherently “dirty” for a few reasons. It is populated from a field in the WARC record that comes directly from the web server that responded to the initial request. Usually these values are fairly accurate, but there are many times when they are either wrong or at least confusing. Often this is caused by a server administrator, programmer, or system architect who is trying to be clever, or who has just misconfigured something.

I looked at the Media Types for the two EOT collections to see if there are any major differences between what we collected in the two EOT archives.

In the EOT2008 archive there are a total of 831 unique Mime/Media Types; in the EOT2012 archive there are a total of 1,208 unique type values.

I took the top 20 Mime/Media Types for each of the archives and pushed them together to see if there was any noticeable change in what we captured between the two archives.  In addition to just the raw counts I also looked at what percentage of the archive a given Media Type represented.  Finally I noted the overall change in those two percentages.
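As a rough illustration of how counts and percentages like those below can be derived, the Python sketch here tallies the Media Type column from two CDX files and compares them. It is a simplification rather than the actual script behind these figures; it assumes the common CDX layout in which the fourth whitespace-delimited field holds the Media Type, and the file names are placeholders.

    from collections import Counter

    def media_type_counts(cdx_path, mime_field=3):
        """Tally Media Types from a space-delimited CDX file."""
        counts = Counter()
        with open(cdx_path) as f:
            for line in f:
                if line.lstrip().startswith("CDX"):   # header line, skip it
                    continue
                fields = line.split()
                if len(fields) > mime_field:
                    counts[fields[mime_field]] += 1
        return counts

    eot2008 = media_type_counts("eot2008.cdx")   # placeholder file names
    eot2012 = media_type_counts("eot2012.cdx")
    total2008, total2012 = sum(eot2008.values()), sum(eot2012.values())

    for mtype, c2008 in eot2008.most_common(20):
        c2012 = eot2012.get(mtype, 0)
        pct_change = 100.0 * (c2012 - c2008) / c2008
        print(mtype,
              c2008, round(100.0 * c2008 / total2008, 1),
              c2012, round(100.0 * c2012 / total2012, 1),
              round(pct_change, 1))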

| Media Type | 2008 Count | % of Archive | 2012 Count | % of Archive | % Change | Change in % of Archive |
|---|---|---|---|---|---|---|
| text/html | 105,592,852 | 65.9% | 116,238,952 | 59.9% | 10.1% | -6.0% |
| image/jpeg | 13,667,545 | 8.5% | 24,339,398 | 12.5% | 78.1% | 4.0% |
| image/gif | 13,033,116 | 8.1% | 8,408,906 | 4.3% | -35.5% | -3.8% |
| application/pdf | 10,281,663 | 6.4% | 7,097,717 | 3.7% | -31.0% | -2.8% |
| – | 4,494,674 | 2.8% | 613,187 | 0.3% | -86.4% | -2.5% |
| text/plain | 3,907,202 | 2.4% | 3,899,652 | 2.0% | -0.2% | -0.4% |
| image/png | 2,067,480 | 1.3% | 7,356,407 | 3.8% | 255.8% | 2.5% |
| text/css | 841,105 | 0.5% | 1,973,508 | 1.0% | 134.6% | 0.5% |

Because I like pictures here is a chart of the percent change.

Change in Media Type

If we compare the Media Types between the two archives we find that the two archives share 527 Media Types.  The EOT2008 archive has 304 Media Types that aren’t present in EOT2012 and EOT2012 has 681 Media Types that aren’t present in EOT2008.

The ten most frequent Media Types by count found only in the EOT2008 archive are presented below.

| Media Type | Count |
|---|---|
| no-type | 405,188 |
| text/x-vcal | 17,368 |
| .wk1 | 8,761 |
| x-text/tabular | 5,312 |
| application/x-wp | 5,158 |
| * | 4,318 |
| x-application/pdf | 3,660 |
| application/x-gunzip | 3,374 |
| image/x-fits | 3,340 |
| WINDOWS-1252 | 2,304 |

The ten most frequent Media Types by count found only in the EOT2012 archive are presented below.

| Media Type | Count |
|---|---|
| warc/revisit | 12,190,512 |
| application/http | 1,050,895 |
| application/x-mpegURL | 23,793 |
| img/jpeg | 10,466 |
| audio/x-flac | 7,251 |
| application/x-font-ttf | 7,015 |
| application/x-font-woff | 6,852 |
| application/docx | 3,473 |
| font/ttf | 3,323 |
| application/calendar | 2,419 |

In the EOT2012 archive the team that captured content had fully moved to the WARC format for storing Web archive content. The warc/revisit records are records for URLs whose content had not changed across more than one crawl. Instead of storing the content again, the warc/revisit record holds a reference to the previously captured content. That’s why there are so many of these Media Types.

Below is a table showing the thirty most changed Media Types that are present in both the EOT2008 and EOT2012 archives.  You can see both the change in overall numbers as well as the percentage change between the two archives.

| Media Type | EOT2008 | EOT2012 | Change | % Change |
|---|---|---|---|---|
| image/jpeg | 13,667,545 | 24,339,398 | 10,671,853 | 78.1% |
| text/html | 105,592,852 | 116,238,952 | 10,646,100 | 10.1% |
| image/png | 2,067,480 | 7,356,407 | 5,288,927 | 255.8% |
| image/gif | 13,033,116 | 8,408,906 | -4,624,210 | -35.5% |
| – | 4,494,674 | 613,187 | -3,881,487 | -86.4% |
| application/pdf | 10,281,663 | 7,097,717 | -3,183,946 | -31.0% |
| application/javascript | 39,019 | 1,511,594 | 1,472,575 | 3774.0% |
| text/css | 841,105 | 1,973,508 | 1,132,403 | 134.6% |
| text/xml | 344,748 | 1,433,159 | 1,088,411 | 315.7% |
| unk | 4,326 | 818,619 | 814,293 | 18823.2% |
| application/rss+xml | 64,280 | 731,253 | 666,973 | 1037.6% |
| application/x-javascript | 622,958 | 1,232,306 | 609,348 | 97.8% |
| application/vnd.ms-excel | 734,077 | 212,605 | -521,472 | -71.0% |
| text/javascript | 69,340 | 481,701 | 412,361 | 594.7% |
| video/x-ms-asf | 26,978 | 372,565 | 345,587 | 1281.0% |
| application/msword | 563,161 | 236,716 | -326,445 | -58.0% |
| application/x-shockwave-flash | 192,018 | 479,011 | 286,993 | 149.5% |
| application/octet-stream | 419,187 | 191,421 | -227,766 | -54.3% |
| application/zip | 312,872 | 92,318 | -220,554 | -70.5% |
| application/json | 1,268 | 217,742 | 216,474 | 17072.1% |
| video/x-flv | 1,448 | 180,222 | 178,774 | 12346.3% |
| image/jpg | 26,421 | 172,863 | 146,442 | 554.3% |
| application/postscript | 181,795 | 39,832 | -141,963 | -78.1% |
| image/x-icon | 45,294 | 164,673 | 119,379 | 263.6% |
| chemical/x-mopac-input | 110,324 | 1,035 | -109,289 | -99.1% |
| application/atom+xml | 165,821 | 269,219 | 103,398 | 62.4% |
| application/xml | 145,141 | 246,857 | 101,716 | 70.1% |
| application/x-cgi | 100,813 | 51 | -100,762 | -99.9% |
| audio/mpeg | 95,613 | 179,045 | 83,432 | 87.3% |
| video/mp4 | 1,887 | 73,475 | 71,588 | 3793.7% |

Presented as a set of graphs, the first shows the change in the number of instances of a given Media Type between the two archives.

30 Media Types that changed the most

The second graph is the percentage change between the two archives.

% Change in top 30 media types shared between archives

Things that stand out are the growth of application/javascript between 2008 and 2012, up 3,774%, and of application/json, which was up over 17,000%. Two formats used to deliver video grew as well, with video/x-flv and video/mp4 increasing 12,346% and 3,794% respectively.

There were a number of Media Types that declined in both count and percentage, but the changes are not as dramatic as those identified above. Of note is that between 2008 and 2012 there was a nearly 100% decline in content with a Media Type of application/x-cgi and a 78% decrease in files that were application/postscript.

Working with the Media Types found in large web archives is a bit messy. While there are standard ways of presenting Media Types to browsers, non-standard, experimental and inaccurate instances of Media Types also exist in these archives. It does appear that we can see the introduction of some newer technologies between the two archives, such as the adoption of JSON and JavaScript-based sites as well as new formats of video on the web.

If you have questions or comments about this post,  please let me know via Twitter.
