Feed aggregator

Open Library Data Additions: Amazon Crawl: part 10

planet code4lib - Tue, 2016-05-03 13:28

Part 10 of the Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Metadata, Text

Open Library Data Additions: Amazon Crawl: part o-6

planet code4lib - Tue, 2016-05-03 13:23

Part o-6 of the Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Metadata, Text

Open Library Data Additions: Amazon Crawl: part 2-aa

planet code4lib - Tue, 2016-05-03 13:21

Part 2-aa of the Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Metadata, Text

Library of Congress: The Signal: The Harvard Library Digital Repository Service

planet code4lib - Tue, 2016-05-03 13:19

This is a guest post by Julie Siefert.

The Charles River between Boston and Cambridge. Photo by Julie Siefert.

As part of the National Digital Stewardship Residency, I am assessing the Harvard Library Digital Repository Service, comparing it to the ISO 16363 standard for trusted digital repositories (which is similar to TRAC). The standard is made up of over 100 individual metrics that address various aspects of a repository, everything from financial planning to ingest workflows.

The Harvard Digital Repository Service provides long-term preservation and access to materials from over fifty libraries, archives and museums at Harvard. It’s been in production for about fifteen years. The next generation of the DRS, with increased preservation capabilities, was recently launched, so this is an ideal time to evaluate the DRS and consider how it might be improved in the future. I hope to identify areas needing new policies and/or documentation and, in doing so, help the DRS improve its services. The DRS staff also hope to eventually seek certification as a trusted digital repository and this project will prepare them.

When I started the project, my first step was to become familiar with the ISO 16363 standard. I read through it several times and tried to parse out the meaning of the metrics. Sometimes this was straightforward and I found a metric easy to understand. For others, I had to read through a few times before I fully understood what the metric was asking for. I also found it helpful to write down notes about what they meant, putting each metric in my own words. I read about other people’s experiences performing audits, which was very helpful and gave me some ideas about how to go about the process. In particular, I found David Rosenthal’s blog posts about the CLOCKSS self-audit helpful, as that audit used the same standard, ISO 16363.

By Julie Siefert

Inspired by the CLOCKSS audit, I created a Wiki with a different page for each metric. On these pages, I copied the text from the standard and included space for my notes. I also created an Excel sheet to help track my findings. In the Excel sheet, I gave each metric its own row and, in that row, a column about documentation and a column linking to the Wiki. (I blogged more about the organization process.)

I reviewed the DRS documentation, interviewed staff members about the metrics, and asked them to point me to relevant documentation. I realized that many of the actions required by the metrics were being performed at Harvard, but these actions and policies weren’t documented. Everyone in the organization knew that they happened, but sometimes no one had written them down. In my notes, I indicated when something was being done but not documented versus when something was not being done at all. I used a green/yellow/red color scheme in the Excel sheet for the different metrics, with yellow indicating things that were done but not documented.

The assessment was the most time-consuming part. In thinking about how best to summarize and report on my findings, I am looking for commonalities among the gap areas. It’s possible that many of the gaps are similar and that several could be filled with a single piece of documentation. For example, many of the “yellow” areas have to do with ingest workflows, so perhaps a single document about that workflow could fill all of these gaps at once. I hope that finding the commonalities among the gaps can help the DRS fill them most effectively and efficiently.

Open Library Data Additions: Amazon Crawl: part 2-ag

planet code4lib - Tue, 2016-05-03 13:14

Part 2-ag of the Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Metadata, Text

DuraSpace News: NOW AVAILABLE: Fedora 4.5.1 Release

planet code4lib - Tue, 2016-05-03 00:00

From David Wilcox, Fedora Product Manager, on behalf of the Fedora team.

Austin, TX – The Fedora team is proud to announce that Fedora 4.5.1 was released on April 29, 2016. Full release notes are included below and are also available on the wiki: https://wiki.duraspace.org/display/FF/Fedora+4.5.1+Release+Notes.

M. Ryan Hess: AI First

planet code4lib - Mon, 2016-05-02 22:54

Looking to the future, the next big step will be for the very concept of the “device” to fade away. Over time, the computer itself—whatever its form factor—will be an intelligent assistant helping you through your day. We will move from mobile first to an AI first world.

Google Founder’s Letter, April 2016

My Library recently finalized a Vision Document for our virtual library presence. Happily, our vision was aligned with the long-term direction of technology as understood by movers and shakers like Google.

As I’ve written previously, the Library Website will disappear. But this is because the Internet (as we currently understand it) will also disappear.

In its place, a new mode of information retrieval and creation will move us away from the paper-based metaphor of web pages. Information will be more ubiquitous. It will be more free-form, more adaptable, more contextualized, more interactive.

Part of this is already underway. For example, people are becoming a data set. And other apps are learning about you and changing how they work based on who you are. Your personal data set contains location data, patterns in speech and movement around the world, consumer history, keywords particular to your interests, associations based on your social networks, etc.

AI Emerging

All of this information makes it possible for emerging AI systems like Siri and Cortana to better serve you. Soon, it will allow AI to control the flow of information based on your mood and other factors to help you be more productive. And like a good friend that knows you very, very well, AI will even be able to alert you to serendipitous events or inconveniences so that you can navigate life more happily.

People’s expectations are already being set for this kind of experience. Perhaps you’ve noticed yourself getting annoyed when your personal assistant just fetches a Wikipedia article when you ask it something. You’re left wanting. What we want is that kernel of gold we asked about. But what we get right now is something too general to be useful.

But soon, that will all change. Nascent AI will soon be able to provide exactly the piece of information that you really want rather than a generalized web page. This is what Google means when they make statements like “AI First” or “the Web will die.” They’re talking about a world where information is not only presented as article-like web pages but also broken down into actual kernels of information that are discrete yet interconnected.

AI First in the Library

Library discussions often focus on building better web pages or navigation menus or providing responsive websites. But the conversation we need to have is about pulling our data out of siloed systems and websites and making it available to all modes like AI, apps and basic data harvesters.

You hear this conversation in bits and pieces. The ongoing linked data project is part of this long-term strategy. So too with next-gen OPACs. But on the ground, in our local strategy meetings, we need to tie every big project we do to this emerging reality where web browsers are increasingly no longer relevant.

We need to think AI First.


LITA: LITA ALA Annual Precon: Technology Tools and Transforming Librarianship

planet code4lib - Mon, 2016-05-02 20:19

Sign up for this fun, informative, and hands-on ALA Annual preconference.

Technology Tools and Transforming Librarianship
Friday June 24, 2016, 1:00 – 4:00 pm
Presenters: Lola Bradley, Reference Librarian, Upstate University; Breanne Kirsch, Coordinator of Emerging Technologies, Upstate University; Jonathan Kirsch, Librarian, Spartanburg County Public Library; Rod Franco, Librarian, Richland Library; Thomas Lide, Learning Engagement Librarian, Richland Library

Register for ALA Annual and Discover Ticketed Events

Technology envelops every aspect of librarianship, so it is important to keep up with new technology tools and find ways to use them to improve services and better help patrons. This hands-on, interactive preconference will teach six to eight technology tools in detail and show attendees the resources to find out about 50 free technology tools that can be used in all libraries. There will be plenty of time for exploration of the tools, so please BYOD! You may also want to bring headphones or earbuds.


Lola Bradley is a Public Services Librarian at the University of South Carolina Upstate Library. Her professional interests include instructional design, educational technology, and information literacy for all ages.

Breanne Kirsch is a Public Services Librarian at the University of South Carolina Upstate Library. She is the Coordinator of Emerging Technologies at Upstate and the founder and current Co-Chair of LITA’s Game Making Interest Group.

Jonathan Kirsch is the Head Librarian at the Pacolet Library Branch of the Spartanburg County Public Libraries. His professional interests include emerging technology, digital collections, e-books, publishing, and programming for libraries.

Rod Franco is a Librarian at Richland Library, Columbia South Carolina. Technology has always been at the forefront of any of his library related endeavors.

Thomas Lide is the Learning Engagement Librarian at Richland Library, Columbia South Carolina.  He helps to pave a parallel path of learning for community members and colleagues.

More LITA Preconferences at ALA Annual
Friday June 24, 2016, 1:00 – 4:00 pm

  • Digital Privacy and Security: Keeping You And Your Library Safe and Secure In A Post-Snowden World
  • Islandora for Managers: Open Source Digital Repository Training

Cost:

LITA Member: $205
ALA Member: $270
Non Member: $335

Registration Information

Register for the 2016 ALA Annual Conference in Orlando FL

Discover Ticketed Events

Questions or Comments?

For all other questions or comments related to the preconference, contact LITA at (312) 280-4269 or Mark Beatty, mbeatty@ala.org.

District Dispatch: ALA, Harry Potter Alliance make it easy to advocate

planet code4lib - Mon, 2016-05-02 16:31

The American Library Association (ALA) joined the Harry Potter Alliance in launching “Spark,” an eight-part video series developed to support and guide first-time advocates who are interested in advocating at the federal level for issues that matter to them. The series, targeted to viewers aged 13–22, will be hosted on the YouTube page of the Harry Potter Alliance, while librarians and educators are encouraged to use the videos to engage young people or first time advocates. The video series was launched today during the 42nd annual National Library Legislative Day in Washington, D.C.

The video series provides supporting information for inexperienced grassroots advocates, covering everything from setting up in-person legislator meetings to the process of constructing a campaign. By breaking down oft-intimidating “inside the Beltway” language, Spark provides an accessible set of tools that can activate and motivate young advocates for the rest of their lives. The video series also includes information on writing press releases, staging social media campaigns, using library resources for research or holding events, and best practices for contacting elected officials.

“We are pleased to launch Spark, a series of interactive advocacy videos. We hope that young or new advocates will be inspired to start their own campaigns, and that librarians and educators will be able to use the series to engage young people and get them involved in advocacy efforts,” said Emily Sheketoff, executive director of the American Library Association’s Washington Office.

Janae Phillips, Chapters Director for the Harry Potter Alliance, added, “I’ve worked with youth for many years now, and I’ve never met a young person who just really didn’t want to get involved – they just weren’t sure how! I think this is true for adults who have never been involved in civic engagement before, too. I hope that Spark will be a resource to people who have heard a lot about getting engaged in the political process but have never been sure where to start, and hopefully—dare I say—spark some new ideas and action.”

The post ALA, Harry Potter Alliance make it easy to advocate appeared first on District Dispatch.

Access Conference: Review Process for Proposals Now Underway

planet code4lib - Mon, 2016-05-02 15:31

The Call for Proposals closed last week. A big thank you to all the eager participants.

The review and selection process is now underway. The committee has its work cut out for it, as there are many great submissions. We also have a few interesting ideas up our sleeves.

It is shaping up to be an excellent conference!

Mark E. Phillips: DPLA Description Fields: More statistics (so many graphs)

planet code4lib - Mon, 2016-05-02 14:30

In the past few posts we looked at the length of the description fields in the DPLA dataset as a whole and at the provider/hub level.

The length of the description field isn’t the only value that was indexed for this work. In fact, I indexed a variety of different values for each of the descriptions in the dataset.

Below are the fields I am currently working with.

Each field is listed with an example indexed value:

  • dpla_id: 11fb82a0f458b69cf2e7658d8269f179
  • id: 11fb82a0f458b69cf2e7658d8269f179_01
  • provider_s: usc
  • desc_order_i: 1
  • description_t: A corner view of the Santa Monica City Hall.; Streetscape. Horizontal photography.
  • desc_length_i: 82
  • tokens_ss: “A”, “corner”, “view”, “of”, “the”, “Santa”, “Monica”, “City”, “Hall”, “Streetscape”, “Horizontal”, “photography”
  • token_count_i: 12
  • average_token_length_f: 5.5833335
  • percent_int_f: 0
  • percent_punct_f: 0.048780486
  • percent_letters_f: 0.81707317
  • percent_printable_f: 1
  • percent_special_char_f: 0
  • token_capitalized_f: 0.5833333
  • token_lowercased_f: 0.41666666
  • percent_1000_f: 0.5
  • non_1000_words_ss: “santa”, “monica”, “hall”, “streetscape”, “horizontal”, “photography”
  • percent_5000_f: 0.6666667
  • non_5000_words_ss: “santa”, “monica”, “streetscape”, “horizontal”
  • percent_en_dict_f: 0.8333333
  • non_english_words_ss: “monica”, “streetscape”
  • percent_stopwords_f: 0.25
  • has_url_b: FALSE

This post will try and pull together some of the data from the different fields listed above and present them in a way that we will hopefully be able to use to derive some meaning from.

More Description Length Discussion

In the previous posts I’ve primarily focused on the length of the description fields. There are two other indexed fields related to length: the number of tokens in a description and the average token length. A quick sketch of both calculations follows.
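As a minimal sketch of those two fields (assuming a simple regex word tokenizer; the actual indexing code may tokenize differently):

    import re

    def token_metrics(description):
        # split on word characters, count the tokens, and average their lengths
        tokens = re.findall(r"\w+", description)
        token_count = len(tokens)
        avg_len = sum(len(t) for t in tokens) / token_count if token_count else 0.0
        return token_count, avg_len

    print(token_metrics("A corner view of the Santa Monica City Hall."))
    # (9, 3.888...) for this fragment; the example row above tokenizes the
    # full description_t field, giving token_count_i = 12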

I’ve included those mean values below: one computed over all of the descriptions in the dataset (17,884,946 descriptions) and one over only the descriptions that are at least one character in length (13,771,105 descriptions).

  • desc_length_i: 83.321 (all descriptions), 108.211 (1+ length)
  • token_count_i: 13.346 (all descriptions), 17.333 (1+ length)
  • average_token_length_f: 3.866 (all descriptions), 5.020 (1+ length)

The graphs below are based only on the descriptions that are at least one character long.

This first graph is being reused from a previous post that shows the average length of description by Provider/Hub.  David Rumsey and the Getty are the two that average over 250 characters per description.

Average Description Length by Hub

It shouldn’t surprise you that David Rumsey and the Getty are two of the providers/hubs with the highest average token counts, since longer descriptions generally produce more tokens. There are a few cases that don’t match this pattern, though: USC, which has an average description length of just over 50 characters, comes in third in average token count at over 40 tokens per description. A few other providers/hubs also look a bit different than their average description length would suggest.

Average Token Count by Provider

Below is a graph of the average token length by provider. The lower the number, the shorter the average token. The mean for the entire DPLA dataset for descriptions of length 1+ is just over 5 characters.

Average Token Length by Provider

That’s all I have to say about the various statistics related to length for this post, I swear! Next we move on to some of the other metrics that I calculated when indexing things.

Other Metrics for the Description Field

Throughout this analysis I kept running into the question of how to account for the millions of records in the dataset that have no description present. I couldn’t just ignore that fact, but I didn’t know exactly what to do with them. So below I present, for many of the fields I indexed, both the mean over all of the descriptions and the mean over just the descriptions that are one or more characters in length. The graphs that follow the table are all based on the subset of descriptions that are at least one character long.

  • percent_int_f: 12.368% (all), 16.063% (1+ length)
  • percent_punct_f: 4.420% (all), 5.741% (1+ length)
  • percent_letters_f: 50.730% (all), 65.885% (1+ length)
  • percent_printable_f: 76.869% (all), 99.832% (1+ length)
  • percent_special_char_f: 0.129% (all), 0.168% (1+ length)
  • token_capitalized_f: 26.603% (all), 34.550% (1+ length)
  • token_lowercased_f: 32.112% (all), 41.705% (1+ length)
  • percent_1000_f: 19.516% (all), 25.345% (1+ length)
  • percent_5000_f: 31.591% (all), 41.028% (1+ length)
  • percent_en_dict_f: 49.539% (all), 64.338% (1+ length)
  • percent_stopwords_f: 12.749% (all), 16.557% (1+ length)

Stopwords

Stopwords are words that occur very commonly in natural language. I used a list of 127 stopwords for this work to help understand what percentage of a description (based on tokens) is made up of stopwords. While stopwords generally carry little meaning on their own, they are a good indicator of natural language, so providers/hubs with a higher percentage of stopwords probably have more descriptions that resemble natural language.
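As a rough sketch of the calculation (the actual 127-word list isn’t reproduced here, so the set below is an abbreviated stand-in):

    STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # abbreviated stand-in

    def percent_stopwords(tokens):
        # stopword tokens divided by all tokens
        if not tokens:
            return 0.0
        return sum(t.lower() in STOPWORDS for t in tokens) / len(tokens)

    # natural-language prose scores well above field-like strings:
    print(percent_stopwords("A corner view of the Santa Monica City Hall".split()))  # 0.33
    print(percent_stopwords("Streetscape. Horizontal photography.".split()))         # 0.0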

Percent Stopwords by Provider

Punctuation

I was curious about how much punctuation was present in a description on average. I used the following characters as my set of “punctuation characters”:

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

I counted the number of characters in a description that came from this set and divided it by the total description length to get the percentage of the description that is punctuation.
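In code this is a one-line ratio; Python’s string.punctuation happens to be exactly the 32-character set listed above, and the same helper covers the integer and letter percentages in the next sections by swapping the character set (a sketch, not the original indexing code):

    import string

    def percent_of_charset(description, charset):
        # fraction of characters in the description drawn from the given set
        if not description:
            return 0.0
        return sum(c in charset for c in description) / len(description)

    desc = "A corner view of the Santa Monica City Hall.; Streetscape. Horizontal photography."
    print(percent_of_charset(desc, set(string.punctuation)))    # percent_punct_f = 4/82 ≈ 0.0488
    print(percent_of_charset(desc, set(string.digits)))         # percent_int_f = 0.0
    print(percent_of_charset(desc, set(string.ascii_letters)))  # percent_letters_f ≈ 0.817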

Percent Punctuation by Provider

Punctuation is common in natural language but it occurs relatively infrequently. For example, that last sentence was eighty characters long and only one of them was punctuation (the period at the end of the sentence). That comes to a percent_punctuation of only 1.25%. In the graph above you will see that the bhl provider/hub has over 50% of its descriptions at 25-49% punctuation. That’s very high compared to the other hubs and to the overall DPLA average of about 5%. Digital Commonwealth has a percentage of descriptions that are 50-74% punctuation, which is pretty interesting as well.

Integers

Next up in our list of things to look at is the percentage of the description field that consists of integers. For review, integers are digits, like the following.

0123456789

I used the same process for the percent integer as I did for the percent punctuation mentioned above.

Percent Integer by Provider

You can see that there are a couple of providers/hubs with quite a high integer percentage in their descriptions: the bhl and the smithsonian. The smithsonian has over 70% of its descriptions with a percent-integer value of over 70%.

Letters

Once we’ve looked at punctuation and integers, that really just leaves letters of the alphabet to make up the rest of a description field.

That’s exactly what we will look at next. For this I used the following characters to define letters.

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ

I didn’t perform any case folding, so letters with diacritics wouldn’t be counted as letters in this analysis, but we will look at those a little bit later.

Percent Letter by Provider

For percent letters you would expect a very high percentage of the descriptions to themselves contain a high percentage of letters. Generally this appears to be true, but there are some odd providers/hubs again, mainly bhl and the smithsonian, though nypl, kdl and gpo also seem to have a different distribution of letters than others in the dataset.

Special Characters

The next thing to look at was the percentage of “special characters” used in a description. For this I used the following definition: if a character is not present in the following list of characters (which also includes whitespace characters), then it is considered to be a “special character”. A sketch of the calculation follows the character list below.

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ 
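Following the same pattern as the earlier helper, here is a sketch of the special-character percentage as the complement of the allowed set above (the exact whitespace handling is an assumption):

    import string

    # anything outside letters, digits, punctuation, and whitespace is "special"
    ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + string.whitespace)

    def percent_special(description):
        if not description:
            return 0.0
        return sum(c not in ALLOWED for c in description) / len(description)

    print(percent_special("Vue d'une église à Montréal"))  # diacritics land in the special bucket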

Percent Special Character by Provider

A note on reading the graph above: keep in mind that the y-axis only runs from 95-100%, so while USC looks different here, only 3% of its descriptions have 50-100% special characters. These are most likely descriptions with metadata created in a non-English language.

URLs

The final graph I want to look at in this post is the percentage of descriptions for a provider/hub that have a URL present. I used the presence of either http:// or https:// in the description to determine whether or not it has a URL.
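In code, that stated rule reduces to a substring test (a sketch):

    def has_url(description):
        return "http://" in description or "https://" in description

    print(has_url("Finding aid at http://example.org/ead"))  # True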

Percent URL by Provider

The majority of providers/hubs don’t have URLs in their descriptions, with a few obvious exceptions. The providers/hubs washington, mwdl, harvard, gpo and david_rumsey do have a reasonable number of descriptions with URLs, with washington leading at almost 20% of its descriptions having a URL present.

Again, this analysis is just looking at what high-level information about the descriptions can tell us. The only metric we’ve looked at that actually goes into the content of the description field to pull out a little bit of meaning is the percent stopwords. I have one more post in this series before we wrap things up, and then we will leave descriptions in the DPLA alone for a bit.

If you have questions or comments about this post, please let me know via Twitter.

Open Knowledge Foundation: And what are your plans for Transparency Camp Europe?

planet code4lib - Mon, 2016-05-02 13:16

This post was written by our friends at Open State Foundation in the Netherlands. 

Let’s face it: when it comes to relevant open data and transparency in European decision-making, we have a lot to do. Despite growing open data portals and an aggregating European data portal, it takes a lot of effort to make sense of European decision-making and public finance.

Dieter Schalk / Open State Foundation

The time is ripe. With the Dutch referendum on the EU-Ukraine Association Agreement, Brexit, debates around immigration and refugees, and new bailout talks between the EU and Greece, decisions by the EU affect millions of citizens living and working within its member states and people around the world. As everyone has the right to information, people need to know how these decisions are taken, who participates in preparing them, who receives funding, how you can make your views known, and what information is held or produced to develop and adopt those decisions.

In the wake of the Panama Papers, renewed calls for open company registers and registers of beneficial ownership, along with the need for open spending, contracting and tender data, require us to come together, join efforts and help make the EU more transparent.

TransparencyCamp Europe comes at the right moment. This unconference on open government and open data, to be held on June 1 in Amsterdam, will bring together developers, journalists, open data experts, NGOs, policymakers, and activists. In the run-up, a Europe-wide online open data App Competition (deadline for submissions May 1) and a number of local events, or diplohacks, are being organized. It will all come together at TransparencyCamp Europe, where, apart from numerous sessions organized by participants themselves, developers will present their open data apps to a jury.

Dieter Schalk / Open State Foundation

EU decision-making is quite complex, involving national governments and parliaments, the European Commission and the European Parliament, the European Council, and the many EU institutions and agencies involved. Still, there is already quite some open data available, differing in quality and ease of use. You will definitely want to know more about the EU’s institutions, who works there, and how you can contact them. Although the information is available at the EU Whoiswho website, the data is not easily reusable. That is why we scraped it and made it available to you on GitHub as CSV and JSON. And if you’re crawling through information on EU budgets, finances, funds, contracts and beneficiaries, you’ll notice there is much room for improvement.

So, there you go, join us and help to make the EU more transparent as TransparencyCamp Europe comes to Amsterdam. Registration for the unconference is free, follow us on Twitter and subscribe to the newsletter.

District Dispatch: Virtual library lobbying ready to reverberate in Washington

planet code4lib - Mon, 2016-05-02 12:00

Do you hear an echo? That’s the sound of thousands of library advocates speaking up all over the country. Starting today, May 2nd, almost 400 librarians converge in Washington, DC for National Library Legislative Day (NLLD). They’ve come from all over the nation to tell Members of Congress and their staffs what librarians’ legislative priorities are, but they need library supporters everywhere to help amplify the messages they’ll be delivering in person by participating in Virtual Library Legislative Day (VLLD).

Photo by mikael altemark

This week, while hundreds of librarians and library supporters are hitting the Hill, visit ALA’s Legislative Action Center to back them up. You’ll find everything you’ll need to call, email and/or tweet at your Representative and Senators.

The more messages Congressional offices get about ALA’s top 2016 Legislative Day Priorities all this week (May 3-6), the better!

Ask them to:

  • CONFIRM Dr. Carla Hayden as the next Librarian of Congress #Hayden4LOC
  • SUPPORT LSTA and Innovative Approaches to Literacy (IAL) Funding
  • PASS Electronic Communications Privacy Act Reform
  • RATIFY the Marrakesh Treaty for the print-disabled ASAP

For more background information about these issues, take a look at the one-page issue briefs that NLLD participants will receive when they get to Washington and that they’ll be sharing with Congressional offices.

On Monday May 2, tune in to the first-ever live stream of the NLLD program and issue briefings that NLLD participants will experience (or catch the recording by visiting the ALA YouTube Channel).

This year, ALA has also partnered with the Harry Potter Alliance (HPA) – a group of incredible, library-loving young people who’ve already made a huge impact with their advocacy work worldwide. So far, their members have pledged over 500 calls, emails or tweets to Congress for VLLD 2016. Let’s show those wizard punks how it’s really done and send 1,000 messages to Congress of our own!

Together, we can make the Capitol echo all this week with the voices and messages of library advocates in Washington and online.

Join us for Virtual Library Legislative Day 2016! (And don’t forget to follow along on social media #nlld16.)

The post Virtual library lobbying ready to reverberate in Washington appeared first on District Dispatch.

Terry Reese: MarcEdit Updates

planet code4lib - Mon, 2016-05-02 03:10

This weekend, I posted a new MarcEdit update. This is one of the biggest changes that I’ve made in a while. While the actual changelog is brief, these changes represented ~17k lines of code on the Windows side (~10k not related to UI work) and ~15.5k lines of code on the OSX side (~9k not related to UI work).

Specific changes added to MarcEdit:

Windows/Linux:

  • Enhancement: UNIMARC Tools: Provides a lightweight tool to convert data to MARC21 from UNIMARC and to UNIMARC from MARC21.
  • Enhancement: Replace Function: Option to support External search/replace criteria.
  • Enhancement: MARCEngine COM Object Updates

MacOSX

  • Enhancement: UNIMARC Tools: Provides a lightweight tool to convert data to MARC21 from UNIMARC and to UNIMARC from MARC21.
  • Enhancement: Replace Function: Option to support External search/replace criteria.
  • Update: Installation has been changed to better support keeping configuration information sync’d between updates.
  • Bug Fix: Add/Delete Function — Add field if not a duplicate:  Option wasn’t always working.  This has been corrected.

I’ve created some videos to demonstrate how these two elements work, and a third video showing how to use the Add Field if not a Duplicate option (added in the previous update). You can find these videos here:

Add Field If Not a Duplicate
URL: https://youtu.be/ObRKgAD9ye8

MarcEdit’s UNIMARC Tools:
URL: https://youtu.be/4rdzOCAwhSU

MarcEdit: Batch Replacement using External Criteria
URL: https://youtu.be/uJB9Uqg6bJs

You can get the changes from the downloads page or through MarcEdit’s automated update tool.

–tr

DuraSpace News: AVAILABLE: Recording and Slides from April 29 LYRASIS and DuraSpace CEO Town Hall Meeting

planet code4lib - Mon, 2016-05-02 00:00

Austin, TX – On April 29, 2016, Robert Miller, CEO of LYRASIS and Debra Hanken Kurtz, CEO of DuraSpace presented the third in a series of online Town Hall Meetings. They reviewed how their organizations came together to investigate a merger in order to build a more robust, inclusive, and truly global community with multiple benefits for members and users. They also unveiled a draft mission statement for the merged organization and provided updates on the status of the proposed merge.

Cynthia Ng: Imagine Living Without Books Part 2: Connecting Print Disabled Readers

planet code4lib - Sun, 2016-05-01 18:08
In part 1, I asked readers to think about what it would be like to imagine living with access to only a very small selection of books, and provided some additional context for Canada. If you haven’t already, please read Imagine Living Without Books Part 1 as the two parts are meant to be read … Continue reading Imagine Living Without Books Part 2: Connecting Print Disabled Readers

Galen Charlton: Natural and unnatural problems in the domain of library software

planet code4lib - Sun, 2016-05-01 16:04

I offer up two tendentious lists. First, some problems in the domain of library software that are natural to work on, and in the hopeful future, solve:

  • Helping people find stuff. On the one hand, this surely comes off as simplistic; on the other hand, it is the core problem we face, and has been the core problem of library technology from the very moment that a library’s catalog grew too large to stay in the head of one librarian.  There are of course a number of interesting sub-problems under this heading:
    • Helping people produce and maintain useful metadata.
    • Usefully aggregating metadata.
    • Helping robots find stuff (presumably with the ultimate purpose of helping people to find stuff).
    • Artificial intelligence. By this I’m not suggesting that library coders should be aiming to have an ILS kick off the Singularity, but there’s plenty of room for (e.g.) natural language processing to assist in the overall task of helping people find stuff.
  • Helping people evaluate stuff. “Too much information, little knowledge, less wisdom” is one way of describing the glut of bits infesting the Information Age. Libraries can help and should help—even though pitfalls abound.
  • Helping people navigate software and information resources. This includes UX for library software, but also a lot of other software that librarians, like it or not, find themselves helping patrons use. There are some areas of software engineering where the programmer can assume that the user is expert in the task that the software assists with; library software isn’t one of them.
  • Sharing stuff. What is Evergreen if not a decade-long project in figuring out ways to better share library materials among more users? Sharing stuff is not a solved problem even for digital stuff.
  • Keeping stuff around. This is an increasingly difficult problem. Time was, you could leave a pile of books sitting around and reasonably expect that at least a few would still exist five hundred years hence. Digital stuff never rewards that sort of carelessness.
  • Protecting patron privacy. This nearly ended up in the unnatural list—a problem can be unnatural but nonetheless crucial to work on. However, since there’s no reason to expect that people will stop being nosy about what other people are reading—and for that nosiness to sometimes turn into persecution—here we are.
  • Authentication. If the library keeps any transaction information on behalf of a patron so that they can get to it later, the software had better be trying to make sure that only the correct patron can see it. Of course, one could argue that library software should never store such information in the first place (after, say, a loan is returned), but I think there can be an honest conflict with patrons’ desires to keep track of what they used in the past.

Second, some distinctly unnatural problems that library technologists all too often must work on:

  • Digital rights management. If Ambrose Bierce were alive, I would like to think that he might define DRM in a library context thus: “Something that is ineffective in its stated purpose—and cannot possibly be effective—but which serves to compromise libraries’ commitment to patron privacy in the pursuit of a misunderstanding about what will keep libraries relevant.”
  • Walled garden maintenance. Consider EZproxy. It takes the back of a very small envelope to realize that hundreds of thousands of person-hours have been expended fiddling with EZproxy configuration files for the sake of bolstering the balance sheets of Big Journal. Is this characterization unfair? Perhaps. Then consider this alternative formulation: the opportunity cost imposed by time spent maintaining or working around barriers to the free exchange of academic publications is huge—and unlike DRM for public library ebooks, there isn’t even a case (good, bad, or indifferent) to be made that the effort results in any concrete financial compensation to the academics who wrote the journal articles that are being so carefully protected.
  • Authorization. It’s one thing to authenticate a patron so that they can get at whatever information the library is storing on their behalf. It’s another thing to spend time coding authentication and authorization systems as part of maintaining the walled gardens.

The common element among the problems I’m calling unnatural? Copyright; in particular, the current copyright regime that enforces the erection of barriers to sharing—and which we can imagine, if perhaps wistfully, changing to the point where DRM and walled garden maintenance need not occupy the attention of the library programmer, who then might find more time to work on some of the natural problems.

Why is this on my mind? I would like to give a shout-out to (and blow a raspberry at) an anonymous publisher who had this to say in a recent article about Sci-Hub:

And for all the researchers at Western universities who use Sci-Hub instead, the anonymous publisher lays the blame on librarians for not making their online systems easier to use and educating their researchers. “I don’t think the issue is access—it’s the perception that access is difficult,” he says.

I know lots of library technologists who would love to have more time to make library software easier to use. Want to help, Dear Anonymous Publisher? Tell your bosses to stop building walls.

Nick Ruest: #panamapapers images April 4-29, 2016

planet code4lib - Sun, 2016-05-01 00:34
#panamapapers images April 4-29, 2016


Dataset is available here.

Looking at the #panamapapers capture I've been doing, we have 1,424,682 embedded image URLs from 3,569,960 tweets. I'm downloading the 1,424,682 images now, and hope to do something similar to what I did with the #elxn42 images. While we're waiting for the images to download, here are the 10 most tweeted embedded image URLs, by tweet count (a sketch of how the tally can be produced follows the list):

  1. 10,243 tweets
  2. 8,093 tweets
  3. 6,588 tweets
  4. 5,613 tweets
  5. 5,020 tweets
  6. 4,944 tweets
  7. 4,421 tweets
  8. 3,740 tweets
  9. 3,616 tweets
  10. 3,585 tweets

(Image thumbnails omitted.)

tags: twitter, twarc, #panamapapers, wahr, web archives for historical research
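A rough sketch of how such a tally can be produced from twarc's line-delimited tweet JSON; the filename is hypothetical, and the media fields assume Twitter's v1.1 extended_entities structure that twarc captured at the time:

    import json
    from collections import Counter

    counts = Counter()
    with open("panamapapers.json") as fh:  # hypothetical name for the twarc output
        for line in fh:
            tweet = json.loads(line)
            # v1.1 tweets list attached photos under extended_entities.media
            for media in tweet.get("extended_entities", {}).get("media", []):
                if media.get("type") == "photo":
                    counts[media["media_url"]] += 1

    for url, n in counts.most_common(10):
        print(n, url)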

Nicole Engard: Bookmarks for April 30, 2016

planet code4lib - Sat, 2016-04-30 20:30

Today I found the following resources and bookmarked them on Delicious.

Digest powered by RSS Digest

The post Bookmarks for April 30, 2016 appeared first on What I Learned Today....

Related posts:

  1. Digital Cameras – I’m up for Suggestions
  2. First Big Present
  3. Google Homepage Themes

Patrick Hochstenbach: Brush Inking Exercise

planet code4lib - Sat, 2016-04-30 07:07
Portrait of a tree

Filed under: portraits, Sketchbook. Tagged: art, brush, illustration, ink, Photoshop, tree
