planet code4lib

Planet Code4Lib - http://planet.code4lib.org

HangingTogether: Library Linked Data in the Cloud

Fri, 2015-05-22 20:39

A book that a few of our colleagues have been working on for quite some time has now been released: Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource Description. You can also preview it on Google Books.

OCLC Research has been working with linked data for years, and we have developed processes for mining our MARC record database into linked and linkable entities. This book reports on a lot of that work, the problems we ran into and some of the solutions we created.

The main sections are:

  1. Library Standards and the Semantic Web
  2. Modeling Library Authority Files
  3. Modeling and Discovering Creative Works
  4. Entity Identification Through Text Mining
  5. The Library Linked Data Cloud

There are likely few people who have had as much experience parsing library data into linked data triples as the authors of this book and their OCLC Research colleagues. Therefore, anyone seeking to create or use library linked data would do well to study this book. You can take my word for it.

About Roy Tennant

Roy Tennant works on projects related to improving the technological infrastructure of libraries, museums, and archives.

District Dispatch: IMLS announces new immigration webinar for public libraries

Fri, 2015-05-22 17:25

[UPDATE: IMLS HAS POSTPONED THIS WEBINAR, AND WILL ANNOUNCE A NEW DATE AND TIME IN THE COMING WEEKS]

Next week, the Institute of Museum and Library Services (IMLS) and U.S. Citizenship and Immigration Services (USCIS) will host a free webinar for public librarians on the topic of immigration and U.S. citizenship. Join in to learn more about what resources are available to assist libraries that provide immigrant and adult education services. The webinar will provide an overview of how libraries can expand these services and even acquire free materials to display.

Webinar Details
Date: May 27, 2015
Time: 2:00 – 3:00 p.m. EDT
Click here to register

Participation in previous webinars on this topic is not required. Registration is not required, but the agencies recommend that you check your system for compatibility in advance.

This series was developed as part of a partnership between IMLS and USCIS to ensure that librarians have the necessary tools and knowledge to refer their patrons to accurate and reliable sources of information on immigration-related topics. To find out more about the partnership and the webinar series, visit the Serving New Americans page on the IMLS website or the USCIS website.

The post IMLS announces new immigration webinar for public libraries appeared first on District Dispatch.

Library of Congress: The Signal: A New Interface and New Web Archive Content at Loc.gov

Fri, 2015-05-22 14:18

The following is a guest post by Abbie Grotke, Lead Information Technology Specialist on the Web Archiving Team, Library of Congress.

Archived version of a Member of Congress Official Web Site – Barack Obama

Recently the Library of Congress launched a significant amount of new Web Archive content on the Library’s Web site, as a part of a continued effort to integrate the Library’s Web Archives into the rest of the loc.gov web presence.

This is our first big release since we launched the first iteration of collections into this new interface, back in June 2013. The earlier approach to presenting archived web sites made it hard to increase the amount of content available, so in a “one step back, two steps forward” move, the interface has been simplified, and should be more familiar to those working with Web Archives at other institutions: item records point to archived web sites displayed in an open-source version of the Wayback Machine. This simplification allowed the Library to increase the number of sites available in this interface from just under 1,000 to over 5,800. The most recently harvested sites now publicly available were captured in March-April 2012. The simplified approach should also help us catch up on moving more current content into the online collections.

There are now 21 named collections available in the new interface; some had been available in our old interface but are newly migrated; other content is entirely new. With this launch, we are particularly excited about the addition of the United States Congressional Web Archives, which for the first time allows researchers to access content collected from December 2002 through April 2012. Each record covers those sessions where a particular member of Congress was serving, such as for Barack Obama as senator during two sessions, or the example of Kirsten E. Gillibrand serving in the House and Senate, represented on one record despite a URL change.

Other newly available collections include the Burma/Myanmar General Election 2010 Web Archive, Egypt 2008 Web Archive, Laotian General Election 2011 Web Archive, Thai General Election 2011 Web Archive, Vietnamese General Election 2011 Web Archive and the Winter Olympic Games 2002 Web Archive.

We still have some work to do to move the U.S. Election Web Archives from our old interface, so for the time being researchers interested in those collections will need to refer back to the old site. Eventually we will be combining the separate Election collections into one U.S. Election Archive that will allow better searchability and access, and migrating them over (and then “turning off” the old interface).

We hope researchers will enjoy access to these new web archive collections.

LITA: Navigating Conferences Like a Pro… When You’re a Rookie

Fri, 2015-05-22 14:00

I’ve recently attended some of my first conferences/meetings post-MLIS and I thought I’d pass on the information I learned from my experience navigating them for the first time.

 

Courtesy of Jatenipit. Pixabay 2014

Always be prepared to promote

This is the most dreaded aspect of networking. It essentially implies schmoozing and self-aggrandizement, but if you think of it as socializing you’ll realize it’s an essential part of getting to know others in the profession and the roles they play in their organization. If you’re new to the information profession, it can be a great opportunity to ask other professionals about the path they took to enter the industry. More often than not, when they find that you’re new to the profession, they’ll offer you advice. They’ll be curious to know what your career goals are and why you’re attending. This is a great opportunity to ask for their business card or contact information. If you find that you’ve built a good rapport and want to become more familiar with their work/organization, you should offer your business card (more on this later).

 

The thing about promotion

If you’re at a conference on behalf of an organization, then you’re on the company dollar. Therefore your mission is to network, learn and share. Since I plan on attending conferences to learn more about the profession and network, I couldn’t talk shop about procedures and management. If you are attending on behalf of an organization, you’re expected to create professional networks and trace them back to your institution. It sounds intimidating, but if you allow yourself to soak up as much information as possible, while being open about what works and doesn’t for your information environment, you’ll find others may want to emulate your framework and share theirs in return.

 

You have leverage too

Believe it or not, the pros don’t know everything. Sometimes when you’re new to a profession you can become caught up in what you don’t know and the list of skills you need to get to that ever distant “next level.” I was very surprised to find that many of the resources I was familiar with escaped the purview of individuals working in the digital records management and archives field. I introduced The Signal Digital Preservation and the Cancer Imaging Archive into a conversation and a few individuals took genuine interest in my explanation of their services. While earning your degree or working in different information environments, you are exposed to a variety of resources and ideas that others aren’t aware of. Don’t count yourself out; you have something to add to the conversation.

 

Think outside the box

There is no need to be intimidated about approaching new acquaintances during a professional conference. Most of the time you’re meeting with people who remember what it’s like to be at the forefront of a new career. It can be exciting and informative to strike up a conversation with a presenter. There is nothing wrong with inquiring about lunch plans and meeting outside of the conference venue during scheduled breaks. The relaxed atmosphere of a restaurant is where funny stories of the trade can be passed along and you’ll get to know each other on a personal level. Several factors make for good networking, and an outgoing personality is one of them. Being personable is fine, but being respectful while you do it matters most.

 

Handy business cards

If you’re using a conference to network for future employment, then you need to have business cards. At larger conferences you can be one in hundreds of attendees. Business cards are a great way to establish that you’re prepared and professional. However, providing an acquaintance with your contact information is not enough. You may also want to ask for their card if you want to continue the conversation after the conference concludes. It’s likely that they’ll never take a look at your business card again, so it’s important to follow up with an e-mail to remind them of the highlights of the conversation you had and how you’d like to collaborate with them going forward.

If you’re hoping to enter a new field post-graduation, at a minimum your business card should include: your name, degree(s) and university, your phone number and e-mail. You can also add a specialization to encompass your career trajectory such as Librarian, Electronic Resource Specialist or Certified Webmaster. For points of contact beyond your phone number and e-mail, providing your website, online portfolio or LinkedIn URL is a great way to showcase your web presence. If you can connect with another professional’s LinkedIn, you will not only increase their awareness of you, but you will be exposed to their extended network as well.

 

An added bonus

If you are networking for employment, one thing that you don’t want to do is ask outright about potential employment with another attendee’s organization. I’ve seen this happen before and it can be off-putting for the person being asked as well as anyone involved in the conversation. If you’re a new graduate or changing careers, the conversation will naturally flow into questions about your career plans. If the person you’re speaking with feels inclined to mention an upcoming opportunity, then it is an added bonus. Otherwise, enjoy yourself and take advantage of the learning opportunity. You’ll be in a room filled with like-minded professionals and everyone wants to make the most of their experience.

 

Are you planning on attending any conferences this year? What takeaways do you have from conferences you’ve attended in the past? Let me know in the comments section.

OCLC Dev Network: WorldShare License Manager API in Production

Fri, 2015-05-22 14:00

We’re happy to announce that the WorldShare License Manager API is now available in Production.

DuraSpace News: NOW AVAILABLE: Fedora 4.2.0 Release

Fri, 2015-05-22 00:00

From Andrew Woods, Technical Lead for Fedora, on behalf of the Fedora team.

Winchester, MA - The Fedora team is pleased to announce that Fedora 4.2.0 was released on May 19, 2015 and is now available.

The focus of the 4.2.0 release was twofold:

  • Establish an Audit Service

  • Create and exercise tooling for Fedora 3 to 4 migrations

District Dispatch: Reminder: Apply now for the Oakley Memorial Scholarship

Thu, 2015-05-21 18:59

Robert Oakley

Reminder: The application window for the Robert L. Oakley Memorial Scholarship, a scholarship opportunity that supports research and advanced study for librarians in their early-to-mid-careers, closes on June 1, 2015. The annual $1,000 scholarship, which was developed by the American Library Association and the Library Copyright Alliance, supports librarians interested in intellectual property, public policy and copyright law.

Applicants should provide a statement of intent for use of the scholarship funds. Such a statement should include the applicant’s interest and background in intellectual property, public policy, and/or copyright and their impacts on libraries and the ways libraries serve their communities. Additionally, statements should include information about how the applicant and the library community will benefit from the applicant’s receipt of scholarship. Statements should be no longer than three pages (1,000 words). The applicant’s resume or curriculum vitae should be included in their application.

Applications must be submitted via e-mail to Carrie Russell, crussell[at]alawash[dot]org. Awardees may receive the Robert L. Oakley Memorial Scholarship up to two times in a lifetime. Funds may be used for equipment, expendable supplies, travel necessary to conduct research or attend conferences, release from library duties or other reasonable and appropriate research expenses.

The award honors the life accomplishments and contributions of Robert L. Oakley. Professor and law librarian Robert Oakley was an expert on copyright law and wrote and lectured on the subject. He served on the Library Copyright Alliance representing the American Association of Law Libraries and played a leading role in advocating for U.S. libraries and the public they serve at many international forums including those of the World Intellectual Property Organization and United Nations Educational Scientific and Cultural Organization.

Oakley served as the United States delegate to the International Federation of Library Associations Standing Committee on Copyright and Related Rights from 1997-2003. Mr. Oakley testified before Congress on copyright, open access, library appropriations and free access to government documents and was a member of the Library of Congress’ Section 108 Study Group. A valued colleague and mentor for numerous librarians, Oakley was a recognized leader in law librarianship and library management who also maintained a profound commitment to public policy and the rights of library users.

The post Reminder: Apply now for the Oakley Memorial Scholarship appeared first on District Dispatch.

LITA: Should LITA oppose Elsevier’s new sharing policy?

Thu, 2015-05-21 18:28

It’s come to the LITA Board’s attention that the Confederation of Open Access Repositories is circulating a statement against Elsevier’s new sharing policy. (You can find that policy here.) COAR is concerned that the policy imposes long embargoes for open access content (up to 4 years); applies retroactively; and restricts authors’ choice of Creative Commons license. Numerous individuals and library organizations, including ALA and ACRL, have signed on to this statement; the LITA Board is discussing doing likewise.

But we represent you, the members! So tell us what you think. Should LITA sign on?

Casey Bisson: Rewrite git repo URLs

Thu, 2015-05-21 17:17

A question on a mailing list I’m on introduced me to a git feature that was very new to me: it’s possible to have git rewrite the repository URLs to always use HTTPS or git+ssh, etc.

This one-liner seems to force https:

git config --global url.https://github.com/.insteadOf git://github.com/

Or you can add these to your .gitconfig:

    # Use https instead of git and git+ssh
    [url "https://github.com/"]
        insteadOf = git://github.com/
    [url "https://github.com/"]
        insteadOf = git@github.com:

    # Use git and git+ssh instead of https
    [url "git://github.com/"]
        insteadOf = https://github.com/
    [url "git@github.com:"]
        pushInsteadOf = "git://github.com/"
    [url "git@github.com:"]
        pushInsteadOf = "https://github.com/"
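To see what the "use https" rules would do to a given remote, the prefix substitution can be modeled in a few lines of Python. This is only a sketch of the matching behavior as git's documentation describes it (when several insteadOf prefixes match, the longest one wins); `rewrite_url` and `RULES` are illustrative names and are not part of git itself.

```python
# Model of git's url.<base>.insteadOf rewriting (illustrative, not git code).
# RULES maps each base URL to the prefixes it should replace.
RULES = {
    "https://github.com/": ["git://github.com/", "git@github.com:"],
}


def rewrite_url(url, rules):
    """Return url with the longest matching insteadOf prefix swapped for its base."""
    best_base, best_prefix = None, ""
    for base, prefixes in rules.items():
        for prefix in prefixes:
            # Keep the longest prefix that matches the start of the URL.
            if url.startswith(prefix) and len(prefix) > len(best_prefix):
                best_base, best_prefix = base, prefix
    if best_base is None:
        return url  # no rule applies, so the URL is left alone
    return best_base + url[len(best_prefix):]


if __name__ == "__main__":
    for remote in ("git://github.com/user/repo.git",
                   "git@github.com:user/repo.git",
                   "https://example.org/user/repo.git"):
        print(remote, "->", rewrite_url(remote, RULES))
```

On a real checkout, comparing `git remote -v` before and after adding the config is probably the quickest way to confirm the rewrite took effect.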

DPLA: On the Spirit of Creative Exploration at DPLAfest 2015

Thu, 2015-05-21 16:55

It’s been just about a month now since the conclusion of the second annual DPLAfest in Indianapolis, where about 300 people gathered to discuss the current and future state of digital collections and digital publishing. Those in attendance included librarians, archivists, publishers, authors, developers, and other interested members of the community. My own participation at the conference was generously made possible by a DPLA + DLF Cross-Pollinator Travel Grant, which allowed me to share with and learn from the DPLA community during the two-day event.

Put simply, my time at the DPLAfest was inspiring, based largely on my interactions with the DPLA community, which I found to be vibrant, welcoming, ingenious and altogether energizing. A fellow travel grant recipient, Laura Wrubel, has recently written on her own experiences at the conference. In her post, Laura identified a tangible excitement and a feeling of momentum that accompanied the conference, and I must agree wholeheartedly. An unmistakable atmosphere of positivity and possibility flowed through the conference and its attendees. Allow me to share an example.

My participation in the conference revolved mainly around the hackathon. On the first day of the conference, a large group of attendees learned how to access the DPLA API, and from there we split into smaller groups to discuss possible app ideas. My fellow brainstormers in this process included Alexandra Murray, Laura Wrubel, and Brandon Locke. The four of us were mostly DPLA newcomers, and were attending the conference in an exploratory mode. Even with our status as novices, however, we were received as peers into the DPLA community. As we talked through app ideas among ourselves and with the larger group, the support we received from DPLA community members was immediate and enthusiastic. DPLA staff Audrey Altman, Mark Breedlove, Mark Matienzo, and Tom Johnson were all on hand to provide direction and feedback as we worked first through ideas and then through code.

Working with Mark Matienzo’s Dial-a-DPLA app during the DPLAfest hackathon (source)

Inspired by Historical Cats, Mark Matienzo’s Dial-a-DPLA, and the encouragement we received from others in the room, we aimed to create a Twitter-based app that could serve as a whimsical window into the DPLA collection. Borrowing from Mark Sample’s DPLAbot, we reconfigured some nifty Node.js code so that our new bot matched food terms with DPLA metadata and produced Tweets that encouraged hungry Twitter users to take a look at food-related items in the DPLA collection. We dubbed this little program “DinnerPLAnsbot.” The final app would not have been possible without some well-timed and much-needed JavaScript assistance from Chad Nelson in the final few moments before we unveiled DinnerPLAnsbot at the Developer Showcase on the second day of the conference. At every turn during the DPLAfest, I felt the support of a talented and thoughtful community that wanted to help us build with the DPLA collection.

DinnerPLAnsbot team at the DPLAfest developer showcase. Left to right: Brandon Locke, Scott W. H. Young, Alexandra Murray, Laura Wrubel (source)

In thinking back to when I first learned of the DPLA, the earliest experience I could recall was the Fall 2013 meeting of the Coalition for Networked Information, where I attended a presentation by Dan Cohen. The DPLA was a mere 8 months into its existence at that time, and I remember being impressed not only with its mission and vision, but also the sticker on Dan’s laptop:

In the 16 months since, the DPLA has grown into a fully-realized digital library platform that now provides access to over 10,000,000 metadata-rich records from over 1,600 contributing institutions, and with a killer API to boot. With such a diverse range of digital objects, a vibrant, supportive community, and a forward-thinking vision for digital collections and digital services, the DPLA represents a spirit of creative exploration (demonstrated nicely in the DPLA App Library).

A query of the DPLA API shows that my institution, Montana State University Library, has so far contributed 1,632 records to the DPLA through the Mountain West Digital Library. We plan to increase our contribution by several thousand more records over the next several months, including records from the Acoustic Atlas, our newly-formed collection of natural sound recordings from the American West. My participation in the DPLAfest has served to strengthen my own interest and motivation for exploring, building, and creating with digital library objects, and I am now even more excited to see the DPLA’s collection and community continue to grow and strengthen.
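For readers curious what a record-count query like that might look like, here is a minimal Python sketch. It assumes the DPLA v2 items endpoint, its `dataProvider` filter, its `api_key` parameter, and the `count` field in the JSON response; the exact provider string and the helper names (`build_count_url`, `extract_count`, `fetch_count`) are illustrative guesses, not the query the author actually ran.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for item searches in the DPLA v2 API.
DPLA_ITEMS = "https://api.dp.la/v2/items"


def build_count_url(data_provider, api_key):
    """Build an items query filtered to a single contributing institution."""
    params = {"dataProvider": data_provider, "api_key": api_key}
    return DPLA_ITEMS + "?" + urllib.parse.urlencode(params)


def extract_count(payload):
    """Pull the total number of matching records out of a decoded response."""
    return int(payload["count"])


def fetch_count(data_provider, api_key):
    """Run the query over the network and return the reported record count."""
    url = build_count_url(data_provider, api_key)
    with urllib.request.urlopen(url, timeout=10) as response:
        return extract_count(json.load(response))
```

Calling `fetch_count("Montana State University Library", "YOUR_API_KEY")` would need a real API key, and the number returned will differ from the figure quoted above as contributions grow.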

To find out more about Scott, visit his personal website or follow him on Twitter.

District Dispatch: IMLS awards libraries National Medals

Thu, 2015-05-21 16:26

(Left to right) Cecil County Public Library Director Denise Davis, Cecil County Community Member Thomas Cousar and Michelle Obama with National Medal.

Earlier this week, First Lady Michelle Obama joined the Institute of Museum and Library Services (IMLS) Acting Director Maura Marx to present the 2015 National Medal for Museum and Library Service to ten exemplary libraries and museums for their service to their communities. Now in its 21st year, the National Medal is the nation’s highest honor conferred on libraries and museums, and celebrates institutions that make a difference for individuals, families, and communities.

National Medal recipients include:

(Left to right) Erica Jesonis, Chief Librarian for Information Technology; Morgan Miller, Assistant Director for Public Service; U.S. Rep. Andy Harris (R-MD); Denise Davis, Cecil County Public Library Director, Frazier Walker, Community Relations Specialist.

During the event, First Lady Michelle Obama said to the recipients: “The services that you all provide are not luxuries. Just the opposite. Every day your institutions are keeping so many folks in this country from falling through the cracks. In many communities our libraries and museums are the places that help young people dream bigger and reach higher for their futures, the places that help new immigrants learn English and apply for citizenship…the places where folks can access a computer and send out a job application so they can get back to work and get back to the important process of supporting their families.”

Denise Davis, director of the Cecil County Public Library in Elkton, Md., spoke about receiving the prestigious recognition:

Public libraries have a powerful role in creating opportunities by keeping the doors to knowledge open, allowing creativity to flourish, and never letting barriers become insurmountable.

The next deadline for nominating a library or museum is October 1, 2015. Learn more about the National Medal at www.imls.gov/medals.

The post IMLS awards libraries National Medals appeared first on District Dispatch.

Library of Congress: The Signal: The K-12 Web Archiving Program: Preserving the Web from a Youthful Point of View

Thu, 2015-05-21 15:00

This article is being co-published on the Teaching With the Library of Congress blog and was written by Butch Lazorchak and Cheryl Lederle.

If you believe the Web (and who doesn’t believe everything they read on the Web?), it boastfully celebrated its 25th birthday last year. Twenty-five years is long enough for the first “children of the Web” to be fully-grown adults, just now coming of age to recognize that the Web that grew up around them has irrevocably changed.

In this particular instance, change is good. It’s only by becoming aware of what we’re losing (or have already lost) that we’ll be spurred to action to preserve it. We’ve been aware of the value of the historic web for a number of years here at the Library of Congress, and we’ve worked hard to understand how to capture the Web through the Library’s Web Archiving program and the work we’ve done with partners at the Memento project and through the International Internet Preservation Consortium.

K-12 Web Archiving Program.

But let’s go back to those “children of the Web.” Nostalgia is a powerful driver for preservation, but most preservation efforts are driven by full-grown adults. If they’re able to bring a child’s perspective to their work it’s only through the prism of their own memory, and in any event, the nostalgic items they may wish to capture may not be around anymore by the time they get to them. What’s needed is not just a nostalgic memory of the web, but efforts to curate and capture the web with a perspective that includes the interests of the young. And who better to represent the interests of the young than children and teenagers themselves! Luckily the Library of Congress has such a program: the K-12 web archiving program.

The K-12 Web Archiving program has been operating since 2008, engaging dozens of schools and hundreds of students from schools, large and small, from across the U.S. in understanding what the Web means to them, and why it’s important to capture it. In partnership with the Internet Archive, the program enables schools to set up their own web capture tools and choose sets of web resources to collect; resources that represent the full range of youthful experience, including popular culture, commerce, news, entertainment and more.

Cheryl Lederle, an Educational Resource Specialist at the Library of Congress, notes that the program builds student awareness of the internet as a primary source as well as how quickly it can change. The program might best be understood through the reflections of participating teachers:

  • “The students gained an understanding of how history is understood through the primary sources that are preserved and therefore the importance of the selection process for what we are digitally preserving. But, I think the biggest gain was their personal investment in preserving their own history for future generations. The students were excited and fully engaged by being a part of the K-12 archiving program and that their choices were being preserved for their own children someday to view.” – MaryJane Cochrane, Paul VI Catholic High School
  • “The project introduced my students to historical thinking; awareness of digital data as a primary source and documentation of current events and popular culture; and helped foster an appreciation and awareness of libraries and historical archives.” – Patricia Carlton, Mount Dora High School

And participating students:

  • “Before this project, I was under the impression that whatever was posted on the Internet was permanent. But now, I realize that information posted on the Internet is always changing and evolving.”
  • “I find it very interesting that you can look back on old websites and see how technology has progressed. I want to look back on the sites we posted in the future to see how things have changed.”
  • “I was surprised by the fact that people from the next generation will also share the information that I have collected.”
  • “They’re really going to listen to us and let us choose sites to save? We’re eight!”

Collections from 2008-2014 are available for study on the K-12 Web Archiving site, and the current school year will be added soon. Students examining these collections might:

  • Compare one school’s collections from different years.
  • Compare collections preserved by students of different grade levels in the same year.
  • Compare collections by students of the same grade level, but from different locations.
  • Create a list of Web sites they think should be preserved and organize them into two or three collections.

What did your students discover about the value of preserving Web sites?

David Rosenthal: Unrecoverable read errors

Thu, 2015-05-21 15:00
Trevor Pott has a post at The Register entitled Flash banishes the spectre of the unrecoverable data error in which he points out that while disk manufacturers' quoted Bit Error Rates (BER) for hard disks are typically 10^-14 or 10^-15, SSD BERs range from 10^-16 for consumer drives to 10^-18 for hardened enterprise drives. Below the fold, a look at his analysis of the impact of this difference of up to 4 orders of magnitude.

When a disk in a RAID-5 array fails and is replaced, all the data on other drives in the array must be read to reconstruct the data from the failed drive. If an unrecoverable read error (URE) is encountered in this process, one or more data blocks will be lost. RAID-6 and up can survive increasing numbers of UREs.

It has been obvious for some time that, as hard disks got bigger without a corresponding decrease in BER, RAID technology had a problem: the probability of encountering a URE during reconstruction was going up, and thus so was the probability of losing data when a drive failed. As Trevor writes:

“Putting this into rather brutal context, consider the data sheet for the 8TB Archive Drive from Seagate. This has an error rate of 10^14 bits. That is one URE every 12.5TB. That means Seagate will not guarantee that you can fully read the entire drive twice before encountering a URE.

“Let's say that I have a RAID 5 of four 5TB drives and one dies. There is 12TB worth of data to be read from the remaining three drives before the array can be rebuilt. Taking all of the URE math from the above links and dramatically simplifying it, my chances of reading all 12TB before hitting a URE are not very good.

“With 6TB drives I am beyond the math. In theory, I shouldn't be able to rebuild a failed RAID 5 array using 6TB drives that have a 10^14 BER. I will encounter a URE before the array is rebuilt and then I’d better hope the backups work.

“So RAID 5 for consumer hard drives is dead.”

Well, yes, but RAID-5, and RAID in general, is just one rather simple form of erasure coding. There are better forms of erasure coding for long-term data reliability. I disagree with Trevor when he writes:

“There are plenty of ways to ensure that we can reliably store data, even as we move beyond 8TB drives. The best way, however, may be to put stuff you really care about on flash arrays. Especially if you have an attachment to the continued use of RAID 5.”

Trevor is ignoring the economics. Hard drives are a lot cheaper for bulk storage than flash. As Chris Mellor pointed out in a post at The Register about a month ago, each byte of flash contains at least 50 times as much capital investment as a byte of hard drive. So it will be a lot more expensive, even if not 50 times as expensive. For the sake of argument, let's say it is 5 times as expensive. To a first approximation, cost increases linearly with the replication factor, but reliability increases exponentially. So, instead of a replication factor of 1.2 in a RAID-5 flash array, for the same money I can have a replication factor of 6 (1.2 × 5) in a hard disk array. Data in the hard drive array would be much, much safer for the same money. Or suppose I used a replication factor of 2.5; the data would still be a great deal safer, for roughly 40% of the cost.
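The "URE math" that Trevor simplifies can be sketched in a few lines of Python. The sketch assumes that bit errors are independent and occur at exactly the quoted rate, which real drives do not promise, and that terabytes are decimal. The function names `tb_per_ure` and `p_clean_read` are illustrative.

```python
import math

BITS_PER_TB = 8 * 10**12  # decimal terabytes, as drive vendors count them


def tb_per_ure(ber):
    """Average amount of data read per unrecoverable error, in TB."""
    return 1.0 / (ber * BITS_PER_TB)


def p_clean_read(terabytes, ber):
    """Chance of reading `terabytes` TB without a single URE.

    Assumes independent bit errors, so the answer is (1 - ber) ** bits,
    computed through logarithms to stay numerically stable.
    """
    bits = terabytes * BITS_PER_TB
    return math.exp(bits * math.log1p(-ber))


if __name__ == "__main__":
    for label, ber in [("hard disk", 1e-14),
                       ("consumer SSD", 1e-16),
                       ("enterprise SSD", 1e-18)]:
        print(f"{label}: {tb_per_ure(ber):.1f} TB per URE, "
              f"P(clean 12 TB read) = {p_clean_read(12, ber):.4f}")
```

Under these simplifying assumptions, a BER of 10^-14 gives the 12.5TB-per-URE figure quoted above, and the 12TB rebuild read finishes without a URE only about 38% of the time.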

DuraSpace News: NOW AVAILABLE: DSpace 5.2!

Thu, 2015-05-21 00:00
From Hardy Pottinger, on behalf of the DSpace 5.2 Release Team, and all the DSpace developers.

Winchester, MA - The DSpace developers are pleased to formally announce that DSpace 5.2 is now available. DSpace 5.2 is a bug-fix release and contains no new features. DSpace 5.2 can be downloaded immediately at either of the following locations:

  • SourceForge: https://sourceforge.net/projects/dspace/files/

Ed Summers: SKOS and Wikidata

Wed, 2015-05-20 21:10

For #DayOfDH yesterday I created a quick video about some data normalization work I have been doing using Wikidata entities. I may write more about this work later, but the short version is that I have a bunch of spreadsheets with names in them (authors) in a variety of formats and transliterations, which I need to collapse into a unique identifier so that I can provide a unified display of the data per unique author. So for example, my spreadsheets have information for Fyodor Dostoyevsky using the following variants:

  • Dostoeieffsky, Feodor
  • Dostoevski
  • Dostoevski, F. M.
  • Dostoevski, Fedor
  • Dostoevski, Feodor Mikailovitch
  • Dostoevskii
  • Dostoevsky
  • Dostoevsky, Fiodor Mihailovich
  • Dostoevsky, Fyodor
  • Dostoevsky, Fyodor Michailovitch
  • Dostoieffsky
  • Dostoieffsky, Feodor
  • Dostoievski
  • Dostoievski, Feodor Mikhailovitch
  • Dostoievski, Feodore M.
  • Dostoievski, Thedor Mikhailovitch
  • Dostoievsky
  • Dostoievsky, Feodor Mikhailovitch
  • Dostoievsky, Fyodor
  • Dostojevski, Feodor
  • Dostoyeffsky
  • Dostoyefsky
  • Dostoyefsky, Theodor Mikhailovitch
  • Dostoyevski, Feodor
  • Dostoyevsky
  • Dostoyevsky, Fyodor
  • Dostoyevsky, F. M.
  • Dostoyevsky, Feodor Michailovitch
  • Dostoyevsky, Feodor Mikhailovich

So, obviously, I wanted to normalize these. But I also want to link each name to an identifier that could be useful for obtaining other information, such as an image of the author, a description of their work, and possibly links to works by the author. I’m going to try to map the authors to Wikidata, largely because there are links from Wikidata to other places like the Virtual International Authority File and Freebase, but also because there are images on Wikimedia Commons and nice descriptive text for the people. As an example, here is the Wikidata page for Dostoyevsky.
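To make the goal concrete, here is a minimal sketch of what collapsing name variants into one identifier could look like once the matching is done. The identifier "AUTHOR-1", the row layout, and the titles are placeholders invented for this illustration; they are not taken from the actual spreadsheets or from Wikidata.

```python
from collections import defaultdict

# Hypothetical lookup table built up during matching: name variant -> identifier.
# "AUTHOR-1" is a placeholder, not a real Wikidata ID.
VARIANT_TO_ID = {
    "Dostoevski, Fedor": "AUTHOR-1",
    "Dostoievsky, Fyodor": "AUTHOR-1",
    "Dostoyevsky, F. M.": "AUTHOR-1",
}

def group_rows_by_author(rows, variant_to_id):
    """Group (name, title) rows under the identifier their name variant maps to.
    Rows whose name has no match yet are collected under the key None."""
    grouped = defaultdict(list)
    for name, title in rows:
        grouped[variant_to_id.get(name.strip())].append(title)
    return dict(grouped)

# Made-up spreadsheet rows for illustration.
ROWS = [
    ("Dostoevski, Fedor", "Title A"),
    ("Dostoyevsky, F. M.", "Title B"),
    ("Unknown, Author", "Title C"),
]

print(group_rows_by_author(ROWS, VARIANT_TO_ID))
```

Keeping unmatched names under a separate key makes it easy to see which variants still need a decision.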

To aid in this process I created a very simple command line tool and library called wikidata_suggest, which uses Wikidata’s suggest API to interactively match up a string of text to a Wikidata entity. If Wikidata doesn’t have any suggestions, the utility falls back to looking in a page of Google’s search results for a Wikipedia page, and will then optionally let you use that text.
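For readers who want a feel for the lookup step, here is a rough sketch using the public `wbsearchentities` action of the Wikidata API. It is an assumption that this is close to what wikidata_suggest calls internally; the function names, the User-Agent string, and the sample response below are made up for this illustration, and this is not the wikidata_suggest code itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"

def build_search_url(text, language="en", limit=5):
    """Build a wbsearchentities query URL for a piece of text."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)

def candidates(response):
    """Pull (id, label, description) tuples out of a decoded API response."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]

def suggest(text):
    """Ask Wikidata for candidate entities (makes a network request)."""
    request = Request(build_search_url(text),
                      headers={"User-Agent": "name-matching-sketch/0.1"})
    with urlopen(request) as resp:
        return candidates(json.load(resp))

# Hand-written sample in the shape of a response; "Q0" is a placeholder ID.
SAMPLE_RESPONSE = {
    "search": [
        {"id": "Q0", "label": "Example Author", "description": "placeholder entry"}
    ]
}

print(build_search_url("Dostoevski, Fedor"))
print(candidates(SAMPLE_RESPONSE))
```

An interactive tool would show the candidates to the user and record whichever one they pick; that part is left out here.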

SKOS

Soon after tweeting about the utility and the video I made about it, I heard from Alberto, who works on the NASA Astrophysics Data System and was interested in using wikidata_suggest to try to link up the Unified Astronomy Thesaurus to Wikidata.

@libcce map UAT to @wikidata? https://t.co/sqyPRdqd9U
Alberto Accomazzi (@aaccomazzi), May 20, 2015

Fortunately the UAT is made available as a SKOS RDF file. So I wrote a little proof of concept script named skos_wikidata.py that loads a SKOS file, walks through each skos:Concept and asks you to match the skos:prefLabel to Wikidata using wikidata_suggest. Here’s a quick video I made of what this process looks like:

I guess this is similar to what you might do in OpenRefine, but I wanted a bit more control over how the data was read in, modified and matched up. I’d be interested in your ideas on how to improve it if you have any.
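The loop described above (load a SKOS file, visit each skos:Concept, and hand its skos:prefLabel to a matcher) could be sketched roughly as follows. This is not the skos_wikidata.py script: it uses only the standard library, it assumes the RDF/XML writes concepts as `skos:Concept` elements with an `rdf:about` attribute, and the `matcher` callable is a hypothetical stand-in for the interactive wikidata_suggest step. A real RDF parser such as rdflib would handle other serializations more robustly. The sample data is invented.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def pref_labels(rdf_xml, lang="en"):
    """Yield (concept URI, preferred label) pairs from SKOS RDF/XML text.
    Labels with no language tag are accepted as well."""
    root = ET.fromstring(rdf_xml)
    for concept in root.iter(f"{{{SKOS_NS}}}Concept"):
        uri = concept.get(f"{{{RDF_NS}}}about")
        for label in concept.findall(f"{{{SKOS_NS}}}prefLabel"):
            if label.get(XML_LANG) in (lang, None):
                yield uri, label.text

def match_concepts(rdf_xml, matcher):
    """Map each concept URI to whatever the matcher returns for its label."""
    return {uri: matcher(label) for uri, label in pref_labels(rdf_xml)}

# Invented sample; example.org URIs and labels are placeholders.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/1">
    <skos:prefLabel xml:lang="en">Example concept A</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/concept/2">
    <skos:prefLabel xml:lang="en">Example concept B</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">Concept exemple B</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

# A trivial matcher that just echoes the label in upper case.
print(match_concepts(SAMPLE, lambda label: label.upper()))
```

Passing the matcher in as a function keeps the file-walking logic separate from however the matching is actually done.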

It’s kind of funny how Day of Digital Humanities quickly morphed into Day of Astrophysics…

Nicole Engard: Bookmarks for May 20, 2015

Wed, 2015-05-20 20:30

Today I found the following resources and bookmarked them on Delicious.

  • Coffitivity: Coffitivity recreates the ambient sounds of a cafe to boost your creativity and help you work better.


The post Bookmarks for May 20, 2015 appeared first on What I Learned Today....


OCLC Dev Network: WMS Collection Management API - Problems with Write Operations

Wed, 2015-05-20 20:15

We have discovered problems with WMS Collection Management API write operations - including both Create and Update - and are advising users of this web service to limit their usage to the Read and Search operations only for now.

District Dispatch: Panel to discuss ebook lending growth at 2015 ALA Annual Conference

Wed, 2015-05-20 18:00

A leading panel of library and publishing experts will provide an update on the library ebook lending market and discuss the best ways for libraries to advance library access to digital content at the 2015 American Library Association’s (ALA) Annual Conference in San Francisco. The interactive session, “Making Progress in Digital Content,” takes place from 10:30 to 11:30 a.m. on Sunday, June 28, 2015. The session will be held at the Moscone Convention Center in room 2018 of the West building.

During the session, an expert panel of library leaders from ALA’s Digital Content Working Group (DCWG) will provide insights on the most promising opportunities available to advance library access to digital content. Organizational leaders will discuss ALA’s efforts toward exploiting digital content access opportunities. Audience input will be sought to inform ALA priorities in this area. The program features DCWG co-chairs Carolyn Anthony and Erika Linke, along with additional guest panelists.

Speakers
  • Carolyn Anthony, co-chair, ALA Digital Content Working Group; director, Skokie Public Library (Illinois); immediate past-president, Public Library Association
  • Erika Linke, co-chair, ALA Digital Content Working Group; associate dean of Libraries and director of Research and Academic Services, Carnegie Mellon University Libraries

View all ALA Washington Office conference sessions

The post Panel to discuss ebook lending growth at 2015 ALA Annual Conference appeared first on District Dispatch.

LITA: Jobs in Information Technology: May 20, 2015

Wed, 2015-05-20 17:24

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Director of Information Technology, Douglas County Libraries, Castle Rock, CO

Associate Product Owner, The Library Corporation (TLC), Inwood, WV

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Jonathan Rochkind: “First Rule of Usability? Don’t Listen to Users”

Wed, 2015-05-20 15:23

An interesting, brief 15-year-old column from noted usability expert Jakob Nielsen, which I saw posted today on reddit: First Rule of Usability? Don’t Listen to Users

Summary: To design the best UX, pay attention to what users do, not what they say. Self-reported claims are unreliable, as are user speculations about future behavior. Users do not know what they want.

I’m reposting here, even though it’s 15 years old, because I think many of us haven’t assimilated this message yet, especially in libraries, and it’s worth reviewing.

An even worse version of trusting users’ self-reported claims, I think, is trusting user-facing librarians’ self-reported claims about what they have generally noticed users self-reporting. It’s like taking the first problem and adding a game of ‘telephone’ to it.

Nielsen’s suggested solution?

To discover which designs work best, watch users as they attempt to perform tasks with the user interface. This method is so simple that many people overlook it, assuming that there must be something more to usability testing. Of course, there are many ways to watch and many tricks to running an optimal user test or field study. But ultimately, the way to get user data boils down to the basic rules of usability:

  • Watch what people actually do.
  • Do not believe what people say they do.
  • Definitely don’t believe what people predict they may do in the future.

Yep. If you’re not doing this, start. If you’re doing it, you probably need to do it more. Easier said than done in a typical bureaucratic, inertial, dysfunctional library organization, I realize.

It also means we have a professional obligation to watch what the users do, and to determine how to make things better for them. And then watch again to see if it did. That’s what makes us professionals. We cannot simply do what the users say; that is an abrogation of our professional responsibility, and does not actually produce good outcomes for our patrons. Again, yes, this means we need library organizations that allow us to exercise our professional responsibilities and give us the resources to do so.

For real, go read the very short article. And consider what it would mean to develop in libraries taking this into account.

