Planet Code4Lib - http://planet.code4lib.org

FOSS4Lib Updated Packages: OpenScholar

Tue, 2015-04-07 13:26

Last updated April 7, 2015. Created by Peter Murray on April 7, 2015.

OpenScholar is open source software built on top of Drupal that allows end users to easily create dynamic and customizable academic web sites. Each site comes with a suite of apps, widgets and themes, enabling users to build and manage feature-rich web sites.

OpenScholar is developed and maintained by The Institute for Quantitative Social Science in collaboration with HPAC and HUIT at Harvard University, with contributions from the open source community.

Package Type: Content Management System
License: GPLv2
Package Links: Releases for OpenScholar
Open Hub Link: https://openhub.net/p/openscholar-harvard
Works well with: Drupal

Mark E. Phillips: Metadata Edit Events: Part 6 – Average Edit Duration by Facet

Tue, 2015-04-07 12:29

This is the sixth post in a series on metadata edit events for the UNT Libraries’ Digital Collections from January 1, 2014 to December 31, 2014. The previous posts in the series covered the when, the what, the who, duration grouped into time buckets, and finally calculating the average edit event time.

In the previous post I arrived at the value I’m using as the edit event duration ceiling for the rest of this analysis: 2,100 seconds. The rest of the analysis in this post therefore ignores events that took longer than 2,100 seconds. Removing the 2,306 events over that ceiling leaves 91,916 valid events (97.6% of the original dataset) to analyze.
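
To make the filtering step concrete, here is a minimal Python/pandas sketch of how one might apply the ceiling. The file name edit_events.csv and the duration column are hypothetical stand-ins, not the actual dataset behind this series.

    import pandas as pd

    # Hypothetical input: one row per edit event, with a duration column in seconds.
    events = pd.read_csv("edit_events.csv")

    CEILING = 2100  # edit event duration ceiling from the previous post, in seconds
    valid = events[events["duration"] <= CEILING]
    removed = len(events) - len(valid)

    print(f"kept {len(valid)} of {len(events)} events "
          f"({len(valid) / len(events):.1%}), removed {removed}")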

Editors

The table below shows the stats for our top ten editors once edits over 2,100 seconds have been excluded.

username        min    max     edit events   duration sum   mean     stddev
htarver           2    2,083        15,346      1,550,926   101.06   132.59
aseitsinger       3    2,100         9,750      3,920,789   402.13   437.38
twarner           5    2,068         4,627        184,784    39.94   107.54
mjohnston         3    1,909         4,143        562,789   135.84   119.14
atraxinger        3    2,099         3,833      1,192,911   311.22   323.02
sfisher           5    2,084         3,434        468,951   136.56   241.99
cwilliams         4    2,095         3,254        851,369   261.64   340.47
thuang            4    2,099         3,010        770,836   256.09   397.57
mphillips         3      888         2,669         57,043    21.37    41.32
sdillard          3    2,052         2,516      1,599,329   635.66   388.3

You can see that many of these users have very short minimum edit times, and all but one have maximum edit times that approach the duration ceiling. The average amount of time spent per edit event ranges from 21 seconds to 10 minutes and 35 seconds.
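
A table like the one above can be produced with a single grouped aggregation. Below is a rough sketch using the same hypothetical edit_events.csv, with assumed username and duration columns; it mirrors the columns in these tables but is not the code used for the post.

    import pandas as pd

    # Hypothetical input, filtered to the duration ceiling as in the earlier sketch.
    events = pd.read_csv("edit_events.csv")
    valid = events[events["duration"] <= 2100]

    # Per-editor summary statistics, mirroring the columns shown in the tables.
    per_user = (
        valid.groupby("username")["duration"]
        .agg(min="min", max="max", edit_events="count",
             duration_sum="sum", mean="mean", stddev="std")
        .sort_values("edit_events", ascending=False)
        .head(10)
    )
    print(per_user.round(2))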

I know that for user mphillips (me), the bulk of the work I tend to do in the edit system is fixing quick mistakes like missing language codes, editing dates that aren’t in Extended Date/Time Format (EDTF), or hiding and un-hiding records. Other users such as sdillard have been working exclusively on a project to create metadata for a collection of Texas Patents that we are describing in the Portal.

Collections

The top ten most edited collections and their statistics are presented below.

Collection Code   Collection Name                                       min    max     edit events   duration sum   mean     stddev
ABCM              Abilene Library Consortium                              2    2,083         8,418      1,358,606   161.39   240.36
JBPC              Jim Bell Texas Architecture Photograph Collection       3    2,100         5,335      2,576,696   482.98   460.03
JJHP              John J. Herrera Papers                                  3    2,095         4,940      1,358,375   274.97   346.46
ODNP              Oklahoma Digital Newspaper Program                      5    2,084         3,946        563,769   142.87   243.83
OKPCP             Oklahoma Publishing Company Photography Collection      4    2,098         5,692        869,276   152.72   280.99
TCO               Texas Cultures Online                                   3    2,095         5,221      1,406,347   269.36   343.87
TDNP              Texas Digital Newspaper Program                         2    1,989         7,614      1,036,850   136.18   185.41
TLRA              Texas Laws and Resolutions Archive                      3    2,097         8,600      1,050,034   122.1    172.78
TXPT              Texas Patents                                           2    2,099         6,869      3,740,287   544.52   466.05
TXSAOR            Texas State Auditor’s Office: Reports                   3    1,814         2,724        428,628   157.35   142.94
UNTETD            UNT Theses and Dissertations                            5    2,098         4,708      1,603,857   340.67   474.53
UNTPC             University Photography Collection                       3    2,096         4,408      1,252,947   284.24   340.36

This data is a little easier to see with a graph.

Average edit duration per collection

Here is my interpretation of what I see in these numbers based on personal knowledge of these collections.

The collections with the highest average duration are TXPT and JBPC, followed by the UNTETD, UNTPC, TCO and JJHP collections. The first two, Texas Patents (TXPT) and the Jim Bell Texas Architecture Photograph Collection (JBPC), are examples of collections that were having metadata records created for the first time via our online editing system. These collections generally required more investigation (either reading the patent or researching the photograph) and therefore took more time on average to create the records.

Two of the others, the UNT Theses and Dissertations Collection (UNTETD) and the UNT Photography Collection (UNTPC), involved a fair amount of copy cataloging, with metadata created from existing MARC records or local finding aids. The John J. Herrera Papers (JJHP) involved, I believe, working from an existing finding aid, and I know there was a two-step process of creating the record and then publishing it as unhidden in a separate event, which lowers the average time considerably. I don’t know enough about the Texas Cultures Online (TCO) work in 2014 to comment there.

On the other end of the spectrum you have collections like ABCM, ODNP, OKPCP, and TDNP, which averaged a much shorter amount of time per record. For these collections there were many small edits, typically completed one field at a time. For some, the edit might have just involved fixing a consistent typo, adding the record to a collection, or hiding or un-hiding it from public view.

This raises a question for me: is it possible to detect the “kind” of edits being made based on their average edit times? That’s something to look at.

Partner Institutions

And now the ten partner institutions that had the most metadata edit events.

Partner Code   Partner Name                                                      min    max     edit events   duration sum   mean     stddev
UNTGD          UNT Libraries Government Documents Department                       2    2,099        21,342      5,385,000   252.32   356.43
OKHS           Oklahoma Historical Society                                         4    2,098        10,167      1,590,498   156.44   279.95
UNTA           UNT Libraries Special Collections                                   3    2,099         9,235      2,664,036   288.47   362.34
UNT            UNT Libraries                                                       2    2,098         6,755      2,051,851   303.75   458.03
PCJB           Private Collection of Jim Bell                                      3    2,100         5,335      2,576,696   482.98   460.03
HMRC           Houston Metropolitan Research Center at Houston Public Library     3    2,095         5,127      1,397,368   272.55   345.62
HPUL           Howard Payne University Library                                     2    1,860         4,528        544,420   120.23   113.97
UNTCVA         UNT College of Visual Arts + Design                                 4    2,098         4,169      1,015,882   243.68   364.92
HSUL           Hardin-Simmons University Library                                   3    2,020         2,706        658,600   243.39   361.66
HIGPL          Higgins Public Library                                              2    1,596         1,935        131,867    68.15   118.5

Again presented as a simple chart.

Average edit duration per partner.

It is easy to see the difference between the Private Collection of Jim Bell (PCJB), with an average of 482 seconds, or roughly eight minutes per edit, and the Higgins Public Library (HIGPL), which had an average of 68 seconds, or just over one minute. In the case of the Private Collection of Jim Bell (PCJB), we were actively creating records for the first time for these items, and the average of eight minutes seems to track with what one would imagine it takes to create a metadata record for a photograph. The Higgins Public Library (HIGPL) collection is a newspaper collection that had a single change to the physical description made to all of the items in that partner’s collection. The other partners fall between these two extremes and have similar characteristics, with the lower edit averages occurring where a partner’s content is either being edited in a small way or being hidden or un-hidden from view.

Resource Type

The final way we will slice the data for this post is by looking at the stats for the top ten resource types.

resource type     min    max     count    sum          mean     stddev
image_photo         2    2,100   30,954    7,840,071   253.28   356.43
text_newspaper      2    2,084   11,546    1,600,474   138.62   207.3
text_leg            3    2,097    8,604    1,050,103   122.05   172.75
text_patent         2    2,099    6,955    3,747,631   538.84   466.25
physical-object     2    2,098    5,479    1,102,678   201.26   326.21
text_etd            5    2,098    4,713    1,603,938   340.32   474.4
text                3    2,099    4,196    1,086,765   259      349.67
text_letter         4    2,095    4,106    1,118,568   272.42   326.09
image_map           3    2,034    3,480      673,707   193.59   354.19
text_report         3    1,814    3,339      465,168   139.31   145.96

Average edit duration for the top ten resource types

The resource type that really stands out in this graph is text_patent, at 538 seconds per record. These items belong to the Texas Patents collection; they were loaded into the system with very minimal records, and we have been working to add new metadata to these resources. The nearly nine minutes per record seems to be very standard for the amount of work being done with these records.

The text_leg collection is one that I wanted to take another quick look at.

If we calculate the statistics for the users that edited records in this collection we get the following data.

username       min    max     count   sum        mean     stddev
bmonterroso      3    1,825     890     85,254    95.79   163.25
htarver          9       23       5         82    16.4      5.64
mjohnston        3    1,909   3,309    329,585    99.6     62.08
mphillips        5       33      30        485    16.17     7.68
rsittel          3    1,436     654     22,168    33.9     88.71
tharden          3    2,097   1,143    213,817   187.07   241.2
thuang           4    1,812   2,573    398,712   154.96   227.7

Again you really see it with the graph.

Average edit duration for users who edited records that were the text_leg resource type

In this graph you can see that a few users (htarver, mphillips, rsittel) brought down the average duration because they had very quick edits, while the rest of the editors averaged either right around 100 seconds per edit or around two minutes per edit.

I think there is more to do with these numbers. Calculating the average total duration for a given metadata record in the system, as edits are performed on it, is something of interest for a later post. So check back for the next post in this series.

As always feel free to contact me via Twitter if you have questions or comments.

Galen Charlton: Preserving the usefulness of the Hugo Awards as a selection tool for libraries

Tue, 2015-04-07 11:21

The Hugo Awards have been awarded by the World Science Fiction Convention for decades, and serve to recognize the works of authors, editors, directors – fans and professionals – in the genres of science fiction and fantasy. The Hugos are unique in being a fan-driven award that has as much process as – if not more than – juried awards.

That process has two main steps.  First, there’s a nomination period where members of Worldcon select works to appear on the final ballot. Second, members of the upcoming Worldcon vote on the final ballot and the awards are given out at the convention.

Typically, rather more folks vote on the final ballot than nominate – and that means that small, organized groups of people can unduly influence the nominations. However, there have been surprisingly few attempts to actually do that.

Until this year.

Many of the nominations this year match the slates of two groups, the “Sad Puppies” and the “Rabid Puppies.”  Not only that, some of the categories contain nothing but Puppy nominations.

The s.f. news site File 770 has a comprehensive collection of back-and-forth about the matter, but suffice it to say that the Puppy slates have a primarily political motivation – and one, in the interests of full disclosure, that I personally despise.

There are a lot of people saying smart things about the situation, so I’ll content myself with the following observation:

Slate nominations and voting destroy the utility of the Hugo Award lists for librarians who select science fiction and fantasy.

Why? Ideally, the Hugo process ascertains the preferences of thousands of Worldcon members to arrive at a general consensus of science fiction and fantasy that is both good and generally appealing.  As it happens, that’s a pretty useful starting point for librarians trying to round out collections or find new authors that their patrons might like – particularly for those librarians who are not themselves fans of the genre.

However, should slate voting become a successful tactic, the Hugo Awards are in danger of ending up simply reflecting which factions in fandom are best able to game the system.  The results of that… are unlikely to be all that useful for librarians.

Here’s my suggestion for librarians who are fans of science fiction and fantasy and who want to help preserve a collection development tool: get involved.  In particular:

  1. Join Worldcon. A $40 supporting membership suffices to get voting privileges.
  2. Vote on the Hugos this year. I won’t tell you who to vote for, but if you agree with me that slate nominations are a problem, consider voting accordingly.
  3. Next year, participate in the nomination process. Don’t participate in nomination slates; instead, nominate those works that you think are worthy of a Hugo – full stop.

HangingTogether: Working in Shared Files

Tue, 2015-04-07 10:00

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by John Riemer of UCLA. Working in shared files is a critical efficiency to free up time to address new metadata needs and roles. Metadata managers who need to allocate staff to cover more objects of interest to researchers in the information landscape and at the same time preserve metadata describing this material have every incentive to consider working collaboratively, in shared files.

Libraries have tended to treat WorldCat as a resource to be further edited locally. The 2009 report Study of the North American MARC Record Marketplace bemoaned the “widespread resistance to the idea of simply accepting the work of another library.”  We have been saddled with hundreds of copies of records across libraries and constrained to limit the amount of catalog maintenance done. When Kurt Groetsch described how Google was attempting to take advantage of library-created metadata during the 2010 ALA Midwinter meeting, he noted they “would like to find a way to get corrected records back into the library data ecosystem so that they don’t have to fix them again.” The linked data environment offers a new opportunity to create and maintain metadata only once, with all interested parties simply pointing to it.

The discussions revolved around these themes:

Sharing edited records: In general, staff focus on only editing records that affect access points. Most libraries accept vendor records or records for shelf-ready books without review. Vendor records may need to be modified for the data to be consistent and linked. Vendor records are of varying quality, some of which hinder access. It was suggested that libraries advocate for vendors contracting metadata creation with OCLC as part of their purchase negotiations. [Note: Focus group member Carlen Ruschoff of University of Maryland served on the cross-industry group that identified problems in data quality in the content supply chain and gave practical recommendations for improved usage, discovery and access of e-content. See the 2014 OCLC white paper, Success Strategies for Electronic Content Discovery and Access.]

Policies and practices have been put in place to stop staff from doing what they don’t have to do. “Reuse rather than modify.” But it can be difficult to stop some staff from investing in correcting minor differences between AACR2 and RDA that don’t matter, such as pagination. One approach is to assign those staff important tasks (create metadata for a new digital collection for example) so that they just don’t have time to take on these minor tasks as well. Not everyone can accept records “as is”, but with all the effort the community has invested in common cataloging standards and practices, if we all “do it right the first time” we should be able to accept others’ records without review or editing.

When edits are applied to local system records, or other databases such as national union catalogs, the updated records are not contributed to WorldCat. The University of Auckland uses four databases: the local database, the New Zealand National Union Catalogue, WorldCat and the Alma “community zone” available only to other Ex Libris catalogers. When Library of Congress records are corrected in WorldCat, the corrections are not reflected in the LC database. When OCLC loads LC’s updated records, any changes that had been made in the WorldCat records are wiped out. We need to get better at synchronizing data with WorldCat. Perhaps updated “statements” can be shared more widely in a linked data environment?

Sharing data in centralized and distributed models: Discussants were divided on whether a centralized file would be needed in a future linked data environment where WorldCat becomes a place people can simply point to. Developers say there is no need for a centralized file; data could be distributed with peer-to-peer sharing. Others feel that a centralized file provides provenance, and thus confidence and trustworthiness. How would you be able to gauge trustworthiness without that provenance pointing you to an authoritative source?

The OCLC Expert Community expanded the pool of labor able to make contributions to the WorldCat master records. This offers a new opportunity for focus group members who have been working primarily in their local systems. OCLC’s discontinuation of Institution Records is prompting some focus group members who have been using them to rethink their workflows, determine what data represents “common ground” and consider using WorldCat as the database of record. The OCLC WorldShare Metadata Collection Manager treats WorldCat records as a database of record and allows libraries to receive copies of changed records. It was noted that controlling WorldCat headings by linking to the authority file obviates the need for “authority laundering” by third-parties.

Importance of provenance: Certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources” (also known as “white lists”). Some select sources based on the type of material being cataloged; for example, Oxford, Yale and Harvard were mentioned as trusted sources for copy cataloging old books on mathematics. Another criterion is to choose the WorldCat record with the most holdings as source copy.

Sharing statements: Everyone welcomes the move to use identifiers instead of text strings. Identifiers could solve the problem of names appearing in documents harvested from the Web, electronic theses and dissertations, encoded archival aids, etc. not matching those used in catalog records and the authority file. Different statements might be correct in their own contexts; it would be up to the individual or library which one to use, based on what you want to present to your users. In a linked data world one can swap one set of identifiers with another set of identifiers if you want to make local changes. In the aggregate, there would be tolerance for “conflicting statements” which might represent different world views; at the local implementation level you may want to select the statements from your preferred sources. Librarians can share their expertise by establishing the relationships between and among statements from different sources.

Some consider creating identifiers for names as one of their highest priorities, spurred by the increased interest in Open Access. For researchers not represented in authority files, libraries have started considering implementing ORCIDs or ISNIs. [See the 2014 OCLC Research report, Registering Researchers in Authority Files.]

About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.


Open Library Data Additions: Amazon Crawl: part ga

Tue, 2015-04-07 01:44

Part ga of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

District Dispatch: Advocacy works: Broad number of legislators back library funding

Mon, 2015-04-06 20:33

photo by Dwayne Bent

Each year around this time, Appropriations Committees in both chambers of Congress begin their cycle of consideration and debate of what federal programs will be funded the following year. For both political and fiscal reasons, the process is marked by tremendous competition for a limited and often shrinking “pie” of Appropriations dollars.

In this environment, demonstrating early, strong and bipartisan support of federal library programs by as many Members of Congress as possible is vital to giving critical programs such as the Library Services and Technology Act (LSTA) and Innovative Approaches to Literacy (IAL) the best possible chance of being funded at the highest possible level in the coming year as part of the “Labor, Health and Human Services, Education, and Related Agencies” Appropriations bill.  That crucial Member support for LSTA and IAL is best shown by Members signing on to what are called “Dear Appropriator” letters drafted each year by congressional library champions in the U.S. House and U.S. Senate. These letters, sent to every member of the two Appropriations Committees, “make the case” for robust LSTA and IAL funding and put budget “hawks” (who often seek to eliminate domestic discretionary programs such as LSTA and IAL) on notice of the importance and broad support for these programs nationwide.

This year, Sens. Jack Reed (D-RI) and Susan Collins (R-ME) and Representative Raul Grijalva (D-AZ) spearheaded efforts to gather signatures on separate LSTA letters in each chamber of Congress. For the IAL letters, ALA particularly wishes to thank Sens. Jack Reed (D-RI), Roger Wicker (R-MS), Charles Grassley (R-IA), and Debbie Stabenow (D-MI) for leading efforts in the Senate, and Representatives Eddie Bernice Johnson (D-TX), Don Young (R-AK), and James McGovern (D-MA) for their leadership in the House.

In response to alerts by the American Library Association’s (ALA) Washington Office, more than 2,100 librarians across the country sent a total of nearly 6,300 emails to almost every Member of Congress (487 of 533) asking for their signatures on these crucial “Dear Appropriator” letters and the results in all cases topped last year’s figures. Ultimately, 32 Senators and 70 Members of the House supported LSTA, while 29 Senators and 128 Representatives backed IAL. View final versions of all four “Dear Appropriator” letters supporting LSTA and IAL in the Senate and House: Senate LSTA (pdf), Senate IAL (pdf), House LSTA (pdf), House IAL (pdf).

The current Appropriations process will be a long and, for LSTA and IAL, potentially very bumpy road.  However, thanks to our Congressional champions and librarians everywhere, we’ve made a great beginning.  Fasten your seat belts and stay tuned for word of what’s around the next bend.

Please thank your Representative and Senators if they signed any of the letters.

The post Advocacy works: Broad number of legislators back library funding appeared first on District Dispatch.

PeerLibrary: Free From Our Alphabetic Cage

Mon, 2015-04-06 20:02

What is a logo?


Is a logo a representation of an organization’s values, goals, strengths, heart, and solidarity on the cause of lubricating the annals of academic knowledge and the communication of it to the people of the third planet from the sun?



PeerLibrary used to be represented by the letter, “P,” but could that honestly describe an organization which seeks to change not merely academic literature’s presentation to the masses, but honestly the universe as a whole? In the mind’s eye of PeerLibrary (because we are busy ameliorating the wrongs of modern society), we are making EVERYTHING change.


PeerLibrary was once described by the now apparent shallowness that is the letter, “P.” Insanity. We tried to represent a fundamentally brave, bold, and brilliant burst out of the box that academia has been hoping for since the advent of the intellectual superfreeway that is the interwebz, and now we see that we must move on. Additionally, the word pronounced as, “Pee,” is simply not representative of an organization that seeks to not waste the full power of academic literature nor the electronic superverse. PeerLibrary is not some yellow-green, warm, odorous entity wishing to be routinely expelled from users, but instead something engaging and enthralling that will not let the user let it go. Users will not turn away and slam the door on PeerLibrary because PeerLibrary will never bother them anyway. PeerLibrary will let them in and let them see something that they want to know and will not wish to let go.


Why would participants in the social experiment that is PeerLibrary not wish to let it go? Simply put, they want to get the most out of academic literature. At the physical level, academic literature is just a list of words and figures that researchers combined to describe their research. In order for an individual to turn this into something useful for themselves, they would want to comprehend the background of the topic, the direction the researchers decided to take investigation and why, the setup and results of their experimentation, in addition to the author’s conclusion on the supposition investigated. Post-comprehension, the viewer may wish to replicate the experiment, or design their own experiment. In both of these phases of academic literature review, scholars may want to discuss their thoughts and interpretations of the material with others. This desire could stem from an enjoyment of the accompaniment of an arrangement of folks or merely from a perspective that deep understanding comes most effectively from a discussion rather than instruction.


This is the power of PeerLibrary: to take the traditional library ideology of transferring knowledge from source-to-person, and expanding it to source-to-people, which is now technologically empowered.


So, as you now see before you, our logo is thus a book, one page text, the other a web. Alas, representation.

District Dispatch: Silly rulemaking; unworkable solution for libraries

Mon, 2015-04-06 19:24

ATS Cine Projector Operators, Aldershot, Hampshire, England, UK, 1941

The U.S. Copyright Office posted reply comments for this year’s round of the triennial 1201 rulemaking. The Library Copyright Alliance (LCA), a coalition of U.S. library associations of which ALA is a member, filed initial comments (pdf) in February requesting an exemption to circumvent digital technology employed by rights holders when technological protection measures (TPMs) prevent users from exercising a lawful use, such as a fair use. LCA argued for an exemption so faculty and students at non-profit educational institutions can bypass technology (content scrambling system (CSS)) on DVDs in order to make a clip to show in the classroom or for close analysis and research. In this year’s request, LCA joined the American Association of University Professors (AAUP), the College Art Association (CAA), the International Communication Association (ICA) and others in requesting that, in addition to the renewal of the DVD exemption, the rule be expanded to all media formats, including Blu-Ray discs. In the reply comment phase, the rights holders make their case for why circumvention requests should not be allowed.

If lawyers are paid by the word, some are doing well financially (not that there is anything wrong with that). The lengthy comments, at least in the past, have always come from lawyers representing the content community. When I saw the 85-page comment (pdf) from Steve Metalitz representing the Joint Owners and Creators—aka the motion picture and recording industry companies—I thought, oh geez.

But it turned out that the comment section was the shortest ever submitted by Metalitz—only 12 pages! The rest of the submission was devoted to “exhibits” of articles and advertisements of various streaming and downloading services available in the marketplace like VUDU and Netflix. One exhibit provides instructions on how to embed a video in a PowerPoint presentation. How very helpful, but what does this have to do with the rulemaking?

The Joint Owners and Creators state that “the confidence afforded by the security of TPMs, and the flexibility in business models that such TPMs enable, are essential marketplace pillars which have led creators of motion pictures to expand their streaming and downloading options and to experiment with a broad range of business models to increase access to their works, such that some films can now be purchased and digitally downloaded before they are made available on physical discs.”

They go on to suggest that the Warner Brothers Archive, Disney Movies Now, UltraViolet digital storage locker services and the like are services that educators can use for film clips, making most circumvention unnecessary. Really? All of these services are available via license agreements that restrict access to “personal, non-commercial use.” If educators did use these services for non-profit educational, public performances, they would be in violation of the non-negotiated click-on contract. (You would think experienced intellectual property [sic] lawyers would know that and maybe read the terms of service, but hey, I am just a librarian.)

Marketplace solutions like non-negotiated contracts for Hollywood content are not solutions for libraries and non-profit educational institutions because they are written with only the individual consumer in mind. TPMs have not enabled business models that work for libraries and educators. Alas, we have no market pillar. Librarians and educators cannot do their jobs when license agreements have erased fair use and other copyright exceptions from existence.

In the 2005 triennial rulemaking, the content community argued that instead of circumventing technology on DVDs to extract clips, that users go into a darkened room with a video recorder and copy the clips they needed from the television screen as the DVD is played. They played a demonstration video at the public meeting. That suggestion still remains at the top of the list for craziest ideas proposed during a rulemaking. But proposing the use of services not even legally available to educators and librarians makes a close second.

The post Silly rulemaking; unworkable solution for libraries appeared first on District Dispatch.

HangingTogether: Champion Revealed! Real-ly!

Mon, 2015-04-06 19:19

OCLC Research Collective Collections Tournament

#oclctourney

The 2015 OCLC Research Collective Collections Tournament Champion is …


Our final round of competition tried to “keep it real” – realia, that is. Realia are “three-dimensional objects from real-life”, which can mean anything from valuable historical artifacts to … well, not-so-valuable yet interesting objects from all corners of everyday life: games, teaching/learning aids, models, musical instruments, memorabilia … and occasionally some just plain strange stuff! Check out this New York Times article for a sample of the fascinating and unexpected realia some libraries hold in their collections.

Our tournament Finals pitted Conference USA against Atlantic 10 to see who has the most realia in their collective collection.* In the end, it was no contest … Atlantic 10 won easily, with 1,578 distinct objects compared to 980 objects for Conference USA. Congratulations to Atlantic 10, your Collective Collections Tournament Champion!

So what kinds of realia do our Finals participants harbor in their respective collective collections? Our runner-up Conference USA offers a number of unusual items, such as a specimen of a stamp used under the Stamp Act of 1765 (Florida Atlantic University); a motorized solar system and planetarium model (University of Southern Mississippi); and a 1937 Luftwaffe-issue jam jar (University of North Texas).  Our Tournament Champion Atlantic 10 features such oddities as a set of giant inflatable nocturnal creatures (University of Rhode Island); a plaster cast of the head of political activist Mario Savio; and a bowl made from a vinyl record of Bob Dylan’s “Greatest Hits” album (both at La Salle University). Keep your eyes peeled next time you’re in the library; you never know what will be on the shelves!

Bracket competition participants: Nobody picked the winning conference!!! We will have a random drawing among all entrants to determine who wins the big prize! The winner will be announced on April 8. Stay tuned!

 

*Number of items cataloged as “realia” in each conference’s collective collection. Data is current as of January 2015.

More information:

Introducing the 2015 OCLC Research Collective Collections Tournament! Madness!

OCLC Research Collective Collections Tournament: Round of 32 Bracket Revealed!

Round of 32: Blow-outs, buzzer-beaters, and upsets!

Round of 16: The plot thickens … and so do the books

Round of 8: Peaches and Pumpkins

The Semi-Finals

About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.


Library of Congress: The Signal: Residents Chosen for NDSR 2015 in Washington, DC

Mon, 2015-04-06 17:46

We are pleased to announce that the Washington, DC National Digital Stewardship Residency class for 2015 has now been chosen! Five very accomplished people have been selected from a highly competitive field of candidates. The new residents will arrive in Washington, DC this June to begin the program. Updates on the program, including more information on the resident projects, will be published in The Signal during the coming months.

The new residents are listed in the Library of Congress press release below:

2015 Class of National Digital Stewardship Residents Selected

The Library of Congress, in conjunction with the Institute of Museum and Library Services, has named five members to the 2015 class of the National Digital Stewardship Residency program. The 12-month program begins in June 2015.

The NDSR program offers recent master’s degree graduates/doctoral candidates in specialized fields–library science, information science, museum studies, archival studies and related technology–the opportunity to gain valuable professional experience in digital preservation. Residents will start the program with an intensive digital stewardship workshop at the Library of Congress, followed by specialized project work at one of five host institutions in the Washington, D.C. area. The projects will allow them to acquire hands-on knowledge and skills regarding collection, selection, management, long-term preservation and accessibility of digital assets.

The residents listed were selected by a committee of experts from the Library of Congress, the Institute of Museum and Library Services and other organizations, including the host institutions:

  • John Caldwell of Lutherville, Maryland. Caldwell, who has studied at the University of Maryland, will be resident in the U.S. Senate Historical Office to study and assess current Senate workflows in appraisal, management, ingest, description and transfer of digital assets. He will benchmark current policies against best practices.
  • Valerie Collins of Eagle River, Alaska. Collins, who has studied at Dalhousie University, will be resident at the American Institute of Architects to co-lead testing and implementation of an institutional digital repository system to preserve born-digital records that represent AIA’s intellectual capital or that have permanent value for the history of the architectural profession.
  • Nicole Contaxis of Easton, Connecticut. Contaxis, who has studied at the University of California, Los Angeles, will be resident at the National Library of Medicine to create a pilot workflow for the curation, preservation and presentation of a software product developed by the National Library of Medicine that is deemed historically noteworthy due to its use by a community of users and/or its distinctive technical properties, and that is at risk of being lost due to obsolescence.
  • Jaime Mears of Deltaville, Virginia. Mears, who has studied at the University of Maryland, will be resident at the D.C. Public Library to create a sustainable, public-focused lab, tools, and instruction for building public knowledge and skills around the complex problems of personal digital recordkeeping.
  • Jessica Tieman of Lincoln, Illinois. Tieman, who has studied at the University of Illinois at Urbana-Champaign, will be resident in the Government Publishing Office to certify GPO’s Federal Digital System as a Trustworthy Digital Repository and to conduct an internal audit to help achieve the goal of certification.

For more information about the National Digital Stewardship Residency program, including information about how to be a host, partner or resident for next year’s class, visit www.loc.gov/ndsr/.


HangingTogether: The OCLC Evolving Scholarly Record Workshop, Chicago Edition

Mon, 2015-04-06 16:00

On March 23, 2015, we held the third in the Evolving Scholarly Record Workshop series  at Northwestern University. The workshops build on the framework in the OCLC Research report, The Evolving Scholarly Record.

Jim Michalko, Vice President OCLC Research Library Partnership, introduced the third of four workshops to address library roles and new communities of practice in the stewardship of the evolving scholarly record.

Cliff Lynch, Director of CNI, started out by talking about memory institutions as a system — more than individual collections – to capture both the scholarly record and the endlessly ramifying cultural record.  It’s impossible to capture them completely, but hopefully we are sampling the best.

It is our role to safeguard the evidentiary record upon which the scholarly record and future scholarship depend.  But the scholarly record is taking on new definitions. It includes the relationship between the data and the science acted upon it. Its contents are both refereed and un-refereed. It includes videos, blogs, websites, social media… And even the traditional should be made accessible in new ways. There is an information density problem and prioritization must be done.

We need to be careful when thinking about the scholarly record and look at new ways in which scholarly information flows.

There is a lot of stuff that doesn’t make it into IRs because all eyes are on capturing things that are already published somewhere. The eyes are on the wrong ball…

[presentations are available on the event page]

Brian Lavoie, Research Scientist in OCLC Research provided a framework for a common understanding and shared terminology for the day’s conversations.

He defined the scholarly record as being the portions of scholarly outputs that have been systematically gathered, organized, curated, identified and made persistently accessible.

OCLC developed the Evolving Scholarly Record Framework to help support discussions, to define key categories of materials and stakeholder roles, to be high-level so it can be cross disciplinary and practical, to serve as a common reference point across domains, and to support strategic planning. The major component is still outcomes, but in addition there are materials from the process (e.g., algorithms, data, preprints, blogs, grant reviews) and materials from the aftermath (e.g., blogs, reviews, commentaries, revision, corrections, repurposing for new audiences).

The stakeholder ecosystem combines old roles (fix, create, collect, and use) in new combinations and among a variety of organizations.  To succeed, selection of the scholarly record must be supported by a stable configuration of stakeholder roles.

We’ve been doing this, but passively and often at the end of a researcher’s career.  We need to do so much more, proactively and by getting involved early in the process.

Herbert Van de Sompel, Scientist at Los Alamos National Laboratory gave his Perspective on Archiving the Evolving Scholarly Record.  A scholarly communication system has to support the research process (which is more visible than ever before) and fulfill these functions:

  • Registration: allows claims of precedence for scholarly finding (e.g. Mss submission), which is now less discrete and more continuous
  • Certification: establishes the validity of the claim (e.g., peer review), which is becoming less formal
  • Awareness: allows actors to remain aware of new claims (alerts, stacks browsing, web discovery), which is trending toward instantaneous
  • Archiving: allows preservation of the record (by libraries and other stakeholders), which is evolving from medium- to content-driven.

Herbert characterized the future in the following ways:  The scholarly record is undergoing massive extension with objects that are heterogeneous, dynamic, compound, inter-related and distributed across the web – and often hosted on common web platforms that are not dedicated to scholarship.

Our goal is to achieve the ability to persistently, precisely, and seamlessly revisit the Scholarly Web of the Past and of the Now at some point in the Future.  We need to capture compound objects, with context, and in a state of flux at the request of the owner and at the time of relevance.

Herbert’s distinction between recording and archiving is critical. Recording platforms make no commitment to long-term access or preservation.  They may be a significant part of the scholarly process, but they are not a dependable part of the scholarly record.

We need to start creating workflows that support researcher-motivated movement of objects from private infrastructure to recording infrastructure and support curator-motivated movement of objects and context from recording infrastructure to archiving infrastructure.

Sarah Pritchard, Dean of Libraries, Northwestern University put things in the campus politics and technology context.

The evolving scholarly record requires that we work with a variety of stakeholders on campus:  faculty and students (as creators), academic departments (as managers of course content and grey literature), senior administrators (general counsel, CFO, HR), trustees (governance policy), office of research (as proxy for funder’s requirements), information technology units, and disciplinary communities.

There are many research information systems on campus, beyond the institutional repository: course management systems, faculty research networking systems, grant and sponsored research management systems, student and faculty personnel system, campus servers and intranets, and – because the campus boundaries are pervious — disciplinary repositories, cloud and social platforms.  And also office hard drives.

Policies and compliance issues go far beyond the content licensing libraries are familiar with:  copyright (at  the institutional and individual levels), privacy of records (student work, clinical data, business records), IT security controls and web content policies, state electronic records retention laws, open access (institutionally or funder mandated), and rights of external system owners (hosted content).

Sarah finished with some provocative thoughts:

  • The library sees itself as a “selector”, but many may see this as overstepping
  • The library looks out for the institution which can be at odds with the faculty sense of individual professional identity
  • There is a high cost to change the technical infrastructure and workflow mechanisms and to reshape governance and policy
  • There is a lack of a sense of urgency

She recommended that we start with low hanging fruit, engage centers of expertise, find pilot project opportunities, and accept that there won’t be a wholesale move into this environment.

Sarah Pritchard’s presentation really affected me: sort of a rallying cry to go out and make things happen!

The campus context provided a perfect launching point for the Breakout Discussions. From ten pages of notes, I’ve distilled the following action-oriented outcomes:

Within the library

  • If your library has receded from your university goals and strategies, move the library back into the prime business of your institution with a roster of candidate service offerings to re-position yourselves in the campus community.
  • Earn reputation through service provision and through access as opposed to reputation through ownership.
  • Selection
    • Ask yourself, what are we selecting? How do we define the object? What commitments will we make? And how does it fit into the broader system?
    • Consider some minimum requirements in terms of number of hits or other indications of interest for blogs/websites to be archived.  Those indexed by organizations like MLA or that are cited in scholarly articles seem worthy.
    • Declare collections of record so that others can depend on it, but beware of the commitment if you have to create new storage and access systems for a particular type of material.
    • Communicate when you have taken on a commitment to web archiving particular resources, possibly via the MARC preservation commitment field.
    • A lot of stuff doesn’t get archived because we focus on materials that are already well-tended elsewhere. Look for the at-risk materials.
    • Accept adequate content sampling.
  • Focus on training librarians.  Get them to use the dissertation as the first opportunity to establish a relationship, establish an ORCID, and mint a DOI.  Do some of these things that publishers do to provide a gateway to infrastructure that is not campus-centric but system-centric.
  • Decide where the library will focus; it can’t be expert in all things.  Assess where the vulnerabilities are and set priorities.
  • Provide a solution where none exists to capture the things that have fallen through the cracks.
  • Technical solutions
    • Linked data could be the glue for connecting IDs with institutions. Identifiers for individuals and for organizations, and possibly identifiers for departments, funding agencies, projects…
    • Follow a standard to create metadata to provide consistency in the way it’s formed, in the content, and in the identifiers being used.
  • Use technology that is ready now to
    • help with link rot (the URL is bad) and reference rot (the content has changed), so researchers can reference a resource as it was when they used the data or cited it.  Memento makes it easy to archive a web page at a point in time.
    • provide identifiers
      • ORCID and ISNI are ready for researcher identification.
      • DOIs, Perma.cc, and Memento are ready for use.
    • harvest web resources. Archive-It is ready for web harvesting and the Internet Archive’s Wayback Machine is ready for accessing archived web pages (see the lookup sketch after this list).
    • transport of big data packets. Globus is a solution for researchers and institutions
    • create open source repositories. Consider using DSpace, EPrints, Fedora or Drupal to make your own.
  • Explore ways in which people track conversation around the creation of an output, like the Future of the Book platform or Twitter conversations. Open Annotation is a solution that allows people to discuss where they prefer.
  • Before building a data repository, ask for whom are we doing this and why?  If no one is asking for it, turn your attention elsewhere.
  • Create a hub for scholars who don’t know what they need, where the main activity may be referring researchers to other services.
  • To get quick support, promote and provide assistance with the DMPTool, minting DOIs, and archiving that information.
  • Get your message into two simple sentences.
  • Evolve the model and the people to move from support to collaboration
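
As a small illustration of the point-in-time lookup mentioned in the “use technology that is ready now” item above, here is a minimal sketch against the Internet Archive’s public Wayback Machine availability endpoint. The target URL and timestamp are placeholders, and this shows only one possible way to resolve a reference to an archived copy.

    import requests

    # Ask the Wayback Machine for the archived snapshot closest to a given moment.
    resp = requests.get(
        "https://archive.org/wayback/available",
        params={"url": "http://example.org/project-site", "timestamp": "20140101"},
        timeout=10,
    )
    resp.raise_for_status()
    closest = resp.json().get("archived_snapshots", {}).get("closest")

    if closest and closest.get("available"):
        print("archived copy:", closest["url"], "captured at", closest["timestamp"])
    else:
        print("no archived snapshot found")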

With researchers

  • Do the work to understand researchers’ perspectives.  Meet them where they live.  A good way to engage researchers is to ask them what’s important in their field. Then ask who is looking after it. Include grad students and untenured and newly-tenured faculty as they may be most receptive.
  • Data services may vary dramatically among disciplines.  Social Sciences want help with SPSS and R.  Others want GIS.  For STEM and Humanities there are completely different needs.
  • Before supporting an open access journal, ask the relevant community: do you need a journal, who is the audience, and what is the best way to communicate with them?
  • Stop hindering researchers with roadblocks relating to using cameras or scanners, copying, or putting up web pages.
  • Help users make good choices in use of existing disciplinary data repositories and provide a local option for disciplines lacking good choices.
  • Help faculty avoid having to profile themselves in multiple venues. Offer bibliography and resume services and portability as they move from institution to institution.
  • Explain the benefits of deposit in the record to students and faculty in terms of their portfolio and resume, and for collaboration.
  • To educate reluctant researchers, use assistants in the workflow, i.e. grant management assistants or use graduate student ambassadors to discount rumors and half-truths.  Try quick lunch and learn workshops.  Market through established channels and access points.
  • Talk to researchers about the levels of granularity available to appropriately manage access to their content.
  • Coordinate with those writing proposals and make sure they know that if they expect library staff to do some of the work, the library needs to be involved in the discussion. Get involved early in the research proposal process. Stress that maintenance has to be built in.    When committing to archiving, include an MOU covering service levels and end-of-life.
  • A formalized request process may help with communication.

With other parts of your institution

  • Get at least one other partner on campus on board early — maybe an academic faculty member or department that is moving in the same direction you need to go (or administration, grants manager, IT people, educators, other librarians, funders).
  • Begin with a strategy and a call for partnership and implementation, then have conversations with faculty departments to get an environmental scan. Identify what is needed (e.g., GIS, text-mining, data analysis), and distill it into areas you can support internally or send along to campus partners.
  • Don’t duplicate services. Cede control to another area on the campus.  Communicate what is going on in different divisions and establish relationships. Provide guidance to get researchers to those places.
  • Work with associate deans and others at that level to find out about grant opportunities.
  • Develop partnerships with research centers and computing services, deciding where in the lifecycle things are to be archived and by whom.
  • Other parts of the university may decide to license data from vendors like Elsevier. The library has a relationship with that vendor; offer to do the negotiation.
  • Spin your message to a stakeholder’s context (e.g., archiving the scholarly record is a part of business continuity planning and risk management for the University’s CFO).
  • Coordinate with other campus pockets of activity involved in assigning DOIs, data management, and SEO activities for the non-traditional objects to optimize institutional outcomes. Integrating these objects into the infrastructure makes them able to circulate with the rest of the record.
  • Alliances on campus should be about integrating library services into the campus infrastructure. Unless you’ve done that on campus, you’re not doing your best to connect to the larger scholarly record.

With external entities

  • We should work with scholarly societies to learn about what we need to collect in a particular discipline (data sets, lab books, etc.) — and how to work with those researchers to get those things.
  • Identify the things that can be done elsewhere and those that need to be done locally. Storing e-science data sets may not be a local thing, whereas support for collaboration may be.
  • Make funder program officers aware of how libraries can help with grant proposals, so they can refer researchers’ questions back to the library.
  • Rely on external services like JSTOR, arXiv, SSRN, and ICPSR, which are dependable delivery and access systems with sustainable business models.
  • Use centers of excellence. Consider offering your expertise, for instance, with a video repository and rely on another institution for data deposit.
  • Work with publishers to provide the related metadata that might, for instance, be associated with a dataset uploaded to PLoSOne.
  • To help with the impact of researcher output, work with others, such as Symplectic, because they have the metadata we need.
  • To establish protocols for transferring between layers, make sure conversations include W3C and IETF.
  • Identify pockets of interoperability and find how to connect rather than waiting for interoperability to happen.

We are at the beginning of this; it will get better.

Thanks to all of our participants, but particularly to our hosts at Northwestern University, our speakers, and our note-takers. We’re looking forward to culminating the series at the workshop in San Francisco in June, where we’ll focus on how we can collaboratively move things forward to do our best to ensure stewardship of the scholarly record.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.


LITA: Teamwork and Jazz

Mon, 2015-04-06 14:01

“Jazz Players” by Pedro Ribeiro Simões / CC BY 2.0

Jazz is a pretty unique genre that demands a lot from musicians; a skilled jazz artist must not only be adept at their instrument, but must also be a highly skilled improviser and communicator. Where other styles of music may only require that a musician remember how to play a piece and run through it the same way every time, good jazz artists can play the same song in an infinite number of ways. Furthermore, they must also be able to collaborate with other jazz artists who can also play the same song an infinite number of ways. This makes jazz an inherently human art form because a listener never knows what to expect; when a jazz group performs, the outcome is the unpredictable result of each musician’s personal taste and style merging into a group effort.

In a lot of ways, team projects are kind of like a jazz performance: you have several people with different skill sets coming together to work toward a common goal, and the outcome is dependent on the people involved. While there are obvious limits to how far we can stretch this metaphor, I think we can learn a lot about being an effective team member from some of the traits all jazz greats have in common.

 

Trust your bandmates

Many hands make light work. Sometimes we may feel like we could get more done if we simply work alone, but this puts an artificial limit on how effective you can be. Learn to get over the impulse to do it all yourself and trust in your colleagues enough to delegate some of your work. Everyone has different strengths and weaknesses, and great teams know how to balance these differences. Even though Miles Davis was a great trumpeter, his greatest performances were always collaborations with other greats, or at least with a backing band. Great musicians inspire each other to do their best and try to remove all creative hindrances. This hyper-creative environment just isn’t possible to replicate in isolation.

When we got a new metadata librarian here at FSU, I had been making my own MODS records for a few months and was uncomfortable with giving up control over this aspect of my workflow. I’ve since learned that this is his specialty and not mine, and I trust in his expertise. As a result, our projects now have better metadata, I have more time to work on other things that I do have expertise in, and I have learned a lot more about metadata than I ever could have working alone.

 

Learn to play backup

Everyone wants to play the solo. It’s the fun part, and all the attention is on you. There’s nothing wrong with wanting to shine, but if everyone solos at the same time it defeats the purpose and devolves into noise. Good jazz musicians may be known for their solos, but the greats know how to play in a way that supports others when it’s their turn to solo, too. They are more concerned with the sound of the band as a whole instead of selfishly focusing on their own sound.

A big part of trusting your “bandmates” is staying out of their way when it’s their turn to “solo”. Can you imagine trying to play music on stage while someone who doesn’t even play your instrument yells instructions at you about how you should be playing? That would be pretty distracting, but the office equivalent happens all the time. Micromanaging teammates can kill project morale quickly without the micromanager even being aware of it. Sometimes projects have bottlenecks where no one can move forward until a specific thing gets done, and this is just a fact of life. If you are waiting for a team member to get something done so you can start on your part of the project, politely let them know that you are available if they need help or advice, and only provide help and advice if they ask. If they don’t need help, then politely stay out of their way.

 

Communication is key

Jazz musicians aren’t mind readers, but you might think they were after a great performance. It’s unbelievable how some bands can improvise in the midst of such complex patterns without getting lost. This is because improvisation requires a great deal of communication. Musicians communicate to each other using a variety of cues, either musical (one might drop in volume to signal the end of a solo), physical (one might step towards the center of the group to signal the start of a solo and then step away to signal the end), or visual (one might nod, wink or shift their foot as a signal to the rest of the group). These cue systems are all specific to the context of people performing on stage, but we can imagine a different set of cues for a team project that work just as well.

Like jazz performances, team projects can be incredibly complex, and a successful project requires all team members to be aware of their context. It is essential that everyone knows exactly where a project is on its timeline so that they can act accordingly, and this information can be expressed in a variety of ways. Email is a popular choice, as it leaves a written record of who said what that can be consulted later. Email is great at communicating small, specific bits of information, but it is always helpful to have a “30,000 foot view” of the project as well so the team can see the big picture. Fellow LITA blogger Leo Stezano wrote a post about different ways to keep track of a project’s high-level progress, covering the use of software, spreadsheets, and the classic “post-it notes on a whiteboard” approach. I prefer to use Trello since it combines the simplicity of post-it notes on a wall with the flexibility of software, but there are a lot of options. The best option is whatever works for your team.

Equally important to finding good ways to communicate and sticking with them is identifying harmful methods of communication and stopping them. Don’t send emails about a project to the rest of your team outside of working hours; it sends the wrong message about work-life balance. Try to eliminate unnecessary meetings and replace them with emails if you can. Emails are asynchronous and team members can respond when it is convenient for them, but meetings pollute our schedules and are productivity kryptonite. Finally, don’t drop into someone’s office unannounced (I do this all the time). Send an email or schedule a short meeting instead. Random office drop-ins derail the victim’s train of thought and send the signal that whatever they were working on isn’t as important as you are. Can you imagine Miles Davis tapping John Coltrane on the shoulder during a solo to ask what song they should play next? I didn’t think so. Being considerate with your communication is an underrated skill that may be the secret sauce that makes your project run more smoothly.

Brown University Library Digital Technologies Projects: Announcing a Researchers @ Brown data service

Mon, 2015-04-06 13:59

Campus developers might want to use data from Researchers@Brown (R@B) in other websites. The R@B team has developed a JSON web service that allows for this.  We think it will satisfy many uses on campus. Please give it a try and send feedback to researchers@brown.edu.

Main types/resources
  • faculty
  • organizational units (departments, centers, programs, institutes, etc)
  • research topics
Requesting data

To request data, begin with an identifier.  Let’s use Prof. Diane Lipscombe as an example:

/services/data/v1/faculty/dlipscom

Looking through the response you will notice affiliations and topics from Prof. Lipscombe’s profile.  You can make additional requests for information about those types by following the “more” link in the response.

/services/data/v1/ou/org-brown-univ-dept56/

Following the affiliations links from a faculty data profile will return information about the Department of Neuroscience, of which Prof. Lipscombe is a member.

/services/data/v1/topic/n49615/

Looking up this topic will return more information about the research topic “molecular biology”, including other faculty who have identified this as a research interest.
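
For anyone who wants to script against the service, the request-and-follow pattern described above might look like the following sketch (TypeScript, runnable with Node 18+ or in a browser). The host name, the exact JSON key names, and the shape of the "more" links are assumptions here, not documented facts, so inspect a live response before relying on them.

// Minimal sketch: fetch a faculty profile, then follow the "more" link
// on its first affiliation. BASE_URL is a placeholder, and the field
// names (affiliations, more, name) are guesses at the response shape.
const BASE_URL = "https://research.example.brown.edu";

async function getJson(path: string): Promise<any> {
  // "more" links might be absolute or relative; handle both.
  const url = path.startsWith("http") ? path : `${BASE_URL}${path}`;
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Request failed (${res.status}): ${url}`);
  }
  return res.json();
}

async function showFaculty(shortId: string): Promise<void> {
  const faculty = await getJson(`/services/data/v1/faculty/${shortId}`);
  console.log("Profile:", faculty.url);

  const firstAffiliation = faculty.affiliations?.[0];
  if (firstAffiliation?.more) {
    const ou = await getJson(firstAffiliation.more);
    console.log("Affiliation:", ou.name);
  }
}

showFaculty("dlipscom").catch(console.error);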

Responses
Faculty
  • first name
  • last name
  • middle
  • title
  • Brown email
  • url (R@B)
  • thumbnail
  • image – original image uploaded
  • affiliations – list with lookups
  • overview – this is HTML and may contain links or other formatting
  • topics – list with lookups
Organizations
  • name
  • image (if available)
  • url (to R@B)
  • affiliations – list with lookups
Topics
  • name
  • url (to R@B)
  • faculty – list with lookups
Technical Details
  • Requests are cached for 18 hours.
  • CORS support for embedding in other sites with JavaScript
  • JSONP for use in browsers that don’t support CORS.
Example implementation

As an example, we have prepared a demonstration of using the R@B data service with JavaScript and the React framework.
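
This is not the team’s actual demo, but for a sense of what such an integration could look like, here is a sketch of a small React component (TypeScript/TSX). As above, the response field names (first_name, last_name, url) are assumptions.

import React, { useEffect, useState } from "react";

// Hypothetical shape of a faculty response; check the real service.
type Faculty = {
  first_name?: string;
  last_name?: string;
  url?: string;
};

export function FacultyCard({ shortId }: { shortId: string }) {
  const [faculty, setFaculty] = useState<Faculty | null>(null);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    // Same-origin request, or cross-origin via the CORS support noted above.
    fetch(`/services/data/v1/faculty/${shortId}`)
      .then((res) => (res.ok ? res.json() : Promise.reject(res.status)))
      .then(setFaculty)
      .catch((e) => setError(String(e)));
  }, [shortId]);

  if (error) return <p>Could not load profile ({error}).</p>;
  if (!faculty) return <p>Loading…</p>;

  return (
    <p>
      <a href={faculty.url}>
        {faculty.first_name} {faculty.last_name}
      </a>
    </p>
  );
}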

David Rosenthal: The Mystery of the Missing Dataset

Sun, 2015-04-05 19:00
I was interviewed for an upcoming news article in Nature about the problem of link rot in scientific publications, based on the recent Klein et al paper in PLoS One. The paper is full of great statistical data but, as would be expected in a scientific paper, lacks the personal stories that would improve a news article.

I mentioned the interview over dinner with my step-daughter, who was featured in the very first post to this blog when she was a grad student. She immediately said that her current work is hamstrung by precisely the kind of link rot Klein et al investigated. She is frustrated because the dataset from a widely cited paper has vanished from the Web. Below the fold, a working post that I will update as the search for this dataset continues.


My step-daughter works on sustainability and life-cycle analysis. Here is her account of the background to her search:
The data was originally recommended to me by one of our scientific advisors at [a previous company] for use in the software we were developing and for our use in our consulting work. On their recommendation I googled "impact2002+" and found my way to the download page. I originally downloaded it in summer 2011.

It is a model for characterizing environmental flows into impacts. This is incredibly useful when looking at hundreds of pollutants and resource uses across a supply chain to understand how they roll-up into impacts to human health, ecosystem quality, and resources. For example it estimates the disability adjusted life years (impact to human life expectancy) associated with a release of various pollutants to air/land/soil. Another example is the estimate of the ecosystem quality loss (biodiversity loss) associated with various chemical emissions. Another example is the estimate of the future energy required to extract an incremental amount of additional minerals or energy resources (e.g. coal).

I looked for it again in summer 2014 when I noticed it was gone. I always assumed that by just searching "Impact2002+" I'd be able to find the data again - how wrong I was!

I reached out to the webmaster listed on the University of Michigan site and actually got a response, but after a couple of emails requesting the data with no luck I stopped pursuing that path. I ended up purchasing a dataset that has some of the Impact2002+ data embedded in it, but there are still some pieces of my analysis that are limited by not having the original dataset.

Here is where the search starts. In 2003, Olivier Jolliet et al. published IMPACT 2002+: A new life cycle impact assessment methodology:
The new IMPACT 2002+ life cycle impact assessment methodology proposes a feasible implementation of a combined midpoint/damage approach, linking all types of life cycle inventory results (elementary flows and other interventions) via 14 midpoint categories to four damage categories. ... The IMPACT 2002+ method presently provides characterization factors for almost 1500 different LCI-results, which can be downloaded at http://www.epfl.ch/impact

In its field, this is an extremely important paper. Google Scholar finds 810 citations to it. Unfortunately, this isn't a paper for which Springer provides article-level metrics. The International Journal of Life Cycle Assessment, in which the paper was published, is ranked 8th in the Sustainable Development field by Google's Scholar Metrics. Its h5-median index is 54, so a paper with 810 citations is vastly more cited than the papers it typically publishes.

The authors very creditably provided their data, the 1500 characterization factors, for download from the specified URL. That link, http://www.epfl.ch/impact, now redirects to http://www.riskscience.umich.edu/jolliet/downloads.htm, which returns a 404 Not Found error, so it has unambiguously rotted. The Wayback Machine does not have that page, although it has over 1000 URLs from http://www.riskscience.umich.edu/, nor does the Memento Time Travel service. So not merely has the link rotted, but there don't appear to be any archived versions of the data supporting the paper.
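
For readers who want to repeat this kind of check programmatically, the Internet Archive exposes a simple availability endpoint at https://archive.org/wayback/available. A minimal sketch follows (TypeScript, assuming Node 18+ or a browser for fetch, and assuming the endpoint's usual archived_snapshots/closest JSON shape).

// Query the Internet Archive's availability API for the closest capture
// of a URL. The response shape used here (archived_snapshots.closest)
// is the commonly documented one; verify against a live response.
async function checkWayback(url: string): Promise<void> {
  const res = await fetch(
    `https://archive.org/wayback/available?url=${encodeURIComponent(url)}`
  );
  const data = await res.json();
  const closest = data?.archived_snapshots?.closest;
  if (closest?.available) {
    console.log(`Closest capture: ${closest.url} (${closest.timestamp})`);
  } else {
    console.log(`No captures found for ${url}`);
  }
}

checkWayback("http://www.riskscience.umich.edu/jolliet/downloads.htm").catch(
  console.error
);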

The bookmark my step-daughter had for the dataset was http://www.earthshift.com/software/simapro/impact2002, which links to  http://www.epfl.ch/impact, which redirects to the broken http://www.riskscience.umich.edu/jolliet/downloads.htm.

The Wayback Machine has 11 captures of http://www.epfl.ch/impact between February 11, 2002 and July 7, 2014. The most recent is actually a capture of the page it redirected to at the University of Michigan's School of Public Health, which now returns 404. That page said:
In order to access the IMPACT 2002+ model we ask that you provide us with your name, affiliation and email address at the bottom of this page. You do not have to be affiliated with the Center for Risk Science and Commnication or the University of Michigan to access the IMPACT 2002 model. Your information will only be used to notify you of any updates concerning the model. Your data will be kept strictly confidential.

This is the explanation for the lack of any archived versions of the dataset. Web crawlers, such as the Internet Archive's Heritrix, are unable to fill out Web forms without site-specific knowledge, which in this case was obviously not available.

Similarly, in 2005 the Internet Archive captured pages from the EPFL site before the move to Michigan. They included this page describing the IMPACT2002+ method, which used a form to ask for:
your name, affiliation and your email-address, which will enable us to keep you informed about important updates from time to time. None of your data will be transmitted to anyone else. Then you can download the following files concerning the IMPACT 2002+ method ... Your data are not used to control or restrict the download, but will help us to keep you informed about updates concerning the IMPACT 2002+ methodology.

Again, archiving of the freely downloadable data was prevented.

One obvious lesson from this is that authors should be strongly discouraged from forcing researchers to supply information, such as names and e-mail addresses, before they can download data that has been made freely available, because the result is likely to be, as in this case, that with the ravages of time the data will become totally unavailable. It seems likely that this dataset became unavailable as a side-effect of the Risk Science Center migrating to its own website rather than being a part of the School of Public Health's website.

Another lesson is the completely inadequate state of Institutional Repositories. The University of Michigan's IR, Deep Blue, contains only 6 of the 76 "Selected Publications" from Olivier Jolliet's Michigan home page, but it has PDFs for their full text. Infoscience, the EPFL IR, lists 58 publications with Olivier Jolliet as an author, including the paper in question, but for that it says:
There is no available fulltext. Please contact the lab or the authors.

and:
The IMPACT 2002+ method presently provides characterization factors for almost 1500 different LCI-results, which can be downloaded at http://www.epfl.ch/impact

which is no longer the case. Note that ResearchGate claims to know about 177 publications from Olivier Jolliet.

Patrick Hochstenbach: Penguins Are Back

Sun, 2015-04-05 09:02
Filed under: Doodles Tagged: aprilfools, cartoon, comic, easter, Penguin

Galen Charlton: Three tales regarding a decrease in the number of catalogers

Sat, 2015-04-04 20:25

Discussions on Twitter today – see the timelines of @cm_harlow and @erinaleach for entry points – got me thinking.

In 1991, the Library of Congress had 745 staff in its Cataloging Directorate. By the end of FY 2004, the LC Bibliographic Access Divisions had between 506[1] and 561[2] staff.

What about now? As of 2014, the Acquisitions and Bibliographic Access unit has 238 staff[3].

While I’m sure one could quibble about the details (counting FTE vs. counting humans, accounting for the reorganizations, and so forth), the trend is clear: there has been a precipitous drop in the number of cataloging staff employed by the Library of Congress.

I’ll blithely ignore factors such as shifts in the political climate in the U.S. and how they affect civil service. Instead, I’ll focus on library technology, and spin three tales.

The tale of the library technologists

The decrease in the number of cataloging staff is one consequence of a triumph of library automation. The tools that we library technologists have written allow catalogers to work more efficiently. Sure, there are fewer of them, but that’s mostly been due to retirements. Not only that, the ones who are left are now free to work on more intellectually interesting tasks.

If we, the library technologists, can but slip the bonds of legacy cruft like the MARC record, we can make further gains in the expressiveness of our tools and the efficiencies they can achieve. We will be able to take advantage of metadata produced by other institutions and people for their own ends, enabling library metadata specialists to concern themselves with larger-scale issues.

Moreover, once our data is out there – who knows what others, including our patrons, can achieve with it?

This will of course be pretty disruptive, but as traditional library catalogers retire, we’ll reach buy-in. The library administrators have been pushing us to make more efficient systems, though we wish that they would invest more money in the systems departments.

We find that the catalogers are quite nice to work with one-on-one, but we don’t understand why they seem so attached to an ancient format that was only meant for record interchange.

The tale of the catalogers

The decrease in the number of cataloging staff reflects a success of library administration in their efforts to save money – but why is it always at our expense? We firmly believe that our work with the library catalog/metadata services counts as a public service, and we wish more of our public services colleagues knew how to use the catalog better.  We know for a fact that what doesn’t get catalogued may as well not exist in the library.

We also know that what gets catalogued badly or inconsistently can cause real problems for patrons trying to use the library’s collection.  We’ve seen what vendor cataloging can be like – and while sometimes it’s very good, often it’s terrible.

We are not just a cost center. We desperately want better tools, but we also don’t think that it’s possible to completely remove humans from the process of building and improving our metadata. 

We find that the library technologists are quite nice to work with one-on-one – but it is quite rare that we get to actually speak with a programmer.  We wish that the ILS vendors would listen to us more.

The tale of the library directors

The decrease in the number of cataloging staff at the Library of Congress is only partially relevant to the libraries we run, but hopefully somebody has figured out how to do cataloging more cheaply. We’re trying to make do with the money we’re allocated. Sometimes we’re fortunate enough to get a library funding initiative passed, but more often we’re trying to make do with less: sometimes to the point where flu season makes us super-nervous about our ability to keep all of the branches open.

We’re concerned not only with how much of our budgets are going into electronic resources, but with how nigh-impossible it is to predict increases in fees for ejournal subscriptions and ebook services.

We find that the catalogers and the library technologists are pleasant enough to talk to, but we’re not sure how well they see the big picture – and we dearly wish they could clearly articulate how yet another cataloging standard / yet another systems migration will make our budgets any more manageable.

Each of these tales is true. Each of these tales is a lie. Many other tales could be told. Fuzziness abounds.

However, there is one thing that seems clear: conversations about the future of library data and library systems involve people with radically different points of view. These differences do not mean that any of the people engaged in the conversations are villains, or do not care about library users, or are unwilling to learn new things.

The differences do mean that it can be all too easy for conversations to fall apart or get derailed.

We need to practice listening.

1. From testimony by the president of the Library of Congress Professional Guild to Congress on 6 March 2015.
2. From the BA FY 2004 report. This included 32 staff from the Cataloging Distribution Service, which had been merged into BA and had not been part of the Cataloging Directorate.
3. From testimony by the president of the Library of Congress Professional Guild to Congress on 6 March 2015.

Cynthia Ng: Musing: Playing Around with the NNELS Logo

Sat, 2015-04-04 02:51
It’s come up recently that we might consider revising our logo. I saw a coworker playing around with it and thought I’d give it a try. The thinking behind it is simple. Transpose the letters into Braille, and then try to match the Braille version to a hexagonal grid. Turns out the hardest is the … Continue reading Musing: Playing Around with the NNELS Logo

HangingTogether: The Semi-Finals

Fri, 2015-04-03 18:19

OCLC Research Collective Collections Tournament

#oclctourney

Thirty-two conferences started this journey, and now only two remain. The OCLC Research Collective Collection tournament is just one step away from crowning a Champion. Throw your brackets away and buckle your seat belts, because the tournament semi-finals are over and the finals are next!


How many languages does your conference collective collection speak? Competition in the semi-finals centered around the number of languages represented in each conference’s collective collection.* In the first semi-finals match-up, Conference USA cruised to an easy victory over Summit League, 366 languages to 265 languages. In the second match-up, Atlantic 10 also had little trouble with its opponent, moving past Missouri Valley 374 languages to 289 languages. So Conference USA and Atlantic 10 will square off in the tournament finals, with the honor and glory of the title “2015 Collective Collections Tournament Champion” at stake!

As the results of the semi-finals competition show, conference collective collections are very multilingual. Atlantic 10 had the most languages of any competitor in this round, with more than 370. But even the conference with the fewest languages – Summit League – had 265 languages in its collective collection! Suppose that an average book is 1.25 inches thick. If Summit League stacked up one book for every language represented in its collection, the resulting pile would be almost 28 feet tall! If Atlantic 10 did it, the stack would be nearly 40 feet tall!
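
If you want to check those stack heights yourself, here is a quick back-of-the-envelope sketch (TypeScript, using the 1.25-inch-per-book assumption above):

// One book per language, 1.25 inches per book, 12 inches per foot.
const stackFeet = (languages: number) => (languages * 1.25) / 12;

console.log(stackFeet(265).toFixed(1)); // Summit League: ~27.6 feet
console.log(stackFeet(374).toFixed(1)); // Atlantic 10: ~39.0 feet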

The mega-collective-collection of all libraries – as represented in the WorldCat bibliographic database – contains publications in 481 different languages. English is the most common language in WorldCat; here’s a look at the top 50 most frequently-found languages other than English:

[Word cloud created with worditout.com]

After English, the most common languages in WorldCat are German, French, Spanish, and Chinese. Despite the high number of English-language materials, more than half of the materials in WorldCat are non-English! And as we’ve seen, many of these non-English-language publications have found their way into the collective collections of our tournament semi-finalists! So are you interested in reading something in Urdu? Atlantic 10 has nearly 2,300 Urdu-language publications to choose from. How about Welsh? Conference USA can furnish you with nearly 1,400 publications in Welsh. No matter what language you’re interested in, these collective collections likely have something for you – they speak a lot of languages!

Bracket competition participants: Remember, even if the conference you chose is not in the Finals, hope still flickers! If no one picked the tournament Champion, all entrants will be part of a random drawing for the big prize!

Get set for the Tournament Finals! Results will be posted April 6.

 

*Number of languages represented in language-based (text or spoken) publications comprising each conference collective collection. Data is current as of January 2015.

More information:

Introducing the 2015 OCLC Research Collective Collections Tournament! Madness!

OCLC Research Collective Collections Tournament: Round of 32 Bracket Revealed!

Round of 32: Blow-outs, buzzer-beaters, and upsets!

Round of 16: The plot thickens … and so do the books

Round of 8: Peaches and Pumpkins

About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.


FOSS4Lib Updated Packages: The Great Reading Adventure

Fri, 2015-04-03 16:59

Last updated April 3, 2015. Created by Jim Craner on April 3, 2015.
Log in to edit this page.

From The Great Reading Adventure website:

"The Great Reading Adventure is a robust, open source software designed to manage library reading programs. It is currently in its second version... The Great Reading Adventure was developed by the Maricopa County Library District with support by the Arizona State Library, Archives and Public Records, a division of the Secretary of State, with federal funds from the Institute of Museum and Library Services."

The Great Reading Adventure lets libraries and library consortia set up a full online summer reading program for patrons. Features include reporting, customization per library, digital badges, avatars, reading lists, and much more.

The software runs on a Windows IIS/MSSQL server.

License: MIT License
Package Links
Development Status: Production/Stable
Operating System: Windows
Database: MSSQL
