Planet Code4Lib - http://planet.code4lib.org

HangingTogether: Working in Shared Files

Tue, 2015-04-07 10:00

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by John Riemer of UCLA. Working in shared files is a critical efficiency: it frees up time to address new metadata needs and roles. Metadata managers who need to allocate staff to cover more of the objects of interest to researchers in the information landscape, while at the same time preserving the metadata describing this material, have every incentive to consider working collaboratively in shared files.

Libraries have tended to treat WorldCat as a resource to be further edited locally. The 2009 report Study of the North American MARC Record Marketplace bemoaned the “widespread resistance to the idea of simply accepting the work of another library.” We have been saddled with hundreds of copies of records across libraries and constrained to limit the amount of catalog maintenance done. When Kurt Groetsch described how Google was attempting to take advantage of library-created metadata during the 2010 ALA Midwinter meeting, he noted they “would like to find a way to get corrected records back into the library data ecosystem so that they don’t have to fix them again.” The linked data environment offers a new opportunity to create and maintain metadata only once, with all interested parties simply pointing to it.

The discussions revolved around these themes:

Sharing edited records: In general, staff focus only on edits that affect access points. Most libraries accept vendor records or records for shelf-ready books without review. Vendor records may need to be modified for the data to be consistent and linked. Vendor records are of varying quality, and some hinder access. It was suggested that libraries advocate, as part of their purchase negotiations, that vendors contract metadata creation with OCLC. [Note: Focus group member Carlen Ruschoff of University of Maryland served on the cross-industry group that identified problems in the data quality in the content supply chain and gave practical recommendations for improved usage, discovery and access of e-content. See the 2014 OCLC white paper, Success Strategies for Electronic Content Discovery and Access.]

Policies and practices have been put in place to stop staff from doing what they don’t have to do. “Reuse rather than modify.” But it can be difficult to stop some staff from investing in correcting minor differences between AACR2 and RDA that don’t matter, such as pagination. One approach is to assign those staff important tasks (create metadata for a new digital collection for example) so that they just don’t have time to take on these minor tasks as well. Not everyone can accept records “as is”, but with all the effort the community has invested in common cataloging standards and practices, if we all “do it right the first time” we should be able to accept others’ records without review or editing.

When edits are applied to local system records, or other databases such as national union catalogs, the updated records are not contributed to WorldCat. The University of Auckland uses four databases: the local database, the New Zealand National Union Catalogue, WorldCat and the Alma “community zone” available only to other Ex Libris catalogers. When Library of Congress records are corrected in WorldCat, the corrections are not reflected in the LC database. When OCLC loads LC’s updated records, any changes that had been made in the WorldCat records are wiped out. We need to get better at synchronizing data with WorldCat. Perhaps updated “statements” can be shared more widely in a linked data environment?

Sharing data in centralized and distributed models: Discussants were divided on whether a centralized file would be needed in a future linked data environment in which WorldCat became a place people could simply point to. Developers say there is no need for a centralized file; data could be distributed with peer-to-peer sharing. Others feel that a centralized file provides provenance, and thus confidence and trustworthiness. How would you be able to gauge trustworthiness without that provenance pointing you to an authoritative source?

The OCLC Expert Community expanded the pool of labor able to make contributions to the WorldCat master records. This offers a new opportunity for focus group members who have been working primarily in their local systems. OCLC’s discontinuation of Institution Records is prompting some focus group members who have been using them to rethink their workflows, determine what data represents “common ground” and consider using WorldCat as the database of record. The OCLC WorldShare Metadata Collection Manager treats WorldCat records as a database of record and allows libraries to receive copies of changed records. It was noted that controlling WorldCat headings by linking to the authority file obviates the need for “authority laundering” by third parties.

Importance of provenance: Certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources” (also known as “white lists”). Some select sources based on the type of material being cataloged; for example, Oxford, Yale and Harvard were mentioned as trusted sources for copy cataloging old books on mathematics. Another criterion is to choose the WorldCat record with the most holdings as source copy.

Sharing statements: Everyone welcomes the move to use identifiers instead of text strings. Identifiers could solve the problem of names appearing in documents harvested from the Web, electronic theses and dissertations, encoded archival finding aids, etc. not matching those used in catalog records and the authority file. Different statements might be correct in their own contexts; it would be up to the individual or library which one to use, based on what you want to present to your users. In a linked data world, you can swap one set of identifiers for another if you want to make local changes. In the aggregate, there would be tolerance for “conflicting statements” which might represent different world views; at the local implementation level you may want to select the statements from your preferred sources. Librarians can share their expertise by establishing the relationships between and among statements from different sources.

Some consider creating identifiers for names as one of their highest priorities, spurred by the increased interest in Open Access. For researchers not represented in authority files, libraries have started considering implementing ORCIDs or ISNIs. [See the 2014 OCLC Research report, Registering Researchers in Authority Files.]

About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.


Open Library Data Additions: Amazon Crawl: part ga

Tue, 2015-04-07 01:44

Part ga of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

District Dispatch: Advocacy works: Broad number of legislators back library funding

Mon, 2015-04-06 20:33

photo by Dwayne Bent

Each year around this time, Appropriations Committees in both chambers of Congress begin their cycle of consideration and debate of what federal programs will be funded the following year. For both political and fiscal reasons, the process is marked by tremendous competition for a limited and often shrinking “pie” of Appropriations dollars.

In this environment, demonstrating early, strong and bipartisan support of federal library programs by as many Members of Congress as possible is vital to giving critical programs such as the Library Services and Technology Act (LSTA) and Innovative Approaches to Literacy (IAL) the best possible chance of being funded at the highest possible level in the coming year as part of the “Labor, Health and Human Services, Education, and Related Agencies” Appropriations bill. That crucial Member support for LSTA and IAL is best shown by Members signing on to what are called “Dear Appropriator” letters drafted each year by congressional library champions in the U.S. House and U.S. Senate. These letters, sent to every member of the two Appropriations Committees, “make the case” for robust LSTA and IAL funding and put budget “hawks” (who often seek to eliminate domestic discretionary programs such as LSTA and IAL) on notice of the importance and broad support for these programs nationwide.

This year, Sens. Jack Reed (D-RI) and Susan Collins (R-ME) and Representative Raul Grijalva (D-AZ) spearheaded efforts to gather signatures on the two LSTA letters, one for each chamber of Congress. For the IAL letters, ALA also wishes to particularly thank Sens. Jack Reed (D-RI), Roger Wicker (R-MS), Charles Grassley (R-IA), and Debbie Stabenow (D-MI) for leading efforts in the Senate, and Representatives Eddie Bernice Johnson (D-TX), Don Young (R-AK), and James McGovern (D-MA) for their leadership in the House.

In response to alerts by the American Library Association’s (ALA) Washington Office, more than 2,100 librarians across the country sent a total of nearly 6,300 emails to almost every Member of Congress (487 of 533) asking for their signatures on these crucial “Dear Appropriator” letters and the results in all cases topped last year’s figures. Ultimately, 32 Senators and 70 Members of the House supported LSTA, while 29 Senators and 128 Representatives backed IAL. View final versions of all four “Dear Appropriator” letters supporting LSTA and IAL in the Senate and House: Senate LSTA (pdf), Senate IAL (pdf), House LSTA (pdf), House IAL (pdf).

The current Appropriations process will be a long and, for LSTA and IAL, potentially very bumpy road.  However, thanks to our Congressional champions and librarians everywhere, we’ve made a great beginning.  Fasten your seat belts and stay tuned for word of what’s around the next bend.

Please thank your Representative and Senators if they signed any of the letters.

The post Advocacy works: Broad number of legislators back library funding appeared first on District Dispatch.

PeerLibrary: Free From Our Alphabetic Cage

Mon, 2015-04-06 20:02

What is a logo?


Is a logo a representation of an organization’s values, goals, strengths, heart, and solidarity on the cause of lubricating the annals of academic knowledge and the communication of it to the people of the third planet from the sun?



PeerLibrary used to be represented by the letter, “P,” but could that honestly describe an organization which seeks to change not merely academic literature’s presentation to the masses, but honestly the universe as a whole? In the mind’s eye of PeerLibrary (because we are busy ameliorating the wrongs of modern society), we are making EVERYTHING change.


PeerLibrary was once described by the now apparent shallowness that is the letter, “P.” Insanity. We tried to represent a fundamentally brave, bold, and brilliant burst out of the box that academia has been hoping for since the advent of the intellectual superfreeway that is the interwebz, and now we see that we must move on. Additionally, the word pronounced as, “Pee,” is simply not representative of an organization that seeks to not waste the full power of academic literature nor the electronic superverse. PeerLibrary is not some yellow-green, warm, odorous entity wishing to be routinely expelled from users, but instead something engaging and enthralling that will not let the user let it go. Users will not turn away and slam the door on PeerLibrary because PeerLibrary will never bother them anyway. PeerLibrary will let them in and let them see something that they want to know and will not wish to let go.


Why would participants in the social experiment that is PeerLibrary not wish to let it go? Simply put, they want to get the most out of academic literature. At the physical level, academic literature is just a list of words and figures that researchers combined to describe their research. In order for an individual to turn this into something useful for themselves, they would want to comprehend the background of the topic, the direction the researchers decided to take investigation and why, the setup and results of their experimentation, in addition to the author’s conclusion on the supposition investigated. Post-comprehension, the viewer may wish to replicate the experiment, or design their own experiment. In both of these phases of academic literature review, scholars may want to discuss their thoughts and interpretations of the material with others. This desire could stem from an enjoyment of the accompaniment of an arrangement of folks or merely from a perspective that deep understanding comes most effectively from a discussion rather than instruction.


This is the power of PeerLibrary: to take the traditional library ideology of transferring knowledge from source-to-person, and expanding it to source-to-people, which is now technologically empowered.


So, as you now see before you, our logo is thus a book, one page text, the other a web. Alas, representation.

District Dispatch: Silly rulemaking; unworkable solution for libraries

Mon, 2015-04-06 19:24

ATS Cine Projector Operators, Aldershot, Hampshire, England, UK, 1941

The U.S. Copyright Office posted reply comments for this year’s round of the triennial 1201 rulemaking. The Library Copyright Alliance (LCA), a coalition of U.S. library associations of which ALA is a member, filed initial comments (pdf) in February requesting an exemption to circumvent digital technology employed by rights holders when technological protection measures (TPMs) prevent users from exercising a lawful use, such as a fair use. LCA argued for an exemption so faculty and students at non-profit educational institutions can bypass technology (the content scrambling system (CSS)) on DVDs in order to make a clip to show in the classroom or for close analysis and research. In this year’s request, LCA joined the American Association of University Professors (AAUP), the College Art Association (CAA), the International Communication Association (ICA) and others in requesting that, in addition to the renewal of the DVD exemption, the rule be expanded to all media formats, including Blu-ray discs. In the reply comment phase, the rights holders make their case for why circumvention requests should not be allowed.

If lawyers are paid by the word, some are doing well financially (not that there is anything wrong with that). The lengthy comments, at least in the past, have always come from lawyers representing the content community. When I saw the 85-page comment (pdf) from Steve Metalitz representing the Joint Owners and Creators—aka the motion picture and recording industry companies—I thought, oh geez.

But it turned out that the comment section was the shortest ever submitted by Metalitz—only 12 pages! The rest of the submission was devoted to “exhibits” of articles and advertisements of various streaming and downloading services available in the marketplace like VUDU and Netflix. One exhibit provides instructions on how to embed a video in a PowerPoint presentation. How very helpful, but what does this have to do with the rulemaking?

The Joint Owners and Creators state that “the confidence afforded by the security of TPMs, and the flexibility in business models that such TPMs enable, are essential marketplace pillars which have led creators of motion pictures to expand their streaming and downloading options and to experiment with a broad range of business models to increase access to their works, such that some films can now be purchased and digitally downloaded before they are made available on physical discs.”

They go on to suggest that the Warner Brothers Archive, Disney Movies Now, UltraViolet digital storage locker services and the like are services that educators can use for film clips, making most circumvention unnecessary. Really? All of these services are available via license agreements that restrict access to “personal, non-commercial use.” If educators did use these services for non-profit educational, public performances, they would be in violation of the non-negotiated, click on contract. (You would think experienced intellectual property [sic] lawyers would know that and maybe read the terms of service, but hey, I am just a librarian).

Marketplace solutions like non-negotiated contracts for Hollywood content are not solutions for libraries and non-profit educational institutions because they are written with only the individual consumer in mind. TPMs have not enabled business models that work for libraries and educators. Alas, we have no market pillar. Librarians and educators cannot do their jobs when license agreements have erased fair use and other copyright exceptions from existence.

In the 2005 triennial rulemaking, the content community argued that, instead of circumventing technology on DVDs to extract clips, users should go into a darkened room with a video recorder and copy the clips they need from the television screen as the DVD plays. They played a demonstration video at the public meeting. That suggestion still remains at the top of the list of craziest ideas proposed during a rulemaking. But proposing the use of services not even legally available to educators and librarians makes a close second.

The post Silly rulemaking; unworkable solution for libraries appeared first on District Dispatch.

HangingTogether: Champion Revealed! Real-ly!

Mon, 2015-04-06 19:19

OCLC Research Collective Collections Tournament

#oclctourney

The 2015 OCLC Research Collective Collections Tournament Champion is …

[Image: championship bracket reveal]

Our final round of competition tried to “keep it real” – realia, that is. Realia are “three-dimensional objects from real-life”, which can mean anything from valuable historical artifacts to … well, not-so-valuable yet interesting objects from all corners of everyday life: games, teaching/learning aids, models, musical instruments, memorabilia … and occasionally some just plain strange stuff! Check out this New York Times article for a sample of the fascinating and unexpected realia some libraries hold in their collections.

Our tournament Finals pitted Conference USA against Atlantic 10 to see who has the most realia in their collective collection.* In the end, it was no contest … Atlantic 10 won easily, with 1,578 distinct objects compared to 980 objects for Conference USA. Congratulations to Atlantic 10, your Collective Collections Tournament Champion!

So what kinds of realia do our Finals participants harbor in their respective collective collections? Our runner-up Conference USA offers a number of unusual items, such as a specimen of a stamp used under the Stamp Act of 1765 (Florida Atlantic University); a motorized solar system and planetarium model (University of Southern Mississippi); and a 1937 Luftwaffe-issue jam jar (University of North Texas).  Our Tournament Champion Atlantic 10 features such oddities as a set of giant inflatable nocturnal creatures (University of Rhode Island); a plaster cast of the head of political activist Mario Savio; and a bowl made from a vinyl record of Bob Dylan’s “Greatest Hits” album (both at La Salle University). Keep your eyes peeled next time you’re in the library; you never know what will be on the shelves!

Bracket competition participants: Nobody picked the winning conference!!! We will have a random drawing among all entrants to determine who wins the big prize! The winner will be announced on April 8. Stay tuned!

 

*Number of items cataloged as “realia” in each conference’s collective collection. Data is current as of January 2015.

More information:

Introducing the 2015 OCLC Research Collective Collections Tournament! Madness!

OCLC Research Collective Collections Tournament: Round of 32 Bracket Revealed!

Round of 32: Blow-outs, buzzer-beaters, and upsets!

Round of 16: The plot thickens … and so do the books

Round of 8: Peaches and Pumpkins

The Semi-Finals

About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.


Library of Congress: The Signal: Residents Chosen for NDSR 2015 in Washington, DC

Mon, 2015-04-06 17:46

We are pleased to announce that the Washington, DC National Digital Stewardship Residency class for 2015 has now been chosen! Five very accomplished people have been selected from a highly competitive field of candidates. The new residents will arrive in Washington, DC this June to begin the program. Updates on the program, including more information on the resident projects, will be published in The Signal during the coming months.

The new residents are listed in the Library of Congress press release below:

2015 Class of National Digital Stewardship Residents Selected

The Library of Congress, in conjunction with the Institute of Museum and Library Services, has named five members to the 2015 class of the National Digital Stewardship Residency program. The 12-month program begins in June 2015.

The NDSR program offers recent master’s degree graduates/doctoral candidates in specialized fields–library science, information science, museum studies, archival studies and related technology–the opportunity to gain valuable professional experience in digital preservation. Residents will start the program with an intensive digital stewardship workshop at the Library of Congress, followed by specialized project work at one of five host institutions in the Washington, D.C. area. The projects will allow them to acquire hands-on knowledge and skills regarding collection, selection, management, long-term preservation and accessibility of digital assets.

The residents listed were selected by a committee of experts from the Library of Congress, the Institute of Museum and Library Services and other organizations, including the host institutions:

  • John Caldwell of Lutherville, Maryland. Caldwell, who has studied at the University of Maryland, will be resident in the U.S. Senate Historical Office to study and assess current Senate workflows in appraisal, management, ingest, description and transfer of digital assets. He will benchmark current policies against best practices.
  • Valerie Collins of Eagle River, Alaska. Collins, who has studied at Dalhousie University, will be resident at the American Institute of Architects to co-lead testing and implementation of an institutional digital repository system to preserve born-digital records that represent AIA’s intellectual capital or that have permanent value for the history of the architectural profession.
  • Nicole Contaxis of Easton, Connecticut. Contaxis, who has studied at the University of California, Los Angeles, will be resident at the National Library of Medicine to create a pilot workflow for the curation, preservation and presentation of a National Library of Medicine software product deemed historically noteworthy due to its usage by a user community and/or its distinctive technical properties, and now at risk of being lost due to obsolescence.
  • Jaime Mears of Deltaville, Virginia. Mears, who has studied at the University of Maryland, will be resident at the D.C. Public Library to create a sustainable, public-focused lab, tools, and instruction for building public knowledge and skills around the complex problems of personal digital recordkeeping.
  • Jessica Tieman of Lincoln, Illinois. Tieman, who has studied at the University of Illinois at Urbana-Champaign, will be resident in the Government Publishing Office to certify GPO’s Federal Digital System as a Trustworthy Digital Repository and to conduct an internal audit to help achieve the goal of certification.

For more information about the National Digital Stewardship Residency program, including information about how to be a host, partner or resident for next year’s class, visit www.loc.gov/ndsr/.

HangingTogether: The OCLC Evolving Scholarly Record Workshop, Chicago Edition

Mon, 2015-04-06 16:00

On March 23, 2015, we held the third in the Evolving Scholarly Record Workshop series  at Northwestern University. The workshops build on the framework in the OCLC Research report, The Evolving Scholarly Record.

Jim Michalko, Vice President OCLC Research Library Partnership, introduced the third of four workshops to address library roles and new communities of practice in the stewardship of the evolving scholarly record.

Cliff Lynch, Director of CNI, started out by talking about memory institutions as a system — more than individual collections – to capture both the scholarly record and the endlessly ramifying cultural record.  It’s impossible to capture them completely, but hopefully we are sampling the best.

It is our role to safeguard the evidentiary record upon which the scholarly record and future scholarship depend.  But the scholarly record is taking on new definitions. It includes the relationship between the data and the science acted upon it. Its contents are both refereed and un-refereed. It includes videos, blogs, websites, social media… And even the traditional should be made accessible in new ways. There is an information density problem and prioritization must be done.

We need to be careful when thinking about the scholarly record and look at new ways in which scholarly information flows.

There is a lot of stuff that doesn’t make it into IRs because all eyes are on capturing things that are already published somewhere. The eyes are on the wrong ball…

[presentations are available on the event page]

Brian Lavoie, Research Scientist in OCLC Research provided a framework for a common understanding and shared terminology for the day’s conversations.

He defined the scholarly record as being the portions of scholarly outputs that have been systematically gathered, organized, curated, identified and made persistently accessible.

OCLC developed the Evolving Scholarly Record Framework to help support discussions, to define key categories of materials and stakeholder roles, to be high-level so it can be cross-disciplinary and practical, to serve as a common reference point across domains, and to support strategic planning. The major component is still outcomes, but in addition there are materials from the process (e.g., algorithms, data, preprints, blogs, grant reviews) and materials from the aftermath (e.g., blogs, reviews, commentaries, revision, corrections, repurposing for new audiences).

The stakeholder ecosystem combines old roles (fix, create, collect, and use) in new combinations and among a variety of organizations.  To succeed, selection of the scholarly record must be supported by a stable configuration of stakeholder roles.

We’ve been doing this, but passively and often at the end of a researcher’s career.  We need to do so much more, proactively and by getting involved early in the process.

Herbert Van de Sompel, Scientist at Los Alamos National Laboratory, gave his Perspective on Archiving the Evolving Scholarly Record.  A scholarly communication system has to support the research process (which is more visible than ever before) and fulfill these functions:

  • Registration: allows claims of precedence for a scholarly finding (e.g., manuscript submission), which is now less discrete and more continuous
  • Certification: establishes the validity of the claim (e.g., peer review), which is becoming less formal
  • Awareness: allows actors to remain aware of new claims (alerts, stacks browsing, web discovery), which is trending toward instantaneous
  • Archiving: allows preservation of the record (by libraries and other stakeholders), which is evolving from medium- to content-driven.

Herbert characterized the future in the following ways:  The scholarly record is undergoing massive extension with objects that are heterogeneous, dynamic, compound, inter-related and distributed across the web – and often hosted on common web platforms that are not dedicated to scholarship.

Our goal is to achieve the ability to persistently, precisely, and seamlessly revisit the Scholarly Web of the Past and of the Now at some point in the Future.  We need to capture compound objects, with context, and in a state of flux at the request of the owner and at the time of relevance.

Herbert’s distinction between recording and archiving is critical. Recording platforms make no commitment to long-term access or preservation.  They may be a significant part of the scholarly process, but they are not a dependable part of the scholarly record.

We need to start creating workflows that support researcher-motivated movement of objects from private infrastructure to recording infrastructure and support curator-motivated movement of objects and context from recording infrastructure to archiving infrastructure.

Sarah Pritchard, Dean of Libraries at Northwestern University, put things in the campus politics and technology context.

The evolving scholarly record requires that we work with a variety of stakeholders on campus:  faculty and students (as creators), academic departments (as managers of course content and grey literature), senior administrators (general counsel, CFO, HR), trustees (governance policy), office of research (as proxy for funder’s requirements), information technology units, and disciplinary communities.

There are many research information systems on campus, beyond the institutional repository: course management systems, faculty research networking systems, grant and sponsored research management systems, student and faculty personnel system, campus servers and intranets, and – because the campus boundaries are pervious — disciplinary repositories, cloud and social platforms.  And also office hard drives.

Policies and compliance issues go far beyond the content licensing libraries are familiar with:  copyright (at  the institutional and individual levels), privacy of records (student work, clinical data, business records), IT security controls and web content policies, state electronic records retention laws, open access (institutionally or funder mandated), and rights of external system owners (hosted content).

Sarah finished with some provocative thoughts:

  • The library sees itself as a “selector”, but many may see this as overstepping
  • The library looks out for the institution which can be at odds with the faculty sense of individual professional identity
  • There is a high cost to change the technical infrastructure and workflow mechanisms and to reshape governance and policy
  • There is a lack of a sense of urgency

She recommended that we start with low hanging fruit, engage centers of expertise, find pilot project opportunities, and accept that there won’t be a wholesale move into this environment.

Sarah Pritchard’s presentation really affected me: sort of a rallying cry to go out and make things happen!

The campus context provided a perfect launching point for the Breakout Discussions. From ten pages of notes, I’ve distilled the following action-oriented outcomes:

Within the library

  • If your library has receded from your university goals and strategies, move the library back into the prime business of your institution with a roster of candidate service offerings to re-position yourselves in the campus community.
  • Earn reputation through service provision and through access as opposed to reputation through ownership.
  • Selection
    • Ask yourself, what are we selecting? How do we define the object? What commitments will we make? And how does it fit into the broader system?
    • Consider some minimum requirements in terms of number of hits or other indications of interest for blogs/websites to be archived.  Those indexed by organizations like MLA or that are cited in scholarly articles seem worthy.
    • Declare collections of record so that others can depend on them, but beware of the commitment if you have to create new storage and access systems for a particular type of material.
    • Communicate when you have taken on a commitment to web archiving particular resources, possibly via the MARC preservation commitment field.
    • A lot of stuff doesn’t get archived because we focus on materials that are already well-tended elsewhere. Look for the at-risk materials.
    • Accept adequate content sampling.
  • Focus on training librarians.  Get them to use the dissertation as the first opportunity to establish a relationship, establish an ORCID, and mint a DOI.  Do some of these things that publishers do to provide a gateway to infrastructure that is not campus-centric but system-centric.
  • Decide where the library will focus; it can’t be expert in all things.  Assess where the vulnerabilities are and set priorities.
  • Provide a solution where none exists to capture the things that have fallen through the cracks.
  • Technical solutions
    • Linked data could be the glue for connecting IDs with institutions: identifiers for individuals and for organizations, and possibly identifiers for departments, funding agencies, projects…
    • Follow a standard to create metadata to provide consistency in the way it’s formed, in the content, and in the identifiers being used.
  • Use technology that is ready now to
    • help with link rot (the URL is bad) and reference rot (the content has changed), so researchers can reference a resource as it was when they used the data or cited it.  Memento makes it easy to archive a web page at a point in time.
    • provide identifiers
      • ORCID and ISNI are ready for researcher identification.
      • DOIs, Perma.cc, and Memento are ready for use.
    • harvest web resources. Archive-It is ready for web harvesting and the Internet Archive’s Wayback Machine is ready for accessing archived web pages.
    • transport big data packets. Globus is a solution for researchers and institutions
    • create open source repositories. Consider using DSpace, EPrints, Fedora or Drupal to make your own.
  • Explore ways in which people track conversation around the creation of an output, like the Future of the Book platform or Twitter conversations. Open Annotation is a solution that allows people to discuss where they prefer.
  • Before building a data repository, ask for whom are we doing this and why?  If no one is asking for it, turn your attention elsewhere.
  • Create a hub for scholars who don’t know what they need, where the main activity may be referring researchers to other services.
  • To get quick support, promote and provide assistance with the DMPTool, minting DOIs, and archiving that information.
  • Get your message into two simple sentences.
  • Evolve the model and the people to move from support to collaboration

With researchers

  • Do the work to understand researchers’ perspectives.  Meet them where they live.  A good way to engage researchers is to ask them what’s important in their field. Then ask who is looking after it. Include grad students and untenured and newly-tenured faculty as they may be most receptive.
  • Data services may vary dramatically among disciplines.  Social Sciences want help with SPSS and R.  Others want GIS.  For STEM and Humanities there are completely different needs.
  • Before supporting an open access journal, ask the relevant community: do you need a journal, who is the audience, and what is the best way to communicate with them?
  • Stop hindering researchers with roadblocks relating to using cameras or scanners, copying, or putting up web pages.
  • Help users make good choices in use of existing disciplinary data repositories and provide a local option for disciplines lacking good choices.
  • Help faculty avoid having to profile themselves in multiple venues. Offer bibliography and resume services and portability as they move from institution to institution.
  • Explain the benefits of deposit in the record to students and faculty in terms of their portfolio and resume, and for collaboration.
  • To educate reluctant researchers, use assistants in the workflow, i.e. grant management assistants or use graduate student ambassadors to discount rumors and half-truths.  Try quick lunch and learn workshops.  Market through established channels and access points.
  • Talk to researchers about the levels of granularity available to appropriately manage access to their content.
  • Coordinate with those writing proposals and make sure they know that if they expect library staff to do some of the work, the library needs to be involved in the discussion. Get involved early in the research proposal process. Stress that maintenance has to be built in.    When committing to archiving, include an MOU covering service levels and end-of-life.
  • A formalized request process may help with communication.

With other parts of your institution

  • Get at least one other partner on campus on board early — maybe an academic faculty or department who are moving in the same direction you need to go (or administration, grants manager, IT people, educators, other librarians, funders).
  • Begin with a strategy, a call for partnership and implementation, then have conversations with faculty departments to get an environmental scan.  Identify what is needed (e.g., GIS, text-mining, data analysis), and distill into areas you can support internally or send along to campus partners.
  • Don’t duplicate services. Cede control to another area on the campus.  Communicate what is going on in different divisions and establish relationships. Provide guidance to get researchers to those places.
  • Work with associate deans and others at that level to find out about grant opportunities.
  • Develop partnerships with research centers and computing services, deciding where in the lifecycle things are to be archived and by whom.
  • Other parts of the university may decide to license data from vendors like Elsevier. The library has a relationship with that vendor; offer to do the negotiation.
  • Spin your message to a stakeholder’s context (e.g., archiving the scholarly record is a part of business continuity planning and risk management for the University’s CFO).
  • Coordinate with other campus pockets of activity involved in assigning DOIs, data management, and SEO activities for the non-traditional objects to optimize institutional outcomes. Integrating these objects into the infrastructure makes them able to circulate with the rest of the record.
  • Alliances on campus should be about integrating library services into the campus infrastructure. Unless you’ve done that on campus, you’re not doing your best to connect to the larger scholarly record.

With external entities

  • We should work with scholarly societies to learn about what we need to collect in a particular discipline (data sets, lab books, etc.) — and how to work with those researchers to get those things.
  • Identify the things that can be done elsewhere and those that need to be done locally.  Storing e-science data sets may not be a local thing, whereas support for collaboration may be.
  • Make funder program officers aware of how libraries can help with grant proposals, so they can refer researchers’ questions back to the library.
  • Rely on external services like JSTOR, arXiv, SSRN, and ICPSR, which are dependable delivery and access systems with sustainable business models.
  • Use centers of excellence. Consider offering your expertise, for instance, with a video repository and rely on another institution for data deposit.
  • Work with publishers to provide the related metadata that might, for instance, be associated with a dataset uploaded to PLoSOne.
  • To help with the impact of researcher output, work with others, such as Symplectic, because they have the metadata we need.
  • To establish protocols for transferring between layers, make sure conversations include W3C and IETF.
  • Identify pockets of interoperability and find how to connect rather than waiting for interoperability to happen.

We are at the beginning of this; it will get better.

Thanks to all of our participants, but particularly to our hosts at Northwestern University, our speakers, and our note-takers. We’re looking forward to culminating the series at the workshop in San Francisco in June, where we’ll focus on how we can collaboratively move things forward to do our best to ensure stewardship of the scholarly record.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.


LITA: Teamwork and Jazz

Mon, 2015-04-06 14:01

“Jazz Players” by Pedro Ribeiro Simões / CC BY 2.0

Jazz is a pretty unique genre that demands a lot from musicians; a skilled jazz artist must not only be adept at their instrument, they must be highly skilled improvisors and communicators as well. Where other styles of music may only require that a musician remember how to play a piece and run through it the same way every time, good jazz artists can play the same song in an infinite number of ways. Furthermore, they must also be able to collaborate with other jazz artists who can also play the same song an infinite number of ways. This makes jazz an inherently human art form because a listener never knows what to expect; when a jazz group performs, the outcome is the unpredictable result of each musician’s personal taste and style merging into a group effort.

In a lot of ways, team projects are kind of like a jazz performance: you have several people with different skill sets coming together to work toward a common goal, and the outcome is dependent on the people involved. While there are obvious limits to how far we can stretch this metaphor, I think we can learn a lot about being an effective team member from some of the traits all jazz greats have in common.

 

Trust your bandmates

Many hands make light work. Sometimes we may feel like we could get more done if we simply work alone, but this puts an artificial limit on how effective you can be. Learn to get over the impulse to do it all yourself and trust in your colleagues enough to delegate some of your work. Everyone has different strengths and weaknesses, and great teams know how to balance these differences. Even though Miles Davis was a great trumpeter, his greatest performances were always collaborations with other greats, or at least with a backing band. Great musicians inspire each other to do their best and try to remove all creative hindrances. This hyper-creative environment just isn’t possible to replicate in isolation.

When we got a new metadata librarian here at FSU, I had been making my own MODS records for a few months and was uncomfortable with giving up control over this aspect of my workflow. I’ve since learned that this is his specialty and not mine, and I trust in his expertise. As a result, our projects now have better metadata, I have more time to work on other things that I do have expertise in, and I have learned a lot more about metadata than I ever could have working alone.

 

Learn to play backup

Everyone wants to play the solo. It’s the fun part, and all the attention is on you. There’s nothing wrong with wanting to shine, but if everyone solos at the same time it defeats the purpose and devolves into noise. Good jazz musicians may be known for their solos, but the greats know how to play in a way that supports others when it’s their turn to solo, too. They are more concerned with the sound of the band as a whole instead of selfishly focusing on their own sound.

A big part of trusting your “bandmates” is staying out of their way when it’s their turn to “solo”. Can you imagine trying to play music on stage with someone who doesn’t even play your instrument yelling instructions at you about how you should be playing? That would be pretty distracting, but the office equivalent happens all the time. Micromanaging teammates can kill project morale quickly, often without the micromanager even realizing it. Sometimes projects have bottlenecks where no one can move forward until a specific thing gets done, and this is just a fact of life. If you are waiting for a team member to get something done so you can start on your part of the project, politely let them know that you are available if they need help or advice, and only provide help and advice if they ask. If they don’t need help, then politely stay out of their way.

 

Communication is key

Jazz musicians aren’t mind readers, but you might think they were after a great performance. It’s unbelievable how some bands can improvise in the midst of such complex patterns without getting lost. This is because improvisation requires a great deal of communication. Musicians communicate to each other using a variety of cues, either musical (one might drop in volume to signal the end of a solo), physical (one might step towards the center of the group to signal the start of a solo and then step away to signal the end), or visual (one might nod, wink or shift their foot as a signal to the rest of the group). These cue systems are all specific to the context of people performing on stage, but we can imagine a different set of cues for a team project that work just as well.

Like jazz musicians, team projects can be incredibly complex and a successful project requires all team members to be aware of their context. It is essential that everyone knows exactly where a project is at on a timeline so that they can act accordingly, and this information can be expressed in a variety of ways. Email is a popular choice, as it leaves a written record of who said what that can be consulted later. Email is great at communicating small, specific bits of information, but it is always helpful to have a “30,000 foot view” of the project as well so the team can see the big picture. Fellow LITA blogger Leo Stezano wrote a post about different ways to keep track of a project’s high-level progress, covering the use of software, spreadsheets, and the classic “post-it notes on a whiteboard” approach. I prefer to use Trello since it combines the simplicity of post-it notes on a wall with the flexibility of software, but there are a lot of options. The best option is whatever works for your team.

Equally important to finding good ways to communicate and sticking with them is uncovering harmful methods of communication and stopping them. Don’t send emails about a project to the rest of your team outside of working hours; it sends the wrong message about work-life balance. Try to eliminate unnecessary meetings and replace them with emails if you can. Emails are asynchronous and team members can respond when it is convenient for them, but meetings pollute our schedules and are productivity kryptonite. Finally, don’t drop into someone’s office unannounced (I do this all the time). Send an email or schedule a short meeting instead. Random office drop-ins derail the victim’s train of thought and send the signal that whatever they were working on isn’t as important as you are. Can you imagine Miles Davis tapping John Coltrane on the shoulder during a solo to ask what song they should play next? I didn’t think so. Being considerate with your communication is an underrated skill that may be the secret sauce that makes your project run more smoothly.

Brown University Library Digital Technologies Projects: Announcing a Researchers @ Brown data service

Mon, 2015-04-06 13:59

Campus developers might want to use data from Researchers@Brown (R@B) in other websites. The R@B team has developed a JSON web service that allows for this.  We think it will satisfy many uses on campus. Please give it a try and send feedback to researchers@brown.edu.

Main types/resources
  • faculty
  • organizational units (departments, centers, programs, institutes, etc)
  • research topics
Requesting data

To request data, begin with an identifier.  Let’s use Prof. Diane Lipscombe as an example:

/services/data/v1/faculty/dlipscom

Looking through the response you will notice affiliations and topics from Prof. Lipscombe’s profile.  You can make additional requests for information about those types by following the “more” link in the response.

/services/data/v1/ou/org-brown-univ-dept56/

Following the affiliation links from a faculty data profile will return information about the Department of Neuroscience, of which Prof. Lipscombe is a member.

/services/data/v1/topic/n49615/

Looking up this topic will return more information about the research topic “molecular biology”, including other faculty who have identified this as a research interest.
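
To make the request pattern above concrete, here is a minimal sketch of a JavaScript client (any environment with fetch) that retrieves a faculty profile and follows an affiliation’s “more” link. The base URL and the exact JSON key names used below are assumptions based on the field lists that follow, not documented behavior; inspect a live response from /services/data/v1/faculty/dlipscom before relying on them.

// Minimal sketch only. BASE_URL is a placeholder for the R@B host, and the
// key names ("affiliations", "more", "title", "url", "name") are assumptions.
const BASE_URL = "https://vivo.brown.edu";

async function getJson(path) {
  const res = await fetch(BASE_URL + path);
  if (!res.ok) throw new Error("Request failed (" + res.status + "): " + path);
  return res.json();
}

async function showFacultyWithAffiliations(shortId) {
  // e.g. "dlipscom" for Prof. Diane Lipscombe
  const faculty = await getJson("/services/data/v1/faculty/" + shortId);
  console.log(faculty.title, faculty.url);

  // Follow each affiliation's "more" link to fetch the organizational unit.
  for (const affiliation of faculty.affiliations || []) {
    const ou = await getJson(new URL(affiliation.more, BASE_URL).pathname);
    console.log("Affiliation:", ou.name);
  }
}

showFacultyWithAffiliations("dlipscom").catch(console.error);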

Responses
Faculty
  • first name
  • last name
  • middle
  • title
  • Brown email
  • url (R@B)
  • thumbnail
  • image – original image uploaded
  • affiliations – list with lookups
  • overview – this is HTML and may contain links or other formatting
  • topics – list with lookups
Organizations
  • name
  • image (if available)
  • url (to R@B)
  • affiliations – list with lookups
Topics
  • name
  • url (to R@B)
  • faculty – list with lookups
Technical Details
  • Requests are cached for 18 hours.
  • CORS support for embedding in other sites with JavaScript
  • JSONP for use in browsers that don’t support CORS (see the sketch below).
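
For pages that cannot use CORS, JSONP boils down to inserting a script tag whose URL names a global callback function. A rough sketch follows; the callback query parameter name (“callback” below) and the host are assumptions, so confirm them against the service before using this.

// Rough JSONP sketch. The "callback" parameter name and the host are
// assumptions; check the R@B service documentation before relying on this.
function jsonp(url, cb) {
  const cbName = "rab_cb_" + Date.now();
  const script = document.createElement("script");
  window[cbName] = function (data) {
    delete window[cbName];   // clean up the temporary global
    script.remove();         // and the injected script tag
    cb(data);
  };
  const sep = url.indexOf("?") === -1 ? "?" : "&";
  script.src = url + sep + "callback=" + cbName;
  document.head.appendChild(script);
}

jsonp("https://vivo.brown.edu/services/data/v1/faculty/dlipscom", function (faculty) {
  console.log(faculty.title); // key name assumed, as above
});
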
Example implementation

As an example, we have prepared a demonstration of using the R@B data service with JavaScript and the React framework.

David Rosenthal: The Mystery of the Missing Dataset

Sun, 2015-04-05 19:00
I was interviewed for an upcoming news article in Nature about the problem of link rot in scientific publications, based on the recent Klein et al paper in PLoS One. The paper is full of great statistical data but, as would be expected in a scientific paper, lacks the personal stories that would improve a news article.

I mentioned the interview over dinner with my step-daughter, who was featured in the very first post to this blog when she was a grad student. She immediately said that her current work is hamstrung by precisely the kind of link rot Klein et al investigated. She is frustrated because the dataset from a widely cited paper has vanished from the Web. Below the fold, a working post that I will update as the search for this dataset continues.


My step-daughter works on sustainability and life-cycle analysis. Here is her account of the background to her search:
The data was originally recommended to me by one of our scientific advisors at [a previous company] for use in the software we were developing and for our use in our consulting work. On their recommendation I googled "impact2002+" and found my way to the download page. I originally downloaded it in summer 2011.

It is a model for characterizing environmental flows into impacts. This is incredibly useful when looking at hundreds of pollutants and resource uses across a supply chain to understand how they roll-up into impacts to human health, ecosystem quality, and resources. For example it estimates the disability adjusted life years (impact to human life expectancy) associated with a release of various pollutants to air/land/soil. Another example is the estimate of the ecosystem quality loss (biodiversity loss) associated with various chemical emissions. Another example is the estimate of the future energy required to extract an incremental amount of additional minerals or energy resources (e.g. coal).

I looked for it again in summer 2014 when I noticed it was gone. I always assumed that by just searching "Impact2002+" I'd be able to find the data again - how wrong I was!

I reached out to the webmaster listed on the University of Michigan site and actually got a response but after a couple emails requesting the data with no luck I stopped pursuing that path. I ended up purchasing a dataset that has some of the Impact2002+ data embedded in it but there are still some pieces of my analysis that are limited by not having the original dataset.

Here is where the search starts. In 2003, Olivier Jolliet et al published IMPACT 2002+: A new life cycle impact assessment methodology:
The new IMPACT 2002+ life cycle impact assessment methodology proposes a feasible implementation of a combined midpoint/damage approach, linking all types of life cycle inventory results (elementary flows and other interventions) via 14 midpoint categories to four damage categories. ... The IMPACT 2002+ method presently provides characterization factors for almost 1500 different LCI-results, which can be downloaded at http://www.epfl.ch/impact

In its field, this is an extremely important paper. Google Scholar finds 810 citations to it. Unfortunately, this isn't a paper for which Springer provides article-level metrics. The International Journal of Life Cycle Assessment, in which the paper was published, is ranked 8th in the Sustainable Development field by Google's Scholar Metrics. Its h5-median index is 54, so a paper with 810 citations is vastly more cited than the papers it typically publishes.

The authors very creditably provided their data, the 1500 characterization factors, for download from the specified URL. That link, http://www.epfl.ch/impact, now redirects to http://www.riskscience.umich.edu/jolliet/downloads.htm, which returns a 404 Not Found error, so it has unambiguously rotted. The Wayback Machine does not have that page, although it has over 1000 URLs from http://www.riskscience.umich.edu/, nor does the Memento Time Travel service. So not merely has the link rotted, but there don't appear to be any archived versions of the data supporting the paper.
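
As an aside, this kind of "is there an archived copy?" check can be scripted. Here is a minimal JavaScript sketch against the Internet Archive's public Wayback Machine availability API; it covers only the Wayback Machine, not the other archives that the Memento Time Travel service aggregates, and the response shape shown reflects my understanding of that API rather than anything from the paper.

// Ask the Wayback Machine availability API for the closest capture of a URL.
// Works in any environment with fetch (modern browsers, recent Node).
async function closestCapture(url) {
  const api = "https://archive.org/wayback/available?url=" + encodeURIComponent(url);
  const res = await fetch(api);
  const data = await res.json();
  const closest = data.archived_snapshots && data.archived_snapshots.closest;
  return closest && closest.available ? closest.url : null; // null = no capture
}

closestCapture("http://www.riskscience.umich.edu/jolliet/downloads.htm")
  .then(function (capture) {
    console.log(capture || "No archived capture found");
  });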

The bookmark my step-daughter had for the dataset was http://www.earthshift.com/software/simapro/impact2002, which links to http://www.epfl.ch/impact, which redirects to the broken http://www.riskscience.umich.edu/jolliet/downloads.htm.

The Wayback Machine has 11 captures of http://www.epfl.ch/impact between February 11, 2002 and July 7, 2014. The most recent is actually a capture of the page it redirected to at the University of Michigan's School of Public Health, which now returns 404. That page said:
In order to access the IMPACT 2002+ model we ask that you provide us with your name, affiliation and email address at the bottom of this page. You do not have to be affiliated with the Center for Risk Science and Communication or the University of Michigan to access the IMPACT 2002 model. Your information will only be used to notify you of any updates concerning the model. Your data will be kept strictly confidential.

This is the explanation for the lack of any archived versions of the dataset. Web crawlers, such as the Internet Archive's Heritrix, are unable to fill out Web forms without site-specific knowledge, which in this case was obviously not available.

Similarly, in 2005 the Internet Archive captured pages from the EPFL site before the move to Michigan. They included this page describing the IMPACT2002+ method, which used a form to ask for:
your name, affiliation and your email-address, which will enable us to keep you informed about important updates from time to time. None of your data will be transmitted to anyone else. Then you can download the following files concerning the IMPACT 2002+ method ... Your data are not used to control or restrict the download, but will help us to keep you informed about updates concerning the IMPACT 2002+ methodology.

Again, archiving of the freely downloadable data was prevented.

One obvious lesson is that authors should be strongly discouraged from requiring researchers to supply information, such as names and e-mail addresses, before they can download data that has been made freely available. Such forms block web crawlers, so, as in this case, the ravages of time are likely to make the data totally unavailable. It seems likely that this dataset became unavailable as a side-effect of the Risk Science Center migrating to its own website rather than remaining part of the School of Public Health's website.

Another lesson is the completely inadequate state of Institutional Repositories. The University of Michigan's IR, Deep Blue, contains only 6 of the 76 "Selected Publications" from Olivier Jolliet's Michigan home page, though it does have full-text PDFs for those it holds. Infoscience, the EPFL IR, lists 58 publications with Olivier Jolliet as an author, including the paper in question, but for that one it says:
There is no available fulltext. Please contact the lab or the authors.
and:
The IMPACT 2002+ method presently provides characterization factors for almost 1500 different LCI-results, which can be downloaded at http://www.epfl.ch/impact
which is no longer the case. Note that ResearchGate claims to know about 177 publications from Olivier Jolliet.

Patrick Hochstenbach: Penguins Are Back

Sun, 2015-04-05 09:02
Filed under: Doodles Tagged: aprilfools, cartoon, comic, easter, Penguin

Galen Charlton: Three tales regarding a decrease in the number of catalogers

Sat, 2015-04-04 20:25

Discussions on Twitter today – see the timelines of @cm_harlow and @erinaleach for entry points – got me thinking.

In 1991, the Library of Congress had 745 staff in its Cataloging Directorate. By the end of FY 2004, the LC Bibliographic Access Divisions had between 506[1] and 561[2] staff.

What about now? As of 2014, the Acquisitions and Bibliographic Access unit has 238 staff[3].

While I’m sure one could quibble about the details (counting FTE vs. counting humans, accounting for the reorganizations, and so forth), the trend is clear: there has been a precipitous drop in the number of cataloging staff employed by the Library of Congress.

I’ll blithely ignore factors such as shifts in the political climate in the U.S. and how they affect civil service. Instead, I’ll focus on library technology, and spin three tales.

The tale of the library technologists

The decrease in the number of cataloging staff is one consequence of a triumph of library automation. The tools that we library technologists have written allow catalogers to work more efficiently. Sure, there are fewer of them, but that's mostly been due to retirements. Not only that, the ones who are left are now free to work on more intellectually interesting tasks.

If we, the library technologists, can but slip the bonds of legacy cruft like the MARC record, we can make further gains in the expressiveness of our tools and the efficiencies they can achieve. We will be able to take advantage of metadata produced by other institutions and people for their own ends, enabling library metadata specialists to concern themselves with larger-scale issues.

Moreover, once our data is out there – who knows what others, including our patrons, can achieve with it?

This will of course be pretty disruptive, but as traditional library catalogers retire, we’ll reach buy-in. The library administrators have been pushing us to make more efficient systems, though we wish that they would invest more money in the systems departments.

We find that the catalogers are quite nice to work with one-on-one, but we don’t understand why they seem so attached to an ancient format that was only meant for record interchange.

The tale of the catalogers

The decrease in the number of cataloging staff reflects a success of library administration in their efforts to save money – but why is it always at our expense? We firmly believe that our work with the library catalog/metadata services counts as a public service, and we wish more of our public services colleagues knew how to use the catalog better.  We know for a fact that what doesn’t get catalogued may as well not exist in the library.

We also know that what gets catalogued badly or inconsistently can cause real problems for patrons trying to use the library’s collection.  We’ve seen what vendor cataloging can be like – and while sometimes it’s very good, often it’s terrible.

We are not just a cost center. We desperately want better tools, but we also don’t think that it’s possible to completely remove humans from the process of building and improving our metadata. 

We find that the library technologists are quite nice to work with one-on-one – but it is quite rare that we get to actually speak with a programmer.  We wish that the ILS vendors would listen to us more.

The tale of the library directors

The decrease in the number of cataloging staff at the Library of Congress is only partially relevant to the libraries we run, but hopefully somebody has figured out how to do cataloging more cheaply. We’re trying to make do with the money we’re allocated. Sometimes we’re fortunate enough to get a library funding initiative passed, but more often we’re trying to make do with less: sometimes to the point where flu season makes us super-nervous about our ability to keep all of the branches open.

We’re concerned not only with how much of our budgets are going into electronic resources, but with how nigh-impossible it is to predict increases in fees for ejournal subscriptions and ebook services.

We find that the catalogers and the library technologists are pleasant enough to talk to, but we’re not sure how well they see the big picture – and we dearly wish they could clearly articulate how yet another cataloging standard / yet another systems migration will make our budgets any more manageable.

Each of these tales is true. Each of these tales is a lie. Many other tales could be told. Fuzziness abounds.

However, there is one thing that seems clear: conversations about the future of library data and library systems involve people with radically different points of view. These differences do not mean that any of the people engaged in the conversations are villains, or do not care about library users, or are unwilling to learn new things.

The differences do mean that it can be all too easy for conversations to fall apart or get derailed.

We need to practice listening.

1. From testimony by the president of the Library of Congress Professional Guild to Congress on 6 March 2015.
2. From the BA FY 2004 report. This includes 32 staff from the Cataloging Distribution Service, which had been merged into BA and had not been part of the Cataloging Directorate.
3. From testimony by the president of the Library of Congress Professional Guild to Congress on 6 March 2015.

Cynthia Ng: Musing: Playing Around with the NNELS Logo

Sat, 2015-04-04 02:51
It’s come up recently that we might consider revising our logo. I saw a coworker playing around with it and thought I’d give it a try. The thinking behind it is simple. Transpose the letters into Braille, and then try to match the Braille version to a hexagonal grid. Turns out the hardest is the … Continue reading Musing: Playing Around with the NNELS Logo

HangingTogether: The Semi-Finals

Fri, 2015-04-03 18:19

OCLC Research Collective Collections Tournament

#oclctourney

Thirty-two conferences started this journey, and now only two remain. The OCLC Research Collective Collection tournament is just one step away from crowning a Champion. Throw your brackets away and buckle your seat belts, because the tournament semi-finals are over and the finals are next!


How many languages does your conference collective collection speak? Competition in the semi-finals centered on the number of languages represented in each conference’s collective collection.* In the first semi-finals match-up, Conference USA cruised to an easy victory over Summit League, 366 languages to 265 languages. In the second match-up, Atlantic 10 also had little trouble with its opponent, moving past Missouri Valley 374 languages to 289 languages. So Conference USA and Atlantic 10 will square off in the tournament finals, with the honor and glory of the title “2015 Collective Collections Tournament Champion” at stake!

As the results of the semi-finals competition show, conference collective collections are very multilingual. Atlantic 10 had the most languages of any competitor in this round, with more than 370. But even the conference with the fewest languages – Summit League – had 265 languages in its collective collection! Suppose that an average book is 1.25 inches thick. If Summit League stacked up one book for every language represented in its collection, the resulting pile would be almost 28 feet tall! If Atlantic 10 did it, the stack would be nearly 40 feet tall!
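
A quick back-of-the-envelope check of that stack arithmetic (an editorial illustration, not part of the tournament write-up), assuming the 1.25-inch average book thickness used above:

```python
# One 1.25-inch book per language, stacked, converted to feet.
BOOK_THICKNESS_IN = 1.25

for conference, languages in [("Summit League", 265), ("Atlantic 10", 374)]:
    height_ft = languages * BOOK_THICKNESS_IN / 12  # 12 inches per foot
    print(f"{conference}: {height_ft:.1f} feet")
# Summit League: 27.6 feet; Atlantic 10: 39.0 feet
```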

The mega-collective-collection of all libraries – as represented in the WorldCat bibliographic database – contains publications in 481 different languages. English is the most common language in WorldCat; here’s a look at the top 50 most frequently-found languages other than English:

[Word cloud created with worditout.com]

After English, the most common languages in WorldCat are German, French, Spanish, and Chinese. Despite the high number of English-language materials, more than half of the materials in WorldCat are non-English! And as we’ve seen, many of these non-English-language publications have found their way into the collective collections of our tournament semi-finalists! So are you interested in reading something in Urdu? Atlantic 10 has nearly 2,300 Urdu-language publications to choose from. How about Welsh? Conference USA can furnish you with nearly 1,400 publications in Welsh. No matter what language you’re interested in, these collective collections likely have something for you – they speak a lot of languages!

Bracket competition participants: Remember, even if the conference you chose is not in the Finals, hope still flickers! If no one picked the tournament Champion, all entrants will be part of a random drawing for the big prize!

Get set for the Tournament Finals! Results will be posted April 6.

 

*Number of languages represented in language-based (text or spoken) publications comprising each conference collective collection. Data is current as of January 2015.

More information:

Introducing the 2015 OCLC Research Collective Collections Tournament! Madness!

OCLC Research Collective Collections Tournament: Round of 32 Bracket Revealed!

Round of 32: Blow-outs, buzzer-beaters, and upsets!

Round of 16: The plot thickens … and so do the books

Round of 8: Peaches and Pumpkins

About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.


FOSS4Lib Updated Packages: The Great Reading Adventure

Fri, 2015-04-03 16:59

Last updated April 3, 2015. Created by Jim Craner on April 3, 2015.

From The Great Reading Adventure website:

"The Great Reading Adventure is a robust, open source software designed to manage library reading programs. It is currently in its second version... The Great Reading Adventure was developed by the Maricopa County Library District with support by the Arizona State Library, Archives and Public Records, a division of the Secretary of State, with federal funds from the Institute of Museum and Library Services."

The Great Reading Adventure lets libraries and library consortia set up a full online summer reading program for patrons. Features include reporting, customization per library, digital badges, avatars, reading lists, and much more.

The software runs on a Windows IIS/MSSQL server.

License: MIT License
Development Status: Production/Stable
Operating System: Windows
Database: MSSQL

John Miedema: Lila “tears down” old categories and suggests new ways of looking at content. Word concreteness is a good candidate.

Fri, 2015-04-03 14:22

Many of the good things we love about language are essentially hierarchical. Narrative is linear: a beginning, middle, and end. Order shapes the story. Hierarchy gives a bird’s eye view, a table of contents, a summary that allows a reader to consider a work as a whole.

Lila will compute hierarchy by comparing passages on word qualities that suggest order. Concreteness is considered a good candidate. Passages with more abstract words express ideas and concepts, whereas passages with more concrete words express examples. Of the views that Lila can suggest, it is useful to have one that presents abstract concepts first and concrete examples second. I have listed four candidate qualities in the table below, but in the posts that follow I will focus on concreteness.

| # | Quality | Description | Examples |
|---|---------|-------------|----------|
| 1 | Abstract | Intangible qualities, ideas and concepts. Different than frequency of word usage: both academic terms and colorful prose can have low word frequency. | freedom (227*), justice (307), love (311) |
|   | Concrete | Tangible examples, illustrations and sensory experience. | grasshopper (660*), tomato (662), milk (670) |
| 2 | General | Categories and groupings. Similar to 1, but 1 is more dichotomous and this one is more of a range. | furniture |
|   | Specific | Particular instances. | La-Z-Boy rocker-recliner |
| 3 | Logical | Analytical thinking, understatement and fact. Note the conflict with 1 and 2: facts are both logical and concrete. | The fastest land dwelling creature is the Cheetah. |
|   | Emotional/Sentimental | Feeling, emphasis, opinion. Can take advantage of the vast amount of sentiment measures available. | The ugliest sea creature is the manatee. |
| 4 | Static | Constancy and passivity. | It was earlier demonstrated that heart attacks can be caused by high stress. |
|   | Dynamic | Change and activity. Energy. | Researchers earlier showed that high stress can cause heart attacks. |

* Concreteness index from the MRC Psycholinguistic Database. Grasshopper is a more concrete word than freedom. Indexes like the MRC can be used to compute concreteness for passages.
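
To illustrate how such an index might be applied, here is a minimal sketch (an editorial illustration, not Lila's actual implementation) that scores passages by the mean concreteness of their words, using a toy lexicon as a stand-in for the MRC norms:

```python
# Minimal sketch: score a passage's concreteness as the mean of its words'
# concreteness ratings. The tiny lexicon below is a toy stand-in for an
# index like the MRC Psycholinguistic Database (higher = more concrete).
import re

CONCRETENESS = {
    "freedom": 227, "justice": 307, "love": 311,      # abstract end
    "grasshopper": 660, "tomato": 662, "milk": 670,   # concrete end
}

def passage_concreteness(text):
    """Average concreteness of the rated words in a passage; None if no hits."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [CONCRETENESS[w] for w in words if w in CONCRETENESS]
    return sum(scores) / len(scores) if scores else None

passages = [
    "Freedom and justice are ideals we love.",
    "A grasshopper landed on the tomato next to the milk.",
]
# Sorting passages from abstract to concrete yields one candidate hierarchy view.
for p in sorted(passages, key=lambda p: passage_concreteness(p) or 0):
    print(round(passage_concreteness(p) or 0), p)
```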

Lila can compute hierarchy for passages, and for groups of passages. Together, these scores build a hierarchy, a view of how the content can be organized. Think of what this offers a writer. A writer stuck in his or her manually produced categories and views can ask Lila for alternate views. Lila “tears down” the old categories and suggests a new way of looking at the content. It is unlikely that the writer will stick exactly to Lila’s view, but it could provide a fresh start or give new insight. And Lila can compute new views dynamically, on demand, as the content changes.

LITA: Agile Development: Tracking Progress

Fri, 2015-04-03 14:00

Image courtesy of Wikipedia (Jeff Iasovski)

In my last post, I discussed effort estimation and scheduling, which leads into the beginning of actual development. But first, you need to decide how you’re going to track progress. Here are some commonly used methods:

The Big Board

In keeping with Agile philosophy, you should choose the simplest tool that gives you the functionality you need. If your team does all of its development work in the same physical space, you could get by with post-it notes on a big white board. There’s a lot to be said for a tangible object: it communicates the independent nature of each task or story in a way that software may not. It provides the team with a ready-made meeting point: if you want to see how the project is going, you have to go stand in front of the big board. A board can also help to keep projects lean and simple, because there’s only so much available space on it. There are no multiple screens or pages to hide complexity.

Sticky notes, however, are ephemeral in nature. You can lose your entire project plan to an overzealous janitor; more importantly, unless you periodically take pictures of your board, there’s no way to trace user story evolution. Personally, I like to use this method in the initial stages of planning; the board is a very useful anchor for user story definition and prioritization. Once we move into the development process, I find that moving into the virtual realm adds crucial flexibility and tracking functionality.

Spreadsheets

If the scope of the project is limited, it may be possible to track it using a basic office productivity suite like MS Office. MS Excel and similar spreadsheet tools are fairly easy to use, and they’re ubiquitous, which means your team will likely face a lower learning curve. Remember that in Agile the business side of the organization is an integral part of the development effort, and it may not make sense to spend time and effort to train sales and management staff on a complex tracking tool.

If you choose to go the spreadsheet route, however, you are giving up some functionality: it’s easy enough to create and maintain spreadsheets that give you project snapshots and track current progress, but this type of software is not designed to accurately measure long term progress and productivity, which helps you upgrade your processes and increase your team’s efficiency. There are ways to track Agile metrics using Excel, but if you find that you need to do that you may just want to switch to dedicated software anyway.

Tracking Software

There are several tracking tools out there that can help manage Agile projects, although my personal experience so far has been limited to JIRA and its companion GreenHopper. JIRA is a fairly simple issue-tracking tool: you can create issues (manually or directly from a reporting form), add a description, estimate effort, prioritize, and assign to a team member for completion. You can also track it through the various stages of development, adding comments at each step of the way and preserving meaningful conversations about its progress and evolution. As you can see in this article comparing similar tools, JIRA’s main advantage is the lack of unnecessary UI complexity, which makes it easier to master. Its main shortcoming is the lack of sprint management functionality, which is what GreenHopper provides. With the add-on, users can create sprints, assign tickets to them, and track sprint progress.

Can all of this functionality be replicated using spreadsheets? Yes, although maintenance and authentication can become problematic as the complexity of the project increases. At some point a tool like JIRA starts to pay for itself in terms of increased efficiency, and most if not all of these products are web-based and offer some sort of free trial or small enterprise pricing. My advice is to analyze your operations to determine if you need to go the tracking tool route, and then do some basic research to identify popular options and their pros and cons. Once you’ve identified one or two options that seem to fit your needs, give them a try to see if they’re what you’re looking for.

Again, which method you go with will depend on how much effort you will need to spend up front (in training and adapting new software) versus later on (added maintenance and decreased efficiency).

How do you track user story progress? What are the big advantages/disadvantages of your chosen method? JIRA in particular seems to elicit strong feelings in users, positive or negative; what are your thoughts on it?

DuraSpace News: OR2015 Conference Stands Behind Commitment to Ensure All Participants are Treated With Respect

Fri, 2015-04-03 00:00

Indianapolis, IN  The Open Repositories 2015 conference will take place June 8-11 in Indianapolis and is wholly committed to creating an open and inclusive conference environment. As expressed in its Code of Conduct, OR is dedicated to providing a welcoming and positive experience for everyone and to having an environment in which all colleagues are treated with dignity and respect.
