planet code4lib

Planet Code4Lib - http://planet.code4lib.org

Jason Ronallo: HTML5 Video Caption Cue Settings in WebVTT

Tue, 2014-10-21 13:25

TL;DR Check out my tool to better understand how cue settings position captions for HTML5 video.

Having video be a part of the Web with HTML5 <video> opens up a lot of new opportunities for creating rich video experiences. Being able to style video with CSS and control it with the JavaScript API makes it possible to do fun stuff and to create accessible players and a consistent experience across browsers. With better support in browsers for timed text tracks in the <track> element, I hope to see more captioned video.

An important consideration in creating really professional-looking closed captions is placing them correctly. I don’t rely on captions, but I do increasingly turn them on to improve my viewing experience. I’ve come to appreciate some attributes of really well done captions. Accuracy is certainly important: the captions should match the words spoken. As someone who can hear, I see inaccurate captions all too often. Thoroughness is another factor: are all the sounds important for the action represented in the captions? Captions will often include a “music” caption, but other sounds, especially those off screen, are often omitted. But accuracy and thoroughness aren’t the only factors to consider when evaluating caption quality.

Placement of captions can be equally important. The captions should not block other important content. They should not run off the edge of the screen. If two speakers are on screen you want the appropriate captions to be placed near each speaker. If a sound or voice is coming from off screen, the caption is best placed as close to the source as possible. These extra clues can help with understanding the content and action. These are the basics. There are other style guidelines for producing good captions. Producing good captions is something of an art form. More than two rows long is usually too much, and rows ought to be split at phrase breaks. Periods should be used to end sentences and are usually the end of a single cue. There’s judgment necessary to have pleasing phrasing.

While there are tools for doing this proper placement for television and burned-in captions, I haven’t found a tool for this for Web video. Though I don’t yet have a tool to do this placement, in the following I’ll show you how to:

  • Use the JavaScript API to dynamically change cue text and settings.
  • Control placement of captions for your HTML5 video using cue settings.
  • Play around with different cue settings to better understand how they work.
  • Style captions with CSS.
Track and Cue JavaScript API

The <video> element has an API which allows you to get a list of all tracks for that video.

Let’s say we have the following video markup which is the only video on the page. This video is embedded far below, so you should be able to run these in the console of your developer tools right now.

<video poster="soybean-talk-clip.png" controls autoplay loop>
  <source src="soybean-talk-clip.mp4" type="video/mp4">
  <track label="Captions" kind="captions" srclang="en" src="soybean-talk-clip.vtt" id="soybean-talk-clip-captions" default>
</video>

Here we get the first video on the page:

var video = document.getElementsByTagName('video')[0];

You can then get all the tracks (in this case just one) with the following:

var tracks = video.textTracks; // returns a TextTrackList
var track = tracks[0]; // returns a TextTrack

Alternately, if your track element has an id you can get it more directly:

var track = document.getElementById('soybean-talk-clip-captions').track;

Once you have the track you can see the kind, label, and language:

track.kind; // "captions"
track.label; // "Captions"
track.language; // "en"

You can also get all the cues as a TextTrackCueList:

var cues = track.cues; // TextTrackCueList

In our example we have just two cues. We can also get just the active cues (in this case only one so far):

var active_cues = track.activeCues; // TextTrackCueList

Now we can see the text of the current cue:

var text = active_cues[0].text;

Now the really interesting part is that we can change the text of the caption dynamically and it will immediately change:

track.activeCues[0].text = "This is a completely different caption text!!!!1";

Cue Settings

We can also then change the position of the cue using cue settings. The following will move the first active cue to the top of the video.

track.activeCues[0].line = 1;

The cue can also be aligned to the start of the line position:

track.activeCues[0].align = "start";

Now for one last trick we’ll add another cue with the arguments of start time and end time in seconds and the cue text:

var new_cue = new VTTCue(1, 30, "This is the text of the new cue.");

We’ll set a position for our new cue before we place it in the track:

new_cue.line = 5;

Then we can add the cue to the track:

track.addCue(new_cue);

And now you should see your new cue for most of the duration of the video.

Playing with Cue Settings

The other settings you can play with include position and size. Position is the text position as a percentage of the width of the video. Size is the width of the cue as a percentage of the width of the video.
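
For example, here is a minimal sketch you could paste into the console, assuming the track variable from the snippets above and at least one currently active cue; the specific numbers are just for illustration:

// Assumes "track" from the earlier snippets and that a cue is currently active.
var cue = track.activeCues[0];
cue.position = 10;   // text position: 10% from the left edge of the video
cue.size = 50;       // the cue box takes up 50% of the video's width
cue.align = "start"; // align the cue text to the start of that box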

While I could go through all of the different cue settings, I found it easier to understand them after I built a demonstration of dynamically changing all the cue settings. There you can play around with all the settings together to see how they actually interact with each other.

At least as of the time of this writing there is some variability between how different browsers apply these settings.

Test WebVTT Cue Settings and Styling

Cue Settings in WebVTT

I’m honestly still a bit confused about all of the optional ways in which cue settings can be defined in WebVTT. The demonstration outputs the simplest and most straightforward representation of cue settings. You’d have to read the spec for optional ways to apply some cue settings in WebVTT.
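
For reference, a cue with settings in a WebVTT file looks roughly like the following; the settings go on the same line as the timings, separated by spaces. The timings and values here are made up purely for illustration:

WEBVTT

00:00:01.000 --> 00:00:05.000 line:1 align:start position:10% size:50%
This cue would appear near the top left of the video.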

Styling Cues

In browsers that support styling of cues (Chrome, Opera, Safari), the demonstration also allows you to apply styling to cues in a few different ways. This CSS code is included in the demo to show some simple examples of styling.

::cue(.red){ color: red; }
::cue(.blue){ color: blue; }
::cue(.green){ color: green; }
::cue(.yellow){ color: yellow; }
::cue(.background-red){ background-color: red; }
::cue(.background-blue){ background-color: blue; }
::cue(.background-green){ background-color: green; }
::cue(.background-yellow){ background-color: yellow; }

Then the following cue text can be added to show red text with a yellow background:

<c.red.background-yellow>This cue has red text with a yellow background.</c>

In the demo you can see which text styles are supported by which browsers for styling the ::cue pseudo-element. There’s a text box at the bottom that allows you to enter any arbitrary styles and see what effect they have.

Example Video

Test WebVTT Cue Settings and Styling

FOSS4Lib Recent Releases: ArchivesSpace - 1.1.0

Tue, 2014-10-21 12:53
Package: ArchivesSpace
Release Date: Tuesday, October 21, 2014

Last updated October 21, 2014. Created by Peter Murray on October 21, 2014.

The ArchivesSpace team is happy to release version v1.1.0.

Please see the documentation for information on how to upgrade your ArchivesSpace installs.

This release includes upgrading Rails to 3.2.19, which addresses another important security patch. It is recommended that users update ArchivesSpace in order to apply this patch.

Jodi Schneider: Genre defined, a quote from John Swales

Tue, 2014-10-21 12:06

A genre comprises a class of communicative events, the members of which share some set of communicative purposes. These purposes are recognized by the expert members of the parent discourse community and thereby constitute the rationale for the genre. This rationale shapes the schematic structure of the discourse and influences and constrains choice of content and style. Communicative purpose is both a privileged criterion and one that operates to keep the scope of a genre as here conceived narrowly focused on comparable rhetorical action. In addition to purpose, exemplars of a genre exhibit various patterns of similarity in terms of structure, style, content and intended audience. If all high probability expectations are realized, the exemplar will be viewed as prototypical by the parent discourse community. The genre names inherited and produced by discourse communities and imported by others constitute valuable ethnographic communication, but typically need further validation.1

  1. Genre defined, from John M. Swales, page 58, Chapter 3 “The concept of genre” in Genre Analysis: English in academic and research settings. Cambridge University Press 1990. Reprinted with other selections in
    The Discourse Studies Reader: Main currents in theory and analysis (see pages 305-316).

Open Knowledge Foundation: Storytelling with Infogr.am

Tue, 2014-10-21 11:55

As we well know, data is only data until you use it for storytelling and insights. Some people are super talented and can use D3 or other amazing visual tools; just see this great list of resources on Visualising Advocacy. In this one-hour Community Session, Nika Aleksejeva of Infogr.am shares some easy ways you can get started with simple data visualizations. Her talk also includes tips for telling a great story and some thoughtful comments on when to use various data viz techniques.

We’d love you to join us and do a skillshare on tools and techniques. Really, we are tool agnostic and simply want to share with the community. Please do get in touch to learn more about Community Sessions.

Open Knowledge Foundation: New Open Knowledge Initiative on the Future of Open Access in the Humanities and Social Sciences

Tue, 2014-10-21 10:58

To coincide with Open Access Week, Open Knowledge is launching a new initiative focusing on the future of open access in the humanities and social sciences.

The Future of Scholarship project aims to build a stronger, better connected network of people interested in open access in the humanities and social sciences. It will serve as a central point of reference for leading voices, examples, practical advice and critical debate about the future of humanities and social sciences scholarship on the web.

If you’d like to join us and hear about new resources and developments in this area, please leave us your details and we’ll be in touch.

For now we’ll leave you with some thoughts on why open access to humanities and social science scholarship matters:

“Open access is important because it can give power and resources back to academics and universities; because it rightly makes research more widely and publicly available; and because, like it or not, it’s beginning and this is our brief chance to shape its future so that it benefits all of us in the humanities and social sciences” – Robert Eaglestone, Professor of Contemporary Literature and Thought, Royal Holloway, University of London.

*

“For scholars, open access is the most important movement of our times. It offers an unprecedented opportunity to open up our research to the world, irrespective of readers’ geographical, institutional or financial limitations. We cannot falter in pursuing a fair academic landscape that facilitates such a shift, without transferring prohibitive costs onto scholars themselves in order to maintain unsustainable levels of profit for some parts of the commercial publishing industry.” Dr Caroline Edwards, Lecturer in Modern & Contemporary Literature, Birkbeck, University of London and Co-Founder of the Open Library of Humanities

*

“If you write to be read, to encourage critical thinking and to educate, then why wouldn’t you disseminate your work as far as possible? Open access is the answer.” – Martin Eve, Co-Founder of the Open Library of Humanities and Lecturer, University of Lincoln.

*

“Our open access monograph The History Manifesto argues for breaking down the barriers between academics and wider publics: open-access publication achieved that. The impact was immediate, global and uniquely gratifying–a chance to inject ideas straight into the bloodstream of civic discussion around the world. Kudos to Cambridge University Press for supporting innovation!” — David Armitage, Professor and Chair of the Department of History, Harvard University and co-author of The History Manifesto

*

“Technology allows for efficient worldwide dissemination of research and scholarship. But closed distribution models can get in the way. Open access helps to fulfill the promise of the digital age. It benefits the public by making knowledge freely available to everyone, not hidden behind paywalls. It also benefits authors by maximizing the impact and dissemination of their work.” – Jennifer Jenkins, Senior Lecturing Fellow and Director, Center for the Study of the Public Domain, Duke University

*

“Unhappy with your current democracy providers? Work for political and institutional change by making your research open access and joining the struggle for the democratization of democracy” – Gary Hall, co-founder of Open Humanities Press and Professor of Media and Performing Arts, Coventry University

District Dispatch: I’m right! Librarians have to think

Tue, 2014-10-21 09:00

I will pat myself on the back (somebody has to). I wrote in the 2004 edition of Copyright Copyright, “Fair use cannot be reduced to a checklist. Fair use requires that people think.” This point has been affirmed (pdf) by the Eleventh Circuit Court of Appeals in the long standing Georgia State University (GSU) e-reserves copyright case. The appeals court rejected the lower court’s use of quantitative fair use guidelines in making its fair use ruling, stating that fair use should be determined on a case-by-case basis and that the four factors of fair use should be evaluated and weighed.

Lesson: Guidelines are arbitrary and silly. Determine fair use by considering the evidence before you. (see an earlier District Dispatch article).

The lower court decision was called a win for higher education and libraries because only five assertions of infringement (out of 99) were actually found to be infringing. Hooray for us! But most stakeholders on both sides of the issue felt that the use of guidelines in weighing the third factor—amount of the work—was puzzling, to say the least (but no matter, we won!).

Now that the case has been sent back to the lower court, some assert that GSU has lost the case. But not so fast. This decision validates what the U.S. Supreme Court has long held: that fair use is not to be simplified with “bright line rules, for the statute, like the doctrine it recognizes, calls for case-by-case analysis. . . . Nor may the four statutory factors be treated in isolation, one from another. All are to be explored, and the results weighed together, in light of the purposes of copyright.” (510 U.S. 569, 577–78).

Thus, GSU could prevail. Or it might not. But at least fair use will be applied in the appropriate fashion.

Thinking—it’s a good thing.

The post I’m right! Librarians have to think appeared first on District Dispatch.

HangingTogether: BIBFRAME Testing and Implementation

Tue, 2014-10-21 08:00

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Philip Schreur of Stanford. We were fortunate that staff from several BIBFRAME testers participated: Columbia, Cornell, George Washington University, Princeton, Stanford and University of Washington. They shared their experiences and tips with others who are still monitoring BIBFRAME developments.

Much of the testers’ focus has been on data evaluation and identifying problems or errors in converting MARC records to BIBFRAME using either the BIBFRAME Comparison Service or Transformation Service. Some have started to create BIBFRAME data from scratch using the BIBFRAME Editor. This raised a concern among managers about how much time and staffing was needed to conduct this testing. Several institutions have followed Stanford’s advice and enrolled staff in the Library Juice Academy series to gain competency in XML and RDF-based systems, a good skill set to have for digital library and linked data work, not just for BIBFRAME. Others are taking Zepheira’s Linked Data and BIBFRAME Practical Practitioner Training course. The Music Library Association’s Bibliographic Control Committee has created a BIBFRAME Task Force focusing on how LC’s MARC-to-BIBFRAME converter handles music materials.

Rather than looking at what MARC data looks like in BIBFRAME, people should be thinking about how RDA (Resource Description and Access) works with BIBFRAME. We shouldn’t be too concerned if BIBFRAME doesn’t handle all the MARC fields and subfields, as many are rarely used anyway. See for example Roy Tennant’s “MARC Usage in WorldCat”, which shows the fields and subfields that are actually used in WorldCat, and how they are used, by format. (Data is available by quarters in 2013 and for 1 January 2013 and 1 January 2014, now issued annually.) Caveat: a field/subfield might be used rarely, but can be very important when it occurs. For example, a Participant/Performer note (511) is mostly used in visual materials and recordings; for maps, scale is incredibly important. People agreed the focus should be on the most frequently used fields first.

Moving beyond MARC gives libraries an opportunity to identify entities as “things not strings”. RDA was considered “way too stringy” for linked data. The metadata managers mentioned the desire to use various identifiers, including id.loc.gov, FAST, ISNI, ORCID, VIAF and OCLC WorkIDs.  Sometimes transcribed data would still be useful, e.g., a place of publication that has changed names. Many still questioned how authority data fits into BIBFRAME (we had a separate discussion earlier this year on Implications of BIBFRAME Authorities.) Core vocabularies need to be maintained and extended in one place so that everyone can take advantage of each other’s work.

Several noted “floundering” due to insufficient information about how the BIBFRAME model was to be applied. In particular, it is not always clear how to differentiate FRBR “works” from BIBFRAME “works”. There may never be a consensus on what a “work” is between “FRBR and non-FRBR people”. Concentrate instead on identifying the relationships among entities. If you have an English translation linked to a German translation linked to a work originally published in Danish, does it really matter whether you consider the translations separate works or expressions?

Will we still have the concept of “database of record”? Stanford currently has two databases of record, one for the ILS and one for the digital library. A triple store will become the database of record for materials not expressed in MARC or MODS.  This raised the question of developing a converter for MODS used by digital collections. Columbia, LC and Stanford have been collaborating on mapping MODS to BIBFRAME. Colorado College has done some sample MODS to BIBFRAME transformations.

How do managers justify the time and effort spent on BIBFRAME testing to administrators and other colleagues? Currently we do not have new services built upon linked data to demonstrate the value of this investment. The use cases developed by the Linked Data for Libraries project offer a vision of what could be done in a linked data environment that can’t be done now. A user interface is needed to show others what the new data will look like; pulling data from external resources is the most compelling use case.

Tips offered:

  • The LC Converter has a steep learning curve; to convert MARC data into BIBFRAME, use Terry Reese’s MARCEdit MARCNext Bibframe Testbed, which also converts EADs (Encoded Archival Descriptions). See Terry’s blog post introducing the MARCNext toolkit.
  • Use Turtle rather than XML to look at records (less verbose).
  • Use subfield 0 (authority record control number) when including identifiers in MARC access points (several requested that OCLC start using $0 in WorldCat records).

About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.


FOSS4Lib Upcoming Events: High pitched noise which Replica Cartier Love Bracelet

Tue, 2014-10-21 06:44
Date: Tuesday, October 21, 2014 - 02:45
Supports: eXtensible Text Framework

Last updated October 21, 2014. Created by cartierlove on October 21, 2014.

It includes a mother of pearl dial, luminous hands and indexes plus a date display. There are numerous kinds of couple watches at Price - Angels, including Crystal Dial Stainless Steel Water Resistant Sweethearts Couple Wrist Watch, Fashionable Style Rectangular Dial Crystal Display Stainless Steel Band Couple Watch (Grey). Article Source: is just a review of mens Invicta watches.

PeerLibrary: PeerLibrary Facebook

Tue, 2014-10-21 02:48

Please follow and support us on our recently-opened Facebook page: https://www.facebook.com/PeerLibrary

PeerLibrary: Knight News Challenge Update

Tue, 2014-10-21 02:46

Semifinalists for the Knight News Challenge will be chosen tomorrow and the refinement period will begin. This is your last chance to show your support for our submission before the next stage of the competition. The Knight Foundation is asking “How might we leverage libraries as a platform to build more knowledgeable communities?” We believe that PeerLibrary closely parallels the theme of the challenge and provides an answer to the foundation’s question. By facilitating a community of independent learners and promoting collaborative reading and discussion of academic resources, PeerLibrary is modernizing the concept of a library in order to educate and enrich the global community. Please help us improve our proposal, give us feedback, and wish PeerLibrary good luck in the next stage of the Knight News Challenge.

Peter Murray: Case Studies on Open Source Adoption in Libraries: Koha, CoralERM, and Kuali OLE

Tue, 2014-10-21 02:12


LYRASIS has published three open source software case studies on FOSS4LIB.org as part of its continuation of support and services for libraries and other cultural heritage organizations interested in learning about, evaluating, adopting, and using open source software systems.

With support from a grant from The Andrew W. Mellon Foundation, LYRASIS asked academic and public libraries to share their experiences with open source systems, such as content repositories, integrated library systems, and websites. Of the submitted proposals, LYRASIS selected three concepts for development into case studies from Crawford County Federated Library System (Koha), Fenway Libraries Online (Coral), and the University of Chicago Library (Kuali OLE). The three selected organizations then prepared narrative descriptions of their experience and learning, to provide models, advice, and ideas for others.

Each case study details how the organization handled the evaluation, selection, adoption, conversion, and implementation of the open source system. They also include the rationale for going with an open source solution. The case studies all provide valuable information and insights, including:

  • Actual experiences, both good and bad
  • Steps, decision points, and processes used in evaluation, selection, and implementation
  • Factors that led to selection of an open source system
  • Organization-wide involvement of and impact to staffs and patrons
  • Useful tools created or applied to enhance the open source system and/or expand its functionality, usefulness, or benefit
  • Plans for ongoing support and future enhancement
  • Key takeaways from the process, including what worked well, what didn’t work as planned, and what the organization might do differently in the future

The goal of freely offering these case studies to the public is to help cultural heritage organizations use firsthand experience with open source to inform their evaluation and decision-making process, the same objective of FOSS4LIB.org. While open source software is typically available at no cost, these case studies provide tangible examples of the associated costs, time, energy, commitment and resources required to effectively leverage open source software and participate in the community.

“These three organizations expertly outline the in-depth process of selecting and implementing open source software with insight, humor, candor and clarity. LYRASIS is honored to work with these organizations to share this invaluable information with the larger community,” stated Kate Nevins, Executive Director of LYRASIS. “The case studies exemplify the importance of understanding the options and experiences necessary to fully utilize open source software solutions.”

Link to this post!

Tara Robertson: The Library Juice Press Handbook of Intellectual Freedom

Tue, 2014-10-21 00:28

Ahhhh! It’s done!

This project took over 7 years and went through a few big iterations. I was just finishing library school when it started and learned a lot from the other advisory board members. I appreciate how the much more experienced folks on the advisory board helped bring me up to speed on issues I was less familiar with, and how they treated me, even though I was just a student.

It was published this spring but my copy just arrived in the mail.  Here’s the page about the book on the Library Juice Press site, and here’s where you can order a copy on Amazon.

Tara Robertson: Porn in the library

Tue, 2014-10-21 00:09

At the Gender and Sexuality in Information Studies Colloquium the program session I was the most excited about was Porn in the library. There were 3 presentations in this panel exploring this theme.

First, Joan Beaudoin and Elaine Ménard presented The P Project: Scope Notes and Literary Warrant Required! Their study looked at 22 websites that are aggregators of free porn clips. Most of these sites were in English, but a few were in French. Ménard acknowledged that it is risky and sometimes uncomfortable to study porn in the academy. They looked at the terminology used to describe porn videos, specifically the categories available to access porn videos. They described their coding manual, which outlined various metadata facets (activity, age, cinematography, company/producers, ethnicity, gender, genre, illustration/cartoon, individual/stars, instruction, number of individuals, objects, physical characteristics, role, setting, sexual orientation). I learned that xhamster has scope notes for their various categories (mouseover the lightbulb icon to see).

While I appreciate that Beaudoin and Ménard are taking a risk to look at porn, I think they made the mistake of using very clinical language to legitimize and sanitize their work. I’m curious why they are so interested in porn, but realize that it might be too risky for them to situate themselves in their research.

It didn’t seem like they understood the difference between production company websites and free aggregator sites. Production company sites have very robust, high-quality metadata and excellent information architecture. Free aggregator sites have variable quality metadata and likely have a business model based on ads or on referring users to the main production company websites. Porn is, after all, a content business, and most porn companies are invested in making their content findable, and in making it easy for the user to find more content with the same performers, the same genre, or by the same director.

Beaudoin and Ménard expressed disappointment that porn companies didn’t want to participate in their study. As these two researchers don’t seem to understand the porn industry or have relationships with individuals I don’t think it’s surprising at all. For them to successfully build on this line of inquiry I think they need to have some skin in the game and clearly articulate what they offer their research subjects in exchange for building their own academic capital.

It was awesome to have a quick Twitter conversation with Jiz Lee and Chris Lowrance, the web manager for feminist porn company Pink and White Productions, about how sometimes the terms a consumer might be looking for are prioritized over the performers’ own gender identity.

Jiz Lee is a genderqueer porn performer who uses the pronouns they/them and is sometimes misgendered by mainstream porn and by feminist porn. I am a huge fan of their work.

I think this is the same issue that Amber Billey, Emily Drabinski and K.R. Roberto raise in their paper What’s gender got to do with it? A critique of RDA rule 9.7. They argue that it is regressive for a cataloguer to assign a binary gender value to an author. In both these cases someone (porn company or consumer, or cataloguer) is assigning gender to someone else (porn performer or content creator). This process can be disrespectful, offensive, and inaccurate, and it highlights a power dynamic where the consumer’s (porn viewer or researcher/student/librarian) desires/politics/needs/worldview is put above someone’s own identity.

Next, Lisa Sloniowski and Bobby Noble presented Fisting the Library: Feminist Porn and Academic Libraries (which is the best paper title ever). I’ve been really excited about their SSHRC-funded porn archive research. This research project has become more of a conceptual project, rather than building a brick and mortar porn archive. Bobby talked about the challenging process of getting his porn studies class going at York University. Lisa said they initially hoped to start a porn collection as part of York University Library’s main collection, not as a reading room or a marginal collection. Lisa spoke about the challenges of drafting a collection development policy and some of the labour issues, presumably about staff who were uncomfortable with porn having to order, catalogue, process and circulate it. They also talked about the Feminist Porn Awards and the second feminist porn conference that took place before the Feminist Porn Awards last year.

Finally, Emily Lawrence and Richard Fry presented Pornography, Bomb Building and Good Intentions: What would it take for an internet filter to work? They presented a philosophical argument against internet filters. They argued that for a filter not to overblock or underblock it would need to be mind-reading and fortune-telling. A filter would need to read an individual’s mind, noting factors like the person viewing, their values, their mood, and so on, and be fortune-telling by knowing exactly what information the user was seeking before they looked at it. I’ve been thinking about internet filtering a lot lately because of Vancouver Public Library’s recent policy change that forbids “sexually explicit images”. I was hoping to get a new or deeper understanding of filtering but was disappointed.

This colloquium was really exciting for me. The conversations that people on the porn in the library panel were having are discussions I haven’t heard elsewhere in librarianship. I look forward to talking about porn in the library more.

Library Tech Talk (U of Michigan): Practical Relevance Ranking for 11 Million Books

Tue, 2014-10-21 00:00
Relevance is a complex concept which reflects aspects of a query, a document, and the user as well as contextual factors. Relevance involves many factors such as the user's preferences, task, stage in their information-seeking, domain knowledge, intent, and the context of a particular search. Tom Burton-West, one of the HathiTrust developers, has been working on practical relevance ranking for all the volumes in HathiTrust for a number of years.

DuraSpace News: NEW tool for Archiving Social Media–Instagram and Facebook

Tue, 2014-10-21 00:00

From Jon Ippolito, Professor of New Media and Director, Digital Curation graduate program, The University of Maine

Orono, ME  Digital conservator Dragan Espenschied and the crew at Rhizome, one of the leading platforms for new media art, have created a tool for archiving social media such as Instagram and Facebook.

Open Knowledge Foundation: Celebrating Open Access Week by highlighting community projects!

Mon, 2014-10-20 16:15

This week is Open Access Week all around the world, and from Open Knowledge’s side we are following up on last year’s tradition by putting together a blog post series to highlight great Open Access projects and activities in communities around the world. Every day this week will feature a new writer and activity.

Open Access Week, a global event now entering its eighth year, is an opportunity for the academic and research community to continue to learn about the potential benefits of Open Access, to share what they’ve learned, and to help inspire wider participation in helping to make Open Access a new norm in scholarship and research.

This past year has seen lots of great progress, and with the Open Knowledge blog we want to help amplify this amazing work done in communities around the world:

  • Tuesday, Jonathan Gray from Open Knowledge: “Open Knowledge work on Open Access in humanities and social sciences”
  • Wednesday, David Carroll from Open Access Button: “Launching the New Open Access Button”
  • Thursday, Alma Swan from SPARC Europe: “Open Access and the humanities: on our travels round the UK”
  • Friday, Jenny Molloy from Open Science working group: “OK activities in open access to science”
  • Saturday, Kshitiz Khanal from Open Knowledge Nepal: “Combining Open Science, Open Access, and Collaborative Research”
  • Sunday, Denis Parfenov from Open Knowledge Ireland: “Open Access: Case of Ireland”

We’re hoping that this series can inspire even more work around Open Access in the year to come and that our community will use this week to get involved both locally and globally. A good first step is to sign up at http://www.openaccessweek.org for access to a plethora of support resources, and to connect with the worldwide Open Access Week community. Another way to connect is to join the Open Access working group.

Open Access Week is an invaluable chance to connect the global momentum toward open sharing with the advancement of policy changes on the local level. Universities, colleges, research institutes, funding agencies, libraries, and think tanks use Open Access Week as a platform to host faculty votes on campus open-access policies, to issue reports on the societal and economic benefits of Open Access, to commit new funds in support of open-access publication, and more. Let’s add to their brilliant work this week!

District Dispatch: ALA encouraged by “fair use” decision in Georgia State case

Mon, 2014-10-20 16:13

Georgia State University Library. Photo by Jason Puckett via flickr.

On Friday, the U.S. Court of Appeals for the 11th Circuit handed down an important decision in Cambridge University Press et al. v. Carl V. Patton et al. concerning the permissible “fair use” of copyrighted works in electronic reserves for academic courses. Although publishers sought to bar the uncompensated excerpting of copyrighted material for “e-reserves,” the court rejected all such arguments and provided new guidance in the Eleventh Circuit for how “fair use” determinations by educators and librarians should best be made. Remanding to the lower court for further proceedings, the court ruled that fair use decisions should be based on a flexible, case-by-case analysis of the four factors of fair use rather than rigid “checklists” or “percentage-based” formulae.

Courtney Young, president of the American Library Association (ALA), responded to the ruling by issuing a statement.

The appellate court’s decision emphasizes what ALA and other library associations have always supported—thoughtful analysis of fair use and a rejection of highly restrictive fair use guidelines promoted by many publishers. Critically, this decision confirms the importance of flexible limitations on publishers’ rights, such as fair use. Additionally, the appeals court’s decision offers important guidance for reevaluating the lower court’s ruling. The court agreed that the non-profit educational nature of the e-reserves service is inherently fair, and that teachers’ and students’ needs should be the real measure of any limits on fair use, not any rigid mathematical model. Importantly, the court also acknowledged that educators’ use of copyrighted material would be unlikely to harm publishers financially when schools aren’t offered the chance to license excerpts of copyrighted work.

Moving forward, educational institutions can continue to operate their e-reserve services because the appeals court rejected the publishers’ efforts to undermine those e-reserve services. Nonetheless, institutions inside and outside the appeals court’s jurisdiction—which includes Georgia, Florida and Alabama—may wish to evaluate and ultimately fine tune their services to align with the appeals court’s guidance. In addition, institutions that employ checklists should ensure that the checklists are not applied mechanically.

In 2008, publishers Cambridge, Oxford University Press, and SAGE Publishers sued Georgia State University for copyright infringement. The publishers argued that the university’s use of copyright-protected materials in course e-reserves without a license was a violation of the copyright law. Previously, in May 2012, Judge Orinda Evans of the U.S. District Court ruled in favor of the university in a lengthy 350-page decision that reviewed the 99 alleged infringements, finding all but five infringements to be fair uses.

The post ALA encouraged by “fair use” decision in Georgia State case appeared first on District Dispatch.

HangingTogether: Evolving Scholarly Record workshop (Part 3)

Mon, 2014-10-20 16:00

This is the third of three posts about the workshop.

Part 1 introduced the Evolving Scholarly Record framework.  Part 2 described the two plenary discussions.  This part summarizes the breakout discussions.

Following the presentations, attendees divided into breakout groups.  There were a variety of suggested topics, but the discussions took on lives of their own.  The breakout discussions surfaced many themes that may merit further attention:

Support for researchers

It may be the institution’s responsibility to provide infrastructure to support compliance with mandates, but it is certainly the library’s role to assist researchers in depositing their content somewhere and to ensure that deposits are discoverable.  We should establish trust by offering our expertise and familiarity with reliable external repositories, deposit, compliance with mandates, selection, description …  and support the needs of researchers during and after their projects.  Access to research outcomes involves both helping researchers find and access information they need as inputs to their work and helping them to ensure that their outputs are discovered and accessible by others.  We should also find ways to ensure portability of research outputs throughout a researcher’s career.  We need to partner with faculty and help them take the long view.  We cannot do this by making things harder for the researcher, but by making it seamless, building on the ways they prefer to work.

Adapting to the challenge

We need to retool and reskill to add new expertise:  ensuring that processes are retained along with data, promoting licenses that allow reusability, thinking about what repositories can deliver back to research, and adding developers to our teams.  When we extend beyond existing library standards, we need to look elsewhere to see what we can adopt rather than create.  We need to leverage and retain the trust in libraries, but need resources to do the work.  While business models don’t exist yet, we need to find ways to rebalance resources and contain costs.  One of the ways we might do that is to build library expertise and funding into the grant proposal process, becoming an integral part of the process from inception to dissemination of results.

Selection

Academic libraries should first collect, preserve, and provide access to materials created by those at their institution.  How do libraries put a value on assets (to the institution, to researchers, to the wider public)?  Not just outputs but also the evidence-base and surrounding commentary.  What should proactively be captured from active research projects?   How many versions should be retained?  What role should user-driven demand play?  What is needed to ensure we have evidence for verification and retain results of failed experiments?  What need not be saved (locally or at all)?  When is sampling called for?  What about deselection?  While we can involve researchers in identifying resources for preservation, in some cases we may have to be proactive and hunt them down and harvest them ourselves.

Sensitivity

Competitiveness (regarding tenure, reputation, IP, and scooping) can inhibit sharing.  Timing of data sharing can be important, sometimes requiring an embargo.  Privacy issues regarding research subjects must be considered.  Researchers may be sensitive about sharing “personal” scientific notes – or sharing data before their research is published.  Different disciplines have different traditions about sharing.

Collaboration with others in the university

Policy and financial drivers (mandates, ROI expectations, reputation and assessment) will motivate a variety of institutional stakeholders in various ways.  How can expertise be optimized and duplication be minimized?  Libraries can’t change faculty behaviors, so need to join together with those with more influence.  When Deans see that libraries can address parts of the challenge, they will welcome involvement.  When multiple units are employing different systems and services, IT departments and libraries may become key players.  There are limits to institutional capacity, so cooperating with other institutions is also necessary.

Collaboration with other stakeholders in a distributed archive across publishers, subjects, nations

The variety and quantity of information now making up the scholarly record is far greater than it used to be, and publishers are no longer the gatekeepers of the scholarly record. This is a time to restructure the role of libraries vis-à-vis the rest of the ecosystem. We need to function as part of an ecosystem that includes commercial and governmental entities. We need to act locally and think globally, employing DOIs and researcher IDs to interoperate with other systems and to appeal to scholars. Help researchers negotiate on IP rights, terms of use, privacy, and other issues when engaging with environments like GitHub, SlideShare, and publishers’ systems, being aware that, while others will engage with identifiers, metadata systems, discovery, etc., few may be committed to preservation. How do we decide who are trustworthy stewards and what kinds of assurances are needed?

Technical approaches

We need to understand various solutions for fixity, versioning, and citation. We need to accommodate persistent object identifiers and multiple researcher name identifiers. We need to explore ways to link the various research materials related to the same project. We need to coordinate metadata in objects (e.g., an instrument’s self-generated metadata) with metadata about the objects and metadata about the context. Embedded links need to be maintained. Campus systems may need to interoperate with external systems (such as SHARE). We should help find efficient metrics for assessing researcher impact and enhancing institutional reputation. We should consider collaborating on processes to capture content from social media. In doing these things we should be contributing to developing standards, best practices, and tools.

Policy issues

What kinds of statements of organizational responsibility are needed:  a declaration of intent covering what we will collect, a service agreement covering what services we will provide to whom, terms of use, and explicit assertions about which parts of the university are doing what?  Are there changes to copyright needed; does Creative Commons licensing work for researchers?  What about legal deposit for digital content, based on the print model?  What happens when open access policies conflict with publisher agreements?

 

Attendees of the workshop feel that stewardship efforts will evolve from informal to more formal. Mandates, cost savings, and scale will motivate this evolution. It is a common good to have a demonstrable historical record to document what is known, to protect against fraud, and for future research to build upon. Failure to act is a risk for libraries, for research, and for the scholarly record.

Future Evolving Scholarly Record workshops will expand the discussion and contribute to identifying topics for further investigation.  The next scheduled workshops will be in Washington DC on December 10, 2014 and in San Francisco on June 4, 2015.  Watch for more details and for announcements of other workshops on the OCLC Research events page.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.


David Rosenthal: Journal "quality"

Mon, 2014-10-20 15:00
Anurag Acharya and co-authors from Google Scholar have a pre-print at arxiv.org entitled Rise of the Rest: The Growing Impact of Non-Elite Journals in which they use article-level metrics to track the decreasing importance of the top-ranked journals in their respective fields from 1995 to 2013. I've long argued that the value that even the globally top-ranked journals add is barely measurable and may even be negative; this research shows that the message is gradually getting out. Authors of papers subsequently found to be "good" (in the sense of attracting citations) are slowly but steadily choosing to publish away from the top-ranked journals in their field. You should read the paper, but below the fold I have some details.

Acharya et al attempt to answer two questions. First, what fraction of the top-cited articles are published in non-elite journals, and how has this changed over time? Second, what fraction of the total citations are to non-elite journals, and how has this changed over time? For the first question they observe that:

The number of top-1000 papers published in non-elite journals for the representative subject category went from 149 in 1995 to 245 in 2013, a growth of 64%. Looking at broad research areas, 4 out of 9 areas saw at least one-third of the top-cited articles published in non-elite journals in 2013. For 6 out of 9 areas, the fraction of top-cited papers published in non-elite journals for the representative subject category grew by 45% or more.

And for the second that:

Considering citations to all articles, the percentage of citations to articles in non-elite journals went from 27% in 1995 to 47% in 2013. Six out of nine broad areas had at least 50% of citations going to articles published in non-elite journals in 2013.

They summarize their method as:

We studied citations to articles published in 1995-2013. We computed the 10 most-cited journals and the 1000 most-cited articles each year for all 261 subject categories in Scholar Metrics. We marked the 10 most-cited journals in a category as the elite journals for the category and the rest as non-elite.

In a post to liblicense, Ann Okerson asks:

  • Any thoughts about the validity of the findings? Google has access to high-quality data, so it is unlikely that they are significantly mis-characterizing journals or papers. They examine the questions separately in each of their 261 subject categories, and re-evaluate the top-ranked papers and journals each year.
  • Do they take into account the overall growth of article publishing in the time frame examined? Their method excludes all but the most-cited 1000 papers in each year, so they consider a decreasing fraction of the total output each year:
    • The first question asks what fraction of the top-ranked papers appear in top-ranked journals, so the total volume of papers is irrelevant.
    • The second question asks what fraction of all citations (from all journals, not just the top 1000) are to top-ranked journals. Increasing the number of articles published doesn't affect the proportion of them in a given year that cite top-ranked journals.
  • What's really going on here? Across all fields, the top-ranked 10 journals in their respective fields contain a gradually but significantly decreasing fraction of the papers subsequently cited. Across all fields, a gradually but significantly decreasing fraction of citations are to the top-ranked 10 journals in their respective fields.  This means that authors of cite-worthy papers are decreasingly likely to publish in, read from, and cite papers in their field's top-ranked journals. In other words, whatever value that top-ranked journals add to the papers they publish is decreasingly significant to authors.
Much of the subsequent discussion on liblicense misinterprets the paper, mostly by assuming that when the paper refers to "elite journals" it means Nature, NEJM, Science and so on. As revealed in the quote above, the paper uses "elite" to refer to the top-ranked 10 journals in each of the individual 261 fields. It seems unlikely that a broad journal such as Nature would publish enough articles in any of the 261 fields to be among the top-ranked 10 in that field. Looking at Scholar Metrics, I compiled the following list, showing all the categories (Scholar Metrics calls them subcategories) which currently have one or more global top-10 journals among their "elite journals" in the paper's sense:
  • Life Sciences & Earth Sciences (general): Nature, Science, PNAS
  • Health & Medical Sciences (general): NEJM, Lancet, PNAS
  • Cell Biology: Cell
  • Molecular Biology: Cell
  • Oncology: Journal of Clinical Oncology
  • Chemical & Material Sciences (general): Chemical Reviews, Journal of the American Chemical Society
  • Physics & Mathematics (general): Physical Review Letters
Only 7 of the 261 categories currently have one or more global top-10 journals among their "elite". Only 3 categories are specific, the other 4 are general. The impact of the global top-10 journals on the paper's results is minimal.

Let’s look at this another way. No matter how well their work is regarded by others in their field, researchers in the vast majority of fields have no prospect of ever publishing in a global top-10 journal because those journals effectively don’t publish papers in those fields. And if they ever did, the paper is likely to be junk, as illustrated by my favorite example, because the global top-10 journal’s stable of reviewers don’t work in that field. The global top-10 journals are important to librarians, because they look at scholarly communication from the top down, to publishers, because they are important to librarians so they anchor the “big deals”, and to researchers in a small number of important fields. To everyone else, they may be interesting but they are not important.

Acharya et al conclude:
First, the fraction of top-cited articles published in non-elite journals increased steadily over 1995-2013. While the elite journals still publish a substantial fraction of high-impact articles, many more authors of well-regarded papers in diverse research fields are choosing other venues.

Second, now that finding and reading relevant articles in non-elite journals is about as easy as finding and reading articles in elite journals, researchers are increasingly building on and citing work published everywhere.

Both seem right to me, which reinforces the message that, even on a per-field basis, highly rated journals are not adding as much value as they did in the past (which was much less than commonly thought). Authors of other papers are the ultimate judge of the value of a paper (they are increasingly awarding citations to papers published elsewhere), and of the value of a journal (they are increasingly publishing work that other authors value elsewhere).
