Feed aggregator

LITA: Learning to Master XSLT

planet code4lib - Fri, 2015-02-06 12:00

This semester, I have the exciting opportunity to work as an intern among the hum of computers and maze of cubicles at Indiana University’s Digital Library Program! My main projects include migrating two existing digital collections from TEI P4 to TEI P5 using XSLT. If you are familiar with XML and TEI, feel free to skim a bit! Otherwise, I’ve included short explanations of each and links to follow for more information.


Texts for digital archives and libraries are frequently marked up in a language called eXtensible Markup Language (XML), which looks and acts similarly to HTML. Marking up the texts allows them to be human- and machine-readable, displayed, and searched in different ways than if they were simply plain text.


The Text Encoding Initiative (TEI) Consortium “develops and maintains a standard for the representation of texts in digital form” (i.e. guidelines). Basically, if you wanted to encode a poem in XML, you would follow the TEI guidelines to mark up each line, stanza, etc. in order to make it machine-readable and cohesive with the collection and standard. In 2007, the TEI Consortium unveiled an updated form of TEI called TEI P5, to replace the older P4 version.

However, many digital collections still operate under the TEI P4 guidelines and must be migrated over to P5 moving forward. Here is where XSLT and I come in.


eXtensible Stylesheet Language (XSL) Transformations are used to convert an XML document to another text document, such as (new) XML, HTML or text. In my case, I’m migrating from one type of XML document to another type of XML document, and the tool in between, making it happen, is XSLT.
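To give a feel for what an XML-to-XML stylesheet looks like, here is a minimal sketch. It assumes a hypothetical recipe document whose <ingredient> elements should become <item> elements; those element names are made up for illustration and are not TEI. The first template copies everything through unchanged, and the second overrides it for the one tag being renamed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input shape (an assumption, not TEI):
     <recipe><name>Pancakes</name><ingredient>flour</ingredient></recipe> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: by default, copy every node and attribute as-is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: rename <ingredient> to <item>, keeping its attributes and content -->
  <xsl:template match="ingredient">
    <item>
      <xsl:apply-templates select="@*|node()"/>
    </item>
  </xsl:template>

</xsl:stylesheet>
```

A real migration between two XML vocabularies would likely follow the same pattern, with one override template for each element that changes and the identity template handling everything else.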

Many utilize custom XSLT to transform an XML representation of a text into HTML to be displayed on a webpage. The process is similar to using CSS to transform basic HTML into a stylized webpage. When working with digital collections, or even moving from XML to PDF, XSLT is an invaluable tool to have handy. Learning it can be a bit of an undertaking, though, especially adding to an already full work week.

I have free time, sign me up!

Here are some helpful tips I have been given (and discovered) in the month I’ve been learning XSLT to get you started:

  1. Register for a tutorial.

YouTube and Oracle provide tutorials to get your feet wet and see what XSLT actually looks like. Before registering for anything with a price, first see if your institution offers free tutorials. Indiana University offers an IT Training Workshop on XSLT each semester.

  2. Keep W3Schools bookmarked.

Their XSLT page acts as a self-guided tutorial, providing examples, function lists, and function implementations. I access it nearly every day because it is clear and concise, especially for beginners.

  3. Google is your best friend.

If you don’t know how to do something, Google it! Odds are someone before you didn’t have your exact problem, but they did have one like it. Looking over another’s code on StackOverflow can give you hints to new functions and expose you to more use possibilities. This goes for learning every coding and markup language!!

  4. Create or obtain a set of XML documents and practice!

A helpful aspect of using Oxygen Editor (the most common software used to encode in XML) for your transformations is that you can see the results instantly, or at least see your errors. If you have one or more XML documents, figure out how to transform them to HTML and view them in your browser. If you need to go from XML to XML, create a document with recipes and simply change the tags. The more you work with XSLT, the simpler it becomes, and you will feel confident moving on to larger projects.

  5. Find a guru at your institution.

Nick Homenda, Digital Projects Librarian, is mine at IU. For my internship, he has built a series of increasingly difficult exercises, where I can dabble in and get accustomed to XSLT before creating the migration documents. When I feel like I’m spinning in circles, he usually explains a simpler way to get the desired result. Google is an unmatched resource for lines of code, but sometimes talking it out can make learning less intimidating.

Note: If textbooks are more your style, Mastering XSLT by Chuck White lays a solid foundation for the language. This is a great resource for users who already know how to program, especially in Java and the C varieties. White makes many comparisons between them, which can help strengthen understanding.


If you have found another helpful resource for learning and applying XSLT, especially an online practice site, please share it! Tell us about projects you have done utilizing XSLT at your institution!

Open Knowledge Foundation: Open Knowledge Switzerland’s 2014 in review, big plans ahead

planet code4lib - Fri, 2015-02-06 11:27

This is a cross-post from the Open Knowledge Switzerland blog, see the original here.

It has been a big year for us in Switzerland. An openness culture spreading among civil administration, NGOs, SMEs, backed by the efforts of makers, supporters and activists throughout the country, has seen the projects initiated over the past three years go from strength to strength – and establish open data in the public eye.

Here are the highlights of what is keeping us busy – and information on how you can get involved in helping us drive Open Knowledge forward, no matter where you are based. Check out our Storify recap, or German- and French-language blogs for further coverage.

To see the Events Calendar for 2015, scroll on down.

2014 in review

#sports

Our hackdays went global, with Milan joining Basel and Sierre for a weekend of team spirit and data wrangling. The projects which resulted ranged from the highly applicable to the ludicrously inventive, and led us to demand better from elite sport. The event was a starting point for the Open Knowledge Sports Working Group, aiming to “build bridges between sport experts and data scientists, coaches and communities”. We’re right behind you, Rowland Jack!


The international highlight of the year was a chance for a sizeable group of our members to meet, interact and make stuff with the Open Knowledge community at OK Festival Berlin. Unforgettable! Later in the year, the Global Open Data Index got journalists knocking on our doorstep. However, the recently opened timetable data is not as open as some would like to think – leading us to continue making useful apps with our own open Transport API, and the issuing of a statement in Errata.


The yearly conference attracted yet again a big crowd of participants to hear talks, participate in hands-on workshops, and launch exciting projects (e.g. Lobbywatch). We got some fantastic press in the media, with the public encouraged to think of the mountains of data as a national treasure. At our annual association meeting we welcomed three new Directors, and tightened up with the Wikimedia community inviting us to develop open data together.


CERN’s launch of an open data portal made headlines around the world. We were excited and more than a little dazzled by what we found when we dug in – and could hardly imagine a better boost for the upcoming initiative. Improving data access and research transparency is, indeed, the future of science. Swiss public institutions like the National Science Foundation are taking note, and together we are making a stand to make sure scientific knowledge stays open and accessible on the Internet we designed for it.


Swiss openness in politics was waymarked in 2014 with a motion regarding Open Procurement Data passing through parliament, legal provisions for opening weather data, the City of Zürich and Canton of St.Gallen voting in commitments to transparency, and fresh support for accountability and open principles throughout the country. This means more work and new responsibility for people in our movement to get out there and answer tough questions. The encouragement and leadership on an international level is helping us enormously to work towards national data transparency, step by step.


The Swiss Open Government Data Portal launched at OKCon 2013 has 1’850 datasets published on it as of January 2015, now including data from cantons and communes as well as the federal government. New portals are opening up on a cantonal and city level, more people are working on related projects and using the data in their applications to interact with government. With Open Government Data Strategy confirmed by the Swiss Federal Council in April, and established as one of the six priorities of the federal E-Government action plan, the project is only bound to pick up more steam in the years ahead.


With Open Budget visualisations now deployed for the canton of Berne and six municipalities – including the City of Zurich, which has officially joined our association – the finance interest group is quickly proving that it’s not all talk. Spending data remains a big challenge, and we look forward to continuing the fight for financial transparency. This cause is being boosted by interest and support from the next generation, such as the 29 student teams participating in a recent Open Data Management and Visualization course at the University of Berne.


We may be fast, but our community is faster. Many new open data apps and APIs have been released and enhanced by our community, such as the SwissMetNet API, based on just-opened national weather data resulting from a partial revision of the Federal Act on Meteorology and Climatology. Talk about “hold your horses”: a city waste removal schedule app led to intense debate with officials over open data policy, the results making waves in the press and open data developers leading by doing.


An OpenGLAM Working Group started over the summer, and quickly formed a dedicated organising committee for our first hackathon in the new year. Towards this, at least a dozen Swiss heritage institutions are providing content, data, and expertise. We look forward to international participants, virtually and on location, and to your open culture data!

What’s coming up in 2015

Even if we do half the things we did in ‘14, a big year is in store for our association. Chances are that it will be even bigger: this is the year when the elections of the Federal Council are happening for the first time since our founding. It is an important opportunity to put open data in the spotlight of public service. And we are going to be busy running multiple flagship projects at the same time in all the areas mentioned.

Here are the main events coming up – we will try to update this as new dates come in, but give us a shout if we are missing something:

Getting involved

So, happy new year! We hope you are resolved to make more of open data in 2015. The hardest part may be taking the first step, and we are here for sport and support.

There is lots going on, and the easiest way to get started is to take part in one of the events. Start with your own neighbourhood: what kind of data would you like to have about your town? What decisions are you making that could benefit from having a first-hand, statistically significant, visually impressive, and above all, honest and critical look at the issue?

Lots is happening online and offline, and if you express interest in a topic you’re passionate about, people are generally quick to respond with invitations and links. To stay on top of things we urge you to join our mailing list, follow us on social media, and check out the maker wiki and forum. Find something you are passionate about, and jump right in! Reach out if you have any questions or comments.

James Cook University, Library Tech: Link Resolution Quirk in ASTM Journals

planet code4lib - Fri, 2015-02-06 05:10
No great drama here, just thought this might interest people who get frustrated with Link Resolvers but don't know who to blame. This issue was reported by a staff member who, using Summon, got a 360 Link to: Zhang, F., Guo, S., & Wang, B. (2014). Experimental research on cohesive sediment deposition and consolidation based on settlement column system. Geotechnical Testing Journal, 37(3),

LibraryThing (Thingology): Subjects and the Ship of Theseus

planet code4lib - Thu, 2015-02-05 23:56

I thought I might take a break to post an amusing photo of something I wrote out today:

The photo is a first draft of a database schema for a revamp of how LibraryThing will do library subjects. All told, it has 26 tables. Gulp.

About eight of the tables do what a good cataloging system would do:

  • Distinguishes the various subject systems (LCSH, Medical Subjects, etc.)
  • Preserves the semantic richness of subject cataloging, including the stuff that never makes it into library systems.
  • Breaks subjects into their facets (e.g., “Man-woman relationships — Fiction” has two subject facets).

Most of the tables, however, satisfy LibraryThing’s unusual core commitments: to let users do their own thing, like their own little library, but also to let them benefit from and participate in the data and contributions of others.(1) So it:

  • Links to subjects from various “levels,” including book-level, edition-level, ISBN-level and work-level.
  • Allows members to use their own data, or “inherit” subjects from other levels.
  • Allows for members to “play librarian,” improving good data and suppressing bad data.(2)
  • Allows for real-time, fully reversible aliasing of subjects and subject facets.

The last is perhaps the hardest. Nine years ago (!) I compared LibraryThing to the “Ship of Theseus,” a ship which is “preserved” although its components are continually changed. The same goes for much of its data, although “shifting sands” might be a better analogy. Accounting for this makes for some interesting database structures, and interesting programming. Not every system at LibraryThing does this perfectly. But I hope this structure will help us do that better for subjects.(3)

Weird as all this is, I think it’s the way things are going. At present most libraries maintain their own data, which, while generally copied from another library, is fundamentally siloed. Like an evolving species, library records descend from each other; they aren’t dynamically linked. The data inside the records are siloed as well, trapped in a non-relational model. The profession that invented metadata, and indeed invented sharing metadata, is, at least as far as its catalogs go, far behind.

Eventually that will end. It may end in a “Library Goodreads,” every library sharing the same data, with global changes possible, but reserved for special catalogers. But my bet is on a more LibraryThing-like future, where library systems will both respect local cataloging choices and, if they like, benefit instantly from improvements made elsewhere in the system.

When that future arrives, we’ve got the schema!

1. I’m betting another ten tables are added before the system is complete.
2. The system doesn’t presume whether changes will be made unilaterally, or voted on. Voting, like much else, exists in a separate system, even if it ends up looking like part of the subject system.
3. This is a long-term project. Our first steps are much more modest–the tables have an order-of-use, not shown. First off we’re going to duplicate the current system, but with appropriate character sets and segmentation by thesaurus and language.

Coral Sheldon-Hess: Two keys to a failed CS education

planet code4lib - Thu, 2015-02-05 23:48

I think there are two keys to why I was a successful electrical engineer, when I did not (initially) succeed as a computer scientist—despite being more interested in the latter, to begin with, and despite wanting to pursue the latter now.

The first key: invisible struggle, no displays of fallibility

I went to the University of Virginia as an undergrad. I transferred into the Engineering School a year in, which put me approximately one semester behind my peers. I chose Electrical Engineering (EE) instead of Computer Science (CS), even though it was a CS major who convinced me to switch. You see, I fell for a lot of the misconceptions laid out in Unlocking the Clubhouse: despite evidence to the contrary (I earned a high enough grade in the class to be hired as a teaching assistant for CS 101 in my second semester of college), I didn’t believe I could compete* with people who had been programming for their whole lives; and I vastly over-estimated how many people really fell into that bucket.

Also, because nobody told me that programming is hard for everyone when they start, I didn’t think CS was a field where I could be successful. I didn’t see everyone around me struggling, the way I did in my first EE class (which, to be fair, was pretty hellish).

Classic blunders.

I think that points to an important difference between my EE and CS education: I saw other EEs fail as often as I failed. Although it sounds that way, this isn’t me being modest, or feeling like an impostor, or anything else; I worked very hard and did very well. But I also know that a large part of my success comes from my peers and me taking time outside of class to teach each other the things the faculty didn’t see fit to impart; the homework assignments were too hard for us to do otherwise, because we were (deliberately?) not taught how to solve our homework problems in class. This is a common experience in engineering and, apparently, in physics — this Medium article does a fantastic job of explaining what’s wrong with teachers refusing to teach (though it comes with a trigger warning: it was written in the aftermath of a professor sexually harassing students).

So, one key to how my EE experience differs from CS is that I got to see my peers struggling, and it got me past my initial concern that they had all been tearing apart VCRs and putting them back together since the age of 10. (It was a very specific fear: I remember, it was VCRs, specifically, not watches or robots or anything else. Perhaps that points to a lack of imagination on my part.)

In CS, all of the assigned work was individual, and the focus on the school’s Honor Code meant that we were afraid to work together. I saw other CS students in the computer lab, but I didn’t know they were struggling as hard as I was. Even after working as a TA and helping people through their struggles, it took me more than a decade to internalize the fact that CS, like most things, is hard for beginners.

So, key one: In CS I kept believing the “everyone has been programming forever” lie, combined with the “I am not naturally good at this, and other people are” lie. In EE it was actively disproven, pretty much immediately.

The second key: starting with ‘hello world’

Not a great place to start

But there was one other key to my success as an electrical engineering student: I took the “intro to EE for non EEs” course that they were piloting at the time—even though, unfortunately (for them), most of my colleagues didn’t join me in taking it. In that class we got an introduction to the broader field, with short descriptions of the various sub-fields of EE and beginner-level introductions to concepts we would later be taking in-depth classes on. The portion of the class dealing with information theory and signal processing gave me the background to understand several really difficult subjects when they were introduced (poorly) in 300-level classes, and that confidence (bolstered by the experience of explaining it to some of my peers) ultimately led me to double-specialize in “Communications” (by which I mean wireless engineering, signal processing, etc.), along with “Computers/Digital” (processor and chip design, etc.).

I would probably not have become a wireless engineer without that experience.

CS, on the other hand, had nothing like that. CS 101 was “Hey, here’s how you program really simple stuff in C++. Also, ignore half of what you’re typing.” It wasn’t “Here are the sub-fields of computer science,” or “Here are introductory-level explanations of some of the important stuff we’ll talk about later,” both of which would have been better.

CS 101 should be an introduction to the field of computer science and computer programming, not a first programming course. It should consist of a little Boolean logic, maybe some control flow (i.e. loops), and some basic information about data structures; then, “here’s what an algorithm is”; then, some high-level information about computer networks; then, maybe slip in something about software testing and/or version control; and, finally, it should definitely include an exploration of the differences between web programming, DevOps, middleware, and math-heavy CS research. Not only would that class help people understand the field and how they might like to be part of it; it would also improve interview questions, later on. (Seriously, front-end developers don’t need to know how to implement QuickSort!)

There are lots of important changes we should be making to the way CS is taught, but when we’re looking at how to find and retain students for a four-year major, I think adding a high-level class before beginning programming would help tremendously. It’s certainly better than the then-popular (and, I sincerely hope, now-outmoded) practice of making the second programming course into a “weeding” class—a course so hard that half the students quit or fail, then change majors. And I think that, in the process of designing the intro course’s curriculum, the CS faculty might find themselves rethinking the whole major. So, yes, you could say I’m proposing a band-aid, and I agree; but it might also be a first step to structural change.

*In an environment where grades are issued on a curve, education is a competition. Assignments and tests were so hard at UVA’s engineering school that one time I got 38% on a midterm, and that translated to an A.


John Miedema: Lila is cognitive writing technology built on top of software like Evernote. Key differences.

planet code4lib - Thu, 2015-02-05 21:54

Writers everywhere benefit from content management software like Evernote. Evernote can collect data from multiple devices and locations and organize it into a single writing repository. Evernote is beautiful software. For the last few years, I have been using Google Drive to collect notes. Recently I tried Evernote again, and I am impressed enough to switch. Notebooks, tags, collaboration, web clipping, related searches. All very nice.

Lila is cognitive writing technology built on top of software like Evernote. Here are some key differences between the products:

1. Evernote users read long-form content manually, decide if it is relevant, and then write notes to integrate it into their project. Lila will pre-read content for users and embed relevant notes (slips) in the context of the user’s writing. This will save the writer lots of reading and evaluation time.

2. Evernote users get “related searches” from a very limited number of web sources. Lila will perform open web searches for related content.

3. Evernote users can visualize a limited number of connections between notes. I have yet to get any utility out of this. Lila will use natural language search to generate a vast number of connections between notes, allowing a user to quickly understand complex relationships between notes.

4. Evernote users can use tags to construct a hierarchical organization of content. Notebooks can only have one sub-level of categorization, essentially chapters, but many writers need additional levels of classification. Tags can be ordered hierarchically and if you prefix them with a number they will sort in a linear order. You can use tags for hierarchical classification but it creates problems.

  • If you want both categories and tags, you will have to use a naming convention to split tags into two types.
  • Numbering tags causes them to lose type-ahead look-up functionality, i.e., you have to start by typing the number. It is a problem because numbers can be expected to change often.
  • If you decide to insert a category in the middle of two tags, you have to manually re-number all the tags below.
  • Tags are shared between Notebooks. Maybe that works for tags? Not for hierarchical sectioning of a single work.

None of these problems are technically insurmountable. I hope Evernote comes out with enhancements soon. I would like to build Lila on top of Evernote. Lila has something to add. To be cognitive means an inherent ability to automate hierarchical classification. Lila will be able to suggest hierarchical views, different ways of understanding the data, different choices for what could be a table of contents.

Roy Tennant: Mea Culpa: Wikipedia Comes Through

planet code4lib - Thu, 2015-02-05 21:37
Recently I wrote about a Wikipedia issue based on the best information I had at the time. I know better than to rely upon sources that are disreputable, but I clearly made a mistake relying upon the reporting of The Guardian. Normally, I would check the actual source of the information, but I find Wikipedia’s management machinations to be difficult to find and parse. In short, I came to the wrong conclusion and this post is my attempt to set things right. In doing so, I am quoting heavily from a message from the Wikimedia Foundation:

Now that the case has closed, you may be interested in the Wikimedia Foundation’s summary of the issue:

Last night [some days ago now], the Arbitration Committee for English Wikipedia reached a final decision on the case: The Committee chose to issue one complete site ban for a male editor, citing a pattern of disruptive behavior that included more than 20 lesser sanctions since 2006. No other Wikipedia editors received site-wide bans.

We can confirm that in addition to a single site-wide ban, the Committee issued and endorsed nearly 150 warnings, sanctions, or topic bans to other editors from various sides of the case. We can clarify that of the eleven Committee-issued topic bans, only one was applied to an editor who identifies as female. All of the sanctioned editors have the right to appeal in the future: over the years, the Committee has approved appeals if they are found to no longer be necessary.

Some reporting portrayed this case as a referendum on Gamergate itself, or as a purge of women or feminist voices from Wikipedia. That mischaracterizes the case, the role of the volunteer Arbitration Committee, and the nature of their findings.

The Committee does not consider the content of articles, it only focuses on the behavior of editors. This decision was also not a purge. Only one user has been removed from Wikipedia. Finally, it is not intended as a referendum on Gamergate — what is right, what is wrong, and its place in broader discourse — and should not be understood that way. That discussion may be necessary, but it is better suited for another forum.

Wikipedia is an encyclopedia. It is also the largest free knowledge resource in human history — and it is written by people from all over the world, often from very different backgrounds, who may hold differing points of view. This is made possible thanks to a fundamental principle of mutual respect: respectful discourse, and respect for difference and diversity.

The Wikimedia Foundation offers resources for programs and outreach with our partners across the global Wikimedia movement, and engage people that have been underrepresented in traditional encyclopedias. These include women, people of color, people from the Global South, immigrant communities, and members of the LGBTQ community. They are invaluable contributors to our community and partners in our mission.

For Wikipedia to represent the sum of all knowledge, it has to be a place where people can collaborate and disagree constructively even on difficult topics. It has to be a place that is welcoming for all voices. This is essential to ensuring people are free to focus on being creative and constructive, and contributing to this remarkable collective human achievement.

For more on our stance on this issue, please see a blog post we released this week:

I am sorry for jumping the gun on this, and I am deeply sorry for mischaracterizing the situation. I hope that this post can help to rectify any damage I may have unwittingly caused. Mea culpa.

District Dispatch: FOIA is heating up!

planet code4lib - Thu, 2015-02-05 20:45

On Monday of this week, legislators introduced two bipartisan Freedom of Information Act (FOIA) bills, one in the U.S. House (H.R. 653) and one in the U.S. Senate (S. 337). Representative Darrell Issa (R-CA) introduced H.R. 653, with Elijah Cummings (D-MD) and Mike Quigley (D-IL) cosponsoring. The bill was referred to the House Committee on Oversight and Government Reform.

Photo by Craig Kohtz via Flickr

Action in the Senate was slightly more interesting: not only did Senator John Cornyn (R-TX) introduce S. 337 with Patrick Leahy (D-VT) and Charles Grassley (R-IA), the ranking member and chair of the Senate Judiciary Committee, cosponsoring, but the Senate Judiciary Committee today passed the bill out of committee!

Earlier today, ALA joined with forty-six other groups to state our support for these bills (pdf) and to thank these men for introducing them. As the letter states, “Public oversight is critical to ensuring accountability, and the reforms embodied in both the FOIA Oversight and Implementation Act (H.R. 653), introduced by Representatives Issa and Cummings, and the FOIA Improvement Act of 2015 (S.337), introduced by Senators Cornyn and Leahy, are necessary to enable that oversight.”

It’s exciting to see this legislation introduced and moving so early in the 114th Congress, and we will keep you informed as things move forward!

The post FOIA is heating up! appeared first on District Dispatch.

Cynthia Ng: Presentation: Making Accessible Content Easy and Part of Your Work

planet code4lib - Thu, 2015-02-05 18:41
This was originally presented as an online workshop for the Center for Instructional Development & Distance Education, University of Pittsburgh on February 5, 2015. If you’re familiar with my presentations, everything up to and including the introduction to universal design is quite similar to my past presentations (especially the most recent one). However, after that, …

District Dispatch: It’s a Big Deal: FCC Chairman outlines strong network neutrality protections

planet code4lib - Thu, 2015-02-05 18:21

Today Federal Communications Commission (FCC) Chairman Tom Wheeler will circulate his network neutrality proposal to fellow Commissioners in preparation for a February 26 vote. While we can’t read the detailed draft as it is not yet public, the Chairman did outline his plans in a Wired op-ed and fact sheet released yesterday. To paraphrase our Vice President, this is a Big Deal.

FCC Building in Washington, D.C.

“I am submitting to my colleagues the strongest open Internet protections ever proposed by the FCC. These enforceable, bright-line rules will ban paid prioritization, and the blocking and throttling of lawful content and services,” Chairman Wheeler writes.

Today, American Library Association (ALA) President Courtney Young responded: “I am very pleased that Chairman Wheeler’s outlined proposal matches the network neutrality principles ALA and nearly a dozen library and higher education groups called for last July. America’s libraries collect, create and disseminate essential information to the public over the Internet, and enable our users to create and distribute their own digital content and applications. Network neutrality is essential to meeting our mission in serving America’s communities and preserving the Internet as a platform for free speech, innovation, research and learning for all.”

In a nutshell, the proposal:

  • Asserts FCC authority under both Title II of the Communications Act and Section 706 of the Telecommunications Act of 1996 to provide the strongest possible legal foundation for network neutrality rules;
  • Applies network neutrality protections to both fixed and mobile broadband (which the ALA, Association of Research Libraries and EDUCAUSE advocated for—unsuccessfully—in the 2010 Open Internet Order and in our most recent filings);
  • Prohibits blocking or degrading access to legal content, applications, services and non-harmful devices; as well as banning paid prioritization, or favoring some content over other traffic;
  • Allows for reasonable network management while enhancing transparency rules regarding how Internet service providers (ISPs) are doing this;
  • Creates a general Open Internet standard for future ISP conduct;
  • Identifies major provisions of Title II that will apply and others that will be subject to forbearance (i.e., not enforced).

Among the provisions that will be enforced are sections that assert no “unjust and unreasonable practices” (Sections 201 and 202), protect consumer privacy (Section 222), protect people with disabilities (Sections 225 and 226) and parts of Section 254, which includes the E-rate program and other Universal Service Fund (USF) programs. After the recent successful completion of E-rate program modernization to better enable affordable access to high-capacity broadband through libraries and schools, ALA has a particular interest in safeguarding FCC authority related to the Universal Service Fund. We agree the new Order should not automatically apply any new USF fees, but we would like to better understand how a partial application of Section 254 will work in practice. We’re reaching out to the FCC on this question now.

As always, more information on libraries and network neutrality is available on the ALA website and we’ll keep blogging here on the Dispatch.

The post It’s a Big Deal: FCC Chairman outlines strong network neutrality protections appeared first on District Dispatch.

Open Knowledge Foundation: Open Knowledge Belgium: Bringing Together Open Communities, Policy Makers & Industry

planet code4lib - Thu, 2015-02-05 16:02

Open Knowledge Belgium to host The Second Edition of Open Belgium in Namur on Feb 23rd, 2015! Register Today!

On 23 February, Open Knowledge Belgium is hosting the second edition of Open Belgium, an event expected to attract over 200 people, coming together to learn and discuss the growing open knowledge movement in Belgium. This year Open Knowledge Belgium is hosting the conference, together with our Walloon colleagues and partners, at the Palais des Congrès in Namur.

The jam-packed programme is not to be missed! With over 35 speakers, the objective of the day is to unpack challenges, explore opportunities and learn about technological developments as they relate to Open Data and Open Knowledge. The event presents an ideal opportunity to exchange best practices with national and international experts.

The conference program includes:

The conference will open with a panel discussion on the state-of-play of open data and open knowledge in Belgium, followed by a series of keynote talks and eight participatory workshops!

State-of-play Session

A panel discussion on Open data in Belgium, with representatives from the federal and regional governments.

A Series of Keynotes

  1. Jörgen Gren of DG Connect on the future of Open Data in Europe
  2. Dimitri Brosens of the Institute of Nature and Forests (INBO) on becoming an open research institute
  3. Thomas Hermine (Nextride) and Antoine Patris (TEC) on how opening up Walloon public transport data offers new opportunities and economic value.

Eight Participatory Workshops:

Following the keynotes, participants will have the opportunity to participate in eight workshops focused on specific themes and organised by national and international experts.

  1. Open Transport, from data source to journey planner (moderated by Pieter Colpaert)
  2. Open Culture, tackling barriers with benefits (Barbara Dierickx)
  3. Open Tools, using tools to release the full Open Data potential (Philippe Duchesne)
  4. Open Tourism, the importance of framing the scheme online efforts (Raf Buyle)
  5. OpenStreetMap, the importance of working with communities (Ben Abelshausen)
  6. Open Science, going beyond open access (Gwen Franck)
  7. Local Open Data efforts in Belgium (Wouter Degadt)
  8. Emerging Open Data business models (Tanguy De LESTRE).

Open Knowledge Belgium will close the day with networking drinks on a rooftop terrace overlooking the city of Namur.

View the full programme and all the speakers on the website.

Practical information and registration

  • Date and Location: Monday, February 23, 2015 at the Namur Palais des Congrès
  • Admission: € 130 – Register online
  • Contact the organisers:

DPLA: DPLA at Code4Lib 2015

planet code4lib - Thu, 2015-02-05 16:00

Code4Lib is an annual, volunteer-organized conference focused on the intersection of technology and cultural heritage. DPLA is participating heavily in Code4Lib 2015, taking place on February 9 – 12 in Portland, Oregon. Here’s a handy guide detailing some of the key places DPLA staff will be and how you can connect with them.

  • Monday, February 9 (9 AM – noon): Tom Johnson (DPLA Metadata and Platform Architect) will lead a Linked Data Workshop with Karen Estlund (University of Oregon).
  • Monday, February 9 (1:30 – 4:30 PM): Tom Johnson, Mark Matienzo (DPLA Director of Technology), Mark Breedlove (DPLA Technology Specialist), Audrey Altman (DPLA Technology Specialist), Gretchen Gueguen (DPLA Data Services Coordinator), and Amy Rudersdorf (DPLA Assistant Director for Content) will lead an introductory workshop on the DPLA API.
  • Wednesday, February 11 (4:30 PM): Audrey Altman, Mark Breedlove, and Gretchen Gueguen will present on DPLA’s new ingestion system. The presentation is entitled, “Heiðrún: DPLA’s Metadata Harvesting, Mapping and Enhancement System.”

Beyond these formal opportunities to connect, these folks are eager to chat and answer questions about timely topics including the Community Reps application, DPLAfest 2015, and DPLA’s recent work upgrading its ingestion system.

In addition to staff participation, and in keeping with DPLA’s broader commitment to diversity, DPLA has also supported Code4Lib 2015 by helping to sponsor one of the Code4Lib 2015 Diversity Scholarships as part of the Code4Lib community.

Questions about where specific DPLA staffers will be at Code4Lib 2015? Drop one of us a line!

David Rosenthal: Disk reliability

planet code4lib - Thu, 2015-02-05 16:00
Two recent publications about disk reliability are of considerable interest. Continuing their exemplary tradition of transparency, Backblaze updated their 2013 report on their experience of disk failures with a report on 2014, and the raw data and a set of FAQs. And J-F Paris et al published Self-Repairing Disk Arrays. Below the fold, thoughts on the relationship between these two.

Backblaze now have over 41K drives ranging from 1.5TB to 6TB spinning. Their data for a year consists of 365 daily tables each with one row for each spinning drive, so there is a lot of it, over 12M records. The 4TB disk generation looks good:
We like every one of the 4 TB drives we bought this year. For the price, you get a lot of storage, and the drive failure rates have been really low. The Seagate Desktop HDD.15 has had the best price, and we have a LOT of them. Over 12 thousand of them. The failure rate is a nice low 2.6% per year. Low price and reliability is good for business.
The HGST drives, while priced a little higher, have an even lower failure rate, at 1.4%. It’s not enough of a difference to be a big factor in our purchasing, but when there’s a good price, we grab some. We have over 12 thousand of these drives.

It's too soon to tell about the 6TB generation:
Currently we have 270 of the Western Digital Red 6 TB drives. The failure rate is 3.1%, but there have been only 3 failures. ... We have just 45 of the Seagate 6 TB SATA 3.5 drives, although more are on order. They’ve only been running a few months, and none have failed so far.

What grabbed all the attention was the 3TB generation:
The HGST Deskstar 5K3000 3 TB drives have proven to be very reliable, but expensive relative to other models (including similar 4 TB drives by HGST). The Western Digital Red 3 TB drives annual failure rate of 7.6% is a bit high but acceptable. The Seagate Barracuda 7200.14 3 TB drives are another story.

Their 1163 Seagate 3TB drives with an average age of 2.2 years had an annual failure rate (AFR) over 40% in 2014. Backblaze's economics mean that they can live with a reasonably high failure rate:
Double the reliability is only worth 1/10th of 1 percent cost increase. ...

Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary - maybe $5,000 total? The 30,000 drives costs you $4m.

The $5k/$4m means the Hitachis are worth 1/10th of 1 per cent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more).

Moral of the story: design for failure and buy the cheapest components you can. :-)

40% AFR is really high, but labor to replace the failed drives would still have cost less than $8/drive. The cost isn't the interesting aspect of this story. The drives would have failed at some point anyway, incurring the replacement labor cost. The 40% AFR just meant the labor cost, and the capital cost of new drives, was incurred earlier than expected, reducing the return on the investment in purchasing those drives.
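The arithmetic in the Backblaze quote can be checked with a short sketch. The input figures are the ones quoted above; the variable names, the implied hourly rate, and the reading of "less than $8/drive" as labor per drive in the fleet per year at a 40% AFR are assumptions made for illustration, not something either source states.

```python
# Figures quoted from Backblaze above; all names here are illustrative.
MINUTES_PER_SWAP = 15          # "Replacing one drive takes about 15 minutes"
FLEET_SIZE = 30_000            # drives
FLEET_COST_USD = 4_000_000     # "The 30,000 drives costs you $4m"
SAVING_USD = 5_000             # "maybe $5,000 total" for going from 2% to 1%


def replacement_hours(drives: int, annual_failure_rate: float) -> float:
    """Hours of labor per year spent swapping failed drives."""
    return drives * annual_failure_rate * MINUTES_PER_SWAP / 60


hours_at_2pct = replacement_hours(FLEET_SIZE, 0.02)
hours_at_1pct = replacement_hours(FLEET_SIZE, 0.01)
hours_saved = hours_at_2pct - hours_at_1pct

# Assumption: the $5,000 saving pays for exactly the hours saved,
# which implies an hourly labor cost and a cost per drive swap.
hourly_rate = SAVING_USD / hours_saved
cost_per_swap = hourly_rate * MINUTES_PER_SWAP / 60

# The saving as a fraction of the purchase price ("1/10th of 1 percent").
premium = SAVING_USD / FLEET_COST_USD

# Assumption: labor per drive in the fleet, per year, at a 40% AFR.
labor_per_drive_at_40pct = 0.40 * cost_per_swap

print(f"hours at 2% AFR:          {hours_at_2pct:.0f}")
print(f"hours saved at 1% AFR:    {hours_saved:.0f}")
print(f"saving / purchase price:  {premium:.3%}")
print(f"labor per drive, 40% AFR: ${labor_per_drive_at_40pct:.2f}")
```

Under these assumptions the 150 hours and the roughly one-tenth of one percent both fall out directly, and the labor figure at 40% AFR stays under the $8 per drive mentioned above.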

Alas, there is a long history of high failure rates among particular batches of drives. An experience similar to Backblaze's at Facebook is related here, with an AFR over 60%. My first experience of this was nearly 30 years ago in the early days of Sun Microsystems. Manufacturing defects, software bugs, mishandling by distributors, vibration resonance: there are many causes for these correlated failures. It is the correlated failures that make the interesting connection with the Self-Repairing Disk Arrays paper.

The first thing to note about the paper is that Paris et al are not dealing with Backblaze-scale arrays:
These solutions are not difficult to implement in installations that have trained personnel on site round-the-clock. When this is not the case, disk repairs will have to wait until a technician can service the failed disk. There are two major disadvantages to this solution. First, it introduces an additional delay, which will have a detrimental effect on the reliability of the storage system. Second, the cost of the service call is likely to exceed that of the equipment being replaced.

4-slot Drobo

The first problem with the paper is that there has been a technological solution to this problem for a decade since Data Robotics (now Drobo) introduced the Drobo. I've been using them ever since. They are available in configurations from 4 to 12 slots and in all cases when a drive fails the light by the slot flashes red. All that is needed is to pull out the failed drive and push in a replacement disk the same size or bigger. The Drobo's firmware handles hot-swapping and recovers the failed drive's data with no human intervention. No technician and much less than 15 minutes per drive needed.

The second problem is that although the paper's failure model is based on 2013 failure data from Backblaze, it appears to assume that the failures are uncorrelated. The fact that errors in storage systems are correlated has been known since at least the work of Talagala at Berkeley in 1999. Correlated failures such as those of the 3TB Seagate drives at Backblaze in 2014 would invalidate the paper's claim that:
we have shown that several complete two-dimensional disk arrays with n parity disks, n(n – 1)/2 data disks, and less than n(n + 1)/2 data disks could achieve a 99.999 percent probability of not losing data over four years.

A 99.999 percent probability would mean that only 1 in 100,000 arrays would lose data in 4 years. But the very next year's data from their data source would probably have caused most of the arrays to lose data. When designing reliable storage, the failure model needs to be pessimistic, not average. And it needs to consider correlated failures, which is admittedly very hard to do.
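To make the disk counts in the quoted claim concrete, here is a minimal sketch. It assumes one common reading of a complete two-dimensional array, in which each data disk is protected by a distinct pair of parity disks; that reading and the function names are assumptions, not details taken from the paper. It also restates the 99.999 percent figure as an expected number of arrays losing data.

```python
from fractions import Fraction
from itertools import combinations


def data_disk_layout(n_parity: int) -> list[tuple[int, int]]:
    """One data disk per unordered pair of parity disks (assumed layout)."""
    return list(combinations(range(n_parity), 2))


def expected_losses(arrays: int, survival_probability: Fraction) -> Fraction:
    """Expected number of arrays that lose data, if arrays fail independently."""
    return arrays * (1 - survival_probability)


for n in (4, 6, 8):
    layout = data_disk_layout(n)
    # The pair count matches the n(n - 1)/2 in the quoted sentence.
    assert len(layout) == n * (n - 1) // 2
    print(f"n={n}: {len(layout)} data disks, n(n+1)/2 = {n * (n + 1) // 2}")

five_nines = Fraction(99_999, 100_000)
print(expected_losses(100_000, five_nines))  # 1
```

The independence assumption in `expected_losses` is exactly the one questioned above: a bad batch of drives breaks it.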

HangingTogether: The scholarly record: a view from the campus

planet code4lib - Thu, 2015-02-05 15:59

[Thanks to Geneva Henry, University Librarian and Vice Provost for Libraries at the George Washington University, for contributing this guest blog post.]

Geneva Henry, George Washington University

While many may think of the scholarly record as the products surrounding scholarly works that are eventually disseminated, usually through publications, there is another aspect to the scholarly record that people at academic institutions – especially administrators – care about. This can be thought of as the campus scholarly record that frames the identity of an institution. In considering this perspective, there is an even more compelling reason to consider how the many activities surrounding scholarly dissemination are captured and managed. The libraries at academic institutions are arguably the obvious leaders to assume responsibility for managing these resources; libraries have been the stewards of the scholarly record for a very long time.  But librarians must now recognize the changing nature of the elements of that record and take a proactive role in its capture and preservation. Moreover, they have a responsibility to the many campus stakeholders who have an interest in these resources for differing and sometimes conflicting purposes.

Research activities and early dissemination of findings have changed with the proliferation of social media and the Web. Scholars can exchange information via blog posts, Twitter messages, Facebook posts and every other means of social media available, with feedback from colleagues helping to refine the final formal publication. The traditional methods of peer review are now being further enhanced through web-based prepublications and blogs where reviewers from anywhere can provide less formal feedback to authors. For an increasing number of scholars, social media is the new preprint. Data is posted and shared, comments are exchanged, methods are presented and questioned, revisions happen and the process can continue, even after the “formal” publication has been released in a more traditional form. This requires librarians to think about how they’re preserving their websites and social media outputs that now need to be part of the scholarly record as well as the overall campus record of scholarship.

The campus is full of stakeholders who have an interest in this new, constantly evolving record. Some would like all of this information fully exposed to publicize the work being done, while others feel that there are limits to how much should be made available for everyone to view. Systems such as VIVO and Elements provide platforms that will highlight faculty activities to provide more visibility into the research activities on campus. Sponsored research offices want insights into what people are doing so that they can match research opportunities with relevant researchers and help with identifying partners at other institutions. Media relations staff want to identify experts as media inquiries come in related to current issues happening in the world. Academic departments are interested in showcasing the scholarly record of their faculty in order to attract more graduate students and new faculty to their departments. Promotion and tenure committees want a full understanding of all of the activities of faculty members, including their service activities; increasingly, social media is blurring the line between scholarship and service as one feeds into the other.

Faculty members, the source of creating these resources, are understandably confused. Their attitudes and perceptions range from excited to worried, from protective to open. Their activities on social media do not always relate cleanly to a single scholarly record and will often be mixed with personal, non-scholarly information they may not want the world to see (e.g. pictures of their dinner, political commentaries, stories of their family vacation). This mixed landscape helps to fuel the legal concerns of an institution’s general counsel and the image consciousness of the public relations folks who are cautious about what might end up in the public with the exercising of academic freedom.

Circling back, now, to the library as the logical keeper of the academic record, it is important to realize that there is a vast range of stakeholders that the records serve. These stakeholders become partners with the library in helping to determine what information will be kept, what will be exposed and what needs to remain in restricted access. Partnerships with campus IT units that manage security and authoritative feeds from enterprise systems are critical. Sometimes some stakeholders will ask that exposed information be “redacted” from its online availability and librarians must be able to intelligently communicate the limits of successfully removing this from the world wide web.

The change in the scholarly record raises many questions and will continue to present challenges for libraries and academic institutions. As faculty change institutions, who will be responsible for managing their record of scholarship that is disseminated through social media so that it is preserved long-term? Constantly changing methods for communicating and sharing knowledge will require a scholarly record that can readily accommodate innovations. What will the scholarly record of the future be and what should be captured? While we don’t have a crystal ball to help with this prediction, we do have a good barometer surrounding us in our libraries every day: study your students and how they communicate.

CrossRef: A FundRef Progress Report

planet code4lib - Thu, 2015-02-05 14:39

In the two years since the launch of FundRef we have been helping participating publishers with their implementations and listening to their feedback. As is often the case with new services, we have found that some of our original assumptions now need tweaking, and so the FundRef Advisory Group (made up of representatives from a dozen or so publishers and funding agencies) has been discussing the next phase of FundRef. I'd like to share some of our findings and proposals for improving the service.

When CrossRef launched FundRef, the FundRef Registry - the openly available taxonomy of standardised funder names that is central to the project - contained around 4,000 international funders. In the past 24 months this has doubled to over 8,000, thanks to input from funders and publishers and the ongoing work of the team at Elsevier who curate and update the list. There are over 170,000 articles with properly identified funders. Unfortunately, there are also over 400,000 articles with a funder name that hasn't been matched to the Registry and doesn't have a Funder Identifier. While a number of publishers are routinely supplying Funder IDs in all of their deposits, some are only managing to supply Funder IDs in as few as 30% of cases. Funder IDs are critical to FundRef - they allow us to collate and display the data accurately. Analysis shows that the deposits we are receiving without IDs fall into roughly three categories:

  1. Funder names that are in the Registry but have not been matched to an ID
  2. Entries into the funder name field that are clearly grant numbers, program names, or free-form text that has been entered or extracted incorrectly
  3. Funders that are not yet listed in the Registry.

At the outset we expected most of the deposits with no IDs to be a result of the third of these use cases. What we are finding, however, is that the vast majority are a result of the first two. Delving into this a little more and talking to publishers about their processes and experiences, we have identified the following reasons:

  • Where authors are asked to input funding data on submission or acceptance of their paper, the margin for error appears to be quite high. They are not used to being asked for this data, and so very clear instructions are needed to stress its importance and ensure that they understand what it is they are being asked for. Authors should be strongly encouraged to pick their funding sources from the list in the FundRef Registry, but presenting a list of 8000+ funder names in a navigable, straightforward way is not without its challenges. Back in 2013 CrossRef worked with UI experts to develop a widget that publishers and their vendor partners could use - either outright or as a guideline - for collecting data from authors. Two years down the line we are reviewing this UI to see how we can further encourage authors to select the canonical funder name and only enter a free-form funder if it is genuinely missing from the Registry. Even with the most intuitive of interfaces, however, some authors will copy and paste an alternative name, or enter a program name instead of a funding body. Editorial and production staff should be aware of FundRef requirements and incorporate this metadata into their routine reviews. 
  • Some publishers have opted to extract funding data from manuscripts instead of asking authors to supply it in a form. This is perfectly acceptable - after all, the information is usually right there in the paper's acknowledgements. However, this process also needs to be accompanied by a certain amount of QA. We are seeing instances of grant numbers being extracted instead of funders, funder names that are concatenated into a single field, and funder names that are 100% accurate but have simply not been matched with IDs ahead of deposit. (In the CrossRef database we currently have 16,989 FundRef deposits that contain the name "National Natural Science Foundation of China" but have no accompanying ID. These are clearly slipping through the QA net.)

So what are we going to do to try and improve things?

Firstly, we are undertaking a review of our own UI and talking with vendors about changes that might encourage better data input by authors. We are also going to find out more about what processes are being undertaken by the publishers that are depositing consistently accurate data, and share these with the publishing community as a set of best practices. Whether publishers are asking authors or extracting the data from manuscripts, an element of QA seems to be critical to ensure the integrity of the data being deposited.

Secondly, we are going to start on some data "tidying" tasks at our end. Traditionally, CrossRef has not altered or corrected any of the data that publishers deposit: we provide error reports and ask that they make the corrections themselves. But with FundRef there seem to be a few quick wins - those 16,000 instances of the National Natural Science Foundation of China could easily and without ambiguity be matched to the correct FundRef ID, along with other names that have some very obvious minor discrepancies ("&" in place of "and", "US" instead of "U.S."). Cleaning up these deposits and adding the Funder IDs should result in a significant increase in the amount of FundRef data that is retrievable through FundRef Search and our Search API (and by extension, CHORUS Search).
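The kind of unambiguous matching described here can be sketched roughly as follows. This is not CrossRef's actual process: the registry contents, the placeholder IDs, and the function names are all hypothetical, and the normalization covers only the two discrepancies mentioned above ("&" versus "and", "US" versus "U.S.") plus case and spacing.

```python
import re
from typing import Optional

# Hypothetical registry excerpt: canonical funder name -> placeholder ID.
# Real Funder IDs are deliberately not reproduced here.
REGISTRY = {
    "National Natural Science Foundation of China": "placeholder-id-0001",
    "U.S. Example Research Agency": "placeholder-id-0002",
    "Example Science and Technology Fund": "placeholder-id-0003",
}


def normalize(name: str) -> str:
    """Reduce a funder name to a comparison key."""
    key = name.casefold()
    key = key.replace("&", " and ")        # "&" in place of "and"
    key = re.sub(r"[^\w\s]", "", key)      # drop punctuation: "U.S." -> "us"
    return " ".join(key.split())           # collapse runs of whitespace


# Build the lookup once, keyed by normalized canonical name.
INDEX = {normalize(name): funder_id for name, funder_id in REGISTRY.items()}


def match_funder(deposited_name: str) -> Optional[str]:
    """Return the registry ID for an exact normalized match, else None."""
    return INDEX.get(normalize(deposited_name))


print(match_funder("National Natural Science Foundation of China"))
print(match_funder("US Example Research Agency"))
print(match_funder("Example Science & Technology Fund"))
print(match_funder("Grant No. 12345"))  # a grant number stays unmatched
```

Matching only on exact normalized keys keeps the clean-up conservative: grant numbers, program names, and concatenated funder strings fall through as unmatched and still need human QA.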

We are also asking publishers to continue to review their own processes and procedures to see where improvements can be made, as the success of FundRef ultimately depends on the quality of the data that is fed into it.

LITA: Share Your Committee and IG Activities on the LITA Blog!

planet code4lib - Thu, 2015-02-05 13:00

The LITA Blog features original content by LITA members on technologies and trends relevant to librarians. The writers represent a variety of perspectives, from library students to public, academic, and special librarians.

The blog also delivers announcements about LITA programming, conferences, and other events, and serves as a place for LITA committees to share information back with the community if they so choose.

Sharing on the LITA blog ensures a broad audience for your content. Four recent LITA blog posts (authored by Brianna Marshall, Michael Rodriguez, Bryan Brown, and John Klima) have been picked up by American Libraries Direct – and most posts have been viewed hundreds of times and shared dozens of times on social media. John Klima’s post on 3D printers has been shared 40 times from the LITA Twitter account and another 40 times directly from the blog (a cumulative record), Bryan Brown’s post on MOOCs has been viewed over 800 times (also a record as of this writing), and Michael Rodriguez’s post on web accessibility was shared over 60 times direct from the blog (another record).

Anyone can write a guest post for the LITA Blog, even non-LITA members, as long as the topic is relevant. Would you like to write a guest post or share posts reflecting the interests of your committee or interest group? Contact blog editor Brianna Marshall at briannahmarshall(at)gmail(dot)com or Mark Beatty at mbeatty(at)ala(dot)org.

Peter Murray: Thursday Threads: Web Time Travel, Fake Engine Noise, The Tech Behind Delivering Pictures of Behinds

planet code4lib - Thu, 2015-02-05 11:21

In this week’s DLTJ Thursday Threads: the introduction of a web service that points you to old copies of web pages, dispelling illusions of engine noise, and admiring the technical architecture of Amazon Web Services that gives us the power to witness Kim Kardashian’s back side.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to Pinboard are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

Introducing the Memento Web Time Travel Service

The Time Travel service helps you find versions of a page that existed at some time in the past. These prior versions of web pages are named Mementos. Mementos can be found in web archives or in systems that support versioning such as wikis and revision control systems.

When you enter the web address of a page and a time in the past, the Time Travel service tries to find a Memento for that page as it existed around the time of your choice. This will work for addresses of pages that currently exist on the web but also for those that have meanwhile vanished.

- About the Time Travel Service, Last updated: 19-Jan-2015

The folks at Los Alamos National Laboratory have been working on web-time-travel for years. What started with browser plugins has now become a web service that can be used to find old copies of web pages found in caches throughout the world. Thought the Internet Archive’s Wayback Machine was the only game in town? Check out the Memento Time Travel service.
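For readers curious how "a web address and a time in the past" turn into a request: the Memento protocol (RFC 7089) carries the desired time in an Accept-Datetime request header, formatted as an HTTP date in GMT. The sketch below only builds such a request and sends nothing. The TimeGate address is a placeholder, and appending the target address to it is a common pattern rather than something the protocol requires; both are assumptions, not the Time Travel service's documented endpoint.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical TimeGate base address, used only for illustration.
TIMEGATE = "https://timegate.example.org/timegate/"


def memento_request(target_url: str, when: datetime) -> tuple[str, dict[str, str]]:
    """Return the URL and headers for a datetime-negotiation request.

    `when` should be timezone-aware; a naive datetime would be
    interpreted as local time by astimezone().
    """
    when_utc = when.astimezone(timezone.utc)
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    return TIMEGATE + target_url, headers


url, headers = memento_request(
    "http://example.com/",
    datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc),
)
print(url)
print(headers["Accept-Datetime"])  # Thu, 31 May 2007 20:35:00 GMT
```

A TimeGate that supports the protocol would answer such a request by pointing to the Memento closest to the requested time.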

America’s best-selling cars and trucks are built on lies: The rise of fake engine noise

Stomp on the gas in a new Ford Mustang or F-150 and you’ll hear a meaty, throaty rumble — the same style of roar that Americans have associated with auto power and performance for decades.

It’s a sham. The engine growl in some of America’s best-selling cars and trucks is actually a finely tuned bit of lip-syncing, boosted through special pipes or digitally faked altogether. And it’s driving car enthusiasts insane.

- America’s best-selling cars and trucks are built on lies: The rise of fake engine noise, by Drew Harwell, The Washington Post, 21-Jan-2015

I knew they were adding “engine noise” to the hybrid Prius because it was so quiet that it could startle people, but I didn’t know it was happening to so-called “muscle cars”.

A look at Amazon’s world-class data-center ecosystem

Amazon VP and Distinguished Engineer James Hamilton shares what makes the company’s armada of data centers run smoothly.

- A look at Amazon’s world-class data-center ecosystem, by Michael Kassner, TechRepublic, 8-Dec-2014

Among the geek community, there must be some awe at how Amazon seems to create infinitely big data centers that can be used for everything from powering Netflix to this humble blog. Amazon is also notoriously secret about how it does things. This article provides a glimpse into how Amazon Web Services achieves the scale that it does.

How PAPER Magazine’s web engineers scaled their back-end for Kim Kardashian

On November 11th 2014, the art-and-nightlife magazine PAPER “broke the Internet” when it put a Jean-Paul Goude photograph of a well-oiled, mostly-nude Kim Kardashian on its cover and posted the same nude photos of Kim Kardashian to its website (NSFW). It linked together all of these things—and other articles, too—under the “#breaktheinternet” hashtag. There was one part of the Internet that PAPER didn’t want to break: The part that was serving up millions of copies of Kardashian’s nudes over the web.

Hosting that butt is an impressive feat. You can’t just put Kim Kardashian nudes on the Internet and walk away —that would be like putting up a tent in the middle of a hurricane. Your web server would melt. You need to plan.

- How PAPER Magazine’s web engineers scaled their back-end for Kim Kardashian (SFW), by Paul Ford, Medium, 21-Jan-2015

Speaking of how Amazon can seemingly scale to infinite levels, this article tells the story of how one online publisher ramped up their server capacity to meet the demands of users flocking to see Kim Kardashian’s rear end. (And who said the internet wasn’t a valuable tool…)

Nicole Engard: Bookmarks for February 4, 2015

planet code4lib - Wed, 2015-02-04 20:30

Today I found the following resources and bookmarked them:

  • Greenfoot: Teach and learn Java programming
  • Blockly Games: Blockly Games is a series of educational games that teach programming. It is designed for children who have not had prior experience with computer programming. By the end of these games, players are ready to use conventional text-based languages.
  • Blockly: Blockly is a library for building visual programming editors

The post Bookmarks for February 4, 2015 appeared first on What I Learned Today....

HangingTogether: The Evolving Scholarly Record, Washington, DC edition

planet code4lib - Wed, 2015-02-04 19:32


Brian Lavoie, presenting in the GWU International Brotherhood of Teamsters Labor History Research Center

On December 10th, we held our second Evolving Scholarly Record Workshop at George Washington University in Washington, DC (you can read Ricky Erway’s summary of the first workshop, starting here). Many thanks to Geneva Henry and all the staff at GWU for hosting us in the fabulous International Brotherhood of Teamsters Labor History Research Center. This workshop, and others, build on the framework presented in the OCLC Research report, The Evolving Scholarly Record.

At George Washington Gelman library attending OCLC workshop #esrworkshop

— Martha Kyrillidou (@kyrillidou) December 10, 2014

Our first speaker, Brian Lavoie (OCLC Research), presented the ESR Framework and put it into context. What is considered part of the record is constantly expanding – for example, blogs and social media, which would previously not have been included. The evolution of how scholarship is recorded makes it challenging to organize the record in consistent and reliable ways. The ecosystem of stakeholders is evolving as well. It became clear to Brian and others involved in discussions around the problem space that a framework was necessary in order to support strategic discussions across stakeholders and across domains.

Formats shift (print to dig), boundaries blur (from books to data sets), characteristics change (static works to dynamic) #esrworkshop

— Keith Webster (@CMKeithW) December 10, 2014

In addition to traditional scholarly outcomes, the framework has two further areas of focus: process and aftermath.

Process is what leads up to the publication of the outcomes – in the framework, process is composed of method, evidence and discussion (important because outcomes usually consolidate thanks to discussions with peers). Anchoring outcomes in process will help reproducibility. Scholarly activities continue in aftermath: discussion (including post publication reviews and commentaries), revision (enhancement, clarification), re-use (including repackaging for other audiences).

In the stakeholder ecosystem, the traditional roles (create, fix, collect, use) are being reconfigured. For example, in addition to libraries, service providers like Portico and JSTOR are now important in the collect role. Social media and social storage services, which are entirely outside the academy, are now part of create and use. New platforms, like figshare, are taking on the roles of fix and collect. The takeaway here? The roles are constant, but the configurations of the stakeholders beneath them are changing.

How does the traditional "scholarly record" fit into today's consumer-producer model of web content? Not well, it seems. #esrworkshop

— Scott W. H. Young (@hei_scott) December 10, 2014

Our second speaker, Herbert van de Sompel (Los Alamos National Laboratory), offered a perspective from the network point of view. His talk was a modified reprise of his presentation at the June OCLC / DANS workshop in Amsterdam, which Ricky nicely summarized in a previous posting. Herbert will also be speaking at our workshop coming up in March, so if you’d like to catch him in action, sign up for that session.

Wow @hvdsomp is basically reading my mind. "The web will fundamentally change from human-readable to machine-actionable." #esrworkshop

— Scott W. H. Young (@hei_scott) December 10, 2014

Our third speaker was Geneva Henry (George Washington University), who represented the view from the campus. We will be posting her viewpoint in a separate blog post later this week, but her remarks touched on the various campus stakeholders in the scholarly record: scholars, media relations, promotion and tenure committee, the office of research, and the library.

Look to your students: How are they communicating? asks Henry. They’re future scholars; don’t expect drastic behavior changes #esrworkshop

— Mark Newton (@libmark) December 10, 2014

Daniel Hook (Digital Science) shared his “view from the platform.” (Digital Science is the parent company of several platform services, such as figshare, Altmetric, Symplectic Elements, and Overleaf.) Daniel stressed the importance of transparency and reproducibility in research – there is a need for a demonstrable pay-off for investors in research. There is a delicate balance to be reached between collaboration and competition in research. We are in an era of increased collaboration, and the “fourth age of research” is marked by international collaboration. Who “owns” research, and the scholarly record? Individual researchers? Their institutions? Evaluation of research increasingly calls for demonstrating the impact of research. Identifiers are glue – identifiers for projects, for researchers, for institutions. The future will be in dynamically making assertions of value and impact across institutions, and in building confidence in those assertions.

Funders keen to assess impact of research they funded at a macro-level – have you influenced policy? The economy? #esrworkshop

— Keith Webster (@CMKeithW) December 10, 2014

Finally, Clifford Lynch (Coalition for Networked Information) gave some additional remarks, highlighting stress points. Potentially, the scholarly record is huge, especially with an expanded range of media and channels. The minutes of science are being recorded every minute, year in, year out. Selection issues are challenging, to say the least. Is it sensible to consider keeping everything? Cliff called for hard questions to be asked, and for studies to be done. Some formats seem to be overlooked, video for example.

We concluded the meeting with a number of break-out sessions that took up focused topics. The groups came back with tons of notes, and also some possible “next steps” or actions that could be taken to move us forward. Those included:

  • Promulgating name identifiers and persistent IDs for use by other stakeholders
  • Focusing on research centers and subject/disciplinary repositories to see what kinds of relationships are needed
  • Mining user studies/reviews to pull out research needs/methods/trends/gaps and find touch-points to the library
  • Following the money in the ESR ecosystem to see whether there are disconnects between shareholder interests and scholar value
  • Pursuing with publishers whether they will collect the appropriate contextual processes and aftermaths
  • Investigating funding, ROI, and financial tradeoffs
  • Getting involved during the grant planning processes so that materials flow to the right places instead of needing to be rescued after the fact

Thanks to all of our participants, but particularly to our hosts, our speakers, our notetakers and those who helped record the event on Twitter. We’re looking forward to another productive workshop in Chicago (in March), and we then expect to cap off the three workshops with the ESR workshop in San Francisco (in June), where we’ll focus on how we can work together to ensure stewardship of the scholarly record now and into the future.

I have 19 pages of notes from 5 breakout sessions. We'll boil down discussions and talking points for next steps soon #esrworkshop

— Merrilee Proffitt (@MerrileeIAm) December 10, 2014

The @DANSKNAW & #esrworkshop events on archiving the future scholarly record have given me focus on tech aspects that must be tackled first

— Herbert (@hvdsomp) December 11, 2014

About Merrilee Proffitt


LITA: Jobs in Information Technology: February 4

planet code4lib - Wed, 2015-02-04 17:58

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Digital Initiatives Librarian, University of North Carolina Wilmington, Wilmington, NC

Director of Library Services, Marymount California University, Rancho Palos Verdes, CA

Sr. UNIX Systems Administrator, University Libraries, Virginia Tech, Blacksburg, VA

Technology Projects Coordinator, Oak Park Public Library, Oak Park, IL

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.


