Feed aggregator

HangingTogether: Celebrate Fair Use Week – by not getting in the way

planet code4lib - Tue, 2016-02-23 16:03

It’s Fair Use Week and lots of libraries are getting on board, offering workshops, infographics, tips, and drop-in office hours that are all geared towards encouraging fair use of copyrighted materials. For some awesome examples, check out the #fairuseweek2016 hashtag on Twitter.

This is great and good activity but also a reminder (to me) that all too often, libraries, archives, and museums can be unnecessary gatekeepers when it comes to cultural heritage. We blogged about this last year and pointed to Michelle Light’s talk Controlling Goods or Promoting the Public Good: Choices for Special Collections in the Marketplace. The article calls for an end to inappropriate control of intellectual property rights, and for us to change our practices around charging permission fees for use of archival materials.

Boy Scouts – With giant American flag. From The New York Public Library,

Earlier this year, and with much fanfare, the New York Public Library announced that they had released digital access to their public domain materials, making it easy for the public to use and reuse more than 180,000 digitized items. This was an important milestone to be sure, but perhaps hidden amidst the excitement about “free for all” was the fact that NYPL also does not put restrictions around use of materials that are in copyright or where copyright status is unknown. They have provided a nice request that you credit NYPL and link back to the item in NYPL Digital Collections (and, they make it dead easy to get that link in their system).

So, as you cook up your own celebrations during Fair Use Week, I encourage you to think about other ways you can empower researchers and other users, and consider how you can get out of the way in reproductions and permissions practices (and become one of the Good Guys).


About Merrilee Proffitt

FOSS4Lib Upcoming Events: Archivematica Camp

planet code4lib - Tue, 2016-02-23 15:58
Date: Wednesday, August 24, 2016 - 08:00 to Friday, August 26, 2016 - 17:00
Supports: Archivematica

Last updated February 23, 2016. Created by Peter Murray on February 23, 2016.

From the announcement:

Artefactual Systems is pleased to announce the first ever Archivematica Camp, which will be hosted by the University of Michigan School of Information, in Ann Arbor, Michigan, August 24 – 26, 2016.

DPLA: DPLAfest 2016 Agenda Now Available

planet code4lib - Tue, 2016-02-23 15:30

The agenda for DPLAfest 2016 is now available! Taking place in the heart of Washington, DC, DPLAfest 2016 (April 14-15) will bring together hundreds from DPLA’s large and growing community for interactive workshops, hackathons and other collaborative activities, engaging discussions with community leaders and practitioners, fun events, and more.

This year’s DPLAfest will feature dozens of fascinating conversations with leading thinkers and doers and those who care about the world of libraries, archives, museums, and our shared culture. One such session, Authorship in the Digital Age, will include three prominent contemporary authors, Virginia Heffernan, Robin Sloan, and Craig Mod. Another session, The Future of Libraries, will bring together leading library figures including Richard Reyes-Gavilan (DC Public Library) and Amy Garmer (Aspen Institute). Other sessions will cover such topics as eBooks, major technology trends, groundbreaking projects taking place within the DPLA’s extended network of institutions, and much more.

The third-ever DPLAfest will appeal to anyone interested in libraries, technology, ebooks, education, creative reuse of cultural materials, law, open access, digitization and digital collections, and family research. Area institutions serving as co-hosts include the National Archives and Records Administration, the Library of Congress, and the Smithsonian Institution, and the leaders of those institutions will be speaking at the fest.

To attend DPLAfest 2016, register today.

We look forward to seeing you in DC!

View DPLAfest 2016 Agenda

Mark E. Phillips: How many of the EOT2008 PDF files were harvested in EOT2012

planet code4lib - Tue, 2016-02-23 15:02

In my last post I started looking at some of the data from the End of Term 2012 Web Archive snapshot that we have at the UNT Libraries.  For more information about EOT2012 take a look at that previous post.

EOT2008 PDFs

From the EOT2008 Web archive I had extracted the 4,489,675 unique (by hash) PDF files that were present and carried out a bit of analysis on them as a whole to see if there was anything interesting I could tease out.  The results of that investigation I presented at an IS&T Archiving conference a few years back.  The text from the proceedings for that submission is here and the slides presented are here.

Moving forward several years,  I was curious to see how many of those nearly 4.5 million PDFs were still around in 2012 when we crawled the federal Web again as part of the EOT2012 project.

I used the same hash dataset from the previous post to do this work, which made things very easy.  I first pulled the hash values for the 4,489,675 PDF files from EOT2008.  Next I loaded all of the hash values from the EOT2012 crawls. The final step was to iterate through each of the PDF file hashes and do a lookup to see if that content hash is present in the EOT2012 hash dataset.  Pretty straightforward.
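For readers who want to try something similar, here is a minimal sketch of that lookup in Python. It is not the script used for this post: the function name and the tiny hash values are made up for illustration, and it assumes both hash lists fit in memory.

```python
def count_found(eot2008_pdf_hashes, eot2012_hashes):
    """Count how many 2008 PDF content hashes also appear in the 2012 crawl."""
    # A set gives fast membership tests, which matters with millions of hashes.
    seen_2012 = set(eot2012_hashes)
    found = sum(1 for h in eot2008_pdf_hashes if h in seen_2012)
    missing = len(eot2008_pdf_hashes) - found
    return found, missing


# Made-up hash values, purely for illustration (not real EOT data).
pdfs_2008 = ["aaa111", "bbb222", "ccc333", "ddd444"]
crawl_2012 = ["bbb222", "eee555", "ddd444", "fff666"]

found, missing = count_found(pdfs_2008, crawl_2012)
print(found, missing)
```

Loading the larger dataset into a set once and then testing each 2008 hash against it keeps the whole job to a single pass over each list.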


After the numbers finished running,  it looks like we have the following.

             PDFs         Percentage
  Found      774,375      17%
  Missing    3,715,300    83%
  Total      4,489,675    100%

Put into a pie chart where red equals bad.

EOT2008 PDFs in EOT2012 Archive

So 83% of the PDF files that were present in 2008 are not present in the EOT2012 Archive.
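As a quick sanity check on those figures, the percentages can be recomputed from the counts reported above. This is just arithmetic on the published numbers; the variable names are my own.

```python
found = 774_375
missing = 3_715_300
total = found + missing

# The two categories should add up to the full set of unique 2008 PDFs.
assert total == 4_489_675

found_pct = 100 * found / total
missing_pct = 100 * missing / total

print(f"Found:   {found_pct:.1f}%")
print(f"Missing: {missing_pct:.1f}%")
```

To one decimal place that works out to roughly 17.2% found and 82.8% missing, which rounds to the 17% and 83% quoted.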

With a little work it wouldn’t be hard to see how many of these PDFs are still present on the web today at the same URL as in 2008.  I would imagine it is a much smaller number than the 17%.

A thing to note about this is that because I am using content hashes and not URLs, it is possible that an EOT2008 PDF is available at a different URL entirely in 2012 when it was harvested again. So the URL might not be available but the content could be available at another location.
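A small sketch of why that matters: a content hash depends only on the bytes of the file, not on where it was found. The URLs and payload below are invented, and SHA-1 is an assumption on my part (the post does not say which hash function the dataset uses).

```python
import hashlib


def content_hash(payload: bytes) -> str:
    """Hash the file contents only; the URL plays no part."""
    return hashlib.sha1(payload).hexdigest()


# Hypothetical captures: same bytes, different locations.
capture_2008 = {
    "url": "http://example.gov/reports/annual.pdf",
    "payload": b"%PDF-1.4 identical bytes",
}
capture_2012 = {
    "url": "http://example.gov/archive/2008/annual.pdf",
    "payload": b"%PDF-1.4 identical bytes",
}

same_url = capture_2008["url"] == capture_2012["url"]
same_content = content_hash(capture_2008["payload"]) == content_hash(capture_2012["payload"])
print(same_url, same_content)
```

In this made-up case a URL-based comparison would report the PDF as gone, while the hash-based comparison counts it as found.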

If you have questions or comments about this post,  please let me know via Twitter.

LibUX: Value vs. Feasibility — How to Prioritize Web Projects with Multiple Stakeholders

planet code4lib - Tue, 2016-02-23 14:35

I’m a higher education library employee at West Virginia University (WVU) Libraries. I work as a user experience architect / designer on a small development team comprised of a full-stack developer, a software developer, a graduate assistant (GA) junior developer, a GA designer, and a web librarian.

I assume that our workflow is probably similar to that of other higher education libraries. The majority of our development team’s projects come simultaneously from internal systems work, special collections, a web team (that encompasses librarians from multiple libraries and departments), various committees, and individual personnel requests.

It is common for our development team to constantly shoulder a myriad of different projects and tasks that range from web applications, usability testing, digital collections, third-party applications, custom development, custom design, custom website development, special projects, and much more as needed.

One of the problems that our development team seems to frequently encounter is that the loudest voices (or squeaky wheels) in the library often get workflow priority, and thus get their work completed first. Most of the time it isn’t clear to our development team what is truly important to stakeholders, personnel, or WVU Libraries’ patrons. We also seem to be stuck in a waterfall workflow (a workflow model that originated in the manufacturing and construction industries), wherein deadlines are decided without our input, and there seems to be no order to which projects we should work on and which projects should slide.

To even further compound the problem, it also seems that a lot of our librarians are largely unaware of each other’s projects, what progress others are making, or even our system maintenance priorities. We are a full-time staff of only four, with two more part-time staff, and we have a major problem moving away from the backlog table.

Our whole team knows that we have to move from a waterfall workflow to a more agile workflow (see this Base36 article about advantages and disadvantages of both workflows). We know that we desperately need a way to get multiple stakeholders’ input, while being aware of each other’s projects. We know that we should be focusing on what is valuable to patrons and library personnel, while staying within what’s viable and feasible.

In short, we need a miracle!

The “Value / Feasibility” Exercise

Last year I was fortunate enough to attend UX Intensive, a four-day workshop series for user experience professionals and design managers that examines design strategy, design research, service design, and interaction design. It was a great workshop and I highly recommend it for facilitating strategies for jumping right in, tackling, and fixing problems with design management and design thinking techniques.

Suffice it to say, I came back to the library empowered, and in the middle of last November, the WVU Libraries development team facilitated its first request and workflow experiment using a design management/thinking strategy.

How it works

These are the steps we employed:

  1. Get similar stakeholders together
  2. Have them make a list of major tasks and projects for the next six months that require the involvement of the development team
  3. Count the number of tasks
  4. Multiply the number of tasks by three — that is their number of total points.

Stakeholders were then asked to discuss the tasks with our team, and each other, and assign points to each task. The stakeholders were limited by the total number of points that they could spend, every task had to have at least one point, and they were to assign points according to each task or project’s:

  • Importance/Value to both the users/patrons and the library/personnel, where the most valuable are higher numbers.
  • Viability/Feasibility wherein the least effort, cost, or maintenance were higher numbers.

The web team made a list of 14 tasks, which when multiplied by three equaled a total of 42 points. They then spent 42 points in the Importance/Value column, another 42 points in the Viability/Feasibility column, and made sure to leave at least 1 point in each project or task.
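The point rules are simple enough to check mechanically. Below is a minimal sketch, not a tool we actually used: the function names and the three placeholder tasks are hypothetical, and it assumes a group may spend up to (not necessarily exactly) its budget in each column.

```python
POINTS_PER_TASK = 3  # each task adds three points to the group's budget


def point_budget(num_tasks):
    """Total points a group may spend in one column."""
    return num_tasks * POINTS_PER_TASK


def check_allocation(points_by_task):
    """Return a list of rule violations for one column (empty if valid)."""
    problems = []
    budget = point_budget(len(points_by_task))
    for task, points in points_by_task.items():
        if points < 1:
            problems.append(f"{task!r} needs at least 1 point")
    spent = sum(points_by_task.values())
    if spent > budget:
        problems.append(f"spent {spent} points, budget is {budget}")
    return problems


print(point_budget(14))  # the web team's 14 tasks

# Hypothetical allocations for a three-task list (budget of 9 points).
valid = {"Task A": 5, "Task B": 3, "Task C": 1}
invalid = {"Task A": 9, "Task B": 1, "Task C": 0}
print(check_allocation(valid))
print(check_allocation(invalid))
```

The same check would be run twice per group, once for the Importance/Value column and once for the Viability/Feasibility column.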

In the example above we can see that the web team ranked updating the “Databases” web application and redesigning the “Collections” web application as having the most importance/value to our patrons and the library. Simultaneously, they ranked archiving the intranet committees and developing an alert box as taking the least amount of time and effort. During this entire process it was very important for our team to be present to consult on projects, discuss deadlines, help facilitate task/project understanding, and resolve any technology questions that needed to be answered.

Visualizing the Results

With these results we could dynamically and immediately visualize projects and tasks in chart-form for everyone, ensure that different teams and stakeholders have the same opportunity to set priorities based on value and feasibility, and create understanding of:

  • where to start working based on what takes the least amount of time and what is most important to stakeholders, personnel, and users
  • what is least important and takes the largest amount of time to complete.

We dynamically plot the 14 tasks with a chart in Microsoft Excel so the web team can see that updating the databases web application, redesigning the collections web application, and developing an alert box are the most important tasks which take the least amount of time.
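For anyone without Excel to hand, the same reading of the chart can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the point values below are invented (the task names come from the web team's list, the numbers do not), the quadrant labels are my own, and splitting the chart at 3 points is an assumption based on the budget of three points per task.

```python
MIDPOINT = 3  # assumed dividing line: the average if a group spends its full budget


def quadrant(value, feasibility, midpoint=MIDPOINT):
    """Place a task in one of four quadrants of the value/feasibility chart."""
    high_value = value > midpoint
    high_feasibility = feasibility > midpoint
    if high_value and high_feasibility:
        return "do first"
    if high_value:
        return "valuable but costly"
    if high_feasibility:
        return "easy but low value"
    return "defer"


# (value points, feasibility points): illustrative numbers only.
tasks = {
    "Update Databases application": (6, 4),
    "Redesign Collections application": (6, 4),
    "Develop alert box": (4, 6),
    "Archive intranet committees": (1, 6),
}

for name, (value, feasibility) in tasks.items():
    print(f"{name}: {quadrant(value, feasibility)}")
```

Tasks landing in the "do first" quadrant correspond to the upper-right corner of the chart: high value and low effort.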

Our development team ended up calling three separate meetings: one that incorporated all of the members of the web team, another for special collections, and a final meeting with the staff from the systems office. We then created a visualization in chart form for each of the three meeting groups, and went one step further to combine all three of the meeting groups’ data into one visualization. It was our team’s hope that the combined visualization could serve as a workflow priority for the development team, especially if both administration and each of the meeting groups approved of the process and results.

By combining multiple teams’ charts in Microsoft Excel we see an overall workflow priority for our development team, which is established from stakeholders’ input; this provides a place on which to focus, and a roadmap to work towards.
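Combining the groups' data amounts to putting every task into one list while remembering which group it came from. Here is a rough sketch; the task names and points are placeholders, and ordering by total points (value plus feasibility) is my assumption rather than a description of how the Excel chart was built.

```python
def combine(groups):
    """Flatten {group: {task: (value, feasibility)}} into one prioritized list."""
    rows = [
        (group, task, value, feasibility)
        for group, tasks in groups.items()
        for task, (value, feasibility) in tasks.items()
    ]
    # Assumed ordering: highest combined value + feasibility first.
    return sorted(rows, key=lambda row: row[2] + row[3], reverse=True)


# Placeholder data for the three meeting groups.
groups = {
    "web team": {"Task A": (6, 4), "Task B": (2, 3)},
    "special collections": {"Task C": (5, 6)},
    "systems office": {"Task D": (1, 2)},
}

for group, task, value, feasibility in combine(groups):
    print(f"{task} ({group}): value={value}, feasibility={feasibility}")
```

Because each group's budget scales with the length of its own task list, points from a group with many tasks are not strictly comparable with points from a group with few, so a combined list like this is best treated as a starting point for discussion.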


Reception and Going Forward

The response was immediate! The individual teams greatly appreciated the process and thanked the development team for answering their questions, facilitating the session, and helping them to see each other’s projects and priorities. Not only did the individual teams approve of the strategy sessions, but administration is also happy with the initiative our team showed, and wants to work with us to refine and improve the process.

Going forward, our development team is scheduled to facilitate this process on a seasonal basis (every three months) to gauge major project/task importance, involvement, management, and priority. I can now say that the deafening sound of the squeaky wheels has subsided, there is a light at the end of the tunnel, and our development team’s morale has greatly improved.

Workflow Prioritization (Value/Feasibility) from Tim Broadwater

The post Value vs. Feasibility — How to Prioritize Web Projects with Multiple Stakeholders appeared first on LibUX.

William Denton: The Idealist

planet code4lib - Tue, 2016-02-23 13:04

The Idealist: Aaron Swartz and the Rise of Free Culture on the Internet, by Justin Peters, is a workmanlike overview of Swartz’s life and the mass JSTOR download that got him arrested and ultimately led to his suicide. It’s also a rambling partial history of copyright and open access in the United States. All of its inadequacies are set out by the author himself in his introduction, except for one: it’s not available through open access. It should be.

In the introduction, Peters says:

The Idealist is not meant as a comprehensive biography of Aaron Swartz or a comprehensive history of Internet activism or American copyright law. Little at all about this book is comprehensive. Someone could easily make another book consisting exclusively of material omitted from this one, and if you do, please send me a copy. The Idealist is a provisional narrative introduction to the story of free culture in America, using Swartz’s life as a lens on the reuse of information sharing in the digital age.

Bear that in mind, and that the first hundred pages are a history of copyright in America that shifts into a discussion of the Library of Congress and Vannevar Bush, and you’ll better appreciate the book for what it is. We do get the story of Michael Hart and Project Gutenberg, and background on Carl Malamud and Lawrence Lessig and how Swartz got involved with their work. The details of Swartz’s downloading at MIT and the response from the university, JSTOR and the government are set out clearly. All of this makes the remaining 170 pages of the book—not counting the detailed endnotes, bibliography and index—a good overview of Swartz’s life and work to anyone first looking at it in depth, and there may be a few things new to people already familiar with the story, but the real book about Swartz remains to be written.

I think of Aaron Swartz every time I see this.

There are some odd asides, like this snark about Richard Stallman:

He called the program GNU, a “recursive acronym” that stands for “GNU’s Not Unix.” (Unix is a popular computer-operating [sic] system. A gnu is also a large, hairy wildebeest, an animal to which Stallman bears a faint resemblance, if you squint and use your imagination).

And there is some bad writing:

Like its arachnoid namesake, the Web was good at drawing people in.

Peters’s discussion of Swartz’s political involvement seems inadequate, and I was surprised there was no mention of his afterword to Cory Doctorow’s 2013 novel Homeland. The book is under a CC BY-NC-ND license, so I can quote it in full here; if The Idealist was, Peters could have done the same.

Afterword by Aaron Swartz, Demand Progress (co-founder,

Hi there, I’m Aaron. I’ve been given this little space here at the end of the book because I’m a flesh-and-blood human and, as such, I can tell you something you wouldn’t believe if it came out of the mouth of any of those fictional characters:

This stuff is real.

Sure, there isn’t anyone actually named Marcus or Ange, at least not that I know, but I do know real people just like them. If you want, you can go to San Francisco and meet them. And while you’re there, you can play D&D with John Gilmore or build a rocketship at Noisebridge or work with some hippies on an art project for Burning Man.

And if some of the more conspiracy-minded stuff in the book seems too wild to be true, well, just google Blackwater, Xe, or BlueCoat. (I myself have a FOIA request in to learn more about “persona management software,” but the Feds say it’ll take three more years to redact all the relevant documents.)

Now I hope you had fun staying up all night reading about these things, but this next part is important, so pay attention: what’s going on now isn’t some reality TV show you can just sit at home and watch. This is your life, this is your country – and if you want to keep it safe, you need to get involved.

I know it’s easy to feel like you’re powerless, like there’s nothing you can do to slow down or stop “the system.” Like all the calls are made by shadowy and powerful forces far outside your control. I feel that way, too, sometimes. But it just isn’t true.

A little over a year ago, a friend called to tell me about an obscure bill he’d heard of called the Combatting Online Infringement and Counterfeitting Act, or COICA. As I read the bill, I started to get more and more worried: under its provisions, the government would be allowed to censor websites it didn’t like without so much as a trial. It would be the first time the U.S. government was given the power to censor its citizens’ access to the net.

The bill had just been introduced a day or two ago, but it already had a couple dozen senators cosponsoring it. And, despite there never being any debate, it was already scheduled for a vote in just a couple days. Nobody had ever reported on it, and that was just the point: they wanted to rush this thing through before anyone noticed.

Luckily, my friend noticed. We stayed up all weekend and launched a website explaining what the bill did, with a petition you could sign opposing it that would look up the phone numbers for your representatives. We told a few friends about it and they told a few friends and within a couple days we had over 200,000 people on our petition. It was incredible.

Well, the people pushing this bill didn’t stop. They spent literally tens of millions of dollars lobbying for it. The head of every major media company flew out to Washington, D.C. and met with the president’s chief of staff to politely remind him of the millions of dollars they’d donated to the president’s campaign and explain how what they wanted — the only thing they wanted — was for this bill to pass.

But the public pressure kept building. To try to throw people off the trail, they kept changing the name of the bill — calling it PIPA and SOPA and even the E-PARASITES Act — but no matter what they called it, more and more people kept telling their friends about it and getting more and more people opposed. Soon, the signers on our petition stretched into the millions.

We managed to stall them for over a year through various tactics, but they realized if they waited much longer they might never get their chance to pass this bill. So they scheduled it for a vote first thing after they got back from winter break.

But while members of Congress were off on winter break, holding town halls and public meetings back home, people started visiting them. Across the country, members started getting asked by their constituents why they were supporting that nasty Internet censorship bill. And members started getting scared — some going so far as to respond by attacking me.

But it wasn’t about me anymore — it was never about me. From the beginning, it was about citizens taking things into their own hands: making YouTube videos and writing songs opposing the bill, making graphs showing how much money the bill’s cosponsors had received from the industries pushing it, and organizing boycotts putting pressure on the companies who’d endorsed the bill.

And it worked — it took the bill from a political nonissue that was poised to pass unanimously to a toxic football no one wanted to touch. Even the bill’s cosponsors started rushing to issue statements opposing it! Boy, were those media moguls pissed…

This is not how the system is supposed to work. A ragtag bunch of kids doesn’t stop one of the most powerful forces in Washington just by typing on their laptops!

But it did happen. And you can make it happen again.

The system is changing. Thanks to the Internet, everyday people can learn about and organize around an issue even if the system is determined to ignore it. Now, maybe we won’t win every time — this is real life, after all — but we finally have a chance.

But it only works if you take part. And now that you’ve read this book and learned how to do it, you’re perfectly suited to make it happen again. That’s right: now it’s up to you to change the system.

Aaron Swartz

DuraSpace News: Upcoming FREE Digital POWRR Workshop in Little Rock

planet code4lib - Tue, 2016-02-23 00:00

DeKalb, IL  On April 22, 2016, the Digital POWRR team will be conducting a FREE, day-long workshop at the Arkansas Studies Institute in Little Rock, AR entitled From Theory to Action: A Pragmatic Approach to Digital Preservation Tools and Strategies. This full-day workshop is made possible in part by a major grant from the National Endowment for the Humanities: Exploring the Human Endeavor.

DuraSpace News: Submissions are OPEN for the OR2016 Technical (Developer) Track

planet code4lib - Tue, 2016-02-23 00:00

From Adam Field and Claire Knowles, OR2016 Technical Track co-chairs

The Open Repositories 2016 Conference Technical (Developer) Track (June 13 - 16, Dublin) is a forum to present anything that is of interest to us, the repositories technical community.  If you think it’s worth saying, then we almost certainly will want to hear it.  It’s an opportunity to network, show what you do to your peers and learn from them.

Be part of the Developer Track in 3 easy steps:

1. Decide what to talk about.

District Dispatch: Fair use déjà vu

planet code4lib - Mon, 2016-02-22 22:36

An image of the card catalogs many baby boomers grew up using.

Occasionally I’ll walk back to the office library and pull out a decades-old ALA Bulletin (the precursor to American Libraries) and open to a random page just to see what drew ALA readers’ interest in days gone by. In the year 1980, we were talking about the “coming revolution” of the latest technical innovation, Videotext, a technology that is antiquated given the Internet. One used a “dumb terminal” with Videotext (enough said). Another topic of interest was AACR2. (For non-librarian readers, AACR2 is a national cataloging code.) Its implementation was such a challenge for some librarians that they needed to be told to get a grip.

Librarians need not always view AACR2 as a mind-boggling problem. The panic resulting from one’s first realization that the heading for Chaikovskii will be changed to Tchaikovsky (thus potentially affecting hundreds of catalog cards) should be confined to a few shudders.

What about fair use in 1980? The Copyright Act of 1976 was still relatively new, and photocopying was a major concern. The Copyright Office held a hearing on the effect of “Section 108 on the Rights of Creators and Needs of Users of Works Reproduced by Certain Libraries and Archives,” a title that one commenter thought would “put listeners asleep but [discovered that] some lively testimony punctuated the day-long event.” And (this is going to sound all too familiar), publisher representatives charged that librarians “interchange sections 107 (fair use) and 108 to suit their purposes.” This was one of the arguments the Authors Guild made in Authors Guild v HathiTrust that was rejected by the judge. Section 108 does not restrict fair use.

The Copyright Act of 1976 codified both fair use and the library/archives reproduction exception. If you look back at the legislative history, the libraries were successful in getting exceptions for activities they knew they would do – preservation, interlibrary loan, using the photocopy machine and so on, which appear in 108. Section 107 was a compromise among the stakeholders who agreed that a flexible exception would be necessary to account for “the endless variety of situations and combinations of circumstances that can arise in particular cases preclude[ing] the formulation of exact rules in the statute.”  The House said the legislation:

…endorses the purpose and general scope of the judicial doctrine of fair use, but there is no disposition to freeze the doctrine in the statute, especially during a period of rapid technological change. Beyond a very broad statutory explanation of what fair use is and some of the criteria applicable to it, the courts must be free to adapt the doctrine to particular situations on a case-by-case basis.

Now that’s some good reading!

It’s been said before that without fair use, among scores of other things, digital technologies that we rely on every day would be illegal. We would be using Videotext instead of the Internet. We would type up catalog cards with arbitrary subject headings instead of allowing everyone to keyword search.

The post Fair use déjà vu appeared first on District Dispatch.

David Rosenthal: 1000 long-tail publishers!

planet code4lib - Mon, 2016-02-22 20:00
The e-journal content that is at risk of loss or cancellation comes from the "long tail" of small publishers. Somehow, the definition of "small publisher" has come to be one that publishes ten or fewer journals. This seems pretty big to me, but if we adopt this definition the LOCKSS Program just passed an important milestone. We just sent out a press release announcing that the various networks using LOCKSS technology now preserve content from over 1000 long-tail publishers. There is still a long way to go, but as the press release says:
there are tens of thousands of long tail publishers worldwide, which makes preserving the first 1,000 publishers an important first step to a larger endeavor to protect vulnerable digital content.

LITA: The next LITA web course and webinar, register now!

planet code4lib - Mon, 2016-02-22 17:48

There’s still time to register for the next great LITA continuing education web course or webinar offerings.

Check out this informative and fast-paced new LITA webinar:

How Your Public Library Can Inspire the Next Tech Billionaire: an Intro to Youth Coding Programs

Presenters: Kelly Smith, Crystle Martin and Justin Hoenke
Thursday March 3, 2016
Noon – 1:00 pm Central Time
Register Online, page arranged by session date (login required)

Kids, tweens, teens and their parents are increasingly interested in computer programming education, and they are looking to public and school libraries as a host for the informal learning process that is most effective for learning to code. This webinar will share lessons learned through youth coding programs at libraries all over the U.S. We will discuss tools and technologies, strategies for promoting and running the program, and recommendations for additional resources. An excellent webinar for youth and teen services librarians, staff, volunteers and general public with an interest in tween/teen/adult services.

Details here and Registration here.

Or make the investment in learning with this web course:

Which Test for Which Data: Statistics at the Reference Desk

Instructor: Rachel Williams
Starting Monday February 29, 2016, running for 4 weeks
Register Online, page arranged by session date (login required)

This web course is designed to help librarians faced with statistical questions at the reference desk. Whether assisting a student reading through papers or guiding them when they brightly ask “Can I run a t-test on this?”, librarians will feel more confident facing statistical questions. This course will be ideal for library professionals who are looking to expand their knowledge of statistical methods in order to provide assistance to students who may use basic statistics in their courses or research. Students taking the course should have a general understanding of mean, median, and mode.

Details here and Registration here

And don’t miss the other upcoming LITA spring continuing education offerings:


The Why and How of HTTPS for Libraries, with Jacob Hoffman-Andrews
Offered: Monday March 14, 2016, 1:00 pm Central Time

Yes You Can Video, with Anne Burke, and Andreas Orphanides
Offered: Tuesday April 12, 2016, 1:00 pm – 2:30 pm Central Time

Web course:

Universal Design for Libraries and Librarians, with Jessica Olin, and Holly Mabry
Starting Monday April 11, 2016, running for 6 weeks

Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty,

Hugh Cayless: Thank You

planet code4lib - Mon, 2016-02-22 17:41
Back in the day, Joel Spolsky had a very influential tech blog, and one of the pieces he wrote described the kind of software developer he liked to hire, one who was "Smart, and gets things done." He later turned it into a book. Steve Yegge, who was also a very influential blogger in the oughties, wrote a followup, in which he tackled the problem of how you find and hire developers who are smarter than you. Given the handicaps of human psychology, how do you even recognize what you're looking at? His rubric for identifying these people (flipping Spolsky's) was "Done, and gets things smart". That is, this legendary "10X" developer was the sort who wouldn't just get done the stuff that needed to be done, but would actually anticipate what needed to be done. When you asked them to add a new feature, they'd respond that it was already done, or that they'd just need a few minutes, because they'd built things in such a way that adding your feature that you just thought of would be trivial. They wouldn't just finish projects, they'd make everything better: they'd create code that other developers could easily build upon. Essentially, they'd make everyone around them more effective as well.

I've been thinking a lot about this over the last few months, as I've worked on finishing a project started by Sebastian Rahtz: integrating support for the new "Pure ODD" syntax into the TEI Stylesheets. The idea is to have a TEI syntax for describing the content an element can have, rather than falling back on embedded RelaxNG. Lou Burnard has written about it here. Sebastian wrote the XSLT Stylesheets and the supporting infrastructure which are both the reference implementation for publishing TEI and the primary mechanism by which the TEI Guidelines themselves are published. And they are the basis of TEI schema generation as well. So if you use TEI at all, you have Sebastian to thank.

Picking up after Sebastian's retirement last year has been a tough job. It was immediately obvious to me just how much he had done, and had been doing for the TEI all along. When Gabriel Bodard described to me how the TEI Council worked, after I was elected for the first time, he said something like: "There'll be a bunch of people arguing about how to implement a feature, or even whether it can be done, and then Sebastian will pipe up from the corner and say 'Oh, I just did it while you were talking.'" You only have to look at the contributors pages for both the TEI and the Stylesheets to see that Sebastian was indeed operating at a 10X level. Quietly, without making any fuss about it, he's been making the TEI work for many years.

The contributions of software developers are often easily overlooked. We only notice when things don't work, not when everything goes smoothly, because that's what's supposed to happen, isn't it? Even in Digital Humanities, which you'd expect to be self-aware about this sort of thing, the intellectual contributions of software developers can often be swept under the rug. So I want to go on record, shouting a loud THANK YOU to Sebastian for doing so much and for making the TEI infrastructure smart.

Roy Tennant: Broken Furniture and Blood on the Floor

planet code4lib - Mon, 2016-02-22 16:07

I’ve been troubled lately by what I perceive as a fundamental misunderstanding of the nature of our transition from record-based bibliographic metadata to linked data. Although this misunderstanding can be expected, given how long our profession has been invested in a record-based infrastructure and standards, it is potentially disastrous should we prove not up to the task of overcoming it.

Let me break this down for you.

  • All of the power of linked data derives from the links. For some reason I think it bears saying that linked data without useful links is simply data. It isn’t linked data.
  • Links are only useful if they lead to an authoritative source that has something useful to provide. Some libraries have been creating “linked data” by minting identifiers that lead nowhere worth following. This also is not linked data.
  • Simply translating bibliographic data from one format (MARC) to another (BIBFRAME, for example) does not create useful links. This is one of the essential bits that everyone needs to understand. Our bibliographic transition is not one of translating records from one format to another. It will instead involve processes that are devilishly complex and difficult to carry out well. Given this, only some of us in the library world will be capable of doing it as well as it should be done to be truly effective.
  • To achieve true library linked data, individual MARC elements must be turned into actionable entities. By this I mean that an individual MARC element such as “author” must be translated into an assertion that includes an actionable URI that leads to an authoritative source that has something useful to say about that author.
  • Creating actionable entities will require new kinds of processes and services that mostly don’t yet exist. These are still early days in this transition, but we are already beginning to see the kinds of services that will be required to take our static, inert library data and turn it into a living part of what we are beginning to call the Bibliographic Graph. At OCLC Research, where I work, we call this process “entification.” This might encompass the creation of your own linked data entities or the use of those created by others by using a “reconciliation” service.
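
To make the idea of an "actionable entity" concrete, here is a minimal sketch of what turning an author heading into an assertion might look like. Everything in it is an assumption for illustration: the `AUTHORITY` lookup table, the `entify` helper, and the example.org URIs are hypothetical placeholders, not a real reconciliation service. Only the Dublin Core `creator` property URI is a real identifier.

```python
# Hypothetical authority lookup: normalized heading -> entity URI.
# The example.org URIs are placeholders, not real identifiers.
AUTHORITY = {
    "twain, mark, 1835-1910": "http://example.org/entity/person/1",
}

# Dublin Core Terms "creator" property.
CREATOR = "http://purl.org/dc/terms/creator"


def normalize(heading):
    """Lowercase, drop trailing period, collapse whitespace."""
    return " ".join(heading.lower().rstrip(".").split())


def entify(work_uri, author_heading):
    """Return a (subject, predicate, object) assertion, or None.

    None means the heading could not be reconciled, so it stays a
    plain string: data, but not linked data.
    """
    entity_uri = AUTHORITY.get(normalize(author_heading))
    if entity_uri is None:
        return None
    return (work_uri, CREATOR, entity_uri)


triple = entify("http://example.org/work/42", "Twain, Mark, 1835-1910.")
unmatched = entify("http://example.org/work/43", "Unknown, Author")
print(triple)
print(unmatched)
```

The hard part in practice is the lookup itself: a real reconciliation step has to cope with variant headings and ambiguous names, which a dictionary cannot do.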

We need to shake off the shackles of our record-based thinking and think in terms of an interlinked Bibliographic Graph. As long as we keep talking about translating records from one format to another we simply don’t understand the meaning of linked data: both the transformative potential it has for our workflows and user interfaces and the plain difficult, time-consuming work that will be required to get us there.

Sure, we at OCLC are a long way down a road that should do a lot to help our member libraries make the transition, but there will be plenty of work to go around. The sooner we fully grasp what that work will be, the better off we will all be in this grand transition. No, let’s call it what it really is: a bibliographic revolution. Before this is over there will be broken furniture and blood on the floor. But at least we will be free of the tyrant.

Note: portions of this post originally appeared as a message on the BIBFRAME discussion.

Library of Congress: The Signal: Demystifying Digital Preservation for the Audiovisual Archiving Community

planet code4lib - Mon, 2016-02-22 16:00

The following is a guest post by Kathryn Gronsbell, Digital Asset Manager, Carnegie Hall; Shira Peltzman, Digital Archivist, UCLA Library; Ashley Blewer, Applications Developer, NYPL; and Rebecca Fraimow, Archivist and AAPB NDSR Program Coordinator, WGBH.

The intersection of digital preservation and audiovisual archiving has reached a tipping point. As the media production and use landscape evolves, so too have preservation strategies. From the rapid improvements in digital capture technology, to the adoption of file-based production workflows and digital distribution technology, to storage that has increased in capacity and decreased in price, to the widespread use of cloud-based storage solutions, over the past decade we have witnessed a series of transformations that fundamentally alter dominant theories and practices of moving image preservation and access.

A candid snapshot of a remote meeting with (top) Kathryn, (left to right) Ashley, Rebecca, and Shira discussing the proposed stream.

To generate conversation around this shifting terrain, we spent one month creating a collaborative proposal for a stream of sessions at the annual Association of Moving Image Archivists (AMIA) conference. Although there have been talks at past AMIA conferences that have addressed aspects of digital preservation, AMIA’s recent transition to a stream-based programming model provided us with a unique and valuable opportunity to discuss this topic in greater depth and length.

The stream we proposed focuses on questions, challenges, and benefits related to the intersection of audiovisual content and digital preservation. We asked a wide range of stakeholders with an interest in digital preservation and/or moving image archiving to propose sessions, sign a letter of support (which garnered nearly 180 signatures from organizations across five different continents as of February 15th), and add topics that would benefit the A/V community. Our initial proposal included:

  • Digital preservation in practice (strategies)
  • How to preserve: sharing, learning, and building collaboratively (innovation and practical engagement)
  • Dissolving boundaries: the necessity of multi-disciplinary input for preservation (borrow and steal from people smarter than us)
  • Transitioning from a short-term digital preservation project to a long-term program (sustainability)

Submissions for additional sessions and talk proposals began to come in as soon as we opened the document for public comment and contributions. We received contributions from A/V archivists, digital asset managers, conservation and preservation specialists, and academics from across library, archive, museum, gallery, government, and broadcast industries. As we expected, contributors addressed conceptual and practical conversations — everything from hands-on technologies to workflows to ISO standards. During the two-week period the document was open, our initial list of topics ballooned to 10+ session proposals.

The wholesale acceptance of digital preservation has been slower within the moving image archiving and preservation community than in adjacent fields, and events focused on digital preservation rarely delve deeply into the challenges presented by audiovisual materials. Acceptance may be slower in the moving image community because individuals must devote resources to advocating and managing their physical collections, which are frequently segregated from larger, strategic initiatives (even within a single organization) due to the cost and complexity of preserving analog film/video content. The shifting focus towards digital preservation is an opportunity to dissolve the manufactured boundary between A/V and still image (or other) content and include audiovisual specialists in broader discussions of preservation and access. For us, the broad-based support that our stream proposal received has been a clear indicator of both a pressing need and collective desire to address some of the practical, theoretical, and ontological questions that digital preservation raises for some of our colleagues. What we’re hoping to achieve with this proposed stream is–at least in part–to demonstrate that although audiovisual and other complex materials (e.g. software) may require special attention, the basics are already well understood and championed by other communities where expertise in this area is profound: data storage research, all forms of computer sciences, information and knowledge management, emulation and migration specialists, and research repositories have a deep history in long-term data preservation. We can also learn from efforts in adjacent fields, like financial data security, and tweak their lessons learned to our needs and resources. Increasing the engagement of the analog film and video world with the digital preservation community, and vice versa, will yield tremendous benefits on both sides of the divide.

A view of the open proposal planning document for the AMIA 2016 stream about digital preservation.

Integrating digital preservation activities into A/V preservation and increasing digital literacy across the field is integral to the field’s continued success and relevance. There have been some important gains in this area in recent years, including:

By proposing a stream that is dedicated to the intersection of digital preservation and audiovisual archiving, we’re hoping to break down any barriers to knowledge that may still exist so that digital preservation may be widely understood as a core competency in the A/V archiving field, rather than as a topic that gets pushed to the perimeter. Hopefully this will allow the archiving community to become part of a wider conversation about digital preservation across disciplines.

Our community-based solicitation approach (via cloud-based shared documents) mirrors the need for more voices in the digital preservation conversation (across fields, communities, and specialties; no more operation in silos!). By providing alternate entry into a dedicated discussion of digital preservation, we welcome organizations and individuals who have limited resources or haven’t started thinking strategically about digital preservation, and who feel out of place or unwelcome in “specialized” discussion spaces like NDSA regional/national meetings, iPRES, PASIG, DPC, etc. We want to bring digital preservation discussion into a comfortable space for AV folks, somewhere like AMIA, where the community can dip its toes in without feeling lost in a wave of information. In the next month alone, there are many opportunities (for example, the listings on the DLF’s Community Calendar) where A/V professionals can participate to share their expertise in broader conversations.

We look forward to more visibility of the intersection between audiovisual preservation and digital preservation, and are excited to continue the developing conversation between these two fields.

Islandora: Another Lobstometre Bump

planet code4lib - Mon, 2016-02-22 15:50

The Islandora Lobstometre has gone up again this week, thanks to a renewed commitment from Partner Simon Fraser University. We're finally up to our elbows!

We're also within $40,000 of our minimum funding goal to hire a Technical Lead for the Islandora Project, so if you or your organization are in a position to support Islandora, please do consider joining us this year.

District Dispatch: ALA DC Office welcomes U Michigan grad students

planet code4lib - Mon, 2016-02-22 14:17

Lucia Lee

University of Michigan School of Information is sending two students to ALA’s Office for Information Technology Policy (OITP).

Yi Yang








Once again, the American Library Association (ALA) Washington Office is participating in the Alternative Spring Break (ASB) program with the University of Michigan’s Information School.

The ASB program:

Creates the opportunity for students to engage in a service-oriented integrative learning experience; connects public sector organizations to the knowledge and abilities of students through a social impact project; and facilitates and enhances the relationship between the School and the greater community.

Two master’s students will be in residence during the week of Feb 29 to March 4 here at the Office for Information Technology Policy (OITP): Yi Yang and Lucia Lee. Yi and Lucia both focus on human computer interaction at the University of Michigan. Yi completed a master’s degree in media production and design at Indiana University and undergraduate study in journalism and digital media design at Chongqing University. Lucia completed an undergraduate program in pharmacy at Seoul National University.

Yi and Lucia will spend their week studying and providing ideas about innovative communications directions for our dissemination and outreach. Additionally, they will participate in meetings and inside-the-beltway events to get a flavor of our work generally. We look forward to their arrival.

The post ALA DC Office welcomes U Michigan grad students appeared first on District Dispatch.

Mark E. Phillips: Poking at the End of Term 2012 Presidential Web Archive

planet code4lib - Mon, 2016-02-22 14:00

In preparation for some upcoming work with the End of Term 2016 crawl and a few conference talks I should be prepared for, I thought it might be a good thing to start doing a bit of long-overdue analysis of the End of Term 2012 (EOT2012) dataset.

A little bit of background for those that aren’t familiar with the End of Term program.  Back in 2008 a group of institutions got together to collaboratively collect a snapshot of the federal government with a hope to preserve the transition from the Bush administration into what became the Obama administration.  In 2012 this group added a few additional partners and set out to take another snapshot of the federal Web presence.

The EOT2008 dataset was studied as part of a research project funded by IMLS but the EOT2012 really hasn’t been looked at too much since it was collected.

As part of the EOT process, there are several institutions that crawl data that is directly relevant to their collection missions and then we all share what we collect with the group as a whole for any of the institutions who are interested in acquiring a set of the entire collected EOT archive.  In 2012 the Internet Archive, Library of Congress and the UNT Libraries were the institutions that committed resources to crawling. UNT also was interested in acquiring this archive for its collection which is why I have a copy locally.

For the analysis that I am interested in doing for this blog post, I took a copy of the combined CDX files for each of the crawling institutions as the basis of my dataset.  There was one combined CDX for each of IA, LOC, and UNT.

If you look at the three CDX files to see how many total lines are present, this can give you the number of URLs in the collection pretty easily.  This ends up being the following

Collecting Org    Total CDX Entries    % of EOT2012 Archive
IA                80,083,182           41.0%
LOC               79,108,852           40.5%
UNT               36,085,870           18.5%
Total             195,277,904          100%

Here is how that looks as a pie chart.

EOT2012 Collection Distribution

If you pull out all of the content hash values you get the number of “unique files by content hash” in the CDX file. By doing this you are ignoring repeat captures of the same content on different dates, as well as the same content occurring at different URL locations on the same or on different hosts.

Collecting Org    Unique CDX Hashes    % of EOT2012 Archive
IA                45,487,147           38.70%
LOC               50,835,632           43.20%
UNT               25,179,867           21.40%
Total             117,616,637          100.00%

Again as a pie chart

Unique hash values

It looks like there was a little bit of change in the percentages of unique content with UNT and LOC going up a few percentage points and IA going down.  I would guess that this is to do with the fact that for the EOT projects,  the IA conducted many broad crawls at multiple times during the project that resulted in more overlap.
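
As a rough illustration of how the entry counts and unique-hash counts in the two tables above could be produced, here is a minimal sketch. It assumes a space-delimited CDX file whose header legend marks the content digest with the letter `k` (falling back to the sixth field if no legend is present); the sample lines and the `count_cdx` helper are made up for the example and are not the script used for this post.

```python
# Made-up sample in a common 11-field CDX layout (assumption).
SAMPLE_CDX = """\
 CDX N b a m s k r M S V g
gov,example)/ 20121101000000 http://example.gov/ text/html 200 AAAA - - 100 0 a.warc.gz
gov,example)/ 20121201000000 http://example.gov/ text/html 200 AAAA - - 100 500 b.warc.gz
gov,example)/about 20121101000500 http://example.gov/about text/html 200 BBBB - - 120 900 a.warc.gz
"""


def count_cdx(lines):
    """Return (total_entries, set_of_unique_digests) for CDX lines."""
    digest_index = 5  # assumption: digest is the sixth field
    total = 0
    digests = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CDX":
            # Header legend: locate the digest column ("k") if listed.
            legend = fields[1:]
            if "k" in legend:
                digest_index = legend.index("k")
            continue
        if len(fields) > digest_index:
            total += 1
            digests.add(fields[digest_index])
    return total, digests


total_entries, unique_digests = count_cdx(SAMPLE_CDX.splitlines())
print(total_entries, len(unique_digests))
```

For files of the size discussed here, the lines would be streamed from disk rather than held in a string, and the set of digests for a single institution would need to fit in memory or be replaced by a sort-and-unique pass.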

Here is a table that can give you a sense of how much duplication (based on just the hash values) there is in each of the collections and then overall.

Collecting Org    Total CDX Entries    Unique CDX Hashes    Duplication
IA                80,083,182           45,487,147           43.20%
LOC               79,108,852           50,835,632           35.70%
UNT               36,085,870           25,179,867           30.20%
Total             195,277,904          117,616,637          39.80%

You will see that UNT has the least duplication (possibly more focused crawls with less repeating), while IA has the most (broader, with more crawls of the same data?).
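
The duplication figures follow directly from the two counts: the share of CDX entries that do not contribute a new content hash. A small worked check using the numbers from the table (the `duplication_pct` helper name is just for illustration):

```python
# (total CDX entries, unique content hashes) as reported in the table.
COUNTS = {
    "IA": (80_083_182, 45_487_147),
    "LOC": (79_108_852, 50_835_632),
    "UNT": (36_085_870, 25_179_867),
    "Total": (195_277_904, 117_616_637),
}


def duplication_pct(total_entries, unique_hashes):
    """Percentage of entries whose hash repeats one already seen."""
    return 100.0 * (total_entries - unique_hashes) / total_entries


for org, (total_entries, unique_hashes) in COUNTS.items():
    print(f"{org}: {duplication_pct(total_entries, unique_hashes):.1f}%")
```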

Questions to answer.

There were three questions that I wanted to answer for this look at the EOT data.

  1. How many hashes are common across all CDX files
  2. How many hashes are unique to only one CDX file
  3. How many hashes are shared by two CDX files but not by the third.

Common Across all CDX files

The first was pretty easy to answer and just required taking all three lists of hashes, and identifying which hash appears in each list (intersection).

There are only 237,171 (0.2%) hashes shared by IA, LOC and UNT.

Content crawled by all three

You can see that there is a very small amount of content that is present in all three of the CDX files.

Unique Hashes to one CDX file

Next up was the number of hashes that were unique to a collecting organization’s CDX file. This took two steps: first I took the difference of two hash sets, then I took the difference of that resulting set and the third set.

Collecting Org    Unique Hashes    Unique to Collecting Org    Percentage Unique
IA                45,487,147       42,187,799                  92.70%
LOC               50,835,632       48,510,991                  95.40%
UNT               25,179,867       23,269,009                  92.40%

Unique to a collecting org

It appears that there is quite a bit of unique content in each of the CDX files, with over 92% of the content being unique to the collecting organization.
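
The intersection and the two-step difference used for these first two questions map directly onto set operations. A minimal sketch with toy hash sets (the `h1`…`h7` values are invented; the real sets hold tens of millions of digests):

```python
# Toy stand-ins for the three sets of unique content hashes.
ia = {"h1", "h2", "h3", "h7"}
loc = {"h1", "h2", "h4", "h6"}
unt = {"h1", "h3", "h4", "h5"}

# Question 1: hashes present in all three CDX files.
common_to_all = ia & loc & unt

# Question 2: hashes unique to one organization, in two steps:
# remove one other set, then remove the remaining one.
only_ia = (ia - loc) - unt
only_loc = (loc - ia) - unt
only_unt = (unt - ia) - loc

print(common_to_all, only_ia, only_loc, only_unt)
```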

Common between two but not three CDX files

The final question to answer was how much of the content is shared between two collecting organizations but not present in the third’s contribution.

Shared by                            Unique Hashes
Shared by IA and LOC but not UNT     1,737,980
Shared by IA and UNT but not LOC     1,324,197
Shared by UNT and LOC but not IA     349,490

Unique and shared hashes

Closing

Based on this brief look at how content hashes are distributed across the three CDX files that make up the EOT2012 archive, I think a takeaway is that there is very little overlap between the crawling that these three organizations carried out during the EOT harvests.  Essentially 97% of content hashes are present in just one repository.
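
One way to see where the 97% comes from is to bucket every hash by the exact set of organizations that hold it, then check that the buckets add up. The sketch below does the bucketing on toy data (the `bucket_by_holders` helper and the `h1`…`h7` values are illustrative assumptions) and then redoes the arithmetic with the counts reported in this post.

```python
from collections import Counter


def bucket_by_holders(collections):
    """Count hashes per exact combination of holding organizations."""
    holders = {}
    for org, hashes in collections.items():
        for digest in hashes:
            holders.setdefault(digest, set()).add(org)
    return Counter(frozenset(orgs) for orgs in holders.values())


# Toy data, invented for illustration.
toy = {
    "IA": {"h1", "h2", "h3", "h7"},
    "LOC": {"h1", "h2", "h4", "h6"},
    "UNT": {"h1", "h3", "h4", "h5"},
}
buckets = bucket_by_holders(toy)

# Counts reported in the post.
held_by_one = 42_187_799 + 48_510_991 + 23_269_009
held_by_two = 1_737_980 + 1_324_197 + 349_490
held_by_three = 237_171
total_unique = held_by_one + held_by_two + held_by_three

share_single = 100.0 * held_by_one / total_unique
print(total_unique, f"{share_single:.1f}%")
```

The three groups sum to the 117,616,637 unique hashes reported earlier, which is a useful sanity check on the set arithmetic.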

I don’t think this tells all of the story though.  There are quite a few caveats that need to be taken into account.  First of all this only takes into account the content hashes that are included in the CDX files.  If you crawl a dynamic webpage and it prints out the time each time you visit the page, you will get a different content hash.  So “unique” is only in the eyes of the hash function that is used.
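
To illustrate the dynamic-page caveat, here is a small sketch showing that two captures differing only in an embedded timestamp get different digests. It assumes SHA-1 digests encoded in Base32, which is common in web archive CDX files but is an assumption here; the page bodies are invented.

```python
import base64
import hashlib


def content_digest(payload):
    """Base32-encoded SHA-1 of the payload bytes (assumed CDX style)."""
    return base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


# Two captures of the "same" page, differing only in the printed time.
capture_a = b"<html><body>Report. Generated at 10:00:01</body></html>"
capture_b = b"<html><body>Report. Generated at 10:00:02</body></html>"

digest_a = content_digest(capture_a)
digest_b = content_digest(capture_b)
print(digest_a)
print(digest_b)
print(digest_a == digest_b)
```

A one-byte difference is enough to make the two captures count as separate "unique" files in the tallies above.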

There are quite a few other bits of analysis that can be done with this data; hopefully I’ll get around to doing a little more in the next few weeks.

If you have questions or comments about this post,  please let me know via Twitter.

LITA: 10 iPhone Tricks Every Librarian Should Know

planet code4lib - Mon, 2016-02-22 14:00

We as librarians deal with questions every day. These days, questions tend to be about devices. We can’t be expected to know everything about every device, but it’s always good to have a few tools ready at our disposal. Here are some handy tricks to keep at the ready if anyone comes at you with an iPhone and demands service.

How to turn off an iPhone may be the simplest question you get. (Courtesy Apple)

  • Use your headphones as a camera shutter – It’s not a selfie stick (thank god) but it’s one way to trigger a remote shutter on your camera. Simply plug in the earbuds that came with your iPhone and use the volume buttons to snap away.
  • Check which apps use the most battery – Some apps eat battery like it’s candy. Go into Settings > General > Battery > Battery Usage and find out which ones do the most damage so you can turn them off.
  • …and take up the most space – If space is at a premium, go into Settings > General > Storage & iCloud Usage > Manage Storage. Tap the app to delete it if it’s taking up too much space and you don’t use it.
  • Use Search to find apps and stuff faster – I’m surprised how often people (myself included) don’t use this feature: swipe left from your home screen to pull up a search box. Then type in the app (or contact or music file or book or…) you’re looking for.
  • Bundle apps in folders – And once you find that app, group it with others that have a similar purpose by pressing on the app icon until it vibrates and dragging it toward another app. iOS will give the group a name (“Productivity” or “Entertainment”) but you can always change it.
  • Hard vs soft resets – It is so important to learn the difference between these two. A soft reset is where you press an iPhone’s power and home buttons at the same time until the screen goes blank. A hard reset is a full factory reset, which wipes everything off your phone. A soft reset will fix 90% of problems. If you have to do a hard reset, make sure you back up the device on a computer first.
  • Clear browser history and cookies in Safari – You know how you can clear your browser history and your cookies on your computer, and it can improve your web browsing? You can do the same thing with an iPhone. Go into Settings > Safari and click on “Clear History and Cookies”.
  • Find a lost phone’s owner – Show of hands: how many times have you found a lost iPhone by a computer? If it has Siri, you can simply ask “Whose phone is this?” and she will tell you.
  • Recover recently deleted photos – If you deleted a photo that you want to get back, open up your photos app, click on “All Photos” and then go to “Albums.” You will notice that there is an album called “Recently Deleted.” Within that album are all of the photos that you have deleted within the last 30 days.
  • Search in the “settings” menu – Forget how to change a wallpaper? Or maybe want to reset screen timeout? Type what you’re looking for in the search in settings to find what you need.

Do you have any other iPhone tools librarians need to know? Add them in the comments!


Open Knowledge Foundation: ILDA to join Open Data Day Mini grants!

planet code4lib - Mon, 2016-02-22 12:43

This post was written by Fabrizio Scrollini

We are happy to announce that the Latin American Open Data Initiative (ILDA) is joining the global efforts to enrich the Open Data Day mini grants. ILDA aims to promote and support the engagement of the Latin American community on Open Data Day. Our support will go to Latin American individuals and organisations that already applied to this year’s call.

In 2015, almost every country in Latin America held an Open Data Day event. Latin America is one of the most active regions in the world in the field of open data, with a number of growing national and city-based organisations, a strong community and very creative ideas. We hope this contribution will keep the movement growing by exploring relevant challenges for an Open Latin America that will lead to new forms of innovation and strengthen relationships between citizens and their governments.

See announcement in Spanish on ILDA blog:


DuraSpace News: VIVO Updates for February 21–Griffith Experts, Ontologies for Biomedicine, VIVO SHARE Webinars, CNI Workshop

planet code4lib - Mon, 2016-02-22 00:00

From Mike Conlon, VIVO Project Director

Griffith University in Brisbane, Australia has finished a major upgrade to its VIVO-based information systems. Formerly known as Griffith University Research Hub, the new system, Griffith Experts, organizes more than 74,000 Griffith publications in an attractive, easy-to-use portal. Check it out!

