Code4Lib Journal: The Geospatial Metadata Manager’s Toolbox: Three Techniques for Maintaining Records
The following post is by Ted Westervelt, head of acquisitions and cataloging for U.S. Serials in the Arts, Humanities & Sciences section at the Library of Congress.
Issuing the Recommended Format Specifications
When the Recommended Format Specifications were issued last summer, the Library of Congress was making an attempt to come to grips with the challenges of building a comprehensive collection when the formats in which that content is being created are broad and getting broader. The charge of the Library is to acquire content both broadly and deeply, regardless of geography, subject or format. But it is not enough to merely collect this content. The Library must also manage and preserve that content, so that our patrons may have access to it, both those who use the Library’s collection today and those who will use it in decades and centuries to come.
This charge makes the issue of the formats exceptionally important. This has been true with physical formats, but it becomes even more challenging when the world of digital creation is included. And there is no denying that a major part of any well-rounded and comprehensive collection consists of digital materials. Yet the great advantage of digital content – its flexibility in terms of how it can be created and distributed – is also a potential weakness, as that flexibility can require more resources to ensure that it is preserved and remains accessible.
Therefore, the Library some years back began working on identifying the characteristics of creative works, both physical and digital, which encourage preservation and long-term access. Using an array of staff within the Library who are experts in the business of acquisitions, who understand the needs of our patrons and the technologies of creative works, the Library was able to develop the Recommended Format Specifications.
Goals of the Recommended Formats
The fundamental goal of the Recommended Format Specifications document (PDF) was to provide guidance, both for staff in the Library and for our external stakeholders who share our interest in preservation and long-term access of creative works. For Library staff, the Recommended Formats provide lists of characteristics that can help them make an informed decision when it comes to acquiring content for the collection. An acquisitions specialist can determine whether potential acquisitions might need more or fewer resources on the part of the Library to ensure that they remain accessible to patrons as the years go on. Likewise, creators, publishers, producers, vendors and archiving institutions will find in the Recommended Formats some informed advice on what they should be using or looking for when creating, managing, distributing or saving creative works. It is not the final word, but the Recommended Formats do provide an educated analysis of the technical aspects of creative works.
In issuing the Recommended Formats, the Library knew that it was not drawing a line under the matter and could not simply leave it there. The business of preservation and long-term access is one beyond the scope of a single institution to manage on its own, especially with the proliferation of digital content in various formats. This is an effort that can only be accomplished by collaboration and cooperation among all the parties that have an interest in ensuring that content lasts and remains accessible. And that common interest extends to anyone or any institution that is involved with creative works, from the person who creates a work, to the publisher or producer who makes it ready for distribution, to the vendor who sells it, to the individual or institution who wants to keep it. Everyone has a vested interest in ensuring that these works last.
Moreover, there can be no disputing the fact that how works are created and the technical characteristics they have are changing all the time. To create a list of the technical characteristics in 2014 and then expect that to remain solid and unchanging would be folly. So, from the start, the Library has been actively committed to getting feedback from those other stakeholders so that it can identify the aspects that need improvement as part of an annual cycle of revisions to the Recommended Formats. By addressing them on a yearly basis, and by actively soliciting the input of others, we increase the likelihood that the Recommended Formats will remain accurate and useful, not merely for the Library but for any other stakeholder who cares about preservation and long-term access.
Updating the Recommended Formats
Ever since the Recommended Formats were issued in June 2014, the Library has been communicating them to others and making it as clear as possible that feedback is actively encouraged so that we can make the Recommended Formats the best they can be. And we are very glad to say that we have received a lot of very positive and constructive feedback from across the range of stakeholders.
We were very pleased at the responses from some of the national libraries, such as the National Library of New Zealand, which is going to refer questions on preferred formats to this document, and the British Library, who found the Recommended Formats useful in developing their own guidance for legal deposit submissions. And we are happy that the positive feedback extended beyond the library world, ranging from experts in photography to the Recording Industry Association of America (RIAA).
We were just as pleased at the receipt of constructive advice on how the Recommended Formats could be improved. Some of this feedback was very specific. There were some very beneficial revisions to the file formats for Still Images as a result of both internal consultation and feedback from experts in the field. Likewise, the generally supportive response from the RIAA included suggestions on changes to the metadata for Audio Works that we have included.
But the best feedback we received is reflected in the layout and presentation. This starts with the new name, the Recommended Formats Statement, which we hope will make it clearer that this document is not technical specifications but a broader guide for a larger pool of users. And we are very pleased to present the statement in a new, tabular layout, suggested by our colleagues at the National Agricultural Library, which we think makes the content clearer and far more accessible. Between that and highlighting the metadata by arranging it in lists, we feel this is a document that, now that we know others are interested in it, will be all the more useful for them and for us.
But please let us know! This is the 2015-2016 version of the Recommended Formats Statement (PDF), which means that, now that it is done, we are ready to start hearing about what we should do to make next year’s version even more useful, whether that is in changes to the content or improvements to the layout. This is an ongoing process and one in which we actively seek the feedback and participation of our colleagues from throughout the lifecycle of creative works. We hope that together, we can use the Recommended Formats Statement as a good first step to enable us all to enjoy creative works which will last so that future generations can enjoy them tomorrow as much as we do today.
Brembs starts with this graph, showing that the result of the negotiation between librarians and publishers has been price increases vastly outstripping inflation. This is not unexpected; the negotiation is not between equals:
Given this publisher track record, I think it is quite reasonable to remain somewhat skeptical that in the hypothetical future scenario of the librarian negotiating APCs with publishers, the publisher-librarian partnership will not again be lopsided in the publishers’ favor.
We already see this, for example with APC double-dipping. Brembs continues:
So while the currently paid APCs per article (about US$3k) seem comparatively cheap (i.e., compared to currently US$5k for each subscription article), publishers would not be offering them, if that would entail a drop in their profit margins, which currently are on the order of 40%. As speculated before, a large component of current publisher revenue (of about US$10bn annually) appears to be spent on making sure nobody actually reads the articles we write (i.e., paywalls). This probably explains why the legacy subscription publishers today, despite receiving all their raw material for free and getting their quality control (peer-review) also done for free, still only post profit margins under 50%. Given that many non-profit open access organizations post actual publishing costs of under US$100, it is hard to imagine what else other than paywall infrastructure would cost that much, given that the main difference between these journals are the paywalls and not much else. By the way, precisely because the actual publishing process is so cheap, the majority of all open access journals do not even bother to charge any APCs at all. There is something beyond profits that makes subscription access so expensive and any OA scenario would make these costs disappear.
But APCs don't merely cover costs and contribute to profits, they are also a signalling mechanism:
It is hence not surprising that also among open access journals, APCs correlate with their standing in the rankings and hence their selectivity. It is reasonable to assume that authors in the future scenario will do the same they are doing now: compete not for the most non-selective journals (i.e., the cheapest), but for the most selective ones (i.e., the most expensive). Why should that change, only because now everybody is free to read the articles? The new publishing model would even exacerbate this pernicious tendency, rather than mitigate it. After all, it is already (wrongly) perceived that the selective journals publish the best science. If APCs become predictors of selectivity because selectivity is expensive, nobody will want to publish in a journal without or with low APCs, as this will carry the stigma of not being able to get published in the expensive/selective journals.
And for authors, who do not pay the APCs, high APCs are a feature not a bug:
Moreover, if libraries keep paying the APCs, the ones who so desperately want the Rolls Royce don’t even have to pay the bill. Doesn’t this mean that any publisher who does not shoot for at least US$5k in their average APCs (better more) fails to fulfill their fiduciary duty in not one but two ways: not only will they lose out on potential profit, due to their low APCs, they will also lose market share and prestige. Thus, in this new scenario, if anything, the incentives for price hikes across the board are even higher than what they are today. Isn’t this scenario a perfect storm for runaway hyperinflation?
Poynder points out that the big beneficiaries of Open Access are the big publishers:
And to the chagrin of OA advocates, much of the revenue generated by APCs is currently being sucked up by traditional publishers like Elsevier and Wiley, especially through the use of hybrid OA.
In reviewing the figures for 2013-2014, for instance, Wellcome’s Robert Kiley reported that Elsevier and Wiley “represent some 40% of our total APC spend, and are responsible for 35% of all Trust-funded papers published under the APC model.” (74% of the papers concerned were published as hybrid OA).
The story is similar at RCUK. As the Times Higher noted in April: “Publishers Elsevier and Wiley have each received about £2 million in article processing charges from 55 institutions as a result of RCUK’s open access policy.” In total RCUK paid out £10m, which is in addition to the subscription fees universities are already paying.
It is clear that hybrid open access, in which authors pay for their paper to eventually be made open access in a subscription journal, is the publisher's way of subverting the open access movement:
- Hybrid is not gold open access, because the journal is not open access, nor is the paper for the initial period, the most valuable period to the publisher. Years ago, HighWire Press introduced the "moving wall", by which publishers of subscription journals made all papers open access after 6 or 12 months. Enabling this did not significantly impair the publishers' business. So there are no costs associated with the APC in a hybrid journal.
- Hybrid is not green open access, in which open access comes from a self-archived version of the paper in an institutional repository or the author's web-site. Publishers insist that the author transfer copyright to them, and use their (alleged) ownership of the copyright to require that the paper not be open access for an embargo period.
- Hybrid is a way to kill off institutional repositories. Since open access from the publisher after the embargo expires satisfies the funder's mandate, there is no incentive for authors to deposit their work in an institutional repository. And those public-spirited authors who take the trouble to deposit their work in their institution's repository are likely to find that it has been outsourced to, wait for it, Elsevier! The pernicious Judy Russell, Dean of Libraries at the University of Florida, is spearheading this surrender to the big publishers.
Poynder analyses at length the attempt to fix the problem of hybrid journals and their embargo periods via the "copy request" button, and concludes that it doesn't work because authors don't respond to requests. More important, the legality of the button is unclear, so university lawyers won't agree to its implementation. Again, since Elsevier is likely to be running the repository, the button won't be implemented even if the University's lawyers agree.
Poynder and Brembs both argue that APCs are a major contributor to the problems of open access. Poynder writes:
in pioneering use of article-processing charges PLOS (along with fellow OA publisher BioMed Central) created the enabling environment that has allowed subscription publishers to appropriate gold open access. As such, we can expect the current oligopoly to continue to dominate scholarly publishing, and in an undesirable way.
As I see it, the fundamental problem is not APCs as such, it is what the APCs buy. Submitting an article to a subscription journal was an understandable transaction. The author gave the publisher something of value that they (arguably) owned, namely the copyright on their work, and received in lieu of money the valuable service of having their work published. But paying an APC to a hybrid journal is not an understandable transaction. The author gives the publisher the copyright on their work, and the author's institution gives the publisher money to cover the costs of publication. The publisher gets both the copyright and the money. This is not equitable.
Transfer of copyright to the publisher is the problem. The transfer is not necessary for publication; all the publisher needs is a non-exclusive license to publish. Suppose that, when an APC was paid, the copyright transferred to the payer, the author's institution. Publishers could choose what they wanted from papers subject to an open access mandate:
- They could have the copyright, and use it to enforce an embargo.
- Or they could have the money and be unable to enforce an embargo.
Publishers might (and do) argue that their systems are incapable of publishing material whose copyright they don't own. This cannot be true. If it were, they would be unable to publish any work by employees of the US Federal government. Work by officers and employees of the government as part of their official duties is "a work of the United States government" and, as such, is not entitled to domestic copyright protection under U.S. law. So, inside the US there is no copyright to transfer, and outside the US the copyright is owned by the US government, not by the employee. It is easy to find papers that apparently violate this, such as James Hansen et al.'s Global Temperature Change. It carries the statement "© 2006 by The National Academy of Sciences of the USA" and states Hansen's affiliation as "National Aeronautics and Space Administration Goddard Institute for Space Studies".
The HighWire Press "moving wall" experience shows that what hybrid publishers are offering, open access after an embargo, costs them little or nothing. Paying them both with the APC and the copyright for something that costs them very little is unjustifiable, and explains why hybrid publication is so popular with publishers. Equally, the experience shows that the delay from an embargo removes most of the value of eventual open access, so paying for publication with the copyright is adequate.
Brembs' graph supports my skepticism that librarians are capable of doing anything that might annoy the big publishers. So an alternative, more radical suggestion is for research funders to make clear in their research grants that papers reporting the result of their funding are works for hire and that the copyright in them thus belongs to the research funders and not to the authors or their institutions. This is hardly innovative; I recently agreed to just such a provision in respect of work at Stanford funded by a major foundation.
You may have noticed that I write that publishers "allegedly" own the copyright of the papers. They certainly claim that they own the copyright, but is this claim factually correct? Not in at least one personal example that still rankles. This screen-grab shows ACM claiming to own the copyright on the version of Keeping bits safe: how hard can it be? that appeared in CACM in November 2010.
Contrast this with the statement on the same paper as it earlier appeared in ACM Queue. ACM's claim to own the copyright in this case is false; I never signed a copyright transfer. It is more than four years since I notified them of this problem and was promised it would be fixed, but it still isn't. I wonder what would happen if I sent ACM a DMCA takedown for the CACM version?
I'm not the only person who believes that publishers' claims to own the copyright on the papers they publish are shaky. Cory Doctorow has argued, as I do, that in many cases the person signing the transfer does not in fact own the copyright, so the transfer they signed is not valid. I am staff at Stanford, so anything I write on Stanford's time is a work for hire. But I was only half-time, and I wrote Keeping bits safe: how hard can it be? on my own time. So ACM Queue's statement is correct, and CACM's is false.