I'm David Rosenthal from the LOCKSS (Lots Of Copies Keep Stuff Safe) Program at the Stanford Libraries. We've been sustainably preserving digital information for a reasonably long time, and I'm here to talk about some of the lessons we've learned along the way that are relevant for research data.
In May 1995 Stanford Libraries' HighWire Press pioneered the shift of academic journals to the Web by putting the Journal of Biological Chemistry on-line. Almost immediately librarians, who pay for this extraordinarily expensive content, saw that the Web was a far better medium than paper for their mission of getting information to current readers. But they have a second mission, getting information to future readers. There were both business and technical reasons why, for this second mission, the Web was a far worse medium than paper:
- The advent of the Web forced libraries to change from purchasing a copy of the content to renting access to the publisher's copy. If the library stopped paying the rent, it would lose access to the content.
- Because in the Web the publisher stored the only copy of the content, and because it was on short-lived, easily rewritable media, the content was at great risk of loss and damage.
The LOCKSS Program started in October 1998 with the goal of replicating the paper library system for the Web. We built software that allowed libraries to deploy a PC, a LOCKSS box, that was the analog for the Web of the paper library's stacks. By crawling the Web, the box collected a copy of the content to which the library subscribed and stored it. Readers could access their library's copy if for any reason they couldn't get to the publisher's copy. Boxes at multiple libraries holding the same content cooperated in a peer-to-peer network to detect and repair any loss or damage.
The program was developed and went into early production with initial funding from the NSF, and then major funding from the Mellon Foundation, the NSF and Sun Microsystems. But grant funding isn't a sustainable business model for digital preservation. In 2005, the Mellon Foundation gave us a grant with two conditions: we had to match it dollar-for-dollar, and by the end of the grant in 2007 we had to be completely off grant funding. We met both conditions, and we have (with one minor exception which I will get to later) been off grant funding and in the black ever since. The LOCKSS Program has two businesses:
- We develop, and support libraries that use, our open-source software for digital preservation. The software is free; libraries pay for support. We refer to this as the "Red Hat" business model.
- Under contract to a separate not-for-profit organization called CLOCKSS run jointly by publishers and libraries, we use our software to run a large dark archive of e-journals and e-books. This archive has recently been certified as a "Trustworthy Repository" after a third-party audit which awarded it the first-ever perfect score in the Technologies, Technical Infrastructure, Security category.
The bad news is that the budget for digital preservation covers only about half of what should be preserved. There are three ways to respond:

- Do nothing. In that case we can stop worrying about bit rot, format obsolescence, operator error and all the other threats digital preservation systems are designed to combat. These threats are dwarfed by the threat of "can't afford to preserve." It is going to mean that more than 50% of the stuff that should be available to future readers isn't.
- Double the budget for digital preservation. This is so not going to happen. Even if it did, it wouldn't solve the problem because, as I will show, the cost per unit content is going to rise.
- Halve the cost per unit content of current systems. This can't be done with current architectures. Yesterday morning I gave a talk at the Library of Congress describing a radical re-think of long-term storage architecture that might do the trick. You can find the text of the talk on my blog.
As an engineer, I'm used to using rules of thumb. The one I use to summarize most of the cost research is that ingest takes half the lifetime cost, preservation takes one third, and access takes one sixth.
Research grants might be able to fund the ingest part, since it is a one-time, up-front cost. But preservation and access are ongoing costs for the life of the data, so grants have no way to cover them. We've been able to ignore this problem for a long time, for two reasons. First, from at least 1980 to 2010 costs followed Kryder's Law, the disk analog of Moore's Law, dropping 30-40%/yr. This meant that, if you could afford to store the data for a few years, the cost of storing it for the rest of time could be ignored, because of course Kryder's Law would continue forever. Second, as the data got older, access to it was expected to become less frequent, so the cost of access in the long term could be ignored too.
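As a sketch, the rule of thumb and its consequence for grant funding look like this (the fractions are the talk's illustrative figures, treated as exact for the arithmetic):

```python
# Rule-of-thumb lifetime cost split for digital preservation:
# ingest 1/2 (one-time), preservation 1/3 and access 1/6 (ongoing).
from fractions import Fraction

INGEST = Fraction(1, 2)        # one-time, up-front
PRESERVATION = Fraction(1, 3)  # ongoing for the life of the data
ACCESS = Fraction(1, 6)        # ongoing for the life of the data

assert INGEST + PRESERVATION + ACCESS == 1

def grant_coverage(lifetime_cost):
    """Split a lifetime cost into the grant-fundable (one-time)
    part and the ongoing part a grant cannot cover."""
    up_front = lifetime_cost * INGEST
    ongoing = lifetime_cost * (PRESERVATION + ACCESS)
    return up_front, ongoing

up_front, ongoing = grant_coverage(100_000)
print(f"grant-fundable: ${up_front}, unfunded ongoing: ${ongoing}")
```

The point of the split: even a generous one-time grant leaves half the lifetime cost unfunded.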
Kryder's Law held for three decades, an astonishing feat for exponential growth. Something that goes on that long gets built into people's model of the world, but as Randall Munroe points out, in the real world exponential curves cannot continue forever. They are always the first part of an S-curve.
This graph, from Preeti Gupta of UC Santa Cruz, plots the cost per GB of disk drives against time. In 2010 Kryder's Law abruptly stopped. In 2011 the floods in Thailand destroyed 40% of the world's capacity to build disks, and prices doubled. Earlier this year they finally got back to 2010 levels. Industry projections are for no more than 10-20% per year going forward (the red lines on the graph). This means that disk is now about 7 times as expensive as was expected in 2010 (the green line), and that in 2020 it will be between 100 and 300 times as expensive as the 2010 projections implied.
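To see where the "about 7 times" comes from, here's a hedged sketch; the 40%/yr decline is my assumed reading of the optimistic end of the 2010 projections, and "prices only just recovered" stands in for the post-flood price history:

```python
# Gap between 2010 Kryder's-Law expectations and 2014 reality.
# Assumptions: 2010 projections expected ~40%/yr price declines;
# by 2014 actual prices had only just returned to 2010 levels.
def projected_price(base, annual_drop, years):
    """Price after `years` if cost falls by `annual_drop` per year."""
    return base * (1 - annual_drop) ** years

expected_2014 = projected_price(1.0, 0.40, 4)  # the 2010 projection
actual_2014 = 1.0                              # back to 2010 levels
gap = actual_2014 / expected_2014              # ~7.7x under these assumptions
print(f"disk is ~{gap:.1f}x the price 2010 projections expected")
```

Compounding the same mismatch out to 2020 is what produces the 100-300x range quoted above.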
Thanks to aggressive marketing, it is commonly believed that "the cloud" solves this problem. Unfortunately, cloud storage is actually made of the same kind of disks as local storage, and is subject to the same slowing of the rate at which it was getting cheaper. In fact, when all costs are taken into account, cloud storage is not cheaper for long-term preservation than doing it yourself once you get to a reasonable scale. Cloud storage really is cheaper if your demand is spiky, but digital preservation is the canonical base-load application.
You may think that cloud storage is a competitive market; in fact it is dominated by Amazon. When Google recently started to get serious about competing, they pointed out that while Amazon's margins on S3 may have been minimal at introduction, by then they were extortionate:
cloud prices across the industry were falling by about 6 per cent each year, whereas hardware costs were falling by 20 per cent. And Google didn't think that was fair. ... "The price curve of virtual hardware should follow the price curve of real hardware."

Notice that the major price drop triggered by Google was a one-time event; it was a signal to Amazon that they couldn't have the market to themselves, and to smaller players that they would no longer be able to compete.
In fact commercial cloud storage is a trap. It is free to put data in to a cloud service such as Amazon's S3, but it costs to get it out. For example, getting your data out of Amazon's Glacier without paying an arm and a leg takes 2 years. If you commit to the cloud as long-term storage, you have two choices. Either keep a copy of everything outside the cloud (in other words, don't commit to the cloud), or stay with your original choice of provider no matter how much they raise the rent.
The storage part of preservation isn't the only ongoing cost that will be much higher than people expect; access will be too. In 2010 the Blue Ribbon Task Force on Sustainable Digital Preservation and Access pointed out that the only real justification for preservation is to provide access. With research data this is difficult: the value of the data may not be evident for a long time. Shang dynasty astronomers inscribed eclipse observations on animal bones. About 3200 years later, researchers used these records to estimate that the accumulated clock error was about 7 hours. From this they derived a value for the viscosity of the Earth's mantle as it rebounds from the weight of the glaciers.
In most cases so far the cost of an access to an individual item has been small enough that archives have not charged the reader. Research into past access patterns to archived data showed that access was rare, sparse, and mostly for integrity checking.
But the advent of "Big Data" techniques means that, going forward, scholars increasingly want not to access a few individual items in a collection, but to ask questions of the collection as a whole. For example, the Library of Congress announced that it was collecting the entire Twitter feed, and almost immediately had 400-odd requests for access to the collection. The scholars weren't interested in a few individual tweets, but in mining information from the entire history of tweets. Unfortunately, the most the Library of Congress can afford to do with the feed is to write two copies to tape. There's no way they can afford the compute infrastructure to data-mine from it. We can get some idea of how expensive this is by comparing Amazon's S3, designed for data-mining type access patterns, with Amazon's Glacier, designed for traditional archival access. S3 is currently at least 2.5 times as expensive; until recently it was 5.5 times.
The real problem here is that scholars are used to having free access to library collections, but what scholars now want to do with archived data is so expensive that they must be charged for access. This in itself has costs, since access must be controlled and accounting undertaken. Further, data-mining infrastructure at the archive must have enough performance for the peak demand but will likely be lightly used most of the time, increasing the cost for individual scholars. A charging mechanism is needed to pay for the infrastructure. Fortunately, because the scholar's access is spiky, the cloud provides both suitable infrastructure and a charging mechanism.
For smaller collections, Amazon provides Free Public Datasets: Amazon stores a copy of the data at no charge, charging scholars who access the data for the computation rather than charging the owner of the data for storage.
Even for large and non-public collections it may be possible to use Amazon. Suppose that in addition to keeping the two archive copies of the Twitter feed on tape, the Library kept one copy in S3's Reduced Redundancy Storage simply to enable researchers to access it. For this year, it would have averaged about $4100/mo, or about $50K for the year. Scholars wanting to access the collection would have to pay for their own computing resources at Amazon, and the per-request charges; because the data transfers would be internal to Amazon there would not be bandwidth charges. The storage charges could be borne by the library or charged back to the researchers. If they were charged back, the 400 initial requests would each need to pay about $125 for a year's access to the collection, not an unreasonable charge. If this idea turned out to be a failure it could be terminated with no further cost; the collection would still be safe on tape. In the short term, using cloud storage for an access copy of large, popular collections may be a cost-effective approach. Because the Library's preservation copy isn't in the cloud, they aren't locked in.
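A quick sanity check of these numbers (the $4,100/month average is the figure quoted above; the rest is arithmetic):

```python
# Back-of-the-envelope check of the Twitter-feed access-copy costs:
# ~$4,100/month in S3 Reduced Redundancy Storage, shared among the
# ~400 scholars who requested access.
MONTHLY_STORAGE = 4100   # USD/month, average for the year (from the talk)
REQUESTERS = 400

annual = MONTHLY_STORAGE * 12        # $49,200 -- "about $50K"
per_scholar = annual / REQUESTERS    # ~$123/yr -- "about $125"
print(f"annual storage: ${annual}, per scholar: ${per_scholar:.0f}/yr")
```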
One thing it should be easy to agree on about digital preservation is that you have to do it with open-source software; closed-source preservation has the same fatal "just trust me" aspect that closed-source encryption (and cloud storage) suffer from. Sustaining open source preservation software is interesting, because unlike giants like Linux, Apache and so on it is a niche market with little commercial interest.
We have managed to sustain open-source preservation software well for 7 years, but have encountered one problem. This brings me to the exception I mentioned earlier. To sustain the free-software, paid-support model, you have to deliver visible value to your customers regularly and frequently. We try to release updated software every 2 months, and new content for preservation weekly. But this makes it difficult to commit staff resources to major improvements to the infrastructure. These are needed to address problems that don't impact customers yet, but will in a few years unless you work on them now.
The Mellon Foundation supports a number of open-source initiatives, and after discussing this problem with them they gave us a small grant specifically to work on enhancements to the LOCKSS system such as support for collecting websites that use AJAX, and for authenticating users via Shibboleth. Occasional grants of this kind may be needed to support open-source preservation infrastructure generally, even if pay-for-support can keep it running.
Unfortunately, economics aren't the only hard problem facing the long-term storage of data. There are serious technical problems too. Let's start by examining the technical problem in its most abstract form. Since 2007 I've been using the example of "A Petabyte for a Century". Think about a black box into which you put a Petabyte, and out of which a century later you take a Petabyte. Inside the box there can be as much redundancy as you want, on whatever media you choose, managed by whatever anti-entropy protocols you want. You want to have a 50% chance that every bit in the Petabyte is the same when it comes out as when it went in.
Now consider every bit in that Petabyte as being like a radioactive atom, subject to a random process that flips it with a very low probability per unit time. You have just specified a half-life for the bits. That half-life is about 60 million times the age of the universe. Think for a moment how you would go about benchmarking a system to show that no process with a half-life less than 60 million times the age of the universe was operating in it. It simply isn't feasible. Since at scale you are never going to know that your system is reliable enough, Murphy's law will guarantee that it isn't.
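The arithmetic behind that half-life is short enough to write down (assuming independent bit flips at a constant rate, and ~13.8 billion years for the age of the universe):

```python
import math

# "A Petabyte for a Century", in numbers. If each bit flips
# independently at rate lam per year, the chance that none of N bits
# flips in T years is exp(-lam * N * T). Requiring that to be 0.5
# gives lam = ln2 / (N * T), i.e. a bit half-life (ln2 / lam) of
# N * T years.
BITS = 8e15                # bits in a Petabyte
YEARS = 100
AGE_OF_UNIVERSE = 13.8e9   # years, approximate

lam = math.log(2) / (BITS * YEARS)   # max tolerable per-bit flip rate
half_life = math.log(2) / lam        # = BITS * YEARS = 8e17 years
ratio = half_life / AGE_OF_UNIVERSE  # ~5.8e7, i.e. "about 60 million"
print(f"required bit half-life: {half_life:.1e} years, "
      f"~{ratio / 1e6:.0f} million times the age of the universe")
```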
Here's some back-of-the-envelope hand-waving. Amazon's S3 is a state-of-the-art storage system. Its design goal is an annual probability of loss of a data object of 10⁻¹¹. If the average object is 10K bytes, the bit half-life is about a million years, way too short to meet the requirement but still really hard to measure.
Note that the 10⁻¹¹ is a design goal, not the measured performance of the system. There's a lot of research into the actual performance of storage systems at scale, and it all shows them under-performing expectations based on the specifications of the media. Why is this? Real storage systems are large, complex systems subject to correlated failures that are very hard to model.
Worse, the threats against which they have to defend their contents are diverse and almost impossible to model. Nine years ago we documented the threat model we use for the LOCKSS system. We observed that most discussion of digital preservation focused on these threats:
- Media failure
- Hardware failure
- Software failure
- Network failure
- Natural disaster
- Operator error
- External attack
- Insider attack
- Economic failure
- Organizational failure
Consider two storage systems with the same budget over a decade, one with a loss rate of zero, the other half as expensive per byte but which loses 1% of its bytes each year. Clearly, you would say the cheaper system has an unacceptable loss rate.
However, each year the cheaper system stores twice as much and loses 1% of its accumulated content. At the end of the decade the cheaper system has preserved 1.89 times as much content at the same cost. After 30 years it has preserved more than 5 times as much at the same cost.
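That claim is easy to check with a small simulation; the "add a year's content, then decay" convention below is an assumption, but it reproduces the decade figure:

```python
# Same-budget comparison: system A stores 1 unit/year with no loss;
# system B is half the price, so it stores 2 units/year, but loses 1%
# of its accumulated content each year.
def preserved_by_b(years, loss=0.01, per_year=2.0):
    total = 0.0
    for _ in range(years):
        total = (total + per_year) * (1 - loss)  # add, then decay
    return total

ratio_10 = preserved_by_b(10) / 10  # system A holds 10 units after a decade
print(f"after 10 years B has preserved {ratio_10:.2f}x as much as A")
```

The 30-year figure depends additionally on how falling storage prices are modeled, so it isn't reproduced by this minimal sketch.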
Adding each successive nine of reliability gets exponentially more expensive. How many nines do we really need? Is losing a small proportion of a large dataset really a problem? The canonical example of this is the Internet Archive's web collection. Ingest by crawling the Web is a lossy process. Their storage system loses a tiny fraction of its content every year. Access via the Wayback Machine is not completely reliable. Yet for US users archive.org is currently the 150th most visited site, whereas loc.gov is the 1519th. For UK users archive.org is currently the 131st most visited site, whereas bl.uk is the 2744th.
Why is this? Because the collection was always a series of samples of the Web, the losses merely add a small amount of random noise to the samples. But the samples are so huge that this noise is insignificant. This isn't something about the Internet Archive, it is something about very large collections. In the real world they always have noise; questions asked of them are always statistical in nature. The benefit of doubling the size of the sample vastly outweighs the cost of a small amount of added noise. In this case more really is better.
To sum up, the good news is that sustainable preservation of digital content such as research data is possible, and the LOCKSS Program is an example.
The bad news is that people's expectations are way out of line with reality. It isn't possible to preserve nearly as much as people assume is already being preserved, nearly as reliably as they assume it is already being done. This mismatch is going to increase. People don't expect more resources, yet they do expect a lot more data. They expect that the technology will get a lot cheaper, but the experts no longer believe it will.
Research data, libraries and archives are a niche market. Their problems are technologically challenging, but there isn't a big payoff for solving them, so neither industry nor academia is researching solutions. We end up cobbling together preservation systems out of technology intended to do something quite different, like backups.
Social media is something I have in common with popular library speaker Joe Murphy. We’ve both given talks about the power of social media at loads of conferences. I love the radical transparency that social media enables. It allows for really authentic connection and also really authentic accountability. So many bad products and so much bad behavior have come to light because of social media. Everyone with a cell phone camera can now be an investigative reporter. So much less can be swept under the rug. It’s kind of an amazing thing.
But what’s disturbing is what has not become more transparent. Sexual harassment for one. When a United States senator doesn’t feel like she can name the man who told her not to lose weight after having her baby because “I like my girls chubby,” then we know this problem is bigger than just libraryland.
It’s been no secret among many women (and some men) who attend and speak at conferences like Internet Librarian and Computers in Libraries that Joe Murphy has a reputation for using these conferences as his own personal meat markets. Whether it’s true or not, I don’t know. I’ve known these allegations since before 2010, which was when I had the privilege of attending a group dinner with him.
He didn’t sexually harass anyone at the table that evening, but his behavior was entitled, cocky, and rude. He barely let anyone else get a word in edgewise because apparently what he had to say (in a group with some pretty freaking illustrious people) was more important than what anyone else had to say. The host of the dinner apologized to me afterwards and said he had no idea what this guy was like. And that was the problem. This information clearly wasn’t getting to the people who needed it most; particularly the people who invited him to speak at conferences. For me, it only cemented the fact that it’s a man’s world (even in our female-dominated profession) and men can continue to get away with and profit from offering more flash than substance and behaving badly.
Why don’t we talk about sexual harassment in the open? I can only speak from my own experience not revealing a public library administrator who sexually harassed me at a conference. First, I felt embarrassed, like maybe I’d encouraged him in some way or did something to deserve it. Second, he was someone I’d previously liked and respected and a lot of other people liked and respected him, and I didn’t want to tarnish his reputation over something that didn’t amount to that much. Maybe the fact that he was so respected also made me scared to say something, because, in the end, it could end up hurting me.
People who are brave enough to speak out about sexual harassment and name names are courageous. As Barbara Fister wrote, they are whistleblowers. They protect other women from suffering a similar fate, which is noble. When Lisa Rabey and nina de jesus (AKA #teamharpy) wrote about behavior from Joe Murphy that many of us had been hearing about for years, they were acting as whistleblowers, though whistleblowers who had only heard about the behavior second or third-hand, which I think is an important distinction. I believe they shared this information in order to protect other women. And now they’re being sued by Joe Murphy for 1.25 million dollars in damages for defaming his character. You can read the statement of claim here. I assume he is suing them in Canada because it’s easier to sue for libel and defamation outside of the U.S.
On his blog, Wayne Bivens-Tatum wonders “whether the fact of the lawsuit might hurt Murphy within the librarian community more than any accusations of sexual harassment.” Is it the Streisand effect, whereby Joe Murphy is bringing more attention to his alleged behavior by suing these women? It’s possible that this will bite him in the ass more than the original tweets and blog post (which I hadn’t seen prior) ever could.
I fear the impact of this case will be that women feel even less safe speaking out against sexual harassment if they believe that they could be sued for a million or more dollars. In the end, how many of us really have “proof” that we were sexually harassed other than our word? If you know something that substantiates their allegations of sexual predatory behavior, consider being a witness in #teamharpy’s case. If you don’t but still want to help, contribute to their defense fund.
That said, that this information comes second or third-hand does concern me. I don’t know for a fact that Joe Murphy is a sexual predator. Do you? Here’s what I do know. Did he creep me out when I interacted with him? Yes. Did he creep out other women at conferences? Yes. Did he behave like an entitled jerk at least some of the time? Yes. Do many people resent the fact that a man with a few years of library experience who hasn’t worked at a library in years is getting asked to speak at international conferences when all he offers is style and not substance? Yes.
While all of the rumors about him that have been swirling around for at least the past 4-5 years may be 100% true, I don’t know if they are. I don’t know if anyone has come out and said they were harassed by him beyond the general “nice shirt” comment that creeped out many women. As anyone who has read my blog for a while knows, I am terrified of groupthink. So I feel really torn when it comes to this case. Part of me wonders whether my dislike of Joe Murphy makes me more prone to believe these things. Another part of me feels that these allegations are very consistent with my experience of him and with the rumors over these many years. But I’m not going to decide whether the allegations are true without hearing it from someone who experienced it first-hand.
I wish I could end this post on a positive note, but this is pretty much sad for everyone. Sad for the two librarians who felt they were doing a courageous thing (and may well have been) by speaking out and are now being threatened by a tremendously large lawsuit. Sad for the victims of harassment who may be less likely to speak out because of this lawsuit. And sad for Joe Murphy if he is truly innocent of what he’s been accused (and imagine for a moment the consequences of tarring and feathering an innocent man). I wish we lived in a world where we felt as comfortable reporting abuse and sexual harassment as we do other wrongdoing. I wish as sharp a light was shined on this as has recently been shined on police brutality, corporate misbehavior, and income inequality. And maybe the only positive is that this is shining a light on the fact that this happens and many women, even powerful women, do not feel empowered to report it.
Photo credit: She whispered into the wrong ears by swirling thoughts
A notion that haunts me is found in Neil Gaiman’s The Sandman: the library of the Dreaming, wherein can be found books that no earth-bound librarian can collect. Books that caught existence only in the dreams – or passing thoughts – of their authors. The Great American Novel. Every Great American Novel, by all of the frustrated middle managers, farmers, and factory workers who had their heart attack too soon. Every Great Nepalese Novel. The conclusion of the Wheel of Time, as written by Robert Jordan himself.
That library has a section containing every book whose physical embodiment was stolen. All of the poems of Sappho. Every Mayan and Olmec text – including the ones that, in the real world, did not survive the fires of the invaders.
Books can be like cockroaches. Text thought long-lost can turn up unexpectedly, sometimes just by virtue of having been left lying around until someone thinks to take a closer look. It is not an impossible hope that one day, another Mayan codex may make its reappearance, thumbing its nose at the colonizers and censors who despised it and the culture and people it came from.
Books are also fragile. Sometimes the censors do succeed in utterly destroying every last trace of a book. Always, entropy threatens all. Active measures against these threats are required; therefore, it is appropriate that librarians fight the suppression, banning, and challenges of books.
Banned Books Week is part of that fight, and it is important that folks be aware of their freedom to read what they choose – and be aware that it is a continual struggle to protect that freedom. Indeed, perhaps “Freedom to Read Week” better expresses the proper emphasis on preserving intellectual freedom.
But it’s not enough.
I am also haunted by the books that are not to be found in the Library of the Dreaming – because not even the shadow of their genesis crossed the mind of those who could have written them.
Because their authors were shot for having the wrong skin color.
Because their authors were cheated of an education.
Because their authors were sued into submission for daring to challenge the status quo. Even within the profession of librarianship.
Because their authors made the decision to not pursue a profession in the certain knowledge that the people who dominated it would challenge their every step.
Because their authors were convinced that nobody would care to listen to them.
Librarianship as a profession must consider and protect both sides of intellectual freedom. Not just consumption – the freedom to read and explore – but also the freedom to write and speak.
The best way to ban a book is to ensure that it never gets written. Justice demands that we struggle against those who would not just ban books, but destroy the will of those who would write them.
Updated September 23, 2014

- Total no. participating publishers & societies: 5,363
- Total no. voting members: 2,609
- Non-profit publishers: 57%
- Total no. participating libraries: 1,902
- No. journals covered: 36,035
- No. DOIs registered to date: 69,191,919
- No. DOIs deposited in previous month: 582,561
- No. DOIs retrieved (matched references) in previous month: 35,125,120
- DOI resolutions (end-user clicks) in previous month: 79,193,741
Last updated September 23, 2014
Brazilian Journal of Internal Medicine
Brazilian Journal of Irrigation and Drainage - IRRIGA
Djokosoetono Research Center
Education Association of South Africa
Laboreal, FPCE, Universidade do Porto
Libronet Bilgi Hizmetleri ve Yazilim San. Tic. Ltd., Sti.
Open Access Text Pvt, Ltd.
Pontifical University of John Paul II in Krakow
Revista Brasileira de Quiropraxia - Brazilian Journal of Chiropractic
Scientific Online Publishing, Co. Ltd.
Symposium Books, Ltd.
Turkiye Yesilay Cemiyeti
Uniwersytet Ekonomiczny w Krakowie - Krakow University of Economics
Volgograd State University
IJNC Editorial Committee
Japanese Association of Cardioangioscopy
Lithuanian University of Educational Sciences
The Operations Research Society of Japan
Acta Medica Anatolia
Ankara University Faculty of Agriculture
Dnipropetrovsk National University of Railway Transport
English Language and Literature Association of Korea
Institute for Humanities and Social Sciences
Institute of Korean Independence Movement Studies
Journal of Chinese Language and Literature
Journal of Korean Linguistics
Knowledge Management Society of Korea
Korea Association for International Commerce and Information
Korea Research Institute for Human Settlements
Korean Academic Society for Public Relations
Korean Marketing Association
Korean Society for Art History
Korean Society for the Study of Physical Education
Korean Society of Consumer Policy and Education
Law Research Institute, University of Seoul
Research Institute Centerprogamsystem, JSC
Research Institute of Science Education, Pusan National University
Research Institute of Social Science
Silicea - Poligraf, LLC
The Altaic Society of Korea
The Hallym Academy of Sciences
The Korean Association of Ethics
The Korean Association of Translation Studies
The Korean Society for Culture and Arts Education Studies
The Korean Society for Feminist Studies in English Literature
The Korean Society for Investigative Cosmetology
The Regional Association of Architectural Institute of Korea
The Society for Korean Language and Literary Research
Ural Federal University
V.I. Shumakov Federal Research Center of Transplantology and Artificial Organs
World Journal of Traditional Chinese Medicine
Yonsei Institute for North Korean Studies
Last updated September 10, 2014
Fucape Business School
Journal Issues Limited
Revista Bio Ciencias
The Russian Law Academy of the Ministry of Justice of the RF
Japan Society for Simulation Technology
Asian Journal of Education
Center for Studies of Christian Thoughts and Culture
Contemporary Film Research Institute
Democratic Legal Studies Association
Foreign Studies Institute
Institute for English Cultural Studies
Institute for Japanese Studies
Institute for Philosophy
Institute for the Translation of Korean Classics
Institute of Humanities
International Journal of Entrepreneurial Knowledge
Korean Academy of Kinesiology
Korean Association for the Study of English Language and Linguistics (KASELL)
Korean Logistics Society
The Association of Korean Education
The Korean Philosophy of Education Society
The Korean Society for School Science
Last Thursday, the U.S. House Judiciary Subcommittee on Courts, Intellectual Property, and the Internet held a hearing to gather information about the work of the U.S. Copyright Office and to learn about the challenges the Office faces in trying to fulfill its many responsibilities. Testifying before the Committee was Maria Pallante, Register of Copyrights and Director of the Copyright Office (view Pallante’s testimony (pdf)). Pallante gave a thorough overview of the Office’s administrative, public policy and regulatory functions, and highlighted a number of ways in which the Office’s structure and position within the federal bureaucracy create inefficiencies in its day-to-day operations. Pallante described these inefficiencies as symptoms of a larger problem: The 1976 Copyright Act vested the Office with the resources and authority it needed to thrive in an analog world, but it failed to anticipate the new needs the Office would develop in adjusting to a digital world.
Although the Office’s registration system—the system by which it registers copyright claims—was brought online in 2008, Pallante describes it as nothing more than a 20th century system presented in a 21st century format. The Office’s recordation system—the process by which it records copyright documents—is still completed manually and has not been updated for decades. Pallante considers fully digitizing the registration and recordation functions of the Copyright Office a top priority:
From an operational standpoint, the Office’s electronic registration system was fully implemented in 2008 by adapting off-the-shelf software. It was designed to transpose the paper-based system of the 20th century into an electronic interface, and it accomplished that goal. However, as technology continues to move ahead we must continue to evaluate and implement improvements. Both the registration and recordation systems need to be increasingly flexible to meet the rapidly changing needs of a digital marketplace.
Despite Pallante’s commitment to updating these systems, she cited her lack of administrative autonomy within the Library of Congress and her Office’s tightening budget as significant impediments to achieving this goal. Several members of the Committee suggested that the Office would have greater latitude to update its operations for the digital age if it were moved out from under the authority of the Library of Congress (LOC). While Pallante did not explicitly support this idea, she was receptive to suggestions from members of the Subcommittee that her office carries out very specialized functions that differ from those of the rest of the LOC. Overall, Pallante seemed open to—if not supportive of—having a longer policy discussion on the proper position of the Copyright Office within the federal government.
In addition to providing insight into the inner workings of the Copyright Office, the hearing continued the policy discussion on the statutory and regulatory frameworks that govern the process of documenting a copyright. As the Judiciary Committee continues to review the copyright law, it will be interesting to see if it further examines statutory and regulatory changes to the authority and structure of the Copyright Office.
The post Copyright Office under the congressional spotlight appeared first on District Dispatch.
1. Tell us about your library job. What do you love most about it?
I am the Scholarly Communications Librarian at the University of Idaho. This is a brand new position within the library and also my first ‘real’ librarian job, so it’s been a constant learning experience. I work alongside the Digital Initiatives Librarian on the various digital projects happening at the library, including building an institutional repository, creating digital collections, redesigning the library website, creating and managing open access journals, and working on VIVO (a semantic-web application we are using as a front-end to our IR). I also do some education and advocacy around copyright, author’s rights, open access, etc.
The thing I love most about this job (aside from being able to design websites in crayon – image attached) is taking an idea and bringing it to fruition. Whether it’s a digital collection of postcards with custom navigation or a new journal or database, being able to make an idea a functional, beautiful reality is really rewarding. Also I’m just really excited about increasing access to information, and designing new ways to make that information accessible to a broader audience.
2. Where do you see yourself going from here?
Having just started this career, I’m not completely sure what’s next for me. I’m very happy in my current position, and I love all of the people I work with at the University of Idaho. I think my next step will probably be to start pursuing another degree to help expand my knowledge in this field, or to fulfill my dream to become a professional comic artist/graphic novelist on the side.
3. Why did you apply to be an Emerging Leader? What are your big takeaways from the ALA-level activities so far?
I was encouraged by my mentor, a previous Emerging Leader, to apply. I am actually the fourth Emerging Leader in a row to be selected from the University of Idaho Library, so there is a lot of administrative support and encouragement for this kind of activity. The big thing I’ve learned through working with ALA is that although the organization and the sub-organizations have a massive population, it is a handful of active participants who make nearly everything happen. My goal is to become one of those change-agents at the ALA level, eventually.
4. What have you learned about LITA governance and activities so far?
I’ve learned that LITA is inclusive and active with its membership. This is a very fun organization, and I’m impressed with the discussion and activities that come out of LITA and its membership.
5. What’s your favorite LITA moment? What would you like to do next in the organization?
My favorite LITA moment was working with Rachel Vacek and Kyle Denlinger on the Town Meeting activities at Midwinter. My favorite kind of brainstorming involves large sheets of paper and crayons (see above) and being able to do that with other LITA members was really fun.
In light of my writing about privacy on library websites, this poorly drafted bill, though well intentioned, would turn my library's website into a law-breaker, subject to a $500 civil fine for every user. (It would also require us to make some minor changes at Unglue.it.)
- It defines "personal information" as "(1) any information that identifies, relates to, describes, or is associated with a particular user's use of a book service; (2) a unique identifier or Internet Protocol address, when that identifier or address is used to identify, relate to, describe, or be associated with a particular user, as related to the user’s use of a book service, or book, in whole or in partial form; (3) any information that relates to, or is capable of being associated with, a particular book service user’s access to a book service."
- “Provider” means any commercial entity offering a book service to the public.
- A provider shall only disclose the personal information of a book service user [...] to a person or private entity pursuant to a court order in a pending action brought by [...] by the person or private entity.
- Any book service user aggrieved by a violation of this act may recover, in a civil action, $500 per violation and the costs of the action together with reasonable attorneys’ fees.
And so every user of the catalog could sue Innovative for $500 each, plus legal fees.
Existing library privacy laws in NJ have reasonable exceptions for "proper operations of the library". This law does not have a similar exemption.
I urge Governor Christie to veto the bill and send it back to the legislature for improvements that take account of the realities of library websites and make it easier for internet bookstores and libraries to operate legally in the Garden State.
You can contact Gov. Christie's office using this form.
Update: Just talked to one of Nia Gill's staff; they're looking into it. Also updated to include the 2nd set of amendments.
Update 2: A close reading of the California law on which the NJ statute was based reveals that poor wording in section 4 is the source of the problem. In the California law, it's clear that it pertains only to the situation where a private entity is seeking discovery in a legal action, not when the private entity is somehow involved in providing the service.
Where the NJ law reads
A provider shall only disclose the personal information of a book service user to a government entity, other than a law enforcement entity, or to a person or private entity pursuant to a court order in a pending action brought by the government entity or by the person or private entity.

it's meant to read
In a pending action brought by the government entity other than a law enforcement entity, or by a person or by a private entity, a provider shall only disclose the personal information of a book service user to such entity or person pursuant to a court order.
A few people were critical of my directness in my letter to the VPL board, so I was surprised to get a response. I have permission to post the reply I received here. I’d love to know what other people think.
Thank you for your email dated August 26, to the VPL Board regarding the new VPL Policy.
VPL upholds high standards with regard to access to information and intellectual freedom. We have demonstrated this repeatedly in response to challenges to items in our collection and room rentals. The issue of public displays in a public space is a challenging one that raises unique issues that access to collections for personal private use does not.
Staff considered a multitude of options before and during the development of this policy solution, including all of the considerations you mentioned in your email – space design, equipment options, specific versus more general language. Ultimately, each of these solutions creates its own problems, and it was determined that the approved approach, while not perfect, was the most appropriate given the library’s circumstances.
The Board agrees that implementation of the policy and appropriate training for staff will be critical to ensure that people’s rights to access content are not unreasonably restricted. Our professional librarians at VPL – who share common library professional values – have considerable experience in managing and balancing diverse values and public goods in policy and service. In fact, we have high confidence in our professional librarians’ ability to apply this policy in a nuanced and appropriate manner that does not unreasonably restrict access to content. We also all agree that the appropriate people to have these conversations are public service staff; however, there are occasional circumstances when Security staff are appropriate.
Staff will monitor the outcomes of this policy change and will report to the Board after a full year of implementation. At that point, they may or may not recommend adjustments to the policy.
If you have any further questions, we invite you to connect with VPL management. We understand you have many personal contacts on the VPL management team who are always open to discussing matters related to the library with colleagues.
Mary Lynn Baum
Chair, Vancouver Public Library Board
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
Digital was everywhere at this year’s Society of American Archivists annual meeting. What is particularly exciting is that many of these sessions were practical and pragmatic. That is, many sessions focused on exactly how archivists are meeting the challenge of born-digital records.
In one such session, Sibyl Schaefer, Head of Digital Programs at the Rockefeller Archive Center, offered exactly that kind of practical advice. I am excited to discuss some of the themes from her talk, “We’re All Digital Archivists: Digital Forensic Techniques in Everyday Practice,” here as part of the ongoing Insights Interview series.
Trevor: Could you unpack the title of your talk a bit for us? Why exactly is it time for all archivists to be digital archivists? What does that mean to you in practice?
Sibyl: We don’t all need to be digital archivists, but we do need to be archivists who work with digital materials. It’s not scalable to have one person, or one team, focus on the “digital stuff.” When I was first considering how to structure the Digital Team (or D-Team) at the RAC, it crossed my mind to mirror the structure of my organization, which is based on the main functions of an archive: collection development, accessioning, preservation, description, and access. I quickly realized that integrating digital practices into existing functions was essential.
The archivists at my institution take great pride in their knowledge of the collections, and not tapping into that knowledge would disadvantage the digital collections. We also don’t have many purely digital collections; the vast majority are hybrid. It wouldn’t make sense for one person to arrange and describe analog materials and another the digital materials. The principles of arrangement and description don’t change due to the format of the materials. Our archivists just need guidance in how to be effective in handling digital records, they need experience using tools so they feel comfortable with them, and they need someone available to ask if they have questions. So the digital archivists on my team are figuring out which software and tools to adopt, which workflows are the most efficient, and how to best educate the rest of the staff so they can do the actual archival work. The digital archivists aren’t actually doing traditional archival work and in that sense, “digital archivist” is a misnomer.
Trevor: If an archivist wants to get caught up-to-speed on the state and role of digital forensics for his or her work, what would you suggest they read/review? Further, what about these works do you see as particularly important?
Sibyl: The CLIR report, “Digital Forensics and Born-Digital Content in Cultural Heritage Collections,” is an excellent place to start. It clearly outlines what is gained by using forensics techniques in archival practice: namely the ability to capture digital archival materials in a secure manner that preserves more of their context and original order. These techniques also allow archivists to search through and review those materials without worrying about inadvertently altering them and affecting their authenticity.
I was ecstatic when I first saw Peter Chan’s YouTube video on processing born-digital materials using the Forensic ToolKit software. It was the first time I saw how functionality in FTK could be mapped to traditional processing activities: weeding duplicates, identifying Personally Identifiable Information and restricted records, arranging materials hierarchically, etc. It really answers the question of “So you have a disk image, now what do you do with it?” It also conveyed that the program could be picked up fairly easily by processing archivists.
The “From Bitstreams to Heritage: Putting Digital Forensics into Practice in Collecting Institutions” report (pdf) provides a really good overview of the recent activities in this area and a practical analysis of some of the capabilities and limitations of the forensics tools available.
Trevor: Could you tell us a bit about how the digital team works at the Rockefeller Archive Center? What kinds of roles do people take in the staff? How does the team fit into the structure of the Archive? How do you define the services you provide?
Sibyl: My team takes a user-centered approach in fulfilling our mission of leveraging technology to support all our program areas. We generally start by identifying a need for new technology, whether it be to place our finding aids online, create digital exhibits for our donors, preserve the context and authenticity of materials as they move from one physical medium to another, or increase our efficiency in managing researcher requests. We then try to involve users — both internal and external — as much as possible throughout the process. This involvement is crucial given that we usually aren’t the primary users of the software we implement.
One archivist focuses on delivery and access, which includes managing our online finding aid delivery system, as well as working very closely with our reference staff to develop and integrate tools that will help increase the efficiency of their work. Another team member is focused on digitization and metadata projects which includes things like scanning and outsourced digitization projects, as well as migrating from the Archivists’ Toolkit to ArchivesSpace. We just hired a new digital archivist to really delve into the digital forensics work I discussed in my presentation at SAA. She will be disk imaging and teaching our processing archivists to use FTK for description. In addition to overseeing the work of all the team members, I interface with our donor institutions, create policies and procedures, set team priorities and oversee our digital preservation system.
As I mentioned before, the RAC is divided up into five different archival functional areas: donor services, collections management, processing, reference and the digital team. Certain services, like digital preservation and digital duplication for special projects, are within our realm of responsibility, while with others we take a more advisory role. For example, we’re in the midst of an Aeon special collections management tool implementation, and although we won’t be internally hosting the server, we are helping our reference staff articulate and revise their workflows to take advantage of the efficiencies that system enables.
Our services are quite loosely defined; one of our program goals is to “leverage technology in an innovative way in support of all RAC program areas.” This gives us a lot of leeway in what we choose to do. I prioritize our preservation work based on risk and our systems work based on an evaluation of institutional priorities. For example, over the last year the RAC has been trying to increase the efficiency of our reference services, so we evaluated their workflows, replaced an unscalable method for organizing reference interactions with a user-friendly ticketing system, and are now aiding with the Aeon implementation.
Trevor: Could you tell us a bit about the workflow you have put in place to implement digital forensics in processing digital records? What roles do members of your team play and what roles do others play in that workflow?
Sibyl: My team takes care of inventorying removable media, creating disk images, running virus checks on those images, and providing them to the processing staff for analysis and description. Processing staff then identifies duplicates, restricted materials, and materials that contain PII. They arrange and describe materials within FTK. When they have finished, they notify the D-Team and we add the description to the Archivists’ Toolkit (or ArchivesSpace — we’re preparing to transition over soon) and ingest those files and related metadata into Archivematica.
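The fixity side of a workflow like this can be sketched in a few lines. The function names, inventory fields, and choice of SHA-256 below are illustrative assumptions, not the RAC's actual tooling:

```python
import hashlib
import os
from datetime import datetime, timezone
from pathlib import Path


def checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a fixity checksum for a disk image, reading in chunks
    so large images never have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory_record(image_path, media_label):
    """Build a minimal inventory entry for one imaged piece of media."""
    path = Path(image_path)
    return {
        "media_label": media_label,
        "image_file": path.name,
        "size_bytes": path.stat().st_size,
        "sha256": checksum(path),
        "imaged_at": datetime.now(timezone.utc).isoformat(),
    }
```

Recording the checksum at imaging time means the same value can be re-verified after virus checking, description, and ingest, confirming the bitstream was never altered along the way.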
There are a lot of details we need to add in that will greatly increase the complexity of the process, and some of them will require actual policy decisions to be made. For example, the question of redaction comes up every time I review this process with our archivists. Redaction can be pretty straightforward with certain file formats, but definitely not with all. Also, how do we relay that information has been redacted to our researchers? We need to have a policy that clearly outlines when we redact information (for materials going online? for use in the reading room?) what types of information we redact, and what types of files can securely be redacted.
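For plain-text formats, the "straightforward" case might look like the sketch below. The patterns and the redaction marker are hypothetical; a real policy would define exactly which identifiers count as PII and which formats can be safely redacted:

```python
import re

# Hypothetical PII patterns -- an actual redaction policy would
# enumerate these explicitly rather than rely on two examples.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-style numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
]


def redact_text(text, marker="[REDACTED]"):
    """Replace PII matches in plain text, returning the redacted text
    and a count of replacements so the redaction can be disclosed to
    researchers rather than silently applied."""
    total = 0
    for pattern in PII_PATTERNS:
        text, n = pattern.subn(marker, text)
        total += n
    return text, total
```

Returning the replacement count matters for exactly the disclosure question raised above: it lets a finding aid say how much was redacted, not just that something was.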
Trevor: As your process is established and refined, what do you see as the future role and place of the digital team within the archive? That is, what things are on the horizon for you and your team?
Sibyl: In the years since I joined the RAC we’ve placed our finding aids and digital objects online in an access system, architected a system for digital preservation, and configured forensics workflows. Now that we’ve got that foundation for managing and accessing our digital materials, I want to start embodying our goals to be innovative and leaders in the field. One area I think we can contribute to is integrating systems. For example, we’re launching a new project with Artefactual, the developers of Archivematica, to create a self-submission mechanism for donors to transfer records to us. Part of the project includes integrating ArchivesSpace with Archivematica. How cool would it be to have an accession record automatically created in ArchivesSpace when a donor transfers materials to our Archivematica instance?
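As a rough sketch of what such an integration might involve: the ArchivesSpace backend exposes a REST API that accepts JSON accession records, authenticated with a session token. The field values, repository ID, and base URL below are illustrative, and the exact required fields should be checked against the API documentation for your ArchivesSpace version:

```python
import json
import urllib.request


def build_accession(identifier, title, transfer_date):
    """Build a minimal accession payload. Field names follow the
    ArchivesSpace JSONModel; verify required fields for your version."""
    return {
        "jsonmodel_type": "accession",
        "id_0": identifier,
        "title": title,
        "accession_date": transfer_date,  # ISO date, e.g. "2014-09-10"
    }


def post_accession(base_url, repo_id, session_token, accession):
    """POST the accession to the ArchivesSpace backend API.
    base_url/repo_id/session_token are assumed to come from your
    own instance and a prior login call."""
    req = urllib.request.Request(
        f"{base_url}/repositories/{repo_id}/accessions",
        data=json.dumps(accession).encode("utf-8"),
        headers={
            "X-ArchivesSpace-Session": session_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A transfer-completed hook in the submission mechanism could call `build_accession` with metadata the donor supplied, so the accession record appears without an archivist retyping anything.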
Likewise, I’ve been talking with a few people about using data in FTK to create interactive interfaces for researchers. We could use directory data captured during imaging or created during analysis (like labeling materials “restricted”) to recreate (but not necessarily emulate) the way files were originally organized, including listing deleted and duplicate files and then linking that directly to their final, archival organization. The researcher would be able to see how the files were originally organized by the donor and what is missing (or restricted) from what is presented as the final archival organization. I get giddy when I think of how we can use technology to increase the transparency of what happens during archival processing. I’m also excited about the prospect of working EAC-CPF records into our discovery interface to bolster our description.
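The directory-view idea above could start from something as simple as rebuilding a tree out of a flat export of paths and status flags. The export format and the status labels here are assumptions for the sake of the sketch:

```python
def build_tree(entries):
    """Rebuild a nested directory view from a flat list of
    (path, status) pairs, where status might be "ok", "deleted",
    or "restricted" as assigned during imaging or analysis."""
    tree = {}
    for path, status in entries:
        node = tree
        parts = path.strip("/").split("/")
        # Walk/create intermediate directories, then mark the file.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = status
    return tree
```

Rendering that nested structure side by side with the final archival arrangement is one way to give researchers the processing transparency described above.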
We also have a great deal of less innovative but very necessary tasks ahead of us. We need to implement a DAMS to help corral the digitized materials that are created on request and also to provide more granular permissions to materials than what we currently have. We need to create and implement policies to fill in gaps in our policy framework and inch towards TRAC compliance. And lastly, we need to systematize our preservation planning. We have a lot of work to keep us busy! That said, it’s a really great time to be in the archival field. Digital materials may present new and complex challenges, but we also have a chance to be creative and innovative with systems design and applying traditional archival practices to new workflows.
The never changing web, traffic lights, art, and bots. All with a healthy dose of fun.
Understanding subjects by decomposing them into their algorithms, then implementing in code. In this case, art and JS
Soft robots feel like something we’d have in a reading room. Love this effort to package DIY info on soft robots.
A dancing crosswalk sign helps people wait. Real live dancing translated into the little sign!
I want to order some clocks in bulk. http://www.fastcodesign.com/3035893/this-kinetic-wall-of-clocks-is-utterly-hypno
The never changing web. Django docs and admin UI are still usable and not ugly.