At the AMICAL 2016 conference, I heard an inspiring story about the cyclical destruction and revival of libraries. Dr. Richard Hodges, President of the AU of Rome, began his welcome message as follows: “Unlike what many of you may believe, you don’t come from Silicon Valley, you come from the monks”. He went on to explain that libraries were created at the end of the 8th century by monasteries. It was then that monks started to make books, besides growing food and brewing beer. They crafted the leather skin covers, the straps, the folded leaves of vellum, and all the instrumentation necessary to write books. They created the blueprint of the library. In subsequent years, full-blown libraries developed, like the ones in Saint Denis and Montecassino. As with all successful things, once they grow, you need to sustain them. The monks thus devised a model to attract donors with the lure of a counter-gift: a hand-crafted book. When, in the mid-9th century, the Vikings and Saracens destroyed many monasteries and their libraries, the books survived in the hands of those who had been donors. And so the spirituality of what libraries stood for, as preservers of intellectual heritage, survived the destruction and was the seed for the new learning of the Renaissance. The storyteller hinted at globalization as a similar wave of destruction, which might leave us without libraries, but with the promise that the spirituality of what libraries stand for will resurface in a new guise.
According to keynote speaker Jim Groom, this new guise is the Archiving Movement. He painted the Web landscape as a wonderful space of many small initiatives: do-it-yourself blogs, a wiki infrastructure on which one can build an entire curriculum for free, fascinating open technology, and exciting new learning experiences. In his view, it is all about our individual content, building domains of our own and leaving our personal digital footprints. He advocated the need for individuals to become archivists, reclaiming ownership and control over their data from the “big companies”. In this world of the small against the giants, “rogue Internet archivists” (or morphing librarians, if you wish) are excavating and rescuing the remains of parts of the web that are dying and being destroyed.
In my presentation on “Adapting to the new scholarly record” I talked about shifting trends in the research ecosystem and disturbances that are disrupting the tasks and responsibilities of librarians as stewards of the record of science. I conveyed the concerns of experts and practitioners in the field, who met during a series of OCLC Research workshops on this matter. They talked about the short-term need for a demonstrable pay-off by universities and funding agencies; the diverse concerns on campus around image, IPR and compliance; the emergence of new digital platforms like ResearchGate and others that lure researchers into providing data to them, bypassing their institutional repositories; and so on. All these forces at play are distracting libraries from safeguarding the record for future scholarship. These observations raise the question, which came from the audience: “What can we do about it?” and, in particular, “What can we do, as AMICAL libraries?”
I had been impressed by the information literacy (IL) session the day before. AMICAL libraries from Paris to Sharjah presented their efforts to engage faculty and to broaden the understanding of IL within the university. Many of the libraries face challenges with their student populations, such as reluctance and resistance to reading, deficiencies in academic writing skills, unrealistic expectations about information retrieval, and ineffective search practices. The session concluded with a call to integrate IL into the curriculum.
So, I answered my audience without hesitation: please continue the good work you are doing in IL! Why do we hear so little about IL at other library conferences in Europe? Isn’t IL a core part of that spirituality Richard Hodges talked about – a core part of what libraries stand for? The next generation needs to be prepared for the new learning in the digital information age. This requires education and training. People are not born being-digital!

About Titia van der Werf
Titia van der Werf is a Senior Program Officer in OCLC Research based in OCLC's Leiden office. Titia coordinates and extends OCLC Research work throughout Europe and has special responsibilities for interactions with OCLC Research Library Partners in Europe. She represents OCLC in European and international library and cultural heritage venues.
Every day, technology is making it possible to collect and analyze ever more data about students’ performance and behavior, including their use of library resources. The use of “big data” in the educational environment, however, raises thorny questions and deep concerns about individual privacy and data security. California responded to these concerns by passing the Student Online Private Information Protection Act, and student data privacy also is now the focus of several bills in Congress.
Participate in a discussion on the big picture on student data privacy at the conference session “Student Privacy: The Big Picture on Big Data,” which takes place during the 2016 American Library Association (ALA) Annual Conference in Orlando, Fla. During the session, Khaliah Barnes, associate director of the Electronic Privacy Information Center and Director of its Student Privacy Project, will discuss how the growing use of big data threatens student privacy and how evolving state and federal data privacy laws impact school and academic libraries. The session takes place on Monday, June 27, 2016, 10:00-11:30 a.m., in the Orange County Convention Center, room W206A.
As director of the EPIC Student Privacy Project, Khaliah created the Student Privacy Bill of Rights. Khaliah defends student privacy rights before federal regulatory agencies and federal court. She has testified before states and local districts on the need to safeguard student records. Khaliah is a frequent panelist, commentator, and writer on student data collection. Khaliah has provided expert commentary to local and national media, including CBS This Morning, the New York Times, the Washington Post, NPR, Fox Business, CNN, Education Week, Politico, USA Today, and Time Magazine.
The post Finding the “big picture” on big data at the 2016 ALA Annual Conference appeared first on District Dispatch.
Breakout of content stored in DuraCloud based on storage provider
DuraSpace News: Collaboration Between DSpace Registered Service Providers To Integrate And Deploy a Module On a Tight Deadline
From Peter Dietz, Longsight
Independence, Ohio Longsight manages DSpace for the Woods Hole Open Access System (WHOAS), a repository of marine biology publications and datasets. To provide as much value as possible from this rich data, WHOAS has added a Linked Data module to DSpace to allow researchers to query their data.
DuraSpace News: Fedora at Open Repositories: Hands-on Fedora 4, RepoRodeo, API Extension, State of the CLAW, Hydra at 30
Austin, TX In two weeks the open repository community will gather at the Open Repositories Conference in Dublin, Ireland to share ideas and catch up with old friends and colleagues. The Fedora community will be on hand to participate and offer insights into current and future development of the flexible and extensible open source repository platform used by leading academic institutions.
Introduction to Fedora 4 Workshop
From Mike Conlon, VIVO Project Director
From the organizers of Open Repositories 2017
Brisbane, Australia The Open Repositories (OR) Steering Committee, in conjunction with the University of Queensland (UQ), Queensland University of Technology (QUT) and Griffith University, is delighted to inform you that Brisbane will host the annual Open Repositories 2017 Conference.
It's exciting to have Open Repositories return to Australia, where it all began in 2006.
From Lisa Cardy, Library Services Manager, Natural History Museum
So, LibUX has been a super vehicle for me — hi, I’m Michael — to talk, write, and make friends around the user experience design and development of libraries, non-profits, and the higher-ed web. These are special niches wherein the day-to-day challenges pervading the web are compounded by unique hyperlocal user bases and ethical imperatives largely without parallel on the commercial web.
And because there is so much to mine, so many hours in the day, and the LibUX audience is so varied — designers, developers, enthusiasts, dabblers, directors, big-bosses, students, vendors — I curate against topics that interest me-the-developer but maybe aren’t exactly relevant to me-the-librarian. There is soooo much happening in this space that captures my imagination, so of course I thought I’d start a new podcast: W3 Radio — you know, as in “world wide web.”
Do you need your web design news right now, in ten minutes or less? Well, it’s just your luck that soon I, Michael Schofield, am starting a new podcast: W3 Radio – bite-sized best-of the world wide web. You’ll soon be able to tune in to W3 Radio on your podcatcher of choice, and real soon at w3radio.com.
The gimmick is that it’s just a weekly recap of headlines in under ten minutes, which I think makes it perfect for playing catch-up – oh, also, I am pretending to be an old-timey radio anchor, and I am almost positive that won’t get old.
So, as of this writing, publishing this announcement generates the feed I use to populate the various podcatchers. It will likely pend for a day or two before they make it available. Stay tuned to this space for links.
With ALA’s annual conference in Orlando just around the corner, travel is in the plans for many librarians and staff. Fortunately, as I live in Florida, I don’t have that far to go. But if you do, then you’re going to need some good apps.
I travel frequently and have a few of my favorite apps that I use for travel, and I’d like to share them with you:
Airline App of Choice
I personally only use two airlines so I can only speak to their particular apps, but seriously, if you have a smartphone and you aren’t using it to hold tickets or boarding passes, you’re missing out. You can also use your app to check flight times and delays, book future travel, or just to play around (one of my airline’s apps lets you send virtual postcards).
Even if it’s just a weekend trip, this app is great at letting you know what you should bring depending on the weather and your activities. You can adjust the lists according to your preferences as well. Though this is the free version, there is a paid version where you can save your packing lists to Evernote. (Android)
I only use Foursquare when I travel. It got put through its paces in Boston when I needed to find a place to eat near my location or was looking for a historic site I hadn’t been to. It also helped in giving me tips about the place: what to order, when to avoid the place, how the staff was. On top of that, it links with your Map App of Choice (Google Maps FTW!) to give you directions and contact information. It’s not Yelp, but I feel it’s more genuine. (Android)
Take it from someone who lived in Orlando: driving in that city is not fun. This is why you want Waze: it can show you directions as well as let you input traffic accidents you happen across as you drive (well, maybe after you drive). It even helps out with finding cheap gas. (Android)
Photo-Editing App of Choice
You’re no doubt going to be taking a lot of photos on your trip, so why not spice them up with some creative edits and share them? There are a plethora of photo apps out there to choose from, the most ubiquitous being Instagram (Android), but I love Hipstamatic (paid, iOS only) because you can randomize your filters and get a totally unexpected result every shot. Other apps that are fun are Pixlr (Android) (there’s a desktop version, too!) and Photoshop Express (Android).
What are some travel apps that you cannot live without? Post them in the comments or tweet them my way @LibrarianStevie!
Hotel had putt-putt; not relevant, but interesting nonetheless.
I’ve known Jean Felisme for a while through WordCamp Miami. We see each other quite a bit at meetups and he’s a ton of fun – he’s also been pretty hardcore about evangelizing freelance. Recently he made the switch from freelance into the very special niche that is the higher-ed web, so when he was just six weeks into his new position at the School of Computing and Information Sciences at Florida International University I took the opportunity to pick his brain.
Hope you enjoy.
If you like, you can download the MP3 or subscribe to LibUX on Stitcher, iTunes, Google Play Music, or just plug our feed straight into your podcatcher of choice. Help us out and say something nice. Your sharing and positive reviews are the best marketing we could ask for.

Here’s what we talked about
- 1:08 – WP Campus is coming up!
- 2:45 – All about Jean
- 4:28 – How the trend toward building-out in-house teams will impact freelance
- 9:38 – What is the day-to-day like just six weeks in?
- 12:03 – Student-hosted applications and content – scary
- 13:09 – The makeup of Jean’s team
- 17:43 – Are you playing with any web technology you haven’t before?
- 19:37 – The tight relationship with the students
- 20:31 – On web design curriculum
- 28:00 – We fail to wrap up and keep talking about freelance for a few more minutes.
Please join us on June 2 for a free webinar on another form of copyright creep: this one covers recent efforts to copyright state government works.
Issues behind State governments copyrighting government works
The purpose of copyright is to provide incentives for creativity in exchange for a time-limited, government-provided monopoly. When drafting the federal copyright law, Congress explicitly prohibited the federal government, as well as employees of the federal government, from having the authority to create a copyright in government-created works. However, the federal law is silent on state government power to create, hold, and enforce copyrights. This has resulted in a patchwork of varying levels of state copyright law across all fifty states.
Currently, California favors the approach where the vast majority of works created by state and local government are by default in the public domain. An ongoing debate is happening now as to whether California should end the public domain status of most state and local government works. The state legislature is contemplating a bill (AB 2880) that would grant copyright authority to all state agencies, local governments, and political subdivisions. In recent years, entities of state government have attempted to rely on copyright as a means to suppress the dissemination of taxpayer-funded research and as a means to chill criticism, but failed in the courts due to a lack of copyright authority. Ernesto Falcon, legislative counsel with the Electronic Frontier Foundation, will review the status of the legislation, the court decisions that led to its creation, and the debate that now faces the California legislature.
Day/Time: Thursday, June 2 at 2pm Eastern/11am Pacific for our hour-long free webinar.
Go to http://ala.adobeconnect.com/copytalk/ and sign in as a guest. You’re in.
This program is brought to you by OITP’s copyright education subcommittee.
This posting describes VIAF Finder. In short, given the values from MARC fields 1xx$a, VIAF Finder will try to find and record a VIAF identifier. This identifier, in turn, can be used to facilitate linked data services against authority and bibliographic data.

Quick start
Here is the way to quickly get started:
- download and uncompress the distribution to your Unix-ish (Linux or Macintosh) computer 
- put a file of MARC records named authority.mrc in the ./etc directory, and the file name is VERY important
- from the root of the distribution, run ./bin/build.sh
VIAF Finder will then commence to:
- create a “database” from the MARC records, and save the result in ./etc/authority.db
- use the VIAF API (specifically the AutoSuggest interface) to identify VIAF numbers for each record in your database, and if numbers are identified, then the database will be updated accordingly
- repeat Step #2 but through the use of the SRU interface
- repeat Step #3 but limiting searches to authority records from the Vatican
- repeat Step #3 but limiting searches to the authority named ICCU
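The AutoSuggest lookup in Step #2 can be sketched in a few lines. Note that this is an illustrative sketch in Python (the distribution itself is written in Perl), and the endpoint URL and JSON field names are assumptions based on VIAF's public AutoSuggest service, not code taken from the distribution:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen  # used only in the commented-out live call

# Assumed endpoint of VIAF's public AutoSuggest service.
AUTOSUGGEST = "https://viaf.org/viaf/AutoSuggest"

def suggest_url(name):
    """Build an AutoSuggest query URL for a 1xx$a heading."""
    return AUTOSUGGEST + "?" + urlencode({"query": name})

def best_viafid(response_text):
    """Return the VIAF identifier of the first suggestion, or None."""
    data = json.loads(response_text)
    results = data.get("result") or []
    return results[0].get("viafid") if results else None

# Live lookup (network required):
# viafid = best_viafid(urlopen(suggest_url("Eco, Umberto")).read())
```

The first suggestion is not guaranteed to be the right one, which is why the later steps (and the "suggestions" field of the database) exist.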
Once done, the reader is expected to programmatically loop through ./etc/authority.db to update the 024 fields of their MARC authority data.

Manifest
Here is a listing of the VIAF Finder distribution:
- 00-readme.txt – this file
- bin/build.sh – “One script to rule them all”
- bin/initialize.pl – reads MARC records and creates a simple “database”
- bin/make-dist.sh – used to create a distribution of this system
- bin/search-simple.pl – rudimentary use of the SRU interface to query VIAF
- bin/search-suggest.pl – rudimentary use of the AutoSuggest interface to query VIAF
- bin/subfield0to240.pl – sort of demonstrates how to update MARC records with 024 fields
- bin/truncate.pl – extracts the first n number of MARC records from a set of MARC records, and useful for creating smaller, sample-sized datasets
- etc – the place where the reader is expected to save their MARC files, and where the database will (eventually) reside
- lib/subroutines.pl – a tiny set of… subroutines used to read and write against the database
If the reader hasn’t figured it out already, in order to use VIAF Finder, the Unix-ish computer needs to have Perl and various Perl modules — most notably, MARC::Batch — installed.
If the reader puts a file named authority.mrc in the ./etc directory, and then runs ./bin/build.sh, then the system ought to run as expected. A set of 100,000 records over a wireless network connection will finish processing in a matter of many hours, if not the better part of a day. Speed will be increased over a wired network, obviously.
But in reality, most people will not want to run the system out of the box. Instead, each of the individual tools will need to be run individually. Here’s how:
- save a file of MARC (authority) records anywhere on your file system
- not recommended, but optionally edit the value of DB in bin/initialize.pl
- run ./bin/initialize.pl feeding it the name of your MARC file, as per Step #1
- if you edited the value of DB (Step #2), then edit the value of DB in bin/search-suggest.pl, and then run ./bin/search-suggest.pl
- if you want to possibly find more VIAF identifiers, then repeat Step #4 but with ./bin/search-simple.pl and with the “simple” command-line option
- optionally repeat Step #5, but this time use the “named” command-line option; the possible named values are documented as a part of the VIAF API (e.g., “bav” denotes the Vatican)
- optionally repeat Step #6, but with other “named” values
- optionally repeat Step #7 until you get tired
- once you get this far, the reader may want to edit bin/build.sh, specifically configuring the value of MARC, and running the whole thing again — “one script to rule them all”
A word of caution is now in order. VIAF Finder reads & writes to its local database. To do so it slurps up the whole thing into RAM, updates things as processing continues, and periodically dumps the whole thing just in case things go awry. Consequently, if you want to terminate the program prematurely, try to do so a few steps after the value of “count” has reached the maximum (500 by default). A few times I have prematurely quit the application at the wrong time and blown my whole database away. This is the cost of having a “simple” database implementation.

To do
Alas, search-simple.pl contains a memory leak. Search-simple.pl makes use of the SRU interface to VIAF, and my SRU queries return XML results. Search-simple.pl then uses the venerable XML::XPath Perl module to read the results. Well, after a few hundred queries the totality of my computer’s RAM is taken up, and the script fails. One work-around would be to request that the SRU interface return a different data structure. Another solution is to figure out how to destroy the XML::XPath object. Incidentally, because of this memory leak, the integer fed to search-simple.pl was implemented, allowing the reader to restart the process at a different point in the dataset. Hacky.

Database
The use of the database is key to the implementation of this system, and the database is really a simple tab-delimited table with the following columns:
- id (MARC 001)
- tag (MARC field name)
- _1xx (MARC 1xx)
- a (MARC 1xx$a)
- b (MARC 1xx$b and usually empty)
- c (MARC 1xx$c and usually empty)
- d (MARC 1xx$d and usually empty)
- l (MARC 1xx$l and usually empty)
- n (MARC 1xx$n and usually empty)
- p (MARC 1xx$p and usually empty)
- t (MARC 1xx$t and usually empty)
- x (MARC 1xx$x and usually empty)
- suggestions (a possible sublist of names, Levenshtein scores, and VIAF identifiers)
- viafid (selected VIAF identifier)
- name (authorized name from the VIAF record)
Most of the fields will be empty, especially fields b through x. The intention is/was to use these fields to enhance or limit SRU queries. Field #13 (suggestions) is for future, possible use. Field #14 is key, literally. Field #15 is a possible replacement for MARC 1xx$a. Field #15 can also be used as a sort of sanity check against the search results. “Did VIAF Finder really identify the correct record?”
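Given the column list above, a row of the database is easy to pull apart. The following is an illustrative sketch of mine in Python (the distribution itself reads the database with Perl subroutines), and the sample record is invented:

```python
# Column order as listed above; fields b through x are usually empty.
COLUMNS = ["id", "tag", "_1xx", "a", "b", "c", "d", "l",
           "n", "p", "t", "x", "suggestions", "viafid", "name"]

def parse_row(line):
    """Split one tab-delimited line of ./etc/authority.db into a dict."""
    values = line.rstrip("\n").split("\t")
    values += [""] * (len(COLUMNS) - len(values))  # pad missing trailing fields
    return dict(zip(COLUMNS, values))

# A made-up record: fields b through x and suggestions are empty.
row = parse_row("001234\t100\tEco, Umberto\tEco, Umberto"
                + "\t" * 10 + "108299403\tEco, Umberto")
```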
Consider pouring the database into your favorite text editor, spreadsheet, database, or statistical analysis application for further investigation. For example, write a report against the database allowing the reader to see the details of the local authority record as well as the authority data in VIAF. Alternatively, open the database in OpenRefine in order to count & tabulate variations of the data it contains. Your eyes will widen, I assure you.

Commentary
First, this system was written during my “artist’s education adventure” which included a three-month stint in Rome. More specifically, this system was written for the good folks at Pontificia Università della Santa Croce. “Thank you, Stefano Bargioni, for the opportunity, and we did some very good collaborative work.”
Second, I first wrote search-simple.pl (SRU interface) and I was able to find VIAF identifiers for about 20% of my given authority records. I then enhanced search-simple.pl to include limitations to specific authority sets. I then wrote search-suggest.pl (AutoSuggest interface), and not only was the result many times faster, but the result was just as good, if not better, than the previous result. This felt like two steps forward and one step back. Consequently, the reader may not ever need nor want to run search-simple.pl.
Third, while the AutoSuggest interface was much faster, I was not able to determine how suggestions were made. This makes the AutoSuggest interface seem a bit like a “black box”. One of my next steps, during the copious spare time I still have here in Rome, is to investigate how to make my scripts smarter. Specifically, I hope to exploit the use of the Levenshtein distance algorithm. 
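For what it is worth, the algorithm itself is compact. Here is a textbook dynamic-programming version, again sketched in Python rather than the distribution's Perl, which could score each suggested name against the local 1xx$a heading:

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (textbook dynamic programming)."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]
```

A low distance between a suggestion and the local heading would be grounds for accepting the suggested VIAF identifier automatically; a high distance would flag the record for human review.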
Finally, I would not have been able to do this work without the “shoulders of giants”. Specifically, Stefano and I took long & hard looks at the code of people who have done similar things. For example, the source code of Jeff Chiu’s OpenRefine Reconciliation service demonstrates how to use the Levenshtein distance algorithm.  And we found Jakob Voß’s viaflookup.pl useful for pointing out AutoSuggest as well as elegant ways of submitting URL’s to remote HTTP servers.  “Thanks, guys!”
Fun with MARC-based authority data!

Links
 VIAF – http://viaf.org
 VIAF Finder distribution – http://infomotions.com/sandbox/pusc/etc/viaf-finder.tar.gz
 OpenRefine – http://openrefine.org
 Levenshtein distance – https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
 Chiu’s reconciliation service – https://github.com/codeforkjeff/refine_viaf
 Voß’s viaflookup.pl – https://gist.github.com/nichtich/832052/3274497bfc4ae6612d0c49671ae636960aaa40d2
STEAM programming (which includes science, technology, engineering, arts and mathematics) is fast becoming a core service in libraries across the country. From intermixing STEAM activities into family story hour to teen maker spaces and coding camps, public and school libraries provide engaging opportunities for kids of all ages to develop a passion for science, technology, engineering, the arts, and math. Curious to learn who else is experimenting with STEAM programs for kids? The article below comes from Sascha Paladino, who is the creator and executive producer of “Miles from Tomorrowland”, a Disney Junior animated series that weaves science, technology, engineering and mathematics concepts geared towards kids ages 2-7 into its storylines. Paladino will delve deeper into the topic at the 2016 American Library Association Annual Conference joined by others instrumental in getting Miles off the ground and kids into STEAM.
Six years ago, I came up with an idea for an animated series about a family on an adventure in outer space – from the kid’s perspective. I wanted to explore the universe through the eyes of a seven-year-old. I remembered how I saw outer space when I was young – as the greatest imaginable place for adventure – and I wanted to capture that feeling.
I pitched the idea to Disney, who liked it, and we began developing what would become MILES FROM TOMORROWLAND. Through the ups and downs that are part of any TV show’s journey to the screen, I tried to stay focused on my goals: tell entertaining stories, encourage kids to dream big and inspire viewers to explore STEAM (Science, Technology, Engineering, Arts and Math).
Luckily, with the support of Disney, I was able to surround the MILES creative team with a group of genius scientists. Dr. Randii Wessen of NASA’s Jet Propulsion Laboratory came onboard as an advisor, as did NASA astronaut Dr. Yvonne Cagle, and Space Tourism Society founder John Spencer. They shared their deep knowledge and experience with us, and gave our show some serious scientific street cred.
Along the way, I got a crash course in outer space. I was able to immerse myself in the science of our universe, and learned all about exoplanets, tardigrades, and electromagnetic pulses, for starters. Then, I could sit down with my writing and design teams and figure out ways to work these science facts into engaging stories to share with our audience.
I realized that I was making the show I wished I had as a kid: An exciting adventure that incorporates real science in a way that appeals to viewers whether or not they gravitate towards science. I always loved science, but my career path took me into the arts. In making this show, I learned that the arts can be a route into the sciences – which is why I’m really glad that STEM has expanded to STEAM, to include the “A” for “arts.”
My hope is that by exposing all sorts of kids to concepts such as black holes, coronal mass ejections, and spaghettification (best word ever), they’re inspired to explore further and deeper once the television is turned off.
When we were researching the series, we met with scientists, techies, and space professionals from amazing places such as NASA, SpaceX, Virgin Galactic, and Google. Over and over, we heard that they were inspired to go into their field because of science-fiction TV shows and movies that they saw as kids. Real-life innovations such as the first flip-phone were directly influenced by fantastical creations imagined on STAR TREK. Science fiction becomes science fact. It’s the circle of (sci-fi) life.
Now that MILES FROM TOMORROWLAND is on the air, I’ve been hearing from parents and kids that our vision of the future is giving the scientists of tomorrow some ideas. Nothing could make me happier. We’ve seen kids make their own creative versions of Miles’ tech and gear, such as cardboard spaceships and gadgets made from dried macaroni. As NASA’s Dr. Cagle told me recently, one of our goals should be to encourage kids to “engineer their dreams.” That sums it up perfectly.
I even heard from a kid who loves Miles’ Blastboard – his flying hoverboard – so much that he decided to sit down and design a real one. Whether it works or not is beside the point (although I’m quite sure that it does). What matters to me is that MILES FROM TOMORROWLAND set off a spark that, I hope, will continue to grow, multiply, and eventually inspire a future generation of scientists and innovators.
But mostly, I can’t wait to ride that Blastboard.
Join the “Coding in Tomorrowland: Inspiring Girls in STEM” session at the 2016 American Library Association Annual Conference in Orlando, which takes place on Sunday, June 26, 2016, from 1:00-2:30 p.m. (in the Orange County Convention Center, in room OCCC W303). Session speakers include “Miles from Tomorrowland” creator and executive producer, Sascha Paladino; series consultant and NASA astronaut, Dr. Yvonne Cagle; and Disney Junior executive, Diane Ikemiyashiro. This session will be moderated by Roger Rosen, who is the chief executive officer of Rosen Publishing and a senior advisor for national policy advocacy to ALA’s Office for Information Technology Policy.
There are four main areas where I have comments on Rumsey's text. On page 144, in the midst of a paragraph about the risks to our personal digital information she writes:
The documents on our hard disks will be indecipherable in a decade.

The word "indecipherable" implies not data loss but format obsolescence. As I've written many times, Jeff Rothenberg was correct to identify format obsolescence as a major problem for documents published before the advent of the Web in the mid-90s. But the Web caused documents to evolve from being the private property of a particular application to being published. On the Web, published documents don't know what application will render them, and are thus largely immune to format obsolescence.
It is true that we're currently facing a future in which most current browsers will not render preserved Flash, not because they don't know how to but because it isn't safe to do so. But oldweb.today shows that the technological fix for this problem is already in place. Format obsolescence, were it to occur, would be hard for individuals to mitigate. Especially since it isn't likely to happen, it isn't helpful to lump it in with threats they can do something about by, for example, keeping local copies of their cloud data.
On page 148 Rumsey discusses the problem of the scale of the preservation effort needed and the resulting cost:
We need to keep as much as we can as cheaply as possible. ... we will have to invent ways to essentially freeze-dry data, to store data at some inexpensive low level of curation, and at some unknown time in the future be able to restore it. ... Until such a long-term strategy is worked out, preservation experts focus on keeping digital files readable by migrating data to new hardware and software systems periodically. Even though this looks like a short-term strategy, it has been working well ... for three decades and more.

Yes, it has been working well and will continue to do so, provided the low level of curation manages to find enough money to keep the bits safe. Emulation will ensure that if the bits survive we will be able to render them, and it does not impose significant curation costs along the way.
The aggressive (and therefore necessarily lossy) compression Rumsey envisages would reduce storage costs, and I've been warning for some time that Storage Will Be Much Less Free Than It Used To Be. But it is important not to lose sight of the fact that ingest, not storage, is the major cost in digital preservation. We can't keep it all; deciding what to keep and putting it someplace safe is the most expensive part of the process.
On page 163 Rumsey switches to ignoring the cost and assuming that, magically, storage supply will expand to meet the demand:
Our appetite for more and more data is like a child's appetite for chocolate milk: ... So rather than less, we are certain to collect more. The more we create, paradoxically, the less we can afford to lose.

Alas, we can't store everything we create now, and the situation isn't going to get better.
On page 166 Rumsey writes:
Other than the fact that preservation yields long-term rewards, and most technology funding goes to creating applications that yield short-term rewards, it is hard to see why there is so little investment, either public or private, in preserving data. The culprit is our myopic focus on short-term rewards, abetted by financial incentives that reward short-term thinking. Financial incentives are matters of public policy, and can be changed to encourage more investment in digital infrastructure.

I completely agree that the culprit is short-term thinking, but the idea that "incentives ... can be changed" is highly optimistic. The work of, among others, Andrew Haldane at the Bank of England shows that short-termism is a fundamental problem in our global society. Inadequate investment in infrastructure, both physical and digital, is just a symptom, and is far less of a problem than society's inability to curb carbon emissions.
Finally, some nits to pick. On page 7 Rumsey writes of the Square Kilometer Array:
up to one exabyte (10^18 bytes) of data per day

I've already had to debunk another "exabyte a day" claim. It may be true that the SKA generates an exabyte a day, but it could not store that much data. An exabyte a day is most of the world's production of storage. Like the Large Hadron Collider, which throws away all but one byte in a million before it is stored, the SKA actually stores only(!) a petabyte a day (according to Ian Emsley, who is responsible for planning its storage). A book about preserving information for the long term should be careful to maintain the distinction between the amounts of data generated and stored. Only the stored data is relevant.
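The gap between generated and stored data is easy to quantify; a quick back-of-the-envelope check:

```python
# SI scale: exabyte = 10^18 bytes, petabyte = 10^15, terabyte = 10^12.
EXABYTE, PETABYTE, TERABYTE = 10**18, 10**15, 10**12

# Storing a petabyte out of a generated exabyte keeps one byte in a thousand.
kept_fraction = PETABYTE / EXABYTE

# The LHC's cut of "all but one byte in a million", applied to an exabyte,
# would leave just a terabyte.
lhc_kept = EXABYTE // 10**6
```

So even the SKA's "only" a petabyte a day is a thousand-fold reduction from what its instruments generate.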
On page 46 Rumsey writes:
our recording medium of choice, the silicon chip, is vulnerable to decay, accidental deletion and overwriting

Our recording medium of choice is not, and in the foreseeable future will not be, the silicon chip. It will be the hard disk, which is of course equally vulnerable, as any read-write digital medium would be. Write-once media would be somewhat less vulnerable, and they definitely have a role to play, but they don't change the argument.
I’ve changed the license on my content to CC-BY: Creative Commons Attribution 4.0.
UPDATE 25 May 2016: The feed metadata is now updated too. “We copy documents based on metadata.”