One question I got asked after giving my Code4Lib presentation on WebSockets was how I created my slides. I’ve written about how I create HTML slides before, but this time I added some new features like an audience interface that synchronizes automatically with the slides and allows for audience participation.
TL;DR I’ve open sourced starterdeck-node for creating synchronized and interactive HTML slide decks.
For a presentation on WebSockets I gave at Code4Lib 2014, I wanted to provide another example from within the presentation itself of what you can do with WebSockets. If you have the slides and the audience notes handout page open at the same time, you will see how they are synchronized. (Beware slowness as it is a large self-contained HTML download using data URIs.) When you change to certain slides in the presenter view, new content is revealed in the audience view. Because the slides are just an HTML page, it is possible to make the slides more interactive. WebSockets are used to allow the slides to send messages to each audience member’s browser and reveal notes. I am never able to say everything that I would want to in one short 20-minute talk, so this provided me a way to give the audience some supplementary material.
Another nice side benefit of getting the audience to open the notes page before the presentation starts is that you can include your contact information and Twitter handle on the page.
I have wrapped up all this functionality for creating interactive slide decks into a project called starterdeck-node. It includes the WebSocket server and a simple starting point for creating your own slides. It strings together a bunch of different tools to make creating and deploying slide decks like this simpler so you’ll need to look at the requirements. This is still definitely just a tool for hackers, but having this scaffolding in place ought to make the next slide deck easier to create.
Here’s a video where I show starterdeck-node at work. Slides on the left; audience notes on the right.

Other Features
While the new exciting feature added in this version of the project is synchronization between presenter slides and audience notes, there are also lots of other great features if you want to create HTML slide decks. Even if you aren’t going to use the synchronization feature, there are still lots of reasons why you might want to create your HTML slides with starterdeck-node.
Onstage view. Part of what gets built is a DZSlides onstage view where the presenter can see the current slide, next slide, speaker notes, and current time.
Single page view. This view is a self-contained, single-page layout version of the slides and speaker notes. This is a much nicer way to read a presentation than just flipping through the slides on various slide sharing sites. If you put a lot of work into your talk and are writing speaker notes, this is a great way to reuse them.
PDF backup. A script is included to create a PDF backup of your presentation. Sometimes you have to use the computer at the podium and it has an old version of IE on it. PDF backup to the rescue. While you won’t get all the features of the HTML presentation you’re still in business. The included Node.js app provides a server so that a headless browser can take screenshots of each slide. These screenshots are then compiled into the PDF.

Examples
I’d love to hear from anyone who tries to use it. I’ll list any examples I hear about below.
Jason Ronallo: A Plugin For Mediaelement.js For Preview Thumbnails on Hover Over the Time Rail Using WebVTT
The time rail or progress bar on video players gives the viewer some indication of how much of the video they’ve watched, what portion of the video remains to be viewed, and how much of the video is buffered. The time rail can also be clicked on to jump to a particular time within the video. But figuring out where in the video you want to go can feel kind of random. You can usually hover over the time rail and move from side to side and see the time that you’d jump to if you clicked, but who knows what you might see when you get there.
Some video players have begun to use the time rail to show video thumbnails on hover in a tooltip. For most videos these thumbnails give a much better idea of what you’ll see when you click to jump to that time. I’ll show you how you can create your own thumbnail previews using HTML5 video.
TL;DR Use the time rail thumbnails plugin for Mediaelement.js.

Archival Use Case
We usually follow agile practices in our archival processing. This style of processing was popularized by the article More Product, Less Process: Revamping Traditional Archival Processing by Mark A. Greene and Dennis Meissner. For instance, we don’t read every page of every folder in every box of every collection in order to describe it well enough for us to make the collection accessible to researchers. Over time we may decide to make the materials for a particular collection or parts of a collection more discoverable by doing the work to look closer and add more metadata to our description of the contents. But we try not to let the perfect be the enemy of the good enough. Our goal is to make the materials accessible to researchers and not hidden in some box no one knows about.
Some of our video collections are highly curated, such as our video oral histories. We’ve created transcripts for the whole video. We extract the most interesting or on-topic clips. For each of these video clips we create a WebVTT caption file and an interface to navigate within the video from the transcript.
At NCSU Libraries we have begun digitizing more archival videos. And for these videos we’re much more likely to treat them like other archival materials. We’re never going to watch every minute of every video about cucumbers or agricultural machinery in order to fully describe the contents. Digitization gives us some opportunities to automate the summarization that would be manually done with physical materials. Many of these videos don’t even have dialogue, so even when automated video transcription is more accurate and cheaper we’ll still be left with only the images. In any case, the visual component is a good place to start.

Video Thumbnail Previews
When you hover over the time rail on some video viewers, you see a thumbnail image from the video at that time. YouTube does this for many of its videos. I first saw that this would be possible with HTML5 video when I saw the JW Player page on Adding Preview Thumbnails. From there I took the idea of using an image sprite and a WebVTT file to specify which media fragments from the sprite to use in the thumbnail preview. I’ve implemented this as a plugin for Mediaelement.js. You can see detailed instructions there on how to use the plugin, but I’ll give the summary here.

1. Create an Image Sprite from the Video
This uses ffmpeg to take a snapshot every 5 seconds in the video and then uses montage (from ImageMagick) to stitch them together into a sprite. This means that only one file needs to be downloaded before you can show the preview thumbnail.

    ffmpeg -i "video-name.mp4" -f image2 -vf fps=fps=1/5 video-name-%05d.jpg
    montage video-name*jpg -tile 5x -geometry 150x video-name-sprite.jpg

2. Create a WebVTT metadata file
This is just a standard WebVTT file except the cue text is metadata instead of captions. The URL is to an image and uses a spatial Media Fragment for what part of the sprite to display in the tooltip.

    WEBVTT

    00:00:00.000 --> 00:00:05.000
    http://example.com/video-name-sprite.jpg#xywh=0,0,150,100

    00:00:05.000 --> 00:00:10.000
    http://example.com/video-name-sprite.jpg#xywh=150,0,150,100

    00:00:10.000 --> 00:00:15.000
    http://example.com/video-name-sprite.jpg#xywh=300,0,150,100

    00:00:15.000 --> 00:00:20.000
    http://example.com/video-name-sprite.jpg#xywh=450,0,150,100

    00:00:20.000 --> 00:00:25.000
    http://example.com/video-name-sprite.jpg#xywh=600,0,150,100

    00:00:25.000 --> 00:00:30.000
    http://example.com/video-name-sprite.jpg#xywh=0,100,150,100

3. Add the Video Thumbnail Preview Track
Put the following within the <video> element.

    <track kind="metadata" class="time-rail-thumbnails" src="http://example.com/video-name-sprite.vtt"></track>

4. Initialize the Plugin
See Bug Sprays and Pets with sound.

Installation
One of the DOM API features I hadn’t used before is MutationObserver. One thing the thumbnail preview plugin needs to do is know what time is being hovered over on the time rail. I could have calculated this myself, but I wanted to rely on MediaElement.js to provide the information. Maybe there’s a callback in MediaElement.js for when this is updated, but I couldn’t find it. Instead I use a MutationObserver to watch for when MediaElement.js changes the DOM for the default display of a timestamp on hover. Looking at the time code there then allows the plugin to pick the correct cue text to use for the media fragment. MutationObserver is more performant than the now-deprecated Mutation Events. I’ve experienced very little latency with MutationObserver, which lets it fire lots of callbacks in quick succession.
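To make that lookup step concrete, here is a minimal sketch in Python (not the plugin's JavaScript, and not its actual code) of turning a hovered time code into the matching cue text. The function names, the accepted time code formats, and the sample cues are all assumptions made for illustration.

```python
def timecode_to_seconds(timecode):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' into a number of seconds."""
    seconds = 0
    for part in timecode.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def pick_cue(cues, timecode):
    """Return the text of the cue whose [start, end) range contains the time.

    `cues` is a list of (start_seconds, end_seconds, text) tuples sorted by
    start time. Times past the last cue fall back to the last cue.
    """
    t = timecode_to_seconds(timecode)
    for start, end, text in cues:
        if start <= t < end:
            return text
    return cues[-1][2]


# Hypothetical cues: one 150x100 thumbnail for every 5 seconds of video.
cues = [
    (0, 5, "video-name-sprite.jpg#xywh=0,0,150,100"),
    (5, 10, "video-name-sprite.jpg#xywh=150,0,150,100"),
    (10, 15, "video-name-sprite.jpg#xywh=300,0,150,100"),
]

print(pick_cue(cues, "00:07"))
```

Unlike the shortcut described under the caveats, this version compares against each cue's start and end time, so it would also cope with cues of uneven length.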
The plugin currently only works in the browsers that support MutationObserver, which is most current browsers. In browsers that do not support MutationObserver the plugin will do nothing at all and just show the default timestamp on hover. I’d be interested in other ideas on how to solve this kind of problem, though it is nice to know that plugins that rely on another library have tools like MutationObserver around.

Other Caveats
This plugin is brand new and works for me, but there are some caveats. All the images in the sprite must have the same dimensions. The durations for each thumbnail must be consistent. The timestamps currently aren’t really used to determine which thumbnail to display; the choice is instead faked by relying on the consistent durations. The plugin just does some simple addition and plucks out the correct thumbnail from the array of cues. Hopefully in future versions I can address some of these issues.

Discoveries
With this feature available for our digitized video, we’ve already found things in our collection that we wouldn’t have seen before. You can see how a “Profession with a Future” evidently involves shortening your life by smoking (at about 9:05). I found a spinning spherical display of Soy-O and synthetic meat (at about 2:12). Some videos switch between black & white and color, which you wouldn’t know just from the poster image. And there are some videos, like talking heads, that appear from the thumbnails to have no surprises at all. But maybe you like watching boiling water for almost 13 minutes.
OK, this isn’t really a discovery in itself, but it is fun to watch a head banging JFK as you go back and forth over the time rail. He really likes milk. And Eisenhower had a different speaking style.
You can see this in action for all of our videos on the NCSU Libraries’ Rare & Unique Digital Collections site and make your own discoveries. Let me know if you find anything interesting.

Preview Thumbnail Sprite Reuse
Since we already had the sprite images for the time rail hover preview, I created another interface to allow a user to jump through a video. Under the video player is a control button that shows a modal with the thumbnail sprite. The sprite alone provides a nice overview of the video that allows you to see very quickly what might be of interest. I used an image map so that the rather large sprite images would only have to be in memory once. (Yes, image maps are still valid in HTML5 and have their legitimate uses.) jQuery RWD Image Maps allows the map area coordinates to scale up and down across devices. Hovering over a single thumb will show the timestamp for that frame. Clicking a thumbnail will set the current time for the video to be the start time of that section of the video. One advantage of this feature is that it doesn’t require the kind of fine motor skill necessary to hover over the video player time rail and move back and forth to show each of the thumbnails.
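The sprite geometry that drives the WebVTT cues can also drive the image map areas. Below is a minimal Python sketch of that bookkeeping, assuming the values from the earlier example (150x100 thumbnails, 5 per row, one every 5 seconds). The function names are hypothetical and this is not the code running on the site.

```python
def tile_rect(index, columns=5, width=150, height=100):
    """Return (x, y, w, h) of the index-th thumbnail in a row-major sprite."""
    row, col = divmod(index, columns)
    return (col * width, row * height, width, height)


def vtt_timestamp(seconds):
    """Format whole seconds as a WebVTT timestamp, e.g. 65 -> 00:01:05.000."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.000"


def webvtt_cues(sprite_url, count, interval=5, **grid):
    """Build WebVTT metadata text with one xywh media fragment per thumbnail."""
    lines = ["WEBVTT", ""]
    for i in range(count):
        x, y, w, h = tile_rect(i, **grid)
        start = vtt_timestamp(i * interval)
        end = vtt_timestamp((i + 1) * interval)
        lines.append(f"{start} --> {end}")
        lines.append(f"{sprite_url}#xywh={x},{y},{w},{h}")
        lines.append("")
    return "\n".join(lines)


def image_map_areas(count, interval=5, **grid):
    """Return (coords, start_seconds) pairs for <area shape="rect"> elements.

    HTML rect coords are "left,top,right,bottom", not x,y,width,height.
    """
    areas = []
    for i in range(count):
        x, y, w, h = tile_rect(i, **grid)
        areas.append((f"{x},{y},{x + w},{y + h}", i * interval))
    return areas


print(webvtt_cues("http://example.com/video-name-sprite.jpg", 6))
print(image_map_areas(6)[5])
```

The start time paired with each area is the value a click handler would assign to the video's current time.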
This feature was added and deployed to production just this week, so I’m looking for feedback on whether folks find this useful, how to improve it, and any bugs that are encountered.

Summarization Services
I expect that automated summarization services will become increasingly important for researchers as archives do more large-scale digitization of physical collections and collect more born digital resources in bulk. We’re already seeing projects like fondz which autogenerates archival description by extracting the contents of born digital resources. At NCSU Libraries we’re working on other ways to summarize the metadata we create as we ingest born digital collections. As we learn more what summarization services and interfaces are useful for researchers, I hope to see more work done in this area. And this is just the beginning of what we can do with summarizing archival video.
Dr. Safiya U. Noble’s selfie
Recently two awesome things changed my world. Beyoncé released her album Lemonade and the BC Library Association conference happened.
Cory Doctorow’s opening keynote was brilliant. As expected he gave a smart and funny talk full of examples to illustrate the bigger issues. I don’t think anyone will forget the baby monitor cam that was taken over by creepy men who were taunting the baby as an example of privacy flaws in everyday “smart” devices. I feel like he gave libraries more credit than we deserve. I felt pretty depressed and without hope thinking about how libraries continue to choose proprietary vendor technology that does not reflect our core values.
One of my favourite conversations at this conference was with Alison Macrina, from the Library Freedom Project. We talked about many things, including our mutual love for Beyoncé. She saw her concert in Houston and told me about the amazing choreography for Freedom, which was the last song Beyoncé performed.
When I asked friends what their favourite song was on Beyoncé’s Lemonade a few people said that they thought of the whole album as one song, or as an opera. So, on the way home from the conference, I was listening to the whole album and hearing it in a new way. I jumped off the bus and walked up the street to my home just as Freedom came on, and by the end of the song I had a realization. Beyoncé embodies freedom by owning her creative product, but perhaps even more importantly she owns the means of distribution. Like Beyoncé, libraries need to own our distribution platforms.
Tidal, Beyoncé’s distribution channel, is a streaming music platform that is a competitor to Spotify and Pandora. I’m not sure what the ownership breakdown is, but Tidal is owned by artists. A few of the artist-owners are Jay Z, Beyoncé, Prince, Rihanna, Kanye West, Nicki Minaj, Daft Punk, Jack White, Madonna, Arcade Fire, Alicia Keys, Usher, Chris Martin, Calvin Harris, deadmau5, Jason Aldean and J. Cole. Initially many people thought Tidal was a failure, but that has changed.
Lemonade was launched on HBO on April 22. On the 23rd it was available only as a stream on Tidal, and for purchase the day after. On the 25th it became available for purchase by track or album on Amazon Music and the iTunes Store. Physical copies of the album went on sale at brick and mortar stores on May 6. Initially the shift to digital distribution replicated the business model for distributing records, which generated huge profits for record labels but often cut out the artist.
PKP (Public Knowledge Project) is a great example of how academic libraries built open source publishing tools to challenge scholarly publishers. This has been a game changer in terms of how research is published, distributed and accessed.
For more than 10 years we’ve been complaining about Overdrive’s DRM-laced ebooks, and the crappy user experience. Instead of relying on vendors, we need to build our own distribution platform for ebooks. I realize that it’s the content our patrons are hungry for, and that we’re neither Jay Z, nor Beyoncé. If publishers aren’t willing to play with us, we have strong relationships with authors and could work directly with them as content creators. There needs to be a new business model where people can access creative works and content creators can make a living. Access Copyright’s model doesn’t work, but we could work with content creators to figure out a business model that does.
In her closing keynote at BCLA, activist and writer Harsha Walia talked about systemic power structures and the need to change how we do things. Talking about pay equity she said “It’s not about breaking the glass ceiling, it’s about shattering the whole house.” Vendor rules and platforms are about profit margins for those companies. Libraries need to change the rules of the game.
Tryna rain, tryna rain on the thunder
Tell the storm I’m new
I’m a wall, come and march on the regular
Painting white flags blue
Freedom! Freedom! I can’t move
Freedom, cut me loose!
Freedom! Freedom! Where are you?
Cause I need freedom too!
William Denton: Do not insert non-US foreign coins, damaged coins, bent coins, dirty coins, commemorative coins, tokens, Eisenhower silver dollars or 1943 US pennies
Inserting anything but clean and undamaged coins into the Coinstar coin-counting machine is unacceptable and could impact functionality. Unacceptable items will not be counted and may not be returned. Unacceptable items include, but are not limited to:
- 1943 US pennies,
- alcohol wipes,
- animal crackers,
- animal or human teeth,
- belt clips,
- bent coins,
- bottle caps,
- broken glass,
- candy wrappers,
- cat litter,
- commemorative coins,
- contact lenses,
- cotton balls,
- cotton swabs,
- cuff links,
- damaged coins,
- dirty coins,
- dog food,
- drill bits,
- ear plugs,
- Eisenhower silver dollars,
- finger nails,
- flash drives,
- foam objects,
- foreign coins,
- French fries,
- fruit snacks,
- gold fish,
- guitar picks,
- gum wrappers,
- gummy worms/bears,
- hair clips,
- jar lids,
- key chains,
- miniature dice,
- name tags,
- paper clips,
- pen caps,
- pine cone parts,
- pipe cleaners,
- playing cards,
- pop can tabs,
- popsicle sticks,
- quilt squares,
- rubber bands,
- rubber lid seals,
- screw driver bits,
- SD cards,
- tie tacks,
- tire caps,
- tooth picks,
- tree bark,
- wall hooks,
- watch bands,
Behind every item, a story.
In case you’re curious, Wikipedia explains about the steel 1943 US penny (“the only regular-issue United States coin that can be picked up with a magnet”) and the large Eisenhower dollar (“the new dollars failed to circulate to any degree, except in and around Nevada casinos, where they took the place of privately issued tokens”).
The obsession with “dirty money” warrants deeper analysis than I can supply.
It's been a little while since we checked in on the Islandora CLAW community sprints, but they've been ticking along every month since last November, knocking down tickets and gradually building up the next major version of Islandora. This last sprint was especially significant in two ways: we welcomed two new sprinters (Ed Fugikawa and Ben Rosner), and we closed the most tickets of any sprint since they began. You can check out those stellar results here.
The next sprint will run from May 16 - 27, kicking off with a meeting on Monday. If you would like to take part, please add your name to the sprint here. We've just gone through some heavy re-structuring of how Islandora CLAW's pieces are stored in GitHub, so there will be opportunities to participate on a documentation level (for those non-developers out there who still want to contribute directly).
Learn about the JSON-LD serialization of linked data and its various iterations.
“Needs assessment help establish the customer as the center of the service and bring the librarian and the library staff back to what is at the core of a library service: What do the library customers need?” (Dudden, 2007, p. 90)
As mentioned in my last post, MacKellar and Gerding, authors of ALA grant funding monographs, stress the importance of conducting a needs assessment as the first step in approaching a grant proposal. It may be painful at first, but once a thorough study has been made, the remaining grant proposal steps become easier. You become well-informed about the community you serve and identify current service gaps in your library. Not until you know your community’s needs will you be able to justify funding. Through my readings, I discovered that this includes your non-users as well as your current users. Remember, funders want to make sure that people are helped by your project and that it is therefore a guaranteed success.
In a nutshell, a needs assessment is, “A systematic process determining discrepancies between optimal and actual performance of a service by reviewing the service needs of a customer and stakeholders and then selecting interventions that allow the service to meet those needs in the fastest, most cost-effective manner” (Dudden, 2007, p. 61). According to Dudden, in her book Using Benchmarking, Needs Assessment, Quality Improvement, Outcome Measurement and Library Standards, there are 12 steps in conducting a needs assessment: (1) Define your purpose or question, (2) Gather your team, (3) Identify stakeholders and internal and external factors, (4) Define the question, (5) Determine resources available, (6) Develop a timeline, (7) Define your customers, (8) Gather data from identified sources, (9) Analyze the data, (10) Make a decision and a plan of action, (11) Report to administration and evaluate the needs assessment process, and (12) Repeat needs assessment in the future to see if the gap is smaller.
As librarians, we like to research something comprehensively before we dive into a project. Researching what others have done within their needs assessment project is an awesome strategy to get acquainted with the process and garner ideas. There are several ways to gather information from a sample of your community: surveys, interviews, focus groups, observations, community forums/town meetings, suggestion boxes, and public records. If you bring in a technology-related project, your observation method may become a usability or user experience investigation, for example. I learned that it is important to use multiple techniques together and then combine the results to produce trustworthy data. I personally think that surveys are overused, but I can live with them if they are one of many approaches in a study.
Take for instance the case back in 2011 when Penn State wanted to build a knowledge commons (Lynn, 2011). Their project mission was to conduct a ten-month needs assessment in order to find out what new programming initiatives needed to be created and how the physical knowledge commons space should be configured in support of these endeavors. I was amazed to read that they used seven techniques to inform their decisions: conducted site visits to other library knowledge commons, reviewed the literature on this topic, conducted student and faculty focus groups, created an online survey focusing on the physical library space and resources, created a survey exclusively for incoming freshmen, evaluated knowledge commons websites from other institutions, and evaluated work spaces (circulation desk, reference desk, office space, etc.). After each phase of the needs assessment was completed, they were able to prioritize space needs and draft a final report of their findings to administration and to the architectural firm.
One thing mentioned in this case study article is that a needs assessment has secondary effects that are essential to the process – it markets the project immensely and also invokes support from all stakeholders. I am convinced that completing this process will get you one step closer to definite funding.
The Needs Assessment: Forum Unified Education Technology Suite
National Center for Education Statistics
IT Needs Assessment & Strategic Planning Surveys
Methods for Conducting an Educational Needs Assessment
Guidelines for Cooperative Extension System Professionals
by Paul F. Cawley, University of Idaho
Chapter 3: Assessing Community Needs and Resources
Community Tool Box, University of Kansas
Information Gathering Toolkit
Community Needs Assessment Survey Guide
Utah State University
Assessing Faculty’s Technology Needs
by Tena B. Crew
Using Needs Assessment as a Holistic Means for Improving Technology Infrastructure
by Joni E. Spurlin, edited by Diana G. Oblinger
Educause Learning Initiative
U.S. Department of Commerce
Google Map Maker
Dudden, R. F. (2007). Using benchmarking, needs assessment, quality improvement, outcome measurement, and library standards: A how-to-do-it manual. New York, NY: Neal-Schuman.
Lynn, V. (2011). A knowledge commons needs assessment. College & Research Libraries News, 72(8), 464-467.
MacKellar, P. H., & Gerding, S. K. (2010). Winning grants: A how-to-do-it manual for librarians with multimedia tutorials and grant development tools. New York, NY: Neal-Schuman.
Dena L. Luce
From VIVO 2016 Conference organizers
We’ve extended our call for posters at VIVO16! If you missed the earlier deadline for posters, you have until May 23 to submit your poster abstract. The poster session lets you share your work in an informal, relaxed setting and chat with individual community members.
From Mike Conlon, VIVO project director
VIVO User Group Meeting. We had a great VIVO User Group meeting in Chicago at the Galter Health Science Library. You can find materials from the meeting online here. The two-day meeting sessions included:
From Dermot Frost, Chair, OR2016 Host Committee; David Minor, Matthias Razum, and Sarah Shreeves, Co-Chairs, OR2016 Program Committee
Yesterday, the National Institutes of Health (NIH) Director Dr. Francis Collins announced the appointment of Dr. Patricia Flatley Brennan as the next director of the National Library of Medicine (NLM), the world’s largest medical library and a component of NIH. Dr. Brennan comes to NLM from the University of Wisconsin-Madison, where she is the Lillian L. Moehlman Bascom Professor, School of Nursing and College of Engineering. She will be the first woman and first nurse to lead NLM. Dr. Brennan is expected to assume her post in August.
“Dr. Brennan brings her incredible experience of having cared for patients as a practicing nurse, improved the lives of home-bound patients by developing innovative information systems and services designed to increase their independence, and pursued cutting-edge research in data visualization and virtual reality,” said Dr. Collins.
NLM, based on the campus of NIH in Bethesda, Maryland, was founded in 1836 and has earned a reputation for innovation and public service. ALA has had the pleasure of working with a number of NLM staffers, and we look forward to collaborating with Dr. Brennan and her team.
The previous director, Dr. Donald Lindberg, led NLM from 1984 until his retirement in 2015. Among his many achievements was the founding of the National High-Performance Computing and Communications (HPCC) Office in 1992 and his service as its first director for three years. Establishing the HPCC Office was an important early milestone in the development, growth, and institutionalization of advanced information technology within and across federal agencies. I mention HPCC as it was my employer (though now evolved and re-named to the National Coordination Office for Networking Information Technology Research & Development) prior to coming to ALA.
The post New director named for the U.S. National Library of Medicine appeared first on District Dispatch.
Sunday June 26, 2016 from 3:00 pm to 4:00 pm
Safiya Noble
Dr. Noble is an Assistant Professor in the Department of Information Studies in the Graduate School of Education and Information Studies at UCLA. She conducts research in socio-cultural informatics, including feminist, historical and political-economic perspectives on computing platforms and software in the public interest. Her research is at the intersection of culture and technology in the design and use of applications on the Internet.
All on Friday, June 24 from 1:00 pm – 4:00 pm
Digital Privacy and Security: Keeping You and Your Library Safe and Secure in a Post-Snowden World
Presenters: Jessamyn West, Library Technologist at Open Library and Blake Carver, LYRASIS
Islandora for Managers: Open Source Digital Repository Training
Presenters: Erin Tripp, Business Development Manager at discoverygarden inc. and Stephen Perkins, Managing Member of Infoset Digital Publishing
Technology Tools and Transforming Librarianship
Presenters: Lola Bradley, Reference Librarian, Upstate University; Breanne Kirsch, Coordinator of Emerging Technologies, Upstate University; Jonathan Kirsch, Librarian, Spartanburg County Public Library; Rod Franco, Librarian, Richland Library; Thomas Lide, Learning Engagement Librarian, Richland Library
Top Technology Trends
Sunday June 26, 2016 from 1:00 pm to 2:30 pm
This regular program features our ongoing roundtable discussion about trends and advances in library technology by a panel of LITA technology experts. The panelists will describe changes and advances in technology that they see having an impact on the library world, and suggest what libraries might do to take advantage of these trends. Panelists will be announced soon. For more information on Top Tech Trends, go to: http://ala.org/lita/ttt
Imagineering – Science Fiction/Fantasy and Information Technology: Where We Are and Where We Could Have Been
Saturday June 25, 2016, 1:00 pm – 2:30 pm
Science Fiction and Fantasy Literature have a unique ability to speculate about things that have never been, but can also be predictive about things that never were. Through the lens provided by alternate history/counterfactual literature one can look at how the world might have changed if different technologies had been pursued. For example, what if, instead of developing microprocessors, computing depended on vacuum tubes or something fantastic like the harmonies in the resonance of crystals? Join LITA, the Imagineering Interest Group, and a panel of distinguished Science Fiction and Fantasy writers as they discuss what the craft can tell us about not only who we are today, but who, given a small set of differences, we could have been. The availability of authors can change; currently slated authors are:
- Charlie Jane Anders — All the Birds in the Sky
- Katherine Addison — The Goblin Emperor
- Catherynne Valente — Radiance
- Brian Staveley — The Providence of Fire
Friday June 24, 2016, 3:00 pm – 4:00 pm
LITA Open House is a great opportunity for current and prospective members to talk with Library and Information Technology Association (LITA) leaders and learn how to make connections and become more involved in LITA activities.
This year marks a special LITA Happy Hour as we kick off the celebration of LITA’s 50th anniversary. Make sure you join the LITA Membership Development Committee and LITA members from around the country for networking, good cheer, and great fun! Expect lively conversation and excellent drinks; cash bar. Help us cheer for 50 years of library technology.
I'd like to suggest answers to five questions related to the economics of long-term storage:
- How far into the future should we be looking?
- What do the economics of storing data for that long look like?
- How long should the media last?
- How reliable do the media need to be?
- What should the architecture of a future storage system look like?
Iain Emsley's talk at PASIG2016 on planning the storage requirements of the 1PB/day Square Kilometer Array mentioned that the data was expected to be used for 50 years. How hard a problem is planning with this long a horizon? Let's go back 50 years and see.
Disk
IBM2314s (source)
In 1966, as I was writing my first program, disk technology was about 10 years old; the IBM 350 RAMAC was introduced in 1956. The state of the art was the IBM 2314. Each removable disk pack stored 29MB on 11 platters with a 310KB/s data transfer rate. That is roughly equivalent to 60MB/rack. The SKA would have needed to add nearly 17M, or about 10 square kilometers, of racks each day.
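The rack count follows directly from the two figures quoted above. A quick check in Python, assuming decimal units (1PB = 10^15 bytes, 1MB = 10^6 bytes); the floor area per rack is derived here from the "about 10 square kilometers" figure rather than taken from any source.

```python
PB = 10**15  # bytes in a decimal petabyte (assumption: decimal units)
MB = 10**6   # bytes in a decimal megabyte

ska_bytes_per_day = 1 * PB       # SKA ingest rate quoted in the text
bytes_per_rack_1966 = 60 * MB    # IBM 2314 density quoted in the text

racks_per_day = ska_bytes_per_day / bytes_per_rack_1966
print(f"racks per day: {racks_per_day / 1e6:.1f} million")

# Floor area per rack implied by "about 10 square kilometers" (derived).
area_m2 = 10 * 1000**2
print(f"implied footprint: {area_m2 / racks_per_day:.2f} m^2 per rack")
```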
R. M. Fano's 1967 paper The Computer Utility and the Community reports that for MIT's IBM 7094-based CTSS:
"the cost of storing in the disk file the equivalent of one page of single-spaced typing is approximately 11 cents per month."
It would have been hard to believe a projection that in 2016 it would be more than 7 orders of magnitude cheaper.
[Image: IBM 2401s, by Erik Pitti, CC BY 2.0]
The state of the art in tape storage was the IBM 2401, the first nine-track tape drive, storing 45MB per tape with a 320KB/s maximum transfer rate. That is roughly equivalent to 45MB/rack of accessible data.
Your 1966 alter-ego's data management plan would be correct in predicting that 50 years later the dominant media would be "disk" and "tape", and that disk's lower latency would carry a higher cost per byte. But it's hard to believe that any more detailed predictions about the technology would be correct. The extraordinary 30-year history of 30-40% annual cost per byte decrease, the Kryder rate, had yet to start.
Although disk is a 60-year-old technology, a 50-year time horizon for a workshop on the Future of Storage may seem too long to be useful. But a 10-year time horizon is definitely too short to be useful. Storage is not just a technology, but also a multi-billion dollar manufacturing industry dominated by a few huge businesses, with long, hard-to-predict lead times.
[Image: Seagate 2008 roadmap]
To illustrate the lead times, here is a Seagate roadmap slide from 2008 predicting that perpendicular magnetic recording (PMR) would be replaced in 2009 by heat-assisted magnetic recording (HAMR), which would in turn be replaced in 2013 by bit-patterned media (BPM).
In 2016, the trade press is reporting that:
"Seagate plans to begin shipping HAMR HDDs next year."
[Image: ASTC 2016 roadmap]
Here is a recent roadmap from ASTC showing HAMR starting in 2017 and BPM in 2021. So in 8 years HAMR has gone from next year to next year, and BPM has gone from 5 years out to 5 years out. The reason for this real-time schedule slip is that as technologies get closer and closer to the physical limits, the difficulty and above all cost of getting from lab demonstration to shipping in volume increases exponentially.
A recent TrendFocus report suggests that the industry is preparing to slip the new technologies even further:
"The report suggests we could see 14TB PMR drives in 2017 and 18TB SMR drives as early as 2018, with 20TB SMR drives arriving by 2020."
I believe this is mostly achieved by using helium-filled drives to add platters, and thus cost, not by increasing density above current levels.
Tape
Historically, tape was the medium of choice for long-term storage. Its basic recording technology is around 8 years behind hard disk, so it has a much more credible technology road-map than disk. But its importance is fading rapidly. There are several reasons:
- Tape is a very small market in unit terms: "Just under 20 million LTO cartridges were sent to customers last year. As a comparison let's note that WD and Seagate combined shipped more than 350 million disk drives in 2015; the tape cartridge market is less than 0.00567 per cent of the disk drive market in unit terms"
- In effect there is now a single media supplier, raising fears of price gouging and supply vulnerability. The disk market has consolidated too, but there are still two very viable suppliers.
- The advent of data-mining and web-based access to archives make the long access latency of tape less tolerable.
- To maximize the value of the limited number of slots in the robots it is necessary to migrate data to new, higher-capacity cartridges as soon as they appear. This has two effects. First, it makes the long data life of tape media less important. Second, it consumes a substantial fraction of the available bandwidth, up to a quarter in some cases.
[Image: Exabytes shipped]
First, there is the conventional wisdom, expressed by the operators of cloud services and the disk industry and supported by these graphs showing how few exabytes of flash are shipped in comparison to disk. Although flash is displacing disk from markets such as PCs, laptops and servers, Eric Brewer's fascinating keynote at this year's FAST conference started from the assertion that the only feasible medium for bulk data storage in the cloud was spinning disk.
[Image: NAND vs. HDD capex/TB]
The argument is that flash, despite its many advantages, is and will remain too expensive for the capacity layer. The graph of the ratio of capital expenditure per TB of flash and hard disk shows that each exabyte of flash contains about 50 times as much capital as an exabyte of disk. Because:
"factories to build 3D NAND are vastly more expensive than plants that produce planar NAND or HDDs -- a single plant can cost $10 billion"
no-one is going to invest the roughly $80B needed to displace hard disks because the investment would not earn a viable return.
[Image: WD unit shipments]
Second, the view from the flash advocates. They argue that the fabs will be built, because they are no longer subject to conventional economics. The governments of China, Japan, and other countries are stimulating their economies by encouraging investment, and they regard dominating the market for essential chips as a strategic goal, something that justifies investment. They are thinking long-term, not looking at the next quarter's results. The flash companies can borrow at very low interest rates, so even if they do need to show a return, they only need to show a very low return.
[Image: Seagate unit shipments]
If the fabs are built, the increase in supply will increase the Kryder rate of flash. This will increase the trend of storage moving from disk to flash. In turn, this will increase the rate at which disk vendors' unit shipments decrease. In turn, this will decrease their economies of scale, and cause disk's Kryder rate to go negative. The point at which flash becomes competitive with disk moves closer in time. Disk enters a death spiral.
The result would be that the Kryder rate for the capacity market, which has been very low, would get back closer to the historic rate sooner, and thus that storing bulk data for the long term would be significantly cheaper. But this isn't the only effect. When Data Domain's disk-based backup displaced tape, greatly reducing the access latency for backup data, the way backup data was used changed. Instead of backups being used mostly to cover media failures, they became used mostly to cover operator errors.
Similarly, if flash were to displace disk, the access latency for stored data would be significantly reduced, and the way the data is used would change. Because it is more accessible, people would find more ways to extract value from it. The changes induced by reduced latency would probably significantly increase the perceived value of the stored data, which would itself accelerate the turn-over from disk to flash.
I hope everyone is familiar with the concept of "stranded assets", for example the idea that if we're not to fry the planet oil companies cannot develop many of the reserves they carry on their books. Both views of the future of disk vs. flash involve a reduction in the unit volume of drives. The disk vendors cannot raise prices significantly, because doing so would accelerate the reduction in unit volume. Thus their income will decrease, and with it their ability to finance the investments needed to get HAMR and then BPM into the market. The longer they delay these investments, the more difficult it becomes to afford them. Thus it is likely that HAMR and BPM will be "stranded technologies", advances we know how to build but never actually deploy.
Alternate Media
[Image: Media trends to 2014]
Robert Fontana of IBM has an excellent overview of the roadmaps for tape, disk, optical and NAND flash (PDF) through the early 2020s. Clearly no other technology will significantly impact the storage market before then.
SanDisk shipped the first flash SSDs to GRiD Systems in 1991. Even if flash impacts the capacity market in 2018, it will have been 27 years after the first shipment. The storage technology that follows flash is probably some form of Storage Class Memory (SCM) such as XPoint. Small volumes of some forms of SCM have been shipping for a couple of years. Like flash, SCMs leverage much of the semiconductor manufacturing technology. Optimistically, one might expect SCM to impact the capacity market sometime in the late 2030s.
I'm not aware of any other storage technologies that could compete for the capacity market in the next three decades. SCMs have occupied the niche for a technology that exploits semiconductor manufacturing. A technology that didn't would find it hard to build the manufacturing infrastructure to ship the thousands of exabytes a year the capacity market will need by then.
Economics of Long-Term Storage
[Image: Cost vs. Kryder rate]
Here is a graph from a model of the economics of long-term storage I built back in 2012 using data from Backblaze and the San Diego Supercomputer Center. It plots the net present value of all the expenditures incurred in storing a fixed-size dataset for 100 years against the Kryder rate. As you can see, at the 30-40%/yr rates that prevailed until 2010, the cost is low and doesn't depend much on the precise Kryder rate. Below 20%, the cost rises rapidly and depends strongly on the precise Kryder rate.
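The qualitative shape of that curve can be reproduced with a far cruder calculation than the model described above. The sketch below is not that model: it counts only media purchases, assumes media are replaced on a fixed schedule, and uses a hypothetical 4-year service life and 2% discount rate chosen purely for illustration.

```python
def npv_storage_cost(
    kryder_rate: float,
    years: int = 100,
    service_life: int = 4,        # hypothetical replacement interval, in years
    discount_rate: float = 0.02,  # hypothetical real discount rate
    initial_cost: float = 1.0,    # cost of the first media purchase (normalised)
) -> float:
    """Net present value of all media purchases needed to keep a
    fixed-size dataset for `years`, buying new media every `service_life`
    years at a price that falls by `kryder_rate` per year."""
    total = 0.0
    for year in range(0, years, service_life):
        price = initial_cost * (1 - kryder_rate) ** year   # price when bought
        total += price / (1 + discount_rate) ** year       # discounted to today
    return total


if __name__ == "__main__":
    for rate in (0.40, 0.30, 0.20, 0.10, 0.05):
        print(f"Kryder rate {rate:.0%}: NPV = {npv_storage_cost(rate):.2f} x initial purchase")
```

Even in this toy version the cost barely moves between 30% and 40%, and climbs steeply once the rate drops below about 20%.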
[Image: 2014 cost/byte projection]
As it turned out, we were already well below 20%. Here is a 2014 graph from Preeti Gupta, a Ph.D. student at UC Santa Cruz, plotting $/GB against time. The red lines are projections at the industry roadmap's 20% and my less optimistic 10%. It shows three things:
- The slowing started in 2010, before the floods hit Thailand.
- Disk storage costs in 2014, two and a half years after the floods, were more than 7 times higher than they would have been had Kryder's Law continued at its usual pace from 2010, as shown by the green line.
- If the industry projections pan out, as shown by the red lines, by 2020 disk costs will be between 130 and 300 times higher than they would have been had Kryder's Law continued.
Long-Lived Media?
Every few months there is another press release announcing that some new, quasi-immortal medium such as 5D quartz or stone DVDs has solved the problem of long-term storage. But the problem stays resolutely unsolved. Why is this? Very long-lived media are inherently more expensive, and are a niche market, so they lack economies of scale. Seagate could easily make disks with archival life, but they did a study of the market for them, and discovered that no-one would pay the relatively small additional cost. The drives currently marketed for "archival" use have a shorter warranty and a shorter MTBF than the enterprise drives, so they're not expected to have long service lives.
The fundamental problem is that long-lived media only make sense at very low Kryder rates. Even if the rate is only 10%/yr, after 10 years you could store the same data in 1/3 the space. Since space in the data center racks or even at Iron Mountain isn't free, this is a powerful incentive to move old media out. If you believe that Kryder rates will get back to 30%/yr, after a decade you could store 30 times as much data in the same space.
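These figures follow from compounding the annual rate. The sketch below assumes that the space needed per byte shrinks at the same rate as cost per byte (an assumption, since the Kryder rate is defined on cost). Under that reading, 10%/yr gives about a third of the space after a decade, and 30%/yr gives a multiple of roughly 35, the same order as the 30 quoted above.

```python
def space_fraction(kryder_rate: float, years: int) -> float:
    """Fraction of the original space needed to hold the same data after
    `years`, assuming space per byte falls by `kryder_rate` every year."""
    return (1 - kryder_rate) ** years


def capacity_multiple(kryder_rate: float, years: int) -> float:
    """How many times more data fits in the same space after `years`."""
    return 1 / space_fraction(kryder_rate, years)


if __name__ == "__main__":
    print(f"10%/yr for 10 years: {space_fraction(0.10, 10):.2f} of the original space")
    print(f"30%/yr for 10 years: {capacity_multiple(0.30, 10):.1f}x the data in the same space")
```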
The reason why disks are engineered to have a 5-year service life is that, at 30-40% Kryder rates, they were going to be replaced within 5 years simply for economic reasons. But, if Kryder rates are going to be much lower going forward, the incentives to replace drives early will be much less, so a somewhat longer service life would make economic sense for the customer. From the disk vendor's point of view, a longer service life means they would sell fewer drives. Not a reason to make them.
Additional reasons for skepticism include:
- The research we have been doing in the economics of long-term preservation demonstrates the enormous barrier to adoption that accounting techniques pose for media that have high purchase but low running costs, such as these long-lived media.
- The big problem in digital preservation is not keeping bits safe for the long term, it is paying for keeping bits safe for the long term. So an expensive solution to a sub-problem can actually make the overall problem worse, not better.
- These long-lived media are always off-line media. In most cases, the only way to justify keeping bits for the long haul is to provide access to them (see Blue Ribbon Task Force). The access latency scholars (and general Web users) will tolerate rules out off-line media for at least one copy. As Rob Pike said "if it isn't on-line no-one cares any more".
- So at best these media can be off-line backups. But the long access latency for off-line backups has led the backup industry to switch to on-line backup with de-duplication and compression. So even in the backup space long-lived media will be a niche product.
- Off-line media need a reader. Good luck finding a reader for a niche medium a few decades after it faded from the market - one of the points Jeff Rothenberg got right two decades ago.
- Media failures are only one of many, many threats to stored data, but they are the only one long-lived media address.
- Long media life does not imply that the media are more reliable, only that their reliability decreases with time more slowly.
Double the reliability is only worth 1/10th of 1 percent cost increase. ...
Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary - maybe $5,000 total? The 30,000 drives costs you $4m.
The $5k/$4m means the Hitachis are worth 1/10th of 1 per cent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more).
Moral of the story: design for failure and buy the cheapest components you can. :-)
Eric Brewer made the same point in his 2016 FAST keynote. Because for availability and resilience against disasters they need geographic diversity, they have replicas from which to recover. So spending more to increase media reliability makes no sense, they're already reliable enough. This is because the systems that surround the drives have been engineered to deliver adequate reliability despite the current unreliability of the drives. Thus engineering away the value of more reliable drives.
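The arithmetic in the quoted passage can be checked directly. This sketch uses only the numbers given there (30,000 drives, 15 minutes per replacement, failure rates of 2% and 1%, a saving of about $5,000, and $4m of drives); the function name is made up for illustration.

```python
DRIVES = 30_000
MINUTES_PER_REPLACEMENT = 15
FLEET_COST_USD = 4_000_000   # "The 30,000 drives costs you $4m"
LABOUR_SAVING_USD = 5_000    # "maybe $5,000 total", as quoted


def replacement_hours(drive_count: int, failure_rate: float) -> float:
    """Hours of work needed to replace the drives that fail."""
    failed = drive_count * failure_rate
    return failed * MINUTES_PER_REPLACEMENT / 60


hours_at_2_percent = replacement_hours(DRIVES, 0.02)
hours_at_1_percent = replacement_hours(DRIVES, 0.01)

# The price premium that the labour saving would justify, as a fraction of fleet cost.
justified_premium = LABOUR_SAVING_USD / FLEET_COST_USD

print(f"{hours_at_2_percent:.0f} h at 2%, {hours_at_1_percent:.0f} h at 1%")
print(f"justified premium: {justified_premium:.3%} of drive cost")
```

The premium comes out at about an eighth of one percent, which the quote rounds to a tenth.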
Future Storage System Architecture?
What do we want from a future bulk storage system?
- An object storage fabric.
- With low power usage and rapid response to queries.
- That maintains high availability and durability by detecting and responding to media failures without human intervention.
- And whose reliability is externally auditable.
The following year Ian Adams and Ethan Miller of UC Santa Cruz's Storage Systems Research Center and I looked at this possibility more closely in a Technical Report entitled Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes. We showed that it was indeed plausible that, even at then current flash prices, the total cost of ownership over the long term of a storage system built from very low-power system-on-chip technology and flash memory would be competitive with disk while providing high performance and enabling self-healing.
Two subsequent developments suggest we were on the right track. First, Seagate's announcement of its Kinetic architecture and Western Digital's subsequent announcement of drives that ran Linux both exploited the processing power available from the computers in the drives that perform command processing, internal maintenance operations, and signal processing to delegate computation from servers to the storage media, and to get IP communication all the way to the media, as DAWN suggested. IP to the drive is a great way to future-proof the drive interface.
[Image: FlashBlade hardware]
Second, although flash remains more expensive than hard disk, since 2011 the gap has narrowed from a factor of about 12 to about 6. Pure Storage recently announced FlashBlade, an object storage fabric composed of large numbers of blades, each equipped with:
- Compute: 8-core Xeon system-on-a-chip, and Elastic Fabric Connector for external, off-blade, 40GbitE networking,
- Storage: NAND storage with 8TB or 52TB of raw capacity and on-board NV-RAM with a super-capacitor-backed write buffer plus a pair of ARM CPU cores and an FPGA,
- On-blade networking: PCIe card to link compute and storage cards via a proprietary protocol.
DAWN exploits two separate sets of synergies:
- Like FlashBlade, DAWN moves the computation to where the data is, rather than moving the data to where the computation is, reducing both latency and power consumption. The further data moves on wires from the storage medium, the more power and time it takes. This is why Berkeley's Aspire project's architecture is based on optical interconnect technology, which when it becomes mainstream will be both faster and lower-power than wires. In the meantime, we have to use wires.
- Unlike FlashBlade, DAWN divides the object storage fabric into a much larger number of much smaller nodes, implemented using the very low-power ARM chips used in cellphones. Because the power a CPU needs tends to grow faster than linearly with performance, the additional parallelism provides comparable performance at lower power.
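A toy model shows why many slow nodes can beat a few fast ones on power. It assumes, purely for illustration, that a node's power grows as its performance raised to an exponent `alpha` greater than 1, and that the workload parallelizes perfectly across nodes. The default `alpha = 2` is a hypothetical value, not a measurement of any real CPU.

```python
def total_power(total_perf: float, nodes: int, alpha: float = 2.0) -> float:
    """Total power (arbitrary units) to deliver `total_perf` split evenly
    across `nodes`, if each node's power is (per-node performance) ** alpha.
    alpha > 1 models power growing faster than linearly with performance."""
    per_node_perf = total_perf / nodes
    return nodes * per_node_perf ** alpha


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(f"{n:>2} nodes: power = {total_power(64, n):.0f}")
```

With `alpha = 1` the split makes no difference, so the saving depends entirely on the superlinear assumption.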
Storage systems are extremely reliable, but at scale nowhere near reliable enough to mean data loss can be ignored. Internal auditing, in which the system detects and reports its own losses, for example by hashing the stored data and comparing the result with a stored hash, is important but is not enough. The system's internal audit function will itself have bugs, which are likely to be related to the bugs in the underlying functionality causing data loss. Having the system report "I think everything is fine" is not as reassuring as one would like.
Auditing a system by extracting its entire contents for integrity checking does not scale, and is likely itself to cause errors. Asking a storage system for the hash of an object is not adequate, because the system could have remembered the object's hash instead of computing it afresh. Although we don't yet have a perfect solution to the external audit problem, it is clear that part of the solution is the ability to supply a random nonce that is prepended to the object's data before hashing. Because the result is different every time, the system cannot simply remember it.
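A minimal sketch of such a nonce challenge follows. The `StorageSystem` class and its `challenge` method are hypothetical stand-ins, not the interface of any real product, and SHA-256 is an arbitrary choice of hash. The sketch also assumes the auditor holds its own reference copy of the object, which is one reason this is only part of a solution.

```python
import hashlib
import secrets


class StorageSystem:
    """Toy stand-in for the system being audited (hypothetical interface)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def challenge(self, key: str, nonce: bytes) -> str:
        # The nonce is prepended to the object's data before hashing, so the
        # answer cannot be precomputed: the system must read the stored bytes.
        return hashlib.sha256(nonce + self._objects[key]).hexdigest()


def audit(system: StorageSystem, key: str, reference_copy: bytes) -> bool:
    """Return True if the system proves it still holds `reference_copy`."""
    nonce = secrets.token_bytes(32)  # fresh random nonce for every audit
    expected = hashlib.sha256(nonce + reference_copy).hexdigest()
    return secrets.compare_digest(system.challenge(key, nonce), expected)


if __name__ == "__main__":
    store = StorageSystem()
    store.put("report.pdf", b"original bytes")
    print(audit(store, "report.pdf", b"original bytes"))   # intact object
    store.put("report.pdf", b"original bytez")             # simulate corruption
    print(audit(store, "report.pdf", b"original bytes"))   # damaged object
```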
Acknowledgements
I'm grateful to Seagate for (twice) allowing me to pontificate about their industry, to Brian Berg for his encyclopedic knowledge of the history of flash, and to Tom Coughlin for illuminating discussions and the graph of exabytes shipped. This isn't to say that they agree with any of the above.
FOR IMMEDIATE RELEASE
Duluth, Georgia–May 12, 2016
Equinox and Bintec conduct successful integration testing between meeScan and Evergreen
Equinox is pleased to announce successful integration testing between the meeScan self checkout system provided by Bintec Library Services and the Evergreen open source ILS. Additional information regarding how to configure Evergreen to work with meeScan will be made available to the Evergreen community.
Galen Charlton, Added Services Manager at Equinox, said, “One of the strengths of Evergreen is its ability to integrate with other library software. By performing interoperability testing with firms such as Bintec, Equinox helps to identify and resolve technical roadblocks before they become an issue for libraries.”
Peter Trenciansky, Director for Bintec Library Services, commented, “We are very excited to offer meeScan to Evergreen libraries around the world. Our service provides a modern and fresh way to check out items while eliminating traditional challenges associated with self-service kiosks. The collaboration between Equinox and Bintec is another milestone towards our goal of contributing to the development of a new generation of welcoming and engaging libraries.”
Bintec Library Services Inc. is a technology company dedicated to the development of solutions that provide added value to libraries and enrich the user experience. The knowledgeable team behind Bintec delivers software and hardware solutions encompassing electromagnetic (EM) security, radio-frequency identification (RFID) technologies, ILS systems integration, large cloud-based architecture and mobile app development. The company is based in Toronto, Canada and services customers across North America and other parts of the world. To find out more visit binteclibraryservices.com
meeScan is a cloud based self checkout system that lets patrons use their smartphones to check out books anywhere in their library. The system uses the built-in camera of the patron’s smartphone or tablet to scan the item barcode. With support for both EM and RFID, it is a full featured alternative to conventional self-check kiosks at a fraction of the cost. meeScan is extremely user friendly: it is simple to set up and requires virtually zero maintenance by the library. Find out more at meescan.com
About Equinox Software, Inc.
Equinox was founded by the original developers and designers of the Evergreen ILS. We are wholly devoted to the support and development of open source software in libraries, focusing on Evergreen, Koha, and the FulfILLment ILL system. We wrote over 80% of the Evergreen code base and continue to contribute more new features, bug fixes, and documentation than any other organization. Our team is fanatical about providing exceptional technical support. Over 98% of our support ticket responses are graded as “Excellent” by our customers. At Equinox, we are proud to be librarians. In fact, half of us have our ML(I)S. We understand you because we *are* you. We are Equinox, and we’d like to be awesome for you. For more information on Equinox, please visit http://www.esilibrary.com.
Evergreen is an award-winning ILS developed with the intent of providing an open source product able to meet the diverse needs of consortia and high transaction public libraries. However, it has proven to be equally successful in smaller installations including special and academic libraries. Today, almost 1400 libraries across the US and Canada are using Evergreen including NC Cardinal, SC Lends, and B.C. Sitka. For more information about Evergreen, including a list of all known Evergreen installations, see http://evergreen-ils.org.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.