Recently two awesome things changed my world. Beyoncé released her album Lemonade and the BC Library Association conference happened.
Cory Doctorow’s opening keynote was brilliant. As expected, he gave a smart and funny talk full of examples that illustrated the bigger issues. I don’t think anyone will forget his example of a baby monitor camera that was taken over by creepy men who used it to taunt the baby, an illustration of the privacy flaws in everyday “smart” devices. I feel like he gave libraries more credit than we deserve. I left feeling pretty depressed and hopeless, thinking about how libraries continue to choose proprietary vendor technology that does not reflect our core values.
One of my favourite conversations at this conference was with Alison Macrina from the Library Freedom Project. We talked about many things, including our mutual love for Beyoncé. She had seen Beyoncé’s concert in Houston and told me about the amazing choreography for Freedom, which was the last song Beyoncé performed.
When I asked friends what their favourite song on Beyoncé’s Lemonade was, a few people said that they thought of the whole album as one song, or as an opera. So, on the way home from the conference, I listened to the whole album and heard it in a new way. I jumped off the bus and walked up the street to my home just as Freedom came on, and by the end of the song I had a realization. Beyoncé embodies freedom by owning her creative product, but perhaps even more importantly, she owns the means of distribution. Like Beyoncé, libraries need to own our distribution platforms.
Tidal, Beyoncé’s distribution channel, is a streaming music platform that competes with Spotify and Pandora. I’m not sure what the ownership breakdown is, but Tidal is owned by artists. A few of the artist-owners are Jay Z, Beyoncé, Prince, Rihanna, Kanye West, Nicki Minaj, Daft Punk, Jack White, Madonna, Arcade Fire, Alicia Keys, Usher, Chris Martin, Calvin Harris, deadmau5, Jason Aldean and J. Cole. Initially many people thought Tidal was a failure, but that has changed.
Lemonade was launched on HBO on April 22. On the 23rd, the only way to get Lemonade was to stream it on Tidal, and it could be purchased the day after. On the 25th it became available for purchase, by track or as an album, from Amazon Music and the iTunes Store. Physical copies of the album went on sale in brick-and-mortar stores on May 6. Initially, the shift to digital distribution replicated the business model for distributing records, which generated huge profits for record labels but often cut out the artist.
PKP (Public Knowledge Project) is a great example of how academic libraries built open source publishing tools to challenge scholarly publishers. This has been a game changer in terms of how research is published, distributed and accessed.
For more than 10 years we’ve been complaining about OverDrive’s DRM-laced ebooks and the crappy user experience. Instead of relying on vendors, we need to build our own distribution platform for ebooks. I realize that it’s the content our patrons are hungry for, and that we’re neither Jay Z nor Beyoncé. But if publishers aren’t willing to play with us, we have strong relationships with authors and could work directly with them as content creators. There needs to be a new business model in which people can access creative works and content creators can make a living. Access Copyright’s model doesn’t work, but we could work with content creators to figure out a business model that does.
In her closing keynote at BCLA, activist and writer Harsha Walia talked about systemic power structures and the need to change how we do things. Talking about pay equity, she said, “It’s not about breaking the glass ceiling, it’s about shattering the whole house.” Vendor rules and platforms are built around those companies’ profit margins. Libraries need to change the rules of the game.
Tryna rain, tryna rain on the thunder
Tell the storm I’m new
I’m a wall, come and march on the regular
Painting white flags blue
Freedom! Freedom! I can’t move
Freedom, cut me loose!
Freedom! Freedom! Where are you?
Cause I need freedom too!
William Denton: Do not insert non-US foreign coins, damaged coins, bent coins, dirty coins, commemorative coins, tokens, Eisenhower silver dollars or 1943 US pennies
Behind every item, a story.
Inserting anything but clean and undamaged coins into the Coinstar coin-counting machine is unacceptable and could impact functionality. Unacceptable items will not be counted and may not be returned. Unacceptable items include, but are not limited to:
- 1943 US pennies,
- alcohol wipes,
- animal crackers,
- animal or human teeth,
- belt clips,
- bent coins,
- bottle caps,
- broken glass,
- candy wrappers,
- cat litter,
- commemorative coins,
- contact lenses,
- cotton balls,
- cotton swabs,
- cuff links,
- damaged coins,
- dirty coins,
- dog food,
- drill bits,
- ear plugs,
- Eisenhower silver dollars,
- finger nails,
- flash drives,
- foam objects,
- foreign coins,
- French fries,
- fruit snacks,
- gold fish,
- guitar picks,
- gum wrappers,
- gummy worms/bears,
- hair clips,
- jar lids,
- key chains,
- miniature dice,
- name tags,
- paper clips,
- pen caps,
- pine cone parts,
- pipe cleaners,
- playing cards,
- pop can tabs,
- popsicle sticks,
- quilt squares,
- rubber bands,
- rubber lid seals,
- screw driver bits,
- SD cards,
- tie tacks,
- tire caps,
- tooth picks,
- tree bark,
- wall hooks,
- watch bands,
In case you’re curious, Wikipedia explains the steel 1943 US penny (“the only regular-issue United States coin that can be picked up with a magnet”) and the large Eisenhower dollar (“the new dollars failed to circulate to any degree, except in and around Nevada casinos, where they took the place of privately issued tokens”).
The obsession with “dirty money” warrants deeper analysis than I can supply.
It's been a little while since we checked in on the Islandora CLAW community sprints, but they've been ticking along every month since last November, knocking down tickets and gradually building up the next major version of Islandora. This last sprint was especially significant in two ways: we welcomed two new sprinters (Ed Fugikawa and Ben Rosner), and we closed the most tickets of any sprint since they began. You can check out those stellar results here.
The next sprint will run from May 16 to 27, kicking off with a meeting on Monday. If you would like to take part, please add your name to the sprint here. We've just gone through some heavy restructuring of how Islandora CLAW's pieces are stored in GitHub, so there will be opportunities to participate at the documentation level (for those non-developers out there who still want to contribute directly).
Learn about the JSON-LD serialization of linked data and its various iterations.
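As a rough illustration of what a JSON-LD serialization looks like, here is a minimal sketch using only Python's standard `json` module. The record, its `@id`, and the choice of schema.org terms are hypothetical examples; a full JSON-LD processor would also expand each short key to the IRI given in `@context`, which this sketch does not attempt.

```python
import json

# A hypothetical book description; every IRI and value here is illustrative.
record = {
    # @context maps the short keys used below to full vocabulary IRIs.
    "@context": {
        "name": "http://schema.org/name",
        "author": "http://schema.org/author",
    },
    # @id is the resource's global identifier; @type names its class.
    "@id": "http://example.org/books/1",
    "@type": "http://schema.org/Book",
    "name": "An Example Book",
    "author": "A. N. Author",
}

# Because JSON-LD is plain JSON, ordinary JSON tooling can serialize it.
serialized = json.dumps(record, indent=2)
print(serialized)
```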
“Needs assessment help establish the customer as the center of the service and bring the librarian and the library staff back to what is at the core of a library service: What do the library customers need?” (Dudden, 2007, p. 90)
As mentioned in my last post, MacKellar and Gerding (2010), authors of ALA grant-funding monographs, stress the importance of conducting a needs assessment as the first step in approaching a grant proposal. It may be painful at first, but once a thorough study has been made, the remaining steps of the grant proposal become easier. You become well informed about the community you serve and can identify current service gaps in your library. Until you know your community’s needs, you will not be able to justify funding. Through my readings, I discovered that this includes your non-users as well as your current users. Remember, funders want to be sure that your project will help people and is therefore likely to succeed.
In a nutshell, a needs assessment is "a systematic process determining discrepancies between optimal and actual performance of a service by reviewing the service needs of a customer and stakeholders and then selecting interventions that allow the service to meet those needs in the fastest, most cost-effective manner" (Dudden, 2007, p. 61). According to Dudden, in her book Using Benchmarking, Needs Assessment, Quality Improvement, Outcome Measurement and Library Standards, there are 12 steps in conducting a needs assessment: (1) Define your purpose or question, (2) Gather your team, (3) Identify stakeholders and internal and external factors, (4) Define the question, (5) Determine resources available, (6) Develop a timeline, (7) Define your customers, (8) Gather data from identified sources, (9) Analyze the data, (10) Make a decision and a plan of action, (11) Report to administration and evaluate the needs assessment process, and (12) Repeat the needs assessment in the future to see if the gap is smaller.
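Dudden's definition centres on the gap between optimal and actual performance, followed by choosing cost-effective interventions. A minimal sketch of that arithmetic, assuming each service has already been scored on a common scale and given a rough intervention cost, might look like this in Python. The `Service` class, the `prioritize` function, the service names, and all of the numbers are hypothetical; ranking by gap per dollar is one simple heuristic, not Dudden's prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    optimal: float    # target performance (hypothetical 0-100 scale)
    actual: float     # measured performance on the same scale
    fix_cost: float   # estimated cost of the proposed intervention (must be > 0)

    @property
    def gap(self) -> float:
        # The "discrepancy between optimal and actual performance";
        # clamped at zero so over-performing services show no need.
        return max(self.optimal - self.actual, 0.0)


def prioritize(services):
    """Rank services by gap closed per unit of cost, largest first."""
    return sorted(services, key=lambda s: s.gap / s.fix_cost, reverse=True)


# Hypothetical scores, for illustration only.
services = [
    Service("Evening study space", optimal=90, actual=55, fix_cost=20_000),
    Service("E-book lending", optimal=80, actual=70, fix_cost=5_000),
    Service("Reference chat", optimal=75, actual=80, fix_cost=3_000),
]

for s in prioritize(services):
    print(f"{s.name}: gap={s.gap:.0f}, gap per $1k={1000 * s.gap / s.fix_cost:.2f}")
```

Repeating the same scoring later (step 12) and comparing the new gaps with the old ones is one way to check whether the gap has in fact become smaller.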
As librarians, we like to research something comprehensively before we dive into a project. Researching what others have done in their own needs assessment projects is an excellent way to get acquainted with the process and garner ideas. There are several ways to gather information from a sample of your community: surveys, interviews, focus groups, observations, community forums or town meetings, suggestion boxes, and public records. For a technology-related project, for example, your observation method may become a usability or user experience study. I learned that it is important to use multiple techniques together and then combine the results to produce trustworthy data. I personally think that surveys are overused, but I can live with them as one of many approaches in a study.
Take, for instance, the case back in 2011 when Penn State wanted to build a knowledge commons (Lynn, 2011). Their project question, or mission, was to conduct a ten-month needs assessment to find out which new programming initiatives needed to be created and how the physical knowledge commons space should be configured to support them. I was amazed to read that they used seven techniques to inform their decisions: they conducted site visits to other libraries' knowledge commons, reviewed the literature on the topic, held student and faculty focus groups, created an online survey focusing on the physical library space and resources, created a survey exclusively for incoming freshmen, evaluated knowledge commons websites from other institutions, and evaluated work spaces (circulation desk, reference desk, office space, etc.). After each phase of the needs assessment was completed, they were able to prioritize space needs and draft a final report of their findings for the administration and the architectural firm.
One point made in this case study is that a needs assessment has secondary effects that are essential to the process: it markets the project immensely and builds support among all stakeholders. I am convinced that completing this process will get you one step closer to securing funding.
The Needs Assessment: Forum Unified Education Technology Suite
National Center for Education Statistics
IT Needs Assessment & Strategic Planning Surveys
Methods for Conducting an Educational Needs Assessment
Guidelines for Cooperative Extension System Professionals
by Paul F. Cawley, University of Idaho
Chapter 3: Assessing Community Needs and Resources
Community Tool Box, University of Kansas
Information Gathering Toolkit
Community Needs Assessment Survey Guide
Utah State University
Assessing Faculty’s Technology Needs
by Tena B. Crew
Using Needs Assessment as a Holistic Means for Improving Technology Infrastructure
by Joni E. Spurlin, edited by Diana G. Oblinger
Educause Learning Initiative
U.S. Department of Commerce
Google Map Maker
Dudden, R. F. (2007). Using benchmarking, needs assessment, quality improvement, outcome measurement, and library standards: A how-to-do-it manual. New York, NY: Neal-Schuman.
Lynn, V. (2011). A knowledge commons needs assessment. College & Research Libraries News, 72(8), 464-467.
MacKellar, P. H., & Gerding, S. K. (2010). Winning grants: A how-to-do-it manual for librarians with multimedia tutorials and grant development tools. New York, NY: Neal-Schuman.
Dena L. Luce
From VIVO 2016 Conference organizers
We’ve extended our call for posters at VIVO16! If you missed the earlier deadline for posters, you have until May 23 to submit your poster abstract. The poster session lets you share your work in an informal, relaxed setting and chat with individual community members.
From Mike Conlon, VIVO project director
VIVO User Group Meeting. We had a great VIVO User Group meeting in Chicago at the Galter Health Sciences Library. You can find materials from the meeting online here. The two-day meeting sessions included:
From Dermot Frost, Chair, OR2016 Host Committee; David Minor, Matthias Razum, and Sarah Shreeves, Co-Chairs, OR2016 Program Committee
Yesterday, the National Institutes of Health (NIH) Director Dr. Francis Collins announced the appointment of Dr. Patricia Flatley Brennan as the next director of the National Library of Medicine (NLM), the world’s largest medical library and a component of NIH. Dr. Brennan comes to NLM from the University of Wisconsin-Madison, where she is the Lillian L. Moehlman Bascom Professor, School of Nursing and College of Engineering. She will be the first woman and first nurse to lead NLM. Dr. Brennan is expected to assume her post in August.
“Dr. Brennan brings her incredible experience of having cared for patients as a practicing nurse, improved the lives of home-bound patients by developing innovative information systems and services designed to increase their independence, and pursued cutting-edge research in data visualization and virtual reality,” said Dr. Collins.
NLM, based on the campus of NIH in Bethesda, Maryland, was founded in 1836 and has earned a reputation for innovation and public service. ALA has had the pleasure of working with a number of NLM staffers, and we look forward to collaborating with Dr. Brennan and her team.
The previous director, Dr. Donald Lindberg, led NLM from 1984 until his retirement in 2015. Among his many achievements were founding the National High-Performance Computing and Communications (HPCC) Office in 1992 and serving as its first director for three years. Establishing the HPCC Office was an important early milestone in the development, growth, and institutionalization of advanced information technology within and across federal agencies. I mention HPCC because it was my employer before I came to ALA (it has since evolved and been renamed the National Coordination Office for Networking and Information Technology Research & Development).
The post New director named for the U.S. National Library of Medicine appeared first on District Dispatch.
Sunday June 26, 2016 from 3:00 pm to 4:00 pm
Safiya Noble
Dr. Noble is an Assistant Professor in the Department of Information Studies in the Graduate School of Education and Information Studies at UCLA. She conducts research in socio-cultural informatics, including feminist, historical and political-economic perspectives on computing platforms and software in the public interest. Her research is at the intersection of culture and technology in the design and use of applications on the Internet.
All on Friday, June 24, from 1:00 pm to 4:00 pm
Digital Privacy and Security: Keeping You and Your Library Safe and Secure in a Post-Snowden World
Presenters: Jessamyn West, Library Technologist at Open Library and Blake Carver, LYRASIS
Islandora for Managers: Open Source Digital Repository Training
Presenters: Erin Tripp, Business Development Manager at discoverygarden inc. and Stephen Perkins, Managing Member of Infoset Digital Publishing
Technology Tools and Transforming Librarianship
Presenters: Lola Bradley, Reference Librarian, Upstate University; Breanne Kirsch, Coordinator of Emerging Technologies, Upstate University; Jonathan Kirsch, Librarian, Spartanburg County Public Library; Rod Franco, Librarian, Richland Library; Thomas Lide, Learning Engagement Librarian, Richland Library
Top Technology Trends
Sunday June 26, 2016 from 1:00 pm to 2:30 pm
This regular program features our ongoing roundtable discussion about trends and advances in library technology by a panel of LITA technology experts. The panelists will describe changes and advances in technology that they see having an impact on the library world, and suggest what libraries might do to take advantage of these trends. Panelists will be announced soon. For more information on Top Tech Trends, go to http://ala.org/lita/ttt
Imagineering – Science Fiction/Fantasy and Information Technology: Where We Are and Where We Could Have Been
Saturday June 25, 2016, 1:00 pm – 2:30 pm
Science Fiction and Fantasy Literature have a unique ability to speculate not only about things that have not yet happened, but also about things that never were. Through the lens provided by alternate history/counterfactual literature, one can look at how the world might have changed if different technologies had been pursued. For example, what if, instead of developing microprocessors, computing had depended on vacuum tubes, or on something fantastic like the harmonies in the resonance of crystals? Join LITA, the Imagineering Interest Group, and a panel of distinguished Science Fiction and Fantasy writers as they discuss what the craft can tell us about not only who we are today, but who, given a small set of differences, we could have been. Author availability may change; currently slated authors are:
- Charlie Jane Anders — All the Birds in the Sky
- Katherine Addison — The Goblin Emperor
- Catherynne Valente — Radiance
- Brian Staveley — The Providence of Fire
Friday June 24, 2016, 3:00 pm – 4:00 pm
LITA Open House is a great opportunity for current and prospective members to talk with Library and Information Technology Association (LITA) leaders and learn how to make connections and become more involved in LITA activities.
This year marks a special LITA Happy Hour as we kick off the celebration of LITA’s 50th anniversary. Make sure you join the LITA Membership Development Committee and LITA members from around the country for networking, good cheer, and great fun! Expect lively conversation and excellent drinks; cash bar. Help us cheer for 50 years of library technology.
I'd like to suggest answers to five questions related to the economics of long-term storage:
- How far into the future should we be looking?
- What do the economics of storing data for that long look like?
- How long should the media last?
- How reliable do the media need to be?
- What should the architecture of a future storage system look like?
Iain Emsley's talk at PASIG2016 on planning the storage requirements of the 1PB/day Square Kilometer Array mentioned that the data was expected to be used for 50 years. How hard is it to plan with this long a horizon? Let's go back 50 years and see.
Disk
In 1966, as I was writing my first program, disk technology was about 10 years old; the IBM 350 RAMAC was introduced in 1956. The state of the art was the IBM 2314. Each removable disk pack stored 29MB on 11 platters with a 310KB/s data transfer rate, roughly equivalent to 60MB per rack. The SKA would have needed to add nearly 17M racks, or about 10 square kilometers of them, each day.
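The rack arithmetic can be checked in a few lines. This Python sketch simply divides the SKA's 1PB/day by the roughly 60MB per rack quoted above; it assumes decimal units (1PB = 10^15 bytes, 1MB = 10^6 bytes) and takes the 10 square kilometer figure as given.

```python
PB = 10**15   # decimal units assumed
MB = 10**6

ska_bytes_per_day = 1 * PB
bytes_per_rack_1966 = 60 * MB   # IBM 2314-era disk, per the text

racks_per_day = ska_bytes_per_day / bytes_per_rack_1966
print(f"{racks_per_day / 1e6:.1f} million racks per day")

# Floor area per rack implied by "about 10 square kilometers" per day.
area_m2_per_day = 10 * 10**6
print(f"{area_m2_per_day / racks_per_day:.2f} square meters per rack")
```

The implied footprint of roughly 0.6 square meters per rack is a plausible floor area, so the two figures in the text are consistent with each other.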
R. M. Fano's 1967 paper The Computer Utility and the Community reports that for MIT's IBM 7094-based CTSS:
the cost of storing in the disk file the equivalent of one page of single-spaced typing is approximately 11 cents per month.
It would have been hard to believe a projection that in 2016 it would be more than 7 orders of magnitude cheaper.
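To get a feel for Fano's figure, it helps to convert it to a per-gigabyte price. The sketch below assumes, purely for illustration, that a single-spaced typed page holds about 3,000 characters stored at one byte each; the result is in 1967 dollars and is not adjusted for inflation.

```python
cost_per_page_month = 0.11   # dollars per page per month (Fano, 1967)
bytes_per_page = 3_000       # ASSUMPTION: ~3,000 characters, 1 byte each

pages_per_gb = 10**9 / bytes_per_page
cost_per_gb_month_1967 = cost_per_page_month * pages_per_gb
print(f"1967: about ${cost_per_gb_month_1967:,.0f} per GB per month")

# "More than 7 orders of magnitude cheaper" means at most 1/10**7 of that.
upper_bound_2016 = cost_per_gb_month_1967 / 10**7
print(f"2016: under ${upper_bound_2016:.4f} per GB per month")
```

A different page-size assumption shifts both numbers by the same factor, so the seven-orders-of-magnitude gap itself does not depend on it.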
The state of the art in tape storage was the IBM 2401, the first nine-track tape drive, storing 45MB per tape with a 320KB/s maximum transfer rate, roughly equivalent to 45MB per rack of accessible data.
Your 1966 alter-ego's data management plan would be correct in predicting that 50 years later the dominant media would be "disk" and "tape", and that disk's lower latency would carry a higher cost per byte. But it's hard to believe that any more detailed predictions about the technology would have been correct. The extraordinary 30-year history of 30-40% annual decreases in cost per byte, the Kryder rate, had yet to start.
Although disk is a 60-year-old technology, a 50-year time horizon for a workshop on the Future of Storage may seem too long to be useful. But a 10-year time horizon is definitely too short to be useful. Storage is not just a technology, but also a multi-billion dollar manufacturing industry dominated by a few huge businesses, with long, hard-to-predict lead times.
To illustrate the lead times, here is a Seagate roadmap slide from 2008 predicting that perpendicular magnetic recording (PMR) would be replaced in 2009 by heat-assisted magnetic recording (HAMR), which would in turn be replaced in 2013 by bit-patterned media (BPM).
In 2016, the trade press is reporting that:
Seagate plans to begin shipping HAMR HDDs next year.
Here is a recent roadmap from ASTC showing HAMR starting in 2017 and BPM in 2021. So in 8 years HAMR has gone from next year to next year, and BPM has gone from 5 years out to 5 years out. The reason for this real-time schedule slip is that as technologies get closer and closer to the physical limits, the difficulty and above all cost of getting from lab demonstration to shipping in volume increases exponentially.
A recent TrendFocus report suggests that the industry is preparing to slip the new technologies even further:
The report suggests we could see 14TB PMR drives in 2017 and 18TB SMR drives as early as 2018, with 20TB SMR drives arriving by 2020.
I believe this is mostly achieved by using helium-filled drives to add platters, and thus cost, not by increasing density above current levels.
Tape
Historically, tape was the medium of choice for long-term storage. Its basic recording technology is around 8 years behind hard disk, so it has a much more credible technology roadmap than disk. But its importance is fading rapidly. There are several reasons:
- Tape is a very small market in unit terms: Just under 20 million LTO cartridges were sent to customers last year. As a comparison let's note that WD and Seagate combined shipped more than 350 million disk drives in 2015; the tape cartridge market is less than 0.00567 per cent of the disk drive market in unit terms.
- In effect there is now a single media supplier, raising fears of price gouging and supply vulnerability. The disk market has consolidated too, but there are still two very viable suppliers.
- The advent of data-mining and web-based access to archives makes the long access latency of tape less tolerable.
- To maximize the value of the limited number of slots in the robots it is necessary to migrate data to new, higher-capacity cartridges as soon as they appear. This has two effects. First, it makes the long data life of tape media less important. Second, it consumes a substantial fraction of the available bandwidth, up to a quarter in some cases.
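The unit comparison in the first point can be checked directly from the two counts quoted. Dividing just under 20 million cartridges by more than 350 million drives gives roughly 0.057, i.e. about 5.7 per cent, so the quoted "0.00567 per cent" appears to be a slip in the original source; the underlying point, that tape is a small market in unit terms, still holds.

```python
lto_cartridges = 20e6   # "just under 20 million" shipped last year
disk_drives = 350e6     # "more than 350 million" WD + Seagate drives in 2015

# Numerator is an upper bound and denominator a lower bound,
# so this ratio is an upper bound on tape's share in unit terms.
ratio = lto_cartridges / disk_drives
print(f"cartridges/drives = {ratio:.4f}, about {ratio:.1%}")
```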
First, the conventional wisdom as expressed by the operators of cloud services and the disk industry, and supported by these graphs showing how few exabytes of flash are shipped in comparison to disk. Although flash is displacing disk from markets such as PCs, laptops and servers, Eric Brewer's fascinating keynote at this year's FAST conference started from the assertion that the only feasible medium for bulk data storage in the cloud was spinning disk.
The argument is that flash, despite its many advantages, is and will remain too expensive for the capacity layer. The graph of the ratio of capital expenditure per TB of flash and hard disk shows that each exabyte of flash contains about 50 times as much capital as an exabyte of disk. This is because:
factories to build 3D NAND are vastly more expensive than plants that produce planar NAND or HDDs -- a single plant can cost $10 billion
So no-one is going to invest the roughly $80B needed to displace hard disks, because the investment would not earn a viable return.
Second, the view from the flash advocates. They argue that the fabs will be built, because they are no longer subject to conventional economics. The governments of China, Japan, and other countries are stimulating their economies by encouraging investment, and they regard dominating the market for essential chips as a strategic goal, something that justifies investment. They are thinking long-term, not looking at the next quarter's results. The flash companies can borrow at very low interest rates, so even if they do need to show a return, they only need to show a very low return.
If the fabs are built, the increase in supply will increase the Kryder rate of flash. This will accelerate the trend of storage moving from disk to flash. In turn, this will increase the rate at which the disk vendors' unit shipments decrease. In turn, this will decrease their economies of scale, and cause disk's Kryder rate to go negative. The point at which flash becomes competitive with disk moves closer in time. Disk enters a death spiral.
The result would be that the Kryder rate for the capacity market, which has been very low, would get back closer to the historic rate sooner, and thus that storing bulk data for the long term would be significantly cheaper. But this isn't the only effect. When Data Domain's disk-based backup displaced tape, greatly reducing the access latency for backup data, the way backup data was used changed. Instead of backups being used mostly to cover media failures, they became used mostly to cover operator errors.
Similarly, if flash were to displace disk, the access latency for stored data would be significantly reduced, and the way the data is used would change. Because it is more accessible, people would find more ways to extract value from it. The changes induced by reduced latency would probably significantly increase the perceived value of the stored data, which would itself accelerate the turn-over from disk to flash.
I hope everyone is familiar with the concept of "stranded assets", for example the idea that, if we're not to fry the planet, oil companies cannot develop many of the reserves they carry on their books. Both views of the future of disk vs. flash involve a reduction in the unit volume of drives. The disk vendors cannot raise prices significantly; doing so would accelerate the reduction in unit volume. So their income will decrease, and with it their ability to finance the investments needed to get HAMR and then BPM into the market. The longer they delay these investments, the more difficult it becomes to afford them. Thus it is likely that HAMR and BPM will be "stranded technologies", advances we know how to make but never actually deploy.
Alternate Media
Robert Fontana of IBM has an excellent overview of the roadmaps for tape, disk, optical and NAND flash (PDF) through the early 2020s. Clearly no other technology will significantly impact the storage market before then.
SanDisk shipped the first flash SSDs to GRiD Systems in 1991. Even if flash impacts the capacity market in 2018, it will have been 27 years after the first shipment. The storage technology that follows flash is probably some form of Storage Class Memory (SCM) such as XPoint. Small volumes of some forms of SCM have been shipping for a couple of years. Like flash, SCMs leverage much of the existing semiconductor manufacturing technology. Optimistically, one might expect SCM to impact the capacity market sometime in the late 2030s.
I'm not aware of any other storage technologies that could compete for the capacity market in the next three decades. SCMs have occupied the niche for a technology that exploits semiconductor manufacturing. A technology that didn't would find it hard to build the manufacturing infrastructure to ship the thousands of exabytes a year the capacity market will need by then.
Economics of Long-Term Storage
Here is a graph from a model of the economics of long-term storage I built back in 2012 using data from Backblaze and the San Diego Supercomputer Center. It plots the net present value of all the expenditures incurred in storing a fixed-size dataset for 100 years against the Kryder rate. As you can see, at the 30-40%/yr rates that prevailed until 2010, the cost is low and doesn't depend much on the precise Kryder rate. Below 20%, the cost rises rapidly and depends strongly on the precise Kryder rate.
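The shape of that curve can be reproduced qualitatively with a much simpler toy model than the one described above. The Python sketch below assumes that the only cost is replacing media every 5 years, that media prices fall by the Kryder rate each year, and that future spending is discounted at 4% per year; these parameters and the `storage_npv` function are illustrative assumptions, not the 2012 model's inputs.

```python
def storage_npv(kryder_rate, years=100, media_life=5, discount_rate=0.04):
    """Toy net present value of storing a fixed dataset for `years` years.

    Illustrative assumptions, not the original model: the only cost is
    buying fresh media every `media_life` years; the media needed cost 1.0
    in year 0 and get cheaper by `kryder_rate` each year; spending in year t
    is discounted by (1 + discount_rate) ** t.
    """
    total = 0.0
    for year in range(0, years, media_life):
        price = (1 - kryder_rate) ** year              # media price in that year
        total += price / (1 + discount_rate) ** year   # value in today's money
    return total


for k in (0.40, 0.30, 0.20, 0.10, 0.05):
    print(f"Kryder rate {k:.0%}: cost = {storage_npv(k):.2f} x year-0 media cost")
```

Even in this stripped-down version, moving from 40% to 30% changes the total only slightly, while each step below 20% raises it considerably more.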
As it turned out, we were already well below 20%. Here is a 2014 graph from Preeti Gupta, a Ph.D. student at UC Santa Cruz, plotting $/GB against time. The red lines are projections at the industry roadmap's 20% and my less optimistic 10%. It shows three things:
- The slowing started in 2010, before the floods hit Thailand.
- Disk storage costs in 2014, two and a half years after the floods, were more than 7 times higher than they would have been had Kryder's Law continued at its usual pace from 2010, as shown by the green line.
- If the industry projections pan out, as shown by the red lines, by 2020 disk costs will be between 130 and 300 times higher than they would have been had Kryder's Law continued.
Long-Lived Media?
Every few months there is another press release announcing that some new, quasi-immortal medium such as 5D quartz or stone DVDs has solved the problem of long-term storage. But the problem stays resolutely unsolved. Why is this? Very long-lived media are inherently more expensive, and are a niche market, so they lack economies of scale. Seagate could easily make disks with archival life, but they did a study of the market for them, and discovered that no-one would pay the relatively small additional cost. The drives currently marketed for "archival" use have a shorter warranty and a shorter MTBF than the enterprise drives, so they're not expected to have long service lives.
The fundamental problem is that long-lived media only make sense at very low Kryder rates. Even if the rate is only 10%/yr, after 10 years you could store the same data in 1/3 the space. Since space in the data center racks or even at Iron Mountain isn't free, this is a powerful incentive to move old media out. If you believe that Kryder rates will get back to 30%/yr, after a decade you could store more than 30 times as much data in the same space.
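The space-saving arithmetic follows from compounding. Assuming, as the paragraph does, that the capacity you can fit in a given space grows in step with the Kryder rate k, the gain after n years is 1/(1 - k)^n. This Python sketch (the `capacity_gain` name is just for illustration) shows that 10%/yr for a decade gives roughly 2.9 times the capacity, i.e. about a third of the space for the same data, and 30%/yr gives about 35 times.

```python
def capacity_gain(kryder_rate, years):
    """Factor by which capacity in a fixed space grows after `years` years,
    assuming (illustratively) that density improves at the Kryder rate."""
    return 1 / (1 - kryder_rate) ** years


for k in (0.10, 0.30):
    gain = capacity_gain(k, 10)
    print(f"{k:.0%}/yr over 10 years: {gain:.1f}x the data, "
          f"or {1 / gain:.2f} of the space for the same data")
```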
The reason why disks are engineered to have a 5-year service life is that, at 30-40% Kryder rates, they were going to be replaced within 5 years simply for economic reasons. But, if Kryder rates are going to be much lower going forward, the incentives to replace drives early will be much less, so a somewhat longer service life would make economic sense for the customer. From the disk vendor's point of view, a longer service life means they would sell fewer drives. Not a reason to make them.
Additional reasons for skepticism include:
- The research we have been doing in the economics of long-term preservation demonstrates the enormous barrier to adoption that accounting techniques pose for media that have high purchase but low running costs, such as these long-lived media.
- The big problem in digital preservation is not keeping bits safe for the long term, it is paying for keeping bits safe for the long term. So an expensive solution to a sub-problem can actually make the overall problem worse, not better.
- These long-lived media are always off-line media. In most cases, the only way to justify keeping bits for the long haul is to provide access to them (see Blue Ribbon Task Force). The access latency scholars (and general Web users) will tolerate rules out off-line media for at least one copy. As Rob Pike said "if it isn't on-line no-one cares any more".
- So at best these media can be off-line backups. But the long access latency for off-line backups has led the backup industry to switch to on-line backup with de-duplication and compression. So even in the backup space long-lived media will be a niche product.
- Off-line media need a reader. Good luck finding a reader for a niche medium a few decades after it faded from the market - one of the points Jeff Rothenberg got right two decades ago.
- Media failures are only one of many, many threats to stored data, but they are the only one long-lived media address.
- Long media life does not imply that the media are more reliable, only that their reliability decreases with time more slowly.
Double the reliability is only worth 1/10th of 1 percent cost increase. ...
Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary - maybe $5,000 total? The 30,000 drives costs you $4m.
The $5k/$4m means the Hitachis are worth 1/10th of 1 per cent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more).
Moral of the story: design for failure and buy the cheapest components you can. :-)
Eric Brewer made the same point in his 2016 FAST keynote. Because operators at this scale need geographic diversity for availability and resilience against disasters, they already have replicas from which to recover. So spending more to increase media reliability makes no sense; the drives are already reliable enough. This is because the systems that surround the drives have been engineered to deliver adequate reliability despite the drives' current unreliability, which engineers away the value of more reliable drives.
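The drive-replacement arithmetic in the quote above can be reproduced as a small calculation. A sketch, with two assumptions labeled: the 2% and 1% failure rates are treated as per-year figures, and the labor cost of $65/hour is hypothetical, chosen only so that the 75 saved hours come to roughly the quote's $5,000 estimate. The 15 minutes per swap, 30,000 drives, and $4m fleet cost come from the quote.

```python
MINUTES_PER_SWAP = 15  # figure from the quote


def replacement_hours(drives: int, annual_failure_rate: float,
                      minutes_per_swap: float = MINUTES_PER_SWAP) -> float:
    """Technician hours per year spent swapping failed drives."""
    return drives * annual_failure_rate * minutes_per_swap / 60


def reliability_premium(drives: int, old_rate: float, new_rate: float,
                        labor_cost_per_hour: float, fleet_cost: float) -> float:
    """Labor saved by a lower failure rate, as a fraction of the fleet's cost."""
    saved_hours = (replacement_hours(drives, old_rate)
                   - replacement_hours(drives, new_rate))
    return saved_hours * labor_cost_per_hour / fleet_cost


if __name__ == "__main__":
    hours = replacement_hours(30_000, 0.02)
    saved = hours - replacement_hours(30_000, 0.01)
    # HYPOTHETICAL labor rate: 75 saved hours at $65/h is about $5,000.
    premium = reliability_premium(30_000, 0.02, 0.01,
                                  labor_cost_per_hour=65.0,
                                  fleet_cost=4_000_000)
    print(f"{hours:.0f} h/yr at 2%, {saved:.0f} h saved at 1%, "
          f"worth {premium:.3%} of the fleet cost")
```

This gives 150 hours at 2% and 75 hours saved at 1%, about two 40-hour weeks. The premium comes to roughly 0.12% of the fleet cost, the "1/10th of 1 percent" in the quote.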
Future Storage System Architecture?
What do we want from a future bulk storage system?
- An object storage fabric.
- With low power usage and rapid response to queries.
- That maintains high availability and durability by detecting and responding to media failures without human intervention.
- And whose reliability is externally auditable.
The following year Ian Adams and Ethan Miller of UC Santa Cruz's Storage Systems Research Center and I looked at this possibility more closely in a Technical Report entitled Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes. We showed that it was indeed plausible that, even at then-current flash prices, the total cost of ownership over the long term of a storage system built from very low-power system-on-chip technology and flash memory would be competitive with disk while providing high performance and enabling self-healing.
Two subsequent developments suggest we were on the right track. First, Seagate's Kinetic architecture and the Linux-running drives that Western Digital subsequently announced both exploited the processing power of the computers inside the drives, which perform command processing, internal maintenance operations, and signal processing. They used that power to delegate computation from servers to the storage media and to get IP communication all the way to the media, as DAWN suggested. IP to the drive is a great way to future-proof the drive interface.
FlashBlade hardware
Second, although flash remains more expensive than hard disk, the gap has narrowed since 2011 from a factor of about 12 to about 6. Pure Storage recently announced FlashBlade, an object storage fabric composed of large numbers of blades, each equipped with:
- Compute: 8-core Xeon system-on-a-chip, and Elastic Fabric Connector for external, off-blade, 40GbitE networking,
- Storage: NAND storage with 8TB or 52TB of raw capacity, on-board NV-RAM with a super-capacitor-backed write buffer, plus a pair of ARM CPU cores and an FPGA,
- On-blade networking: PCIe card to link compute and storage cards via a proprietary protocol.
DAWN exploits two separate sets of synergies:
- Like FlashBlade, DAWN moves the computation to where the data is, rather than moving the data to where the computation is, reducing both latency and power consumption. The further data moves on wires from the storage medium, the more power and time it takes. This is why the architecture of Berkeley's Aspire project is based on optical interconnect technology, which, when it becomes mainstream, will be both faster and lower-power than wires. In the meantime, we have to use wires.
- Unlike FlashBlade, DAWN divides the object storage fabric into a much larger number of much smaller nodes, implemented using the very low-power ARM chips used in cellphones. Because the power a CPU needs tends to grow faster than linearly with performance, the additional parallelism provides comparable performance at lower power.
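The parallelism argument can be made concrete with a toy model. Assume, as an illustration rather than a measured figure, that one node's power grows as its performance raised to an exponent k > 1 (defaulting here to 2), and that the work splits evenly across nodes. Spreading total performance P over N nodes then costs N·(P/N)^k = P^k / N^(k-1), so more, slower nodes draw less power. The optional fixed per-node overhead is also hypothetical; it shows where the argument stops.

```python
def cluster_power(total_perf: float, nodes: int, exponent: float = 2.0,
                  per_node_overhead: float = 0.0) -> float:
    """Power drawn by `nodes` identical nodes jointly delivering `total_perf`.

    Toy model (assumed): each node draws perf**exponent plus a fixed
    overhead, and the work divides evenly with no coordination cost.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    per_node_perf = total_perf / nodes
    return nodes * (per_node_perf ** exponent + per_node_overhead)


if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        print(f"{n:>3} nodes: {cluster_power(100.0, n):9.1f}"
              f" | with overhead 5: {cluster_power(100.0, n, per_node_overhead=5.0):9.1f}")
```

Without overhead, power keeps falling as nodes are added. With the overhead term it falls and then rises again once fixed per-node costs dominate. In this toy model, that is why the number of wimpy nodes is a design choice rather than "as many as possible".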
Storage systems are extremely reliable, but at scale nowhere near reliable enough to mean data loss can be ignored. Internal auditing, in which the system detects and reports its own losses, for example by hashing the stored data and comparing the result with a stored hash, is important but is not enough. The system's internal audit function will itself have bugs, which are likely to be related to the bugs in the underlying functionality causing data loss. Having the system report "I think everything is fine" is not as reassuring as one would like.
Auditing a system by extracting its entire contents for integrity checking does not scale, and is likely itself to cause errors. Asking a storage system for the hash of an object is not adequate; the system could have remembered the object's hash instead of computing it afresh. Although we don't yet have a perfect solution to the external audit problem, it is clear that part of the solution is the ability to supply a random nonce that is prepended to the object's data before hashing. The result is different every time, so the system cannot simply remember it.
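A minimal sketch of the nonce idea, using SHA-256 from Python's standard library. It assumes, hypothetically, that the auditor holds its own reference copy of the object, or a stock of precomputed nonce/response pairs. How an auditor gets that ground truth is part of what remains unsolved. The class and function names are illustrative: `HonestStore` hashes the nonce together with the data it actually holds, while `LazyStore` has kept only a hash of the object and so cannot answer a fresh challenge.

```python
import hashlib
import secrets


def challenge_response(nonce: bytes, data: bytes) -> str:
    """Hash of the nonce prepended to the object's data."""
    return hashlib.sha256(nonce + data).hexdigest()


class HonestStore:
    """Keeps the object and recomputes the hash for every challenge."""

    def __init__(self, data: bytes):
        self._data = data

    def respond(self, nonce: bytes) -> str:
        return challenge_response(nonce, self._data)


class LazyStore:
    """Kept only a hash of the object, so it cannot fold in a new nonce."""

    def __init__(self, data: bytes):
        self._remembered = hashlib.sha256(data).hexdigest()

    def respond(self, nonce: bytes) -> str:
        return self._remembered


def audit(store, reference_copy: bytes) -> bool:
    """Issue a fresh random nonce and compare the store's answer with ours."""
    nonce = secrets.token_bytes(16)
    return store.respond(nonce) == challenge_response(nonce, reference_copy)


if __name__ == "__main__":
    obj = b"contents of some archived object"
    print(audit(HonestStore(obj), obj))               # True: data really present
    print(audit(LazyStore(obj), obj))                 # False: remembered hash can't match
    print(audit(HonestStore(obj[:-1] + b"?"), obj))   # False: stored copy is damaged
```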
Acknowledgements
I'm grateful to Seagate for (twice) allowing me to pontificate about their industry, to Brian Berg for his encyclopedic knowledge of the history of flash, and to Tom Coughlin for illuminating discussions and the graph of exabytes shipped. This isn't to say that they agree with any of the above.
FOR IMMEDIATE RELEASE
Duluth, Georgia–May 12, 2016
Equinox and Bintec conduct successful integration testing between meeScan and Evergreen
Equinox is pleased to announce successful integration testing between the meeScan self checkout system provided by Bintec Library Services and the Evergreen open source ILS. Additional information regarding how to configure Evergreen to work with meeScan will be made available to the Evergreen community.
Galen Charlton, Added Services Manager at Equinox, said, “One of the strengths of Evergreen is its ability to integrate with other library software. By performing interoperability testing with firms such as Bintec, Equinox helps to identify and resolve technical roadblocks before they become an issue for libraries.”
Peter Trenciansky, Director for Bintec Library Services, commented, “We are very excited to offer meeScan to Evergreen libraries around the world. Our service provides a modern and fresh way to check out items while eliminating traditional challenges associated with self-service kiosks. The collaboration between Equinox and Bintec is another milestone towards our goal of contributing to the development of a new generation of welcoming and engaging libraries.”
Bintec Library Services Inc. is a technology company dedicated to the development of solutions that provide added value to libraries and enrich the user experience. The knowledgeable team behind Bintec delivers software and hardware solutions encompassing electromagnetic (EM) security, radio-frequency identification (RFID) technologies, ILS systems integration, large cloud-based architecture and mobile app development. The company is based in Toronto, Canada and services customers across North America and other parts of the world. To find out more, visit binteclibraryservices.com
meeScan is a cloud-based self-checkout system that lets patrons use their smartphones to check out books anywhere in their library. The system uses the built-in camera of the patron’s smartphone or tablet to scan the item barcode. With support for both EM and RFID, it is a full-featured alternative to conventional self-check kiosks at a fraction of the cost. meeScan is extremely user-friendly, simple to set up, and requires virtually zero maintenance by the library. Find out more at meescan.com
About Equinox Software, Inc.
Equinox was founded by the original developers and designers of the Evergreen ILS. We are wholly devoted to the support and development of open source software in libraries, focusing on Evergreen, Koha, and the FulfILLment ILL system. We wrote over 80% of the Evergreen code base and continue to contribute more new features, bug fixes, and documentation than any other organization. Our team is fanatical about providing exceptional technical support. Over 98% of our support ticket responses are graded as “Excellent” by our customers. At Equinox, we are proud to be librarians. In fact, half of us have our ML(I)S. We understand you because we *are* you. We are Equinox, and we’d like to be awesome for you. For more information on Equinox, please visit http://www.esilibrary.com.
Evergreen is an award-winning ILS developed with the intent of providing an open source product able to meet the diverse needs of consortia and high transaction public libraries. However, it has proven to be equally successful in smaller installations including special and academic libraries. Today, almost 1400 libraries across the US and Canada are using Evergreen including NC Cardinal, SC Lends, and B.C. Sitka. For more information about Evergreen, including a list of all known Evergreen installations, see http://evergreen-ils.org.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
“Simplify, simplify.” — Henry David Thoreau, Walden.
Most of us comb through a lifelong collection of personal papers and photos either when we have plenty of free time (typically in retirement) or when we have to deal with the belongings of a deceased loved one. All too often the job seems so daunting and overwhelming that our natural response is to get discouraged and say, “I don’t know where to begin” or “It’s too much; I’ll do it some other time” or worse, “I’ll just get rid of it all.”
At the Library of Congress, archivists process every type of collection imaginable. They often acquire, along with scholarly and historical works, personal papers and mementos, things that had special meaning to the owner: not only letters and photos but also locks of hair, newspaper clippings and beverage-stained documents. One recent collection contained a piece of bark. Some collections arrive neatly organized and others arrive heaped into makeshift containers. How do professional archivists create order from clutter? Where do they start? And what can we learn from their work and apply to our own personal archiving projects?
For this story, I spoke with Laura Kells and Meg McAleer, two senior archivists from the Library of Congress’s Manuscript Division. Both exude the good-natured patience and relaxed humor that comes from years of dealing with a constant inflow of often-disorganized paper and digital files. [Watch their presentation, titled “The Truth about Original Order, or What to Do When Your Collection Arrives in Trash Cans.”]
I found it striking that, throughout our interview, they rarely dictated how something must be done. Instead they offered well-seasoned advice about archiving, but they left the decisions up to the individual. In the end, their main message was this: if you want to get through the project and not make yourself crazy and despondent over it, start simply, separate items broadly at first and, in the end, accept your final sorting decisions as “good enough.”
Start Simply
First, approach your collection as a single unit of stuff. Don’t dwell on individual photos or letters yet. Think about the entire collection as a mass of related things. Kells said, “You’ll scare yourself if you think, ‘I have two hundred things.’ The project will seem bigger.” It is one collection.
Clumps
Consider devoting a rainy weekend to pulling out your collection. At this point you will be surveying its broad landscape. Begin by sorting items from your collection into what McAleer and Kells expertly call “clumps.” This is your first pass, so just group things into general categories such as letters and photos. You decide on your categories. Be consistent but accept that there might be overlap between categories. If you want to categorize clumps by year, fine. Or phases of a person’s life. Or holidays. Or type of materials (letters, photos).
“What you try to do is identify the clumps that already exist,” McAleer said. “And hopefully clumping naturally occurs. For instance, you could have gotten all of your grandmother’s papers after her death. That’s a clump. Trips? That’s a clump. Christmas stuff, that’s a clump. Photographs, that’s a clump.”
WARNING: Don’t get sidetracked. Resist the temptation to savor any one thing right now. “If you begin engaging with individual items at this point, then you’re sunk,” McAleer said. “You can paralyze yourself by over scrutinizing.” Whatever it is, no matter how wonderful it is, put it in its rightful clump and come back to it later.
Be Realistic About Work Space and Time
There are two important things you should address early on: space and time. Your collection will take up space in your house as you sift through it, so plan your work space realistically. Set aside a temporary work space if you can (a room or a corner of a room), or plan to unpack and re-pack your collection for each sorting session. “In most people’s homes they don’t have a great deal of space to have things sitting out for a long time,” McAleer said. “At some point you will really need that dining room table for dinner.”
Don’t eat or drink in the work area. Kells said, “Just step away. When you’ve got big piles and you reach your drink and you knock it over, you’ll be real sorry if you spill your coffee all over your documents or your photographs.” McAleer said, “It happens in an instant. None of us anticipate it. It can be tragic.”
As for time, McAleer said, “Do not start out with a commitment that every single item within this collection is going to be organized perfectly.” Kells said, “That could make you feel a sense of defeat. Just start out by saying, ‘I want to improve the organization.’ ”
Nothing is Perfect
After sorting the collection into clumps, you could put everything into envelopes or other containers and be happy about your progress. “You can feel good because you’ve done something,” Kells said. “As long as there is some order. It’s probably chaotic within those clumps but just by identifying and labeling and boxing those clumps, you have some intellectual control over it that you didn’t have before.”
You could leave the project at that or you could continue on, from a rough sort to a refined sort. “If you have the energy, you just work in layers and keep improving it,” Kells said. “Then you can gauge how much time you have and how much space you have to do this. Anything new is gravy.”
For example, you could sort letters by date or by topic or sort photos by location or by who is in each photo. “It is a matter of constant refinement, where you’re going to be getting more and more information about the content over time,” McAleer said. “It’s like building a house. You start out building the structure of a house and then you add furniture into each room.”
It’s a good time to throw things away too. Decide if you really want to save paid bills, cancelled checks or grocery lists. McAleer said, “In the long run, just save the things that you’re going to value over time. It is up to you how far down you drill in terms of arranging the material. At some point you have to say to yourself, ‘This is so much better than it was. I know what I have. This may be as good as it gets. I have put some organization on it and that is going to make it more accessible.’ ”
Scanning
Scanning is a terrific way to preserve and share digital versions of papers and photographs. The Library of Congress explains the basics of scanning in a blog post and an instructional video. You can also add descriptions into your digital photos, in much the same way as you would write on the back of a paper photo.
Scan newspaper clippings too. Newspaper ages poorly: when folded it can rip at the creases, and it can crumble when handled. Print a scanned copy if you want a hard copy. Computer paper ages better than newspaper does.
Another reason to scan photos is to rescue them. Photos may fade due to their chemical composition or because they may have been in direct sunlight for a long time. (Institutions rotate their collections regularly to avoid the damage from light and environmental exposure.) “Resist the idea of framing things,” McAleer said. “They really should not be exposed to light for too long. You can make a copy and frame that but keep the original out of the light.”
If you have hundreds of photos, think about whether you really want to scan them all. That may add pressure on you. Again, be realistic with your time. Consider being selective and only scanning the special photos or documents that you value the highest. Most institutions don’t have the resources to scan everything so they digitize their collections selectively; maybe you should too.
Disks and Digital Storage Media
If the collection includes computer disks, scan the disks for viruses before you open the contents. Don’t put everything else on your computer at risk. Before opening a file, make a duplicate of it and open the duplicate to avoid any accidental modifications. That way you’ll still have the original if you mess something up.
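For readers comfortable with a little scripting, the duplicate-first advice can be applied to a whole disk at once. A minimal sketch, assuming Python 3.9 or later and that the disk is already mounted as a folder and has been scanned for viruses. It copies every file on the disk into a working folder, keeps the subfolder layout, and compares SHA-256 checksums so you know each duplicate matches its original before you open it. Both paths in the example are placeholders, and the function names are illustrative, not part of any archival tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source_dir: Path, work_dir: Path) -> list[Path]:
    """Copy every file under source_dir into work_dir, keeping subfolders.

    Returns the originals whose copies do not match (empty list = all good).
    The originals are only read, never modified.
    """
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = work_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies the bytes and the file timestamps
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(src)
    return mismatches


if __name__ == "__main__":
    disk = Path("/Volumes/OLD_DISK")   # placeholder: the mounted disk
    work = Path("old_disk_copies")     # placeholder: your working folder
    if not disk.is_dir():
        print(f"{disk} not found; edit the placeholder paths first")
    else:
        problems = copy_and_verify(disk, work)
        print("all copies verified" if not problems else f"check these: {problems}")
```

Open files only from the working folder. If something goes wrong, the originals on the disk are untouched and you can run the copy again.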
If the disks contain files in an old format that you can’t access, but you believe those files might contain something of interest or value, archive those files with your other digital stuff. You can either find a professional service to open them or someday you might find a resource that will enable you to open them.
Digital Preservation
Save your digital files properly. Organize the scanned files on your computer and back them up on a separate drive. If you acquire disorganized computer files, organize the clutter as best you can within a file system. To help you find specific files again, you can rename those files without affecting their contents.
Archiving a Life Story
Organizing personal collections can be a way to tell a story about your life or the life of a loved one. “I don’t think people should be afraid to curate these collections,” McAleer said. “Zooming in and narrowing in on one particular story or one particular item can actually have a little bit more impact.”
Kells said, “Old letters give you a sense of the people, even if there’s not much to the letters and cards. It shows you what they valued. What they did, what they ate, what holidays they celebrated.” McAleer said, “Letters provide a voice and by grouping them together you release a kind of narrative.”
What was in her wallet or purse? What did she keep near to her? “There are probably certain things in a drawer somewhere that tell a story,” Kells said. “You could create a time capsule about a loved one.
“Not everyone values this stuff but if you archive it, it will be there for somebody in a later generation. There may be one person who really cares about their family history and will be glad to have it.”
This guest post was written by Jasmine Burns, Image Technologies and Visual Literacy Librarian, Indiana University and DPLA + DLF ‘Cross-Pollinator.’ (Twitter: @jazz_with_jazz)
Thanks to the generous support of the DPLA + DLF Cross-Pollinator Grant, I spent two fully-packed days wandering through some of the most beautiful (both architecturally and intellectually) institutions in Washington DC. DPLAfest was perfectly self-described: a festival of workshops, conversations, and collaborations between hundreds of librarians, authors, coders, publishers, educators, and more. This community that converged on Capitol Hill left me feeling inspired and exhausted, as I returned home with a laundry list of new ideas and long-term goals.
My initial interest in attending DPLAfest was to gain a closer glimpse into the large and growing community of the Digital Public Library of America. I graduated from an MLIS program last May and immediately started my first professional position in an academic library as the Image Technologies and Visual Literacy Librarian. As an emerging professional, I am still navigating the transient landscape of useful and applicable tools, pedagogies, and resources that are relevant to the needs of my campus community. The programming at DPLAfest seemed to combine many of the topics and areas that I have been utilizing as a visual resources professional. The opportunity to dig much deeper into these resources with the mission of creating collaborations and connections with the DLF community was an ideal framework for my experience in Washington.
Copyright + digital libraries. Packed room! #DPLAfest
— Jasmine Burns (@Jazz_with_Jazz) April 14, 2016
The first day of the fest kicked off at the Library of Congress with breakfast and coffee (!!), the debut of RightsStatements.org (VERY exciting in library-land), the release of the 100 Primary Source Sets (which I promptly emailed to my K-12 teacher friends), and the first ever selfie to be added to DPLA! For the remainder of the day I attended a workshop on geovisualization, sat in on a fantastic conversation about Authorship in the Digital Age, learned all about GIFs and how to make them (by far my favorite!), attended a totally packed, standing-room only session on copyright, and finally got to hear about the fantastic public domain drop at NYPL Labs.
Somewhere in between the action, I even had the chance to pop over to the Madison building to catch up with some of my old co-workers at the Prints and Photographs Division and eat lunch in the Great Hall! After running to my hotel to catch my breath, I meandered down to the National Archives, where I had drinks and hors d’oeuvres with the Declaration of Independence and got completely lost in the exhibits (both literally and figuratively). I was so busy geeking out about how much the exhibits actually looked like archives (solander boxes and everything) that I forgot to do much socializing at all!
The next morning, I headed back to the National Archives to start round two (and coincidentally ran into my cousin on the street; I guess DC is more of a small town than I thought!). Day two started with a much appreciated breakfast buffet, and a session showcasing some fabulous digital projects. Next, I learned everything I ever wanted to know about IIIF, listened in on presentations about API Development, and rounded out the whole shebang with a train ride back to my family in Virginia, all while participating in the #DPLAfest tweetstorm.
This was the first conference I have attended where I wasn’t presenting, organizing, or attending committee meetings. I felt like I could sit back, absorb the content, and tweet away to my heart’s desire. I had never had the time to live-tweet a conference, and this was my first time archiving my thoughts in 140-character chunks. I felt that the most important benefits of the conference were the moments when I was able to recognize the human element behind the digital resources that I use all the time by putting a face to a platform (specifically NYPL Labs, IIIF, DPLA Developers, etc.). It is not often that I leave a conference wishing that it had been longer or that I could have spoken to more people, but DPLAfest exceeded many of my expectations from the start, and I am grateful to DLF for this trip.