Planet Code4Lib

In the Library, With the Lead Pipe: A radical publishing collective: the Journal of Radical Librarianship

Wed, 2015-03-25 10:30

From Flickr user, Julian Stallbrass, licensed under CC BY 2.0. Image has been cropped.

In Brief: the Journal of Radical Librarianship is a new open-access journal publishing scholarly work in the field of radical librarianship. The focus on critical approaches to librarianship and anti-marketisation of information is reflected not only in our subject matter but in our publishing model, our licensing model, and our organisational practices. We hope to foster open and engaging discussions about radical approaches to librarianship and information studies.

There’s a growing amount of discussion in the UK about ‘radical librarianship’: a concept that covers a range of ideas about political engagement in librarianship and information studies. 2015 sees the launch of the Journal of Radical Librarianship: a new publication attempting to capture some of that discussion and disseminate it to a wider audience in an open and critically rigorous way. Fundamentally we want to promote discussion about the political nature of librarianship and to discuss radical alternatives to current practices and prevailing thinking in the profession.

A radical approach

Broadly speaking, ‘radical librarianship’ could be said to be a focus on the ethical roots of librarianship. Some of these ethical considerations are articulated by authors like Froehlich1 and include respect for the individual and individual freedom, social responsibility, and organisational obligations. Central to radical librarianship’s ethics is presenting an alternative to the prevalent trend towards marketisation of libraries. Over the past few decades, libraries have shifted towards a market-oriented approach: one that emphasises marketised solutions to service provision (that is, solutions used in the private-sector free market environment).

For example, Clark and Preater (2014) point to the use of market-oriented language in professional discourse: “As a profession we have broadly accepted the idea of members or users as “customers” or “consumers”, and accepted the need to adopt market strategies to meet their needs… This is so accepted that a rejection of this approach, for example rejecting the label of “customer”, has become seen to be old-fashioned and outdated.”2 LIS ‘thought leaders’ and writers tell libraries to market themselves, to adopt the practices of successful retail chains. Social media pundits tell librarians to think about their ‘personal brand’ and to manage their online presence as if it were a social marketing campaign. An Independent Library Report for England published in December 2014 recommended to the UK Government that libraries “can only be saved if they become more like coffee shops with wi-fi, sofas and hot drinks”3. The recommendation that libraries adopt private sector service practices like this is present in a lot of professional discourse in librarianship and has been aptly chronicled in recent years by Public Libraries News and Informed.

This trend must be seen within the context of the prevailing social framework of neoliberalism. Neoliberalism can be defined as the belief “that markets are inherently efficient and that the state and public sector have no essential role to play in economic development apart from facilitating the expansion, intensification and primacy of market relations.”4 It is an emphasis on laissez-faire capitalism and the free market and the theory was a formative tenet in the economic policies of Conservative Prime Minister Margaret Thatcher in the UK and Republican President Ronald Reagan in the USA during the 1980s5. As organisations – and governments – across society have adopted neoliberal practices, libraries have followed unquestioningly. The move towards a market-oriented approach has been framed in LIS professional discourse as not just a progressive approach but as necessary to the survival of libraries. Following the trend in professional literature, librarians at management level have shaped their library’s practices to fit the neoliberal theory. The prevailing (and ‘progressive’) view implicitly “advocates a belief there is a market relationship between the service and the user, with barriers placed between the two, and reduces the relationship between libraries and users to a transactional one with the library supplying information – viewed as a commodity in a market setting.”6 Libraries have been encouraged to think of their objectivity as political neutrality and librarians have been discouraged from thinking about the political implications of their work.

Radical librarianship is an alternative position. It acknowledges that libraries are and always have been inherently political7 rather than – as has been the accepted view – politically and ideologically neutral8. It argues that the ethical roots of librarianship are openness, free access to information, and a strong community spirit – principles we apply differently to the neoliberal appropriations of those terms – and that practice in librarianship should be true to these roots. It attempts to present a real alternative to the current orthodoxy in library discourse: a discourse that sits within a wider dominant ideology of capitalism.

From Flickr user, library_mistress, licensed under CC BY-NC-ND 2.0.

In the UK, this has manifested in the Radical Librarians Collective, a loose affiliation of like-minded library and information workers who come together to challenge the marketisation of libraries and the prevailing neoliberal position. The Radical Librarians Collective facilitates communication between like-minded people and hosts non-hierarchical events for discussion and social interaction.

At a Radical Librarians Collective meeting in London, UK, in May 2014, a group raised the idea of a publication space for writing on the subject of radical librarianship. Publishing work on radical librarianship in the form of an academic journal would be a way to bring theory and practice together. “At this stage, the most significant victory for the radical alternative is to open dialogue about the alternatives. Without dialogue, without alternatives being voiced and discussed, there is no hope for a radical alternative.”9 The Journal of Radical Librarianship will facilitate that dialogue.

A radical journal

The Journal of Radical Librarianship is a new open-access journal publishing a combination of peer-reviewed scholarly writing and non-peer-reviewed commentary and reviews. We’re looking for work on the subject of radical librarianship and related areas: broadly speaking, anything that investigates the political aspects of librarianship or takes a critical theory-based approach to LIS10. The journal is available at and is open for submissions.

Information ethics are core to the journal’s mission and so the team have made a conscious effort to adopt practices in keeping with our shared moral code. We have set up the journal in such a way as to support the exchange and reworking of knowledge: we want to make work as freely available to the public as possible and we want to encourage ‘remixing’ of the ideas in our published work11.

In practical terms, this commitment to an ethical publishing code has informed our hosting and our licensing. The journal is hosted on an Open Journal Systems platform: an open-source piece of software from the Public Knowledge Project designed to facilitate open-access publishing for journals and other publications. Works in the Journal of Radical Librarianship are published under Creative Commons terms: either the Creative Commons Attribution 4.0 (CC BY) license or the Creative Commons CC0 public domain dedication. CC0 has allowed this article to reuse some of the content of the journal’s introductory editorial by Stuart Lawson12. Although under CC0 others can freely use the content without attribution to the source, articles can also be referenced normally with attribution in citations.

The focus on alternative approaches also informs our editorial team structure. The current team includes a range of people from different sectors with different experience. As we continue, we will have a fluid editorial team with a non-hierarchical structure easily receptive to change. From the beginning, we’ve focused on collective decision-making with no single person able to make big decisions without consulting the rest of the team. Through use of online communication and collective decision-making software Loomio, we’ve made decisions that reflect compromise and direct diplomacy and that don’t give authority to any individual team member.

Though centred in the UK, the Journal of Radical Librarianship has an international scope and we encourage submissions from anywhere in the world. We hope to publish work in multiple languages as a recognition that, in order to share knowledge as widely as possible, we should not preference one language over others. Submissions are accepted in any language.

Fundamentally the Journal of Radical Librarianship aims to provide a publication space for alternatives in librarianship discourse. In the LIS publishing landscape, there is little space for writing that challenges – or falls outside the assumptions of – society’s dominant neoliberal ideology. While much library publishing ignores the political context of library and information studies and presents libraries as politically neutral institutions, the Journal of Radical Librarianship situates LIS within the political framework of society and examines library issues from that perspective.

The Journal of Radical Librarianship is not unique in LIS publishing: several other publications produce work with similar scope. We acknowledge their value and hope to work together as complementary rather than competitive publications. Progressive Librarian is a journal produced by the US-based Progressive Librarians Guild; Library Juice Press is a publisher of a number of books on “theoretical and practical issues in librarianship from a critical perspective”; Collaborative Librarianship is a journal publishing work on cooperative librarianship. In the Library with the Lead Pipe can also be included in this list for its commitment to open-access and progressive publication practices and we’d like to thank them for giving us the space to write about the Journal of Radical Librarianship. This is just a selection of English-language publications in similar areas and demonstrates that there is an ecosystem of library and information authors interested in exploring radical and critical approaches. We want to position the Journal of Radical Librarianship as an attempt to enrich the existing community by applying innovative scholarly approaches to the field.

From Flickr user, ijclark, licensed under CC BY 2.0.

A radical community

The Journal of Radical Librarianship is open for submissions. If you’re unsure about an idea for submission, please email

For peer-reviewed articles, we are looking for pieces that take radical approaches to subjects such as: information literacy, politics and social justice, scholarly communication, equality and diversity, library history, management and professionalism, and political economy of information and knowledge. Our editorial team is flexible and we will look at work outside those boundaries.

We’re also looking for non-peer-reviewed pieces. We accept editorials and commentary on relevant political issues or library issues. We’re also accepting reviews of media relevant to the scope of the journal. If you’re interested in becoming a reviewer, please contact

We want a discussion. Only by working together and discussing subjects out in the open in a free and unrestrained way will we be able to bring about change. We want people to read the work we publish, think about it, argue about it, disagree with it, use it in their working lives. We want people to think critically and act radically. We want people to see the alternatives. We want to talk about libraries. We want to talk about politics. We want to talk about radical librarianship.


Thanks to the editorial team of the Journal of Radical Librarianship for contributions and feedback on this article and for all the hard work involved in setting up and running the journal. Thanks to Stuart Lawson for his sterling work on getting us up and running. We’d collectively like to thank the editorial team at In the Library with the Lead Pipe for giving us the space to contribute this piece and for their excellent editorial input.

  1. Froehlich, T., 1998. ‘Ethical Considerations Regarding Library Nonprofessionals: Competing Perspectives and Values’, Library Trends, 46 (3), pp. 444-466.
  2. Clark, I., and Preater, A., 2014. ‘Creators not consumers: visualising the radical alternative for libraries’, Infoism, 2014-11-13.
  3. Clark, N., 2014. ‘The great British library betrayal: Closures bring national network to brink of “absolute disaster”, reveals official inquiry’, The Independent, 2014-12-17.
  4. Lowes, D. E., 2006. The anti-capitalist dictionary: movements, histories and motivations. London: Zed Books, p. 170.
  5. Jones, C., et al., 2005. For business ethics. London: Routledge, p. 100.
  6. Clark and Preater, op. cit.
  7. Giroux, H. A., 2004. ‘Neoliberalism and the Demise of Democracy: Resurrecting Hope in Dark Times’, Dissident Voice, 2004-08-07.
  8. Lewis, A. M., 2008. Introduction. In: Alison M. Lewis (ed.), Questioning library neutrality: essays from Progressive Librarian. Duluth, MN: Library Juice Press, pp. 1-4.
  9. Clark and Preater, op. cit.
  10. Smith, L., 2014. ‘Radical Librarians Collective (Part Three): Critical Theory’, Lauren Smith, 2014-05-16.
  11. Lessig, L., 2008. Remix: making art and commerce thrive in the hybrid economy. London: Bloomsbury Academic.
  12. Lawson, S., 2015. ‘Editorial’, Journal of Radical Librarianship, vol. 1.

Open Knowledge Foundation: Open Data Day 2015 Recap #4: Hacks, Meet Ups and Data Exploration in Africa!

Wed, 2015-03-25 08:39

Open Data Day 2015, which took place on February 21, was celebrated in hundreds of communities around the world. In this blog series, we have been highlighting the discussions and outcomes of a small selection of the hacks, data dives and meetups that were organised that day. In this fourth post in the series, we will be looking at a selection of events that took place in Africa, from Tunisia to South Africa and Nigeria to Kenya! If you want to learn more about what transpired in other parts of the world, check out our recaps posts on Asia, the Americas and Europe.

Open Data Day was rocking this year on the African continent. Here is just a sample of some of the incredible events that were organised by open data community members!


As far as we can tell, the award for largest open data day event in the world goes to the open data community in Yaoundé, Cameroon, who managed to pile 2,000 people into the amphitheater at the University of Yaoundé to learn about open data and its potential to improve the lives of citizens in Cameroon. Furthermore, as if 2,000 people in a university amphitheater on a Saturday afternoon wasn’t impressive enough, the event had 5,000 registered participants and incredible online engagement.

The NetSquared Yaoundé community brought together students, open data experts and professors to listen and learn about the importance of open data, and specifically the benefits that open data can bring to education, economic development, citizen engagement, and government transparency and accountability. In a keynote talk, the president of the College of Law and Political Science at the University of Yaoundé emphasised the importance of open data for researchers in Cameroon and West Africa as a whole, highlighting that access to open data allows researchers to better understand the challenges they are facing and to develop evidence-based and locally specific solutions.

Our hats are off to the open data community in Cameroon, what an exceptional result!


In Dar es Salaam, Tanzania, open data day was attended by technologists, programmers, hackers, students, activists and NGOs, all of whom came together to grow the local open knowledge community, introduce newcomers to civic hacking and demonstrate the incredible storytelling power of data and data-driven projects. Overall, the event was a great success. Several open data projects were presented and a number of open data challenges were identified and discussed (for example, the lack of open licences, the lack of FOIA legislation, and the reluctance of government agencies to share data). Together, participants shared their strategies to overcome these challenges and brainstormed an effective path forward.

Participants acknowledged the very real challenge that the open movement is facing in Tanzania as some civil society groups still fail to see the value of coming together to collectively raise a louder voice in demanding open government data. Ultimately, participants determined that the open movement in Tanzania would benefit from increased community building efforts and targeting new partners in order to push the open agenda forward with a larger, more joined-up, base. The participants also determined that there was a need to encourage government officials to attend events like this in the future.


In Nigeria, the open data community organised Benin City’s first open data hackathon with resounding success.

The goal of the event was simple: to raise awareness of open data by using the agricultural sector as an example and demonstrating how data was being used by entrepreneurs in the sector. By the end of the day, organisers hoped that they would generate a pool of ideas on how to stimulate innovation within the agricultural sector through data-driven applications, and were pleased to report that a number of ideas emerged from the day’s discussion. Keynote presentations were used to introduce key concepts and provide examples of open data. Subsequently, participants were asked to get their hands dirty and actually work with data and think about solutions to challenges within the sector.

Open Data Day participants in Benin City worked with agriculture data from the state’s open data portal. The emerging projects are still in the development phase and are not yet online, but organisers were incredibly excited to see participants working with the data the government has been publishing!


Kampala, Uganda was bustling with open data day activities this year, so much so that hackathons were carried out across two weekends!

On February 28th, Reality Check organised an open data day event for journalists, researchers, entrepreneurs, students and technologists, following up on a data storytelling event from the week before. Participants learned about the strengths and weaknesses of formats like videos, pictures and charts, as well as about the various tools that are available, and they learned how to clean and visualise data in order to use it for effective storytelling. New visualisations were created as well.

The participants discussed the progress that had been made over the past year, specifically focusing on what worked and what didn’t, in order to plan better for the year to come. The event was a great success and created such excitement among participants that many of them want to meet and discuss the subject on a monthly basis! In addition, a new Google group and Facebook page were opened so that people can keep in touch and continue to engage with one another online as well as offline.

Finally, in February in Kampala, open data related activities were not limited to one-off hackathons: check out the awesome Code for Africa Bootcamp organised the following week! February was a busy month for open data in Uganda!


In Tunisia, CLibre organised a meetup with a diverse group of participants to celebrate Open Data Day 2015! There were a number of participants with little to no technical knowledge, and the goal of the meetup was to expose all participants to the concepts of open data and open government, as well as to discuss the legal implications of open government in light of the new Tunisian constitution.

Following the initial presentations and discussion, a number of initiatives using open data in Tunisia were showcased. Mr Nizar Kerkeni, president of the CLibre Association, presented the new open data portal developed for the city of Sayada. At the end of the day, after having had the chance to play around with the new data portal, Mr Ramzi Hajjaji from CLibre announced that an official web portal for the city of Monastir would be launched soon, inspired by the one developed for Sayada.

You can check out a full report of the day (in French) on their website, along with a number of videos (in Arabic)!


Open Data Day in Mombasa, Kenya was celebrated by organising four separate focus groups in order to explore the potential of open data in the following key areas: security, economy, education & conservation.

The conservation group looked at data on everything from marine and wildlife conservation to the conservation of historic buildings and sites in Mombasa. They looked for various datasets, analysed the data and created various data visualisations documenting pertinent trends. The group exploring the potential of open data on the economy built a prototype for an open tendering system for the government of Kenya, scoping the necessary features and potential impact. The participants exploring open data in education brainstormed various ways in which open data could help parents and students make more informed choices about where they go to school. Finally, in the security group, participants discussed and hacked on ways that they could use open data to combat corruption and fraud.

South Africa

In South Africa, Code for South Africa organised a Data Easter Egg Hunt! If you want to find out more, check out the awesome video they made on the day!

Roy Tennant: Bringing CRUD Operations to Linked Data

Wed, 2015-03-25 03:21

Those who have labored in the database orchard know about CRUD. It isn’t the stuff you scraped off your shoe, but a set of operations that must be supported for typical database maintenance:

  • C = Create a record.
  • R = Read a record.
  • U = Update a record.
  • D = Delete a record.

Then Linked Data came along, which is not typically stored in a standard database, but in a strange beast called a triplestore. To support the need to keep linked data up-to-date, and often through a web-based interface, the W3C is working on a standard for a Linked Data Platform that supports these types of operations via standard HTTP requests such as “GET”, “POST”, “PUT”, “DELETE”, and a new one, “PATCH”.
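To make that mapping concrete, here is a minimal, hypothetical Python sketch of how the classic CRUD operations line up with those HTTP methods. The `ldp_request` helper, the example.org URLs, and the Turtle body are illustrative assumptions for this post, not part of the W3C specification:

```python
# Hypothetical sketch: mapping CRUD operations (plus partial updates)
# onto the HTTP methods an LDP-style client would use. URLs and the
# helper name are made up for illustration.

def ldp_request(operation, url, body=None):
    """Return the HTTP method, URL, and headers for an LDP-style request."""
    methods = {
        "create": "POST",    # POST to a container creates a new resource
        "read": "GET",       # GET retrieves the resource's representation
        "update": "PUT",     # PUT replaces the resource's state wholesale
        "patch": "PATCH",    # PATCH applies a partial update
        "delete": "DELETE",  # DELETE removes the resource
    }
    method = methods[operation]
    headers = {"Accept": "text/turtle"}  # Turtle is a common RDF serialization
    if body is not None:
        headers["Content-Type"] = "text/turtle"
    return method, url, headers

# Creating a resource: POST a few triples to a container.
method, url, headers = ldp_request(
    "create",
    "http://example.org/container/",
    body="<> a <http://example.org/vocab#Thing> .",
)
print(method, url)
```

The point of the sketch is simply that maintaining a triplestore over the web reduces to the same verbs any REST client already speaks; a real LDP server adds container semantics and interaction headers on top.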

Your best bet for getting up to speed is to take a look at the W3C’s Linked Data Platform 1.0 Primer. The latest version is hot off the press.

DuraSpace News: ATTEND the Fedora 4 Workshop at Texas Conference on Digital Libraries

Wed, 2015-03-25 00:00

Austin, TX  Fedora team members Andrew Woods and David Wilcox will be presenting a Fedora 4 Training Workshop at the 2015 Texas Conference on Digital Libraries (TCDL) to be held April 27-28 in Austin, Texas. The Fedora 4 Training Workshop will be held on April 28 from 1:30 PM to 5:30 PM and is free for conference attendees.

DuraSpace News: "Developments with DSpace and ORCID" Webinar Recording Available

Wed, 2015-03-25 00:00

DSpace 5 brings native support for ORCID integrations, and existing code libraries can be easily adapted to take advantage of information from ORCID records and to support authenticated connections between author records in DSpace and ORCID iDs. On March 24, 2015, João Moreira (Head of Scientific Information, FCT-FCCN), Paulo Graça (RCAAP Team Member, FCT), Bram Luyten (Co-Founder, @mire) and Andrea Bollini (CRIS Solution Product Manager, CINECA) presented “New Possibilities: Developments with DSpace and ORCID.” Each presenter provided insight into how ORCID persistent digital identifiers can be put to work in institutional repositories.

DPLA: DPLA and Museums: #MuseumWeek 2015

Tue, 2015-03-24 21:00

This week, DPLA is participating in #MuseumWeek, an online conversation about and celebration of museums across the globe. It is a great way to share our voice with the wider museum and cultural heritage community, but also an opportunity to highlight DPLA’s involvement with museum collections.

In DPLA as a whole, museums make up the third largest type of contributing institution, behind university and public libraries. Besides the wealth of information (over 1 million items!) we have from our content partner the Smithsonian and all its associated museums, like the Cooper Hewitt and the Natural History Museum, there is an incredible amount of valuable content from other contributing museums. These include art museums, such as the Yale University Art Gallery, the Walters Art Museum and the National Gallery of Art. There is even content from institutions abroad, like Museum Victoria, London’s Natural History Museum and the Royal Ontario Museum.

An image from a Natural History Museum, London, bulletin.

The museum content available through DPLA also offers unique insights into American history. From the Tubman African American Museum, for example, you can access art and civil rights related content. There is also Native American art and design material available through DPLA, via the Pueblo Grande Museum. Photographs of Japanese-Americans in World War II internment camps are part of a collection from the Topaz Museum. These are just a snapshot of the museum content that is searchable and shareable via DPLA.

It’s more than just searchable and shareable, however. In addition to being featured on DPLA’s social media channels, museum content is often reused as part of our educational outreach. Many of DPLA’s exhibitions highlight items from museums–giving their digital content new use for students at varying grade levels.

Beyond the opportunity for access to new audiences, DPLA is committed to its participation in the museum community. In addition to celebrating #MuseumWeek, our Director for Content Emily Gore recently presented at the Visual Resources Association conference, and many of this year’s new Community Reps work in museums. For example, rep Kirsten Terry works as a librarian at the Arab American National Museum in Michigan. Rep Al Bersch works at the Oakland Museum of California, and Janice Lea Lurie is with the Minneapolis Institute of Arts (after a long career in museums!). We have a community rep from the Indianapolis Museum of Art, Samantha Norling, and the Women’s Museum of California, Bonnie R. Domingos, as well as the Corning Museum of Glass, Rebecca Hopman. We even have one representative from a historic house museum, Julie Goforth, who works at Oatlands in Virginia.

Just as museums place value on their role as public service institutions, DPLA shares that same mission of education and public access. So, help DPLA celebrate those values and the museum institutions that support them by following our social media channels this #MuseumWeek and share your favorite #DPLAmuseum with us!

Dan Cohen: What’s the Matter with Ebooks?

Tue, 2015-03-24 20:50

[As you may have noticed, I haven't posted to this blog for over a year. I've been extraordinarily busy with my new job. But I'm going to make a small effort to reinvigorate this space, adding my thoughts on evolving issues that I'd like to explore without those thoughts being improperly attributed to the Digital Public Library of America. This remains my personal blog, and you should consider these my personal views. I will also be continuing to post on DPLA's blog, as I have done on this topic of ebooks.]

Over the past two years I’ve been tracking ebook adoption, and the statistics are, frankly, perplexing. After Amazon released the Kindle in 2007, there was a rapid growth in ebook sales and readership, and the iPad’s launch three years later only accelerated the trend.

Then something odd happened. By most media accounts, ebook adoption has plateaued at about a third of the overall book market, and this stall has lasted for over a year now. Some are therefore taking it as a Permanent Law of Reading: There will be electronic books, but there will always be more physical books. Long live print!

I read both e- and print books, and I appreciate the arguments about the native advantages of print. I am a digital subscriber to the New York Times, but every Sunday I also get the printed version. The paper feels expansive, luxuriant. And I do read more of it than the daily paper on my iPad, as many articles catch my eye and the flipping of pages requires me to confront pieces that I might not choose to read based on a square inch of blue-tinged screen. (Also, it’s Sunday. I have more time to read.) Even though I read more ebooks than printed ones at this point, it’s hard not to listen to the heart and join the Permanent Law chorus.

But my mind can’t help but disagree with my heart. Yours should too if you run through a simple mental exercise: jump forward 10 or 20 or 50 years, and you should have a hard time saying that the e-reading technology won’t be much better—perhaps even indistinguishable from print, and that adoption will be widespread. Even today, studies have shown that libraries that have training sessions for patrons with iPads and Kindles see the use of ebooks skyrocket—highlighting that the problem is in part that today’s devices and ebook services are hard to use. Availability of titles, pricing (compared to paperback), DRM, and a balkanization of ebook platforms and devices all dampen adoption as well.

But even the editor of the New York Times understands the changes ahead, despite his love for print:

How long will print be around? At a Loyola University gathering in New Orleans last week, the executive editor [of the Times], Dean Baquet, noted that he “has as much of a romance with print as anyone.” But he also admitted, according to a Times-Picayune report, that “no one thinks there will be a lot of print around in 40 years.”

Forty years is a long time, of course—although it is a short time in the history of the book. The big question is when the changeover will occur—next year, in five years, in Baquet’s 2055?

The tea leaves, even now, are hard to read, but I’ve come to believe that part of this cloudiness is because there’s much more dark reading going on than the stats are showing. Like dark matter, dark reading is the consumption of (e)books that somehow isn’t captured by current forms of measurement.

For instance, usually when you hear about the plateauing of ebook sales, you are actually hearing about the sales of ebooks from major publishers in relation to the sales of print books from those same publishers. That’s a crucial qualification. But sales of ebooks from these publishers are just a fraction of overall e-reading. Other accounts, which try to shine light on ebook adoption by looking at markets like Amazon (which accounts for a scary two-thirds of ebook sales), show that a huge and growing percentage of ebooks are being sold by indie publishers or by authors themselves rather than the bigs, and a third of them don’t even have ISBNs, the universal ID used to track most books.

The commercial statistics also fail to account for free e-reading, such as from public libraries, which continues to grow apace. The Digital Public Library of America and other sites and apps have millions of open ebooks, which are never chalked up as a sale.

Similarly, while surveys of the young continue to show their devotion to paper, other studies have shown that about half of those under 30 read an ebook in 2013, up from a quarter of Millennials in 2011—and that study is already dated. Indeed, most of the studies that highlight our love for print over digital are several years old (or more) at this point, a period in which large-format, high-resolution smartphone adoption (much better for reading) and new all-you-can-read ebook services, such as Oyster, Scribd, and Kindle Unlimited, have emerged. Nineteen percent of Millennials have already subscribed to one of these services, a number considered low by the American Press Institute, but which strikes me as remarkably high, and yet another contributing factor to the dark reading mystery.

I’m a historian, not a futurist, but I suspect that we’re not going to have to wait anywhere near forty years for ebooks to become predominant, and that the “plateau” is in part a mirage. That may cause some hand-wringing among book traditionalists, an emotion that is understandable: books are treasured artifacts of human expression. But in our praise for print we forget the great virtues of digital formats, especially the ease of distribution and greater access for all—if done right.

Open Library Data Additions: OL.111001.meta.mrc

Tue, 2015-03-24 20:45

OL.111001.meta.mrc 7033 records.

This item belongs to: data/ol_data.

This item has files of the following types: Archive BitTorrent, Metadata, Unknown

Open Library Data Additions: OL.110801.meta.mrc

Tue, 2015-03-24 20:45

OL.110801.meta.mrc 5128 records.

This item belongs to: data/ol_data.

This item has files of the following types: Archive BitTorrent, Metadata, Unknown

Open Library Data Additions: OL.110701.meta.mrc

Tue, 2015-03-24 20:45

OL.110701.meta.mrc 5126 records.

This item belongs to: data/ol_data.

This item has files of the following types: Archive BitTorrent, Metadata, Unknown

Open Library Data Additions: OL.110601.meta.mrc

Tue, 2015-03-24 20:45

OL.110601.meta.mrc 4924 records.

This item belongs to: data/ol_data.

This item has files of the following types: Archive BitTorrent, Metadata, Unknown

Open Library Data Additions: OL.110501.meta.mrc

Tue, 2015-03-24 20:45

OL.110501.meta.mrc 4566 records.

This item belongs to: data/ol_data.

This item has files of the following types: Archive BitTorrent, Metadata, Unknown

Open Library Data Additions: OL.110401.meta.mrc

Tue, 2015-03-24 20:45

OL.110401.meta.mrc 5290 records.

This item belongs to: data/ol_data.

This item has files of the following types: Archive BitTorrent, Metadata, Unknown

Open Library Data Additions: OL.111201.meta.mrc

Tue, 2015-03-24 20:45

OL.111201.meta.mrc 7631 records.

This item belongs to: data/ol_data.

This item has files of the following types: Archive BitTorrent, Metadata, Unknown

Open Library Data Additions: OL.111101.meta.mrc

Tue, 2015-03-24 20:45

OL.111101.meta.mrc 5980 records.

This item belongs to: data/ol_data.

This item has files of the following types: Archive BitTorrent, Metadata, Unknown

Open Library Data Additions: OL.110901.meta.mrc

Tue, 2015-03-24 20:45

OL.110901.meta.mrc 6702 records.

This item belongs to: data/ol_data.

This item has files of the following types: Archive BitTorrent, Metadata, Unknown

Nicole Engard: Bookmarks for March 24, 2015

Tue, 2015-03-24 20:30

Today I found the following resources and bookmarked them on Delicious.

  • Scrollback: Create rooms based on your interest or follow existing ones. Share ideas, discuss in realtime, and redefine your online community experience with Scrollback.

Digest powered by RSS Digest

The post Bookmarks for March 24, 2015 appeared first on What I Learned Today....

Related posts:

  1. Koha Users and Developers to Meet at KohaCon 2009
  2. Can you say Kebberfegg 3 times fast
  3. Capturing, Sharing and Acting on Ideas

FOSS4Lib Recent Releases: Archivematica - 1.3.2

Tue, 2015-03-24 17:05

Last updated March 24, 2015. Created by Peter Murray on March 24, 2015.

Package: Archivematica
Release Date: Wednesday, March 18, 2015

David Rosenthal: The Opposite Of LOCKSS

Tue, 2015-03-24 15:00
Jill Lepore's New Yorker "Cobweb" article has focused attention on the importance of the Internet Archive and the analogy with the Library of Alexandria, in particular on the risks implicit in the fact that both represent single points of failure, because they are so much larger than any other collection.

Typically, Jason Scott was first to respond, with an outline proposal to back up the Internet Archive by greatly expanding the collaborative efforts of ArchiveTeam. I think Jason is trying to do something really important, and extremely difficult.

The Internet Archive's collection is currently around 15PB. It has doubled in size in about 30 months. Suppose it takes another 30 months to develop and deploy a solution at scale. We're talking crowd-sourcing a distributed backup of at least 30PB growing at least 3PB/year.

To get some idea of what this means, suppose we wanted to use Amazon's Glacier. This is, after all, exactly the kind of application Glacier is targeted at. As I predicted shortly after Glacier launched, Amazon has stuck with the 1c/GB/mo price. So in 2017 we'd be paying Amazon $3.6M a year just for the storage costs. Alternately, suppose we used Backblaze's Storage Pod 4.5 at their current price of about 5c/GB, for each copy we'd have paid $1.5M in hardware cost and be adding $150K worth per year. This ignores running costs and RAID overhead.
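These figures are easy to reproduce; the following is only a back-of-envelope sketch using the 2015 prices quoted above (1c/GB/mo for Glacier, ~5c/GB for a Storage Pod), not current rates:

```python
# Back-of-envelope storage costs for a ~30 PB backup by 2017,
# growing at ~3 PB/year, using the prices quoted in the text.

PB = 1_000_000  # gigabytes per petabyte (decimal, as storage vendors count)

capacity_gb = 30 * PB        # backup size by ~2017
growth_gb_per_year = 3 * PB  # annual growth

# Amazon Glacier: pay per GB-month stored.
glacier_per_gb_month = 0.01
glacier_per_year = capacity_gb * glacier_per_gb_month * 12
print(f"Glacier storage: ${glacier_per_year:,.0f}/year")   # ~$3.6M/year

# Backblaze Storage Pod: one-time hardware cost per copy, plus growth.
pod_per_gb = 0.05
pod_capex = capacity_gb * pod_per_gb
pod_growth_per_year = growth_gb_per_year * pod_per_gb
print(f"Storage Pod: ${pod_capex:,.0f} up front, "
      f"plus ${pod_growth_per_year:,.0f}/year")            # ~$1.5M + ~$150K/year
```

As in the text, this ignores running costs, RAID overhead, and the cost of additional copies.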

It will be very hard to crowd-source resources on this scale, which is why I say this is the opposite of Lots Of Copies Keep Stuff Safe. The system is going to be short of storage; the goal of a backup for the Internet Archive must be the maximum of reliability for the minimum of storage.

Nevertheless, I believe it would be well worth trying some version of his proposal and I'm happy to help any way I can. Below the fold, my comments on the design of such a system.
Reliability

Why is reliability so important for this system? After all, I've been arguing elsewhere that reliability isn't as important as people think. Let's suppose that somehow we have a single copy of the 30PB on disks, and that we perform an integrity check via checksums 10 times a year. Optimistically, we assume these disks never fail for any reason, but they achieve their specified Unrecoverable Bit Error Rate (UBER) of 10^-15. There are 2.4×10^17 bits in the copy, so on average every time we do an integrity check we will get 240 bad bits. Pessimistically, we assume these bits are randomly distributed (this makes the analysis much easier).

If, as Jason suggests, the backup is divided into 70K 500GB blocks, the probability that any of them will have more than 1 bad bit is small, so we will lose 10*240*500GB of data every year, or 1.2PB, or about a third of the incoming data. Of course, we can repair these failures from the Internet Archive itself, at the cost of increasing the bandwidth impact from about an additional quarter of the Archive's current bandwidth to about a third (see below). But the probability that the Archive and the backup would lose the same data becomes significant.
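The expected-loss arithmetic in the last two paragraphs can be checked in a few lines; this is a sketch of the calculation only, not a failure model:

```python
# Expected data loss from unrecoverable bit errors alone: 30 PB checked
# 10 times/year on disks meeting a 10^-15 UBER spec, with the backup
# split into 70K blocks of 500 GB each.

capacity_bits = 30e15 * 8          # 30 PB in bits, i.e. 2.4e17
uber = 1e-15                       # unrecoverable bit error rate
checks_per_year = 10
block_gb = 500

bad_bits_per_check = capacity_bits * uber
print(f"bad bits per integrity check: ~{bad_bits_per_check:.0f}")  # ~240

# With only 70K blocks, two bad bits almost never land in the same
# block, so each bad bit costs a whole 500 GB block.
lost_gb_per_year = checks_per_year * bad_bits_per_check * block_gb
print(f"lost per year: ~{lost_gb_per_year / 1e6:.1f} PB")          # ~1.2 PB
```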

This argues for much smaller blocks, to reduce the impact of the UBER at the cost of increasing the overhead of the system. Smaller blocks would also make it possible for more people to contribute storage, both from the cost of their contribution and from the impact on their bandwidth. Downloading 500GB on my DSL link would take its entire capacity for two weeks.

In real life, even in data centers disks fail in all sorts of ways that make UBER fairly unimportant. The crowd-sourced disks are likely to be much less reliable still. So the system needs to replicate the data.
Replication

The discussions I've seen so far assume that the data is simply replicated, as it is in the LOCKSS system, but only three times. Even replicating by a factor of three means the demand for storage in the backup network by 2018 is nearly 100PB. Clearly, some scheme that gave adequate reliability but used less storage would help significantly. There are two techniques that might help: erasure coding and entanglement. Warning: the following discussion is radically simplified; see here.
Erasure Coding

Erasure coding is like a distributed version of RAID; files are divided into data blocks, which are organized into groups of N. For each group, M blocks (M>N) are stored, containing the data from the N blocks mixed together so that the original data can be recovered from any N blocks in the group. This allows for non-integer replication factors; the replication factor is (M/N). There are two ways to do this:
  • The N data blocks can be stored unchanged, and (M-N) parity blocks computed from the N blocks can be added. This is the way RAID typically works; it has the advantage that, if nothing has gone wrong, reading a data block requires accessing only a single block. Writing a data block requires writing (1+M-N) blocks, as the parity blocks need to be updated to reflect the new data.
  • The N data blocks are not stored. Instead M blocks are stored, each computed from all of the N data blocks in the group. Reading a block requires accessing N blocks; writing a data block requires accessing M blocks.
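As a toy of the first form, the simplest case is a single XOR parity block (M = N + 1), which lets a group survive the loss of any one stored block. Production systems use Reed-Solomon-style codes for general M and N; this is only a sketch:

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

def encode(data_blocks):
    """First-form erasure coding with M = N + 1: keep the N data blocks
    unchanged and append one parity block computed from them."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stored, lost_index):
    """Rebuild any one lost block by XORing the surviving M - 1 blocks."""
    return xor_blocks([b for i, b in enumerate(stored) if i != lost_index])

data = [b"spam", b"eggs", b"milk"]      # N = 3 equal-sized data blocks
stored = encode(data)                   # M = 4 stored blocks
assert recover(stored, 1) == b"eggs"    # a lost data block comes back
assert recover(stored, 3) == stored[3]  # so does a lost parity block
```

Note that reading an intact data block touches only that block, but the data sits on disk "in the clear", which matters for deniability as discussed below.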
The second form is much more expensive, so why would you do it? In a distributed system these accesses can happen in parallel, so the impact is less.  Also, reads of the backup, other than for integrity checking, will be rare, and integrity checks do not need to recover data "in the clear"; the performance costs are not significant in this application.

The real importance is that no individual storage node can, if compromised, reveal any data. In the context of a crowd-sourced backup of the Internet Archive, this is important. If a node in the backup network contains data from the Archive "in the clear" the owner of the node might be in trouble if the relevant authorities considered that content undesirable. If the owner has deniability, in the sense that they can say "there is no way I can know what the data I am storing is, and no way anyone can recover usable data from my disk alone" it is much harder for the authorities to claim that the owner is doing something bad.

The second form of erasure coding has desirable properties for a backup of the Archive, and it can significantly reduce the demand for storage. Examples of systems using the second form of erasure coding are Tahoe-LAFS and Cleversafe.
Entanglement

Entanglement was introduced in two 2001 papers: Tangler: A Censorship-Resistant Publishing System Based On Document Entanglements by Marc Waldman and David Mazières, and Dagster: Censorship-Resistant Publishing Without Replication by Adam Stubblefield and Dan Wallach. It has recently been revived in a strengthened form by Verónica Estrada Galiñanes and Pascal Felber in their paper Helical Entanglement Codes: An Efficient Approach for Designing Robust Distributed Storage Systems.

Like the second form of erasure coding, entanglement does not store the data blocks themselves. Each stored block contains data derived from multiple data blocks. The key difference is that erasure coding mixes the data from a fixed group of blocks, whereas entanglement does not organize the blocks into groups but mixes each incoming block with a pseudo-randomly chosen set of stored blocks. This has the following effects:
  • Since the information from which a data block can be recovered is spread across the whole set of stored blocks, deleting or over-writing a data block will affect other data blocks. If the spread is wide enough, selective censorship is effectively blocked and the system is append-only. For the Internet Archive backup application, this is a good thing.
  • Entanglement supports only integer replication factors:
    • Tangler's publishing algorithm takes two stored blocks and a data block and outputs two new stored blocks, thus its replication factor is two. A data block can be recovered from any three of the four (two input and two output) stored blocks.
    • In the default three-strand configuration of the Helical Entanglement Code (HEC) system publishing takes three stored blocks, one from each strand, and a data block and outputs three new stored blocks,  one for each strand. Its replication factor is thus three. Absent data loss, recovering a data block requires accessing two successive stored blocks from any of the three strands. If data loss means that none of the three strands can supply the necessary blocks, a search process can recover the lost stored blocks from information in other stored blocks.
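To make the contrast with erasure coding concrete, here is a toy, Dagster-flavoured sketch (not the actual Dagster, Tangler, or HEC algorithm): each incoming block is XORed with pseudo-randomly chosen already-stored blocks rather than with a fixed group, so recovering it later depends on blocks it shares with other documents:

```python
import random

K = 2  # how many existing stored blocks each new block is entangled with

def xor(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def publish(store, data, rng):
    """Entangle `data` with K pseudo-randomly chosen stored blocks,
    store the mixed result, and return the recipe needed to recover it."""
    partners = rng.sample(range(len(store)), K)
    mixed = data
    for i in partners:
        mixed = xor(mixed, store[i])
    store.append(mixed)
    return (len(store) - 1, partners)

def recover(store, recipe):
    """Undo the mixing: XOR the stored block with its partners."""
    idx, partners = recipe
    data = store[idx]
    for i in partners:
        data = xor(data, store[i])
    return data

rng = random.Random(42)
# A few seed blocks so the first publication has partners to entangle with.
store = [bytes(rng.randrange(256) for _ in range(4)) for _ in range(4)]
recipe = publish(store, b"spam", rng)
assert recover(store, recipe) == b"spam"
```

Deleting or overwriting either partner block would make this block unrecoverable too, which is exactly the censorship-resistant, append-only property described above.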
Entanglement systems vary in how they spread the information about a data block among the stored blocks. In Towards A Theory of Data Entanglement, James Aspnes et al. introduced two criteria for this:
  • A system provides document dependency if a document cannot be recovered if any document it is entangled with is lost.
  • A system provides all-or-nothing integrity if no document can be recovered if any document is lost.
They show that Dagster and Tangler do not meet these criteria. Helical Entanglement is claimed to provide all-or-nothing integrity, the stronger of the criteria. In her brief Work-In-Progress talk at FAST15, Verónica Estrada Galiñanes showed that HEC systems could be configured with large numbers of strands and devices and a replication factor of four to have very high tolerance for failures.

Entanglement has several desirable properties for a backup of the Archive, but it has too high a replication factor to be practical.
Requirements

If I were doing the design, I would start from the end I haven't seen any discussion of so far. It's neat to have a backup copy of the archive, but if it won't actually work when it is needed, what's the point? So I'd start the design by looking not at how the data gets out there, but at the use cases in which the data needs to get back. Two obvious cases are:
  • The archive loses say 10TB, perhaps because it suffers a rash of correlated disk failures. How does the archive get it back from the backup?
  • The Big One hits the Bay Area and the entire archive is lost. How can the service be re-constituted from the backup?
Note that time is a big issue here. If it is theoretically possible to recover the needed data, but only in a timescale that's so long everyone will have forgotten about the archive by the time it's back, nothing has really been achieved. Recovery needs to be modelled with realistic upstream bandwidths, which will be much less than downstream for most nodes, replication factors, and proportion of accessible nodes.

Once I had a good recovery design, then I'd figure out:
  • How to get data from the archive into a system like that.
  • How to continually audit the system to verify that it was in good enough shape to work when needed, something systems used only in an emergency frequently fail to do.
So let's say the requirements are:
  • Capacity of 35PB by late 2017.
  • Replication factor less than 1.5, to limit storage demand by late 2017 to less than about 50PB, or say 100K volunteers each providing 500GB.
  • Provides deniability (see above).
  • Ingest bandwidth of 15PB/year by late 2017 (the current content of the archive needs to be backed up in say 2 years while it is growing say 5PB/year). Note that this is about 4Gb/s leaving the archive, or roughly an additional 25% outbound bandwidth.
  • 95% probability of correctly recovering 10TB in 5 days (it is assumed that much of the content will be off-line most of the time, so instant recovery cannot be a requirement).
  • 95% probability of correctly recovering 95% of the entire archive in 90 days.
  • Meet these requirements in the face of 5% malign nodes conspiring with each other, and realistic error and availability probabilities for the non-malign nodes.
  • The system self-configures as available storage resources change so as to:
    • Ensure all content is stored.
    • Minimize the variation of replication factor across the content.
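A quick sanity check of the numbers in this list (a sketch of the arithmetic only):

```python
# Checking the requirement figures: storage demand at a replication
# factor of 1.5, the volunteer count it implies, and the ingest
# bandwidth needed for 15 PB/year.

capacity_pb = 35                      # target capacity, late 2017
demand_pb = capacity_pb * 1.5         # ~50 PB; hence "replication < 1.5"
volunteers = demand_pb * 1e6 / 500    # at 500 GB per volunteer
print(f"storage demand: {demand_pb:.1f} PB, "
      f"~{volunteers:,.0f} volunteers at 500 GB each")

seconds_per_year = 365 * 24 * 3600
ingest_gbps = 15e15 * 8 / seconds_per_year / 1e9
print(f"ingest bandwidth: ~{ingest_gbps:.1f} Gb/s")  # the ~4 Gb/s above
```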
As I said, this is an extremely difficult problem.

District Dispatch: Upcoming National Library Legislative Day deadlines

Tue, 2015-03-24 12:30

National Library Legislative Day is May 4-5, 2015 and it will be here soon. Here are a few dates you should know:

  • March 31st is the last day to receive the discount rate available at the Liaison Hotel as part of our room block.
  • April 1st is the last day we are accepting nominations for the WHCLIST award.
  • April 24th is the last day to register online for National Library Legislative Day

If you haven’t already, please take a moment to share the following information with your networks, on social media, and through any listservs you moderate. Encourage everyone to join us, and to pass the information along to friends and associates who may also be interested:

“It is so important to have library advocates in Washington, DC to participate in impactful face-to-face meetings. And in the wake of the sweeping changes to both the House and the Senate in the 2014 Congressional elections, it is more important than ever that library supporters rally together to speak up on behalf of libraries and the communities they serve.

This year, National Library Legislative Day will be held May 4-5, 2015. If you haven’t already, please take a moment to consider joining us this year. Registration information and the discount code for the hotel room block are both available on the ALA Washington Office website.

Know a non-librarian who gets fired up about library issues? First-time participants are eligible for a unique scholarship opportunity. The White House Conference on Library and Information Services Taskforce (WHCLIST) and the ALA Washington Office are calling for nominations for the 2015 WHCLIST Award. Recipients of this award receive a stipend ($300 and two free nights at a D.C. hotel) to a non-librarian participant in National Library Legislative Day.

The promotional video for National Library Legislative Day this year can be found here.

Any questions regarding National Library Legislative Day can be directed to the Washington Office Grassroots Communications Specialist, Lisa Lindle.

The post Upcoming National Library Legislative Day deadlines appeared first on District Dispatch.