
planet code4lib

Planet Code4Lib - http://planet.code4lib.org

Richard Wallis: Baby Steps Towards A Library Graph

Fri, 2014-09-26 15:36

It is one thing to have a vision (regular readers of this blog will know I have them all the time); it's yet another to see it starting to form through the mist into a reality. Several times in the recent past I have spoken of some of the building blocks for bibliographic data to play a prominent part in the Web of Data.  The Web of Data that is starting to take shape and drive benefits for everyone.  Benefits that for many are hiding in plain sight on the results pages of search engines, in those informational panels with links to people’s parents, universities, and movies, or maps showing the location of mountains and retail outlets; incongruously named Knowledge Graphs.

Building blocks such as Schema.org; Linked Data in WorldCat.org; moves to enhance Schema.org capabilities for bibliographic resource description; recognition that Linked Data has a beneficial place in library data and initiatives to turn that into a reality; the release of Work entity data mined from, and linked to, the huge WorldCat.org data set.

OK, you may say, we’ve heard all that before, so what is new now?

As always it is a couple of seemingly unconnected events that throw things into focus.

Event 1:  An article by David Weinberger in the DigitalShift section of Library Journal entitled Let The Future Go.  An excellent article telling libraries that they should not be so parochially focused in their own domain whilst looking to how they are going to serve their users’ needs in the future.  Get our data out there, everywhere, so it can find its way to those users, wherever they are.  Making it accessible to all.  David references three main ways to provide this access:

  1. APIs – to allow systems to directly access our library system data and functionality
  2. Linked Data – can help us open up the future of libraries. By making clouds of linked data available, people can pull together data from across domains
  3. The Library Graph –  an ambitious project libraries could choose to undertake as a group that would jump-start the web presence of what libraries know: a library graph. A graph, such as Facebook’s Social Graph and Google’s Knowledge Graph, associates entities (“nodes”) with other entities

(I am fortunate to be a part of an organisation, OCLC, making significant progress on making all three of these a reality – the first one is already baked into the core of OCLC products and services)

It is the 3rd of those, however, that triggered recognition for me.  Personally, I believe that we should not be focusing on a specific ‘Library Graph’ but more on the ‘Library Corner of a Giant Global Graph’  – if graphs can have corners that is.  Libraries have rich specialised resources and have specific needs and processes that may need special attention to enable opening up of our data.  However, when opened up in context of a graph, it should be part of the same graph that we all navigate in search of information whoever and wherever we are.

Event 2: A posting by ZBW Labs, Other editions of this work: An experiment with OCLC’s LOD work identifiers, detailing their experiments in using the OCLC WorldCat Works data.

ZBW contributes to WorldCat, and has 1.2 million OCLC numbers attached to its bibliographic records. So it seemed interesting to see how many of these editions link to works, and furthermore to other editions of the very same work.

The post is interesting from a couple of points of view.  Firstly, the simple steps they took to get at the data, really well demonstrated by the command-line calls used to access it: get the data for an OCLC number from WorldCat.org in JSON format; extract the schema:exampleOfWork link to the Work; get the Work data from WorldCat, also in JSON; then parse out the links to other editions of the work and compare them with their own data.  Command-line calls that were no doubt embedded in simple scripts.
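
Reproducing that workflow is straightforward. Below is a minimal Python sketch of the same steps; the exact WorldCat URL pattern, the content-negotiation header, and the JSON-LD property names (exampleOfWork, workExample, sometimes prefixed schema:) are assumptions about how the Linked Data was exposed at the time, so treat it as an illustration rather than a recipe.

```python
# Minimal sketch of the ZBW-style lookup: OCLC number -> Work -> other editions.
# URL patterns and JSON-LD keys are assumptions; the original post used plain
# command-line calls (e.g. curl) to do the same thing.
import requests

def fetch_jsonld(url):
    """Fetch a resource as JSON-LD via content negotiation."""
    resp = requests.get(url, headers={"Accept": "application/ld+json"})
    resp.raise_for_status()
    return resp.json()

def _ids(value):
    """Normalize a JSON-LD value (URI string, object, or list) to a list of URIs."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        return [value.get("@id")]
    return [v.get("@id") if isinstance(v, dict) else v for v in value]

def editions_for_oclc_number(oclcnum):
    # 1. Get the bibliographic record's Linked Data from WorldCat.
    record = fetch_jsonld("http://www.worldcat.org/oclc/%s" % oclcnum)
    nodes = record.get("@graph", [record])
    # 2. Extract the schema:exampleOfWork link(s) to the Work entity.
    work_uris = [uri for node in nodes for uri in _ids(node.get("exampleOfWork"))]
    editions = set()
    for work_uri in work_uris:
        # 3. Get the Work description itself.
        work = fetch_jsonld(work_uri)
        # 4. Parse out the schema:workExample links back to the editions.
        for node in work.get("@graph", [work]):
            editions.update(_ids(node.get("workExample")))
    return editions

# Example: list the sibling editions WorldCat knows about for one record.
print(editions_for_oclc_number("123456789"))
```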

Secondly, the implicit way that the corpus of WorldCat Work entity descriptions, and their canonical identifying URIs, is used as an authoritative hub for Works and their editions.  A concept that is not new in the library world; we have been doing this sort of thing with names and person identities via other authoritative hubs, such as VIAF, for ages.  What is new here is that it is a hub for Works and their relationships, and the bidirectional nature of those relationships – work to edition, edition to work – the beginnings of a library graph linked to other hubs for subjects, people, etc.

The ZBW Labs experiment is interesting in its own way – a simple approach, enlightening results.  What is more interesting for me is that it demonstrates a baby step towards the way the Library corner of that Global Web of Data will not only naturally form (as we expose and share data in this way – linked entity descriptions), but naturally fit into future library workflows with all sorts of consequential benefits.

The experiment is exactly the type of initiative that we hoped to stimulate by releasing the Works data.  Using it for things we never envisaged, delivering unexpected value to our community.  I can’t wait to hear about other initiatives like this that we can all learn from.

So who is going to be doing this kind of thing – describing entities and sharing them to establish these hubs (nodes) that will form the graph?  Some are already there, in the traditional authority file hubs: The Library of Congress LC Linked Data Service for authorities and vocabularies (id.loc.gov), VIAF, ISNI, FAST, Getty vocabularies, etc.

As previously mentioned Work is only the first of several entity descriptions that are being developed in OCLC for exposure and sharing.  When others, such as Person, Place, etc., emerge we will have a foundation of part of a library graph – a graph that can and will be used, and added to, across the library domain and then on into the rest of the Global Web of Data.  An important authoritative corner, of a corner, of the Giant Global Graph.

As I said at the start these are baby steps towards a vision that is forming out of the mist.  I hope you and others can see it too.

(Toddler image: Harumi Ueda)

DPLA: Remembering the Little Rock Nine

Fri, 2014-09-26 14:21

This week, 57 years ago, was a tumultuous one for nine African American students at Central High School in Little Rock, Arkansas. Now better known as the Little Rock Nine, these high school students were part of a several-year battle to integrate the Little Rock School District after the landmark 1954 Brown v. Board of Education Supreme Court ruling.

From that ruling on, it was a tough uphill battle to get the Little Rock School District to integrate. On a national level, all eight congressmen from Arkansas were part of the “Southern Manifesto,” encouraging Southern states to resist integration. On a local level, white citizens’ councils, like the Capital Citizens Council and the Mothers’ League of Central High School, were formed in Little Rock to protest desegregation. They also lobbied politicians, in particular Arkansas Governor Orval Faubus, who went on to block the 1957 desegregation of Central High School.

These tensions escalated throughout September 1957—which saw the Little Rock Nine barred from entering the school by Arkansas National Guard troops sent by Faubus. Eventually, Federal District Judge Ronald Davies was successful in ordering Faubus to stop interfering with desegregation. Integration began during this week, 57 years ago.

On September 23, 1957, the nine African American students entered Central High School by a side door, while a mob of more than 1,000 people crowded the building. Local police were overwhelmed, and the protesters began attacking African American reporters outside the school building.

President Eisenhower, via Executive Order 10730, sent the U.S. Army to Arkansas to escort the Little Rock Nine into school, on September 25, 1957. The students attended classes with soldiers by their side. By the end of the month, a now federalized National Guard had mostly taken over protection of the students. While the protests eventually died down, the abuse and tension did not.  The school was shut down from 1958 through fall 1959 as the struggle over segregation continued.

Through the DPLA, you can get a better sense of what that struggle and tension was like. In videos from our service hub, Digital Library of Georgia, you can view news clips recorded during this historic time in Little Rock. These videos are a powerful testament to the struggle of the Little Rock Nine, and the Civil Rights movement as a whole.

Related items in DPLA

  • Reporters interview students protesting the integration of Central High School
  • Police hold back rioters during the protest
  • White students burn an effigy of a black student, while African American students are escorted by police into the high school
  • President Dwight D. Eisenhower makes a statement about the Little Rock Nine and integration at Central High School
  • Arkansas Governor Orval Faubus calls Arkansas “an occupied territory,” and a “defenseless state” against the federal troops sent by President Eisenhower
  • Georgia Governor Marvin Griffin condemns federal troops in Little Rock, promises to maintain segregation in Georgia schools

David Rosenthal: Plenary Talk at 3rd EUDAT Conference

Fri, 2014-09-26 10:04
I gave a plenary talk at the 3rd EUDAT Conference's session on sustainability entitled Economic Sustainability of Digital Preservation. Below the fold is an edited text with links to the sources.



I'm David Rosenthal from the LOCKSS (Lots Of Copies Keep Stuff Safe) Program at the Stanford Libraries. We've been sustainably preserving digital information for a reasonably long time, and I'm here to talk about some of the lessons we've learned along the way that are relevant for research data.

In May 1995 Stanford Libraries' HighWire Press pioneered the shift of academic journals to the Web by putting the Journal of Biological Chemistry on-line. Almost immediately librarians, who pay for this extraordinarily expensive content, saw that the Web was a far better medium than paper for their mission of getting information to current readers. But they have a second mission, getting information to future readers. There were both business and technical reasons why, for this second mission, the Web was a far worse medium than paper:
  • The advent of the Web forced libraries to change from purchasing a copy of the content to renting access to the publisher's copy. If the library stopped paying the rent, it would lose access to the content.
  • Because in the Web the publisher stored the only copy of the content, and because it was on short-lived, easily rewritable media, the content was at great risk of loss and damage.
As a systems engineer, I found the paper library system interesting as an example of fault-tolerance. It consisted of a loosely-coupled network of independent peers. Each peer stored copies of its own selection of the available content on durable, somewhat tamper-evident media. The more popular the content, the more peers stored a copy. There was a market in copies; as content had fewer copies, each copy became more valuable, encouraging the peers with a copy to take more care of it. It was easy to find a copy, but it was hard to be sure you had found all copies, so undetectably altering or deleting content was difficult. There was a mechanism, inter-library loan and copy, for recovering from loss or damage to a copy.

The LOCKSS Program started in October 1998 with the goal of replicating the paper library system for the Web. We built software that allowed libraries to deploy a PC, a LOCKSS box, that was the analog for the Web of the paper library's stacks. By crawling the Web, the box collected a copy of the content to which the library subscribed and stored it. Readers could access their library's copy if for any reason they couldn't get to the publisher's copy. Boxes at multiple libraries holding the same content cooperated in a peer-to-peer network to detect and repair any loss or damage.

The program was developed and went into early production with initial funding from the NSF, and then major funding from the Mellon Foundation, the NSF and Sun Microsystems. But grant funding isn't a sustainable business model for digital preservation. In 2005, the Mellon Foundation gave us a grant with two conditions; we had to match it dollar-for-dollar and by the end of the grant in 2007 we had to be completely off grant funding. We met both conditions, and we have (with one minor exception which I will get to later) been off grant funding and in the black ever since. The LOCKSS Program has two businesses:
  • We develop, and support libraries that use, our open-source software for digital preservation. The software is free; libraries pay for support. We refer to this as the "Red Hat" business model.
  • Under contract to a separate not-for-profit organization called CLOCKSS run jointly by publishers and libraries, we use our software to run a large dark archive of e-journals and e-books. This archive has recently been certified as a "Trustworthy Repository" after a third-party audit which awarded it the first-ever perfect score in the Technologies, Technical Infrastructure, Security category.
The first lesson that being self-sustaining for 7 years has taught us is "It's The Economics, Stupid". Research in two areas of preservation, e-journals and the public Web, indicates that in each of these two areas combining all current efforts preserves less than half the content that should be preserved. Why less than half? The reason is that the budget for digital preservation isn't adequate to preserve even half using current technology. This leaves us with three choices:
  • Do nothing. In that case we can stop worrying about bit rot, format obsolescence, operator error and all the other threats digital preservation systems are designed to combat. These threats are dwarfed by the threat of "can't afford to preserve". It is going to mean that more than 50% of the stuff that should be available to future readers isn't.
  • Double the budget for digital preservation. This is so not going to happen. Even if it did, it wouldn't solve the problem because, as I will show, the cost per unit content is going to rise.
  • Halve the cost per unit content of current systems. This can't be done with current architectures. Yesterday morning I gave a talk at the Library of Congress describing a radical re-think of long-term storage architecture that might do the trick. You can find the text of the talk on my blog.
Unfortunately, the structure of research funding means that economics is an even worse problem for research data than for our kind of content. There's been quite a bit of research into the costs of digital preservation, but it isn't based on a lot of good data. Remedying that is important. I'm on the advisory board of an EU-funded project called 4C that is trying to remedy that. If you have any kind of cost data you can share please go to http://www.4cproject.eu/ and submit it to the Curation Cost Exchange.

As an engineer, I'm used to using rules of thumb. The one I use to summarize most of the cost research is that ingest takes half the lifetime cost, preservation takes one third, and access takes one sixth.


Research grants might be able to fund the ingest part, since it is a one-time up-front cost. But preservation and access are ongoing costs for the life of the data, so grants have no way to cover them. We've been able to ignore this problem for a long time, for two reasons. The first is that from at least 1980 to 2010 costs followed Kryder's Law, the disk analog of Moore's Law, dropping 30-40%/yr. This meant that, if you could afford to store the data for a few years, the cost of storing it for the rest of time could be ignored, because of course Kryder's Law would continue forever. The second is that as the data got older, access to it was expected to become less frequent. Thus the cost of access in the long term could be ignored.
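
To see why that assumption was reasonable, note that with a constant Kryder rate the lifetime storage cost is a convergent geometric series, worth only a few years' spending at the initial price. A minimal sketch of that arithmetic (my own illustration, not a calculation from the talk):

```python
# If the price of storage falls by `kryder_rate` each year, storing a fixed
# amount of data forever costs a finite multiple of the first year's cost.
def lifetime_cost_multiple(kryder_rate, years=None):
    """Total storage cost as a multiple of the year-1 cost."""
    if years is None:
        return 1.0 / kryder_rate                      # sum of (1 - r)**n for n >= 0
    return sum((1.0 - kryder_rate) ** n for n in range(years))

print(lifetime_cost_multiple(0.30))   # ~3.3x: "forever" costs a few years' budget
print(lifetime_cost_multiple(0.15))   # ~6.7x: a slower Kryder rate doubles that
```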

Kryder's Law held for three decades, an astonishing feat for exponential growth. Something that goes on that long gets built into people's model of the world, but as Randall Munroe points out, in the real world exponential curves cannot continue for ever. They are always the first part of an S-curve.

This graph, from Preeti Gupta of UC Santa Cruz, plots the cost per GB of disk drives against time. In 2010 Kryder's Law abruptly stopped. In 2011 the floods in Thailand destroyed 40% of the world's capacity to build disks, and prices doubled. Earlier this year they finally got back to 2010 levels. Industry projections are for no more than 10-20% per year going forward (the red lines on the graph). This means that disk is now about 7 times as expensive as was expected in 2010 (the green line), and that in 2020 it will be between 100 and 300 times as expensive as 2010 projections.
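
As a rough check on the "7 times as expensive" figure, assume the 2010 projections extrapolated the historical 30-40%/yr decline for four more years while actual prices only just recovered to their 2010 level; this is my own back-of-the-envelope arithmetic, not the data behind the graph.

```python
# Compare the 2014 price expected under Kryder's Law with the actual 2014 price,
# which was roughly back at the 2010 level after the Thai floods.
for annual_decline in (0.30, 0.40):
    expected_2014 = (1.0 - annual_decline) ** 4   # four years of projected decline
    actual_2014 = 1.0                             # prices had only just recovered
    print(annual_decline, round(actual_2014 / expected_2014, 1))
# ~4.2x at 30%/yr and ~7.7x at 40%/yr -- consistent with "about 7 times"
```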

Thanks to aggressive marketing, it is commonly believed that "the cloud" solves this problem. Unfortunately, cloud storage is actually made of the same kind of disks as local storage, and is subject to the same slowing of the rate at which it was getting cheaper. In fact, when all costs are taken into account, cloud storage is not cheaper for long-term preservation than doing it yourself once you get to a reasonable scale. Cloud storage really is cheaper if your demand is spiky, but digital preservation is the canonical base-load application.

You may think that cloud storage is a competitive market; in fact it is dominated by Amazon. When Google recently started to get serious about competing, they pointed out that while Amazon's margins on S3 may have been minimal at introduction, by then they were extortionate:
cloud prices across the industry were falling by about 6 per cent each year, whereas hardware costs were falling by 20 per cent. And Google didn't think that was fair. ... "The price curve of virtual hardware should follow the price curve of real hardware."

Notice that the major price drop triggered by Google was a one-time event; it was a signal to Amazon that they couldn't have the market to themselves, and to smaller players that they would no longer be able to compete.

In fact commercial cloud storage is a trap. It is free to put data in to a cloud service such as Amazon's S3, but it costs to get it out. For example, getting your data out of Amazon's Glacier without paying an arm and a leg takes 2 years. If you commit to the cloud as long-term storage, you have two choices. Either keep a copy of everything outside the cloud (in other words, don't commit to the cloud), or stay with your original choice of provider no matter how much they raise the rent.
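
The "2 years" figure follows from Glacier's retrieval allowance at the time, which, if I recall the pricing correctly (treat the 5% figure as an assumption), only waived fees for retrieving about 5% of your stored data per month:

```python
# Glacier's free retrieval allowance: roughly 5% of stored data per month,
# prorated daily (an assumption about the 2014 terms, not current pricing).
free_fraction_per_month = 0.05
months_to_get_everything_out_free = 1.0 / free_fraction_per_month   # 20 months
print(months_to_get_everything_out_free / 12.0)                     # ~1.7, i.e. about 2 years
```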

The storage part of preservation isn't the only on-going cost that will be much higher than people expect, access will be too. In 2010 the Blue Ribbon Task Force on Sustainable Digital Preservation and Access pointed out that the only real justification for preservation is to provide access. With research data this is a difficulty: the value of the data may not be evident for a long time. Shang dynasty astronomers inscribed eclipse observations on animal bones. About 3200 years later, researchers used these records to estimate that the accumulated clock error was about 7 hours. From this they derived a value for the viscosity of the Earth's mantle as it rebounds from the weight of the glaciers.

In most cases so far the cost of an access to an individual item has been small enough that archives have not charged the reader. Research into past access patterns to archived data showed that access was rare, sparse, and mostly for integrity checking.

But the advent of "Big Data" techniques means that, going forward, scholars increasingly want not to access a few individual items in a collection, but to ask questions of the collection as a whole. For example, the Library of Congress announced that it was collecting the entire Twitter feed, and almost immediately had 400-odd requests for access to the collection. The scholars weren't interested in a few individual tweets, but in mining information from the entire history of tweets. Unfortunately, the most the Library of Congress can afford to do with the feed is to write two copies to tape. There's no way they can afford the compute infrastructure to data-mine from it. We can get some idea of how expensive this is by comparing Amazon's S3, designed for data-mining type access patterns, with Amazon's Glacier, designed for traditional archival access. S3 is currently at least 2.5 times as expensive; until recently it was 5.5 times.

The real problem here is that scholars are used to having free access to library collections, but what scholars now want to do with archived data is so expensive that they must be charged for access. This in itself has costs, since access must be controlled and accounting undertaken. Further, data-mining infrastructure at the archive must have enough performance for the peak demand but will likely be lightly used most of the time, increasing the cost for individual scholars. A charging mechanism is needed to pay for the infrastructure. Fortunately, because the scholar's access is spiky, the cloud provides both suitable infrastructure and a charging mechanism.

For smaller collections, Amazon provides Free Public Datasets: Amazon stores a copy of the data at no charge, charging scholars accessing the data for the computation rather than charging the owner of the data for storage.

Even for large and non-public collections it may be possible to use Amazon. Suppose that in addition to keeping the two archive copies of the Twitter feed on tape, the Library kept one copy in S3's Reduced Redundancy Storage simply to enable researchers to access it. For this year, it would have averaged about $4100/mo, or about $50K. Scholars wanting to access the collection would have to pay for their own computing resources at Amazon, and the per-request charges; because the data transfers would be internal to Amazon there would not be bandwidth charges. The storage charges could be borne by the library or charged back to the researchers. If they were charged back, the 400 initial requests would each need to pay about $125 for a year's access to the collection, not an unreasonable charge. If this idea turned out to be a failure it could be terminated with no further cost; the collection would still be safe on tape. In the short term, using cloud storage for an access copy of large, popular collections may be a cost-effective approach. Because the Library's preservation copy isn't in the cloud, they aren't locked in.
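
The cost-sharing arithmetic in that example is easy to reproduce; the $4,100/month and 400-request figures come from the talk, and everything else below simply follows from them.

```python
# Split the S3 Reduced Redundancy Storage bill across the initial requesters.
monthly_storage_cost = 4100.0                         # figure quoted in the talk
annual_storage_cost = monthly_storage_cost * 12       # ~$49K, "about $50K"
initial_requests = 400                                # early requests for the Twitter collection
print(round(annual_storage_cost / initial_requests))  # ~123, i.e. "about $125" per scholar-year
```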

One thing it should be easy to agree on about digital preservation is that you have to do it with open-source software; closed-source preservation has the same fatal "just trust me" aspect that closed-source encryption (and cloud storage) suffer from. Sustaining open source preservation software is interesting, because unlike giants like Linux, Apache and so on it is a niche market with little commercial interest.

We have managed to sustain open-source preservation software well for 7 years, but have encountered one problem. This brings me to the exception I mentioned earlier. To sustain the free software, paid support model you have to deliver visible value to your customers regularly and frequently. We try to release updated software every 2 months, and new content for preservation weekly. But this makes it difficult to commit staff resources to major improvements to the infrastructure. These are needed to address problems that don't impact customers yet, but will in a few years unless you work on them now.

The Mellon Foundation supports a number of open-source initiatives, and after discussing this problem with them they gave us a small grant specifically to work on enhancements to the LOCKSS system such as support for collecting websites that use AJAX, and for authenticating users via Shibboleth. Occasional grants of this kind may be needed to support open-source preservation infrastructure generally, even if pay-for-support can keep it running.

Unfortunately, economics aren't the only hard problem facing the long-term storage of data. There are serious technical problems too. Let's start by examining the technical problem in its most abstract form. Since 2007 I've been using the example of "A Petabyte for a Century". Think about a black box into which you put a Petabyte, and out of which a century later you take a Petabyte. Inside the box there can be as much redundancy as you want, on whatever media you choose, managed by whatever anti-entropy protocols you want. You want to have a 50% chance that every bit in the Petabyte is the same when it comes out as when it went in.

Now consider every bit in that Petabyte as being like a radioactive atom, subject to a random process that flips it with a very low probability per unit time. You have just specified a half-life for the bits. That half-life is about 60 million times the age of the universe. Think for a moment how you would go about benchmarking a system to show that no process with a half-life less than 60 million times the age of the universe was operating in it. It simply isn't feasible. Since at scale you are never going to know that your system is reliable enough, Murphy's law will guarantee that it isn't.
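
The "60 million times the age of the universe" figure drops out of a few lines of arithmetic, assuming independent bit flips: for a 50% chance that none of the roughly 8x10^15 bits flips in a century, the per-bit half-life must be at least the number of bits times the timespan.

```python
# P(no bit flips) = 0.5 ** (bits * years / half_life); requiring P >= 0.5
# means half_life >= bits * years.
bits = 8e15                      # one petabyte, in bits
years = 100
age_of_universe = 13.8e9         # years
required_half_life = bits * years                 # 8e17 years
print(required_half_life / age_of_universe)       # ~5.8e7: about 60 million times
```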

Here's some back-of-the-envelope hand-waving. Amazon's S3 is a state-of-the-art storage system. Its design goal is an annual probability of loss of a data object of 10^-11. If the average object is 10K bytes, the bit half-life is about a million years, way too short to meet the requirement but still really hard to measure.

Note that the 10^-11 is a design goal, not the measured performance of the system. There's a lot of research into the actual performance of storage systems at scale, and it all shows them under-performing expectations based on the specifications of the media. Why is this? Real storage systems are large, complex systems subject to correlated failures that are very hard to model.

Worse, the threats against which they have to defend their contents are diverse and almost impossible to model. Nine years ago we documented the threat model we use for the LOCKSS system. We observed that most discussion of digital preservation focused on these threats:
  • Media failure
  • Hardware failure
  • Software failure
  • Network failure
  • Obsolescence
  • Natural Disaster
but that the experience of operators of large data storage facilities was that the significant causes of data loss were quite different:
  • Operator error
  • External Attack
  • Insider Attack
  • Economic Failure
  • Organizational Failure
Building systems to defend against all these threats combined is expensive, and can't ever be perfectly effective. So we have to resign ourselves to the fact that stuff will get lost. This has always been true, it should not be a surprise. And it is subject to the law of diminishing returns. Coming back to the economics, how much should we spend reducing the probability of loss?

Consider two storage systems with the same budget over a decade, one with a loss rate of zero, the other half as expensive per byte but which loses 1% of its bytes each year. Clearly, you would say the cheaper system has an unacceptable loss rate.

However, each year the cheaper system stores twice as much and loses 1% of its accumulated content. At the end of the decade the cheaper system has preserved 1.89 times as much content at the same cost. After 30 years it has preserved more than 5 times as much at the same cost.
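
A minimal model reproduces the decade figure, assuming both systems spend the same fixed budget every year and the cheaper one loses 1% of everything it holds at the end of each year; the 30-year claim presumably folds in further assumptions (such as falling storage prices), so this sketch only checks the 1.89.

```python
# Reliable system: 1 unit of content per budget-year, nothing lost.
# Cheap system: 2 units per budget-year (half the cost per byte), loses 1%/yr.
def preserved_after(years, units_per_year, annual_loss):
    holdings = 0.0
    for _ in range(years):
        holdings += units_per_year
        holdings *= (1.0 - annual_loss)    # loss applies to everything held
    return holdings

reliable = preserved_after(10, 1.0, 0.00)  # 10.0 units
cheap = preserved_after(10, 2.0, 0.01)     # ~18.9 units
print(round(cheap / reliable, 2))          # ~1.89
```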

Adding each successive nine of reliability gets exponentially more expensive. How many nines do we really need? Is losing a small proportion of a large dataset really a problem? The canonical example of this is the Internet Archive's web collection. Ingest by crawling the Web is a lossy process. Their storage system loses a tiny fraction of its content every year. Access via the Wayback Machine is not completely reliable. Yet for US users archive.org is currently the 150th most visited site, whereas loc.gov is the 1519th. For UK users archive.org is currently the 131st most visited site, whereas bl.uk is the 2744th.

Why is this? Because the collection was always a series of samples of the Web, the losses merely add a small amount of random noise to the samples. But the samples are so huge that this noise is insignificant. This isn't something about the Internet Archive, it is something about very large collections. In the real world they always have noise; questions asked of them are always statistical in nature. The benefit of doubling the size of the sample vastly outweighs the cost of a small amount of added noise. In this case more really is better.

To sum up, the good news is that sustainable preservation of digital content such as research data is possible, and the LOCKSS Program is an example.

The bad news is that people's expectations are way out of line with reality. It isn't possible to preserve nearly as much as people assume is already being preserved, nearly as reliably as they assume it is already being done. This mismatch is going to increase. People don't expect more resources, yet they do expect a lot more data. They expect that the technology will get a lot cheaper, but the experts no longer believe it will.

Research data, libraries and archives are a niche market. Their problems are technologically challenging, but there isn't a big payoff for solving them, so neither industry nor academia is researching solutions. We end up cobbling together preservation systems out of technology intended to do something quite different, like backups.

Meredith Farkas: Whistleblowers and what still isn’t transparent

Fri, 2014-09-26 03:41

Social media is something I have in common with popular library speaker Joe Murphy. We’ve both given talks about the power of social media at loads of conferences. I love the radical transparency that social media enables. It allows for really authentic connection and also really authentic accountability. So many bad products and so much bad behavior have come to light because of social media. Everyone with a cell phone camera can now be an investigative reporter. So much less can be swept under the rug. It’s kind of an amazing thing.

But what’s disturbing is what has not become more transparent. Sexual harassment for one. When a United States senator doesn’t feel like she can name the man who told her not to lose weight after having her baby because “I like my girls chubby,” then we know this problem is bigger than just libraryland.

It’s been no secret among many women (and some men) who attend and speak at conferences like Internet Librarian and Computers in Libraries that Joe Murphy has a reputation for using these conferences as his own personal meat markets. Whether it’s true or not, I don’t know. I’ve known these allegations since before 2010, which was when I had the privilege of attending a group dinner with him.

He didn’t sexually harass anyone at the table that evening, but his behavior was entitled, cocky, and rude. He barely let anyone else get a word in edgewise because apparently what he had to say (in a group with some pretty freaking illustrious people) was more important than what anyone else had to say. The host of the dinner apologized to me afterwards and said he had no idea what this guy was like. And that was the problem. This information clearly wasn’t getting to the people who needed it most; particularly the people who invited him to speak at conferences. For me, it only cemented the fact that it’s a man’s world (even in our female-dominated profession) and men can continue to get away with and profit from offering more flash than substance and behaving badly.

Why don’t we talk about sexual harassment in the open? I can only speak from my own experience not revealing a public library administrator who sexually harassed me at a conference. First, I felt embarrassed, like maybe I’d encouraged him in some way or did something to deserve it. Second, he was someone I’d previously liked and respected and a lot of other people liked and respected him, and I didn’t want to tarnish his reputation over something that didn’t amount to that much. Maybe also the fact that he was so respected also made me scared to say something, because, in the end, it could end up hurting me.

People who are brave enough to speak out about sexual harassment and name names are courageous. As Barbara Fister wrote, they are whistleblowers. They protect other women from suffering a similar fate, which is noble. When Lisa Rabey and nina de jesus (AKA #teamharpy) wrote about behavior from Joe Murphy that many of us had been hearing about for years, they were acting as whistleblowers, though whistleblowers who had only heard about the behavior second or third-hand, which I think is an important distinction. I believe they shared this information in order to protect other women. And now they’re being sued by Joe Murphy for 1.25 million dollars in damages for defaming his character. You can read the statement of claim here. I assume he is suing them in Canada because it’s easier to sue for libel and defamation outside of the U.S.

On his blog, Wayne Biven’s Tatum wonders “whether the fact of the lawsuit might hurt Murphy within the librarian community more than any accusations of sexual harassment.” Is it the Streisand effect, whereby Joe Murphy is bringing more attention to his alleged behavior by suing these women? It’s possible that this will bite him in the ass more than the original tweets and blog post (which I hadn’t seen prior) ever could. 

I fear the impact of this case will be that women feel even less safe speaking out against sexual harassment if they believe that they could be sued for a million or more dollars. In the end, how many of us really have “proof” that we were sexually harassed other than our word? If you know something that substantiates their allegations of sexual predatory behavior, consider being a witness in #teamharpy’s case. If you don’t but still want to help, contribute to their defense fund.

That said, that this information comes second or third-hand does concern me. I don’t know for a fact that Joe Murphy is a sexual predator. Do you? Here’s what I do know. Did he creep me out when I interacted with him? Yes. Did he creep out other women at conferences? Yes. Did he behave like an entitled jerk at least some of the time? Yes. Do many people resent the fact that a man with a few years of library experience who hasn’t worked at a library in years is getting asked to speak at international conferences when all he offers is style and not substance? Yes.

While all of the rumors about him that have been swirling around for at least the past 4-5 years may be 100% true, I don’t know if they are. I don’t know if anyone has come out and said they were harassed by him beyond the general “nice shirt” comment that creeped out many women. As anyone who has read my blog for a while knows, I am terrified of groupthink. So I feel really torn when it comes to this case. Part of me wonders whether my dislike of Joe Murphy makes me more prone to believe these things. Another part of me feels that these allegations are very consistent with my experience of him and with the rumors over these many years. But I’m not going to decide whether the allegations are true without hearing it from someone who experienced it first-hand.

I wish I could end this post on a positive note, but this is pretty much sad for everyone. Sad for the two librarians who felt they were doing a courageous thing (and may well have been) by speaking out and are now being threatened by a tremendously large lawsuit. Sad for the victims of harassment who may be less likely to speak out because of this lawsuit. And sad for Joe Murphy if he is truly innocent of what he’s been accused (and imagine for a moment the consequences of tarring and feathering an innocent man). I wish we lived in a world where we felt as comfortable reporting abuse and sexual harassment as we do other wrongdoing. I wish as sharp a light was shined on this as has recently been shined on police brutality, corporate misbehavior, and income inequality. And maybe the only positive is that this is shining a light on the fact that this happens and many women, even powerful women, do not feel empowered to report it.

Photo credit: She whispered into the wrong ears by swirling thoughts

Galen Charlton: Banned books and the library of Morpheus

Fri, 2014-09-26 03:23

A notion that haunts me is found in Neil Gaiman’s The Sandman: the library of the Dreaming, wherein can be found books that no earth-bound librarian can collect.  Books that found existence only in the dreams – or passing thoughts – of their authors. The Great American Novel. Every Great American Novel, by all of the frustrated middle managers, farmers, and factory workers who had their heart attack too soon. Every Great Nepalese Novel.  The conclusion of the Wheel of Time, as written by Robert Jordan himself.

That library has a section containing every book whose physical embodiment was stolen.  All of the poems of Sappho. Every Mayan and Olmec text – including the ones that, in the real world, did not survive the fires of the invaders.

Books can be like cockroaches. Text thought long-lost can turn up unexpectedly, sometimes just by virtue of having been left lying around until someone thinks to take a closer look. It is not an impossible hope that one day, another Mayan codex may make its reappearance, thumbing its nose at the colonizers and censors who despised it and the culture and people it came from.

Books are also fragile. Sometimes the censors do succeed in utterly destroying every last trace of a book. Always, entropy threatens all.  Active measures against these threats are required; therefore, it is appropriate that librarians fight the suppression, banning, and challenges of books.

Banned Books Week is part of that fight, and it is important that folks be aware of their freedom to read what they choose – and to be aware that it is a continual struggle to protect that freedom.  Indeed, perhaps “Freedom to Read Week” better expresses the proper emphasis on preserving intellectual freedom.

But it’s not enough.

I am also haunted by the books that are not to be found in the Library of the Dreaming – because not even the shadow of their genesis crossed the mind of those who could have written them.

Because their authors were shot for having the wrong skin color.

Because their authors were cheated of an education.

Because their authors were sued into submission for daring to challenge the status quo.  Even within the profession of librarianship.

Because their authors made the decision to not pursue a profession in the certain knowledge that the people who dominated it would challenge their every step.

Because their authors were convinced that nobody would care to listen to them.

Librarianship as a profession must consider and protect both sides of intellectual freedom. Not just consumption – the freedom to read and explore – but also the freedom to write and speak.

The best way to ban a book is to ensure that it never gets written. Justice demands that we struggle against those who would not just ban books, but destroy the will of those who would write them.

CrossRef: CrossRef Indicators

Thu, 2014-09-25 20:35

Updated September 23, 2014

Total no. participating publishers & societies 5363
Total no. voting members 2609
% of non-profit publishers 57%
Total no. participating libraries 1902
No. journals covered 36,035
No. DOIs registered to date 69,191,919
No. DOIs deposited in previous month 582,561
No. DOIs retrieved (matched references) in previous month 35,125,120
DOI resolutions (end-user clicks) in previous month 79,193,741

CrossRef: New CrossRef Members

Thu, 2014-09-25 20:34

Last updated September 23, 2014

Voting Members
Brazilian Journal of Internal Medicine
Brazilian Journal of Irrigation and Drainage - IRRIGA
Djokosoetono Research Center
EDIPUCRS
Education Association of South Africa
Feminist Studies
Laboreal, FPCE, Universidade do Porto
Libronet Bilgi Hizmetleri ve Yazilim San. Tic. Ltd., Sti.
Open Access Text Pvt, Ltd.
Pontifical University of John Paul II in Krakow
Revista Brasileira de Quiropraxia - Brazilian Journal of Chiropractic
Scientific Online Publishing, Co. Ltd.
Symposium Books, Ltd.
Turkiye Yesilay Cemiyeti
Uniwersytet Ekonomiczny w Krakowie - Krakow University of Economics
Volgograd State University

Sponsored Members
IJNC Editorial Committee
Japanese Association of Cardioangioscopy
Lithuanian University of Educational Sciences
The Operations Research Society of Japan

Represented Members
Acta Medica Anatolia
Ankara University Faculty of Agriculture
CNT Nanostroitelstvo
Dnipropetrovsk National University of Railway Transport
English Language and Literature Association of Korea
Institute for Humanities and Social Sciences
Institute of Korean Independence Movement Studies
Journal of Chinese Language and Literature
Journal of Korean Linguistics
Knowledge Management Society of Korea
Korea Association for International Commerce and Information
Korea Research Institute for Human Settlements
Korean Academic Society for Public Relations
Korean Marketing Association
Korean Society for Art History
Korean Society for the Study of Physical Education
Korean Society of Consumer Policy and Education
Law Research Institute, University of Seoul
Research Institute Centerprogamsystem, JSC
Research Institute of Science Education, Pusan National University
Research Institute of Social Science
Silicea - Poligraf, LLC
SPb RAACI
The Altaic Society of Korea
The Hallym Academy of Sciences
The Korean Association of Ethics
The Korean Association of Translation Studies
The Korean Society for Culture and Arts Education Studies
The Korean Society for Feminist Studies in English Literature
The Korean Society for Investigative Cosmetology
The Regional Association of Architectural Institute of Korea
The Society for Korean Language and Literary Research
Ural Federal University
V.I. Shimakov Federal Research Center of Transplantology and Artificial Organs
World Journal of Traditional Chinese Medicine
Yonsei Institute for North Korean Studies

Last updated September 10, 2014

Voting Members
Fucape Business School
Journal Issues Limited
Revista Bio Ciencias
The Russian Law Academy of the Ministry of Justice of the RF

Sponsored Members

Japan Society for Simulation Technology

Represented Members

Asian Journal of Education
Center for Studies of Christian Thoughts and Culture
Contemporary Film Research Institute
Democratic Legal Studies Association
Foreign Studies Institute
Institute for English Cultural Studies
Institute for Japanese Studies
Institute for Philosophy
Institute for the Translation of Korean Classics
Institute of Humanities
International Journal of Entrepreneurial Knowledge
Korean Academy of Kinesiology
Korean Association for the Study of English Language and Linguistics (KASELL)
Korean Logistics Society
The Association of Korean Education
The Korean Philosophy of Education Society
The Korean Society for School Science

District Dispatch: Copyright Office under the congressional spotlight

Thu, 2014-09-25 17:33

Last Thursday, the U.S. House Judiciary Subcommittee on Courts, Intellectual Property, and the Internet held a hearing to gather information about the work of the U.S. Copyright Office and to learn about the challenges the Office faces in trying to fulfill its many responsibilities. Testifying before the Committee was Maria Pallante, Register of Copyrights and Director of the Copyright Office (view Pallante’s testimony (pdf)). Pallante gave a thorough overview of the Office’s administrative, public policy and regulatory functions, and highlighted a number of ways in which the Office’s structure and position within the federal bureaucracy create inefficiencies in its day-to-day operations. Pallante described these inefficiencies as symptoms of a larger problem: The 1976 Copyright Act vested the Office with the resources and authority it needed to thrive in an analog world, but it failed to anticipate the new needs the Office would develop in adjusting to a digital world.



Although the Office’s registration system—the system by which it registers copyright claims—was brought online in 2008, Pallante describes it as nothing more than a 20th century system presented in a 21st century format. The Office’s recordation system—the process by which it records copyright documents—is still completed manually and has not been updated for decades. Pallante considers fully digitizing the registration and recordation functions of the Copyright Office a top priority:

From an operational standpoint, the Office’s electronic registration system was fully implemented in 2008 by adapting off-the-shelf software. It was designed to transpose the paper-based system of the 20th century into an electronic interface, and it accomplished that goal. However, as technology continues to move ahead we must continue to evaluate and implement improvements. Both the registration and recordation systems need to be increasingly flexible to meet the rapidly changing needs of a digital marketplace.

Despite Pallante’s commitment to updating these systems, she cited her lack of administrative autonomy within the Library of Congress and her Office’s tightening budget as significant impediments to achieving this goal. Several members of the Committee suggested that the Office would have greater latitude to update its operations for the digital age if it were moved out from under the authority of the Library of Congress (LOC). While Pallante did not explicitly support this idea, she was receptive to suggestions from members of the Subcommittee that her office carries out very specialized functions that differ from those that are carried out by the rest of the LOC. Overall, Pallante seemed open to—if not supportive of—having a longer policy discussion on the proper position of the Copyright Office within the federal government.

In addition to providing insight into the inner-workings of the copyright office, the hearing continued the policy discussion on the statutory and regulatory frameworks that govern the process of documenting a copyright. As the Judiciary Committee continues to review the copyright law, it will be interesting to see if it further examines statutory and regulatory changes to the authority and structure of the Copyright Office.

The post Copyright Office under the congressional spotlight appeared first on District Dispatch.

LITA: An Interview with LITA Emerging Leader Annie Gaines

Thu, 2014-09-25 13:00

1. Tell us about your library job.  What do you love most about it?

I am the Scholarly Communications Librarian at the University of Idaho. This is a brand new position within the library and also my first ‘real’ librarian job, so it’s been a constant learning experience. I work along with the Digital Initiatives Librarian on the various digital projects happening at the library, including building an institutional repository, creating digital collections, redesigning the library website, creating and managing open access journals, and working on VIVO (a semantic-web application we are using as a front-end to our IR). I also do some education and advocacy around copyright, author’s rights, open access, etc.

The thing I love most about this job (aside from being able to design websites in crayon – image attached) is taking an idea and bringing it to fruition. Whether it’s a digital collection of postcards with custom navigation or a new journal or database, being able to make an idea a functional, beautiful reality is really rewarding. Also I’m just really excited about increasing access to information, and designing new ways to make that information accessible to a broader audience.

2. Where do you see yourself going from here?

Having just started this career, I’m not completely sure what’s next for me. I’m very happy in my current position, and I love all of the people I work with at the University of Idaho. I think my next step will probably be to start pursuing another degree to help expand my knowledge in this field, or to fulfil my dream to become a professional comic artist/graphic novelist on the side.

3. Why did you apply to be an Emerging Leader?  What are your big takeaways from the ALA-level activities so far?

I was encouraged by my mentor, a previous Emerging Leader, to apply. I am actually the fourth Emerging Leader in a row to be selected from the University of Idaho Library, so there is a lot of administrative support and encouragement for this kind of activity. The big thing I’ve learned through working with ALA is that although the organization and the sub-organizations have a massive population, it is a handful of active participants who make nearly everything happen. My goal is to become one of those change-agents at the ALA level, eventually.

4. What have you learned about LITA governance and activities so far?

I’ve learned that LITA is inclusive and active with its membership. This is a very fun organization, and I’m impressed with the discussion and activities that come out of LITA and its membership.

5. What’s your favorite LITA moment?  What would you like to do next in the organization?

My favorite LITA moment was working with Rachel Vacek and Kyle Denlinger on the Town Meeting activities at Midwinter. My favorite kind of brainstorming involves large sheets of paper and crayons (see above) and being able to do that with other LITA members was really fun.

Eric Hellman: Emergency! Governor Christie Could Turn NJ Library Websites Into Law-Breakers

Wed, 2014-09-24 23:24
Nate Hoffelder over at The Digital Reader highlighted the passage of a new "Reader Privacy Act" by the New Jersey State Legislature. If signed by Governor Chris Christie it would take effect immediately. It was sponsored by my state senator, Nia Gill.

In light of my writing about privacy on library websites, this poorly drafted bill, though well intentioned, would turn my library's website into a law-breaker, subject to a $500 civil fine for every user. (It would also require us to make some minor changes at Unglue.it.)
  1. It defines "personal information" as "(1) any information that identifies, relates to, describes, or is associated with a particular user's use of a book service; (2) a unique identifier or Internet Protocol address, when that identifier or address is used to identify, relate to, describe, or be associated with a particular user, as related to the user’s use of a book service, or book, in whole or in partial form; (3) any information that relates to, or is capable of being associated with, a particular book service user’s access to a book service."
  2. “Provider” means any commercial entity offering a book service to the public.
  3. A provider shall only disclose the personal information of a book service user [...] to a person or private entity pursuant to a court order in a pending action brought by [...] by the person or private entity.
  4. Any book service user aggrieved by a violation of this act may recover, in a civil action, $500 per violation and the costs of the action together with reasonable attorneys’ fees.
My library, Montclair Public Library, uses a web catalog run by Polaris, a division of Innovative Interfaces, a private entity, for BCCLS, a consortium serving northern New Jersey. Whenever I browse a catalog entry in this catalog, a cookie is set by AddThis (and probably other companies) identifying me and the web page I'm looking at. In other words, personal information as defined by the act is sent to a private entity, without a court order.
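
One way to see what the bill would sweep in is to list the third-party hosts a catalog record page quietly loads resources from; each such request leaks the page URL, and therefore the title being viewed, to that host via the Referer header. A minimal sketch follows; the catalog URL is a placeholder, not the real BCCLS address, and a browser's developer tools will show the same thing, AddThis calls included.

```python
# List the external hosts a catalog record page pulls scripts, images and
# frames from; each one receives the page URL (i.e. which title you are viewing).
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup

page_url = "https://catalog.example.org/record=b1234567"   # hypothetical record page
html = requests.get(page_url).text
soup = BeautifulSoup(html, "html.parser")

our_host = urlparse(page_url).netloc
third_parties = set()
for tag, attr in (("script", "src"), ("img", "src"), ("iframe", "src"), ("link", "href")):
    for element in soup.find_all(tag):
        target = element.get(attr)
        if not target:
            continue
        host = urlparse(urljoin(page_url, target)).netloc
        if host and host != our_host:
            third_parties.add(host)

print(sorted(third_parties))   # e.g. s7.addthis.com and friends, if present
```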

And so every user of the catalog could sue Innovative for $500 each, plus legal fees.

The only out is "if the user has given his or her informed consent to the specific disclosure for the specific purpose." Having a terms of use and a privacy policy is usually not sufficient to achieve "informed consent".

Existing library privacy laws in NJ have reasonable exceptions for "proper operations of the library". This law does not have a similar exemption.
I urge Governor Christie to veto the bill and send it back to the legislature for improvements that take account of the realities of library websites and make it easier for internet bookstores and libraries to operate legally in the Garden State.

You can contact Gov. Christie's office using this form.

Update: Just talked to one of Nia Gill's staff; they're looking into it. Also updated to include the 2nd set of amendments.

Update 2: A close reading of the California law on which the NJ statute was based reveals that poor wording in section 4 is the source of the problem. In the California law, it's clear that it pertains only to the situation where a private entity is seeking discovery in a legal action, not when the private entity is somehow involved in providing the service.

Where the NJ law reads
A provider shall only disclose the personal information of a book service user to a government entity, other than a law enforcement entity, or to a person or private entity pursuant to a court order in a pending action brought by the government entity or by the person or private entity.

it's meant to read
In a pending action brought by the government entity other than a law enforcement entity, or by a person or by a private entity, a provider shall only disclose the personal information of a book service user to such entity or person pursuant to a court order. 

Tara Robertson: Reply from Vancouver Public Library Board re: new internet use policy

Wed, 2014-09-24 22:54

A few people were critical of my directness in my letter to the VPL board, so I was surprised to get a response. I have permission to post the reply I received here.  I’d love to know what other people think.

Dear Tara,

Thank you for your email dated August 26, to the VPL Board regarding the new VPL Policy.

VPL upholds high standards with regard to access to information and intellectual freedom. We have demonstrated this repeatedly in response to challenges to items in our collection and room rentals. The issue of public displays in a public space is a challenging one that raises unique issues that access to collections for personal private use does not.

Staff considered a multitude of options before and during the development of this policy solution, including all of the considerations you mentioned in your email – space design, equipment options, specific versus more general language. Ultimately, each of these solutions creates their own problems and it was determined that the approved approach, while not perfect, was the most appropriate given the library’s circumstances.

The Board agrees that implementation of the policy and appropriate training for staff will be critical to ensure that people’s rights to access content are not unreasonably restricted. Our professional librarians at VPL – who share common library professional values – have considerable experience in managing and balancing diverse values and public goods in policy and service. In fact, we have high confidence in our professional librarians’ ability to apply this policy in a nuanced and appropriate manner that does not unreasonably restrict access to content. We also all agree that the appropriate person to have this conversation are public service staff; however, there are occasional circumstances when Security staff are appropriate.

Staff will monitor the outcomes of this policy change and will report to the Board after a full  year of implementation. At that point, they may or may not recommend adjustments to the policy.

If you have any further questions, we invite you to connect with VPL management. We understand you have many personal contacts on the VPL management team who are always open to discussing matters related to the library with colleagues.

Sincerely,

Mary Lynn Baum
Chair, Vancouver Public Library Board
