Feed aggregator

Roy Tennant: Ambitious “Hydra-in-a-Box” Effort Funded by IMLS

planet code4lib - Wed, 2015-04-15 15:32

Those who have been paying attention to the cutting edge of digital libraries no doubt know about the Hydra project headed up by Stanford. Hydra is a digital repository system that is built using Ruby and is designed to accept the full range of digital object types that a large research library must manage. It is built on top of Fedora and Solr, with Blacklight as the default front-end, and one doesn’t normally associate ease of installation with a stack like that. Heck, you could spend a week just getting all of the dependencies installed, configured, and up and running.

So color me surprised when the Digital Public Library of America, Stanford University, and the DuraSpace organization announced that IMLS had awarded them a $2 million National Leadership Grant to develop “Hydra-in-a-Box”. Just as it sounds, the goal is to “build, bundle, and promote a feature-complete, robust digital repository that is easy to install, configure, and maintain—in short, a next-generation digital repository that will work for institutions large and small, and is capable of running as a hosted service.”

That is no small goal, and a laudable one at that. But…gosh. What a distance there is to travel to get there. The project has it pegged at 30 months, so nearly three years. That sounds about right, and so far Tom Cramer has built one of the most broad-based coalitions I’ve seen in academic libraries around Hydra, so you won’t find me betting against him. Especially since he just landed $2 million to help him build out his pet project. So as much as it pains this Cal Bear to say it, Go Stanford!

LITA: Bend Your Mind…and the Laws of the Universe: Adult Summer Reading 2015

planet code4lib - Wed, 2015-04-15 14:50

Summer is right around the corner, and a long-held tradition in the public library community is the summer reading program. Though synonymous with youth and young adult services, summer reading is worth a revisit by adults.

Texas State Library and Archives Commission (2009). Flickr


Science fiction is a gateway

I believe there is a positive correlation between reading science fiction novels and genuine interest in emerging technology. When I was younger, I loved science fiction and fantasy. My interests range from A Princess of Mars to The Hitchhiker’s Guide to the Galaxy. The Twilight Zone was a mark of my childhood. What I read and watched informed my psyche and furthered my interests in futuristic technology that modern humans could only dream of. The bottom line is that these books sparked an interest. Almost all tech heads I know love science fiction and fantasy. Not everyone is into books, but most science fiction films are based on alternate worlds created by authors like Isaac Asimov and Philip K. Dick. Authors of science fiction and fantasy push the envelope on physics, technology, psychology and history. These novels take place in the “future” or a fictional past, or serve as social commentary. They can be cautionary tales or an impetus for the reader to become proactive in current affairs. I’m sure no one wants to live in a world similar to Pat Frank’s Alas, Babylon.

A few suggestions for your reading list

In 2011 NPR published a fan-selected list of the top 100 science-fiction and fantasy books for summer reading. Selecting the best science fiction/fantasy book of all time may be a point of contention amongst staunch fans, but trying to settle it is impractical.

I went ahead and selected my favorites from NPR’s list as suggestions for summer reading. There are a few that are on my personal reading wish list and many are on my re-read wish list. Which eager reader doesn’t have a wish list?


The classics:

If you went to high school in the United States, you were probably forced to read these. You probably had to analyze the themes, tone, characters, etc. As a result the mere mention of them is trite, but they more than deserve their place on this list.

1984 by George Orwell

Fahrenheit 451 by Ray Bradbury

Brave New World by Aldous Huxley

Slaughterhouse-Five by Kurt Vonnegut

Frankenstein by Mary Shelley


The epics:

Some of the best science-fiction/fantasy books are set in a universe so vast that they require reader commitment and the ability to lift a ten-pound book. Though your eyes may be weary, you won’t be at a loss for the possibilities that are illuminated through the text.

The Lord of the Rings by J.R.R. Tolkien

Dune by Frank Herbert

Foundation by Isaac Asimov

A Game of Thrones by George R.R. Martin

The Giver by Lois Lowry (not on NPR’s list)

A Princess of Mars by Edgar Rice Burroughs (not on NPR’s list)


Notable mention:

Do Androids Dream of Electric Sheep? by Philip K. Dick

The Andromeda Strain by Michael Crichton

The Gunslinger (The Dark Tower Series) by Stephen King

Outlander by Diana Gabaldon

1632 by Eric Flint

The Body Snatchers by Jack Finney


Now that I’ve performed my reader’s advisory, what’s on your summer reading list? If you have any recommendations, reply to this post to share with others.

Code4Lib: Code4Lib Journal #28: Special Issue on Diversity in Library Technology

planet code4lib - Wed, 2015-04-15 14:38
Topic: journal

Issue 28 of the Code4Lib Journal, Special Issue on Diversity in Library Technology, organized by eight special guest editors and two regular editors, is now available.

Thom Hickey: VIAF RDF Changes

planet code4lib - Wed, 2015-04-15 14:28

Here's a contribution from Jeff Young, who manages the RDF aspects of VIAF:

Since Wikidata’s introduction to the Linked Data Web in 2014 and subsequent integration of Freebase, it has become a premier example of how to publish and manage Linked Data. Like VIAF, Wikidata uses schema.org as its core RDF vocabulary, and both datasets publish using Linked Data best practices. This consistency should allow applications to treat both datasets as complementary. The main difference will be in the coverage of entities/information, based on their respective sources.

The VIAF RDF changes outlined on the Developer Network blog are intended to further enrich and align the common purpose. Some of the VIAF changes provide additional information to help disambiguate entities, such as schema:location and schema:description. Where possible, schema:names are now language tagged, which should make it easier for applications to select a language-appropriate label for display.
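As a rough illustration of what language tagging makes possible, here is a minimal sketch of how an application might choose a display label from a set of language-tagged schema:name values. The function name `pick_label`, the `(text, language_tag)` tuple shape, and the sample names are all assumptions made up for this example; they are not part of VIAF's output or of any RDF library.

```python
def pick_label(names, preferred_langs=("en",)):
    """Pick a display label from (text, language_tag) pairs.

    `names` stands in for the schema:name literals of one entity; a tag of
    None marks an untagged literal. This is a hypothetical helper.
    """
    by_lang = {}
    for text, lang in names:
        key = lang.lower() if lang else None
        # Keep the first value seen for each language tag.
        by_lang.setdefault(key, text)

    # Try the caller's languages in order of preference.
    for lang in preferred_langs:
        if lang.lower() in by_lang:
            return by_lang[lang.lower()]

    # Fall back to an untagged name, then to any name at all.
    if None in by_lang:
        return by_lang[None]
    return next(iter(by_lang.values()), None)


# Made-up sample values, purely for illustration.
sample_names = [
    ("Example Name", "en"),
    ("Nom d'exemple", "fr"),
    ("Beispielname", "de"),
]

label = pick_label(sample_names, preferred_langs=("fr", "en"))
```

Before language tags were available, an application would have had to guess the language of each name string; with tags, the choice reduces to a lookup like the one above.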

The biggest change, though, is in the “shape of the data” that gets returned via Linked Data requests. Previously, this was a record-oriented view rather than a concise description of the entity. Like Wikidata, the new response will focus on the entity itself and depend on the related entities to describe themselves.

Alignment with Wikidata is a major step in the evolution of VIAF, which started with RDF/XML representations of name authority clusters in 2009 and transitioned to “primary entities” in 2011. The introduction of VIAF as schema.org in 2014 extends the audience, and integration with Wikidata further strengthens industry standard practices. These steps should help ensure that VIAF remains an authoritative source of entity identifiers and information in the linked web of data.


Note: We expect these RDF changes to be visible on April 16, 2015.  The bulk distribution will follow shortly after that.


DPLA: Far-reaching “Hydra-in-a-Box” Joint Initiative Funded by IMLS

planet code4lib - Wed, 2015-04-15 13:30

Boston, MA – The Digital Public Library of America (DPLA), Stanford University, and the DuraSpace organization are pleased to announce that their joint initiative has been awarded a $2M National Leadership Grant from the Institute of Museum and Library Services (IMLS). Nicknamed Hydra-in-a-Box, the project aims to foster a new, national, library network through a community-based repository system, enabling discovery, interoperability and reuse of digital resources by people from this country and around the world.

This transformative network is based on advanced repositories that not only empower local institutions with new asset management capabilities, but also interconnect their data and collections through a shared platform.

“At the core of the Digital Public Library of America is our national network of hubs, and they need the systems envisioned by this project,” said Dan Cohen, DPLA’s executive director. “By combining contemporary technologies for aggregating, storing, enhancing, and serving cultural heritage content, we expect this new stack will be a huge boon to DPLA and to the broader digital library community. In addition, I’m thrilled that the project brings together the expertise of DuraSpace, Stanford, and DPLA.”

Each of the partners will fulfill specific roles in the joint initiative. Stanford will use its existing leadership in the Hydra Project to develop core components, in concert with the broader Hydra community. DPLA will focus on the connective tissue between hubs, mapping, and crosswalks to DPLA’s metadata application profile, and infrastructure to support metadata enhancement and remediation. DuraSpace will use its expertise in building and serving repositories, and doing so at scale, to construct the back-end systems for Hydra hosting.

“DuraSpace is excited to provide the infrastructure for this project,” said Debra Hanken Kurtz, DuraSpace CEO. “It aligns perfectly with our mission to steward the scholarly and cultural heritage records and make them accessible for current and future generations. We look forward to working with DPLA and Stanford to support their work and that of the community to ensure a robust and sustainable future for Hydra-in-a-Box.”

Over the project’s 30-month time frame, the partners will engage with libraries, archives, and museums nationwide, especially current and prospective DPLA hubs and the Hydra community, to systematically capture the needs for a next-generation, open source, digital repository. They will collaboratively extend the existing Hydra project codebase to build, bundle, and promote a feature-complete, robust digital repository that is easy to install, configure, and maintain—in short, a next-generation digital repository that will work for institutions large and small, and is capable of running as a hosted service. Finally, starting with DPLA’s own metadata aggregation services, the partners will work to ensure that these repositories have the necessary affordances to support networked aggregation, discovery, management and access to these resources, producing a shared, sustainable, nationwide platform.

“The Hydra Project has already demonstrated enormous traction and value as a best-in-class digital repository for institutions like Stanford,” said Tom Cramer, Chief Technology Strategist at the Stanford University Libraries. “And yet there is so much more to do. This grant will provide the means to rapidly accelerate Hydra’s rate of development and adoption–expanding its community, features and value all at once.”

To find out more about the Hydra-in-a-Box initiative, contact Dan Cohen, Tom Cramer, or Debra Hanken Kurtz.

About DPLA

The Digital Public Library of America strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science. Since launching in April 2013, it has aggregated over 8.5 million items from over 1,700 institutions. The DPLA is a registered 501(c)(3) non-profit.

About DuraSpace

DuraSpace is an independent 501(c)(3) not-for-profit organization providing leadership and innovation for open technologies that promote durable, persistent access to digital data. We collaborate with academic, scientific, cultural, and technology communities by supporting projects (DSpace, Fedora, VIVO) and creating services (DuraCloud, DSpaceDirect, ArchivesDirect) to help ensure that current and future generations have access to our collective digital heritage. Our values are expressed in our organizational byline, “Committed to our digital future.”

About Stanford University Libraries

The Stanford University Libraries is internationally recognized as a leader among research libraries, and in leveraging digital technology to support scholarship in the age of information. It is a founder of both the Hydra Project and the Fedora 4 repository effort, and a leading institution in the International Image Interoperability Framework (IIIF).

About the Hydra Project

The Hydra Project is both an open source community and a suite of software that provides a flexible and robust framework for managing, preserving, and providing access to digital assets. The project motto, “One body, many heads,” speaks to the flexibility provided by Hydra’s modern, modular architecture, and the power of combining a robust repository backend (the “body”) with flexible, tailored, user interfaces (“heads”). Co-designed and developed in concert with Fedora 4, the extensible, durable, and widely used repository software, the Hydra/Fedora stack is the centerpiece of a thriving and rapidly expanding open source community poised to offer an easy-to-implement solution.

FOSS4Lib Recent Releases: Mirador - 2.0

planet code4lib - Wed, 2015-04-15 12:51

Last updated April 15, 2015. Created by Peter Murray on April 15, 2015.

Package: Mirador
Release Date: Tuesday, April 14, 2015

FOSS4Lib Recent Releases: pycounter - 0.5a2

planet code4lib - Wed, 2015-04-15 12:48

Last updated April 15, 2015. Created by Peter Murray on April 15, 2015.

Package: pycounter
Release Date: Monday, April 6, 2015

Code4Lib Journal: Special Issue on Diversity in Library Technology Guest Editorial Committee

planet code4lib - Wed, 2015-04-15 10:54
The guest editorial committee for Code4Lib Journal’s Special Issue on Diversity in Library Technology (issue 28) was developed in order to include new voices and perspectives on the journal’s practices and how they support inclusivity. The committee is comprised of eight guest editors and two regular editorial committee members. More information on the development of […]

Code4Lib Journal: Finding and Supporting New Voices: Code4Lib Journal’s Issue 28 on Diversity in Library Technology

planet code4lib - Wed, 2015-04-15 10:54
Welcome to Code4Lib Journal’s special issue on diversity in library technology. As C4LJ’s first-ever special issue, 28 brings together a plethora of voices from the library tech world in order to approach the challenge of inclusivity within our field from all directions. Over a year of development has gone into this project, which has involved […]

Code4Lib Journal: Feminism and the Future of Library Discovery

planet code4lib - Wed, 2015-04-15 10:54
This paper discusses the various ways in which the practices of libraries and librarians influence the diversity (or lack thereof) of scholarship and information access. We examine some of the cultural biases inherent in both library classification systems and newer forms of information access like Google search algorithms, and propose ways of recognizing bias and applying feminist principles in the design of information services for scholars, particularly as libraries re-invent themselves to grapple with digital collections.

Code4Lib Journal: How to Hack it as a Working Parent

planet code4lib - Wed, 2015-04-15 10:54
The problems faced by working parents in technical fields in libraries are not unique or particularly unusual. However, the cross-section of work-life balance and gender disparity problems found in academia and technology can be particularly troublesome, especially for mothers and single parents. Attracting and retaining diverse talent in work environments that are highly structured or with high expectations of unstated off-the-clock work may be impossible long term. (Indeed, it is not only parents that experience these work-life balance problems but anyone with caregiver responsibilities such as elder or disabled care.) Those who have the energy and time to devote to technical projects for work and fun in their off-work hours tend to get ahead. Those tied up with other responsibilities or who enjoy non-technical hobbies do not get the same respect or opportunities for advancement. Such problems mirror the experiences of women on the tenure track in academia, particularly women working in libraries, and they provide a useful corollary for this discussion. We present some practical solutions for those in technical positions in libraries. Such solutions involve strategic use of technical tools, and lightweight project management applications. Technical workarounds are not the only answer; real and lasting change will involve a change in individual priorities and departmental culture such as sophisticated and ruthless time management, reviewing workloads, cross-training personnel, hiring contract replacements, and creative divisions of labor. Ultimately, a flexible environment that reflects the needs of parents will help create a better workplace culture for everyone, kids or no kids.

Code4Lib Journal: But Then You Have to Make It Happen

planet code4lib - Wed, 2015-04-15 10:54
Librarianship as a profession has a strong commitment to diversity and tends to attract professionals ethically inclined to champion inclusion. The authors, both from historically underrepresented populations in library information technology, have a half-century of combined experience in the field and have held positions ranging from technician, systems librarian, instructional technologist, head of circulation, and digital scholarship and services librarian to associate dean in an academic library. The authors share their experiences and discuss how diversity and inclusion must be embraced at the individual level in order to develop a culture of diversity within an organization and to attract and retain diverse technology teams. Internal commitments to supporting a diverse environment are ultimately critical to recognizing, assessing, and fulfilling the needs of patrons. The authors identify and detail individual and grassroots efforts that have led to library technology programming for underserved populations, including programs involving outreach to diverse student and prospective student communities over the course of their careers. They reflect on strategies to create and retain a diverse technology group within the library and to advance and support diversity within the day-to-day work environment. They posit that a mix of experiences is necessary to advocate for access to underrepresented patron populations and to negotiate and implement a truly diverse environment with regard to ethnicity, gender, age, and socioeconomic background.

Code4Lib Journal: Code as Code: Speculations on Diversity, Inequity, and Digital Women

planet code4lib - Wed, 2015-04-15 10:54
All technologies are social. Taking this socio-technological position becomes less a political stance than a necessity when considering the lived experience of digital inequity, divides, and –isms as they are encountered in everyday library work spheres. Personal experience as women and women of color in our respective technological and leadership communities provides both fore- and background to explore the private-public lines delineating definitions of “diversity”, “inequity”, and digital literacies in library practice. We suggest that by not probing these definitions at the most personal level of lived experience, we in the LIS and technology professions will remain well-intentioned, but ineffective, in genuine inclusion.

Code4Lib Journal: User Experience is a Social Justice Issue

planet code4lib - Wed, 2015-04-15 10:54
When we're building services for people, we often have a lot more practice seeing from the computer's point of view than seeing from another person's point of view. The author asks the library technology community to consider several case studies in this problem, including their root causes, and the negative impact of this problem on achieving our mission as library technologists. The author then recommends specific actions that we, as individual contributors and organizations, can take to increase our empathy and improve the user experience we provide to patrons.

Code4Lib Journal: Recognizing Cultural Diversity in Library Interface Development

planet code4lib - Wed, 2015-04-15 10:54
The rapid increase in complex library digital infrastructures has enabled a more full-featured set of resources to become accessible by autonomous users, whether onsite or remote. However, this trend also necessitates careful consideration of the usability of new interfaces for populations with increasing cultural, geographic, and socioeconomic diversity. Researcher Aaron Marcus has become an authority on how cultural principles affect interface perceptions and inform their development. This article will explore Marcus’ work to contextualize diversity issues within usability before exploring the redevelopment strategy for the New York University Libraries’ web presence, which serves a broad and global set of users.

Code4Lib Journal: Transforming Knowledge Creation: An Action Framework for Library Technology Diversity

planet code4lib - Wed, 2015-04-15 10:54
This paper will articulate an action framework for library technology diversity consisting of five dimensions and based on the vision for knowledge creation, the academic library’s fundamental vision. The framework focuses on increasing diversity for library technology efforts based on the desire for transformation and inclusiveness within and across the dimensions. The dimensions are people, content and pedagogy, embeddedness and the global perspective, leadership, and the 5th dimension – bringing it all together.

Code4Lib Journal: “What If I Break It?”: Project Management for Intergenerational Library Teams Creating Non-MARC Metadata

planet code4lib - Wed, 2015-04-15 10:54
Libraries are constantly challenged to meet new user needs and to provide access to new types of materials. We are in the process of launching many new technology-rich initiatives and projects which require investments of staff time, a resource which is at a premium for most new library hires. We simultaneously have people on staff in our libraries with more traditional skill sets who may be able to contribute time and theoretical expertise to these projects, but require training. Incorporating these “seasoned” employees into new initiatives can be a daunting task. In this article, I will share some of the strategies I have used as a metadata project manager for bridging diverse generations of library staff who have various levels of comfort and expertise with technology, and strategies that I have used to reduce the barriers to participation for staff with diverse perspectives and skill sets. These strategies can also be helpful in assisting a new librarian with technology-rich skill sets to more successfully orient themselves when embedded in a “traditional” library setting.

David Rosenthal: The Maginot Paywall

planet code4lib - Tue, 2015-04-14 21:55
Two recent papers examine the growth of peer-to-peer sharing of journal articles. Guillaume Cabanac's Bibliogifts in LibGen? A study of a text-sharing platform driven by biblioleaks and crowdsourcing (LG) is a statistical study of the Library Genesis service, and Carolyn Caffrey Gardner and Gabriel J. Gardner's Bypassing Interlibrary Loan via Twitter: An Exploration of #icanhazpdf Requests (TW) is a similar study of one of the sources for Library Genesis. Both implement forms of Aaron Swartz's Guerilla Open Access Manifesto, a civil disobedience movement opposed to the malign effects of current copyright law on academic research. Below the fold, some thoughts on the state of this movement.

In the years leading up to WWII, the French built the Maginot Line as an impregnable barrier against a German invasion:
While the fortification system did prevent a direct attack, it was strategically ineffective, as the Germans invaded through Belgium, going around the Maginot Line.

Copyright maximalists, such as the major academic publishers, are in a similar position. The more effective and thus intrusive the mechanisms they implement to prevent unauthorized access, the more they incentivize "guerilla open access".

Some copyright owners are coming to terms with this phenomenon. Today, Hugh Pickens reports that the first 4 of the 10 episodes of Game of Thrones' new season have leaked:
The episodes have already been downloaded almost 800,000 times, and that figure was expected to blow past a million downloads by the season 5 premiere. Game of Thrones has consistently set records for piracy, which has almost been a point of pride for HBO. "Our experience is [piracy] leads to more penetration, more paying subs, more health for HBO, less reliance on having to do paid advertising. If you go around the world, I think you're right, Game of Thrones is the most pirated show in the world. Well, you know, that's better than an Emmy."

LG shows the massive scale on which "guerilla open access" is happening in the field of academic journals. As of the study, Library Genesis hosted nearly 23M articles identified by DOI, 15TB of data. The distribution was heavily skewed to the major publishers, representing 77% of Elsevier's DOIs, 73% of Wiley's and 53% of Springer's, although only 36% of all DOIs. To give some idea of the scale, this is about 60% of the size of Ontario's Scholars Portal, which has 38M.

Although some open access DOIs are included, the motivation to upload them is much less. A recent estimate by Khabsa and Giles is that 24% of all articles are openly accessible on the Web; their methodology excluded most content from Library Genesis. Not all DOIs from major publishers are paywalled; they publish some open access journals and allow Gold open access (author pays) in some cases. Despite these elements of double counting, it appears likely that at least a majority of all articles, and significantly more than a majority of major publisher articles, can be accessed without passing through a paywall.
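One way to read the arithmetic behind that "majority" claim is sketched below. It takes the two percentages quoted above (24% openly accessible on the Web, 36% of all DOIs in Library Genesis) and treats the double counting as an unknown overlap; that framing, and the names used, are assumptions for illustration rather than a calculation from either paper.

```python
# Figures quoted in the text, in percentage points of all articles.
OPEN_WEB_PCT = 24   # estimated share openly accessible on the Web
LIBGEN_PCT = 36     # share of all DOIs held by Library Genesis


def combined_share_pct(overlap_pct):
    """Share reachable without a paywall if `overlap_pct` points are counted twice."""
    return OPEN_WEB_PCT + LIBGEN_PCT - overlap_pct


# With no double counting at all, the two sources cover 60% of articles.
upper_bound = combined_share_pct(0)

# The combined share stays above 50% as long as the overlap is under this value.
max_overlap_for_majority = OPEN_WEB_PCT + LIBGEN_PCT - 50

print(upper_bound, max_overlap_for_majority)
```

On this reading, the combined share remains a majority provided fewer than 10 percentage points of articles are counted in both figures.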

Although the bulk of the Library Genesis content arrived via a small number of large uploads, the median upload rate is 2720 new articles/day. Among the sources for them are:
  • The Scholar subreddit, which LG estimates sees about 45 requests/day for articles to be shared via Library Genesis.
  • Sci-Hub, a service using proxies running on networks with subscriptions to paywalled publishers that allows users to enter a DOI. If the article is not available from Library Genesis, the service tries proxies at random until one is found that can access the paper, which is both served to the user and added to Library Genesis.

Presumably, the #icanhazpdf hashtag is another of the Library Genesis upload paths. TW analyzed 824 requests from 475 users over 3 months, or about 10/day. 674 of them were for articles, from 493 different journal titles. The mechanism doesn't provide information about how many were satisfied, or how many of the results ended up on Library Genesis.

LG doesn't have an estimate of the Sci-Hub traffic, but unless it is very large there must be other mechanisms filling the large gap between the Scholar subreddit and #icanhazpdf rates and the Library Genesis median upload rate.
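The size of that gap can be worked out from the figures quoted above. The short sketch below does so, assuming "3 months" means 90 days and treating every request as if it produced an upload, which overstates the contribution of both sources.

```python
# Figures quoted in the text.
MEDIAN_UPLOADS_PER_DAY = 2720      # Library Genesis median upload rate
SCHOLAR_SUBREDDIT_PER_DAY = 45     # LG's estimate of Scholar subreddit requests
ICANHAZPDF_REQUESTS = 824          # requests analyzed by TW
ICANHAZPDF_DAYS = 90               # assumption: "3 months" taken as 90 days

# Daily #icanhazpdf request rate implied by the TW sample.
icanhazpdf_per_day = ICANHAZPDF_REQUESTS / ICANHAZPDF_DAYS

# Upper bound on uploads from the two measured sources combined.
known_per_day = SCHOLAR_SUBREDDIT_PER_DAY + icanhazpdf_per_day

# Uploads per day that must come from Sci-Hub or other mechanisms.
gap_per_day = MEDIAN_UPLOADS_PER_DAY - known_per_day
known_share_pct = 100 * known_per_day / MEDIAN_UPLOADS_PER_DAY

print(f"#icanhazpdf: {icanhazpdf_per_day:.1f}/day")
print(f"measured sources: {known_per_day:.1f}/day ({known_share_pct:.1f}% of median)")
print(f"unexplained: {gap_per_day:.0f}/day")
```

Under these assumptions the two measured request streams account for only about 2% of the median daily upload rate.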

Admittedly, it takes time for newly published articles to appear outside their paywalls. Some publishers operate "moving walls", so their articles become open access after an embargo period. It takes time for the various mechanisms driving Library Genesis to locate and upload articles. LG shows that their most recent year (2013) has only about half as many articles as the previous year, so the average delay is similar to the moving wall.

Paying to pass through paywalls thus delivers some value, not just access to a minority of the content but also more timely access to some of the majority. Nevertheless, the multi-billion dollar profits of the major publishers, let alone the other multiple billions that represent their costs in supplying their services, are hard to justify. We have already seen that their peer review process fails in its assigned role of ensuring the quality of the papers they publish. Now we see that the majority of the content for which they charge these enormous sums is available without payment.

My previous posts on scholarly communication.

Meredith Farkas: Sinners, saints, and social media take-downs

planet code4lib - Tue, 2015-04-14 21:39

I hate one-dimensional characters in movies and TV. I love complex characters who have good qualities and bad. I like that “The Good Wife” actually isn’t really such a paragon of moral virtue at all. That she has made questionable decisions and struggles with things, just like we all do. I like how many of the “villains” on that show do monstrous things, but still have likable qualities and people they love and who love them in turn. I’m glad we’re seeing more and more shows like that, where characters are as flawed and three-dimensional as we all are.

Yet there seems to be something in us that likes to simplify things when it comes to judging real people. Someone is either good or bad. On the side of right or on the side of evil. And there’s a tendency to either vilify people or put them on a pedestal. But the world is not so black-and-white.

I think few things have made that tendency to simplify as clear to me as the whole Joe Murphy vs. #Teamharpy lawsuit and social media debacle. It seemed like the dominant narrative either had to be that Lisa Rabey and nina de jesus were heroes and saints and Joe Murphy was a monster, or that Joe Murphy was a saint and poor innocent victim and Lisa Rabey and nina de jesus were monsters. I personally don’t believe either is true. Joe Murphy is not a saint, but he has had his reputation damaged (maybe fatally in our profession) for something there may be no evidence of him having done. Calling someone a sexual predator without first-hand knowledge or evidence that they are one (and I’m not saying that victims need to have evidence) seems like a shitty thing to do. But, given the number of negative things I’d heard about Joe from other librarians prior to all this, I’m assuming (and hoping) that Lisa thought she was doing something good in warning people about him.

I’m writing this knowing that I will probably be trolled by someone for it, but c’est la vie. I’m disturbed by the fact that, after all of the petitions, and Facebook drama, and blog posts, and tweets about this, no one seems to be talking about it (other than right-wing feminist-hating nut-jobs) since the lawsuit was settled and Lisa and nina published retractions. We shouldn’t let right-wing feminist-hating nut-jobs control the narrative. And we also should be willing to admit when we were wrong and/or stand up for our beliefs if we feel we are right.

When I first wrote a post about all this, social media had been relatively quiet about it. I think there had been a couple of blog posts and the Team Harpy WordPress site was up, but nothing with a lot of vitriol had come out. Most of the rhetoric seemed focused generally on how common sexual harassment is — even in our female-dominated profession — and how important it is that there are whistleblowers who speak out about that behavior. There were posts about the importance of believing victims and supporting whistleblowers. I’d say that people were generally supportive of Lisa and nina, but were not necessarily assuming that Joe was what they said he was.

Soon after, the discussion took a turn for the bizarre, at least to me. The conversation around Joe on Facebook and Twitter became intensely vitriolic, with plenty of people arguing his guilt as if they had inside information. Respected library administrators who have never met Joe were calling him a “douchebag” on Twitter. There was a petition asking him to drop his lawsuit, apologize to nina and Lisa, and compensate them. It was signed by over 1,000 people, including many people I like and respect. I did not sign it. I found it really odd that no one was considering the fact that he might be the victim in this. Instead, Lisa and nina were treated like victims, when, if they did harm his career without any evidence of a crime, they were very much in the wrong. I find it difficult to believe that over 1,000 other people knew for a fact that he actually was a sexual predator.

It seemed more like people thought he was wrong to have sued them. If someone publicly accused me of a terrible crime with no evidence and damaged my career, wouldn’t I be the injured party and shouldn’t I be able to seek damages in a court of law? The idea that he was squashing their free speech rights was ridiculous. If it’s not true that Joe is a sexual predator, it is slander. It’s one thing to say Joe Murphy is a jerk. That is opinion. But stating that someone is factually something that they don’t know is true is not protected speech. Destroying someone’s reputation is a tremendous and personal violation of another human being. But maybe he deserved it because he was a player and a flirt? How is that any different than “slut-shaming?” I found it disturbing that none of the people I like and respect seemed to be acknowledging this. But maybe everyone but me knew for a fact that it was true?

I don’t like Joe Murphy. I still feel about him exactly the way I did when I wrote my first post. But, as I mentioned then, I think the fact that he was disliked by so many people made it easy for folks to believe him to have done it (and he might consider why so many people were saying awful things about him behind his back, because it’s not just “haters gonna hate”). We’ve all seen the delight people feel when someone powerful (or someone who is perceived as being privileged) is taken down. I’ve been reading a lot about Jon Ronson’s new book So You’ve Been Publicly Shamed and am looking forward to reading it and learning more about this strange and all-too-common social phenomenon.

In addition to the fact that plenty of people wanted to see him taken down a peg, this was happening at a time when things like gamergate and the recent conversations, articles, and presentations about sexual harassment in librarianship were shining a pretty bright light on this issue. I think people wanted to show their support for women who have been the victims of sexual harassment and this lawsuit gave our community an opportunity to come together to do that.

But let’s remember something here: nina and Lisa were not sexually harassed by Joe Murphy. That was never what anyone was claiming. But many people behaved like Joe was suing the victims of harassment. No. He was suing people who were reporting something they said they’d heard. This wasn’t about believing the victims of sexual harassment. They may have believed they were doing the right thing, but they weren’t harassed by Joe prior to posting what they did.

Now the tide has shifted and the trolls are attacking nina, Lisa, and their supporters (including me, though I wasn’t actually a supporter). I can’t even blame Joe much for engaging in a bit of schadenfreude now (I’ve seen him favoriting some of the trolling tweets his lawyers have been shooting out to me and others). I can’t fathom the suffering he must have endured through all this. I can’t imagine how demoralizing it must have been to have more than 1,000 people in our profession signing a petition against him. But sadly, because he’s put on the mantle of the innocent victim and good-guy, I doubt very much that he is going to examine the behavior that got him here (and I don’t mean the lawsuit).

And that’s the rub. How do we call people like Joe on their shit in a way that might actually create change? Calling them a sexual predator on Twitter without evidence is clearly not it. I believe in the power of social media for good, but I haven’t seen a lot of good come out of it when it comes to calling out powerful men for bad behavior, because many then just position themselves as victims. Has public shaming really ever worked to meaningfully change people’s behavior (again, gotta read Ronson’s book)? But the “whisper network” doesn’t work either. People were saying lots of things about Joe, but the information wasn’t getting to people in power or maybe even Joe himself. Maybe he didn’t know how a lot of people felt about him. I have no idea.

Still, the greatest tragedy here, in my opinion, is that so many women suffer sexual harassment and most of the time the perpetrators get away with it. And this whole sordid affair did little to help the cause of encouraging women to come forward. I’ve been sexually harassed and stalked and never reported any of it. But it was when a faculty member at a former job, who used to stand too close to me and would sometimes put his arm around my waist, later escalated to grabbing a colleague’s breasts that I realized my silence was hurting other women. Because men who do things like this don’t just do it once. If they get away with something that you consider too minor to report, they may escalate to doing something much worse to someone else. We have to find more ways to help women feel safe reporting harassment. I’m happy that more conferences now have codes of conduct and discernible methods of reporting inappropriate behavior, and that will help, but it’s not enough.

I don’t have anything positive to end with here, so I’ll close with an excerpt from an interview with Jon Ronson where he talks about a situation where a guy at a conference was shamed on social media after a woman tweeted about an off-color joke he made, and then she was horribly trolled after he said he lost his job because of it. See any parallels?

The strange thing is the impulse to shame often comes from a good place. Like the desire to confront sexism, say. A good example is the tech conference incident: Hank whispers a naff joke about ‘big dongles’ to his friend, Adria hears it and takes offence, posts something on Twitter and the whole thing snowballs.

Ronson: Yeah, everybody involved in that shaming is doing it for social justice reasons. So Adria feels that in calling out Hank she’s calling out a greater truth: that privileged white men don’t know the effect they have on other people. The trolls think they’re doing the right thing because they feel Adria robbed Hank of his employment – so they wanted to get back at her. Everybody involved in that story feels the urge to be a good person – and it’s carnage all round. Everyone is broken by the experience; especially Adria, she has it worse than anybody. I mean, I’m on Hank’s side. Nobody wants to live in a world where you can’t make a dongle joke! But by the end of the story, Hank’s okay, he’s got a new job, but Adria’s unemployed and subjected to death threats. So Adria’s view of the world feels vindicated.

Ed Summers: Tweets and Deletes

planet code4lib - Tue, 2015-04-14 16:20

Archives are full of silences. Archivists try to surface these silences by making appraisal decisions about what to collect and what not to collect. Even after they are accessioned, records can be silenced by culling, weeding and purging. We do our best to document these activities, to leave a trail of these decisions, but they are inevitably deeply contingent. The context for the records and our decisions about them unravels endlessly.

At some point we must accept that the archival record is not perfect, and that it’s a bit of a miracle that it exists at all. But in all these cases it is the archivist who has agency: the deliberate or subliminal decisions that determine what comprises the archival record are enacted by an archivist. In addition the record creator has agency, in their decision to give their records to an archive.

Perhaps I’m over-simplifying a bit, but I think there is a curious new dynamic at play in social media archives, specifically archives of Twitter data. I wrote in a previous post about how Twitter’s Terms of Service prevent distribution of Twitter data retrieved from their API, but do allow for the distribution of Tweet IDs and relatively small amounts of derivative data (spreadsheets, etc).

Tweet IDs can then be hydrated, or turned back into raw original data, by going back to the Twitter API. If a tweet has been deleted you cannot get it back from the API. The net effect this has is of cleaning, or purging, the archival record as it is made available on the Web. But the decision of what to purge is made by the record creator (the creator of the tweet) or by Twitter themselves in cases where tweets or users are deleted.
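The hydration step described above can be sketched roughly as follows. This is a minimal illustration, not twarc's actual implementation: the `lookup` callable is a hypothetical stand-in for a request to the Twitter API, the batch size of 100 is an assumption exposed as a parameter, and the in-memory `LIVE_TWEETS` data is invented purely so the example runs without network access. The point it shows is that any ID the lookup does not return simply drops out of the rehydrated record.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Tweet = Dict[str, str]


def batches(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` IDs."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def hydrate(
    ids: Iterable[str],
    lookup: Callable[[List[str]], Dict[str, Tweet]],
    batch_size: int = 100,
) -> Tuple[Dict[str, Tweet], Set[str]]:
    """Return (tweets keyed by ID, IDs that could not be retrieved)."""
    hydrated: Dict[str, Tweet] = {}
    missing: Set[str] = set()
    for batch in batches(ids, batch_size):
        # Hypothetical API call: maps each still-available ID to its data.
        found = lookup(batch)
        hydrated.update(found)
        # IDs that were requested but not returned are deleted or unavailable.
        missing.update(i for i in batch if i not in found)
    return hydrated, missing


# Invented stand-in for the API: tweet "2" has been deleted.
LIVE_TWEETS: Dict[str, Tweet] = {"1": {"text": "first"}, "3": {"text": "third"}}


def fake_lookup(batch: List[str]) -> Dict[str, Tweet]:
    return {i: LIVE_TWEETS[i] for i in batch if i in LIVE_TWEETS}


hydrated, missing = hydrate(["1", "2", "3"], fake_lookup)
print(sorted(hydrated), sorted(missing))
```

Note that the sketch can only report which IDs are missing, not why: a deleted tweet and a deleted or suspended account look the same from this side.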

For example, let’s look at the collection of Twitter data that Nick Ruest has assembled in the wake of the attack on the offices of Charlie Hebdo earlier this year. Nick collected 13 million tweets mentioning four hashtags related to the attacks, for the period of January 9th to January 28th, 2015. He has made the tweet IDs available as a dataset for researchers to use (a separate file for each hashtag). I was interested in replicating the dataset for potential researchers at the University of Maryland, but also in seeing how many of the tweets had been deleted.

So on February 20th (42 days after Nick started his collecting) I began hydrating the IDs. It took 4 days for twarc to finish. When it did I counted up the number of tweets that I was able to retrieve. The results are somewhat interesting:

| hashtag | archived tweets | hydrated | deletes | percent deleted |
|---|---|---|---|---|
| #JeSuisJuif | 96,518 | 89,584 | 6,934 | 7.18% |
| #JeSuisAhmed | 264,097 | 237,674 | 26,423 | 10.01% |
| #JeSuisCharlie | 6,503,425 | 5,955,278 | 548,147 | 8.43% |
| #CharlieHebdo | 7,104,253 | 6,554,231 | 550,022 | 7.74% |
| Total | 13,968,293 | 12,836,767 | 1,131,526 | 8.10% |

It looks like 1.1 million tweets out of the 13.9 million tweet dataset have been deleted. That’s about 8.1%. I suspect now even more have been deleted. While the dataset itself is significantly smaller, the percentage of deletes for #JeSuisAhmed (10.01%) is quite a bit higher than for #JeSuisCharlie (8.43%) and #CharlieHebdo (7.74%); #JeSuisJuif, at 7.18%, is actually the lowest of the four. Could it be that users were concerned about how their tweets would be interpreted by parties analyzing the data?
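The figures above follow from simple arithmetic on the archived and hydrated counts. As a small sketch of that calculation (the counts are taken from the results above; the function and variable names are made up for illustration):

```python
from typing import Dict, Tuple

# (archived tweets, hydrated tweets) per hashtag, from the results above.
COUNTS: Dict[str, Tuple[int, int]] = {
    "#JeSuisJuif": (96_518, 89_584),
    "#JeSuisAhmed": (264_097, 237_674),
    "#JeSuisCharlie": (6_503_425, 5_955_278),
    "#CharlieHebdo": (7_104_253, 6_554_231),
}


def deletion_stats(archived: int, hydrated: int) -> Tuple[int, float]:
    """Return (number of deletes, percent of archived tweets deleted)."""
    deletes = archived - hydrated
    return deletes, round(100 * deletes / archived, 2)


for hashtag, (archived, hydrated) in COUNTS.items():
    deletes, percent = deletion_stats(archived, hydrated)
    print(f"{hashtag}: {deletes:,} deletes ({percent:.2f}%)")

# Totals across all four hashtags.
total_archived = sum(a for a, _ in COUNTS.values())
total_hydrated = sum(h for _, h in COUNTS.values())
total_deletes, total_percent = deletion_stats(total_archived, total_hydrated)
print(f"Total: {total_deletes:,} deletes ({total_percent:.2f}%)")
```

Comparing percentages rather than raw counts matters here because the four datasets differ in size by almost two orders of magnitude.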

Of course, it’s very hard for me to say since I don’t have the deleted tweets. I don’t even know who sent them. A researcher interested in these questions would presumably need to travel to York University to work with the dataset. In a way this seems to be how archives usually work. But if you add the Web as a global, public access layer into the mix it complicates things a bit.

