Feed aggregator

OCLC Dev Network: Software Development Practices: Getting Specific with Acceptance Criteria

planet code4lib - Thu, 2014-09-11 13:30

If you’ve been following our product development practices series, you know how to think about identifying problems and articulating those problems as user stories. But even the best user story can’t encompass all of the details of the user experience that need to be considered in the development process.  This week’s post explains the important role of acceptance criteria.

Karen Coyle: Philosophical Musings: The Work

planet code4lib - Thu, 2014-09-11 11:30
We can't deny the idea of work - opera, oeuvre - as a cultural product, a meaningful bit of human-created stuff. The concept exists, the word exists. I question, however, whether we will ever have, or should ever have, precision in how works are bounded; whether we'll ever be able to say clearly that the film version of Pride and Prejudice is or is not the same work as the book. I'm not even sure that we can say that the text of Pride and Prejudice is a single work. Is it the same work when read today that it was when first published? Is it the same work each time that one re-reads it? The reading experience varies based on so many different factors - the cultural context of the reader; the person's understanding of the author's language; the age and life experience of the reader.

The notion of work encompasses all of the complications of human communication and its consequent meaning. The work is a mystery, a range of possibilities and of possible disappointments. It has emotional and, at its best, transformational value. It exists in time and in space. Time is the more canny element here because it means that works intersect our lives and live on in our memories, yet as such they are but mere ghosts of themselves.

Take a book, say, Moby Dick; hundreds of pages, hundreds of thousands of words. We read each word, but we do not remember the words -- we remember the book as inner thoughts that we had while reading. Those could be sights and smells, feelings of fear, love, excitement, disgust. The words, external, and the thoughts, internal, are transformations of each other; from the author's ideas to words, and from the words to the reader's thoughts. How much is lost or gained during this process is unknown. All that we do know is that, for some people at least, the experience is a vivid one. The story takes on some meaning in the mind of the reader, if one can even invoke the vague concept of mind without torpedoing the argument altogether.

Brain scientists work to find the place in the maze of neuronic connections that can register the idea of "red" or "cold" while outside of the laboratory we subject that same organ to the White Whale, or the Prince of Denmark, or the ever elusive Molly Bloom. We task that organ to taste Proust's madeleine; to feel the rage of Ahab's loss; to become a neighbor in one of Borges' villages. If what scientists know about thought is likened to a simple plastic ping-pong ball, plain, round, regular, white, then a work is akin to a rainforest of diversity and discovery, never fully mastered, almost unrecognizable from one moment to the next.

As we move from textual works to musical ones, or on to the visual arts, the transformation from the work to the experience of the work becomes even more mysterious. Who hasn't passed quickly by an unappealing painting hanging on the wall of a museum before which stands another person rapt with attention? If the painting doesn't speak to us, then we have no possible way of understanding what it is saying to someone else.

Libraries are struggling to define the work as an abstract but well-bounded, nameable thing within the mass of the resources of the library. But a definition of work would have to be as rich and complex as the work itself. It would have to include the unknown and unknowable effect that the work will have on those who encounter it; who transform it into their own thoughts and experiences. This is obviously impractical. It would also be unbelievably arrogant (as well as impossible) for libraries to claim to have some concrete measure of "workness" for now and for all time. One has to be reductionist to the point of absurdity to claim to define the boundaries between one work and another, unless they are so far apart in their meaning that there could be no shared messages or ideas or cultural markers between them. You would have to have a way to quantify all of the thoughts and impressions and meanings therein and show that they are not the same, when "same" is a target that moves with every second that passes, every synapse that is fired.

Does this mean that we should not try to surface workness for our users? Hardly. It means that it is too complex and too rich to be given a one-dimensional existence within the current library system. This is, indeed, one of the great challenges that libraries present to their users: a universe of knowledge organized by a single principle as if that is the beginning and end of the story. If the library universe and the library user's universe find few or no points of connection, then communication between them fails. At best, as with a badly designed computer interface, if any communication is to take place it is the user who must adapt. This in itself should be taken as evidence of superior intelligence on the part of the user as compared to the inflexibility of the mechanistic library system.

Those of us in knowledge organization are obsessed with neatness, although few as much as the man who nearly single-handedly defined our profession in the late 19th century; the man who kept diaries in which he entered the menu of every meal he ate; whose wedding vows included a mutual promise never to waste a minute; the man enthralled with the idea that every library be ordered by the simple mathematical concept of the decimal.

To give Dewey due credit, he did realize that his Decimal Classification had to bend reality to practicality. As the editions grew, choices had to be made on where to locate particular concepts in relation to others, and in early editions, as the Decimal Classification was used in more libraries and as subject experts weighed in, topics were relocated after sometimes heated debate. He was not seeking a platonic ideal or even a bibliographic ideal; his goal was closer to the late 19th century concept of efficiency. It was a place for everything, and everything in its place, for the least time and money.

Dewey's constraints of an analog catalog, physical books on physical shelves, and a classification and index printed in book form forced the limited solution of just one place in the universe of knowledge for each book. Such a solution can hardly be expected to do justice to the complexity of the Works on those shelves. Today we have available to us technology that can analyze complex patterns, can find connections in datasets that are of a size way beyond human scale for analysis, and can provide visualizations of the findings.

Now that we have the technological means, we should give up the idea that there is an immutable thing that is the work for every creative expression. The solution then is to see work as a piece of information about a resource, a quality, and to allow a resource to be described with as many qualities of work as might be useful. Any resource can have the quality of the work as basic content, a story, a theme. It can be a work of fiction, a triumphal work, a romantic work. It can be always or sometimes part of a larger work, it can complement a work, or refute it. It can represent the philosophical thoughts of someone, or a scientific discovery. In FRBR, the work has authorship and intellectual content. That is precisely what I have described here. But what I have described is not based on a single set of rules, but is an open-ended description that can grow and change as time changes the emotional and informational context as the work is experienced.
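One way to picture "work as a quality" is as an open-ended list of statements attached to a resource, rather than a single fixed Work record. The sketch below is purely illustrative: the class names, the quality labels ("story", "theme"), and the `view_by` helper are all hypothetical and are not drawn from FRBR, BIBFRAME, or any existing library system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkQuality:
    """One statement of 'workness' about a resource (hypothetical model)."""
    kind: str   # e.g. "story", "theme", "part_of", "complements"
    value: str


@dataclass
class Resource:
    """A described item carrying any number of work qualities."""
    title: str
    qualities: list[WorkQuality] = field(default_factory=list)

    def describe(self, kind: str, value: str) -> None:
        # Qualities are additive: a new statement never replaces an old one.
        self.qualities.append(WorkQuality(kind, value))


def view_by(resources: list[Resource], kind: str) -> dict[str, list[str]]:
    """Group resource titles by the values of one kind of quality."""
    groups: dict[str, list[str]] = defaultdict(list)
    for resource in resources:
        for quality in resource.qualities:
            if quality.kind == kind:
                groups[quality.value].append(resource.title)
    return dict(groups)


book = Resource("Pride and Prejudice (book)")
book.describe("story", "Pride and Prejudice")
book.describe("theme", "marriage and class")

film = Resource("Pride and Prejudice (film)")
film.describe("story", "Pride and Prejudice")
film.describe("theme", "romance")

# The same data yields different groupings depending on the quality chosen.
print(view_by([book, film], "story"))
print(view_by([book, film], "theme"))
```

Grouped by "story", the book and the film fall together; grouped by "theme", they fall apart. Neither view is treated as the definition of the work, and further qualities can be added later without disturbing the ones already recorded.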

I write this because we risk the petrification of the library if we embrace what I have heard called the "FRBR fundamentalist" view. In that view, there is only one definition of work (and of each other FRBR entity). Such a choice might have been necessary 50 or even 30 years ago. It definitely would have been necessary in Dewey's time. Today we can allow ourselves greater flexibility because the technology exists that can give us different views of the same data. Using the same data elements we can present as many interpretations of Work as we find useful. As we have seen recently with analyses of audio-visual materials, we cannot define work for non-book materials identically to that of books or other texts. [1] [2] Some types of materials, such as works of art, defy any separation between the abstraction and the item. Just where the line will fall between Work and everything else, as well as between Works themselves, is not something that we can pre-determine. Actually, we can, I suppose, and some would like to "make that so", but I defy such thinkers to explain just how such an uncreative approach will further new knowledge.

[1] Kara Van Malssen. BIBFRAME A-V modeling study
[2] Kelley McGrath. FRBR and Moving Images

Peter Murray: Thursday Threads: Sakai Reverberations, Ada Initiative Fundraising, Cost of Bandwidth

planet code4lib - Thu, 2014-09-11 10:43

Welcome to the latest edition of Thursday Threads. This week’s post has a continuation of the commentary about the Kuali Board’s decisions from last month. Next, news of a fundraising campaign by the Ada Initiative in support of women in technology fields. Lastly, an article that looks at the relative bulk bandwidth costs around the world.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to Pinboard are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

Discussion about Sakai’s Shift Continues

The Kuali mission continues into its second decade. Technology is evolving to favor cloud-scale software platforms in an era of greater network bandwidth via fast Internet2 connections and shifting economics for higher education. The addition of a Professional Open Source organization that is funded with patient capital from university interests is again an innovation that blends elements to help create options for the success of colleges and universities.

- The more things change, the more they look the same… with additions, by Brad Wheeler, Kuali Blog, 27-Aug-2014

Yet many of the true believers in higher education’s Open Source Community, which seeks to reduce software costs and provide better e-Learning and administrative IT applications for colleges and universities, may feel that they have little reason to celebrate the tenth anniversaries of Sakai, an Open Source Learning Management System and Kuali, a suite of mission critical, Open Source, administrative applications, both of which launched in 2004.  Indeed, for some Open Source evangelists and purists, this was probably a summer marked by major “disturbances in the force” of Open Source.

- Kuali Goes For Profits by Kenneth C. Green, 9-Sep-2014, Digital Tweed blog at Inside Higher Ed

The reverberations continue from the decision by the Kuali Foundation Board to fork the Kuali code to a different open source license and to use Kuali capital reserves to form a for-profit corporation. (This was covered in last week’s DLTJ Thursday Threads and earlier in a separate DLTJ post.) In addition to the two articles above, I would encourage readers to look at Charles Severance’s “How to Achieve Vendor Lock-in with a Legit Open Source License – Affero GPL”. Kuali is moving its forked code from the Educational Community License to the Affero GPL, which it has the right to do. The move also comes with some significant changes, as Kenneth Green points out. There is still more to this story, so expect it to be covered in additional Thursday Threads posts.

Ada Initiative, Supporting Women in Open Technology and Culture, Focuses Library Attention with a Fundraising Campaign

The Ada Initiative has my back. In the past several years they have been a transformative force in the open source software community and in the lives of women I know and care about. To show our support, Andromeda Yelton, Chris Bourg, Mark Matienzo and I have pledged to match up to $5120 of donations to the Ada Initiative made through this link before Tuesday September 16. That seems like a lot of money, right? Well, here’s my story about how the Ada Initiative helped me when I needed it most.

- The Ada Initiative Has My Back, by Bess Sadler, Solvitur Ambulando blog, 9-Sep-2014

The Ada Initiative does a lot to support women in open technology and culture communities; in the library technology community alone, many women have been affected by physical and emotional violence. (See the bottom of the campaign update blog post from Ada Initiative for links to the stories.) I believe it is only decent to enable anyone to participate in our communities without fear for their physical and psychic space, and that our communities are only as strong as they can be when the barriers to participation are low. The Ada Initiative is making a difference, and I’m proud to have supported them with a financial contribution as well as by being an ally and an amplifier for the voice of women in technology.

The Relative Cost of Bandwidth Around the World

The chart above shows the relative cost of bandwidth assuming a benchmark transit cost of $10/Megabits per second (Mbps) per month (which we know is higher than actual pricing, it’s just a benchmark) in North America and Europe. From CloudFlare

Over the last few months, there’s been increased attention on networks and how they interconnect. CloudFlare runs a large network that interconnects with many others around the world. From our vantage point, we have incredible visibility into global network operations. Given our unique situation, we thought it might be useful to explain how networks operate, and the relative costs of Internet connectivity in different parts of the world.

- The Relative Cost of Bandwidth Around the World, by Matthew Prince, CloudFlare Blog, 26-Aug-2014

Bandwidth is cheapest in Europe and most expensive in Australia? Who knew? CloudFlare published this piece showing their costs on most of the world’s continents, with some interesting thoughts about the effect competition has on the cost of bandwidth.

Code4Lib: code4libBC: November 27 and 28, 2014

planet code4lib - Thu, 2014-09-11 00:36

What: It’s a 2-day unconference in Vancouver, BC! A participant-driven meeting featuring lightning talks in the mornings and breakout sessions in the afternoons, with coffee, tea and snacks provided. Lightning talks are brief presentations, typically 5-10 minutes in length (15 minutes is the maximum), on topics related to library technologies: current projects, tips and tricks, or hacks in the works. Breakout sessions are an opportunity to bring participants together in an ad hoc fashion for a short yet sustained period of problem solving, software development and fun. In advance of the event, we will gather project ideas in a form available through our wiki and registration pages. Each afternoon the code4libBC participants will review and discuss the proposals, break into groups, and work on some of the projects.

When: November 27 and 28, 2014

Where: SFU Harbour Centre, Vancouver, BC

Cost: $20

Accommodations: Info coming soon.

Register: here

Who: A diverse and open community of library developers and non-developers engaging in effective, collaborative problem-solving through technology. Anyone from the library community who is interested in library technologies is welcome to join and participate, regardless of their department or background: systems and IT, public services, circulation, cataloguing and technical services, archives, digitization and preservation. All are welcome to help set the agenda, define the outcomes and develop the deliverables!

Why: Why not? code4libBC is a group of dynamic library technology practitioners throughout the province who want to build new relationships as much as develop new software solutions to problems.

Tag d'hash: #c4lbc

More information

DuraSpace News: Contribute, and Learn More About the New Features in DSpace 5

planet code4lib - Thu, 2014-09-11 00:00
Peter Dietz, Longsight, on behalf of the DSpace 5.0 Release team

M. Ryan Hess: Is Apple Pay Really Private?

planet code4lib - Wed, 2014-09-10 20:03

Apple Pay, the new payment system unveiled by Apple yesterday, is an intriguing alternative to using debit and credit cards. But how private, and how secure, is this new payment system really going to be?

Tim Cook, Apple CEO, made it very clear that Apple intends to never collect data on you or what you purchase via Apple Pay. The service, in fact, adds a few new layers of security to transactions. But you have to wonder.

A typical pattern for businesses built on data collection is to promise robust privacy in their service agreements and marketing even though the long-term strategy is to leverage that data for profit. Anyone who was with Facebook early on knows how quickly these terms can change.

So, when we’re assured that our purchases will remain wholly private and marketing firms will never have access to them, how can we really be confident that this will always remain the case? We can’t. So, as users, we should approach such services with skepticism.

As with anything related to personal data, we should assume that enterprising hackers or government agents can and will figure out a way to access and exploit our information. Just last week, celebrities using Apple’s iCloud had their accounts compromised and embarrassing photos were made public. And while Apple has done a pretty good job at securing Apple Pay, it’s still possible someone could figure out a way in…and then you’re not just dealing with incriminating photos, you’ve got your financial history exposed.

So ask yourself:

  1. Can I think of things I buy that could prove embarrassing, or that might give people with malign intent a way to blackmail me or do me financial damage?
  2. If my most embarrassing purchases were to become permanently public, can I live with that?
  3. How would such public exposure impact my reputation, professionally and personally?
  4. Does the convenience of purchasing something with my phone outweigh the risks to my financial security?

Depending on how you answer these questions, you may want to stick with your credit card.

Or just go the analog route and use the most anonymous medium of exchange: cash.

FOSS4Lib Upcoming Events: VIVO Project Hackathon at Cornell University

planet code4lib - Wed, 2014-09-10 18:57
Date: Monday, October 13, 2014 - 08:00 to Wednesday, October 15, 2014 - 17:00
Supports: Vivo

Last updated September 10, 2014. Created by Peter Murray on September 10, 2014.

The VIVO Project is hosting a hackathon event on the Cornell University campus in Ithaca, New York from October 13-15. This event builds on the March, 2014 hackathon held in conjunction with the VIVO I-Fest at Duke University, and is open to anyone interested in actively participating in improving some aspect of the VIVO software, ontology, documentation, testing, or related applications and tools.

Bohyun Kim: Why I Don’t Talk Much about Gender or Race & Why I Support the Ada Initiative

planet code4lib - Wed, 2014-09-10 18:07

I rarely talk about gender or race issues.  Not because I am not interested but because I am afraid that I may say things that are viewed negatively under a socially accepted norm.  As a person who grew up in one country with one culture (the Confucian culture, which is notoriously preferential to men to boot), then moved to another country with a completely different culture (just as discriminatory to women and minorities, I am afraid), where I now live and work, and who often has opinions that differ from those held by the majorities in both societies, I am acutely aware of the various disadvantages, backlashes, and penalties that can result from a minor slip, and of the pervasive social norm of inequality applied to women and racial/ethnic/gender minorities and reinforced in everyday life.

I hate telling stories about how things went all wrong because it can reinforce negative sentiments such as frustration, anger, and the general sentiment of feeling pathetic about oneself. But I will make an exception and tell you this one story in the hope that you will join me in supporting the Ada Initiative.

A few years ago, on one of the library listservs, the idea of creating a sub-group of women among the members was floated. I do not recall all the context now, but in relation to that idea, which I supported, I posed a question to the listserv directed specifically at women only.  To my dismay, this did not stop men on the mailing list from liberally exercising their freedom to object to the idea in the name of the good of the listserv.  The idea was attacked as something akin to a separatist movement and was vehemently objected to by a man who is regarded as very influential in that venue. My response to this was simply “how dare you,” not personally to me but to the entire group of women on the listserv. The question was submitted to women. No opinion was solicited from men.

But this is not why I brought up this story. The reason why I brought up this story is that I wanted to tell you what I did after this incident.  I didn’t respond back and communicate my indignation, frustration, and anger.  I simply disengaged myself from the conversation and abandoned the whole thread.  I didn’t want to have a conversation with this famous person who was so blatantly unaware of his faux pas. (Although his describing that idea as a separatist movement was not at all fair, I now see the point that it is actually a valid worry as women are not a minority but 50 percent of the population. And we all know well that the majority in the library is indeed women, not men. Potentially, the current listserv may have to compete with this new one -if the new one succeeds- and may lose its precious prestige and some other social privileges that go with the membership for some people.)

I justified my behavior by telling myself that I didn’t have enough energy to deal with this right then. Fortunately, women who are much wiser, more articulate, and more courageous than me stood up and wrote great replies to this person.  Because I had decided not to attach myself to the thread any longer, I sent a personal email to these women, who were my heroes.  At that time, I thought that was a good thing to do, because I was so relieved by, and hugely appreciated, the fact that someone was taking a stance and articulating the reasons in a cool manner that I could not maintain. But looking back, I can’t help but think that it was cowardly of me not to support them openly. I have to add that this realization only dawned on me when the same thing happened to me in the reverse role. Another librarian sent me a private Twitter message personally thanking me for what I had said openly. This taught me that what I meant as kudos to someone could have felt to that person like a punch in the gut instead. I have thought about this incident a lot, always as one of my (many) failings, although I only once dared to vent about it, to one of my male colleagues, because I knew he wouldn’t mind listening to me. (Our internalization of the social norm is indeed very deep even when we are critical of the very norm.)

It wasn’t until last year’s Code4Lib pre-conference, “Technology, Librarianship, and Gender: Moving the conversation forward,” organized by Lisa Rabey and attended by many awesome people including Valerie Aurora from the Ada Initiative (who also gave the keynote at the Code4Lib Conference), that I was told for the first time that those who belong to minority groups do NOT have the obligation to always speak up, defend their positions, etc. That was a refreshing thought, one that respects the additional burden that many minorities carry: the feeling of having to be a vocal champion of a cause at a personal level whether or not you are exhausted and sick of it all. I also loved hearing that one thing those with existing privileges can and should do is listen to those without such privileges and to their experience, rather than shouting out their own thoughts and dominating the conversation. It recognizes the important fact that the voice of sympathetic advocates should never overpower that of women and racial/ethnic/gender minorities. To be sustainable, a social change must be implemented by the very people who need and want the change.

So it is not an exaggeration to say that being a woman in technology can complicate things. (And I have told you just one story, and I am not even touching the issue of belonging to a racial/ethnic minority group here.) How many more awesome and productive things would women be able to achieve if they did not have to deal with this kind of crap that turns up all the time when they are simply trying to get things done?

I support the Ada Initiative because it acknowledges and articulates common issues often unacknowledged, opens and legitimizes a conversation about those issues, and helps organizations institute and establish more just and more equitable norms with useful and tangible tools and resources, thereby leveling the playing field for everyone. This results in benefiting all, not just women and minorities in race and gender.

Consider donating to the Ada Initiative. Share your reasons on Twitter with the hashtag #libs4ada, and check out the many thoughtful and amazing posts people have written about their reasons for supporting the Ada Initiative. (If you think that this is all irrelevant because you have never been physically harmed or threatened in librarianship, check out this terrific post by the Library Loon.) I invite you to become an ally to those who have fewer privileges than you. Thanks for reading this post!

If you are not familiar with the Ada initiative, here is some information from its website.

The Ada Initiative helps women get and stay involved in open source, open data, open education, and other areas of free and open technology and culture. These communities are changing the future of global society. If we want that society to be socially just and to serve the interests of all people, women must be involved in its creation and organization.
The Ada Initiative is a feminist organization. We strive to serve the interests and needs of women in open technology and culture who are at the intersection of multiple forms of oppression, including disabled women, women of color, LBTQ women, and women from around the world.

We are making a difference in open technology and culture by:

  • Supporting and connecting women in these communities
  • Changing the culture to better fit women, instead of changing women to fit the culture
  • Helping women overcome internalized sexism that is the result of living within the existing culture
  • Asking men and influential community members to take responsibility for culture change
  • Giving people the tools they need to change their communities (e.g., policies and ally skills)
  • Creating sustainable systems to support feminist activists in these communities
  • Being the change we want to see by making our own events and communities safer and more inclusive


HangingTogether: Innovative solutions for dealing with born-digital content in obsolete forms – Part 3

planet code4lib - Wed, 2014-09-10 17:00

This is the third and final post about an SAA lightning talk session. Part 1 began with descriptions of the array of media an archives might confront and an effort to test how much can be done in house. In Part 2, we heard from archivists who had dealt with particularly challenging formats. Part 3 includes the service provider perspective, reaching out to the retrocomputing community, and concludes with some words about agreements between repositories and service providers.

Matthew McKinley, Digital Project Specialist at University of California, Irvine Libraries, gave an archives-to-archives service provider perspective. UCLA Special Collections has fifteen 5¼” floppy disks from the Southern California Women for Understanding collection. The collection inventory says the disks contain mailing lists and research data. He imagines the contents might be interesting research material, but he hasn’t been able to begin the project, because the legal departments and other administrators from both institutions have to sign off on the service agreement.

After finally getting the agreement through the UCI process, UCLA is on its third pass of the document.

Determining risks to obsolete media & cost of inaction, how do you factor in criticable & very slow-moving factors, like legal? #saa14 #s601

— Ricc Ferrante (@raferrante) August 16, 2014

His advice is to start with a vendor agreement and workflow for converting other media, such as audiovisual media. Learn from colleagues what roadblocks they ran into. Remember, this is an “innovative service” that administration may not understand or agree with, but you can use that to your advantage: make it known that data could be dying and time is of the essence.

The knowledge level of those involved affects the level of collaboration, oversight, and communication needed. In this situation, both project managers were familiar with archival terms and concepts. If your service provider has more technical experience, but no preservation or library & archive focus, you’ll need to make it very clear exactly what you need in both agreement & ongoing communication: metadata requirements, file format specifications, privacy and security concerns… Explain it and be sure they understand it. It could save you a lot of time and headaches later.

Matthew offered another insight that will make a transfer project go more smoothly: learn as much as you can about when and how the media was created. This is important for provenance metadata, but even more important for accessing the media and converting the content. What you get will depend on the creator’s knowledge or interest level — some may give you detailed version numbers and others may say “I used whatever came on my Mac in 1985.” But any information helps, whether about the donor’s hardware and software or about their computing behavior.

Margo Padilla, Strategic Programs Manager at the Metropolitan New York Library Council (METRO), talked about a born-digital migration service pilot project being coordinated by METRO and the Center for Jewish History, with support from the Delmas Foundation. She harkened back to the 2012 OCLC report, “Swatting the Long Tail of Digital Media: A Call for Collaboration,” which proposed a community-based approach for transferring content off of legacy media to more stable media.

The report suggested that a local institution become a hub, developing expertise and acquiring and maintaining necessary equipment to provide transfer services for institutions without the staff or resources to undertake this sort of work themselves. METRO is coordinating a working group to test this approach. The group is focused on the logistical issues and practical outcomes involved with this process, in order to test and refine real-world workflows, contractual agreements, and deliverables, and to document it for use by others. Partners in the working group include: The American Jewish Historical Society, The Guggenheim Museum Archives, The Leo Baeck Institute, The New-York Historical Society, and Queens Library.

Each organization will explore their own internal processes and requirements for inventorying legacy media, appraising and selecting them for transfer, and determining how they will be reintegrated into collections. So far, each participant has conducted a survey of their materials on digital storage media and a small sample of test materials (floppies, Zip disks, and optical media) has been delivered to METRO.

As a non-profit consortium currently providing a range of other digital services, METRO is well situated to offer digital migration services on a free or cost-recovery basis to local repositories and is focusing on scoping and developing their service model. They will track cost and labor estimates for developing a collaborative forensics, migration, and media archaeology lab and providing transfer-related services such as metadata, content analysis, normalization, and redundant storage. They will also identify and refine other potential requirements such as insurance coverage, secure storage areas, delivery methods, and protocols for dealing with confidential data.

The group has developed a draft service agreement, taking into consideration questions of potential deliverables, security, turnaround times, and other standard agreement language. METRO is in the process of building a dedicated workstation and is about to begin transferring content, initially providing disk images with file-system analytics and a metadata export. As part of an iterative process to develop a tiered service model, each organization will analyze this initial information and let METRO know what sort of additional forensic processing or analysis they would like to receive. Discussions about expectations and scope of work, contract agreements, workflows, and deliverables are ongoing.

Stephen Torrence, Vice President and Co-Founder of the Museum of Computer Culture, gave his perspective as a data migration service provider. MCC, part of a network of collectors, focuses on hardware restoration and stepped forward as a service provider to assist archives with obsolete media.

He did a successful proof-of-concept project with The Ohio State University Libraries, working with three low-risk diskettes. This helped both parties to work through the process and establish a reasonable base price per disk (for disk image, files, and metadata) and a fee for file format conversion.

He’s also learned from less successful projects. He tried to help out the Texas Department of State Health Services when they needed to recover a medical image from a magneto-optical disk with a proprietary operating system. After much time and many attempts, he was finally able to read the disk, but couldn’t decode the bitstream. From this he learned to establish a fallback base cost to recover some of his investment. When he worked with PSU on the AMSTRAD disks, he found that getting non-US hardware and software was cost-prohibitive. Both the service provider and the client need to acknowledge practical limitations.

Stephen then gave advice about “how to be an awesome client:” Know the origin environment before beginning – it will keep costs down and allow more accurate time estimates. Accept exceptions – be prepared to iterate based on feedback and redefine success, if necessary. Share the impact – involving your provider in the import of the context or content can give them a better sense of purpose.

"saving old born-dig finding aids that saves hours of time and labor of archivists" gives vintage collector "warm fuzzies" #saa14 #s601

— Jennifer Schaffner (@genschaffner) August 16, 2014

Then we heard from an archivist who reached out to another community.

Dorothy Waugh, Digital Archives Project Archivist at Emory University, talked about the problem of knowing where to turn for help in working with obsolete physical media.

Dorothy Waugh at MARBL sez Step 1: admit you have a problem (wrt obsolete media of course) #s601 #saa14

— Sasha Griffin (@griffingate) August 16, 2014

The needs of the Digital Archives unit were often outside the scope of Emory’s desktop and IT support and the complex technical language frequently found in online resources drove her to pursue options that offered the chance to ask questions in real time and get hands-on practice. She embarked on research into local retrocomputing groups and has since been participating in meetings of the Atlanta Historical Computing Society, or AHCS.

Dorothy discussed the benefits of the hands-on approach at these meetings, one immediate benefit being a broader understanding of the development of personal computers. AHCS is a great resource for locating hard-to-find hardware—she’s been lucky enough to receive a number of floppy disk and Zip disk drives from its members—and for troubleshooting difficulties with obsolete or specialized hardware or software: for instance, Dorothy has worked with members to solve some minor issues affecting Emory’s KryoFlux, a tool that aids in the imaging of aging floppy disks. The group also offered problem-solving assistance on accessing content on some proprietary floppy disks formatted for a mid-1980s VideoWriter word processor.

Throughout, Dorothy has taken full advantage of the opportunity to ask lots of questions, which have consistently been met with patience and enthusiasm. She stressed, though, some key differences between the retrocomputing community and the digital archives profession, not least the emphasis that archives place upon content as opposed to the retrocomputing community’s emphasis upon computers as cultural artifacts. As a result of these differences, Dorothy has learned the importance of being quite explicit about her objectives as a digital archivist, while remaining engaged with the retrocomputing community’s commitment to the preservation of authentic computing environments.

A remaining concern for Dorothy is how her participation with AHCS can evolve into more of a reciprocal partnership and she continues to seek ways by which she can achieve this. Her hope is that, in addition to serving as an opportunity for education and advocacy, these attempts will enable AHCS to identify additional opportunities for two-way collaboration between the two communities.

After the ten talks, I announced a new OCLC Research report, “Agreement Elements for Outsourcing Transfer of Born Digital Content,” by Ricky Erway, Ben Goldman, and Matthew McKinley. This document suggests elements that should be considered when constructing an agreement (or memorandum of understanding) for outsourcing the transfer of born-digital content from a physical medium while encouraging adherence to both archival principles and technical requirements. By being explicit about the expectations, processes, and deliverables we can hope to develop an effective community-based approach for transferring content off of legacy media and into a form that can be managed, preserved, and used by researchers.

Master lesson of #saa14: You're part of a community of expertise, think of what you can do for others and who you can lean on for help #601

— Helen Schubrt Fields (@magicallyreal) August 16, 2014

We hope these vignettes inspire others facing challenges with archival content in obscure formats to find innovative options for dealing with obsolete media, so you can get on with the really important work of processing, preserving, and providing access to those unique collections.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.


Andromeda Yelton: Why I support the Ada Initiative. (You, too?)

planet code4lib - Wed, 2014-09-10 13:30

I need to talk to you about emotion. From a brilliant Code4Lib lightning talk by Mark Matienzo.

I got involved with codes of conduct by accident. People were saying on Twitter that ALA should have one, and I didn’t want it to be one of those good ideas full of tweety energy that go nowhere, so I called their bluff and set up a Google Doc, and everyone else realized this made me a leader before I did.

One of the things that happens when people see you as a leader on this topic is you start to be a keeper of stories. People come to you, out of nowhere, and tell you about that time when someone was creepy. Or someone was drunk. Or someone didn’t keep their hands, or lips, to themselves. Or someone made them feel violated, blindsided, and maybe they smiled and kept the peace because that’s what women are taught to do, and maybe they didn’t understand until later what they should have said, how they should have said it. And maybe no one watching even understood how serious the problem was. Because we’re all taught, all of us, that women are not the protagonists of the story — we’re the love interest, the sassy sidekick, the prop there to raise someone else’s stakes. It’s easy not to see how hurt someone has been if you think the story is about someone else.

Stories change when you identify with a new protagonist. To me, a subtext of discussions about codes of conduct, and so many other issues that arise in social justice discourse communities, is this: we make new statements about who is the protagonist. Whose stakes matter. Whose perspective observers should take. With whom to empathize. We claim our own positions as protagonists in our stories, and demand that others do as well.

I find my work on the Ada Initiative advisory board compelling in large part because there are so many stories, so many emotions — people who have been treated in ways no one should be treated, people who have to waste energy on issues many of their colleagues don’t before they even get to the work they all share, people walking around with these raw and gaping and secret wounds. I want us all to be protagonists. I want us all to be able to spend our energy on love and intellect and creativity, not on responding to harassment, or to the threat of harassment, or even the implicit fear of it.

So that’s why I support the work of the Ada Initiative: because codes of conduct legitimize marginalized people’s own understandings of their (our) own experiences, and give them (us) concrete ways to push back when their (our) lines are crossed. Because AdaCamps give us environments where we don’t have to be the only woman in the room, and therefore space to be all the other things we are, too. Because Ally Skills workshops give men a framework for seeing the ways women’s lives can be entirely, invisibly, surprisingly different, right in front of them, and strategies for enacting their values.

And that’s why I, along with Bess Sadler, Chris Bourg, and Mark Matienzo, am providing a $5120 matching grant for donations from libraryland toward the Ada Initiative’s yearly fund drive. From today through the 15th, we’ll match every dollar pledged through the Ada Initiative’s library-specific campaign page, up to $5120.

If you’re happy that your favorite library association (e.g. ALA, DLF, Code4Lib, SAA, Access, In the Library with the Lead Pipe, Evergreen, Open Repositories, and OLA) has adopted a Code of Conduct, the Ada Initiative deserves part of the credit. If you liked Valerie Aurora’s keynote at Code4Lib 2014, or you’re looking forward to her Ally Skills Workshop at DLF Forum, or you think it’s cool that several library technologists have been to AdaCamp, please consider donating to the Ada Initiative, so that it can keep doing all those things (and hey, maybe more!).

Or if you just like feminist stickers. We can set you up with those, too.

Thank you.

Erin White: Why this librarian supports the Ada Initiative

planet code4lib - Wed, 2014-09-10 13:00

This week the Ada Initiative is announcing a fundraising drive just for the library community. I’m pitching in, and I hope you will, too.

The Ada Initiative’s mission is to increase the status and participation of women in open technology and culture. The organization holds AdaCamps, ally workshops for men, and impostor syndrome trainings, and spreads awareness of the need for conference codes of conduct.

Library tech is a great place to be right now

Library tech is an increasingly gender-inclusive space. I’m especially happy to be part of the Code4Lib community. In late 2012 Code4Lib adopted a conference code of conduct, and at this year’s conference, the Ada Initiative’s Valerie Aurora joined us for the whole conference and keynoted on the final day, which was a treat. Meeting her was a big deal for me and I learned a lot from her talk.

And, there are awe-inspiring women role models in library tech-land: Bess Sadler, Andromeda Yelton, Coral Sheldon-Hess, Margaret Heller, and many others.

…but that doesn’t mean we can’t do better

Despite our forward momentum, there are still some fundamental gender gaps in libraryland.

I went to grad school because I liked building websites and wanted to get a theoretical background for my work in IT. Information science seemed like a natural place for me. I didn’t think I would become a librarian, but my path started to veer toward library technology as I finished my program. As that happened, I realized that there was a false distinction between the library science and information science programs at my school. So I wrote my master’s paper about it.

The distinction between the programs bothered me most because of how gender-divided they were, despite the trivial difference in core curricula (two courses). The year I did my research, 2009, the gender proportions were inverse: about 70% of library science students were women and about 70% of information science students were men. What my paper didn’t include, but should have, was a deeper analysis of the stark gender gaps between the programs and how that informed students’ perceptions of and interactions with each other and their career choices. As a woman in the information science program going into a career in librarianship, I was a deep outlier in the program. I identified myself as a technologist, while many women in the library science program did not, even though their tech aptitude was way higher than most mortals’. Something was holding other women back from choosing essentially the same degree path but with a more technical label.

Now with five years under my belt as a librarian, a few things have become clear to me:

The distinction between the grad programs was largely cultural, not curricular.

Libraries are technology. The past, present, and future of libraries is technology.

The future of library leadership is technology.

To be such a female-dominated field, libraryland has a disproportionately low number of women in leadership roles and in technology roles. So few women align themselves as technologists even when they are doing work in technology. And so few women align themselves as leaders even when they are poised to take leadership roles.

We need to encourage more women to embrace technology leadership roles in libraries.

What we can do

This is cultural. This is something we need to talk about. This is something we need to work on. Even if that process is uncomfortable.

Thank you: Wren Lanier for clarifying edits and Laura Gariepy for additional ideas.

In the Library, With the Lead Pipe: Open for Business – Why In the Library with the Lead Pipe is Moving to CC-BY Licensing

planet code4lib - Wed, 2014-09-10 10:30

Blown Away, CC-BY felixtsao (Flickr).

In brief: Lead Pipe is changing our licensing from CC-BY-NC to CC-BY. Here, we explain why.

In the Library with the Lead Pipe has, since we began publishing in 2008, been run by volunteers with a desire to spread ideas for positive change as widely as possible.

For this reason, we have required that all articles be published under a Creative Commons Attribution Non-Commercial (CC BY-NC 3.0 US) license.

Publishing under a CC BY-NC license has always been viewed by Lead Pipe as a way of balancing our commitment to authors (by ensuring they retain their own copyrights and are protected from unrecompensed commercial exploitation of their work) with our commitment to our readers (by ensuring our articles can be openly and freely accessed on our own site and distributed elsewhere for non-commercial purposes).

In the first half of 2014, however, as we took time to reflect on what had changed in over five years of publishing, we began to debate the merits of moving to a more permissive license.

Why change?

The central tension for any publisher is that of distribution versus control. The more effectively reading and publishing can be controlled, the less widely an article will be distributed and read. Alternatively, if wide distribution is given preference, we must relinquish control over how and to whom it is distributed. There are all sorts of solutions to this problem, depending on the goals of the author and the publisher.

As the Lead Pipe Editorial Board worked through our new documentation, we re-assessed our own mission. Our About page states that:

Lead Pipe intends to help improve communities, libraries, and professional organizations. Our goal is to explore new ideas and start conversations, to document our concerns and argue for solutions.

The question we began to discuss is whether our licensing matched this mission. Implied in this mission is that the conversations we start include as many people as possible. That is, we should privilege wide distribution over control. The definition of ‘non commercial’ in CC BY-NC licenses, however, is vague, and a recent court case indicates it may be much more narrowly applied than we anticipated. In this case, it was found that use of a CC-BY-NC licensed photo on the website of the German national broadcaster breached copyright. The Court found that ‘non-commercial’ use means ‘personal’ use, so use by a non-profit organisation on its advertising-free website still infringed. This case, and other legal opinions in Germany and other countries, potentially make Lead Pipe articles unusable even by the people and organizations we hope to support, such as educators and public broadcasters.

The discussion really kicked off in February when Micah (then a member of the Board) proposed an article co-authored by Chealsye Bowley, and suggested that (based on his research) we may need to re-assess our licensing. In March, Hugh was contacted by McGraw-Hill, which sought to negotiate terms to republish one of his (CC-BY-NC licensed) blog posts for use in school assessment software. Both Micah’s research and Hugh’s experience resulted in a proposal to the Lead Pipe Editorial Board regarding a change to CC-BY licensing. With Open Access mandates creeping closer to requiring CC-BY (especially in the UK, for example those of the Wellcome Trust and RCUK), and all our authors writing for love rather than money, we reached consensus relatively easily.

Back to the future

The aim of relicensing previously published articles is primarily to ensure that, to the greatest extent possible, our readers are able to understand what rights and obligations they have when re-using or re-publishing Lead Pipe articles. Whilst we suspected it was unattainable when we began this process, our ultimate goal has been to ensure all articles published by Lead Pipe are licensed the same way to ensure clarity regarding license terms.

Since Lead Pipe does not hold the copyright in the articles we publish, we needed to ask each of our authors to change the licensing on their articles. Not including (then) current Editorial Board members, this required us to contact 65 authors by email with an explanation of what we were asking, and why (the text of our initial email is included in the appendix below). We first did so on 13 July, and after a second nudge we gained agreement from 52 of our authors, with two authors declining to change their licensing. We have so far been unable to contact the remaining eleven authors, but continue our efforts to do so. Where an author (including any co-author) has not clearly stated that they agree to re-license, the existing license is retained.

It is important to note that this is a request we have made of our authors, but it is they who hold the copyright in, and decide how to license, their articles. As we wrote in our initial email to authors:

We recognize that if Lead Pipe required CC-BY licensing at the time you wrote your article, you may have chosen another publication with which to publish, or chosen not to write it at all.

Essentially, we wrote to our authors stating that we had changed our minds about the most appropriate form of licensing for their articles. Many were happy to change, some requested more information before making a decision, and some made an informed decision not to change their licensing. One of the lessons here is the value of retaining your own copyright as an author, and therefore retaining control over who can re-publish it and under what circumstances.

More open, more access

Lead Pipe has always aimed to be open, progressive, and a force for positive change. This requires us to go further than simply publishing provocative articles. We must be open to changing our own behaviour and procedures when evidence suggests they inhibit our goals. We have come to the view that changing the licensing of Lead Pipe articles will better align our practice with our goals.

From today, all new articles, and most existing articles, will be published under a Creative Commons Attribution 4.0 International (CC-BY 4.0) license.

Where previously-published authors have requested that the existing license remain, or we have been unable to ascertain their wishes, we have noted the non-commercial licensing terms at the end of their articles.

Lead Pipe would like to thank all of our authors for their positive and gracious responses to our relicensing process. We look forward to working with other amazing authors to explore new ideas and start conversations, and to helping spread those ideas even more widely.

Thanks to Katrina McAlpine for editing and advice on this editorial.

References and further reading

Daught, G, 25 March 2014. ‘I dropped the “NC” from my Creative Commons license’, Alpha Omega | Open Access.

Moody, G, 27 March 2014. ‘German Court Says Creative Commons ‘Non-Commercial’ Licenses Must Be Purely For Personal Use’, Techdirt.

Nowviskie, B, 11 May 2011. ‘Why Oh Why, CC-BY?’

Research Councils UK, 8 April 2013. RCUK Policy on Open Access and Supporting Guidance.

Rundle, H, 23 March 2014. ‘Creative Commons, Open Access, and hypocrisy’.

Rundle, H, 2 January 2013. ‘Mission Creep – a 3D printer will not save your library’.

Vandegrift, M & Bowley, C, 23 April 2014. ‘Librarian, Heal Thyself: A Scholarly Communications Analysis of LIS Journals’, In the Library with the Lead Pipe.

Wellcome Trust, June 2014. Open access: CC-BY licence required for all articles which incur an open access publication fee – FAQ.

Appendix: text of email to authors

Dear [author]

Since 2008, In the Library with the Lead Pipe has been publishing posts and articles that inspire, challenge and provoke librarians around the world. We are honoured to have had you as one of our authors. When Lead Pipe launched, we were determined not only to be inspiring and challenging to the profession, but also to be open. As a Lead Pipe author, we asked you to provide us with ‘first publisher’ attribution, and assign a Creative Commons Attribution-Non Commercial (CC-BY-NC) license to your article.

After more than five years of publishing, we have recently taken time to consider our position as a publisher. Lead Pipe started as a peer-reviewed group blog, but we have now repositioned ourselves as an open access, open peer reviewed journal. As part of this process we have reconsidered our licensing, and will be moving to a more permissive Creative Commons Attribution (CC-BY) license. In making this decision, we have recognized that CC-BY-NC licenses are surrounded by confusion and controversy. The definition of ‘non commercial’ is vague, and a recent court case indicates it may be applied much more narrowly than we anticipated, potentially making Lead Pipe articles unusable even by the people and organizations we hope to support, such as educators and public broadcasters.

As an Editorial Board, we aim for all the work published in Lead Pipe to find a wide audience and for it to help change library practice for the better. By removing the ‘non commercial’ license provisions we feel that this aim will more easily be achieved. We would therefore like not only to license all future works CC-BY, but also relicense all previously published articles from CC-BY-NC to CC-BY.

As one of our existing authors, we ask that you agree to relicense your article to a Creative Commons Attribution 4.0 International license. Our goal in relicensing all previously published articles is to provide clarity for our readers by ensuring our licensing is consistent throughout our website.

If you would like to read more about CC-BY versus CC-BY-NC licensing, we recommend the following:

If you are willing to agree to relicense your article, please reply to this email with the words:  “I agree to re-license all work published under my name in In the Library with the Lead Pipe, to a Creative Commons Attribution 4.0 International License”.

Whilst we would prefer for all authors to re-license their articles, we understand that you may prefer not to do so. We recognize that if Lead Pipe required CC-BY licensing at the time you wrote your article, you may have chosen another publication with which to publish, or chosen not to write it at all. If you would prefer your article to remain CC-BY-NC, we will ensure that its licensing is clearly indicated as such.

If you are sure that you would prefer to keep the existing CC-BY-NC license, please reply to this email with the words “Please keep the licensing of my article as CC-BY-NC”.

If you have any questions about re-licensing your work, or would like more information, please let us know by emailing [email].


Brett Bonfield
Ellie Collier
Erin Dorney
Emily Ford
Gretchen Kolderup
Hugh Rundle
Coral Sheldon-Hess
Micah Vandegrift

DuraSpace News: Islandora 7.x-1.4 Release Timeline

planet code4lib - Wed, 2014-09-10 00:00
From Samantha Fritz, Interim Project and Community Manager, Islandora Foundation

Charlottetown, Prince Edward Island, CA

Islandora is pleased to announce that the upcoming 7.x-1.4 release is well underway! Islandora 7.x-1.4 will see the return of Islandora Solr Views and the addition of one new module, Islandora Video.js, along with a number of improvements to existing modules and tools.

DuraSpace News: Islandora Camp Set for Denver, Colorado, Oct. 13-15

planet code4lib - Wed, 2014-09-10 00:00
From Samantha Fritz, Interim Project and Community Manager, Islandora Foundation

Charlottetown, CA

Islandora Camp will be in Colorado this fall. Hosted by the University of Denver in association with the Colorado Alliance of Research Libraries, this first Central-US camp will take place October 13-15, 2014.

DuraSpace News: REGISTER: VIVO Project to Host Hackathon at Cornell, Oct. 13-15

planet code4lib - Wed, 2014-09-10 00:00

From Layne Johnson, VIVO Project Director

Please Note: If you plan to attend the Hackathon please make your hotel reservations ASAP—the $129 rate at the Ithaca Hotel expires this Friday, September 12.

Roy Tennant: The Power of Powers of 2

planet code4lib - Tue, 2014-09-09 23:43

Despite the fact that I consider myself a lifelong feminist, I am still surprised and dismayed at how easily I can overlook discriminatory behavior toward women. Or not even discriminatory behavior but things that are much more subtle, like situations that discourage women from speaking up or participating.

So when a colleague forwarded a notice about the Ada Initiative’s “Allies Workshop” (now called the “Ally Skills” workshop), I jumped at the chance to go. I had heard of the Ada Initiative and I was interested to hear what they had to tell me. The workshop I attended in San Francisco included mostly men from startup technology companies. I learned more about the subtle ways in which discrimination occurs and how to be a better ally to those experiencing such discrimination. I was also surprised and pleased to discover that I learned a lot from situations that others had experienced and described in our interactive sessions.

I left the workshop feeling more knowledgeable and empowered to help make a difference. But more importantly, I learned more about how my own behavior can be modified to provide space for voices that might otherwise go unheard. I also left being impressed with Valerie Aurora, who led the workshop. Little did I know at the time that we would cross paths again soon.

The 2014 Code4Lib Conference was able to sign on Valerie Aurora as a keynote speaker, and she requested to be interviewed rather than to give a speech. I was happy to offer to do the interview, which you can see here. It was one of the best sessions we’ve ever had at Code4Lib, and not because of the interviewer.

Now the Ada Initiative is asking for our help and I’m happy to help publicize the library-specific campaign to raise money to support and expand their work. Suggested donation amounts use powers of 2, which is totally a geek thing. Andromeda Yelton, Bess Sadler, Chris Bourg, and Mark Matienzo joined forces to help raise $4,096 (2 to the 12th power) in matching funds. This means that any donation you make is automatically doubled. The campaign runs from now to September 15th, so donate now!

Feel the power of using powers of 2 to build a more equitable and just society.

Bess Sadler: The Ada Initiative Has My Back

planet code4lib - Tue, 2014-09-09 22:49

The Ada Initiative has my back. In the past several years they have been a transformative force in the open source software community and in the lives of women I know and care about. To show our support, Andromeda Yelton, Chris Bourg, Mark Matienzo and I have pledged to match up to $5120 of donations to the Ada Initiative made through this link before Tuesday September 16. That seems like a lot of money, right? Well, here’s my story about how the Ada Initiative helped me when I needed it most.

Thanks to the Ada Initiative, having a conference code of conduct has become an established best practice, and it is changing conference culture for the better. I’m so proud of the many library conferences and organizations who have adopted Code of Conduct policies this year. However, just because a conference has a policy in place doesn’t mean there won’t be any problems. I’d like to share something that happened to me this year, the way the Ada Initiative helped me and the conference in question deal with it, and how things have since improved.

I gave a talk at code4lib a few years ago called “Vampires vs Werewolves: Ending the War Between Developers and Sysadmins with Puppet”. The talk was well received enough that it became a little bit of a meme, and last year PuppetConf asked me if I’d give an updated version of it for PuppetConf 2013.

It was exciting to participate in such a huge conference. PuppetConf is well funded and professionally managed. They have a code of conduct in place, friendly and helpful staffers, and high quality content. I was having a great time up until right before my talk.

Unfortunately, the talk before mine was titled “Nobody Has to Die Today: Keeping the Peace With the Other Meat Sacks.” I watched this talk on the video monitor from backstage, while getting hooked up to a lapel microphone, already a little nervous about facing such a large audience.

The speaker was a large heavily muscled man who was shouting more than speaking into the microphone. He was shouting about violence, and about how many people get murdered in the workplace. He particularly mentioned the fact that murder is the number one cause of death for women in the workplace. I felt my blood running cold. My body felt flooded with fear, and I wanted to run. He went on to discuss the many ways he personally had hurt people, through his work as a bouncer, in martial arts, or just because someone made him angry. At this point I was literally shaking. I have been on the receiving end of violence. I have known people who have been murdered. I know people, especially women, who have been hospitalized with the kinds of injuries he was graphically describing having inflicted. In spite of the small print disclaimer on his slides that this presentation was not encouraging violence, it was doing precisely that. If you don’t communicate technical requirements in the way he specifies, apparently, you will get what’s coming to you and you will deserve it.

The conference staff backstage were horrified. Some high profile people in the audience were walking out. It was clear that I was upset (I was trying not to hyperventilate at this point) and someone kept asking me if I wanted to file a code of conduct complaint. The thought of this made my panic even worse. I could easily picture filing a complaint and then paying for it with my life, when this guy found out and beat me to death in the parking lot after the conference. I said I did not want to file a complaint. I tried to take deep breaths and to not break down crying. I was determined to give my talk in spite of my shaky emotional state.

And I did. I delivered my presentation, which went surprisingly well, except for the fact that in the video I am swaying back and forth. I don’t usually do that when I speak, and I read it as the outward manifestation of how upset I was.

Afterwards, I thought for a long time about writing to the conference with my concerns. I started to do so several times, but I always chickened out. It was too easy for me to picture this guy learning my name and coming after me. Before you dismiss me as paranoid please consider the stories of Anita Sarkeesian and Kathy Sierra. Women in technology face worse than sexist jokes. We face assault. We face death threats. If you defend the status quo, understand what you are defending.

It wasn’t until this year’s PuppetConf call for proposals that I complained. The conference had liked my talk last year, and invited me to submit another talk this year. I wrote to decline, and told them why. I also sent a copy of my letter to Valerie Aurora, asking for the advice of the Ada Initiative.

I am very pleased to say that PuppetConf took my concerns seriously. Working with the Ada Initiative, they strengthened their code of conduct, put more screening measures in place for presentations, and improved training for conference staff on how to deal with problematic situations. PuppetLabs is an example of a company that is doing things right. They have specific outreach programs to get more women involved in the Puppet community and they are pursuing similar strategies to encourage participation from underrepresented racial groups. I feel good about the fact that I’m sending members of my staff to PuppetConf 2014, and at this point I would gladly speak at the conference again.

As upsetting as this incident was, this is a story with a happy ending. Because the Ada Initiative exists, both PuppetConf and I had someone to go to for guidance in how to improve the situation. Honestly, I still feel a little afraid about writing this post. But I also believe that nothing gets better until people take the risk of speaking out publicly. I am choosing to take that risk, in order to better communicate about why this work matters.

The Ada Initiative continues to do great things. You can read their 2014 progress report here. I am particularly excited about the Ally Skills workshop that will be offered at the Digital Library Federation Forum on October 29. Today, librarians are showing our love for the Ada Initiative. Watch for blog and social media posts from friends in library land who will be sharing more stories about why the Ada Initiative matters, and follow the action on Twitter under the hashtag #libs4ada. Join us in supporting the Ada Initiative’s mission and donate today!

State Library of Denmark: Small scale sparse faceting

planet code4lib - Tue, 2014-09-09 20:35

While sparse faceting has a profound effect on response time in our web archive, we are a bit doubtful about the number of multi-billion-document Solr indexes out there. Luckily we also have our core index at Statsbiblioteket, which should be a bit more representative of your everyday Solr installation: single-shard, 50GB, 14M documents. The bulk of the traffic is user-issued queries, which involve spellcheck, edismax qf & pf on 30+ fields and faceting on 8 fields. In this context, the faceting is of course the focus.

Of the 8 facet fields, 6 are low-cardinality and 2 are high-cardinality. Sparse was only active for the 2 high-cardinality ones, namely subject (4M unique values, 51M instances (note to self: 51M!? How did it get so high?)) and author (9M unique values, 40M instances).
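For readers less familiar with this kind of setup, a request of the shape described above might be assembled roughly as follows. This is only a sketch under assumptions: `subject` and `author` are the two high-cardinality facet fields named above, while the function name, the `title` query field and the example query are made up for illustration. The parameter names (`defType`, `qf`, `pf`, `facet.field`, `spellcheck`) are standard Solr request parameters; the production request has 30+ query fields and 8 facet fields, and `spellcheck=true` only has an effect if the request handler has a spellcheck component configured.

```python
from urllib.parse import urlencode


def build_solr_params(user_query, query_fields, facet_fields):
    """Return (name, value) pairs for a faceted edismax request.

    Hypothetical helper for illustration; not taken from the production setup.
    """
    params = [
        ("q", user_query),
        ("defType", "edismax"),          # use the edismax query parser
        ("qf", " ".join(query_fields)),  # fields to match terms against
        ("pf", " ".join(query_fields)),  # fields used for phrase boosting
        ("spellcheck", "true"),
        ("facet", "true"),
    ]
    # facet.field is repeated once per facet field
    params.extend(("facet.field", field) for field in facet_fields)
    return params


# Made-up example: three query fields instead of 30+, two facets instead of 8.
params = build_solr_params(
    "moby dick",
    query_fields=["title", "author", "subject"],
    facet_fields=["subject", "author"],
)
query_string = urlencode(params)
```

The resulting `query_string` would be appended to the select handler URL of the index in question.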

To get representative measurements, the logged response times were extracted for the hours 07-22; there’s maintenance going on at night, which makes measurements unreliable. Only user-entered searches with faceting were considered. To compare response times before and after enabling sparse faceting, the data for this Tuesday and last Tuesday were used.
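The filtering described above could be sketched along these lines. The record layout (`timestamp`, `ms`, `user_entered`, `faceted`) is hypothetical, since the actual log format is not shown here, and the sample values are invented purely to exercise the code; they are not the production measurements. The sketch also assumes that "hours 07-22" includes hour 22.

```python
from datetime import datetime
from statistics import median


def daytime_facet_times(records, first_hour=7, last_hour=22):
    """Return response times (ms) of user-entered, faceted searches
    logged between first_hour and last_hour (both inclusive)."""
    return [
        r["ms"]
        for r in records
        if r["user_entered"]
        and r["faceted"]
        and first_hour <= r["timestamp"].hour <= last_hour
    ]


def rec(day, hour, ms, user_entered=True, faceted=True):
    """Build one hypothetical log record for September 2014."""
    return {
        "timestamp": datetime(2014, 9, day, hour),
        "ms": ms,
        "user_entered": user_entered,
        "faceted": faceted,
    }


# Invented sample data for the two Tuesdays being compared.
without_sparse = [
    rec(2, 3, 900),                       # night-time maintenance: skipped
    rec(2, 8, 400),
    rec(2, 12, 600),
    rec(2, 13, 100, user_entered=False),  # not a user-entered search: skipped
]
with_sparse = [
    rec(9, 8, 200),
    rec(9, 12, 300),
    rec(9, 23, 50),                       # outside 07-22: skipped
]

median_without = median(daytime_facet_times(without_sparse))
median_with = median(daytime_facet_times(with_sparse))
```

Comparing medians is a choice made for this sketch; percentiles or full distributions would work just as well on the filtered lists.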

50GB / 14M docs, logged timing from production, without (20140902) and with (20140909) sparse faceting

The performance improvement is palpable, with response time being halved compared to the non-sparse faceting. Reading the logs closely, the time spent on faceting the high-cardinality fields is now in the single-digit milliseconds for nearly all queries. We’ll have to do some tests to see what stops the total response time from getting down to that level. I am guessing spellcheck.

As always, sparse faceting is readily available for the adventurous at SOLR-5894.

