LITA: Cataloging a world of languages

planet code4lib - Wed, 2014-10-01 12:00

My university has a mandate to increase our international reach through research collaborations, courses offered, and support for international students.

From the technical services side, this means our catalogers must provide metadata for resources in unfamiliar languages, including some that don’t use the Roman alphabet. A few of the challenges we face include:

  • Identifying the language of an item (is that Spanish or Catalan?)
  • Cataloging an item in a language you don’t speak or read (what is this book even about?)
  • Transliterating from non-Roman alphabets (e.g. Cyrillic, Chinese, Thai)
  • Diacritic codes in copy cataloging that don’t match your system’s encoding scheme

I’d like to share a few free tools that our catalogers have found helpful. I’ve used some of these in other areas of librarianship as well, including acquisitions and reference.

Language identifiers

Sometimes I open a book or article and have no idea where to start, because the language isn’t anything I’ve seen before.

I turn to the Open Xerox Language Identifier, which covers over 80 different languages. Type or paste in text of the mysterious language, and give it a try. The more text you provide, the more accurate it is.

Language translators

Web translation tools aren’t perfect, but they’re a great way to get the gist of a piece of writing (don’t use them for sending sensitive emails to bilingual coworkers, however).

Google Translate includes over 75 languages, and also a language identification tool. Enter the title, a few chapter names, or back cover blurb, and you’ll get the general idea of the content.

Transliteration tables

If you catalog in Roman script, and you wind up with a resource in Cyrillic or Chinese, how do you translate that so the record is searchable in your ILS? Transliteration tables match up characters between scripts.

The ALA-LC Romanization Tables for non-Roman scripts are approved by the American Library Association and the Library of Congress. They cover over 70 different scripts.

Bibliographic dictionaries

We’re fortunate that librarians love to share: there are quite a few sites produced by libraries that look at common bibliographic terms you’d find on title pages: numbers, dates, editions, statements of responsibility, price, etc.

To share two Canadian examples, Memorial University maintains a Glossary of Bibliographic Information by Language and Queen’s University has a page of Foreign Language Equivalents for Bibliographic Terms.

If you’ve ever seen the phrase “bibliographic knowledge of [language]” in a job posting, this is what it’s referring to—when you’ve cataloged enough material in a language to know these terms, but can’t carry on a conversation about daily life. I have bibliographic knowledge of Spanish, Italian, and German, but don’t ask me to go to a restaurant in Hamburg and order a hamburger.

Subject-specific glossaries

Similar to bibliographic dictionaries, these are for terms common to specific subjects.

My university has significant music and map collections, so I often consult the language tools at Music Cataloging at Yale (…and I once thought music was the universal language) and the European Environment Agency’s Terminology and Discovery Service.

Diacritic charts

In order to ensure that accented characters and special symbols display properly in the catalog, it’s important to have the correct diacritic code.

Our system uses Unicode, and we often rely on the Unicode Character Code Chart or Unicode Character Table.  Which interface you use is personal preference.

It may also be worth coming up with a cheat sheet of the codes you use most frequently – for example, common French accents if you’re cataloging Canadian government documents, which are bilingual.
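
If you have Python 3 handy, the standard unicodedata module can generate such a cheat sheet for you (a quick sketch, not from the original post; the characters below are just common French examples):

import unicodedata

# Print the code point and official Unicode name for some common French accents.
for ch in "éèêàâçîïôùûü":
    print("%s  U+%04X  %s" % (ch, ord(ch), unicodedata.name(ch)))

# Or go the other way: look a character up by its official name.
print(unicodedata.lookup("LATIN SMALL LETTER E WITH ACUTE"))  # prints é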

Many Integrated Library Systems also have diacritic charts built in, where you can select the symbol you need and click it to place it in the record.

Diacritic guessers

Diacritic charts can be long and involved (the Unicode example above is a bit of a nightmare), so if you’re working with a new language, browsing through them searching for a specific code can be time-consuming. You can see the symbol in front of you, but have no idea what it’s called.

This is where Shapecatcher comes in.  This utility allows you to draw a character using your mouse or tablet. It identifies possible matches for the symbol and gives you the symbol’s name and Unicode number.

Have you encountered issues handling different languages when cataloguing? Is there a free language tool you’d like to share? Tell us about it in the comments!

__

Credits: Image of Pieter Bruegel the Elder’s painting The Tower of Babel courtesy of the Google Art Project. Many thanks also to my colleagues Judy Harris and Vivian Zhang for sharing their language challenges and tools.

Ed Summers: A Ferguson Twitter Archive

planet code4lib - Wed, 2014-10-01 02:05

Much has been written about the significance of Twitter as the recent events in Ferguson echoed round the Web, the country, and the world. I happened to be at the Society of American Archivists meeting 5 days after Michael Brown was killed. During our panel discussion someone asked about the role that archivists should play in documenting the event.

There was wide agreement that Ferguson was a painful reminder of the type of event that archivists working to “interrogate the role of power, ethics, and regulation in information systems” should be documenting. But what to do? Unfortunately we didn’t have time to really discuss exactly how this agreement translated into action.

Fortunately the very next day the Archive-It service run by the Internet Archive announced that they were collecting seed URLs for a Web archive related to Ferguson. It was only then, after also having finally read Zeynep Tufekci’s terrific Medium post, that I slapped myself on the forehead … of course, we should try to archive the tweets. Ideally there would be a “we” but the reality was it was just “me”. Still, it seemed worth seeing how much I could get done.

twarc

I had some previous experience archiving tweets related to Aaron Swartz using Twitter’s search API. (Full disclosure: I also worked on the Twitter archiving project at the Library of Congress, but did not use any of that code or data then, or now.) I wrote a small Python command line program named twarc (a portmanteau for Twitter Archive), to help manage the archiving.

You give twarc a search query term, and it will plod through the search results in reverse chronological order (the order they are returned in), handling quota limits along the way and writing out line-oriented JSON, where each line is a complete tweet. It worked quite well to collect 630,000 tweets mentioning “aaronsw”, but I was starting late out of the gate, 6 days after the events in Ferguson began. One downside to twarc is that it is completely dependent on Twitter’s search API, which only returns results for the past week or so. You can search back further in Twitter’s Web app, but that seems to be a privileged client; I can’t convince the API to keep going back in time past a week or so.

So time was of the essence. I started up twarc searching for all tweets that mention ferguson, but quickly realized that the volume of tweets, and the order of the search results, meant that I wouldn’t be able to retrieve the earliest tweets. So I tried to guesstimate a Twitter ID far enough back in time to use with twarc’s --max_id parameter to limit the initial query to tweets before that point in time. Doing this I was able to get back to 2014-08-10 22:44:43 — most of August 9th and 10th had slipped out of the window. I used a similar technique of guessing an ID further in the future in combination with the --since_id parameter to start collecting from where that snapshot left off. This resulted in a bit of a fragmented record, which you can see visualized (sort of) below:

In the end I collected 13,480,000 tweets (63G of JSON) between August 10th and August 27th. There were some gaps because of mismanagement of twarc, and the data just moving too fast for me to recover from them: most of August 13th is missing, as well as part of August 22nd. I’ll know better next time how to manage this higher volume collection.
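
The post doesn’t show how those --max_id and --since_id boundary values were guesstimated, but Twitter’s “snowflake” IDs (in use since late 2010) embed a millisecond timestamp in their high bits, so a rough ID for a given moment can be computed directly. A minimal Python 3 sketch, not from the original post, with an illustrative date:

import datetime

TWITTER_EPOCH_MS = 1288834974657  # start of Twitter's snowflake ID scheme

def approximate_tweet_id(dt):
    # A snowflake ID stores milliseconds since the Twitter epoch in the bits
    # above 22 low-order bits of worker and sequence data.
    ms = int(dt.replace(tzinfo=datetime.timezone.utc).timestamp() * 1000)
    return (ms - TWITTER_EPOCH_MS) << 22

# A rough ID for midnight UTC on August 10, 2014, usable with --max_id or --since_id.
print(approximate_tweet_id(datetime.datetime(2014, 8, 10)))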

Apart from the data, a nice side effect of this work is that I fixed a socket timeout error in twarc that I hadn’t noticed before. I also refactored it a bit so I could use it programmatically like a library instead of only as a command line tool. This allowed me to write a program to archive the tweets, incrementing the max_id and since_id values automatically. The longer continuous crawls near the end are the result of using twarc more as a library from another program.

Bag of Tweets

To try to arrange/package the data a bit I decided to order all the tweets by tweet ID, and split them up into gzipped files of 1 million tweets each. Sorting 13 million tweets was pretty easy using leveldb. I first loaded all 13 million tweets into the db, using the tweet ID as the key and the JSON string as the value.

import json
import leveldb
import fileinput

db = leveldb.LevelDB('./tweets.db')

for line in fileinput.input():
    tweet = json.loads(line)
    db.Put(tweet['id_str'], line)

This took almost 2 hours on a medium ec2 instance. Then I walked the leveldb index, writing out the JSON as I went, which took 35 minutes:

import leveldb

db = leveldb.LevelDB('./tweets.db')

for k, v in db.RangeIter(None, include_value=True):
    print v,

After splitting them up into 1 million line files with split and gzipping them, I put them in a Bag and uploaded it to S3 (8.5G).
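
The post doesn’t say which tool produced the Bag; one plausible option (an assumption on my part, and the directory name is hypothetical) is the bagit Python library, which wraps a directory of files in a BagIt bag with checksum manifests:

import bagit

# Convert the directory of gzipped tweet files into a BagIt bag in place;
# make_bag writes the payload manifests and bag-info.txt alongside the data.
bag = bagit.make_bag('tweets-ferguson', {'Contact-Name': 'Ed Summers'})
print(bag.info)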

I am planning on trying to extract URLs from the tweets to try to come up with a list of seed URLs for the Archive-It crawl. If you have ideas of how to use it definitely get in touch. I haven’t decided yet if/where to host the data publicly. If you have ideas please get in touch about that too!
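
Tweets returned by the search API already carry their links in the entities block, so a first pass at seed URLs could look something like the sketch below (not from the post; the file name is hypothetical):

import gzip
import json
from collections import Counter

urls = Counter()

# Each gzipped file holds line-oriented JSON, one tweet per line.
for path in ["tweets-0001.json.gz"]:
    with gzip.open(path, "rb") as f:
        for line in f:
            tweet = json.loads(line.decode("utf-8"))
            for u in tweet.get("entities", {}).get("urls", []):
                urls[u.get("expanded_url") or u.get("url")] += 1

# The most frequently shared URLs are natural candidates for Archive-It seeds.
for url, count in urls.most_common(25):
    print("%d %s" % (count, url))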

Library Tech Talk (U of Michigan): Old Wine in New Bottles: Our Efforts Migrating Legacy Materials to HathiTrust

planet code4lib - Wed, 2014-10-01 00:00
(by Kat Hagedorn, Christina Powell, Lance Stuchell and John Weise) The one constant in digital preservation over the past couple of decades has been change. Digitization standards have changed as equipment has improved and become more affordable, formats have come and gone, and tools have been developed to help with automated format creation and validation. The progress made on this front has been great, but how do we reconcile older content with current digitization and preservation standards?

Library of Congress: The Signal: QCTools: Open Source Toolset to Bring Quality Control for Video within Reach

planet code4lib - Tue, 2014-09-30 12:01

In this interview, part of the Insights Interview series, FADGI talks with Dave Rice and Devon Landes about the QCTools project.

In a previous blog post, I interviewed Hannah Frost and Jenny Brice about the AV Artifact Atlas, one of the components of Quality Control Tools for Video Preservation, an NEH-funded project which seeks to design and make available community-oriented products to reduce the time and effort it takes to perform high-quality video preservation. The less “eyes on” time routine QC work requires, the more time can be redirected toward quality control and assessment of the digitized content most deserving of attention.

QCTools’ Devon Landes

In this blog post, I interview archivists and software developers Dave Rice and Devon Landes about the latest release version of the QCTools, an open source software toolset to facilitate accurate and efficient assessment of media integrity throughout the archival digitization process.

Kate:  How did the QCTools project come about?

Devon:  There was a recognized need for accessible & affordable tools out there to help archivists, curators, preservationists, etc. in this space. As you mention above, manual quality control work is extremely labor and resource intensive but a necessary part of the preservation process. While there are tools out there, they tend to be geared toward (and priced for) the broadcast television industry, making them out of reach for most non-profit organizations. Additionally, quality control work requires a certain skill set and expertise. Our aim was twofold: to build a tool that was free/open source, but also one that could be used by specialists and non-specialists alike.

QCTools’ Dave Rice

Dave:  Over the last few years a lot of building blocks for this project were coming in place. Bay Area Video Coalition had been researching and gathering samples of digitization issues through the A/V Artifact Atlas project and meanwhile FFmpeg had made substantial developments in their audiovisual filtering library. Additionally, open source technology for archival and preservation applications has been finding more development, application, and funding. Lastly, the urgency related to the obsolescence issues surrounding analog video and lower costs for digital video management meant that more organizations were starting their own preservation projects for analog video and creating a greater need for an open source response to quality control issues. In 2013, the National Endowment for the Humanities awarded BAVC with a Preservation and Access Research and Development grant to develop QCTools.

Kate: Tell us what’s new in this release. Are you pretty much sticking to the plan or have you made adjustments based on user feedback that you didn’t foresee? How has the pilot testing influenced the products?

QCTools provides many playback filters. Here the left window shows a frame with the two fields presented separately (revealing the lack of chroma data in field 2). The right window here shows the V plane of the video per field to show what data the deck is providing.

Devon:  The users’ perspective is really important to us and being responsive to their feedback is something we’ve tried to prioritize. We’ve had several user-focused training sessions and workshops which have helped guide and inform our development process. Certain processing filters were added or removed in response to user feedback; obviously UI and navigability issues were informed by our testers. We’ve also established a GitHub issue tracker to capture user feedback which has been pretty active since the latest release and has been really illuminating in terms of what people are finding useful or problematic, etc.

The newest release has quite a few optimizations to improve speed and responsiveness, some additional playback & viewing options, better documentation and support for the creation of an xml-format report.

Dave:  The most substantial example of going ‘off plan’ was the incorporation of video playback. Initially the grant application focused on QCTools as a purely analytical tool which would assess and present quantifications of video metrics via graphs and data visualization. Early work delved deeply into identifying a methodology for picking out the right metrics to find what could be unnatural in digitized analog video (such as pixels too dissimilar from their temporal neighbors, the near-exact repetition of pixel rows, or discrepancies in the rate of change over time between the two video fields). When we presented the earliest prototypes of QCTools to users, a recurring question was “How can I see the video?” We redesigned the project so that QCTools presents the video alongside the metrics, along with various scopes, meters and visual tools, so that it now has both a visual and an analytic side.

Kate:   I love that the Project Scope for QCTools quotes both the Library of Congress’s Sustainability of Digital Formats and the Federal Agencies Digitization Guidelines Initiative as influential resources which encourage best practices and standards in audiovisual digitization of analog material for users. I might be more than a little biased but I agree completely. Tell me about some of the other resources and communities that you and the rest of the project team are looking at.

Here the QCTools vectorscope shows a burst of illegal color values. With the QCTools display of plotted graphs this corresponds to a spike in the maximum saturation (SATMAX).

Devon: Bay Area Video Coalition connected us with a group of testers from various backgrounds and professional environments so we’ve been able to tap into a pretty varied community in that sense. Also, their A/V Artifact Atlas has also been an important resource for us and was really the starting point from which QCTools was born.

Dave:  This project would not at all be feasible without the existing work of FFmpeg. QCTools utilizes FFmpeg for all decoding, playback, metadata expression and visual analytics. The QCTools data format is an expression of FFmpeg’s ffprobe schema, which appeared to be one of the only audiovisual file format standards that could efficiently store masses of frame-based metadata.

Kate:   What are the plans for training and documentation on how to use the product(s)?

Devon:  We want the documentation to speak to a wide range of backgrounds and expertise, but it is a challenge to do that and as such it is an ongoing process. We had a really helpful session during one of our tester retreats where users directly and collaboratively made comments and suggestions to the documentation; because of the breadth of their experience it really helped to illuminate gaps and areas for improvement on our end. We hope to continue that kind of engagement with users and also offer them a place to interact more directly with each other via a discussion page or wiki. We’ve also talked about the possibility of recording some training videos and hope to better incorporate the A/V Artifact Atlas as a source of reference in the next release.

Kate:   What’s next for QCTools?

Dave:   We’re presenting the next release of QCTools at the Association of Moving Image Archivists Annual Meeting on October 9th, where we anticipate supporting better summarization of digitization issues per file in a comparative manner. After AMIA, we’ll focus on audio and the incorporation of audio metrics via FFmpeg’s EBUr128 filter. QCTools has been integrated into workflows at BAVC, Dance Heritage Coalition, MoMA, Anthology Film Archives and Die Österreichische Mediathek, so the QCTools issue tracker has been filling up with suggestions which we’ll be tackling in the upcoming months.

Open Knowledge Foundation: Why the Open Definition Matters for Open Data: Quality, Compatibility and Simplicity

planet code4lib - Tue, 2014-09-30 10:55

The Open Definition performs an essential function as a “standard”, ensuring that when you say “open data” and I say “open data” we both mean the same thing. This standardization, in turn, ensures the quality, compatibility and simplicity essential to realizing one of the main practical benefits of “openness”: the greatly increased ability to combine different datasets together to drive innovation, insight and change.

This post explores in more detail why it’s important to have a clear standard in the form of the Open Definition for what open means for data.

Three Reasons

There are three main reasons why the Open Definition matters for open data:

Quality: open data should mean the freedom for anyone to access, modify and share that data. However, without a well-defined standard detailing what that means we could quickly see “open” being diluted as lots of people claim their data is “open” without actually providing the essential freedoms (for example, claiming data is open but actually requiring payment for commercial use). In this sense the Open Definition is about “quality control”.

Compatibility: without an agreed definition it becomes impossible to know if your “open” is the same as my “open”. This means we cannot know whether it’s OK to connect your open data and my open data together since the terms of use may, in fact, be incompatible (at the very least I’ll have to start consulting lawyers just to find out!). The Open Definition helps guarantee compatibility and thus the free ability to mix and combine different open datasets which is one of the key benefits that open data offers.

Simplicity: a big promise of open data is simplicity and ease of use. This is not just in the sense of not having to pay for the data itself; it’s about not having to hire a lawyer to read the license or contract, not having to think about what you can and can’t do and what it means for, say, your business or your research. A clear, agreed definition ensures that you do not have to worry about complex limitations on how you can use and share open data.

Let’s flesh these out in a bit more detail:

Quality Control (avoiding “open-washing” and “dilution” of open)

A key promise of open data is that it can be freely accessed and used. Without a clear definition of what exactly that means (e.g. used by whom, for what purpose) there is a risk of dilution, especially as open data is attractive for data users. For example, you could quickly find people putting out what they call “open data” that only non-commercial organizations can access freely.

Thus, without good quality control we risk devaluing open data as a term and concept, as well as excluding key participants and fracturing the community (as we end up with competing and incompatible sets of “open” data).

Compatibility

A single piece of data on its own is rarely useful. Instead data becomes useful when connected or intermixed with other data. If I want to know about the risk of my home getting flooded I need to have geographic data about where my house is located relative to the river and I need to know how often the river floods (and how much).

That’s why “open data”, as defined by the Open Definition, isn’t just about the freedom to access a piece of data, but also about the freedom connect or intermix that dataset with others.

Unfortunately, we cannot take compatibility for granted. Without a standard like the Open Definition it becomes impossible to know if your “open” is the same as my “open”. This means, in turn, that we cannot know whether it’s OK to connect (or mix) your open data and my open data together (without consulting lawyers!) – and it may turn out that we can’t because your open data license is incompatible with my open data license.

Think of power sockets around the world. Imagine if every electrical device had a different plug and needed a different power socket. When I came over to your house I’d need to bring an adapter! Thanks to standardization, at least within a given country, power sockets are almost always the same – so I can bring my laptop over to your house without a problem. However, when you travel abroad you may have to take an adapter with you. What drives this is standardization (or its lack): within your own country everyone has standardized on the same socket type, but different countries may not share a standard and hence you need to get an adapter (or run out of power!).

For open data, the risk of incompatibility is growing as more open data is released and more and more open data publishers such as governments write their own “open data licenses” (with the potential for these different licenses to be mutually incompatible).

The Open Definition helps prevent incompatibility by:

Evergreen ILS: Evergreen to Participate in Outreach Program for Women

planet code4lib - Tue, 2014-09-30 02:59

The Evergreen project will participate in the Outreach Program for Women, a program organized through the GNOME Foundation to improve gender diversity in Free and Open Source Software projects.

The Executive Oversight Board voted last month to fund one internship through the program. The intern will work on a project for the community from December 9, 2014 to March 9, 2015. The Evergreen community has identified five possible projects for the internship: three are software development projects, one is a documentation project, and one is a user experience project.

Candidates for the program have started asking questions in IRC and on the mailing list as they prepare to submit their applications, which are due on October 22, 2014. They will also be looking for feedback on their ideas. Please take the opportunity to share your thoughts with them on these ideas since it will help strengthen their application.

If you are an OPW candidate trying to decide on a project, take some time to stop into the #evergreen IRC channel to learn about our project and to get to know the people responsible for the care and feeding of Evergreen. We are an active and welcoming community that includes not only developers, but the sys admins and librarians who use Evergreen on a daily basis.

To get started, read through the Learning About Evergreen section of our OPW page. Try Evergreen out on one of our community demo servers, read through the documentation, and sign up for our mailing lists to learn more about the community. If you are planning to apply for a coding project, take some time to download and install Evergreen. Each project has an application requirement that you should do before submitting the application. Please take time to review that application requirement and find some way you can contribute to the project.

We look forward to working with you on the project!

District Dispatch: Free webinar: Making the election connection

planet code4lib - Mon, 2014-09-29 21:41

From federal funding to support for school librarians to net neutrality, 2015 will be a critical year for federal policies that impact libraries. We need to be working now to build the political relationships necessary to make sure these decisions benefit our community. Fortunately, the November elections provide a great opportunity to do so.

In a new free webinar hosted by the American Library Association (ALA) and Advocacy Guru Stephanie Vance, leaders will discuss how all types of library supporters can legally engage during an election season, as well as what types of activities will have the most impact. Webinar participants will learn 10 quick and easy tactics, from social media to candidate forums, that will help you take action right away. If you want to help protect our library resources in 2015 and beyond, then this is the session for you. Register now as space is limited.

Webinar: Making the Election Connection
Date: Monday, October 6, 2014
Time: 2:00–2:30 p.m. EDT

The archived webinar will be emailed to District Dispatch subscribers.

The post Free webinar: Making the election connection appeared first on District Dispatch.

LITA: 2014 LITA Forum Student Registration Rate Deadline Extended

planet code4lib - Mon, 2014-09-29 17:57

The special student registration rate to the 2014 LITA National Forum has been extended through Monday October 6th, 2014.  The Forum will be held November 5-8, 2014 at the Hotel Albuquerque in Albuquerque, NM. Learn more about the Forum here.

This special rate is intended for a limited number of graduate students enrolled in ALA accredited programs. In exchange for a discounted registration, students will assist the LITA organizers and the Forum presenters with on-site operations. This year’s theme is “Transformation: From Node to Network.” We are anticipating an attendance of 300 decision makers and implementers of new information technologies in libraries.

The selected students will be expected to attend the full LITA National Forum, Thursday noon through Saturday noon. This does not include the pre-conferences on Thursday and Friday. You will be assigned a variety of duties, but you will be able to attend the Forum programs, which include 3 keynote sessions, 30 concurrent sessions, and a dozen poster presentations.

The special student rate is $180 – half the regular registration rate for LITA members. This rate includes a Friday night reception at the hotel, continental breakfasts, and Saturday lunch. To get this rate you must apply and be accepted per below.

To apply for the student registration rate, please provide the following information:

  • Complete contact information including email address,
  • The name of the school you are attending, and
  • 150 word (or less) statement on why you want to attend the 2014 LITA Forum

Please send this information no later than October 6, 2014 to lita@ala.org, with “2014 LITA Forum Student Registration Request” in the subject line.

Those selected for the student rate will be notified no later than October 10, 2014.

Library of Congress: The Signal: Beyond Us and Them: Designing Storage Architectures for Digital Collections 2014

planet code4lib - Mon, 2014-09-29 17:39

The following post was authored by Erin Engle, Michelle Gallinger, Butch Lazorchak, Jane Mandelbaum and Trevor Owens from the Library of Congress.

The Library of Congress held the 10th annual Designing Storage Architectures for Digital Collections meeting September 22-23, 2014. This meeting is an annual opportunity for invited technical industry experts, IT  professionals, digital collections and strategic planning staff and digital preservation practitioners to discuss the challenges of digital storage and to help inform decision-making in the future. Participants come from a variety of government agencies, cultural heritage institutions and academic and research organizations.

The DSA Meeting. Photo credit: Peter Krogh/DAM Useful Publishing.

Throughout the two days of the meeting the speakers took the participants back in time and then forward again. The meeting kicked-off with a review of the origins of the DSA meeting. It started ten years ago with a gathering of Library of Congress and external experts who discussed requirements for digital storage architectures for the Library’s Packard Campus of the National Audio-Visual Conservation Center. Now, ten years later, the speakers included representatives from Facebook and Amazon Web Services, both of which manage significant amounts of content and neither of which existed in 2004 when the DSA meeting started.

The theme of time passing continued with presentations by strategic technical experts from the storage industry who began with an overview of the capacity and cost trends in storage media over the past years. Two of the storage media being tracked weren’t on anyone’s radar in 2004, but loom large for the future – flash memory and Blu-ray disks. Moving from the past quickly to the future, the experts then offered predictions, with the caveat that predictions beyond a few years are predictably unpredictable in the storage world.

Another facet of time – “back to the future” – came up in a series of discussions on the emergence of object storage in up-and-coming hardware and software products.  With object storage, hardware and software can deal with data objects (like files), rather than physical blocks of data.  This is a concept familiar to those in the digital curation world, and it turns out that it was also familiar to long-time experts in the computer architecture world, because the original design for this was done ten years ago. Here are some of the key meeting presentations on object storage:

Several speakers talked about the impact of the passage of time on existing digital storage collections in their institutions and the need to perform migrations of content from one set of hardware or software to another as time passes.  The lessons of this were made particularly vivid by one speaker’s analogy, which compared the process to the travails of someone trying to manage the physical contents of a car over one’s lifetime.

Even more vivid was the “Cost of Inaction” calculator, which provides black-and-white evidence of the costs of not preserving analog media over time, starting with the undeniable fact that you have to start with an actual date in the future for the “doomsday” when all your analog media will be unreadable.

The DSA Meeting. Photo Credit: Trevor Owens

Several persistent time-related themes engaged the participants in lively interactive discussions during the meeting.  One topic was the practical methods for checking the data integrity of content  in digital collections.  This concept, called fixity, has been a common topic of interest in the digital preservation community. Similarly, a thread of discussion on predicting and dealing with failure and data loss over time touched on a number of interesting concepts, including “anti-entropy,” a type of computer “gossip” protocol designed to query, detect and correct damaged distributed digital files. Participants agreed it would be useful to find a practical approach to identifying and quantifying types of failures.  Are the failures relatively regular but small enough that the content can be reconstructed? Or are the data failures highly irregular but catastrophic in nature?

Another common theme that arose is how to test and predict the lifetime of storage media.  For example, how would one test the lifetime of media projected to last 1000 years without having a time-travel machine available?  Participants agreed to continue the discussions of these themes over the next year with the goal of developing practical requirements for communication with storage and service providers.

The meeting closed with presentations from vendors working on the cutting edge of new archival media technologies.  One speaker dealt with questions about the lifetime of media by serenading the group with accompaniment from a 32-year-old audio CD copy of Pink Floyd’s “Dark Side of the Moon.” The song “Us and Them” underscored how the DSA meeting strives to bridge the boundaries placed between IT conceptions of storage systems and architectures and the practices, perspectives and values of storage and preservation in the cultural heritage sector. The song playing back from three decade old media on a contemporary device was a fitting symbol of the objectives of the meeting.

Background reading (PDF) was circulated prior to the meeting and the meeting agenda and copies of the presentations are available at http://www.digitalpreservation.gov/meetings/storage14.html.

Open Knowledge Foundation: Join the Global Open Data Index 2014 Sprint

planet code4lib - Mon, 2014-09-29 17:02

In 2012 Open Knowledge launched the Global Open Data Index to help track the state of open data around the world. We’re now in the process of collecting submissions for the 2014 Open Data Index and we want your help!

How can you contribute?

The main thing you can do is become a Contributor and add information about the state of open data in your country to the Open Data Index Survey. More details and quickstart guide to contributing here »

We also have other ways you can help:

Become a Mentor: Mentors support the Index in a variety of ways from engaging new contributors, mentoring them and generally promoting the Index in their community. Activities can include running short virtual “office hours” to support and advise other contributors, promoting the Index with civil society organizations – blogging, tweeting etc. To apply to be a Mentor, please fill in this form.

Become a Reviewer: Reviewers are specially selected experts who review submissions and check them to ensure information is accurate and up-to-date and that the Index is generally of high-quality. To apply to be a Reviewer, fill in this form.

Mailing Lists and Twitter

The Open Data Index mailing list is the main communication channel for folks who have questions or want to get in touch: https://lists.okfn.org/mailman/listinfo/open-data-census

For twitter, keep an eye on updates via #openindex2014

Key dates for your calendar

We will kick off on September 30th, in Mexico City with a virtual and in-situ event at Abre LATAM and ConDatos (including LATAM regional skillshare meeting!). Keep an eye on Twitter to find out more details at #openindex14 and tune into these regional sprints:

  • Europe / MENA / Africa (October 8-10) – with a regional Google Hangout on October 9.
  • Asia / Pacific (October 13-15) – with a regional Google Hangout on October 13.
  • All day virtual event to wrap-up (October 17)

More on this to follow shortly, keep an eye on this space.

Why the Open Data Index?

The last few years have seen an explosion of activity around open data and especially open government data. Following initiatives like data.gov and data.gov.uk, numerous local, regional and national bodies have started open government data initiatives and created open data portals (from a handful three years ago to nearly 400 worldwide today).

But simply putting a few spreadsheets online under an open license is obviously not enough. Doing open government data well depends on releasing key datasets in the right way.

Moreover, with the proliferation of sites it has become increasingly hard to track what is happening: which countries, or municipalities, are actually releasing open data and which aren’t? Which countries are releasing data that matters? Which countries are releasing data in the right way and in a timely way?

The Global Open Data Index was created to answer these sorts of questions, providing an up-to-date and reliable guide to the state of global open data for policy-makers, researchers, journalists, activists and citizens.

The first initiative of its kind, the Global Open Data Index is regularly updated and provides the most comprehensive snapshot available of the global state of open data. The Index is underpinned by a detailed annual survey of the state of open data run by Open Knowledge in collaboration with open data experts and communities around the world.

Global Open Data Index: survey

District Dispatch: ALA launches educational 3D printing policy campaign

planet code4lib - Mon, 2014-09-29 15:21

The American Library Association (ALA) today announced the launch of “Progress in the Making,” (pdf) a new educational campaign that will explore the public policy opportunities and challenges of 3D printer adoption by libraries. Today, the association released “Progress in the Making: An Introduction to 3D Printing and Public Policy,” a tip sheet that provides an overview of 3D printing, describes a number of ways libraries are currently using 3D printers, outlines the legal implications of providing the technology, and details ways that libraries can implement simple yet protective 3D printing policies in their own libraries.

“As the percentage of the nation’s libraries helping their patrons create new objects and structures with 3D printers continues to increase, the legal implications for offering the high-tech service in the copyright, patent, design and trade realms continues to grow as well,” said Alan S. Inouye, director of the ALA Office for Information Technology Policy. “We have reached a point in the evolution of 3D printing services where libraries need to consider developing user policies that support the library mission to make information available to the public. If the library community promotes practices that are smart and encourage creativity, it has a real chance to guide the direction of the public policy that takes shape around 3D printing in the coming years.”

Over the coming months, ALA will release a white paper and a series of tip sheets that will help the library community better understand and adapt to the growth of 3D printers, specifically as the new technology relates to intellectual property law and individual liberties.

This tip sheet is the product of a collaboration between the Public Library Association (PLA), the ALA Office for Information Technology Policy (OITP) and United for Libraries, and was coordinated by OITP Information Policy Analyst Charlie Wapner. View the tip sheet (pdf).

The post ALA launches educational 3D printing policy campaign appeared first on District Dispatch.

Jonathan Rochkind: Rubyists, don’t forget about the dir glob!

planet code4lib - Mon, 2014-09-29 15:11

If you are writing configuration to take a pattern to match against files in a file system…

You probably want Dir.glob patterns, not regexes. Dir.glob is in the stdlib. Dir.glob’s unix-shell-style patterns are less expressive than regexes, but probably expressive enough for anything you need in this use case, and much simpler to deal with for the common patterns you’ll encounter.

Dir.glob("root/path/**/*.rb")

vs.

%r{\Aroot/path/.*\.rb\Z}

Or

Dir.glob("root/path/*.rb")

vs.

…I don’t even feel like thinking about how to express, as a regexp, that you don’t want files in child directories, only those directly in the directory.

Dir.glob will find matches from within a directory on the local file system — but if you have a filepath in a string that you want to test for a match against a dirglob, you can easily do that too with Pathname.fnmatch, which doesn’t even require the path to exist on the local file system.

Some more info and examples from Shane da Silva, who points out some annoying inconsistent gotchas to be aware of.



Jenny Rose Halperin: Why I feel like an Open Source Failure

planet code4lib - Mon, 2014-09-29 15:05

I presented a version of this talk at the Supporting Cultural Heritage Open Source Software (SCHOSS) Symposium in Atlanta, GA in September 2014. This talk was generously sponsored by LYRASIS and the Andrew Mellon Foundation.

I often feel like an Open Source failure.

I haven’t submitted 500 patches in my free time, I don’t spend my after-work hours rating html5 apps, and I was certainly not a 14 year old Linux user. Unlike the incredible group of teenaged boys with whom I write my Mozilla Communities newsletter and hang out with on IRC, I spent most of my time online at that age chatting with friends on AOL Instant Messenger and doing my homework.

I am a very poor programmer. My Wikipedia contributions are pretty sad. I sometimes use Powerpoint. I never donated my time to Open Source in the traditional sense until I started at Mozilla as a GNOME OPW intern and while the idea of data gets me excited, the thought of spending hours cleaning it is another story.

I was feeling this way the other day and chatting with a friend about how reading celebrity news often feels like a better choice after work than trying to find a new open source project to contribute to or making edits to Wikipedia. A few minutes later, a message popped up in my inbox from an old friend asking me to help him with his application to library school.

I dug up my statement of purpose and I was extremely heartened to read my words from three years ago:

I am particularly interested in the interaction between libraries and open source technology… I am interested in innovative use of physical and virtual space and democratic archival curation, providing free access to primary sources.

It felt good to know that I have always been interested in these topics but I didn’t know what that would look like until I discovered my place in the open source community. I feel like for many of us in the cultural heritage sector the lack of clarity about where we fit in is a major blocker, and I do think it can be associated with contribution to open source more generally. Douglas Atkin, Community Manager at Airbnb, claims that the two main questions people have when joining a community are “Are they like me? And will they like me?”. Of course, joining a community is a lot more complicated than that, but the lack of visibility of open source projects in the cultural heritage sector can make even locating a project a whole lot more complicated.

As we’ve discussed in this working group, the ethics of cultural heritage and Open Source overlap considerably and

the open source community considers those in the cultural heritage sector to be natural allies.

In his article, “Who are you empowering?” Hugh Rundle writes: (I quote this article all the time because I believe it’s one of the best articles written about library tech recently…)

A simple measure that improves privacy and security and saves money is to use open source software instead of proprietary software on public PCs.

Community-driven, non-profit, and not good at making money are just some of the attributes that most cultural heritage organizations and open source project have in common, and yet, when choosing software for their patrons, most libraries and cultural heritage organizations choose proprietary systems and cultural heritage professionals are not the strongest open source contributors or advocates.

The main reasons for this are, in my opinion:

1. Many people in cultural heritage don’t know what Open Source is.

In a recent survey I ran of the Code4Lib and UNC SILS listservs, nearly every person surveyed could accurately respond to the prompt “Define Open Source in one sentence” though the responses varied from community-based answers to answers solely about the source code.

My sample was biased toward programmers and young people (and perhaps people who knew how to use Google, because many of the answers were directly lifted from the first line of the Wikipedia article about Open Source, which is definitely survey bias), but I think it is indicative of one of the larger questions of open source.

Is open source about the community, or is it about the source code?

There have been numerous articles and books written on this subject, many of which I can refer you to (and I am sure that you can refer me to as well!) but this question is fundamental to our work.

Many people, librarians and otherwise, will ask: (I would argue most, but I am operating on anecdotal evidence)

Why should we care about whether or not the code is open if we can’t edit it anyway? We just send my problems to the IT department and they fix it.

Many people in cultural heritage don’t have many feelings about open source because they simply don’t know what it is and cannot articulate the value of one over the other. Proprietary systems don’t advertise as proprietary, but open source constantly advertises as open source, and as I’ll get to later, proprietary systems have cornered the market.

This movement from darkness to clarity brings most to mind a story that Kathy Lussier told about the Evergreen project, where librarians who didn’t consider themselves “techy” jumped into IRC to tentatively ask a technical question and due to the friendliness of the Evergreen community, soon they were writing the documentation for the software themselves and were a vital part of their community, participating in conferences and growing their skills as contributors.

In this story, the Open Source community engaged the user and taught her the valuable skill of technical documentation. She also took control of the software she uses daily and was able to maintain and suggest features that she wanted to see. This situation was really a win-win all around.

What institution doesn’t want to see their staff so well trained on a system that they can write the documentation for it?

2. The majority of the market share in cultural heritage is closed-source, closed-access software and they are way better at advertising than Open Source companies.

Last year, my very wonderful boss in the cataloging and metadata department of the University of North Carolina at Chapel Hill came back from ALA Midwinter with goodies for me: pens and keychains and postits and tote bags and those cute little staplers. “I only took things from vendors we use,” she told me.

Linux and Firefox OS hold 21% of the world’s operating system market share. (Interestingly, this is more than iOS globally, but still half that of Windows. On mobile, iOS and Android are approximately equal.)

Similarly, free, open source systems for cultural heritage are unfortunately not a high percentage of the American market. Wikipedia has a great list of proprietary and open source ILSs and OPACs, the languages they’re written in, and their cost. Marshall Breeding writes that FOSS software is picking up some market share, but it is still “the alternative” for most cultural heritage organizations.

There are so many reasons for this small market share, but I would argue (as my previous anecdote did for me,) that a lot of it has to do with the fact that these proprietary vendors have much more money and are therefore a lot better at marketing to people in cultural heritage who are very focused on their work. We just want to be able to install the thing and then have it do the thing well enough. (An article in Library Journal in 2011 describes open source software as: “A lot of work, but a lot of control.”)

As Jack Reed from Stanford and others have pointed out, most of the cost of FOSS in cultural heritage is developer time, and many cultural heritage institutions believe that they don’t have those resources. (John Brice’s example at the Meadville Public Library proves that communities can come together with limited developers and resources in order to maintain vital and robust open source infrastructures as well as significantly cut costs.)

I learned at this year’s Wikiconference USA that academic publishers had the highest profit margin of any company in the country last year, ahead of Google and Apple.

The academic publishing model is, for more reasons than one, completely antithetical to the ethics of cultural heritage work, and yet they maintain a large portion of the cultural heritage market share in terms of both knowledge acquisition and software. Megan Forbes reminds us that the platform Collection Space was founded as the alternative to the market dominance of “several large, commercial vendors” and that cost put them “out of reach for most small and mid-sized institutions.”

Open source has the chance to reverse this vicious cycle, but institutions have to put their resources in people in order to grow.

While certain companies like OCLC are working toward a more equitable future, with caveats of course, I would argue that the majority of proprietary cultural heritage systems are providing inferior product to a resource poor community.

 3. People are tired and overworked, particularly in libraries, and to compound that, they don’t think they have the skills to contribute.

These are two separate issues, but they’re not entirely disparate so I am going to tackle them together.

There’s this conception outside of the library world that librarians are secret coders just waiting to emerge from their shells and start categorizing datatypes instead of MARC records (this is perhaps a misconception due to a lot of things, including the sheer diversity of types of jobs that people in cultural heritage fill, but hear me out.)

When surveyed, the skill that entering information science students most want to learn is “programming.” However, the majority of MLIS programs are still teaching Microsoft Word and beginning html as technology skills.

Learning to program computers takes time and instruction and while programs like Women who Code and Girl Develop It can begin educating librarians, we’re still faced with a workforce that’s over 80% female-identified that learned only proprietary systems in their work and a small number of technology skills in their MLIS degrees.

Library jobs, and further, cultural heritage jobs are dwindling. Many trained librarians, art historians, and archivists are working from grant to grant on low salaries with little security and massive amounts of student loans from both undergraduate and graduate school educations. If they’re lucky to get a job, watching television or doing the loads of professional development work they’re expected to do in their free time seems a much better choice after work than continuing to stare at a computer screen for a work-related task or learn something completely new. For reference: an entry-level computer programmer can expect to make over $70,000 per year on average. An entry-level librarian? Under $40,000. I know plenty of people in cultural heritage who have taken two jobs or jobs they hate just to make ends meet, and I am sure you do too.

One can easily say, “Contributing to open source teaches new skills!” but if you don’t know how to make non-code contributions or the project is not set up to accept those kinds of contributions, you don’t see an immediate pay-off in being involved with this project, and you are probably not willing to stay up all night learning to code when you have to be at work the next day or raise a family. Programs like Software Carpentry have proven that librarians, teachers, scientists, and other non-computer scientists are willing to put in that time and grow their skills, so to make any kind of claim without research would be a reach and possibly erroneous, but I would argue that most cultural heritage organizations are not set up in a way to nurture their employees for this kind of professional development. (Not because they don’t want to, necessarily, but because they feel they can’t or they don’t see the immediate value in it.)

I could go on and on about how a lot of these problems are indicative of cultural heritage work being an historically classed and feminized professional grouping, but I will spare you right now, although you’re not safe if you go to the bar with me later.

In addition, many open source projects operate with a “patches welcome!” or “go ahead, jump in!” or “We don’t need a code of conduct because we’re all nice guys here!” mindset, which is not helpful to beginning coders, women, or really, anyone outside of a few open source fanatics.

I’ve identified a lot of problems, but the title of this talk is “Creating the Conditions for Open Source Community” and I would be remiss if I didn’t talk about what works.

Diversification, both in terms of types of tasks and types of people and skillsets as well as a clear invitation to get involved are two absolute conditions for a healthy open source community.

Ask yourself the questions: Are you a tight-knit group with a lot of IRC in-jokes that new people may not understand? Are you all white men? Are you welcoming? Paraphrasing my colleague Sean Bolton, the steps to an inviting community are to build understanding, build connections, build clarity, build trust and build pilots, which together create a win-win.

As communities grow, it’s important to be able to recognize and support contributors in ways that feel meaningful. That could be a trip to a conference they want to attend, a Linkedin recommendation, a professional badge, or a reference, or best yet: you could ask them what they want. Our network for contributors and staff is adding a “preferred recognition” system. Don’t know what I want? Check out my social profile. (The answer is usually chocolate, but I’m easy.)

Finding diverse contribution opportunities has been difficult for open source since, well, the beginning of open source. Even for us at Mozilla, with our highly diverse international community and hundreds of ways to get involved, we often struggle to bring a diversity of voices into the conversation, and to find meaningful pathways and recognition systems for our 10,000 contributors.

In my mind, education is perhaps the most important part of bringing in first-time contributors. Organizations like Open Hatch and Software Carpentry provide low-cost, high-value workshops for new contributors to locate and become a part of Open Source in a meaningful and sustained manner. Our Webmaker program introduces technical skills in a dynamic and exciting way for every age.

Mentorship is the last very important aspect of creating the conditions for participation. Having a friend or a buddy or a champion from the beginning is perhaps the greatest motivator according to research from a variety of different papers. Personal connection runs deep, and is a major indicator for community health. I’d like to bring mentorship into our conversation today and I hope that we can explore that in greater depth in the next few hours.

With mentorship and 1:1 connection, you may not see an immediate uptick in your project’s contributions, but a friend tells a friend tells a friend and then eventually you have a small army of motivated cultural heritage workers looking to take back their knowledge.

You too can achieve on-the-ground action. You are the change you wish to see.

Are you working in a cultural heritage institution and are about to switch systems? Help your institution switch to the open source solution and point out the benefits of their community. Learning to program? Check out the Open Hatch list of easy bugs to fix! Are you doing patron education? Teach them Libre Office and the values around it. Are you looking for programming for your library? Hold a Wikipedia edit-a-thon. Working in a library? Try working open for a week and see what happens. Already part of an open source community? Mentor a new contributor or open up your functional area for contribution.

It’s more than just “if you build it, they will come.”

If you make open source your mission, people will want to step up to the plate.

In order to close, I’m going to tell a story that I can’t take credit for, but I will tell it anyway.

We have a lot of ways to contribute at Mozilla. From code to running events to learning and teaching the Web, it can be occasionally overwhelming to find your fit.

A few months ago, my colleague decided to create a module and project around updating the Mozilla Wiki, a long-ignored, frequently used, and under-resourced part of our organization. As an information scientist and former archivist, I was psyched. The space that I called Mozilla’s collective memory was being revived!

We started meeting in April, and it became clear that there were other wiki fanatics in the organization who had been waiting for this opportunity to come up. People throughout the organization were psyched to be a part of it. In August, we held a fantastically successful workweek in London, reskinned the wiki, created a regular release cycle, wrote a manual and a best practice guide, and are still going strong as a regular working group within the organization, made up half of volunteer contributors and half of paid staff. Our work has been generally lauded throughout the project, and we’re working hard to make our wiki the resource it can be for contributors and staff.

To me, that was the magic of open source. I met some of my best friends, and at the end of the week, we were a cohesive unit moving forward to share knowledge through our organization and beyond. And isn’t that a basic value of cultural heritage work?

I am still an open source failure. I am not a code fanatic, and I like the ease of use of my used iPhone. I don’t listen to techno and write JavaScript all night, and I would generally rather read a book than go to a hackathon.

And despite all this, I still feel like I’ve found my community.

I am involved with open source because I am ethically committed to it, because I want to educate my community of practice and my local community about what working open can bring to them.

When people ask me how I got involved with open source, my answer is: I had a great mentor, an incredible community and contributor base, and there are many ways to get involved in open source.

While this may feel like a new frontier for cultural heritage, I know we can do more and do better.

Open up your work as much as you can. Draw on the many, many intelligent people doing work in the field. Educate yourself and others about the value that open source can bring to your institution. Mentor someone new, even if you’re shy. Connect with the community and treat your fellow contributors with respect. Who knows?

You may get an open source failure like me to contribute to your project.

District Dispatch: CopyTalk webinar on open licensing

planet code4lib - Mon, 2014-09-29 14:56

Join us for our next installment of CopyTalk, October 2nd at 2pm Eastern Time. It’s FREE.

In the webinar, titled “Open Licensing and the Public Domain: Tools and policies to support libraries, scholars, and the public,” Timothy Vollmer will discuss the Creative Commons (CC) licenses and public domain instruments, with a particular focus on how these tools are being used within the GLAM (galleries, libraries, archives and museums) sector. He’ll also talk about the evolving Open Access movement, including legal and technological challenges to researchers and publishers, and how librarians and copyright experts are helping address these issues. Finally, he’ll discuss the increasing role of institutional policies and funding mandates that are being adopted to support the creation and sharing of content and data in the public commons.

Timothy Vollmer is Public Policy Manager for Creative Commons. He coordinates public policy positions in collaboration with CC staff, international affiliate network, and a broad community of copyright experts. Timothy helps educate policymakers at all levels and across various disciplines such as education, data, science, culture, and government about copyright licensing, the public domain, and the adoption of open policies. Prior to CC, Timothy worked on information policy issues for the American Library Association in Washington, D.C. He is a graduate of the University of Michigan, School of Information, and helped establish the Open.Michigan initiative.

There is no need to pre-register! Just show up on October 2 at 2pm Eastern: http://ala.adobeconnect.com/copyright/

CopyTalk webinars are archived.


The post CopyTalk webinar on open licensing appeared first on District Dispatch.

Ed Summers: Hong Kong Tags

planet code4lib - Mon, 2014-09-29 14:53

The top 25 tags in 166,246 tweets collected between 2014-09-21 10:31:00 and 2014-09-29 09:54:18 (EDT) mentioning #occupycentral.

hongkong 37,836
hkstudentstrike 13,667
hk 12,819
hk926 9,928
hkclassboycott 7,439
china 5,297
occupyhongkong 5,273
occupyadmiralty 5,075
umbrellarevolution 4,271
hkdemocracy 3,863
occupyhk 3,626
hk929 3,195
hk928 3,063
hongkongdemocracy 2,500
hongkongprotests 2,144
solidarityhk 1,983
hkstudentboycott 1,702
democracy 1,466
ferguson 1,449
umbrellamovement 1,168
globalforhk 1,157
?? 1,080
imperialism 1,003
gonawazgo 800
handsupdontshoot 777
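
For anyone curious how a tally like this is produced, here is a minimal sketch. It assumes the tweets were collected as line-oriented JSON in the Twitter v1.1 format (the kind of file a harvesting tool such as twarc writes); the filename tweets.jsonl is a placeholder.

```python
import json
from collections import Counter

# Tally hashtags across a file of tweets, one JSON object per line.
# "tweets.jsonl" is a placeholder filename; the entities.hashtags layout
# follows the Twitter v1.1 API tweet format.
counts = Counter()
with open("tweets.jsonl") as f:
    for line in f:
        tweet = json.loads(line)
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1

for tag, n in counts.most_common(25):
    print(f"{tag} {n:,}")
```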

LITA: The Password Dilemma

planet code4lib - Mon, 2014-09-29 11:00
Elizabeth Montgomery on the game show Password, 1971

One-on-one technology help is one of the greatest services offered by the modern public library. Our ability to provide free assistance without an underlying agenda to sell a product puts us in a unique and valuable position in our communities. While one-on-one sessions are one of my favorite job duties, I must admit that they can also be the most frustrating, primarily because of passwords. It is rare that I assist a patron and we don't encounter a forgotten password, if not several. Trying to guess the password or resetting it usually eats up most of our time. I wish that I were writing this post as an authority on how to conquer the war on passwords, but I fear that we're losing the battle. One day we'll look back and laugh at the time we wasted trying to guess our passwords and resetting them again and again, but it's been 10 years since Bill Gates predicted the death of the password, so I'm not holding my breath.

The latest answer to this dilemma is password managers like Dashlane and LastPass. These are viable solutions for some, but the majority of the patrons I work with have little experience with technology, and a password manager is simply too overwhelming.

I've been thinking a lot about passwords lately; I've read countless articles about how to manage passwords, and I don't think there's an easy answer. That said, I think the best thing librarians can do is change our attitude about passwords in general. Instead of considering them annoyances, we should view them as tools. Passwords should empower us, not annoy us. Passwords are our first line of defense against hackers. If we want to protect the content we create, it's our responsibility to create and manage strong passwords. This is exactly the perspective we should share with our patrons. Instead of griping about patrons who don't know their email passwords, we should take this opportunity to educate them. We should view this encounter as a chance to stop patrons from using one password across all of their accounts or, God forbid, using 123456 as their password.
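
If it helps to make "strong" concrete during a help session, here is a minimal sketch of the two styles of password most advice converges on: a random character string and a random multi-word passphrase. It uses only Python's standard library; the short word list is a placeholder, and a real passphrase should draw from a large published word list (a diceware-style list, for example).

```python
import secrets
import string

# Two common recommendations, sketched with the standard library's
# cryptographically secure random source.

def random_password(length=16):
    """A random mix of letters, digits, and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))

def passphrase(words, count=5):
    """A diceware-style passphrase: several words chosen at random."""
    return " ".join(secrets.choice(words) for _ in range(count))

# Placeholder word list; use a large published list in practice.
word_list = ["maple", "harbor", "violet", "catalog", "lantern", "prairie"]
print(random_password())
print(passphrase(word_list))
```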

If a patron walks away from a one-on-one help session with nothing more than a stronger account password and a slightly better understanding of online security, then that is a victory for the librarian.

What’s your take on the password dilemma? Do you have any suggestions for working with patrons in one-on-one situations? Please share your thoughts in the comments.

District Dispatch: Webinar: Fighting Ebola with information

planet code4lib - Mon, 2014-09-29 06:44

Photo by Phil Moyer

Recent outbreaks across the globe and in the U.S. have increased public awareness of the potential public health impacts of infectious diseases. As a result, many librarians are assisting their patrons in finding credible information sources on topics such as Ebola, Chikungunya and pandemic influenza.

The American Library Association (ALA) is encouraging librarians to participate in “Fighting Ebola and Infectious Diseases with Information: Resources and Search Skills Can Arm Librarians,” a free webinar that will teach participants how to find and share reliable health information. Librarians from the U.S. National Library of Medicine will host the interactive webinar, which takes place on Tuesday, October 14, 2014, from 2:00–3:00 p.m. Eastern.

Speakers include:

Siobhan Champ-Blackwell
Siobhan Champ-Blackwell is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center. She selects material to be added to the NLM disaster medicine grey literature database and is responsible for the Center’s social media efforts. She has over 10 years of experience in providing training on NLM products and resources.

Elizabeth Norton
Elizabeth Norton is a librarian with the U.S. National Library of Medicine Disaster Information Management Research Center where she has been working to improve online access to disaster health information for the disaster medicine and public health workforce. She has presented on this topic at national and international association meetings and has provided training on disaster health information resources to first responders, educators, and librarians working with the disaster response and public health preparedness communities.

Date: Tuesday, October 14, 2014
Time: 2:00 PM – 3:00 PM Eastern
Register for the free event

If you cannot attend this live session, a recorded archive will be available to view at your convenience. To view past webinars also hosted collaboratively with iPAC, please visit Lib2Gov.org.

The post Webinar: Fighting Ebola with information appeared first on District Dispatch.

DuraSpace News: DuraSpace Hot Topics Webinar Series 9: Early Advantage: Introducing New Fedora 4.0 Repositories

planet code4lib - Mon, 2014-09-29 00:00

Announcing Series 9 in the DuraSpace Hot Topics Webinar Series: Early Advantage: Introducing New Fedora 4.0 Repositories

Curated by David Wilcox, Fedora Product Manager, DuraSpace

Eric Hellman: Online Bookstores to Face Stringent Privacy Law in New Jersey

planet code4lib - Sun, 2014-09-28 18:11
Before you read this post, be aware that this web page is sharing your usage with Google, Facebook, StatCounter.com, unglue.it and Harlequin.com. Google because this is Blogger. Facebook because there's a "Like" button, StatCounter because I use it to measure usage, and Harlequin because I embedded the cover for Rebecca Avery's Maid Crave directly from Harlequin's website. Harlequin's web server has been sent the address of this page along with your IP address as part of the HTTP transaction that fetches the image, which, to be clear, is not a picture of me.
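
For readers who want to see that mechanism for themselves, here is a minimal sketch of the request a browser makes for an embedded resource. It uses the Python requests library and httpbin.org, a public service that simply echoes requests back; the page URL is a placeholder, not a real address.

```python
import requests

# What a third-party server learns when your browser fetches something a page
# embeds from it: your IP address (from the connection) and the page you were
# reading (from the Referer header browsers send along with the request).
page_url = "https://blog.example.com/2014/09/some-post.html"  # placeholder

resp = requests.get("https://httpbin.org/get", headers={"Referer": page_url})
echo = resp.json()
print(echo["origin"])               # the IP address the server saw
print(echo["headers"]["Referer"])   # the page URL the server saw
```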

I'm pretty sure that having read the first paragraph, you're now able to give informed consent if I try to sell you a book (see unglue.it embed -->) and constitute myself as a book service for the purposes of a New Jersey "Reader Privacy Act", currently awaiting Governor Christie's signature. That act would make it unlawful to share information about your book use (borrowing, downloading, buying, reading, etc.) with a third party, in the absence of a court order to do so. That's good for your reading privacy, but a real problem for almost anyone running a commercial "book service".

Let's use Maid Crave as an example. When you click on the link, your browser first sends a request to Harlequin.com. Using the instructions in the returned HTML, it then sends requests to a bunch of web servers to build the web page, complete with images, reviews and buy links. Here's the list of hosts contacted as my browser builds that page:

  • www.harlequin.com
  • stats.harlequin.com
  • seal.verisign.com (A security company)
  • www.goodreads.com  (The review comes from GoodReads. They're owned by Amazon.)
  • seal.websecurity.norton.com (Another security company)
  • www.google-analytics.com
  • www.googletagservices.com
  • stats.g.doubleclick.net (Doubleclick is an advertising network owned by Google)
  • partner.googleadservices.com
  • tpc.googlesyndication.com
  • cdn.gigya.com (Gigya’s Consumer Identity Management platform helps businesses identify consumers across any device, achieve a single customer view by collecting and consolidating profile and activity data, and tap into first-party data to reach customers with more personalized marketing messaging.)
  • cdn1.gigya.com
  • cdn2.gigya.com
  • cdn3.gigya.com
  • comments.us1.gigya.com
  • gscounters.us1.gigya.com
  • www.facebook.com (I'm told this is a social network)
  • connect.facebook.net
  • static.ak.facebook.com
  • s-static.ak.facebook.com
  • fbstatic-a.akamaihd.net (Akamai is here helping to distribute facebook content)
  • platform.twitter.com (yet another social network)
  • syndication.twitter.com
  • cdn.api.twitter.com
  • edge.quantserve.com (QuantCast is an "audience research and behavioural advertising company")

All of these servers are given my IP address and the URL of the Harlequin page that I'm viewing. All of these companies except Verisign, Norton and Akamai also set tracking cookies that enable them to connect my browsing of the Harlequin site with my activity all over the web. The Guardian has a nice overview of these companies that track your use of the web. Most of them exist to better target ads at you. So don't be surprised if, once you've visited Harlequin, Amazon tries to sell you romance novels.
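
If you want to run the same kind of census on another page, here is a minimal sketch that lists the third-party hosts named in a page's static HTML. It assumes the requests and beautifulsoup4 packages are installed, and it will miss hosts that are only contacted by JavaScript after the page loads, which is how many trackers actually arrive; the URL is just the example from this post.

```python
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

# List the external hosts a page's HTML asks the browser to contact.
# Note: hosts injected later by JavaScript (ad exchanges, analytics
# beacons) won't appear here; a browser's network panel catches those.
page_url = "https://www.harlequin.com/"
first_party = urlparse(page_url).hostname

html = requests.get(page_url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

hosts = set()
for tag in soup.find_all(["script", "img", "iframe", "link"]):
    src = tag.get("src") or tag.get("href")
    if src and src.startswith(("http://", "https://", "//")):
        host = urlparse(src).hostname
        if host and host != first_party:
            hosts.add(host)

for host in sorted(hosts):
    print(host)
```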

Certainly Harlequin qualifies as a commercial book service under the New Jersey law. And certainly Harlequin is giving personal information (IP addresses are personal information under the law) to a bunch of private entities without a court order. And most certainly it is doing so without informed consent. So its website is doing things that will be unlawful under the New Jersey law.

But it's not alone. Almost any online bookseller uses services like those used by Harlequin. Even Amazon, which is pretty much self-contained, has to send your personal information to Ingram to fulfill many of the book orders sent to it. Under the New Jersey law, it appears that Amazon will need to get your informed consent to have Ingram send you a book. And really, do I care? Does this improve my reading privacy?

The companies that can ignore this law are Apple, Target, Walmart and the like. Book services are exempt if they derive less than 2% of their US consumer revenue from books. So yay Apple.

Other internet book services will likely respond to the law with pop-up legal notices like those you occasionally see on sites trying to comply with European privacy laws. "This site uses cookies to improve your browsing experience. OK?" They constitute privacy theater, a stupid legal show that doesn't improve user privacy one iota.

Lord knows we need some basic rules about the privacy of our reading behavior. But I think the New Jersey law does a lousy job of dealing with the realities of today's internet. I wonder if we'll ever start a real discussion about what and when things should be private on the web.

Karen G. Schneider: Against Shiny

planet code4lib - Sat, 2014-09-27 18:09

So I need to talk about something on my mind but blurt it out hastily and therefore with less finesse than I’d prefer. There has been a Recent Unpleasantness in LibraryLand where a librarian sued two other librarians for libel. Normally we are a free-speechy sort of group not inclined to sue one another over Things People Said, but as noted in this post by bossladywrites (another academic library director–we are legion), we are not in normal times.  And as Meredith observes in another smart post, it is hard to see the upside of any part of this. Note: I’m not going to discuss the actual details of the lawsuit; I’m more interested in the state of play that got us there. To quote my own tweet:

Not going to wade deeply into #teamharpy except to note that “thought leaders from the library community” are generally not pro-SLAPP.

— K.G. Schneider (@kgs) September 24, 2014

But first — the context for my run-on sentences and choppy transitions, this being a personal blog and therefore sans an editor to say “stop, stick to topic.” The last two weeks have featured a fender-bender with our Honda where the other driver decided to file a medical claim, presumably for chipping a nail, as you can’t do much damage at 5 mph, even when you are passing on the right and running a stop sign; intense work effort around a mid-year budget adjustment; an “afternoon off” to do homework during which the Most Important Database I needed at that moment was erratic at best; a terrible case of last-minuting by another campus department that should really know better; and the death at home last Saturday of our 18-year-old cat Emma, which included not only the trauma of her departure, but also the mild shame of bargain-shopping for a pet crematorium early last Sunday morning after the first place I called wanted more than I felt would be reasonable for my own cremation.

Now Emma’s ashes are on the shelf with the ashes of Darcy, Dot, and Prada; I am feeling no longer so far behind on homework, though I have a weekend ahead of me that needs to feature less Crazy and more productivity; and I have about 45 minutes before I drive Sandy to a Diabetes Walk, zoom to the Alemany farmer’s market, then settle in for some productive toiling.

It will sound hypocritical for a librarian who has been highly visible for over two decades to say this, but I agree that there is a hyper-rock-stardom afoot in our profession, and I do wonder if bossladywrites isn’t correct that social media is the gasoline on its fire. It does not help when programs designed to help professionals build group project skills have “leader” in the title and become so heavily coveted that librarians publicly gnash teeth and wail if they are not selected, as if their professional lives have been ruined.

It will also sound like the most sour of grapes to say this (not being a Mover & Shaker), and perhaps it is, but there is also a huge element of Shiny in the M&S “award,” which after all is bestowed by an industry magazine and based on a rather casual referral process. There are some well-deserving names mingling with people who are there for reasons such as schmoozing a nomination from another Famous Name (and I know of more than one case of post-nomination regret). Yet being selected for a Library Journal Mover & Shaker automatically labels that person with a gilded status, as I have seen time and again on committees and elsewhere. It’s a magazine, people, not a professional committee.

We own this problem. I have participated in professional activities where it was clear that these titles — and not the performance behind them — fast-tracked librarians for nominations far too premature for their skills. (And no, I am not suggesting the person that brought the suit is an EL–I don’t know that, though I know he was an M&S.) I am familiar with one former EL (not from MPOW!) who will take decades if ever to live up to anything with “leader” in the title, and have watched him get proposed as a candidate for association-wide office–by virtue of being on the magic EL-graduate roster.

Do I think Emerging Leaders is a good program? If I didn’t, I wouldn’t have carved money out of our tiny budget to commit to supporting one at MPOW. Do I think being an EL graduate means you are qualified for just about anything the world might offer, and your poop don’t stink? No, absolutely not. I did not single out one person due to magical sparkly librarian powers; it had a lot more to do with this being a good fit for that librarian at the time, just as I have helped others at MPOW get into leadership programs, research institutes, information-literacy boot camps, and skill-honing committees. It’s just part of my job.

The over-the-top moment for me with EL was the trading cards. Really? Coronets and fanfare for librarians learning project management and group work? Couldn’t we at least wait until their work was done? Of the tens of thousands of librarians in the U.S. alone, fewer than one hundred become ELs every year. The vast majority of the remainder are “emerging” just fine in their own right; there are great people doing great work that you will never, ever hear of. Why not just give us all trading cards — yes, every damn librarian? And before you conclude KGS Hates EL, keep in mind I have some serious EL street cred, having not only sponsored an EL but also successfully proposed GLBTRT’s first EL and made a modest founding donation to its effort to boot.

Then there was ALA’s “invitational summit” last spring where fewer than 100 “thought leaders from the library field” gathered to “begin a national conversation.” Good for them, but as one of the uninvited, I could not resist poking mild fun at this on Twitter, partly for its exclusivity and partly because this “national conversation” was invisible to the rest of the world. I was instantly lathered in Righteous Indignation by some of the chosen people who attended — and not even to my (social network) face, but in the worst passive-aggressive librarian style, through “vaguebook” comments on social networks. (And a la Forrest Gump, the person who brought the lawsuit against the two librarians was at this summit, too, though I give the organizers credit for blending interesting outliers along with the usual suspects.) If you take yourself that seriously, you need a readjustment — perhaps something we can discuss if that conversation is ever launched.

I have a particularly bitter taste in my mouth about the absentee rockstar librarian syndrome because I had one job, eons ago, where I succeeded an absentee leader who had been on the conference circuit for several years, and all the queen’s horses couldn’t put that department together again. There were a slew of other things that were going wrong, but above all, the poor place stank of neglect.  The mark of a real rock star is the ability to ensure that no one back at the ranch ever has any reason to begrudge you your occasional Shiny Moment.  Like the way so many of us learn hard lessons, it gave me pause about my own practices, and caused me to silently beg forgiveness from past organizations for any and all transgressions.

Shiny Syndrome can twist people’s priorities and make the quotidian seem unimportant (along with making them boors at dinner parties, as Meredith recounts). Someone I intensely dislike is credited with saying that 80 percent of life is showing up, a statement I grudgingly agree is spot-on. When people ask if I would run for some office or serve on some very busy board, or even do a one-off talk across the country, I point out that I have a full-time job and am a full-time student (I barely have time to brew beer more than three times a year these days!). But it’s also true that I get a huge amount of satisfaction simply from showing up for work every day, and from activities that likely sound dull but to me are very exciting, such as shared-print pilots and statewide resource sharing, and from the interviews I am conducting for a research paper that is part of my doctoral process, a project that has big words like Antecedents in the title but is to me fascinating and rewarding.

I also get a lot of pleasure from professional actions that don’t seem terribly fun, such as pursuing the question of whether there should be a Planning and Budget Assembly, a question that may seem meaningless to some; in fact, at an ALA midwinter social, one Shiny Person belittled me for my actions on PBA to the point where I left the event in tears. Come to think of it, that makes two white men who have belittled me for pursuing the question of PBA, which brings up something Meredith and bossladywrites hint at: the disproportionate number of rockstar librarians who are young, white, and male. They left off age, but I feel that acutely; far too often, “young” is used as a synonym for forward-thinking, tech-savvy, energetic, smart, creative, and showcase-worthy.

I do work in a presentation now and then — and who can complain about being “limited” to the occasional talk in Australia and New Zealand (I like to think “I’m big, really big, in Palmerston North”), though my favorite talk in the last five years was to California’s community college library directors, because they are such a nice group and it was a timely jolt of Vitamin Colleague — but when I do, I end up talking about my work in one way or the other. And one of the most touching moments of my career happened this August when at an event where MPOW acknowledged my Futas Award — something that honors two decades of following Elizabeth Futas’ model of outspoken activism, sometimes at personal risk, sometimes wrongheadedly, sometimes to no effect, but certainly without pause — I realized that some of our faculty thought I was receiving this award for my efforts on behalf of my dear library, as if there were an award for fixing broken bathroom exhaust fans and replacing tables and chairs, activities that along with the doctoral program take up the space where shiny stuff would go. That flash of insight was one of the deepest, purest moments of joy in my professional life. I got to be two people that day: the renegade of my youth, and the macher of my maturity.

Finally, I am now venturing into serious geezer territory, but back in the day, librarians were rock stars for big stuff, like inventing online catalogs, going to jail rather than revealing their patrons’ identities, and desegregating state associations. These days you get your face, if not on the cover of Rolling Stone, as a centerfold in a library magazine, position yourself as a futurist or guru, go ping ping ping all over the social networks, and you’re now at every conference dais. (In private messaging about this topic, I found myself quoting the lyrics from “You’re So Vain.”)

Name recognition has always had its issues (however convenient it is for those of us, like me, who have it). I often comment, and it is not false modesty, that I know some people vote for me for the wrong reasons. I have my areas of competence, but I know that name recognition and living in a state with a large population (as I am wont to do) play a role in my ability to get elected. (Once I get there, I like to think I do well enough, but that is beside the point. A favorite moment of mine, from back when I chaired a state intellectual freedom committee, was a colleague who remarked, clearly surprised, that “you know how to run a meeting!”) And of course, there are rock stars who rock deservedly, and sometimes being outward-facing is just part of the package (and some of us can’t help it — I was that little kid that crazy people walked up to in train stations to gift with hand-knit sweaters, and yes, that really happened). But we seem to have gone into a new space, where a growing percentage of Shiny People are famous for being shiny. It’s not good for us, and it’s not good for them, and it’s terrible for our profession.



