planet code4lib


John Miedema: Orlando: the lives and works of British women writers. Digital resources working together in unexpected and insightful ways.

Sat, 2014-10-11 19:55

Orlando is a digital resource, indexing the lives and works of British women writers.

The full name of the project is Orlando: Women’s Writing in the British Isles from the Beginnings to the Present. It is the work of scholars Susan Brown, Patricia Clements, and Isobel Grundy. The name of the work was inspired by Virginia Woolf’s 1928 novel, Orlando: A Biography. The project, like the novel, is an important resource in the history of women’s writing. It grew out of the limitations of a print-based publication, The Feminist Companion to Literature in English. The Companion presented a great deal of research on women writers but lacked an adequate index. The researchers decided to compile a digital index.

I have the good fortune to work with Susan Brown and the Orlando resource. I have extracted bibliographic and literary data from Orlando, and intend to integrate it with unstructured literary content using Natural Language Processing. The aim is a first demonstration of how digital resources like Orlando can provide new ways of reading and understanding literature. In particular I hope to show how digital resources can work together in unexpected and insightful ways.

More information:

The Orlando Project

Bigold, Melanie (2013) “Orlando: Women’s Writing in the British Isles from the Beginnings to the Present, edited by Susan Brown, Patricia Clements, and Isobel Grundy,” ABO: Interactive Journal for Women in the Arts, 1640-1830: Vol. 3: Iss. 1, Article 8.
Available at:

Orlando: A Biography. Wikipedia


Patrick Hochstenbach: My first VideoScribe project

Sat, 2014-10-11 07:11
Trying out a little animation with VideoScribe to give an introduction to the services of Ghent University Library. The illustrations were created on paper using a fineliner. I scanned them and vector traced them in Adobe Illustrator (VideoScribe needs to

FOSS4Lib Upcoming Events: Code3cme

Sat, 2014-10-11 04:53
Date: Saturday, October 11, 2014 - 00:45 to Sunday, October 11, 2015 - 00:45
Supports: DMP Online

Last updated October 11, 2014. Created by bunnychris on October 11, 2014.

Get enrolled for the refresher courses for a great medical career. To know more about the site click here .

LITA: Shifting & Merging

Sat, 2014-10-11 00:39
McKenzie Pass, Ore. Courtesy of Ryan Shattuck. Task Easy Blog 2013.

It has been exactly seven weeks since I moved to Bloomington, Indiana, yet I finally feel like I have arrived. Let me rewind, quick, and tell you a little about my background. During my last two years of undergrad at the University of Nebraska-Lincoln (UNL), I spent my time working on as many Digital Humanities (DH) projects and jobs as I possibly could in the Center for Digital Research in the Humanities.

[DH is a difficult concept to define because everyone does it through various means, for various reasons. To me, it means using computational tools to analyze or build humanities projects. This way, we can find patterns we wouldn't see with the naked eye, or display physical objects digitally for greater access.]

By day, I studied English and Computer Science, and by night, my fingers scurried over my keyboard encoding poems, letters, and aphorisms. I worked at the Walt Whitman Archive, on an image analysis project with two brilliant professors, on text analysis and digital archives projects with leading professors in the fields, and on my own little project analyzing a historical newspaper. My classmates and I, both undergraduate and graduate, constantly talked about DH: what it is, who does it, how it is done, the technologies we use to do it, and how that differs from others.

Discovering an existing group of people already doing the same work you do is like merging onto a packed interstate where everyone is travelling at 80 miles per hour in the same direction. The thrill, the overwhelming “I know I am in the right place” feeling.

I chose Indiana University (IU) for my Library and Information Science degrees because I knew it was a hub for DH projects. I have an unparalleled opportunity working with Dr. John Walsh and Dr. Noriko Hara, both prominent DH and Information Science scholars.

However, I am impatient. After travelling on the DH interstate, I expected every classmate I met at IU to wear a button proclaiming, “I heart DH, let’s collaborate.” I half expected my courses to start from where I left off in my previous education. The beginning of the semester forced me to take a step back, to realize that I was shifting to a new discipline, and that I needed the basics first. My classes are satisfying my library love, but I was still missing that extra-curricular technology aspect, outside of my work for Dr. Walsh.

Then, one random, serendipitous meeting in the library and I was “zero to eighty” instantly. I met those DH students and learned about projects, initiatives, and IU networking. They reaffirmed that the community for which I was searching existed.

Since then, I have found others in the community and continue those same DH who, what, how, why conversations. While individual research is important, we can reach a higher potential through collaboration, especially in the digital disciplines. I am continuing to learn the importance of reaching out and learning from others, which I don’t believe will cease once I graduate. (Will it?)

I assure you that my future posts will be more closely related to library technology and digital humanities tools, but frankly, I’m new here. While I could talk about the library and information theory I’m learning, I will spare you those library school memories, and keep you updated on new technologies as I learn them.

In the meantime, I’ll ask you to reflect and share your experience transitioning to library school or into a library career. How were you first introduced to library technology or digital humanities? Any nuggets of advice for us beginners?

LITA: 2014 LITA Forum: 3 Amazing Keynotes

Fri, 2014-10-10 17:10

Join your LITA colleagues in Albuquerque, Nov 5-8, 2014 for the 2014 LITA Forum.

This year’s Forum has three amazing keynotes you won’t want to miss:

AnnMarie Thomas, Engineering Professor, University of St. Thomas

AnnMarie is an engineering professor who spends her time trying to encourage the next generation of makers and engineers. Among a host of other activities she is the director of the Playful Learning Lab and leads a team of students looking at both the playful side of engineering (squishy circuits for students, the science of circus, toy design) and ways to use engineering design to help others. AnnMarie and her students developed Squishy Circuits.

Check out AnnMarie’s fun TED Talk on Play-Doh based squishy circuits.

Lorcan Dempsey, Vice President, OCLC Research and Chief Strategist

Lorcan Dempsey oversees the research division and participates in planning at OCLC. He is a librarian who has worked for library and educational organizations in Ireland, England and the US.

Lorcan has policy, research and service development experience, mostly in the area of networked information and digital libraries. He writes and speaks extensively, and can be followed on the web at Lorcan Dempsey’s weblog and on twitter.

Kortney Ryan Ziegler, Founder Trans*h4ck

Kortney Ryan Ziegler is an Oakland-based, award-winning artist, writer, and the first person to hold a Ph.D. in African American Studies from Northwestern University.

He is the director of the multiple-award-winning documentary STILL BLACK: a portrait of black transmen, runs the GLAAD Media Award-nominated blog blac (k) ademic, and was recently named one of the Top 40 Under 40 LGBT activists by The Advocate Magazine and one of the most influential African Americans by TheRoot100.

Dr. Ziegler is also the founder of Trans*H4CK–the only tech event of its kind that spotlights trans* created technology, trans* entrepreneurs and trans* led startups.

See all the keynoters’ full bios at the LITA Forum Keynote Sessions web page.

More than 30 concurrent colleague-inspired sessions and a dozen poster sessions will provide a wealth of practical information on a wide range of topics. Networking opportunities, a major advantage of a smaller conference, are an important part of the Forum. Take advantage of the Thursday evening reception and sponsor showcase, the Friday networking dinners or Kitchen Table Conversations, plus meals and breaks throughout the Forum to get to know LITA leaders, Forum speakers, sponsors, and peers.

This year two preconference workshops will also be offered.

Linked Data for Libraries: How libraries can make use of Linked Open Data to share information about library resources and to improve discovery, access, and understanding for library users
Led by: Dean B. Krafft and Jon Corson-Rikert, Cornell University Library

Learn Python by Playing with Library Data
Led by: Francis Kayiwa, Kayiwa Consulting

2014 LITA Forum sponsors include EBSCO, Springshare, @mire, Innovative and OCLC.

Visit the LITA website for more information.

Library and Information Technology Association (LITA) members are information technology professionals dedicated to educating, serving, and reaching out to the entire library and information community. LITA is a division of the American Library Association.

LITA and the LITA Forum fully support the Statement of Appropriate Conduct at ALA Conferences

OCLC Dev Network: WorldCat Discovery API and Linked Data

Fri, 2014-10-10 14:00

This is the second post in our series introducing the WorldCat Discovery API. In our introductory remarks on the API, we told you about how the API can be used to power all aspects of resource discovery in your library. We also introduced some of the reasons why we chose entity-based bibliographic description for the API’s data serializations over more traditional API outputs. In this post we want to explore this topic even further and take a closer look at the Linked Data available in the WorldCat Discovery API.
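To make the idea of entity-based bibliographic description a little more concrete, here is a minimal Python sketch. The JSON-LD document, its `http://example.org/` identifiers, and the helper functions `index_entities` and `author_name` are all hypothetical illustrations using schema.org-style terms; they are assumptions for this example and not actual WorldCat Discovery API output. The point is only that a work and its author are separate entities joined by an identifier, rather than strings packed into one flat record.

```python
import json

# Hypothetical JSON-LD record for illustration only.
# This is NOT real WorldCat Discovery API output.
SAMPLE = """
{
  "@context": "http://schema.org/",
  "@graph": [
    {"@id": "http://example.org/work/1", "@type": "Book",
     "name": "Orlando: A Biography",
     "author": {"@id": "http://example.org/person/1"}},
    {"@id": "http://example.org/person/1", "@type": "Person",
     "name": "Virginia Woolf"}
  ]
}
"""


def index_entities(doc):
    """Map each entity's @id to its node so references can be followed."""
    return {node["@id"]: node for node in doc.get("@graph", [])}


def author_name(doc, work_id):
    """Follow the author link from a work to its separate Person entity."""
    entities = index_entities(doc)
    author_ref = entities[work_id]["author"]["@id"]
    return entities[author_ref]["name"]


doc = json.loads(SAMPLE)
print(author_name(doc, "http://example.org/work/1"))
```

Because the author is its own entity with its own identifier, the same Person node could be referenced from many works without repeating or re-keying the name.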

Library of Congress: The Signal: Archiving from the Bottom Up: A Conversation with Howard Besser

Fri, 2014-10-10 13:54

Howard Besser, Professor of Cinema Studies and Director of New York University’s Moving Image Archiving & Preservation Program and Senior Scientist for Digital Library Initiatives for NYU’s Library.

The following is a guest post from Julia Fernandez, this year’s NDIIPP Junior Fellow. Julia has a background in American studies and working with folklife institutions and worked on a range of projects leading up to CurateCamp Digital Culture in July. This is part of a series of interviews Julia conducted to better understand the kinds of born-digital primary sources folklorists, and others interested in studying digital culture, are making use of for their scholarship.

Continuing our NDSA Insights interview series, I’m delighted to interview Howard Besser, Director of New York University’s Moving Image Archiving & Preservation Program (MIAP) and Professor of Cinema Studies at NYU. He is also one of the founders of Activist Archivists, a group created in the fall of 2011 to coordinate the collection of digital media relating to the Occupy Wall Street political movement.

Julia: Could you tell us a bit about Activist Archivists?  What are the group’s objectives? What kinds of digital media are you exploring?

Howard: Activist Archivists began with the question of how archivists could help assure that digital media documenting the “Occupy” movement could be made to persist. This led us into a variety of interesting sub-areas: getting individuals making recordings to follow practices that are more archivable; documenting the corruption of metadata on YouTube and Vimeo; evangelizing for the adoption of Creative Commons licenses that would allow libraries and archives to collect and make available content created by an individual; making documenters aware that the material they create could be used against their friends; and a host of other sub-areas.

We focused mainly on moving images and sound, and to a lesser degree on still images. As the Occupy movement began to dissipate, Activist Archivists morphed into a focus on community archiving that might be analog, digital or a hybrid. We worked with Third World Newsreel and Interference Archive and in 2014 produced the first Home Video Day in association with members of the NYC Asian American community and Downtown Community Television. And several Activist Archivists members are on the planning committee for the 2015 Personal Digital Archiving Conference.

Julia: Could you tell us a bit about the digital materials you are working from? What made them an interesting source for you?

Peoples Library Occupy Wall Street 2011 Shankbone, shared by user David Shankbone on Flickr.

Howard: Working with Occupy, we were mainly dealing with sound and images recorded on cellphones. This was particularly interesting because of the lack of prior knowledge in the library/archiving community about how to employ the wealth of metadata that cellphones captured while recording images and sound. For example, it’s very easy to set a cellphone to capture geolocation information as part of the metadata coupled to every image or sound that is recorded. And this, of course, can raise privacy issues because a corpus of photos one takes creates an exact path of places that one has been. The other thing that made this project particularly interesting to me was how social media sites such as YouTube strip away so much metadata (including much that could be useful to archives and scholars).
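The privacy point about geolocation can be illustrated with a short Python sketch. Everything here is an assumption made for the example: the `photos` records, their coordinates, and the helpers `reconstruct_path` and `strip_location` are hypothetical, and real image metadata would first have to be read out of the files with a suitable tool. The sketch only shows that sorting geotagged records by capture time is enough to recover a trail of places.

```python
from datetime import datetime

# Hypothetical per-photo metadata, as might be extracted from geotagged
# cellphone images. File names and coordinates are made up.
photos = [
    {"file": "c.jpg", "taken": "2011-10-05T18:30:00", "lat": 40.7300, "lon": -73.9950},
    {"file": "a.jpg", "taken": "2011-10-05T09:00:00", "lat": 40.7100, "lon": -74.0100},
    {"file": "d.jpg", "taken": "2011-10-05T20:00:00", "lat": None, "lon": None},
    {"file": "b.jpg", "taken": "2011-10-05T12:15:00", "lat": 40.7200, "lon": -74.0000},
]


def reconstruct_path(records):
    """Order geotagged records by capture time to recover a movement trail."""
    tagged = [r for r in records
              if r.get("lat") is not None and r.get("lon") is not None]
    tagged.sort(key=lambda r: datetime.fromisoformat(r["taken"]))
    return [(r["taken"], r["lat"], r["lon"]) for r in tagged]


def strip_location(record):
    """Return a copy of the record without coordinates, e.g. before sharing."""
    return {k: v for k, v in record.items() if k not in ("lat", "lon")}


for taken, lat, lon in reconstruct_path(photos):
    print(taken, lat, lon)
```

An archive would usually want the opposite of `strip_location` for its preservation copy, keeping the coordinates and restricting access instead, which is the tension described in the answer above.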

Julia: What are some of the challenges of working with a “leaderless” and anti-establishment movement like Occupy?

Howard: It’s always difficult for people who have spent most of their lives in hierarchical environments to adapt to a bottom-up (instead of a top-down) structure. It means that each individual needs to take on more initiative and responsibility, and usually ends up with individuals becoming more intensively involved, and feeling like they have more of a stake in the issues and the work. I think that the toughest challenge that we experienced was that each time we met with an Occupy Committee or Group, we needed to start re-explaining things from scratch. Because each new meeting always included people who had not attended the previous meeting, we always had to start from the beginning. Other major problems we faced would always be true in all but the most severe hierarchical organizations: how do you get everyone in the organization to adopt standards or follow guidelines? This is an age-old problem that is seldom solved merely by orders from above.

Julia: Activist Archivists has printed a “Why Archive?” informational card that spells out the importance of groups taking responsibility for the record of their activity.  If libraries and archives wanted to encourage a more participatory mode of object and metadata gathering, what would you suggest? What would you want to see in how libraries and archives provide access to them?

Howard: One of the earliest issues we encountered with Occupy was the prevalent notion that history is documented in book-length essays about famous people. Many people in Occupy could not see that someone in the future might be interested in the actions of an ordinary person like them. Now, a third of a century after Howard Zinn’s “A People’s History Of The United States,” most progressive historians believe that history is made by ordinary individuals coming together to conduct acts in groups. And they believe that we can read history through archival collections of letters, post-cards and snapshots. Librarians, archivists and historians need to make the case to ordinary people that their emails, blogs and Flickr and Facebook postings are indeed important representations of early 21st century life that people in the future will want to access. And as librarians and archivists, we need to be aggressive about collecting these types of material and make concrete plans for access.

Julia: In a recent NDSA talk (PDF) you identified some of the challenges of archiving correspondence in the digital age. For one, “digital info requires a whole infrastructure to view it” and “each piece of that infrastructure is changing at an incredibly rapid rate”; and also “people no longer store their digital works in places over which they have absolute control,” opting instead for email services, cloud storage or social network services. What are some effective approaches you’ve seen to dealing with these challenges?

Howard: Only institutions that themselves are sustainable across centuries can commit to the types of continuous refreshing and either migration or emulation that are necessary to preserve born-digital works over time. Libraries, archives and museums are about the only long-term organizations that have preservation as one of their core missions, so effective long-term digital preservation is likely to only happen in these types of institutions. The critical issue is for these cultural institutions to get the born-digital personal collections of individuals into preservable shape (through file formats and metadata) early in the life-cycle of these works.

As we found in both the InterPARES II Project and the NDIIPP Preserving Digital Public Television Project (PDF), waiting until a digital collection is turned over to an archive (usually near the end of its life-cycle) is often too late to do adequate preservation (and even more difficult if the creator is dead). We either need to get creators to follow good practices (file formats, metadata, file-naming conventions, no compression, executing Creative Commons licenses, …) at the point of creation, or we need to get the creators to turn over their content to us shortly after creation. So we need to be aggressive about both offering training and guidelines and about collection development.

Updated 10/10/14 for typos.

Open Knowledge Foundation: Open Humanities Hack: 28 November 2014, London

Fri, 2014-10-10 13:42

This is a cross-post from the DM2E-blog, see the original here

On Friday 28 November 2014 the second Open Humanities Hack event will take place at King’s College, London. This is the second in a series of events organised jointly by the King’s College London Department of Digital Humanities, the Digitised Manuscripts to Europeana (DM2E) project, the Open Knowledge Foundation and the Open Humanities Working Group.

The event is focused on digital humanists and intended to target research-driven experimentation with existing humanities data sets. One of the most exciting recent developments in digital humanities is the investigation and analysis of complex data sets that require close collaboration between Humanities and computing researchers. The aim of the hack day is not to produce complete applications but to experiment with methods and technologies to investigate these data sets so that at the end we can have an understanding of the types of novel techniques that are emerging.

Possible themes include but are not limited to:

  • Research in textual annotation has been a particular strength of digital humanities. Where are the next frontiers? How can we bring together insights from other fields and digital humanities?

  • How do we link and share humanities data in a way that makes sense of its complex structure, with its many internal relationships, both structural and semantic? In particular, distributed Humanities research data often includes digital material combining objects in multiple media, and in addition there is a diversity of standards for describing the data.

  • Visualisation. How do we develop reasonable visualisations that are practical and help build an overall intuition for the underlying humanities data set?

  • How can we advance the novel humanities technique of network analysis to describe complex relationships of ‘things’ in social-historical systems: people, places, etc.?

With this hack day we seek to form groups of computing and humanities researchers that will work together to come up with small-scale prototypes that showcase novel ways of working with humanities data.

Date: Friday 28 November 2014
Time: 9.00 – 21.00
Location: King’s College, Strand, London
Sign up: Attendance is free but places are limited: please fill in the sign-up form to register.

For an impression of the first Humanities Hack event, please check this blog report.

Open Knowledge Foundation: This Index is yours!

Thu, 2014-10-09 20:23

How is your country doing with open data? You can make a difference in 5 easy steps to track 10 different datasets. Or, you can help us spread the word on how to contribute to the Open Data Index. This includes the very important translation of some key items into your local language. We’ll keep providing you week-by-week updates on the status of the community-driven project.

We’ve got a demo and some shareable slides to help you on your Index path.

Priority country help wanted

The amazing community provided content for over 70 countries last year. This year we set the bar higher with a goal of 100 countries. If you added details for your country last year, please be sure to add any updates this year. Also, we need some help. Are you from one of these countries? Do you have someone in your network who could potentially help? Please do put them in touch with the index team – index at okfn dot org.

DATASETS WANTED: Armenia, Bolivia, Georgia, Guyana, Haiti, Kosovo, Moldova, Morocco, Nicaragua, Ukraine, and Yemen.

Video: Demo and Tips for contributing to the Open Data Index

This is a 40-minute video with details about the Open Data Index, including a demo to show you how to add datasets.

Text: Tutorial on How to help build the Open Data Index

We would encourage you to download this, make changes (add country specific details), translate and share back. Please simply share on the Open Data Census Mailing List or Tweet us @okfn.

How to Global Open Data Index – Overview from School of Data

Thanks again for sharing widely!

District Dispatch: Libraries are early learning partners

Thu, 2014-10-09 16:06

Photo by Lester Public Library

The American Library Association (ALA) urged the Department of Education in a letter (pdf) Wednesday to include public libraries as early learning partners in the Proposed Requirements for School Improvement Grants (SIG). The Association specifically asks that the Department of Education include public libraries as eligible entities and allowable partners under the new intervention model that focuses on improving early learning educational outcomes.

“The country’s 16,400 public libraries are prepared to support early childhood education, but we can only do so if policies allow for better collaboration, coordination, and real partnerships between libraries and the various federal early learning programs, including SIG grants,” said Emily Sheketoff, executive director of the ALA Washington Office, in a statement.

“Public libraries in communities across the country work tirelessly to support children and families by helping children develop early literacy and early learning skills,” said Andrew Medlar, vice president and president-elect of the Association for Library Service to Children (ALSC). “Our libraries are a foundation of our communities and are ready and willing to help children succeed.”

By offering reading materials, story times and summer reading programs, public libraries across the nation are supporting and complementing early learning efforts. According to a 2010 national survey of public libraries conducted by the Institute of Museum and Library Services (IMLS), public libraries offered 3.75 million programs to the public in 2010. The survey found that 2.31 million of those programs were designed for children aged 11 and younger. Another report found that the circulation of children’s materials in libraries has increased by 28.3 percent in the last ten years and comprises over one-third of all materials circulated in public libraries.

The ALA Washington Office and ALSC collaborated on the letter sent to the Department of Education.


Eric Hellman: Correcting Misinformation on the Adobe Privacy Gusher

Thu, 2014-10-09 15:29
We've learned quite a lot about Adobe Digital Editions version 4 (ADE4) since Nate Hoffelder broke the story that "Adobe is Spying on Users, Collecting Data on Their eBook Libraries". Unfortunately, there's also been some bad information generated along with the furor.

One thing that's clear is that Adobe Digital Editions version 4 is not well designed. It's also clear that our worst fears about ADE4 - that it was hunting down ALL the ebooks on a user's computer on installation and reporting them to Adobe - are not true. I've been participating with Nate and some techy people in the library and ebook world (including Galen, Liza, and Andromeda) to figure out what's really going on. It's looking more like an incompetently-designed, half-finished synchronization system than a spy tool. Follow Nate's blog to get the latest.
So, some misconceptions.
  1. The data being sent by ADE4 is NOT needed for digital rights management. We know this because the DRM still works if the Adobe "log" site is blocked. Also, we know enough about the Adobe DRM to know it's not THAT stupid.
  2. The data being sent by ADE4 is NOT part of the normal functioning of a well designed ebook reader. ADE4 is sending more than it needs to send even to accomplish synchronization. Also, ADE4 isn't really cloud-synchronizing devices, the way BlueFire is doing (well!).
On the legal side:
  1. The ADE4 privacy policy is NOT a magic incantation that makes everything it does legal. For example, all 50 states have privacy laws that cover library records. When ADE4 is used for library ebooks, the fact that it broadcasts a user's reading behavior makes it legally suspect. Even if the stream were encrypted, it's not clear that it would be legal.
  2. The NJ Reader Privacy Act is NOT an issue...yet. There's been no indication that it's been signed into law. If signed into law, and upheld, and found to apply, then Adobe would owe a lot of people in NJ $500.
  3. The California Reader Privacy Act is NOT relevant (as far as I can tell) because it's designed to protect against legal discovery and there's not been any legal process. However, California has a library privacy law.
  4. Europe might have more to say.
The bottom line for now is that ADE4 does not so much spy on you as it stumbles around in your closet and sometimes tells Adobe what it finds there. In a loud voice so everyone around can hear. And that's not a nice thing to do.

SearchHub: Introducing Our Solr Connector for Couchbase

Wed, 2014-10-08 23:27
You already know that Lucidworks has connectors and plugins to integrate with dozens of data sources like Amazon S3 buckets, Hadoop filesystems and clusters, FTP sites, Azure blobs, cloud storage like Box and Dropbox, JDBC-enabled databases – even Twitter feeds. We’re happy to announce the latest release from the Lucidworks labs: Lucidworks Connector for Couchbase. Now you can join the power of Solr to one of the most popular (and powerful) NoSQL database servers out there – all with continuous real-time replication, quick recovery from network failures, and topology awareness. Here’s the product data sheet, or download it and give it a whirl today.

Also this past week, we joined the Couchbase crew at their annual conference, Couchbase Connect, just across Union Square from our San Francisco office at the Westin St. Francis. Lucidworks CEO Will Hayes (@IamWillHayes) delivered the keynote, taking the crowd on a fantastic voyage through The Data-Driven Paradigm. Here’s his deck (video coming soon): The Data-Driven Paradigm from Lucidworks

LITA: Jobs in Information Technology: October 8

Wed, 2014-10-08 17:20

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

IT Assistant Coordinator, Colorado State University, Fort Collins, CO

Visit the LITA Job Site for more available jobs and for information on submitting a  job posting.

District Dispatch: Watch and learn: Making the election connection for libraries

Wed, 2014-10-08 16:47

On Monday, October 6, 2014, the American Library Association (ALA) and Advocacy Guru Stephanie Vance collaborated to host “Making the Election Connection,” an interactive webinar that explored the ways that library advocates can legally engage during an election season, as well as what types of activities have the most impact. Library supporters who missed Monday’s advocacy webinar now have access to the archived video.

Making the Election Connection from ALA Washington on Vimeo.


Library of Congress: The Signal: Astronomical Data & Astronomical Digital Stewardship: Interview with Elizabeth Griffin

Wed, 2014-10-08 15:36

The following is a guest post from Jane Mandelbaum, co-chair of the National Digital Stewardship Alliance Innovation Working group and IT Project Manager at the Library of Congress.

Elizabeth Griffin is an astrophysicist at the Dominion Astrophysical Observatory in Victoria, Canada.

As part of our ongoing series of Insights interviews with individuals doing innovative work related to digital preservation and stewardship, we are interested in talking to practitioners from other fields on how they manage and use data over time.

To that end, I am excited to explore some of these issues and questions with Elizabeth Griffin. Elizabeth is an astrophysicist at the Dominion Astrophysical Observatory in Victoria, Canada. She is Chair of the International Astronomical Union Task Force for the Preservation and Digitization of Photographic Plates, and Chair of the Data at Risk Task Group of the International Council for Science Committee on Data for Science and Technology. Griffin presented on Preserving and Rescuing Heritage Information on Analogue Media (PDF) at Digital Preservation 2014. We’re interested in understanding how astronomers have been managing and using astronomical data and hope that others can learn from the examples of astronomers.

Jane: Do you think that astronomers deal with data differently than other scientists?

Elizabeth:  Not differently in principle – data are precious and need to be saved and shared – but the astronomical community has managed to get its act together efficiently, and is consequently substantially more advanced in its operation of data management and sharing than are other sciences.  One reason is that the community is relatively small compared to that of other natural sciences and its attendant international nature also requires careful attention to systems that have no borders.

Another is that its heritage records are photographic plates, requiring a Plate Archivist with duties to catalog what has been obtained; those archives contained a manageable amount of observations per observatory (until major surveys like the Sloan Digital Sky Survey became a practical possibility). Thus, astronomers could always access past observations, even if only as photographs, so the advantages of archiving even analogue data were established from early times.

Jane: It is sometimes said that astronomers are the scientists who are closest to practitioners of digital preservation because they are interested in using and comparing historical data observations over time.  Do astronomers think they are in the digital preservation business?

Elizabeth: Astronomers  know (not just “think”!) that they are in the digital preservation business, and have established numerous accessible archives (mirrored worldwide) that enable researchers to access past data.  But “historical” indicates different degrees; if a star changes by the day, then yesterday’s (born-digital) data are “historical,” whereas for events that have timescales of the order of a century, then “historical” data must include analogue records on photographic plates.

In the former case, born-digital data abound worldwide; in the latter, they are only patchily preserved in digitized form.  But the same element of “change” applies throughout the natural sciences, not just in astronomy.  Think of global temperature changes and the attendant alterations to glacier coverage, stream flows, dependent flora and fauna, air pollution and so on.   Hand-written data in any of the natural sciences, be they ocean temperatures, weather reports, snow and ice measurements or whatever, all belong to modern research, and all relevant scientists have got to see themselves as being in the digital preservation business, and to devote an aliquot portion of their resources to nurturing those precious legacy data.

We have no other routes to the truth about environmental changes that are on a longer time-scale than our own personal memories or records take us.  Digital preservation of these types of data is vital for all aspects of knowledge regarding change in the natural world, and the scientists involved must join astronomers in being part of the digital preservation business.

Jane: What do you think the role of traditional libraries, museums and archives should be when dealing with astronomical data and artifacts?

Elizabeth: Traditional libraries and archives are invaluable for retaining and preserving documents that mapped or recorded observations at any point in the past.  Some artifacts also need to be preserved and displayed, because so often the precision with which measurements could be made (and thence the reliability of what was quoted as the “measuring error”) was dependent upon the technology of the time (for instance, the use of metals with low expansion coefficients in graduated scales, the accuracy with which graduation marks could be inscribed into metal, the repeatability of the ruling engine used to produce a diffraction grating, etc.).

There is also cultural heritage to be read in the historic books and equipment, and it is important to keep that link visible if only so as to retain a correct perspective of where we are now at.  Science advances by the way people customarily think and by what [new] information they can access to fuel that thinking, so understanding a particular line of argument or theory can depend importantly upon the culture of the day.

International Year of Astronomy (NASA, Chandra, 2/10/09) Messier 101 (M101) from NASA’s Marshall Space Flight Center on Flickr.

Jane: The word “innovation” is often used in the context of science and technology, and teaching science.  See for example: The Marshmallow Challenge.  How do you think the concept of “innovation” can be most effectively used?

Elizabeth: “Innovation” has become something of a buzz-word in modern science, particularly when one is groping for a new way to dress up an old project for a grant proposal!  The public must also be rather bemused by it, since so many new developments today are described as “innovative.” What is important is to teach the concept of thinking outside the box.  That is usually how “innovative” ideas get converted into new technologies – not just cranking the same old handle to tease out one more decimal place – so whether you label it “innovation” or something else, the principle of steering away from the beaten track, and working across scientific disciplines rather than entombing them within specialist ivory towers, is the essential factor in true progress.

Jane: “Big data” analysis is often cited as valuable for finding patterns and/or exceptions.  How does this relate to the work of astronomers?

Elizabeth: Very closely!  Astronomers invented the “Virtual Observatory” in the late 20th Century, with the express purpose of federating large data-sets (those resulting from major all-sky surveys, for instance) but at different wavelengths (say) or with other significantly different properties, so that a new dimension of science could be perceived/detected/extracted.  There are so very many objects in an astronomer’s “target list” (our Galaxy alone contains some 10 billion stars, though amongst those are very many different categories and types) and it was always going to be way beyond individual human power and effort to perform such federating unaided.  Concepts of “big data” analyses assist the astronomer very substantially in grappling with that type of new science, though obviously there are guidelines to respect, such as making all metadata conform to certain standards.

Jane: What do you think astronomers have to teach others about generating and using the increasing amounts of data you are seeing now in astronomy?

Elizabeth: A great deal, but the “others” also need to understand how we got to where we now are.  It was not easy; there was not the “plentiful funding” that some outsiders like to assume, and all along the way there were (and still are) internecine squabbles over competitions for limited grant funds: public data or individual research is never an easy balance to strike!  The main point is to design the basics of a system that can work, and to persevere with establishing what it involves.

The basic system needs to be dynamic – able to accommodate changing conditions and moving goal-posts – and to identify resources that will ensure long-term longevity and support.  One such resource is clearly the funding to maintain and operate dynamic, distributed databases of the sort that astronomers now find usefully productive; another is the trained personnel to operate, develop and expand the capabilities, especially in an ever-changing environment.  A third is the importance of educating early-career scientists in the relevance and importance of computing support for compute-intensive sciences.  That may sound tautological, but it is very true that behind every successful modern researcher is a dedicated computing expert.

Teamwork has been an essential ingredient in astronomers’ ability to access and re-purpose large amounts of data.  The Virtual Observatory was not built just by computing experts; at least one third of committee members are present-day research astronomers, able to give first-hand explanations or caveats, and to transmit practical ideas.  These aspects are important ingredients in the model.  At the same time, astronomers still have a very long way to go; only very limited amounts of their non-digital (i.e. pre-digital) data have so far made it to the electronic world; most observations from “history” were recorded on photographic plates and the faithful translation of those records into electronic images or spectra is a specialist task requiring specialist equipment.  One of the big battles which such endeavors face is even a familial one, with astronomer contending against astronomer: most want to go for the shiny and new things, not the old and less sophisticated ones, and it is an uphill task to convince one’s peers that progress is sometimes reached by moving backwards!

Jane: What do you think will be different about the type of data you will have available and use in 10 years or 20 years?

Elizabeth: In essence nothing, just as today we are using basically the same type of data that we have used for the past 100+ years.  But access to those data will surely be a bit different, and if wishes grew on trees then we will have electronic access to all our large archives of historic photographic observations and metadata, alongside our ever-growing digital databases of new observations.

Jane: Do astronomers share raw data, and if so, how? When they do share, what are their practices for assigning credit and value to that work? Do you think this will change in the future?

Elizabeth: The situation is not quite like that.  Professional observing is carried out at large facilities which are nationally or internationally owned and operated.  Those data do not therefore “belong” to the observer, though the plans for the observing, and the immediate results which the Principal Investigator(s) of the observing program may have extracted, are intellectual property owned by the P.I. or colleagues unless or until published.  The corresponding data may have limited access restrictions for a proprietary period (usually of the order of 1 year, but can be changed upon request).

Many of the data thus stored are processed by some kind of pipeline to remove instrumental signatures, and are therefore no longer “raw”; besides, raw data from space are telemetered to Earth and would have no intelligible content until translated by a receiving station and converted into images or spectra of some kind.  Credit to the original observing should be cited in papers emanating from the research that others carry out on the same data once they are placed in the public domain.  I hope that will not change in the future.  It is all too tempting for some “armchair” astronomers (one thinks particularly of theoreticians) who do not carry out their own observing proposals, but wait to see what they can cream off from public data archives.  That is of course above board, but those people do not always appreciate the subtleties of the equipment or the many nuances that may have affected the quality or content of the output.

Jane: Do astronomers value quantitative data derived from observations differently than images themselves?

Elizabeth: Yes, entirely.  The good scientist is a skeptic,  and one very effective driver for the high profile of our database management schemes is the undeniable truth that two separate researchers may get different quantitative data from the same initial image, be that “image” a direct image of the sky or of an object, or its spectrum.  The initial image is therefore the objective element that should ALWAYS be retained for others to use; the quantitative measurements now in the journal are very useful, but are always only subjective, and never definitive.

Jane: How do you think citizen science projects such as Galaxy Zoo can be used to make a case for preservation of data?

Elizabeth: There is a slight misunderstanding here, or maybe just a bad choice of example!  Galaxy Zoo is not a project in which citizens obtain and share data; the Galaxy data that are made available to the public have been acquired professionally with a major telescope facility; the telescope in question (the Sloan Telescope) obtained a vast number of sky images, and it is the classification of the many galaxies which those images show which constitutes the “Galaxy Zoo” project.  There is no case to be made out of that project for the preservation of data, since the data (being astronomical!) are already, and will continue to be, preserved anyway.

Your question might be better framed if it referred (for instance) to something like eBird, in which individuals report numbers and dates of bird sightings in their locations, and ornithologists are then able to piece together all that incoming information and worm out of it new information like migration patterns, changes in those patterns, etc.  It is the publication of new science like that that helps to build the case for data preservation, particularly when the data in question are not statutorily preserved.

Galen Charlton: Verifying our tools; a role for ALA?

Wed, 2014-10-08 14:59

It came to light on Monday that the latest version of Adobe Digital Editions is sending metadata on ebooks that are read through the application to an Adobe server — in clear text.

I’ve personally verified the claim that this is happening, as have lots of other people. I particularly like Andromeda Yelton’s screencast, as it shows some of the steps that others can take to see this for themselves.

In particular, it looks like any ebook that has been opened in Digital Editions or added to a “library” there gets reported on. The original report by Nate Hoffelder at The Digital Reader also said that ebooks that were not known to Digital Editions were being reported, though I and others haven’t seen that. But at the moment, since nobody is saying that they’ve decompiled the program and analyzed exactly when Digital Editions sends its reports, it’s possible that Nate simply fell into a rare execution path.

This move by Adobe, whether or not they’re permanently storing the ebook reading history, and whether or not they think they have good intentions, is bad for a number of reasons:

  • By sending the information in the clear, anybody can intercept it and choose to act on somebody’s choice of reading material.  This applies to governments, corporations, and unenlightened but technically adept parents.  And as far as state actors are concerned – it actually doesn’t matter that Digital Editions isn’t sending information like name and email addresses in the clear; the user’s IP address and the unique ID assigned by Digital Editions will often be sufficient for somebody to, with effort, link a reading history to an individual.
  • The release notes from Adobe gave no hint that Digital Editions was going to start doing this. While Amazon’s Kindle platform also keeps track of reading history, at least Amazon has been relatively forthright about it.
  • The privacy policy and license agreement similarly did not explicitly mention this. There has been some discussion to the effect that if one looks at those documents closely enough, there is an implied suggestion that Adobe can capture and log anything one chooses to do with their software. But even if that’s the case – and I’m not sure that this argument would fly in countries with stronger data privacy protection than the U.S. – sending this information in the clear is completely inconsistent with modern security practices.
  • Digital Editions is part of the toolchain that a number of library ebook lending platforms use.

The last point is key. Everybody should be concerned about an app that spouts reading history in the clear, but librarians in particular have a professional responsibility to protect our users’ reading history.

What does it mean in the here and now? Some specific immediate steps I suggest for libraries are to:

  • Publicize the problem to their patrons.
  • Officially warn their patrons against using Digital Editions 4.0, and point to workarounds like pointing “” to “” in hosts files.
  • If patrons must use Digital Editions to borrow ebooks, recommend the use of earlier versions, which do not appear to be spying on users.

However, there are things that also need to be done in the long term.

Accepting DRM has been a terrible dilemma for libraries – enabling and supporting, no matter how passively, tools for limiting access to information flies against our professional values.  On the other hand, without some degree of acquiescence to it, libraries would be even more limited in their ability to offer current books to their patrons.

But as the Electronic Frontier Foundation points out,  DRM as practiced today is fundamentally inimical to privacy. If, following Andromeda Yelton’s post this morning, we value our professional soul, something has to give.

In other words, we have to have a serious discussion about whether we can responsibly support any level of DRM in the ebooks that we offer to our patrons.

But there’s a more immediate step that we can take. This whole thing came to light because a “hacker acquaintance” of Nate’s decided to see what Digital Editions is sending home. And a key point? Once the testing started, it probably didn’t take that hacker more than half an hour to see what was going on, and it may well have taken only five.

While the library profession probably doesn’t count very many professional security researchers among its ranks, this sort of testing is not black magic.  Lots of systems librarians, sysadmins, and developers working for libraries already know how to use tcpdump and Wireshark and the like.

So what do we need to do? We need to stop blindly trusting our tools.  We need to be suspicious, in other words, and put anything that we would recommend to our patrons to the test to verify that it is not leaking patron information.
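As a rough illustration of how little code such a check needs, here is a minimal Python sketch. It assumes you have already captured traffic with a tool like tcpdump or Wireshark while opening a test ebook, and have exported the bytes of one request. The function name, the sample payload, and the host example.invalid are all made up for illustration; this is not what Digital Editions actually sends.

```python
# A minimal sketch: flag known book metadata that appears unencrypted in
# captured traffic. All names and sample data here are hypothetical.

def find_cleartext_leaks(payload: bytes, sensitive_terms: list[str]) -> list[str]:
    """Return the sensitive terms that appear verbatim in the payload."""
    # Decode leniently: encrypted or binary traffic will simply not match.
    text = payload.decode("utf-8", errors="replace").lower()
    return [term for term in sensitive_terms if term.lower() in text]


# Made-up stand-in for bytes exported from a packet capture.
sample_payload = (
    b"POST /log HTTP/1.1\r\n"
    b"Host: example.invalid\r\n"
    b"Content-Type: application/json\r\n\r\n"
    b'{"title": "Orlando: A Biography", "author": "Virginia Woolf", "page": 12}'
)

# Details of the test ebook that the auditor opened during the capture,
# plus one title that was never opened.
terms = ["Orlando: A Biography", "Virginia Woolf", "Mrs Dalloway"]

leaks = find_cleartext_leaks(sample_payload, terms)
print(leaks)
```

A non-empty result means the metadata crossed the network readable by anyone on the path. An empty result is weaker evidence: the data may be encrypted, encoded differently, or sent at another time, so a clean run does not prove that nothing is leaking.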

This is where organizations like ALA can play an important role.  Some things that ALA could do include:

  • Establish a clearinghouse for reports of security and privacy violations in library software.
  • Distribute information on ways to perform security audits.
  • Test library software in house and hire security researchers as needed.
  • Provide institutional and legal support for these efforts.

That last point is key, and is why I’m calling on ALA in particular. There have been plenty of cases where software vendors have sued, or threatened to sue, folks who have pointed out security flaws. Rather than permitting that sort of chilling effect to be tolerated in the realm of library software, ALA can provide cover for individuals and libraries engaged in the testing that is necessary to protect our users.

Andromeda Yelton: ebooks choices and the missing soul of librarianship

Wed, 2014-10-08 13:47

We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.
American Library Association Code of Ethics

Yesterday I watched as Adobe Digital Editions told Adobe what book I was reading — title, author, publisher, year of publication, subject, description — and every page I’d read, and the time at which I read them. Adobe’s EULA states that it also collects my user ID and my general location.

I was able to watch this information be collected because it was all sent unencrypted, readable to any English-speaking human with access to any of the servers it passes through, in whatsoever jurisdiction, and also (if your wifi is unencrypted) the open air between my laptop and my router.

The Council of the American Library Association strongly recommends that… [circulation and other personally identifying] records shall not be made available to any agency of state, federal, or local government except pursuant to such process, order or subpoena as may be authorized under the authority of, and pursuant to, federal, state, or local law relating to civil, criminal, or administrative discovery procedures or legislative investigative power [and that librarians] resist the issuance or enforcement of any such process, order, or subpoena until such time as a proper showing of good cause has been made in a court of competent jurisdiction.
Policy on confidentiality of library records

Your patrons’ reading information is already part of a warrantless dragnet. Because it has been transmitted in cleartext, the government needs no further assistance from you, your patrons, or your vendors to read it. Even were they to present you with a valid subpoena, you would be powerless to resist it, because you have, in effect, already written the information on your walls; you have no technical ability to protect it.

The American Library Association urges all libraries to…

  • Limit the degree to which personally identifiable information is collected, monitored, disclosed, and distributed; and avoid creating unnecessary records; and
  • Limit access to personally identifiable information to staff performing authorized functions; and…
  • Ensure that the library work with its organization’s information technology unit to ensure that library usage records processed or held by the IT unit are treated in accordance with library records policies; and
  • Ensure that those records that must be retained are secure; and
  • Avoid library practices and procedures that place personally identifiable information on public view.

Resolution on the Retention of Library Usage Records

If Adobe Digital Editions is part of your technical stack — if your library offers Overdrive or 3M Cloud Library or EBL or ebrary or Baker & Taylor Axis 360 or EBSCO or MyiLibrary or quite possibly other vendors I haven’t googled yet — you are not doing this. You cannot do this.

…ebook models make us choose. And I don’t mean choosing which catalog, or interface, or set of contract terms we want — though we do make those choices, and they matter. I mean that we choose which values to advance, and which to sacrifice. We’re making those values choices every time we sign a contract, whether we talk about it or not.
me, Library Journal, 2012

In 2012 I wrote and spoke about how the technical affordances, and legal restrictions, of ebooks make us choose among fundamental library values in a way that paper books have not. About how we were making those choices about values whether we made them explicitly or not. About how we default to choosing access over privacy.

We have chosen access over privacy, and privacy is not an option left for us to choose.

Because: don’t underestimate this. This is not merely a question of a technical slip-up in one version of an Adobe product.

This is about the fact that we do not have the technical skills to verify whether our products are in line with the values we espouse, the policies we hold, or even the contracts we sign, and we do not delegate this verification to others who do. Our failure to verify affects all the software we run.

This is about the fact that best practice in software is generally to log promiscuously; you’re trained, as a developer, to keep all the information, just in case it comes in handy. It takes a conscious choice (or a slipshod incompetence) not to do so. Libraries must demand that our vendors make that choice, or else we are in the awkward position of trusting to their incompetence. This affects all the software we run.

This is about the fact that encryption products are often hard to use, the fact that secure https is not yet the default everywhere, the fact that anyone can easily see traffic on the unencrypted wireless networks found at so many libraries, the fact that anyone with the password (which, if you’re a library, is everyone) can see all the traffic on encrypted networks too. This affects all the software we run.

This is about Adobe. It is not just about Adobe. These are questions we should ask of everything. These are audits we should be performing on everything. This affects all the software we run.

I am usually a middle-ground person. I see multiple sides to every argument, I entertain arguments that have shades of the abhorrent to find their shades of truth. This is not an issue where I can do that.

If you have chosen, whether actively or by default, to trust that the technical affordances of your software match both your contracts and your values, you have chosen to let privacy burn. If you’re content with that choice, have the decency to stand up and say it: to say that playing nice with your vendors matters more to you than this part of professional ethics, that protecting patron privacy is not on your list of priorities.

If you’re not content with that choice, it is time to set something else on fire.

LITA: Managing Library Projects: General Tips

Wed, 2014-10-08 13:00
Image courtesy of Joel Dueck. Flickr 2007.

During my professional career, both before and after becoming a Librarian, I’ve spent a lot of time managing projects, even when that wasn’t necessarily my specific role. I’ve experienced the joys of Project Management in a variety of settings and industries, from tiny software startups to large, established organizations. Along the way, I’ve learned that, while there are general concepts that are useful in any project setting, the specific processes and tools needed to complete a specific project depend on the nature of the task at hand and the organization’s profile. Here are some general strategies to keep in mind when tackling a complex project:

Pay special attention to connection points

Unless your project is entirely contained within one department, there will be places in your workflow where interaction between two or more disparate units will take place. Each unit has its own processes and goals, which may or may not serve your project’s purposes, so it’s important that you as PM keep the overall goals of the project in mind and ensure that work is being done efficiently in terms of the project’s needs, not just the department’s usual workflow. Each unit will likely also have its own jargon, so you need to make sure that information is communicated accurately between parties. It’s at these connection points that the project is most likely to fail, so keep your eye on what happens here.

Don’t reinvent the wheel

While a cross-functional project will potentially require the creation of new workflows and processes, it’s not a good idea to force project participants to go about their work in a way that is fundamentally different from what they usually do. First, it will steepen the learning curve and reduce efficiency, and second, because these staff members are likely to be involved in multiple projects simultaneously, it will increase confusion and make it more difficult for them to correctly follow your guidelines for what needs to be done. Try to design your workflows so that they take advantage of existing processes within departments as much as possible, and increase efficiency by modifying the way departments interact with one another to maximize results.

Choose efficient tools, not shiny ones

Even in the wealthiest organizations, resources are always at a premium, so when picking tools to use in managing your project don’t fall for the beautiful picture on the front of the box. Consider the cost of a particular tool, both in terms of price and the learning curve involved in bringing everyone attached to the project up to speed on how to use it. Sometimes the investment will be worth it; often you will be better off with something simpler that project staff already know. You can create complex project plans with MS Project or Abak 360, but for most projects I find that a rudimentary scheduling spreadsheet and a couple of quick and dirty projection models, all created with MS Excel, will do just as well. Free web-based tools can also be useful: one of my favorites is Lucid Chart, a workflow diagram creation tool that can replace Visio for many applications (and offers pretty good deals for educational institutions). The main concerns with this type of approach are whether having your project plans stored in the cloud makes sense from a security point of view, and the potential for a particular tool to disappear unexpectedly (anyone remember Astrid?).
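
To give a sense of how simple a quick and dirty projection model can be, here is a minimal sketch. It is written in Python rather than Excel purely for illustration, and it assumes a project made of countable work items that get finished at a roughly steady pace. The function name, dates, and item counts are all hypothetical.

```python
from datetime import date, timedelta


def project_finish(start: date, today: date, done: int, total: int) -> date:
    """Linear projection: assume each remaining item takes as long,
    on average, as the items finished so far."""
    if done <= 0:
        raise ValueError("need at least one completed item to project")
    elapsed_days = (today - start).days
    days_per_item = elapsed_days / done
    remaining_days = days_per_item * (total - done)
    return today + timedelta(days=round(remaining_days))


# Hypothetical project: 120 of 400 items finished in the first 30 days.
finish = project_finish(date(2014, 9, 1), date(2014, 10, 1), done=120, total=400)
print(finish)
```

The same arithmetic fits in a single spreadsheet formula. A model this crude ignores holidays, staffing changes, and uneven item difficulty, so treat its answer as a prompt for conversation rather than a commitment.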


Those are a few of the strategies that I have found useful in managing projects. What’s your favorite project management tip?