Feed aggregator

Patrick Hochstenbach: Homework assignment #3 Sketchbookskool

planet code4lib - Sun, 2014-10-12 07:07
This week we were asked to go to a park and draw people using line art. It was raining, so I decided to go to the Church of Our Lady, which is always a tourist attraction. Filed under: Doodles

Cynthia Ng: Batch Appending a Single PDF to multiple PDFs

planet code4lib - Sun, 2014-10-12 03:59
So recently, I came up against the problem of having to add a page at the end of multiple PDFs. A couple of years ago, I’d done some work with GhostScript to merge a bunch of PDFs, so I thought I’d start there. Use Case I have a bunch of PDFs, and what I have is […]
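The excerpt above is cut off before the author's own solution, so the following is only a minimal sketch of one way to do this, not the method from the post. It assumes Ghostscript is installed and available as `gs` (on Windows the executable is usually named differently), and the function names, folder layout, and file names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_append_command(source: Path, extra_page: Path, output: Path) -> List[str]:
    """Build a Ghostscript command that writes `source` followed by `extra_page` to `output`."""
    return [
        "gs",                      # assumed executable name; adjust for your platform
        "-dBATCH",                 # exit when the input files have been processed
        "-dNOPAUSE",               # do not pause between pages
        "-q",                      # suppress routine messages
        "-sDEVICE=pdfwrite",       # write the result as a PDF
        f"-sOutputFile={output}",
        str(source),
        str(extra_page),
    ]


def append_to_all(folder: Path, extra_page: Path, out_dir: Path, run: bool = False) -> List[List[str]]:
    """Prepare (and optionally run) one command per PDF in `folder`.

    The page being appended is skipped if it lives in the same folder.
    With run=False the commands are only returned, which allows a dry run.
    """
    commands = []
    for source in sorted(folder.glob("*.pdf")):
        if source.resolve() == extra_page.resolve():
            continue
        cmd = build_append_command(source, extra_page, out_dir / source.name)
        commands.append(cmd)
        if run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return commands
```

Writing the results to a separate output folder keeps the original PDFs untouched, which makes it easy to check a few merged files before replacing anything.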

DuraSpace News: The Archivematica + DuraCloud “Soup-to-Nuts” Preservation Service Launches a Beta Test

planet code4lib - Sun, 2014-10-12 00:00

Winchester, MA  The Archivematica + DuraCloud hosted service has launched a beta test with pilot partners that will be ongoing from October 2014 to January 2015.

Ensuring that robust Archivematica Archival Information Packages (AIPs) have a secure long-term home is the idea behind the new Archivematica + DuraCloud hosted service. The new integrated service is designed to provide users with a robust preservation workflow plus long-term archiving in a single hosted solution.

John Miedema: Orlando: the lives and works of British women writers. Digital resources working together in unexpected and insightful ways.

planet code4lib - Sat, 2014-10-11 19:55

Orlando is a digital resource, indexing the lives and works of British women writers.

The full name of the project is Orlando: Women’s Writing in the British Isles from the Beginnings to the Present. It is the work of scholars Susan Brown, Patricia Clements, and Isobel Grundy. The name of the work was inspired by Virginia Woolf’s 1928 novel, Orlando: A Biography. The project, like the novel, is an important resource in the history of women’s writing. It grew out of the limitations of a print-based publication, The Feminist Companion to Literature in English. The Companion presented a great deal of research on women writers but lacked an adequate index. The researchers decided to compile a digital index.

I have the good fortune to work with Susan Brown and the Orlando resource. I have extracted bibliographic and literary data from Orlando, and intend to integrate it with unstructured literary content using Natural Language Processing. The aim is a first demonstration of how digital resources like Orlando can provide new ways of reading and understanding literature. In particular I hope to show how digital resources can work together in unexpected and insightful ways.

More information:

The Orlando Project

Bigold, Melanie (2013) “Orlando: Women’s Writing in the British Isles from the Beginnings to the Present, edited by Susan Brown, Patricia Clements, and Isobel Grundy,” ABO: Interactive Journal for Women in the Arts, 1640-1830: Vol. 3: Iss. 1, Article 8.
Available at:

Orlando: A Biography. Wikipedia


Patrick Hochstenbach: My first VideoScribe project

planet code4lib - Sat, 2014-10-11 07:11
Trying out a little animation with VideoScribe to give an introduction to the services of Ghent University Library. The illustrations were created on paper using a fineliner. I scanned them and vector traced them in Adobe Illustrator (VideoScribe need to

FOSS4Lib Upcoming Events: Code3cme

planet code4lib - Sat, 2014-10-11 04:53
Date: Saturday, October 11, 2014 - 00:45 to Sunday, October 11, 2015 - 00:45
Supports: DMP Online

Last updated October 11, 2014. Created by bunnychris on October 11, 2014.

Get enrolled for the refresher courses for a great medical career. To know more about the site click here.

LITA: Shifting & Merging

planet code4lib - Sat, 2014-10-11 00:39
McKenzie Pass, Ore. Courtesy of Ryan Shattuck. Task Easy Blog 2013.

It has been exactly seven weeks since I moved to Bloomington, Indiana, yet I finally feel like I have arrived. Let me rewind, quick, and tell you a little about my background. During my last two years of undergrad at the University of Nebraska-Lincoln (UNL), I spent my time working on as many Digital Humanities (DH) projects and jobs as I possibly could in the Center for Digital Research in the Humanities.

[DH is a difficult concept to define because everyone does it through various means, for various reasons. To me, it means using computational tools to analyze or build humanities projects. This way, we can find patterns we wouldn't see with the naked eye, or display physical objects digitally for greater access.]

By day, I studied English and Computer Science, and by night, my fingers scurried over my keyboard encoding poems, letters, and aphorisms. I worked at the Walt Whitman Archive, on an image analysis project with two brilliant professors, on text analysis and digital archives projects with leading professors in the fields, and on my own little project analyzing a historical newspaper. My classmates and I, both undergraduate and graduate, constantly talked about DH: what it is, who does it, how it is done, the technologies we use to do it, and how that differs from others.

Discovering an existing group of people already doing the same work you do is like merging onto a packed interstate where everyone is travelling at 80 miles per hour in the same direction. The thrill, the overwhelming “I know I am in the right place” feeling.

I chose Indiana University (IU) for my Library and Information Science degrees because I knew it was a hub for DH projects. I have an unparalleled opportunity working with Dr. John Walsh and Dr. Noriko Hara, both prominent DH and Information Science scholars.

However, I am impatient. After travelling on the DH interstate, I expected every classmate I met at IU to wear a button proclaiming, “I heart DH, let’s collaborate.” I half expected my courses to start from where I left off in my previous education. The beginning of the semester forced me to take a step back, to realize that I was shifting to a new discipline, and that I needed the basics first. My classes are satisfying my library love, but I was still missing that extra-curricular technology aspect, outside of my work for Dr. Walsh.

Then, one random, serendipitous meeting in the library and I was “zero to eighty” instantly. I met those DH students and learned about projects, initiatives, and IU networking. They reaffirmed that the community for which I was searching existed.

Since then, I have found others in the community and continue those same DH who, what, how, why conversations. While individual research is important, we can reach a higher potential through collaboration, especially in the digital disciplines. I am continuing to learn the importance of reaching out and learning from others, which I don’t believe will cease once I graduate. (Will it?)

I assure you that my future posts will be more closely related to library technology and digital humanities tools, but frankly, I’m new here. While I could talk about the library and information theory I’m learning, I will spare you those library school memories, and keep you updated on new technologies as I learn them.

In the meantime, I’ll ask you to reflect and share your experience transitioning to library school or into a library career. How were you first introduced to library technology or digital humanities? Any nuggets of advice for us beginners?

LITA: 2014 LITA Forum: 3 Amazing Keynotes

planet code4lib - Fri, 2014-10-10 17:10

Join your LITA colleagues in Albuquerque, Nov 5-8, 2014 for the 2014 LITA Forum.

This year’s Forum has three amazing keynotes you won’t want to miss:

AnnMarie Thomas, Engineering Professor, University of St. Thomas

AnnMarie is an engineering professor who spends her time trying to encourage the next generation of makers and engineers. Among a host of other activities she is the director of the Playful Learning Lab and leads a team of students looking at both the playful side of engineering (squishy circuits for students, the science of circus, toy design) and ways to use engineering design to help others. AnnMarie and her students developed Squishy Circuits.

Check out AnnMarie’s fun TED Talk on Play-Doh based squishy circuits.

Lorcan Dempsey, Vice President, OCLC Research and Chief Strategist

Lorcan Dempsey oversees the research division and participates in planning at OCLC. He is a librarian who has worked for library and educational organizations in Ireland, England and the US.

Lorcan has policy, research and service development experience, mostly in the area of networked information and digital libraries. He writes and speaks extensively, and can be followed on the web at Lorcan Dempsey’s weblog and on twitter.

Kortney Ryan Ziegler, Founder Trans*h4ck

Kortney Ryan Ziegler is an Oakland based award winning artist, writer, and the first person to hold the Ph.D. of African American Studies from Northwestern University.

He is the director of the multiple award winning documentary, STILL BLACK: a portrait of black transmen, runs the GLAAD Media Award nominated blog, blac (k) ademic, and was recently named one of the Top 40 Under 40 LGBT activists by The Advocate Magazine and one of the most influential African Americans by TheRoot100.

Dr. Ziegler is also the founder of Trans*H4CK–the only tech event of its kind that spotlights trans* created technology, trans* entrepreneurs and trans* led startups.

See all the keynoters’ full bios at the LITA Forum Keynote Sessions web page.

More than 30 concurrent colleague inspired sessions and a dozen poster sessions will provide a wealth of practical information on a wide range of topics. Networking opportunities, a major advantage of a smaller conference, are an important part of the Forum. Take advantage of the Thursday evening reception and sponsor showcase, the Friday networking dinners or Kitchen Table Conversations, plus meals and breaks throughout the Forum to get to know LITA leaders, Forum speakers, sponsors, and peers.

This year two preconference workshops will also be offered.

Linked Data for Libraries: How libraries can make use of Linked Open Data to share information about library resources and to improve discovery, access, and understanding for library users
Led by: Dean B. Krafft and Jon Corson-Rikert, Cornell University Library

Learn Python by Playing with Library Data
Led by: Francis Kayiwa, Kayiwa Consulting

2014 LITA Forum sponsors include EBSCO, Springshare, @mire, Innovative and OCLC.

Visit the LITA website for more information.

Library and Information Technology Association (LITA) members are information technology professionals dedicated to educating, serving, and reaching out to the entire library and information community. LITA is a division of the American Library Association.

LITA and the LITA Forum fully support the Statement of Appropriate Conduct at ALA Conferences

OCLC Dev Network: WorldCat Discovery API and Linked Data

planet code4lib - Fri, 2014-10-10 14:00

This is the second post in our series introducing the WorldCat Discovery API. In our introductory remarks on the API, we told you about how the API can be used to power all aspects of resource discovery in your library. We also introduced some of the reasons why we chose entity-based bibliographic description for the API’s data serializations over more traditional API outputs. In this post we want to explore this topic even further and take a closer look at the Linked Data available in the WorldCat Discovery API.

Library of Congress: The Signal: Archiving from the Bottom Up: A Conversation with Howard Besser

planet code4lib - Fri, 2014-10-10 13:54

Howard Besser, Professor of Cinema Studies and Director of New York University’s Moving Image Archiving & Preservation Program and Senior Scientist for Digital Library Initiatives for NYU’s Library.

The following is a guest post from Julia Fernandez, this year’s NDIIPP Junior Fellow. Julia has a background in American studies and working with folklife institutions and worked on a range of projects leading up to CurateCamp Digital Culture in July. This is part of a series of interviews Julia conducted to better understand the kinds of born-digital primary sources folklorists, and others interested in studying digital culture, are making use of for their scholarship.

Continuing our NDSA Insights interview series, I’m delighted to interview Howard Besser, Director of New York University’s Moving Image Archiving & Preservation Program (MIAP) and Professor of Cinema Studies at NYU. He is also one of the founders of Activist Archivists, a group created in the fall of 2011 to coordinate the collection of digital media relating to the Occupy Wall Street political movement.

Julia: Could you tell us a bit about Activist Archivists?  What are the group’s objectives? What kinds of digital media are you exploring?

Howard: Activist Archivists began with the question of how archivists could help assure that digital media documenting the “Occupy” movement could be made to persist. This led us into a variety of interesting sub-areas: getting individuals making recordings to follow practices that are more archivable; documenting the corruption of metadata on YouTube and Vimeo; evangelizing for the adoption of Creative Commons licenses that would allow libraries and archives to collect and make available content created by an individual; making documenters aware that the material they create could be used against their friends; and a host of other sub-areas.

We focused mainly on moving images and sound, and to a lesser degree on still images. As the Occupy movement began to dissipate, Activist Archivists morphed into a focus on community archiving that might be analog, digital or a hybrid. We worked with Third World Newsreel and Interference Archive and in 2014 produced the first Home Video Day in association with members of the NYC Asian American community and Downtown Community Television. And several Activist Archivists members are on the planning committee for the 2015 Personal Digital Archiving Conference.

Julia: Could you tell us a bit about the digital materials you are working from? What made them an interesting source for you?

Peoples Library Occupy Wall Street 2011 Shankbone, shared by user David Shankbone on Flickr.

Howard: Working with Occupy, we were mainly dealing with sound and images recorded on cellphones. This was particularly interesting because of the lack of prior knowledge in the library/archiving community about how to employ the wealth of metadata that cellphones captured while recording images and sound. For example, it’s very easy to set a cellphone to capture geolocation information as part of the metadata coupled to every image or sound that is recorded. And this, of course, can raise privacy issues because a corpus of photos one takes creates an exact path of places that one has been. The other thing that made this project particularly interesting to me was how social media sites such as YouTube strip away so much metadata (including much that could be useful to archives and scholars).
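To make the geolocation point concrete: EXIF metadata stores a GPS position as degrees, minutes and seconds plus a hemisphere reference (N, S, E or W). The sketch below is only an illustration under assumptions. The function names and the photo tuples are hypothetical, and reading the values out of an actual image file is left to whatever EXIF library is at hand. It converts such values to decimal degrees and orders a set of photos by capture time, which is all it takes to turn a photo collection into a path of places visited.

```python
from typing import List, Tuple


def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds plus a hemisphere reference to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref in ("S", "W") else value


def path_from_photos(photos: List[Tuple[str, float, float]]) -> List[Tuple[float, float]]:
    """Return (lat, lon) points ordered by capture time.

    Each photo is a hypothetical (iso_timestamp, lat, lon) tuple; ISO 8601
    timestamps in the same time zone sort correctly as plain strings.
    """
    return [(lat, lon) for _, lat, lon in sorted(photos, key=lambda p: p[0])]
```

The same ordering step is what an archive could use to reconstruct the sequence of a recorded event, and it is also why stripping or withholding this metadata matters for privacy.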

Julia: What are some of the challenges of working with a “leaderless” and anti-establishment movement like Occupy?

Howard: It’s always difficult for people who have spent most of their lives in hierarchical environments to adapt to a bottom-up (instead of a top-down) structure. It means that each individual needs to take on more initiative and responsibility, and usually ends up with individuals becoming more intensively involved, and feeling like they have more of a stake in the issues and the work. I think that the toughest challenge that we experienced was that each time we met with an Occupy Committee or Group, we needed to start re-explaining things from scratch. Because each new meeting always included people who had not attended the previous meeting, we always had to start from the beginning. Other major problems we faced would always be true in all but the most severe hierarchical organizations: how do you get everyone in the organization to adopt standards or follow guidelines. This is an age-old problem that is seldom solved merely by orders from above.

Julia: Activist Archivists has printed a “Why Archive?” informational card that spells out the importance of groups taking responsibility for the record of their activity.  If libraries and archives wanted to encourage a more participatory mode of object and metadata gathering, what would you suggest? What would you want to see in how libraries and archives provide access to them?

Howard: One of the earliest issues we encountered with Occupy was the prevalent notion that history is documented in book-length essays about famous people. Many people in Occupy could not see that someone in the future might be interested in the actions of an ordinary person like them. Now, a third of a century after Howard Zinn’s “A People’s History Of The United States,” most progressive historians believe that history is made by ordinary individuals coming together to conduct acts in groups. And they believe that we can read history through archival collections of letters, post-cards and snapshots. Librarians, archivists and historians need to make the case to ordinary people that their emails, blogs and Flickr and Facebook postings are indeed important representations of early 21st century life that people in the future will want to access. And as librarians and archivists, we need to be aggressive about collecting these types of material and make concrete plans for access.

Julia: In a recent NDSA talk (PDF) you identified some of the challenges of archiving correspondence in the digital age. For one, “digital info requires a whole infrastructure to view it” and “each piece of that infrastructure is changing at an incredibly rapid rate”; and also “people no longer store their digital works in places over which they have absolute control,” opting instead for email services, cloud storage or social network services. What are some effective approaches you’ve seen to dealing with these challenges?

Howard: Only institutions that themselves are sustainable across centuries can commit to the types of continuous refreshing and either migration or emulation that are necessary to preserve born-digital works over time. Libraries, archives and museums are about the only long-term organizations that have preservation as one of their core missions, so effective long-term digital preservation is likely to only happen in these types of institutions. The critical issue is for these cultural institutions to get the born-digital personal collections of individuals into preservable shape (through file formats and metadata) early in the life-cycle of these works.

As we found in both the InterPARES II Project and the NDIIPP Preserving Digital Public Television Project (PDF), waiting until a digital collection is turned over to an archive (usually near the end of its life-cycle) is often too late to do adequate preservation (and even more difficult if the creator is dead). We either need to get creators to follow good practices (file formats, metadata, file-naming conventions, no compression, executing Creative Commons licenses, …) at the point of creation, or we need to get the creators to turn over their content to us shortly after creation. So we need to be aggressive about both offering training and guidelines and about collection development.

Updated 10/10/14 for typos.

Open Knowledge Foundation: Open Humanities Hack: 28 November 2014, London

planet code4lib - Fri, 2014-10-10 13:42

This is a cross-post from the DM2E-blog, see the original here

On Friday 28 November 2014 the second Open Humanities Hack event will take place at King’s College, London. This is the second in a series of events organised jointly by the King’s College London Department of Digital Humanities, the Digitised Manuscripts to Europeana (DM2E) project, the Open Knowledge Foundation and the Open Humanities Working Group.

The event is focused on digital humanists and intended to target research-driven experimentation with existing humanities data sets. One of the most exciting recent developments in digital humanities is the investigation and analysis of complex data sets that require close collaboration between humanities and computing researchers. The aim of the hack day is not to produce complete applications but to experiment with methods and technologies to investigate these data sets so that at the end we can have an understanding of the types of novel techniques that are emerging.

Possible themes include, but are not limited to:

  • Research in textual annotation has been a particular strength of digital humanities. Where are the next frontiers? How can we bring together insights from other fields and digital humanities?

  • How do we link and share humanities data in a way that makes sense of its complex structure, with many internal relationships, both structural and semantic? In particular, distributed humanities research data often includes digital material combining objects in multiple media, and there is a diversity of standards for describing the data.

  • Visualisation. How do we develop reasonable visualisations that are practical and help build an overall intuition for the underlying humanities data set?

  • How can we advance the novel humanities technique of network analysis to describe complex relationships of ‘things’ in social-historical systems: people, places, etc.?

With this hack day we seek to form groups of computing and humanities researchers that will work together to come up with small-scale prototypes that showcase new and novel ways of working with humanities data.

Date: Friday 28 November 2014
Time: 9.00 – 21.00
Location: King’s College, Strand, London
Sign up: Attendance is free but places are limited: please fill in the sign-up form to register.

For an impression of the first Humanities Hack event, please check this blog report.

Open Knowledge Foundation: This Index is yours!

planet code4lib - Thu, 2014-10-09 20:23

How is your country doing with open data? You can make a difference in 5 easy steps to track 10 different datasets. Or, you can help us spread the word on how to contribute to the Open Data Index. This includes the very important translation of some key items into your local language. We’ll keep providing you week-by-week updates on the status of the community-driven project.

We’ve got a demo and some shareable slides to help you on your Index path.

Priority country help wanted

The amazing community provided content for over 70 countries last year. This year we set the bar higher with a goal of 100 countries. If you added details for your country last year, please be sure to add any updates this year. Also, we need some help. Are you from one of these countries? Do you have someone in your network who could potentially help? Please do put them in touch with the index team – index at okfn dot org.

DATASETS WANTED: Armenia, Bolivia, Georgia, Guyana, Haiti, Kosovo, Moldova, Morocco, Nicaragua, Ukraine, and Yemen.

Video: Demo and Tips for contributing to the Open Data Index

This is a 40-minute video with details about the Open Data Index, including a demo showing how to add datasets.

Text: Tutorial on How to help build the Open Data Index

We would encourage you to download this, make changes (add country specific details), translate and share back. Please simply share on the Open Data Census Mailing List or Tweet us @okfn.

How to Global Open Data Index – Overview from School of Data

Thanks again for sharing widely!

District Dispatch: Libraries are early learning partners

planet code4lib - Thu, 2014-10-09 16:06

Photo by Lester Public Library

The American Library Association (ALA) urged the Department of Education in a letter (pdf) Wednesday to include public libraries as early learning partners in the Proposed Requirements for School Improvement Grants (SIG). The Association specifically asks that the Department of Education include public libraries as eligible entities and allowable partners under the new intervention model that focuses on improving early learning educational outcomes.

“The country’s 16,400 public libraries are prepared to support early childhood education, but we can only do so if policies allow for better collaboration, coordination, and real partnerships between libraries and the various federal early learning programs, including SIG grants,” said Emily Sheketoff, executive director of the ALA Washington Office, in a statement.

“Public libraries in communities across the country work tirelessly to support children and families by helping children develop early literacy and early learning skills,” said Andrew Medlar, vice president and president-elect of the Association for Library Service to Children (ALSC). “Our libraries are a foundation of our communities and are ready and willing to help children succeed.”

By offering reading materials, story times and summer reading programs, public libraries across the nation are supporting and complementing early learning efforts. According to a 2010 national survey of public libraries conducted by the Institute of Museum and Library Services (IMLS), public libraries offered 3.75 million programs to the public in 2010. The survey found that 2.31 million of those programs are designed for children aged 11 and younger. Another report found that the circulation of children’s materials in libraries has increased by 28.3 percent in the last ten years and comprises over one-third of all materials circulated in public libraries.

The ALA Washington Office and ALSC collaborated on the letter sent to the Department of Education.

The post Libraries are early learning partners appeared first on District Dispatch.

Eric Hellman: Correcting Misinformation on the Adobe Privacy Gusher

planet code4lib - Thu, 2014-10-09 15:29
We've learned quite a lot about Adobe Digital Editions version 4 (ADE4) since Nate Hoffelder broke the story that "Adobe is Spying on Users, Collecting Data on Their eBook Libraries". Unfortunately, there's also been some bad information generated along with the furor.

One thing that's clear is that Adobe Digital Editions version 4 is not well designed. It's also clear that our worst fears about ADE4 - that it was hunting down ALL the ebooks on a user's computer on installation and reporting them to Adobe - are not true. I've been participating with Nate and some techy people in the library and ebook world (including Galen, Liza, and Andromeda) to figure out what's really going on. It's looking more like an incompetently-designed, half-finished synchronization system than a spy tool. Follow Nate's blog to get the latest.
So, some misconceptions.
  1. The data being sent by ADE4 is NOT needed for digital rights management. We know this because the DRM still works if the Adobe "log" site is blocked. Also, we know enough about the Adobe DRM to know it's not THAT stupid.
  2. The data being sent by ADE4 is NOT part of the normal functioning of a well designed ebook reader. ADE4 is sending more than it needs to send even to accomplish synchronization. Also, ADE4 isn't really cloud-synchronizing devices, the way BlueFire is doing (well!).
On the legal side:
  1. The ADE4 privacy policy is NOT a magic incantation that makes everything it does legal. For example, all 50 states have privacy laws that cover library records. When ADE4 is used for library ebooks, the fact that it broadcasts a user's reading behavior makes it legally suspect. Even if the stream were encrypted, it's not clear that it would be legal.
  2. The NJ Reader Privacy Act is NOT an issue...yet. There's been no indication that it's been signed into law. If signed into law, and upheld, and found to apply, then Adobe would owe a lot of people in NJ $500.
  3. The California Reader Privacy Act is NOT relevant (as far as I can tell) because it's designed to protect against legal discovery and there's not been any legal process. However, California has a library privacy law.
  4. Europe might have more to say.
The bottom line for now is that ADE4 does not so much spy on you as it stumbles around in your closet and sometimes tells Adobe what it finds there. In a loud voice so everyone around can hear. And that's not a nice thing to do.

SearchHub: Introducing Our Solr Connector for Couchbase

planet code4lib - Wed, 2014-10-08 23:27
You already know that Lucidworks has connectors and plugins to integrate with dozens of data sources like Amazon S3 buckets, Hadoop filesystems and clusters, FTP sites, Azure blobs, cloud storage like Box and Dropbox, JDBC-enabled databases – even Twitter feeds. We’re happy to announce the latest release from the Lucidworks labs: Lucidworks Connector for Couchbase. Now you can join the power of Solr to one of the most popular (and powerful) NoSQL database servers out there – all with continuous real-time replication, quick recovery from network failures, and topology awareness. Here’s the product data sheet, or download it and give it a whirl today.

Also this past week, we joined the Couchbase crew at their annual conference, Couchbase Connect, just across Union Square from our San Francisco office at the Westin St. Francis. Lucidworks CEO Will Hayes (@IamWillHayes) delivered the keynote, taking the crowd on a fantastic voyage through The Data-Driven Paradigm. Here’s his deck (video coming soon):

The Data-Driven Paradigm from Lucidworks

LITA: Jobs in Information Technology: October 8

planet code4lib - Wed, 2014-10-08 17:20

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

IT Assistant Coordinator, Colorado State University, Fort Collins, CO

Visit the LITA Job Site for more available jobs and for information on submitting a  job posting.

District Dispatch: Watch and learn: Making the election connection for libraries

planet code4lib - Wed, 2014-10-08 16:47

On Monday, October 6, 2014, the American Library Association (ALA) and Advocacy Guru Stephanie Vance collaborated to host “Making the Election Connection,” an interactive webinar that explored the ways that library advocates can legally engage during an election season, as well as what types of activities have the most impact. Library supporters who missed the advocacy webinar now have access to the archived video.

Making the Election Connection from ALA Washington on Vimeo.

The post Watch and learn: Making the election connection for libraries appeared first on District Dispatch.

Library of Congress: The Signal: Astronomical Data & Astronomical Digital Stewardship: Interview with Elizabeth Griffin

planet code4lib - Wed, 2014-10-08 15:36

The following is a guest post from Jane Mandelbaum, co-chair of the National Digital Stewardship Alliance Innovation Working Group and IT Project Manager at the Library of Congress.

Elizabeth Griffin is an astrophysicist at the Dominion Astrophysical Observatory in Victoria, Canada.

As part of our ongoing series of Insights interviews with individuals doing innovative work related to digital preservation and stewardship, we are interested in talking to practitioners from other fields on how they manage and use data over time.

To that end, I am excited to explore some of these issues and questions with Elizabeth Griffin. Elizabeth is an astrophysicist at the Dominion Astrophysical Observatory in Victoria, Canada. She is Chair of the International Astronomical Union Task Force for the Preservation and Digitization of Photographic Plates, and Chair of the Data at Risk Task Group of the International Council for Science Committee on Data for Science and Technology. Griffin presented on Preserving and Rescuing Heritage Information on Analogue Media (PDF) at Digital Preservation 2014. We’re interested in understanding how astronomers have been managing and using astronomical data and hope that others can learn from the examples of astronomers.

Jane: Do you think that astronomers deal with data differently than other scientists?

Elizabeth:  Not differently in principle – data are precious and need to be saved and shared – but the astronomical community has managed to get its act together efficiently, and is consequently substantially more advanced in its operation of data management and sharing than are other sciences.  One reason is that the community is relatively small compared to that of other natural sciences and its attendant international nature also requires careful attention to systems that have no borders.

Another is that its heritage records are photographic plates, requiring a Plate Archivist with duties to catalog what has been obtained; those archives contained a manageable amount of observations per observatory (until major surveys like the Sloan Digital Sky Survey became a practical possibility).  Thus, astronomers could always access past observations, even if only as photographs, so the advantages of archiving even analogue data were established from early times.

Jane: It is sometimes said that astronomers are the scientists who are closest to practitioners of digital preservation because they are interested in using and comparing historical data observations over time.  Do astronomers think they are in the digital preservation business?

Elizabeth: Astronomers  know (not just “think”!) that they are in the digital preservation business, and have established numerous accessible archives (mirrored worldwide) that enable researchers to access past data.  But “historical” indicates different degrees; if a star changes by the day, then yesterday’s (born-digital) data are “historical,” whereas for events that have timescales of the order of a century, then “historical” data must include analogue records on photographic plates.

In the former case, born-digital data abound worldwide; in the latter, they are only patchily preserved in digitized form.  But the same element of “change” applies throughout the natural sciences, not just in astronomy.  Think of global temperature changes and the attendant alterations to glacier coverage, stream flows, dependent flora and fauna, air pollution and so on.   Hand-written data in any of the natural sciences, be they ocean temperatures, weather reports, snow and ice measurements or whatever, all belong to modern research, and all relevant scientists have got to see themselves as being in the digital preservation business, and to devote an aliquot portion of their resources to nurturing those precious legacy data.

We have no other routes to the truth about environmental changes that are on a longer time-scale than our own personal memories or records take us.  Digital preservation of these types of data is vital for all aspects of knowledge regarding change in the natural world, and the scientists involved must join astronomers in being part of the digital preservation business.

Jane: What do you think the role of traditional libraries, museums and archives should be when dealing with astronomical data and artifacts?

Elizabeth: Traditional libraries and archives are invaluable for retaining and preserving documents that mapped or recorded observations at any point in the past.  Some artifacts also need to be preserved and displayed, because so often the precision with which measurements could be made (and thence the reliability of what was quoted as the “measuring error”) was dependent upon the technology of the time (for instance, the use of metals with low expansion coefficients in graduated scales, the accuracy with which graduation marks could be inscribed into metal, the repeatability of the ruling engine used to produce a diffraction grating, etc.).

There is also cultural heritage to be read in the historic books and equipment, and it is important to keep that link visible if only so as to retain a correct perspective of where we are now at.  Science advances by the way people customarily think and by what [new] information they can access to fuel that thinking, so understanding a particular line of argument or theory can depend importantly upon the culture of the day.

International Year of Astronomy (NASA, Chandra, 2/10/09) Messier 101 (M101) from NASA’s Marshall Space Flight Center on Flickr.

Jane: The word “innovation” is often used in the context of science and technology, and teaching science.  See for example: The Marshmallow Challenge.  How do you think the concept of “innovation” can be most effectively used?

Elizabeth: “Innovation” has become something of a buzz-word in modern science, particularly when one is groping for a new way to dress up an old project for a grant proposal!  The public must also be rather bemused by it, since so many new developments today are described as “innovative.” What is important is to teach the concept of thinking outside the box.  That is usually how “innovative” ideas get converted into new technologies – not just cranking the same old handle to tease out one more decimal place – so whether you label it “innovation” or something else, the principle of steering away from the beaten track, and working across scientific disciplines rather than entombing them within specialist ivory towers, is the essential factor in true progress.

Jane: “Big data” analysis is often cited as valuable for finding patterns and/or exceptions.  How does this relate to the work of astronomers?

Elizabeth: Very closely!  Astronomers invented the “Virtual Observatory” in the late 20th Century, with the express purpose of federating large data-sets (those resulting from major all-sky surveys, for instance) but at different wavelengths (say) or with other significantly different properties, so that a new dimension of science could be perceived/detected/extracted.  There are so very many objects in an astronomer’s “target list” (our Galaxy alone contains some 10 billion stars, though amongst those are very many different categories and types) and it was always going to be way beyond individual human power and effort to perform such federating unaided.  Concepts of “big data” analyses assist the astronomer very substantially in grappling with that type of new science, though obviously there are guidelines to respect, such as making all metadata conform to certain standards.

Jane: What do you think astronomers have to teach others about generating and using the increasing amounts of data you are seeing now in astronomy?

Elizabeth: A great deal, but the “others” also need to understand how we got to where we now are.  It was not easy; there was not the “plentiful funding” that some outsiders like to assume, and all along the way there were (and still are) internecine squabbles over competitions for limited grant funds: public data or individual research is never an easy balance to strike!  The main point is to design the basics of a system that can work, and to persevere with establishing what it involves.

The basic system needs to be dynamic – able to accommodate changing conditions and moving goal-posts – and to identify resources that will ensure long-term longevity and support.  One such resource is clearly the funding to maintain and operate dynamic, distributed databases of the sort that astronomers now find usefully productive; another is the trained personnel to operate, develop and expand the capabilities, especially in an ever-changing environment.  A third is the importance of educating early-career scientists in the relevance and importance of computing support for compute-intensive sciences.  That may sound tautological, but it is very true that behind every successful modern researcher is a dedicated computing expert.

Teamwork has been an essential ingredient in astronomers’ ability to access and re-purpose large amounts of data.  The Virtual Observatory was not built just by computing experts; at least one third of committee members are present-day research astronomers, able to give first-hand explanations or caveats, and to transmit practical ideas.  These aspects are important ingredients in the model.  At the same time, astronomers still have a very long way to go; only very limited amounts of their non-digital (i.e. pre-digital) data have so far made it to the electronic world; most observations from “history” were recorded on photographic plates and the faithful translation of those records into electronic images or spectra is a specialist task requiring specialist equipment.  One of the big battles which such endeavors face is even a familial one, with astronomer contending against astronomer: most want to go for the shiny and new things, not the old and less sophisticated ones, and it is an uphill task to convince one’s peers that progress is sometimes reached by moving backwards!

Jane: What do you think will be different about the type of data you will have available and use in 10 years or 20 years?

Elizabeth: In essence nothing, just as today we are using basically the same type of data that we have used for the past 100+ years.  But access to those data will surely be a bit different, and if wishes grew on trees then we will have electronic access to all our large archives of historic photographic observations and metadata, alongside our ever-growing digital databases of new observations.

Jane: Do astronomers share raw data, and if so, how? When they do share, what are their practices for assigning credit and value to that work? Do you think this will change in the future?

Elizabeth: The situation is not quite like that.  Professional observing is carried out at large facilities which are nationally or internationally owned and operated.  Those data do not therefore “belong” to the observer, though the plans for the observing, and the immediate results which the Principal Investigator(s) of the observing program may have extracted, are intellectual property owned by the P.I. or colleagues unless or until published.  The corresponding data may have limited access restrictions for a proprietary period (usually of the order of 1 year, but can be changed upon request).

Many of the data thus stored are processed by some kind of pipeline to remove instrumental signatures, and are therefore no longer “raw”; besides, raw data from space are telemetered to Earth and would have no intelligible content until translated by a receiving station and converted into images or spectra of some kind.  Credit to the original observing should be cited in papers emanating from the research that others carry out on the same data once they are placed in the public domain.  I hope that will not change in the future.  It is all too tempting for some “armchair” astronomers (one thinks particularly of theoreticians) who do not carry out their own observing proposals, but wait to see what they can cream off from public data archives.  That is of course above board, but those people do not always appreciate the subtleties of the equipment or the many nuances that may have affected the quality or content of the output.

Jane: Do astronomers value quantitative data derived from observations differently than images themselves?

Elizabeth: Yes, entirely.  The good scientist is a skeptic,  and one very effective driver for the high profile of our database management schemes is the undeniable truth that two separate researchers may get different quantitative data from the same initial image, be that “image” a direct image of the sky or of an object, or its spectrum.  The initial image is therefore the objective element that should ALWAYS be retained for others to use; the quantitative measurements now in the journal are very useful, but are always only subjective, and never definitive.

Jane: How do you think citizen science projects such as Galaxy Zoo can be used to make a case for preservation of data?

Elizabeth: There is a slight misunderstanding here, or maybe just a bad choice of example!  Galaxy Zoo is not a project in which citizens obtain and share data; the Galaxy data that are made available to the public have been acquired professionally with a major telescope facility; the telescope in question (the Sloan Telescope) obtained a vast number of sky images, and it is the classification of the many galaxies which those images show which constitutes the “Galaxy Zoo” project.  There is no case to be made out of that project for the preservation of data, since the data (being astronomical!) are already, and will continue to be, preserved anyway.

Your question might be better framed if it referred (for instance) to something like eBird, in which individuals report numbers and dates of bird sightings in their locations, and ornithologists are then able to piece together all that incoming information and worm out of it new information like migration patterns, changes in those patterns, etc.  It is the publication of new science like that that helps to build the case for data preservation, particularly when the data in question are not statutorily preserved.

