The Public Domain Review, an Open Knowledge project, is very proud to announce the launch of its very first book! Released through the newly born spin-off project the PDR Press, the book is a selection of weird and wonderful essays from the project’s first three years, and shall be (we hope) the first of an annual series showcasing in print form essays from the year gone by. Given that there are three years to catch up on, the inaugural incarnation is a special bumper edition, coming in at a healthy 346 pages, and jam-packed with 146 illustrations, more than half of which are newly sourced especially for the book.
Spread across six themed chapters – Animals, Bodies, Words, Worlds, Encounters and Networks – there is a total of thirty-four essays from a stellar line up of contributors, including Jack Zipes, Frank Delaney, Colin Dickey, George Prochnik, Noga Arikha, and Julian Barnes.
What’s inside? Volcanoes, coffee, talking trees, pigs on trial, painted smiles, lost Edens, the social life of geometry, a cat called Jeoffry, lepidopterous spying, monkey-eating poets, imaginary museums, a woman pregnant with rabbits, an invented language drowning in umlauts, a disgruntled Proust, frustrated Flaubert… and much much more.
Order by 26th November to benefit from a special reduced price and delivery in time for Christmas.
If you want to get the book in time for Christmas (and we do think it is a fine addition to any Christmas list!), then please make sure to order before midnight (PST) on 26th November. Orders placed before this date will also benefit from a special reduced price!
Please visit the dedicated page on The Public Domain Review site to learn more and also buy the book!
Last updated November 18, 2014. Created by Peter Murray on November 18, 2014.
The PERICLES Extraction Tool (PET) is open source (Apache 2 licensed) Java software for the extraction of significant information from the environment where digital objects are created and modified. This information supports object use and reuse, e.g. for better long-term preservation of data. The tool was developed entirely for the PERICLES EU project (http://www.pericles-project.eu/) by Fabio Corubolo, University of Liverpool, and Anna Eggers, Göttingen State and University Library.

Package Type: Data Preservation and Management
License: Apache 2.0
Package Links: In Development
Operating System: Linux, Mac, Windows

Releases for PERICLES Extraction Tool
- PERICLES Extraction Tool - 1.0 30-Oct-2014
Winchester, MA In October, 26 participants from 16 institutions attended a German DSpace User Group Meeting hosted by the University Library of the Technische Universität Berlin.
Winchester, MA Hot Topics: The DuraSpace Community Webinar Series presents big-picture strategic issues by matching community experts with current topics of interest. Each webinar is recorded and made available at http://duraspace.org/hot-topics.
DuraSpace News: Yaffle: Memorial University’s VIVO-Based Solution to Support Knowledge Mobilization in Newfoundland and Labrador
One particular VIVO project that demonstrates the spirit of open access principles is Yaffle. Many VIVO implementations provide value to their host institutions, ranging from front-end access to authoritative organizational information to highlights of works created in the social sciences and arts and humanities. Yaffle extends beyond its host institution and provides a cohesive link between Memorial University and the citizens of Newfoundland and Labrador. There are prospects for launching Yaffle in other parts of Canada in the near future.
Since I wrote about the Ferguson Twitter archive a few months ago, three people have emailed me out of the blue asking for access to the data. One was a principal at a small, scaryish defense contracting company, and the other two were from a prestigious university. I’ve also had a handful of people interested where I work, at the University of Maryland.
I ignored the defense contractor. Maybe that was mean, but I don’t want to be part of that. I’m sure they can go buy the data if they really need it. My response to the external academic researchers wasn’t much more helpful since I mostly pointed them to Twitter’s Terms of Service which says:
If you provide Content to third parties, including downloadable datasets of Content or an API that returns Content, you will only distribute or allow download of Tweet IDs and/or User IDs.
You may, however, provide export via non-automated means (e.g., download of spreadsheets or PDF files, or use of a “save as” button) of up to 50,000 public Tweets and/or User Objects per user of your Service, per day.
Any Content provided to third parties via non-automated file download remains subject to this Policy.
It’s my understanding that I can share the data with others at the University of Maryland, but I am not able to give it to the external parties. What I can do is give them the Tweet IDs. But there are 13,480,000 of them.
So that’s what I’m doing today: publishing the tweet ids. You can download them from the Internet Archive:
I’m making it available using the CC-BY license.

Hydration
On the one hand, it seems unfair that this portion of the public record is unshareable in its most information rich form. The barrier to entry to using the data seems set artificially high in order to protect Twitter’s business interests. These messages were posted to the public Web, where I was able to collect them. Why are we prevented from re-publishing them since they are already on the Web? Why can’t we have lots of copies to keep stuff safe? More on this in a moment.
Twitter limits users to 180 requests every 15 minutes. A user is effectively a unique access token. Each request can hydrate up to 100 Tweet IDs using the statuses/lookup REST API call.

180 requests * 100 tweets = 18,000 tweets / 15 min = 72,000 tweets / hour
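That arithmetic generalizes to any archive size. Here is a quick back-of-the-envelope sketch in Python (the function names are mine for illustration, not part of twarc or Twitter’s API):

```python
import math

BATCH_SIZE = 100           # statuses/lookup accepts up to 100 IDs per request
REQUESTS_PER_WINDOW = 180  # Twitter's rate limit per access token
WINDOW_MINUTES = 15

def batches(ids, size=BATCH_SIZE):
    """Split a list of tweet IDs into lookup-sized chunks."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def hydration_days(n_ids):
    """Estimate days of continuous requests needed to hydrate n_ids tweet IDs."""
    requests_needed = math.ceil(n_ids / BATCH_SIZE)
    windows_needed = requests_needed / REQUESTS_PER_WINDOW
    return windows_needed * WINDOW_MINUTES / (60 * 24)

print(round(hydration_days(13480000), 1))  # about 7.8 days for the Ferguson IDs
```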
So to hydrate all of the 13,480,000 tweets will take about 7.8 days. This is a bit of a pain, but realistically it’s not so bad. I’m sure people doing research have plenty of work to do before running any kind of analysis on the full data set. And they can use a portion of it for testing as it is downloading. But how do you download it?
Gnip, who were recently acquired by Twitter, offer a rehydration API. Their API is limited to tweets from the last 30 days, and similar to Twitter’s API you can fetch up to 100 tweets at a time. Unlike the Twitter API you can issue a request every second. So this means you could download the results in about 1.5 days. But these Ferguson tweets are more than 30 days old. And a Gnip account costs some indeterminate amount of money, starting at $500…
I suspect there are other hydration services out there. But I adapted twarc, the tool I used to collect the data (which already handled rate limiting), to also do hydration. Once you have the tweet IDs in a file you just need to install twarc and run it. Here’s how you would do that on an Ubuntu instance:

sudo apt-get install python-pip
sudo pip install twarc
twarc.py --hydrate ids.txt > tweets.json
After a week or so, you’ll have the full JSON for each of the tweets.

Archive Fever
Well, not really. You will have most of them. But you won’t have the ones that have been deleted. If a user decided to remove a Tweet they made, or decided to remove their account entirely, you won’t be able to get their Tweets back from Twitter using their API. I think it’s interesting to consider Twitter’s Terms of Service as what Katie Shilton would call a value lever.
The metadata-rich JSON data (which often includes geolocation and other behavioral data) wasn’t exactly posted to the Web in the typical way. It was made available through a Web API designed to be used directly by automated agents, not people. Sure, a tweet appears on the Web, but it’s in with the other half a trillion Tweets out on the Web, all the way back to the first one. Requiring researchers to go back to the Twitter API to get this data, and not allowing it to circulate freely in bulk, means that users have an opportunity to remove their content. Sure, it has already been collected by other people, and it’s pretty unlikely that the NSA are deleting their tweets. But in a way Twitter is taking an ethical position that allows its publishers to remove their data. To exercise their right to be forgotten. Removing a teensy bit of informational toxic waste.
As any archivist will tell you, forgetting is an essential and unavoidable part of the archive. Forgetting is the why of an archive. Negotiating what is to be remembered and by whom is the principal concern of the archive. Ironically it seems it’s the people who deserve it the least, those in positions of power, who are often most able to exercise their right to be forgotten. Maybe putting a value lever back in the hands of the people isn’t such a bad thing. If I were Twitter I’d highlight this in the API documentation. I think we are still learning how the contours of the Web fit into the archive. I know I am.
If you are interested in learning more about value levers you can download a pre-print of Shilton’s Value Levers: Building Ethics into Design.
The Centre for Research in Occupational Safety and Health asked me to give a lunch'n'learn presentation on ResearchGate today, which was a challenge I was happy to take on... but I took the liberty of stretching the scope of the discussion to focus on social networking in the context of research and academics in general, recognizing four high-level goals:
- Promotion (increasing citations, finding work positions)
- Finding potential collaborators
- Getting advice from experts in your field
- Accessing others' work
I'm a librarian, so naturally my take veered quickly into the waters of copyright concerns and the burden (to the point of indemnification) that ResearchGate, Academia.edu, Mendeley, and other such services put on their users to ensure that they are in compliance with copyright and the researchers' agreements with publishers... all while heartily encouraging their users to upload their work with a single click. I also dove into the darker waters of r/scholar, LibGen, and SciHub, pointing out the direct consequences that our university has suffered due to the abuse of institutional accounts at the library proxy.
Happily, the audience opened up the subject of publishing in open access journals--not just from a "covering our own butts" perspective, but also from the position of the ethical responsibility to share knowledge as broadly as possible. We briefly discussed the open access mandates that some granting agencies have put in place, particularly in the States, as well as similar Canadian initiatives that have occurred or are still emerging with respect to public funds (SSHRC and the Tri-Council). And I was overjoyed to hear a suggestion that, perhaps, research funded by the Laurentian University Research Fund should be required to publish in an open access venue.
I'm hoping to take this message back to our library and, building on Kurt de Belder's vision of the library as a Partner in Knowledge, help drive our library's mission towards assisting researchers in not only accessing knowledge, but most effectively sharing and promoting the knowledge they create.
That leaves lots of work to do, based on one little presentation.
Resources may be divided into groups called classes. The members of a class are known as instances of the class. Classes are themselves resources. They are often identified by IRIs and may be described using RDF properties. The rdf:type property may be used to state that a resource is an instance of a class.

This seems simple, but it is in fact one of the primary areas of confusion about RDF.
If you are not a programmer, you probably think of classes in terms of taxonomies -- genus, species, sub-species, etc. If you are a librarian you might think of classes in terms of classification, like Library of Congress or the Dewey Decimal System. In these, the class defines certain characteristics of the members of the class. Thus, with two classes, Pets and Veterinary science, you can have:

Pets
- dogs
- cats

Veterinary science
- dogs
- cats

In each of those, dogs and cats have a different meaning because the class provides a context: either as pets, or information about them as treated in veterinary science.
For those familiar with XML, it has similar functionality because it makes use of nesting of data elements. In XML you can create something like this (the element names and prices are illustrative):

<drink>
  <beer>
    <price>5.00</price>
  </beer>
  <wine>
    <price>7.00</price>
  </wine>
</drink>

and it is clear which price goes with which type of drink, and that the bits directly under the <drink> level are all drinks, because that's what <drink> tells you.
Now you have to forget all of this in order to understand RDF, because RDF classes do not work like this at all. In RDF, the "classness" is not expressed hierarchically, with a class defining the elements that are subordinate to it. Instead it works in the opposite way: the descriptive elements in RDF (called "properties") are the ones that define the class of the thing being described. Properties carry the class information through a characteristic called the "domain" of the property. The domain of the property is a class, and when you use that property to describe something, you are saying that the "something" is an instance of that class. It's like building the taxonomy from the bottom up.
This only makes sense through examples. Here are a few:
1. "has child" is of domain "Parent".
If I say "X - has child - 'Fred'" then I have also said that X is a Parent because every thing that has a child is a Parent.
2. "has Worktitle" is of domain "Work"
If I say "Y - has Worktitle - 'Der Zauberberg'" then I have also said that Y is a Work because every thing that has a Worktitle is a Work.
In essence, X or Y is an identifier for something that is of unknown characteristics until it is described. What you say about X or Y is what defines it, and the classes put it in context. This may seem odd, but if you think of it in terms of descriptive metadata, your metadata describes the "thing in hand"; the "thing in hand" doesn't describe your metadata.
Like in real life, any "thing" can have more than one context and therefore more than one class. X, the Parent, can also be an Employee (in the context of her work), a Driver (to the Department of Motor Vehicles), a Patient (to her doctor's office). The same identified entity can be an instance of any number of classes.
"has child" has domain "Parent"
"has licence" has domain "Driver"
"has doctor" has domain "Patient"
X - has child - "Fred" = X is a Parent
X - has license - "234566" = X is a Driver
X - has doctor - URI:765876 = X is a Patient

Classes are defined in your RDF vocabulary, as are the domains of properties. The above statements require an application to look at the definition of the property in the vocabulary to determine whether it has a domain, and then to treat the subject, X, as an instance of the class described as the domain of the property. There is another way to provide the class as context in RDF: you can declare it explicitly in your instance data, rather than, or in addition to, having the class characteristics inherent in your descriptive properties when you create your metadata. The term used for this, based on the RDF standard, is "type," in that you are assigning a type to the "thing." For example, you could say:
X - is type - Parent
X - has child - "Fred"

This can be the same class as you would discern from the properties, or it could be an additional class. It is often used to simplify the programming needs of those working in RDF because it means the program does not have to query the vocabulary to determine the class of X. You see this, for example, in BIBFRAME data. The second line in this example gives two classes for this entity:
a bf:Instance, bf:Monograph .
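The lookup an application performs -- find the property's declared domain, then treat the subject as an instance of that class -- can be sketched in a few lines of Python. The dictionary below is a hand-rolled stand-in for a real RDF vocabulary, and the names are illustrative:

```python
# Toy vocabulary: each property is declared with its domain class,
# the way an RDF schema does with rdfs:domain.
DOMAINS = {
    "has child": "Parent",
    "has license": "Driver",
    "has doctor": "Patient",
}

def inferred_classes(statements):
    """Infer each subject's classes from the domains of the
    properties used to describe it (bottom-up classing)."""
    classes = {}
    for subject, prop, _value in statements:
        if prop in DOMAINS:
            classes.setdefault(subject, set()).add(DOMAINS[prop])
    return classes

triples = [
    ("X", "has child", "Fred"),
    ("X", "has license", "234566"),
    ("X", "has doctor", "URI:765876"),
]
print(inferred_classes(triples))  # X is a Parent, a Driver, and a Patient
```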
One thing that classes do not do, however, is to prevent your "thing" from being assigned the "wrong class." You can, however, define your vocabulary to make "wrong classes" apparent. To do this you define certain classes as disjoint, for example a class of "dead" would logically be disjoint from a class of "alive." Disjoint means that the same thing cannot be of both classes, either through the direct declaration of "type" or through the assignment of properties. Let's do an example:
"residence" has domain "Alive"
"cemetery plot location" has domain "Dead"
"Alive" is disjoint "Dead" (you can't be both alive and dead)
X - is type - "Alive" (X is of class "Alive")
X - cemetery plot location - URI:9494747 (X is of class "Dead")

Nothing stops you from creating this contradiction, but some applications that try to use the data will be stumped because you've created something that, in RDF-speak, is logically inconsistent. What happens next is determined by how your application has been programmed to deal with such things. In some cases, the inconsistency will mean that you cannot fulfill the task the application was attempting. If you reach a decision point where "if Alive do A, if Dead do B" then your application may be stumped and unable to go on.
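A consistency check of the kind such an application might run can be sketched the same way; again, the vocabulary here is a toy stand-in, not a real RDF reasoner:

```python
# Disjointness declarations: nothing may be an instance of both classes.
DISJOINT = [("Alive", "Dead")]

# Property domains: using a property implies membership in its domain class.
DOMAINS = {
    "residence": "Alive",
    "cemetery plot location": "Dead",
}

def classes_of(subject, statements):
    """Collect classes from explicit 'is type' statements and from
    the domains of any other properties used about the subject."""
    found = set()
    for s, prop, value in statements:
        if s != subject:
            continue
        if prop == "is type":
            found.add(value)
        elif prop in DOMAINS:
            found.add(DOMAINS[prop])
    return found

def is_consistent(subject, statements):
    """False when the subject falls into two classes declared disjoint."""
    found = classes_of(subject, statements)
    return not any(a in found and b in found for a, b in DISJOINT)

data = [
    ("X", "is type", "Alive"),
    ("X", "cemetery plot location", "URI:9494747"),
]
print(is_consistent("X", data))  # prints False: X is both Alive and Dead
```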
All of this is to be kept in mind for the next blog post, which talks about the effect of class definitions on bibliographic data in RDF.
LITA has multiple learning opportunities available over the next several months. Hot topics to keep your brain warm over the winter.
Re-Drawing the Map Series
Presenters: Mita Williams and Cecily Walker
Offered: November 18, 2014, December 9, 2014, and January 6, 2015
All: 1:00 pm – 2:00 pm Central Time
Top Technologies Every Librarian Needs to Know
Presenters: Brigitte Bell, Steven Bowers, Terry Cottrell, Elliot Polak and Ken Varnum
Offered: December 2, 2014
1:00 pm – 2:00 pm Central Time
Getting Started with GIS
Instructor: Eva Dodsworth, University of Waterloo
Offered: January 12 – February 9, 2015
For details and registration, check out the fuller descriptions below and follow the links to their full web pages.
Join LITA Education and instructors Mita Williams and Cecily Walker in “Re-drawing the Map”–a webinar series! Pick and choose your favorite topic. Can’t make all the dates but still want the latest information? Registered participants will have access to the recorded webinars.
Here are the individual sessions.
Web Mapping: moving from maps on the web to maps of the web
Tuesday Nov. 18, 2014
1:00 pm – 2:00 pm Central Time
Instructor: Mita Williams
Get an introduction to web mapping tools and learn about the stories they can help you to tell!
OpenStreetMaps: Trust the map that anyone can change
Tuesday December 9, 2014,
1:00 pm – 2:00 pm Central Time
Instructor: Mita Williams
Ever had a map send you the wrong way and wished you could change it? Learn how to add your local knowledge to the “Wikipedia of Maps.”
Coding maps with Leaflet.js
Tuesday January 6, 2015,
1:00 pm – 2:00 pm Central Time
Instructor: Cecily Walker
Register Online page arranged by session date (login required)
We’re all awash in technological innovation. It can be a challenge to know what new tools are likely to have staying power — and what that might mean for libraries. The recently published Top Technologies Every Librarian Needs to Know highlights a selected set of technologies that are just starting to emerge and describes how libraries might adapt them in the next few years.
In this webinar, join the authors of three chapters as they talk about their technologies and what they mean for libraries.
December 2, 2014
1:00 pm – 2:00 pm Central Time
Hands-Free Augmented Reality: Impacting the Library Future
Presenters: Brigitte Bell & Terry Cottrell
The Future of Cloud-Based Library Systems
Presenters: Elliot Polak & Steven Bowers
Library Discovery: From Ponds to Streams
Presenter: Ken Varnum
Register Online page arranged by session date (login required)
Getting Started with GIS is a three-week course modeled on Eva Dodsworth’s LITA Guide of the same name. The course provides an introduction to GIS technology and GIS in libraries. Through hands-on exercises, discussions and recorded lectures, students will acquire skills in using GIS software programs, social mapping tools, map making, digitizing, and researching for geospatial data. This three-week course provides introductory GIS skills that will prove beneficial in any library or information resource position.
No previous mapping or GIS experience is necessary. Some of the mapping applications covered include:
- Introduction to Cartography and Map Making
- Online Maps
- Google Earth
- KML and GIS files
- ArcGIS Online and Story Mapping
- Brief introduction to desktop GIS software
Instructor: Eva Dodsworth, University of Waterloo
Offered: January 12 – February 9, 2015
Register Online page arranged by session date (login required)
Questions or Comments?
For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty, email@example.com.
Every two years the International Budget Partnership (IBP) runs a survey, called the Open Budget Survey, to evaluate formal oversight of budgets, how transparent governments are about their budgets and whether there are opportunities to participate in the budget process. To easily measure and compare transparency among the countries surveyed, IBP created the Open Budget Index, where the participating countries are scored and ranked using about two thirds of the questions from the Survey. The Open Budget Index has already established itself as an authoritative measurement of budget transparency, and is, for example, used as an eligibility criterion for the Open Government Partnership.
However, countries do not release budget information only once every two years; they should do so regularly, on multiple occasions in a given year. There is, as stated above, a two-year gap between the publication of consecutive Open Budget Survey results. This means that if citizens, civil society organisations (CSOs), media and others want to know how governments are performing in between Survey releases, they have to undertake extensive research themselves. It also means that if they want to pressure governments into releasing budget information and increasing budget transparency before the next Open Budget Index, they can only point to ‘official’ data which can be up to two years old.
To combat this, IBP, together with Open Knowledge, has developed the Open Budget Survey Tracker (the OBS Tracker, http://obstracker.org): an online, ongoing budget data monitoring tool, currently a pilot covering 30 countries. The data are collected by researchers selected from IBP’s extensive network of partner organisations, who regularly monitor budget information releases and provide monthly reports. The information included in the OBS Tracker is not as comprehensive as the Survey, because the latter also looks at the content and comprehensiveness of budget information, not only the regularity of its publication. The OBS Tracker, however, does provide a good proxy for increasing or decreasing levels of budget transparency, measured by the release to (or withholding from) the public of key budget documents. This is valuable information for concerned citizens, CSOs and media.
With the Open Budget Survey Tracker, IBP has made it easier for citizens, civil society, media and others to monitor, in near real time (monthly), whether their central governments release information on how they plan to and how they spend the public’s money. The OBS Tracker allows them to highlight changes and facilitates civil society efforts to push for change when a key document has not been released at all, or not in a timely manner.
Niger and the Kyrgyz Republic have improved the release of essential budget information since the latest Open Budget Index results, something which can be seen from the OBS Tracker without having to wait for the next Open Budget Survey release. This puts pressure on other countries to follow suit.
The budget cycle is a complex process which involves creating and publishing specific documents at specific points in time. IBP covers the whole cycle, by monitoring in total eight documents which include everything from the proposed and approved budgets, to a citizen-friendly budget representation, to end-of-the-year financial reporting and the auditing from a country’s Supreme Audit Institution.
In each of the countries included in the OBS Tracker, IBP monitors all eight of these documents, showing how governments are doing in generating them and releasing them on time. Each document for each country is assigned a traffic light color code: red means the document was not produced at all or was published too late; yellow means the document was only produced for internal use and not released to the general public; green means the document is publicly available and was made available on time. The color codes help users quickly skim the status of the world as a whole as well as the status of a country they’re interested in.
To make monitoring even easier, the OBS Tracker also provides more detailed information about each document for each country, a link to the country’s budget library and more importantly the historical evolution of the “availability status” for each country. The historical visualisation shows a snapshot of the key documents’ status for that country for each month. This helps users see if the country has made any improvements on a month-by-month basis, but also if it has made any improvements since the last Open Budget Survey.
Is your country being tracked by the OBS Tracker? How is it doing? If they are not releasing essential budget documents or not even producing them, start raising questions. If your country is improving or has a lot of green dots, be sure to congratulate the government; show them that their work is appreciated, and provide recommendations on what else can be done to promote openness. Whether you are a government official, a CSO member, a journalist or just a concerned citizen, OBS Tracker is a tool that can help you help your government.
The new date for the November WMS Web services install is this Sunday, November 23rd. This install will include changes to two of our WMS APIs.
Imagine you’re a legal scholar and you’re examining the U.S. Supreme Court decisions of the late nineties to mid-two thousands and you want to understand what resources were consulted to support official opinions. A study in the Yale Journal of Law and Technology indicates you would find that only half of the nearly 555 URL links cited in Supreme Court opinions since 1996 would still work. This problem has been widely discussed in the media and the Supreme Court has indicated it will print all websites cited and place the printouts in physical case files at the Supreme Court, available only in Washington, DC.
On October 24, 2014 Georgetown University Law Library hosted a one-day symposium on this problem which has been studied across legal scholarship and other academic works. The meeting, titled 404/File Not Found: Link Rot, Legal Citation and Projects to Preserve Precedent, presented a broad overview of why websites disappear, why this is particularly problematic in the legal citation context and the proposal of actual solutions and strategies to addressing the problem.
The keynote address was given by Jonathan Zittrain, George Bemis Professor of Law at Harvard Law School. A video of his presentation is now available from the meeting website. In it he details a service created by Harvard Law School Libraries and other law libraries called Perma.cc that allows those with an account to submit links that can be archived at a participating library. The use case for Perma.cc is to support links in new forms of academic and legal writing. Today, over 26,000 links have been archived.
Herbert Van de Sompel of the Los Alamos National Laboratory also demonstrated the Memento browser plug-in that allows users who’ve downloaded the plug-in to see archived versions of a website (if that website has been archived) while they are using the live web. The Internet Archive, The British Library, the UK National Archives and other archives around the world all provide archived versions of websites through Memento. The Memento protocol has been widely implemented, integrated in MediaWiki sites and supports “time travel” to old websites that cover all topics.
Both solutions, Perma.cc and Memento, depend on action by, and coordination of, organizations and individuals who are affected by the linkrot problem. At the end of his presentation Van de Sompel reiterated that technical solutions exist to deal with linkrot; what is still needed is broad participation in the selection, collection and archiving of web resources, and a sustainable and interoperable infrastructure of tools and services, like Memento and Perma.cc, that connect the archived versions of websites with the scholars, researchers and users who want to access them today and into the future.
Michael Nelson of Old Dominion University, a partner in developing Memento, posted notes on the symposium presentations. For even more background and documentation on the problem of linkrot, the meeting organizers collected a list of readings. The symposium demonstrated the ability of a community, in this case, law librarians, to come together to address a problem in their domain, the results of which benefit the larger digital stewardship community and serve as models for coordinated action.
It’s been a little over a month since we launched GIF IT UP, an international competition to find the best GIFs reusing public domain and openly licensed digital video, images, text, and other material available via DPLA and DigitalNZ. Since then we’ve received dozens of wonderful submissions from all over the world, all viewable in the competition gallery.
The winners of GIF IT UP will have their work featured and celebrated online at the Public Domain Review and Smithsonian.com. Haven’t submitted an entry yet? Well, what are you waiting for? Submit a GIF!

About GIF IT UP
How it works. The GIF IT UP competition has six categories:
- Planes, trains, and other transport
- Nature and the environment
- Your hometown, state, or province
- WWI, 1914-1918
- GIF using a stereoscopic image
- Open category (any reusable material from DigitalNZ or DPLA)
A winner will be selected in each of these categories and, if necessary, a winner will be awarded in two fields: use of an animated still public domain image, and use of video material.
To view the competition’s official homepage, visit http://dp.la/info/gif-it-up/.
Judging. GIF IT UP will be co-judged by Adam Green, Editor of the Public Domain Review and by Brian Wolly, Digital Editor of Smithsonian.com. Entries will be judged on coherence with category theme (except for the open category), thoroughness of entry (correct link to source material and contextual information), creativity, and originality.
Gallery. All entries that meet the criteria outlined below in the Guidelines and Rules will be posted to the GIF IT UP Tumblr Gallery. The gallery entries with the most amount of Tumblr “notes” will receive the people’s choice award and will appear online at the Public Domain Review and Smithsonian.com alongside the category winners.
Deadline. The competition deadline is December 1, 2014 at 5:00 PM EST / December 2, 2014 at 10:00 AM GMT+13.
GIFtastic Resources. You can find more information about GIF IT UP–including select DPLA and DigitalNZ collections available for re-use and a list of handy GIF-making tips and tools–over on the GIF IT UP homepage.
[This is the second in a short series on our 2014 OCLC Research Library Partnership meeting, Libraries and Research: Supporting Change/Changing Support. You can read the first post and also refer to the event webpage, which contains links to slides, videos, photos, and Storify summaries.]

Anja Smit (University Librarian at Utrecht University) [link to video] chaired this session, which focused on the ways in which libraries are or could be supporting eScholarship. In opening she shared a story that reflects how the library is really a creature of the larger institution. At Utrecht the library engaged in scenario planning* and identified their future as being all about open access and online access to sources. When they brought faculty in to comment on their plans, they were told that they were “going too fast” and that they needed to slow down. Sometimes researchers request services and sometimes the library just acts to fill a void. But innovation is not only starting but also stopping. The Utrecht experience with VREs is an example of a well-reasoned library “push” of services: they thought they would have 200 research groups actively using the VRE, but only 25 took it up. Annotated books, on the other hand, are an example of “pull,” something requested by researchers. Dataverse (a network for storing data) started as a service in the library that was needed by faculty but ultimately moved to DANS due to scale and infrastructure issues. The decision to discontinue local search was a “pull” decision, based on evidence that researchers were not using it. Ultimately, librarians need to be “embedded” in researcher workflows. If we don’t know what they are doing, we won’t be able to help them.
Ricky Erway (Senior Program Officer, OCLC Research) [link to video] gave her own story of push and pull: OCLC Research was asked by the Research Information Management Interest Group to "do something about digital humanities". The larger question was, where can libraries make a unique contribution? Ricky and her colleague Jennifer Schaffner immersed themselves in the researchers' perspective regarding processes, issues, and needs, and then tried to see where the library might fill gaps. Their paper [Does Every Research Library Need a Digital Humanities Center?] was written for library directors not already engaged with digital humanities. The answer to the question posed in the title is, "It depends." The report suggests that a constellation of engagement possibilities should be considered based on local needs. Start with what you are already offering and ensure that researchers are aware of those services. Scholars' enthusiasm for metadata was a surprising finding: humanities researchers use and value metadata sources such as VIAF. (Colleague Karen Smith-Yoshimura has previously blogged about contributions to VIAF from the Syriac scholarly community and contributions from the Perseus Catalog.) A challenge for libraries is figuring out when to support, when to collaborate, and when to lead. There is no one size fits all in digital humanities and libraries: not only is it the case that "changes in research are not evenly distributed," but every library also has its own set of strengths and services which may be good matches for local needs.
Adam Farquhar (Head of Digital Scholarship at the British Library) [link to video] talked about what happens when large digital collections are brought together with scholars. Adam's role, in brief, is to get the British Library's digital collections into the hands of scholars so they can create knowledge. He and his team have been trying to find ways to take advantage of the digital qualities of digital collections; up to now, most libraries have treated digital collections the same as print collections, apart from delivery. This is a mistake, because large-scale digital collections have unique aspects, and we should be leveraging them. The British Library has a cross-disciplinary team, which is much needed for tackling the challenges at hand. Rather than surveying the broad range of projects under way at the BL, Adam chose to focus on a few small, illustrative examples. In the British Library Labs, developers are invited to sit alongside scholars and co-evolve projects and solutions. The BL Labs Competition is a challenge to encourage people to put forward interesting projects and needs; winners of the 2014 competition included an entry from Australia (showing that there is global interest in the BL's collections). One winner, the Victorian Meme Machine, will pair Victorian jokes with likely images to illustrate what makes Victorian jokes funny. Another project extracted images from digitized books and put a million of them on Flickr (where people go to look for images, not for books). These images have received 160 million views in the last year, an impressive figure when you consider that previously no one alive had looked at any of those images. Now many people have, and the images have been used in a variety of ways, from an art piece at Burning Man, to serious research, to commercial use. Adam's advice? Relax and take a chance on releasing information into the public domain.
Antal van den Bosch (Professor at Radboud University Nijmegen) [link to video] spoke from his perspective as a researcher. Scientists have long had the ability to shift from first gear (working at the chalkboard) to fifth or sixth gear (working on the Large Hadron Collider). Humanists have recently discovered that there is a third or fourth gear and want to go there; in the humanities there is both fast and slow scholarship. In his own field, linguistics and computer science, there is no data like more data: large, rich corpora are highly valued (and increasingly common). One example is Twitter. In the Netherlands, seven million tweets a day are generated and collected by his institute. Against this corpus, researchers can study the use of language at different times of day and use location metadata to identify use of regional dialect. Another example is the HiTiME (Historical Timeline Mining and Extraction) project, which uses linked data in historical sources to enable the study of social movements in Europe. Within texts, markup of persons, locations, and events allows visualizations such as timelines and social networks; analysis of newspaper archives revealed both labor strikes that happened and those that didn't. However, library technology was not up to the task of keeping up with the data, so findings were not repeatable, underscoring the need for version control and adequate technological underpinnings. In these projects the software often goes along with the data, so storing both data and code is important. Most researchers are not sure where to put their research data and may be using cloud services like GitHub. Advice and guidance are all well and good, but what researchers really need is storage and easy-to-use services ("an upload button, basically"). In the Netherlands and in Europe, there are long-tail storage solutions for e-research data.
Many organizations and institutions say "here, let me help you with that." Libraries seem well situated to help with metadata, but researchers want full-text search against very big data sets like Twitter or Google Books, and libraries should be asking themselves whether they can host something that big. If libraries can't offer collections like these at scale, researchers may not be interested. On the other hand, the humanities have a "long tail of small topics": many single researchers doing small research projects, and here the library may be well positioned to help.
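The kind of time-of-day language study van den Bosch describes can be illustrated with a minimal sketch. The data here is an invented mini-corpus standing in for the institute's tweet collection, and the function name is my own; the point is only the technique of bucketing word counts by hour so that morning and evening vocabulary can be compared.

```python
from collections import Counter
from datetime import datetime

# Hypothetical mini-corpus of (timestamp, text) pairs, standing in for
# the seven million tweets a day collected at van den Bosch's institute.
tweets = [
    ("2014-11-17T08:15:00", "goedemorgen allemaal"),
    ("2014-11-17T08:40:00", "moin moin"),
    ("2014-11-17T22:05:00", "goeienacht"),
    ("2014-11-17T22:30:00", "slaap lekker"),
]

def tokens_by_hour(corpus):
    """Bucket word counts by hour of day to study time-of-day language use."""
    buckets = {}
    for stamp, text in corpus:
        hour = datetime.fromisoformat(stamp).hour
        buckets.setdefault(hour, Counter()).update(text.lower().split())
    return buckets

counts = tokens_by_hour(tweets)
print(counts[8].most_common(1))   # most frequent morning token
print(sorted(counts))             # hours represented in the corpus
```

The same bucketing idea extends to location metadata: replace the hour key with a region extracted from each tweet's geodata to surface regional dialect terms.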
If you are interested in more details you can watch the discussion session that followed:
I’ll be back later to summarize the last two segments of the meeting.
*A few years ago, Jim and I attended one of the ARL 2030 Scenarios workshops. Since that time, I’ve been quite interested in the use of scenario planning as an approach for organizations like libraries that hope to build for resilience.
Blessed with the gift-curse of seeing ~24h into the future, I spend it on bad TV.
Monday Nov 17th 2014 (IRC):
- 10:06 danbri: I’ve figured out what the world needs – a new modern WestWorld sequel.
- 10:06 libby: why does the world need that?
- 10:06 danbri: just that it was a great film and it has robots and cowboys and snakes and fembots and a guy who can take his face off and who is a robot and a cowboy. it double ticks all the boxes.
Tuesday Nov 18th 2014 (BBC):
JJ Abrams’ sci-fi drama Westworld has been officially commissioned for a full series by HBO. The Star Wars director is executive producer, while Interstellar co-writer Jonathan Nolan will pen the scripts. Sir Anthony Hopkins, Thandie Newton, Evan Rachel Wood, Ed Harris and James Marsden will all star. The show is a remake of a 1973 sci-fi western about a futuristic themed amusement park whose robots malfunction.
The studio is calling the series, which will debut in 2015, “a dark odyssey about the dawn of artificial consciousness and the future of sin”.
We are delighted to announce that Tufts University has become the latest formal Hydra Partner. Tufts has two Hydra-based projects, the Tufts Digital Library redesign and a New Nation Votes election portal. They are currently working on a Hydra-based administrative interface to allow staff self-deposit in the Tufts Fedora content repository, and on the Tufts Digital Image Library, based on Northwestern’s DIL implementation.
In their Letter of Intent, Tufts say that they are committed to the Hydra community in helping solve digital repository and workflow challenges by supporting development and contributing code, documentation and expertise.
From The Fedora Steering Group
Fedora Development - In the past quarter, the development team released two Beta releases of Fedora 4; detailed release notes are here: