Feed aggregator

Open Knowledge Foundation: The heartbeat of budget transparency

planet code4lib - Tue, 2014-11-18 17:37

Every two years the International Budget Partnership (IBP) runs a survey, called the Open Budget Survey, to evaluate formal oversight of budgets, how transparent governments are about their budgets and whether there are opportunities to participate in the budget process. To easily measure and compare transparency among the countries surveyed, IBP created the Open Budget Index, in which the participating countries are scored and ranked using about two thirds of the questions from the Survey. The Open Budget Index has already established itself as an authoritative measurement of budget transparency, and is, for example, used as an eligibility criterion for the Open Government Partnership.

However, countries do not release budget information only once every two years; they should do so regularly, on multiple occasions in a given year. Yet, as noted above, there is a two-year gap between the publication of consecutive Open Budget Survey results. This means that if citizens, civil society organisations (CSOs), media and others want to know how governments are performing in between Survey releases, they have to undertake extensive research themselves. It also means that if they want to pressure governments into releasing budget information and increasing budget transparency before the next Open Budget Index, they can only point to ‘official’ data which can be up to two years old.

To combat this, IBP, together with Open Knowledge, has developed the Open Budget Survey Tracker (the OBS Tracker): an online, ongoing budget data monitoring tool, which is currently a pilot and covers 30 countries. The data are collected by researchers selected from among IBP’s extensive network of partner organisations, who regularly monitor budget information releases and provide monthly reports. The information included in the OBS Tracker is not as comprehensive as the Survey, because the latter also looks at the content and comprehensiveness of budget information, not only the regularity of its publication. The OBS Tracker, however, does provide a good proxy of increasing or decreasing levels of budget transparency, measured by the release to (or withholding from) the public of key budget documents. This is valuable information for concerned citizens, CSOs and media.

With the Open Budget Survey Tracker, IBP has made it easier for citizens, civil society, media and others to monitor, in near real time (monthly), whether their central governments release information on how they plan to spend, and how they actually spend, the public’s money. The OBS Tracker allows them to highlight changes and facilitates civil society efforts to push for change when a key document has not been released at all, or not in a timely manner.

Niger and the Kyrgyz Republic have improved the release of essential budget information since the latest Open Budget Index results, something which can be seen from the OBS Tracker without having to wait for the next Open Budget Survey release. This puts pressure on other countries to follow suit.

The budget cycle is a complex process which involves creating and publishing specific documents at specific points in time. IBP covers the whole cycle, by monitoring in total eight documents which include everything from the proposed and approved budgets, to a citizen-friendly budget representation, to end-of-the-year financial reporting and the auditing from a country’s Supreme Audit Institution.

In each of the countries included in the OBS Tracker, IBP monitors all eight of these documents, showing how governments are doing in producing them and releasing them on time. Each document for each country is assigned a traffic light color code: Red means the document was not produced at all or was published too late. Yellow means the document was only produced for internal use and not released to the general public. Green means the document is publicly available and was made available on time. The color codes help users quickly skim the status of the world as well as the status of a country they’re interested in.
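The traffic light rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as worded here, not the OBS Tracker's actual code: the function names `document_status` and `summarize`, and the three boolean parameters, are hypothetical and chosen to mirror the three color definitions.

```python
def document_status(produced: bool, public: bool, on_time: bool) -> str:
    """Map a budget document's state to a traffic light color.

    Hypothetical helper mirroring the rule in the text:
      red    - not produced at all, or published too late
      yellow - produced for internal use only, not released to the public
      green  - publicly available and made available on time
    """
    if not produced:
        return "red"
    if not public:
        return "yellow"
    # Public documents are green only when they were released on time.
    return "green" if on_time else "red"


def summarize(statuses: dict) -> dict:
    """Count how many of a country's documents fall under each color."""
    counts = {"red": 0, "yellow": 0, "green": 0}
    for color in statuses.values():
        counts[color] += 1
    return counts
```

Calling `summarize` on a mapping of document labels to colors gives the kind of at-a-glance tally that the color codes are meant to support, for one country and one month.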

To make monitoring even easier, the OBS Tracker also provides more detailed information about each document for each country, a link to the country’s budget library and more importantly the historical evolution of the “availability status” for each country. The historical visualisation shows a snapshot of the key documents’ status for that country for each month. This helps users see if the country has made any improvements on a month-by-month basis, but also if it has made any improvements since the last Open Budget Survey.

Is your country being tracked by the OBS Tracker? How is it doing? If the government is not releasing essential budget documents, or not even producing them, start raising questions. If your country is improving or has a lot of green dots, be sure to congratulate the government; show them that their work is appreciated, and provide recommendations on what else can be done to promote openness. Whether you are a government official, a CSO member, a journalist or just a concerned citizen, the OBS Tracker is a tool that can help you help your government.

OCLC Dev Network: WMS Web Services Install November 23

planet code4lib - Tue, 2014-11-18 17:00

The new date for the November WMS Web services install is this Sunday, November 23rd. This install will include changes to two of our WMS APIs.

Library of Congress: The Signal: Losing and Finding Legal Links

planet code4lib - Tue, 2014-11-18 16:03

Imagine you’re a legal scholar and you’re examining the U.S. Supreme Court decisions of the late nineties to mid-two thousands and you want to understand what resources were consulted to support official opinions. A study in the Yale Journal of Law and Technology indicates you would find that only half of the nearly 555 URL links cited in Supreme Court opinions since 1996 would still work. This problem has been widely discussed in the media and the Supreme Court has indicated it will print all websites cited and place the printouts in physical case files at the Supreme Court, available only in Washington, DC.

Georgetown Law School, Washington, DC. Negative, part of National Photo Company Collection, 1910-1925.

On October 24, 2014 Georgetown University Law Library hosted a one-day symposium on this problem, which has been studied across legal scholarship and other academic works. The meeting, titled 404/File Not Found: Link Rot, Legal Citation and Projects to Preserve Precedent, presented a broad overview of why websites disappear and why this is particularly problematic in the legal citation context, and proposed concrete solutions and strategies for addressing the problem.

The keynote address was given by Jonathan Zittrain, George Bemis Professor of Law at Harvard Law School. A video of his presentation is now available from the meeting website. In it he details a service, created by Harvard Law School Libraries and other law libraries, that allows those with an account to submit links that can be archived at a participating library. The use case for the service is to support links in new forms of academic and legal writing. Today, over 26,000 links have been archived.

Herbert Van de Sompel of the Los Alamos National Laboratory also demonstrated the Memento browser plug-in that allows users who’ve downloaded the plug-in to see archived versions of a website (if that website has been archived) while they are using the live web. The Internet Archive, The British Library, the UK National Archives and other archives around the world all provide archived versions of websites through Memento. The Memento protocol has been widely implemented, integrated in MediaWiki sites and supports “time travel” to old websites that cover all topics.
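To make the “time travel” idea a little more concrete: in the Memento protocol (RFC 7089), a client asks a TimeGate for the archived version of a page closest to a given date by sending an `Accept-Datetime` request header. The Python sketch below only builds such a request and does not send it. The TimeGate address is a placeholder, and the function name `build_timegate_request` is hypothetical; it is not part of the browser plug-in or any Memento library.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Placeholder TimeGate endpoint (an assumption for illustration only).
TIMEGATE_PREFIX = "https://timegate.example.org/timegate/"


def build_timegate_request(target_url: str, when: datetime) -> Request:
    """Build (but do not send) a Memento TimeGate request for target_url.

    `when` must be a timezone-aware datetime; it is converted to UTC
    because the Accept-Datetime header carries an HTTP-date in GMT.
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE_PREFIX + target_url,
                   headers={"Accept-Datetime": http_date})
```

A real TimeGate would normally answer such a request with a redirect to an archived copy, whose response carries a `Memento-Datetime` header stating when that copy was captured.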

Both solutions, the Harvard-led archiving service and Memento, depend on action by, and coordination of, organizations and individuals who are affected by the linkrot problem. At the end of his presentation Van de Sompel reiterated that technical solutions exist to deal with linkrot; what is still needed is broad participation in the selection, collection and archiving of web resources and a sustainable and interoperable infrastructure of tools and services, like Memento and the Harvard-led archiving service, that connect the archived versions of websites with the scholars, researchers and users who want to access them today and into the future.

Michael Nelson of Old Dominion University, a partner in developing Memento, posted notes on the symposium presentations. For even more background and documentation on the problem of linkrot, the meeting organizers collected a list of readings. The symposium demonstrated the ability of a community, in this case, law librarians, to come together to address a problem in their domain, the results of which benefit the larger digital stewardship community and serve as models for coordinated action.

DPLA: Two weeks left to submit your GIF IT UP entries!

planet code4lib - Tue, 2014-11-18 15:52

It’s been a little over a month since we launched GIF IT UP, an international competition to find the best GIFs reusing public domain and openly licensed digital video, images, text, and other material available via DPLA and DigitalNZ. Since then we’ve received dozens of wonderful submissions from all over the world, all viewable in the competition gallery.

The winners of GIF IT UP will have their work featured and celebrated online at the Public Domain Review and Haven’t submitted an entry yet? Well, what are you waiting for? Submit a GIF!


Cat Galloping (1887). The still images used in this GIF come from Eadweard Muybridge’s “Animal locomotion: an electro-photographic investigation of consecutive phases of animal movements” (1872-1885). Courtesy USC Digital Library, 2010. View original record (item is in the public domain). GIF available under a CC-BY license.

How it works. The GIF IT UP competition has seven categories:

  1. Animals
  2. Planes, trains, and other transport
  3. Nature and the environment
  4. Your hometown, state, or province
  5. WWI, 1914-1918
  6. GIF using a stereoscopic image
  7. Open category (any reusable material from DigitalNZ or DPLA)

A winner will be selected in each of these categories and, if necessary, a winner will be awarded in two fields: use of an animated still public domain image, and use of video material.

To view the competition’s official homepage, visit

Judging. GIF IT UP will be co-judged by Adam Green, Editor of the Public Domain Review and by Brian Wolly, Digital Editor of Entries will be judged on coherence with category theme (except for the open category), thoroughness of entry (correct link to source material and contextual information), creativity, and originality.

Gallery. All entries that meet the criteria outlined below in the Guidelines and Rules will be posted to the GIF IT UP Tumblr Gallery. The gallery entries with the most Tumblr “notes” will receive the people’s choice award and will appear online at the Public Domain Review and alongside the category winners.

Submit. To participate, please first take a moment to read “How it Works” and the guidelines and rules on the GIF IT UP homepage, and then submit your entry by clicking here.

Deadline. The competition deadline is December 1, 2014 at 5:00 PM EST / December 2, 2014 at 10:00 AM GMT+13.

GIFtastic Resources. You can find more information about GIF IT UP–including select DPLA and DigitalNZ collections available for re-use and a list of handy GIF-making tips and tools–over on the GIF IT UP homepage.

Questions. For questions or other inquiries, email us at or, or tweet us @digitalnz or @dpla. Good luck and happy GIFing!

HangingTogether: Libraries & Research: Supporting change in research

planet code4lib - Tue, 2014-11-18 14:23

[This is the second in a short series on our 2014 OCLC Research Library Partnership meeting, Libraries and Research: Supporting Change/Changing Support. You can read the first post and also refer to the event webpage, which contains links to slides, videos, photos, and Storify summaries.]

[Anja Smit, Adam Farquhar, Antal van den Bosch, and Ricky Erway]

Anja Smit (University Librarian at Utrecht University) [link to video] chaired this session, which focused on the ways in which libraries are or could be supporting eScholarship. In opening, she shared a story that reflects how the library is really a creature of the larger institution. At Utrecht the library engaged in scenario planning* and identified its future as being all about open access and online access to sources. When library staff brought faculty in to comment on their plans, they were told that they were “going too fast” and that they needed to slow down. Sometimes researchers request services and sometimes the library just acts to fill a void. But innovation is not only starting but also stopping. The Utrecht experience with VREs is an example of a well-reasoned library “push” of services: the library thought it would have 200 research groups actively using the VRE, but only 25 took it up. Annotated books, on the other hand, are an example of “pull,” something requested by researchers. Dataverse (a network for storing data) started as a service in the library that was needed by faculty but ultimately moved to DANS due to scale and infrastructure issues. The decision to discontinue local search was a “pull” decision, based on evidence that researchers were not using it. Ultimately, librarians need to be “embedded” in researcher workflows. If we don’t know what they are doing, we won’t be able to help them.

Ricky Erway (Senior Program Officer, OCLC Research) [link to video] gave her own story of push and pull: OCLC Research was asked by the Research Information Management Interest Group to “do something about digital humanities”. The larger question was, where can libraries make a unique contribution? Ricky and colleague Jennifer Schaffner immersed themselves in the researchers’ perspective regarding processes, issues, and needs, and then tried to see where the library might fill gaps. Their paper [Does Every Research Library Need a Digital Humanities Center?] was written for library directors not already engaged with digital humanities. The answer to the question posed in the title of the paper is, “It depends.” The report suggests that a constellation of engagement possibilities should be considered based on local needs. Start with what you are already offering and ensure that researchers are aware of those services. Scholars’ enthusiasm for metadata was a surprising finding: humanities researchers use and value metadata sources such as VIAF. (Colleague Karen Smith-Yoshimura has previously blogged about contributions to VIAF from the Syriac scholarly community and contributions from the Perseus Catalog.) A challenge for libraries is figuring out when to support, when to collaborate, and when to lead. There is no one size fits all in digital humanities and libraries: not only is it the case that “changes in research are not evenly distributed,” but also every library has its own set of strengths and services which may be good matches for local needs.

Adam Farquhar (Head of Digital Scholarship at the British Library) [link to video] talked about what happens when large digital collections are brought together with scholars. Adam’s role, in brief, is to get the British Library’s digital collections into the hands of scholars so they can create knowledge. Adam and his team have been trying to find ways to take advantage of the digital qualities of digital collections; up to now, most libraries have treated digital collections the same as print collections apart from delivery. This is a mistake, because there are unique aspects to large-scale digital collections and we should be leveraging them. The British Library has a cross-disciplinary team, which is much needed for tackling the challenges at hand. Rather than highlighting the broad range of projects being undertaken at the BL, Adam chose instead to focus on a few small, illustrative examples. In the British Library Labs, developers are invited to sit alongside scholars and co-evolve projects and solutions. The BL Labs Competition is a challenge to encourage people to put forward interesting projects and needs. Winners of the 2014 competition included one from Australia (showing that there is global interest in the BL’s collections). One winner is the Victorian Meme Machine, which will pair Victorian jokes with likely images to illustrate what makes Victorian jokes funny. Another project extracted images from digitized books and put a million images on Flickr (where people go to look for images, not for books). These images have received 160 million views in the last year. These are impressive metrics, especially when you consider that previously no one alive had looked at any of those images. Now lots of people have, and the images have been used in a variety of ways, from an art piece at Burning Man, to serious research, to commercial use. Adam’s advice? Relax and take a chance on release of information into the public domain.

Antal van den Bosch (Professor at the Radboud University Nijmegen) [link to video] spoke from his perspective as a researcher. Scientists have long had the ability to shift from first gear (working at the chalkboard) to 5th or 6th gear (doing work on the Large Hadron Collider). Humanists have recently discovered that there is a 3rd or 4th gear and want to go there. In the humanities there is fast and slow scholarship. In his own field, linguistics and computer science, there is no data like more data. Large, rich corpora are highly valued (and more common over time). One example is Twitter: in the Netherlands, seven million Tweets a day are generated and collected by his institute. Against this corpus, researchers can study the use of language at different times of day and use location metadata to identify use of regional dialect. Another example is the HiTiME (Historical Timeline Mining and Extraction) project, which uses linked data in historical sources to enable the study of social movements in Europe. Within texts, markup of persons, locations, and events allows visualizations including timelines and social networks. Analysis of newspaper archives revealed both labor strikes that happened and those that didn’t. However, library technology was not up to the task of keeping up with the data, so findings were not repeatable, underscoring the need for version control and adequate technological underpinnings. Many times in these projects the software goes along with the data, so storing both data and code is important. Most researchers are not sure where to put their research data and may be using cloud storage like GitHub. Advice and guidance are all well and good, but what researchers really need is storage and easy-to-use services (“an upload button, basically”). In the Netherlands and in Europe, there are long tail storage solutions for e-research data.

Many organizations and institutions say “here, let me help you with that.” Libraries seem well situated to help with metadata, but researchers want full text search against very big data sets like Twitter or Google Books. Libraries should be asking themselves if they can host something that big. If libraries can’t offer collections like these, at scale, researchers may not be interested. On the other hand, in the humanities, which have a “long tail of small topics,” there are many single researchers doing small research projects, and here the library may be well positioned to help.

If you are interested in more details you can watch the discussion session that followed:

I’ll be back later to summarize the last two segments of the meeting.

*A few years ago, Jim and I attended one of the ARL 2030 Scenarios workshops. Since that time, I’ve been quite interested in the use of scenario planning as an approach for organizations like libraries that hope to build for resilience.


About Merrilee Proffitt

Dan Brickley: Near Futurism

planet code4lib - Tue, 2014-11-18 11:05

Blessed with the gift-curse of seeing ~24h into the future, I spend it on bad TV.


Monday Nov 17th 2014 (IRC):

  • 10:06 danbri: I’ve figured out what the world needs – a new modern WestWorld sequel.
  • 10:06 libby: why does the world need that?
  • 10:06 danbri: just that it was a great film and it has robots and cowboys and snakes and fembots and a guy who can take his face off and who is a robot and a cowboy. it double ticks all the boxes.

Tuesday Nov 18th 2014 (BBC):

JJ Abrams to remake sci-fi western Westworld into TV series

JJ Abrams’ sci-fi drama Westworld has been officially commissioned for a whole series by HBO. The Star Wars director is executive producer whilst Interstellar co-writer Jonathan Nolan will pen the scripts. Sir Anthony Hopkins, Thandie Newton, Evan Rachel Wood, Ed Harris and James Marsden will all star. The show is a remake of a 1973 sci-fi western about a futuristic themed amusement park that has a robot malfunction.

The studio is calling the series, which will debut in 2015, “a dark odyssey about the dawn of artificial consciousness and the future of sin”

Hydra Project: Tufts becomes a Hydra Partner

planet code4lib - Tue, 2014-11-18 10:21

We are delighted to announce that Tufts University has become the latest formal Hydra Partner. Tufts has two Hydra-based projects, the Tufts Digital Library redesign and a New Nation Votes election portal. They are currently working on a Hydra-based administrative interface to allow staff self-deposit in the Tufts Fedora content repository, and on the Tufts Digital Image Library, based on Northwestern’s DIL implementation.

In their Letter of Intent, Tufts say that they are committed to the Hydra community in helping solve digital repository and workflow challenges by supporting development and contributing code, documentation and expertise.

Welcome, Tufts!

DuraSpace News: Quarterly Report from Fedora, July - September 2014

planet code4lib - Tue, 2014-11-18 00:00

From The Fedora Steering Group

Fedora Development - In the past quarter, the development team released two Beta releases of Fedora 4; detailed release notes are here:

District Dispatch: Free financial literacy webinar for librarians

planet code4lib - Mon, 2014-11-17 22:40

Consumer Financial Protection Bureau

On November 19th, the Consumer Financial Protection Bureau and the Institute of Museum and Library Services will offer a free webinar on financial literacy. This session has limited space so please register quickly.

Tune in to the Consumer Financial Protection Bureau’s monthly webinar series intended to instruct library staff on how to discuss financial education topics with their patrons. As part of the series, the Bureau invites experts from other government agencies and nonprofit organizations to speak about key topics of interest.

Tax time is a unique opportunity for many consumers to make financial decisions about how to use their income tax refunds to build savings. In the next free webinar, “Ways to save during tax time: EITC,” finance leaders will discuss what consumers need to do to prepare before filing their income tax returns, the importance of taking advantage of the tax time moment to save, and the ways people can save automatically when filing their returns.

If you would like to be notified of future webinars, or ask about in-person trainings for large groups of librarians, email; subject: Library financial education training. All webinars will be recorded and archived for later viewing.

Webinar Details
November 19, 2014
2:30—3:30 p.m. EDT
Join the webinar at 2:30 p.m. You do not need to register for this webinar.

If that link does not work, you can also access the webinar by going to and entering the following information:

  • Conference number: PW9469248
  • Audience passcode: LIBRARY

If you are participating only by phone, please dial the following number:

  • Phone: 1-888-947-8930
  • Participant passcode: LIBRARY

The post Free financial literacy webinar for librarians appeared first on District Dispatch.

HangingTogether: Libraries & Research, Supporting Change/Changing Support: Introduction

planet code4lib - Mon, 2014-11-17 21:22

Libraries and Research: Supporting Change/Changing Support was a meeting on 11-12 June for members of the OCLC Research Library Partnership. The meeting focused on how the evolving nature of academic research practices and scholarship is placing new demands on research library services. Shifting attitudes toward data sharing, methodologies in eScholarship, and rethinking the very definition of scholarly discourse: these are all areas that have deep implications for the library. But it is not only the research process that is changing; research universities are evolving in new directions, often becoming more outcome-oriented, changing to reflect the increased importance of impact assessment, and competing for funding. Libraries are taking on new roles and responsibilities to support change in research and in the academy. From our perch in OCLC Research, we can see that as libraries prepare to meet new demands and position themselves for the future, libraries themselves are changing, both in their organizational structure and in their alliances with other parts of the university and with external entities.

This meeting focused on three thematic areas: supporting change in research; supporting change at the university level; and changing support structures in the library.

Our meeting venue, close to the Centraal Station.

For the first time, and in response to an increasing number of active partners in Europe we held our Partnership meeting outside of the United States. Since we have a number of partners in the Netherlands, we opted to hold our meeting in Amsterdam. We were in a terrific venue, and the beautiful weather didn’t hurt.

Meeting attendees were greeted by Maria Heijne (Director of the University of Amsterdam Library and of the Library of Applied Sciences/Hogeschool of Amsterdam). [Link to video.] Maria highlighted the global perspective represented by those attending the meeting, who hailed from the Netherlands, the United Kingdom, Denmark, Italy, Germany, Australia, Japan, the US and Canada. The UofA library is a unique combination of library, special collections, and museum of archaeology. They offer a strong combination of services for the university and for the city of Amsterdam. Like so many libraries in the Partnership and beyond, the UofA library is preparing for new facilities, and looking to shift effort from cataloging and other backroom functions to working more closely with researchers and other customers.

Maria Heijne, University of Amsterdam

Titia van der Werf (Senior Program Officer, OCLC Research) introduced the meeting and our themes [link to video], welcoming special guests from DANS, LIBER, RLUK and from OCLC EMEA Regional Council. The OCLC Research Library Partnership focuses on projects that have been defined as being of importance to partners. Examples of work in OCLC Research in support of the Partnership include looking at shifts in publication patterns and shifts in research (as highlighted in the Evolving Scholarly Record report), challenges in restructuring and redefining within the library (reflected in work done by my colleague Jim Michalko), and studying the behavior of researchers so we can understand evolving needs (reflected in our work synthesizing user and behavior studies). We also see interest and uptake in new ways of thinking about cataloging data, recasting metadata as identifiers (such as identifiers for people, subjects, or for works). As research changes, as universities change, so too do libraries need to change.

With that introduction to our meeting, I’ll close. Look for a short series of posts summarizing the remainder of the meeting, focusing on the three themes.

[The event webpage contains links to slides, videos, photos, Storify summaries]

About Merrilee Proffitt

Nicole Engard: Bookmarks for November 17, 2014

planet code4lib - Mon, 2014-11-17 20:30

Today I found the following resources and bookmarked them:

  • GraphHopper Route Planner: GraphHopper is an efficient routing library and server based on OpenStreetMap data.
  • OpenConferenceWare: OpenConferenceWare is an open source web application for events and conferences. This customizable, general-purpose platform provides proposals, sessions, schedules, tracks and more.

The post Bookmarks for November 17, 2014 appeared first on What I Learned Today....

District Dispatch: It’s now or (almost) never for real NSA reform; contacting Congress today critical!

planet code4lib - Mon, 2014-11-17 19:25

It was mid-summer when Senator Patrick Leahy (D-VT), the outgoing Chairman of the Senate Judiciary Committee, answered the House of Representatives’ passage of an unacceptably weak version of the USA FREEDOM Act by introducing S. 2685, a strong, bipartisan bill of his own. Well, it’s taken until beyond Veterans Day, strong lobbying by civil liberties groups and tech companies, and a tough stand by Senate Majority Leader Harry Reid, but Leahy’s bill and real National Security Agency (NSA) reform may finally get an up or down vote in the just-opened “lame duck” session of the U.S. Senate. That result is very much up in the air, however, as this article goes to press.

Now is the time for librarians and others on the front lines of fighting for privacy and civil liberties to heed ALA President Courtney Young’s September call to “Advocate. Today.” And we do mean today. Here’s the situation:

Thanks to Majority Leader Reid, Senators will cast a key procedural vote late on Tuesday afternoon that is, in effect, “do or die” for proponents of meaningful NSA reform in the current Congress. If Senators Reid and Leahy, and all of us, can’t muster 60 votes on Tuesday night just to bring S. 2685 to the floor, then the overwhelming odds are—in light of the last election’s results—that another bill as good at reforming the USA PATRIOT Act as Senator Leahy’s won’t have a prayer of passage for many, many years.

Even if reform proponents prevail on Tuesday, however, our best intelligence is that some Senators will offer amendments intended to neuter or at least seriously weaken the civil liberties protections provided by Senator Leahy’s bill. Other Senators will try to strengthen the bill but face a steep uphill battle to succeed.

Soooooo….. now is the time for all good librarians (and everyone else) to come to the aid of Sens. Leahy and Reid, and their country. Acting now is critical . . . and it’s easy. Just click here to go to ALA’s Legislative Action Center. Once there, follow the user-friendly prompts to quickly find and send an e-mail to both of your U.S. Senators (well, okay, their staffs but they’ll get the message loud and clear) and to your Representative in the House. Literally a line or two is all you, and the USA FREEDOM Act, need. Tell ‘em:

  • The NSA’s telephone records “dragnet,” and “gag orders” imposed by the FBI without a judge’s approval, under the USA PATRIOT Act must end;
  • Bring Sen. Leahy’s USA FREEDOM Act to the floor of the Senate now; and
  • Pass it without any amendments that make its civil liberties protections weaker (but expanding them would be just fine) before this Congress ends!

Just as in the last election, in which so many races were decided by razor thin margins, your e-mail “vote” could be the difference between finally reforming the USA PATRIOT Act. . . or not. With the key vote on Tuesday night, there’s no time to lose. As President Young wrote: “Advocate. Today.”

The post It’s now or (almost) never for real NSA reform; contacting Congress today critical! appeared first on District Dispatch.

Patrick Hochstenbach: Feeding the cat of the neighbours

planet code4lib - Mon, 2014-11-17 19:04
Filed under: Doodles Tagged: cartoon, cat, comic, copic, marker, weekend

Open Knowledge Foundation: An unprecedented Public-Commons partnership for the French National Address Database

planet code4lib - Mon, 2014-11-17 17:14

This is a guest post, originally published in French on the Open Knowledge Foundation France blog

Nowadays, being able to place an address on a map is essential. In France, where addresses were still unavailable for reuse, the OpenStreetMap community decided to create its own National Address Database, available as open data. The project rapidly gained attention from the government. This led to the signing last week of an unprecedented Public-Commons partnership between the National Institute of Geographic and Forestry Information (IGN), Group La Poste, the new Chief Data Officer and the OpenStreetMap France community.

In August, before the partnership was signed, we met with Christian Quest, coordinator of the project for OpenStreetMap France. He explained the project and its implications to us.

Here is a summary of the interview, previously published in French on the Open Knowledge Foundation France blog.

Signature of the Public-Commons partnership for the National Address Database. Credit: Etalab, CC-BY

Why did OpenStreetMap (OSM) France decide to create an Open National Address Database?

The idea to create an Open National Address Database came about one year ago after discussions with the Association for Geographic Information in France (AFIGEO). An Address Register was the topic of many reports; however, these reports came and went without any follow-up, and more and more people were asking for address data on OSM.

Address data are indeed extremely useful. They can be used for itinerary calculations or, more generally, to locate any point with an address on a map. They are also essential for emergency services: ambulances, fire-fighters and police forces are very interested in the initiative.

These data are also helpful for the OSM project itself, as they enrich the map and are used to improve the quality of the data. The creation of such a register, with so many entries, required a collaborative effort both to scale up and to be maintained. As such, the OSM-France community naturally took it on. However, there was also a technical opportunity: OSM-France had previously developed a tool to collect information from the French cadastre website, which enabled them to start the register with a significant amount of information.

Was there no National Address Registry project in France already?  

It existed on paper and in slides, but nobody ever saw the beginning of it. It is, nevertheless, a relatively old project, launched in 2002 following the publication of a report on addresses from the CNIG. This report is quite interesting and most of its points are still valid today, but not much has been done since then.

IGN and La Poste were tasked with creating this National Address Register, but their commercial interests (selling data) have so far blocked this 12-year-old project. As a result, French address datasets did exist, but they were created for specific purposes rather than as a reference dataset for French addresses. For instance, La Poste uses three different address databases: for mail, for parcels, and for advertisements.

Technically, how do you collect the data? Do you reuse existing datasets?  

We currently use three main data sources: OSM, which holds a bit more than two million addresses; the address datasets already available as open data (see list here); and, when necessary, the address data collected from the website of the cadastre. We also use FANTOIR data from the DGFIP, which contains a list of all street names and lieux-dits known to the Tax Office. This dataset is also available as open data.

These different sources are gathered in a common database. Then, we process the data to complete entries and remove duplicates, and finally we package the whole thing for export. The aim is to provide harmonised content that brings together information from various sources, without redundancy. The process is run automatically every night, with the exception of manual corrections that are made by OSM contributors. Data are then made available as csv files, shapefiles and in RDF format for semantic reuse. A csv version is published on github to enable everyone to follow the updates. We also produce an overlay map which allows contributors to improve the data more easily. OSM takes priority because it is the only source in which we can collaboratively edit the data. If we need to add missing addresses, or correct them, we use OSM tools.
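As a rough illustration of the gather, de-duplicate and export steps described in this answer, here is a minimal Python sketch. It is not the project's actual code: the `Address` record, the source labels, the matching rule in `normalise_key`, and the relative order of the two non-OSM sources are all assumptions made for the example. Only the idea that OSM wins when sources overlap comes from the interview.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical source labels. Only "OSM takes priority" comes from the text;
# the relative order of the other two sources is an assumption.
SOURCE_PRIORITY = {"osm": 0, "opendata": 1, "cadastre": 2}


@dataclass(frozen=True)
class Address:
    number: str
    street: str
    city_code: str  # assumed identifier for the municipality
    lat: float
    lon: float
    source: str


def normalise_key(addr: Address) -> tuple:
    """Build a comparison key so the same address from two sources matches."""
    street = " ".join(addr.street.lower().split())  # lowercase, collapse spaces
    return (addr.number.strip().lower(), street, addr.city_code)


def merge_sources(addresses):
    """Keep one record per address, preferring the highest-priority source."""
    best = {}
    for addr in addresses:
        key = normalise_key(addr)
        current = best.get(key)
        if current is None or SOURCE_PRIORITY[addr.source] < SOURCE_PRIORITY[current.source]:
            best[key] = addr
    return sorted(best.values(), key=normalise_key)


def export_csv(addresses) -> str:
    """Package the merged records as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["number", "street", "city_code", "lat", "lon", "source"])
    for a in addresses:
        writer.writerow([a.number, a.street, a.city_code, a.lat, a.lon, a.source])
    return buffer.getvalue()


# Made-up sample records: the first two describe the same address.
sample = [
    Address("12", "Rue de la  Paix", "75102", 48.869, 2.331, "cadastre"),
    Address("12", "rue de la paix", "75102", 48.8691, 2.3312, "osm"),
    Address("3", "Place Bellecour", "69382", 45.757, 4.832, "opendata"),
]
print(export_csv(merge_sources(sample)))
```

A real pipeline would need a far more careful matching rule (abbreviations, accents, nearby coordinates), but the shape is the same: one shared store, a key that identifies duplicates, and a fixed rule for which source wins.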

Is your aim to build the reference address dataset for the country?  

This is a tricky question. What is a reference dataset? When you have more and more public services using OSM data, does that mean you are looking at a reference dataset?

According to the definition of the French National Mapping Council (CNIG), a geographic reference must enable every reuser to georeference their own data. This definition does not consider any particular reuse. On the other hand, its aim is to enable as much information as possible to be linked to the geographic reference. For the National Address Database to become a reference dataset, the data must be more exhaustive. Currently, there are data for 15 million reusable addresses (August 2014) out of an estimated total of about 20 million. We have more in our cumulative database, but our export scripts ensure a minimum level of quality and coherency and release data only after the necessary checks have been made. We are also working on the lieux-dits, which are not address data points but which are still used in many rural areas in France.

Beyond the question of the reference dataset, you can also see the work of OSM as complementary to that of public entities. IGN aims for homogeneous, exhaustive coverage in its information. This is due to its mission of ensuring equal treatment of territories. We do not have such a constraint. For OSM, the density of data in a territory depends largely on the density of contributors. This is why we can sometimes offer a superior level of detail, in particular in the main cities, but this is also the reason why we are still missing data for some départements.

Finally, we think we are well prepared for the semantic web, and we already publish our data in RDF format using a W3C ontology close to the European INSPIRE model for address description.

The agreement reached includes a dual license framework. You can reuse the data for free under an ODbL license, or you can opt for a non-share-alike license, but then you have to pay a fee. Is the share-alike clause an obstacle for the private sector?

I don't think so, because the ODbL license does not prevent commercial reuse. It only requires you to credit the source and to share any improvement of the data under the same license. For geographical data that aim to describe our land, this share-alike clause is essential to ensure that the common dataset is up to date. The land changes constantly, so data improvements and updates must be continuous, and the more people contribute, the more efficient this process is.

I see it as a win-win situation compared to the previous one, where multiple address datasets were maintained in closed silos and none of them was of acceptable quality for a key register, as it is difficult to stay up to date on your own.

However, for some companies, share-alike is incompatible with their business model, and a double licensing scheme is a very good solution. Instead of taking part in improving and updating the data, they pay a fee which will be used to improve and update the data.  

And now, what is next for the National Address Database?  

We now need to put in place tools to facilitate contribution and data reuse. Concerning contribution, we want to set up a one-stop-shop application/API, separate from the OSM contribution tools, to enable everyone to report errors, add corrections or upload data. This kind of tool would enable us to easily integrate partners into the project. On the reuse side, we should develop an API for geocoding and address autocompletion, because not everybody will necessarily want to manipulate millions of addresses!

As a last word, OSM is celebrating its tenth anniversary. What does that inspire in you?

First, the success and the power of OpenStreetMap lie in its community, much more than in its data. Our challenge is therefore to maintain and develop this community. This is what enables us to do projects such as the National Address Database, but also to be more reactive than traditional actors when needed, for instance with the current Ebola situation. Centralised and systematic approaches to cartography have reached their limits. If we want better and more up-to-date map data, we will need to adopt a more decentralised way of doing things, with more contributors on the ground. Here’s to Ten More Years of the OpenStreetMap community!


District Dispatch: ALA applauds strong finish to the E-rate proceeding

planet code4lib - Mon, 2014-11-17 16:01

Today, Federal Communications Commission (FCC) Chairman Tom Wheeler held a press call to preview the draft E-rate order that will be circulated at the Commission later this week. The FCC invited Marijke Visser, assistant director of the American Library Association’s (ALA) Program on Networks, to participate in the call. ALA President Courtney Young released a statement in response to the FCC activity, applauding the momentum:

ALA has worked extremely hard on this proceeding to move the broadband bar for libraries so that communities across the nation can more fully benefit from the E’s of Libraries™. That is, as Chairman Wheeler recognizes, libraries provide critical services to our communities across the nation relating to Education, Employment, Entrepreneurship, Engagement and Empowerment.

Of course, the extent to which communities benefit from these services depends on the broadband capacity our libraries have. Unfortunately, for all too many libraries, the bandwidth needed is either not available at all or it is prohibitively expensive.

But what Chairman Wheeler described today will go a long way towards changing the broadband dynamic. With support and guidance from our Senior Counsel, Alan Fishel, ALA stood fast behind our recommendations through many difficult rounds of discussions. After today we have every indication that ALA’s unwavering advocacy and determination over the past year and a half will add up to a series of changes for the E-rate program that will provide desperately needed increased broadband capacity for urban, suburban, and rural libraries across the country.

ALA applauds Chairman Wheeler for his strong leadership throughout the modernization proceeding in identifying a clear path to closing the broadband gap for libraries and schools and ensuring a sustainable E-rate program. The critical increase in permanent funding that the Chairman described during today’s press call will help ensure that libraries can maintain the broadband upgrades we know the vast majority of our libraries are anxious to make. Moreover, the program changes that were referenced today—on top of those the Commission adopted in July—coupled with more funding is without a doubt a win-win for libraries and most importantly for the people in the communities they serve.

Larry Neal, president of the Public Library Association, a division of ALA, and director of the Clinton-Macomb Public Library (MI), also commented on the FCC draft E-rate order.

“The well-connected library opens up literally thousands of opportunities for the people who walk through the doors of their local library,” said Neal. “Libraries are with you from the earliest years with family apps for literacy, through the school years with STEM learning labs, to collaborative workspaces and information resources for small businesses, entrepreneurs, and the next generation of innovators. This should be the story for every library and could be if they had the capacity they needed.”

The post ALA applauds strong finish to the E-rate proceeding appeared first on District Dispatch.

David Rosenthal: Andrew Odlyzko Strikes Again

planet code4lib - Mon, 2014-11-17 16:00
Last year I blogged about Andrew Odlyzko's perceptive analysis of the business of scholarly publishing. Now he's back with an invaluable, must-read analysis of the economics of the communication industry entitled Will smart pricing finally take off?. Below the fold, a taste of the paper and a validation of one of his earlier predictions from the Google Scholar team.

Among his observations are:
  • "by some measures the US spends almost 50% more in telecom services than it does for electricity."
  • Content is not king; "net of what they pay to content providers, US cable networks appear to be getting more revenue out of Internet access and voice services than out of carrying subscription video, and all on a far smaller slice of their transport capacity".
  • True streaming video, with its tight timing constraints, is not a significant part of the traffic. Video is a large part, "but it is almost exclusively transmitted as faster-than-real-time progressive downloads". Doing so allows for buffering to lift the timing constraints.
  • "The main function of data networks is to cater to human impatience." Thus "Overprovisioning is not a bug but a feature, as it is indispensable to provide low transmission latency". "Once you have overengineered your network, it becomes clearer that pricing by volume is not particularly appropriate, as it is the size and availability of the connection that creates most of the value."
  • "it seems safe to estimate worldwide telecom revenues for 2011 as being close to $2 trillion. About half the revenue ... comes from wireless."
  • "with practically all [wireline] costs coming from ... installing the wire to the end user, the marginal costs of carrying extra traffic are negligible. Hence charging according to the volume of traffic cannot easily be justified on the basis of costs."
  • "a modern telecom infrastructure for the US, with fiber to almost every premise, would not cost more than $450 billion, well under one year's annual revenue. But there is no sign of willingness to spend that kind of money ... Hence we can indeed conclude that modern telecom is less about high capital investment and far more a game of territorial control, strategic alliances, services and marketing, than of building a fixed infrastructure."
  • "Yet another puzzle is the claim that building out fiber networks to the home is impossibly expensive. Yet at the cost of $1,500 per household (in excess of the $1,200 estimate ... for the Google project in Kansas City, were it to reach every household), and at a cost of capital of 8% ..., this would cost only $10 per house per month. The problem is that managers and their shareholders expect much higher rates of return than 8% per year. One of the paradoxes is that the same observers who claim that pension funds cannot hope to earn 8% annually are also predicting continuation of much higher corporate profit rates."
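The $10 per month figure in the last quote is easy to check. The sketch below assumes the simplest possible reading, which is mine rather than anything the paper spells out: the household pays only the annual cost of capital on the build, spread over twelve months, with nothing for repaying the principal, maintenance or operations. The function name is made up for the example.

```python
def monthly_capital_cost(cost_per_household: float, annual_rate: float) -> float:
    """Monthly charge needed to cover the cost of capital on a one-off build.

    Assumption: interest-only financing at a simple annual rate, with no
    repayment of principal and no maintenance or operating costs.
    """
    return cost_per_household * annual_rate / 12


# Figures from the quote: $1,500 per household at an 8% cost of capital.
print(round(monthly_capital_cost(1500, 0.08), 2))

# A higher required return raises the monthly figure in proportion;
# the 15% here is an arbitrary illustration, not a number from the paper.
print(round(monthly_capital_cost(1500, 0.15), 2))
```

Under that reading, $1,500 at 8% is $120 per year, or $10 per month, matching the quote. Because the charge scales linearly with the rate, the required rate of return drives the monthly figure as directly as the build cost does.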
Back in 2002, Odlyzko analyzed the usage of online content through time after its publication. Initially, the decay was rapid but after a while usage settled to a low constant level or increased. On this basis he predicted that there would be much wider citation of older articles.
Of the articles that were most frequently downloaded [from First Monday] in 1999, 6 of the top 10 were published in previous years! This supports the thesis that easy online access leads to much wider usage of older materials. [Section 9]

After an initial period, frequency of access does not vary with age of article, and stays pretty constant with time (after discounting for general growth in usage). [Section 10]

Now the Google Scholar team have followed their Rise of the Rest paper, which I blogged about here, with a validation of Odlyzko's prediction. Their new paper On the Shoulders of Giants: The Growing Impact of Older Articles takes another look at the effect that the dramatic changes as scholarly communications migrated to the Web have had on the behavior of authors. The two major changes have been:
  • The greater accessibility of the literature, caused by digitization of back content, born-digital journals and pre-print archives, and relevance ranking by search engines.
  • The great increase in the volume of publication, caused by the greatly reduced cost of on-line publication and the reduction of competition for space.
The paper shows that in most fields, the proportion of citations to articles more than 10 years old has increased significantly (28% to 36% overall) from 1990 to 2013. The same holds true for 15 and 20-year old articles. The rate of increase is accelerating. There are some outliers: Chemical and Materials Science, and Engineering excluding Computer Science, both show little change. Computer Science, on the other hand, shows a significant increase, but this is bi-modal: 5/18 of the CS subject categories show less than 30% increase whereas 11/18 show 50% or more.

Islandora: Meet Your Developer: Daniel Lamb

planet code4lib - Mon, 2014-11-17 14:30

It's been a while since we last Met a Developer, but we're getting back into it with recent Islandora Camp CO instructor and discoverygarden, Inc Team Lead Daniel Lamb. Most of Danny's contributions to Islandora's code have come to us by way of dgi's commitment to open source, but he did recently take on the Herculean task of coming up with the perfect one-line documentation to sum up the behavior of a tetchy delete button. Here's Danny in his own words:

Please tell us a little about yourself. What do you do when you’re not at work?

When I'm not at work, I'm spending time with my wonderful family. I have a beautiful wife and an amazing two year old son, and they're what keeps me going when times are tough. I love cooking, and am very passionate about what I eat and how I prepare it. I also regularly exercise, and really enjoy lifting weights. I've got a great life going and I want to keep it for as long as possible!

Academically, my background is in Mathematics and Physics, not Computer Science. But close enough, right? I've held jobs processing data for astronomers, crunching numbers as an actuary, and even making crappy Facebook games before landing at discoverygarden.

How long have you been working with Islandora? How did you get started?

I've been working with Islandora for about two years. I started because of my job with discoverygarden, which was kind enough to take me in after being abused by the video game industry. The first thing I developed for Islandora was the testing code, which is how I got to learn the stack.

Sum up your area of expertise in three words:

Asynchronous distributed processing

What are you working on right now?

I've got my finger in a lot of pies right now. I'm managing my first project for discoverygarden, as well as finishing up the code for one of the longest running projects in the company's history. It's for an enterprise client, and I've had to make a lot of innovations that I hope can eventually find their way back into the core software. I'm also working on a statistical model to help management with scoping and allocation. On top of all that, I'm researching frameworks and technologies for integrating with Fedora 4, which I hope to play a role in when the time finally comes.

What contribution to Islandora are you most proud of?

Most of the awesome stuff I've done has been for our enterprise client, so I can't talk about it. Well, I could, but then I'd have to kill you :P I guess as far as impact on the software in general, I'm most proud of the lowly IslandoraWebTestCase, which is working in every module out there to help keep our development head as stable as possible.

What new feature or improvement would you most like to see?

Asynchronous distributed processing :D When we make the move to Fedora 4 and Drupal 8, this concept should be at the core of the software. It’s what will allow us to split the stack apart on multiple machines to keep things running smoothly when we have to scale up and out.

What’s the one tool/software/resource you cannot live without?

ZOMG I could never live without Vim! It's the greatest text editor ever! Put me in Eclipse or Netbeans and I'll litter :w's all over the place and hit escape a bunch of times unnecessarily. Vim commands have been burned into my lizard brain.

If you could leave the community with one message from reading this interview, what would it be?

You CAN contribute. I know the learning curve is steep, but you don't need a background in Computer Science to contribute. Pick up something small, and work with it until you feel comfortable. And if you're afraid to try your hand as a developer, there's always something to do *cough documentation cough*.

FOSS4Lib Recent Releases: VuFind - 2.3.1

planet code4lib - Mon, 2014-11-17 14:29
Package: VuFind
Release Date: Monday, November 17, 2014

Last updated November 17, 2014. Created by Demian Katz on November 17, 2014.

Bug fix release.

D-Lib: New Opportunities, Methods and Tools for Mining Scientific Publications

planet code4lib - Mon, 2014-11-17 12:43
Guest Editorial by Petr Knoth, Drahomira Herrmannova, Lucas Anastasiou and Zdenek Zdrahal, Knowledge Media Institute, The Open University, UK; Kris Jack, Mendeley, Ltd., UK; Nuno Freire, The European Library, The Netherlands and Stelios Piperdis, Athena Research Center, Greece

D-Lib: Progress

planet code4lib - Mon, 2014-11-17 12:43
Editorial by Laurence Lannom, CNRI

