Fan-created works are broadly available to people at the click of a link. Fan fiction hasn’t been the subject of any litigation, but it plays an increasing role in literacy as its creation and consumption have skyrocketed. Practice on the ground can matter as much as cases in the courts, and the explosion of noncommercial creativity is a big part of the fair use ecosystem. This presentation will survey the many ways creativity has shaped how courts interpret fair use, from Google Books, to putting a mayor’s face on a T-shirt, to copying a competitor’s ad for a competing ad. Legal scholar Rebecca Tushnet, counsel to the Organization for Transformative Works, will enlighten us. Should be a blast!
There is no need to pre-register for this free webinar! Just show up on November 5, at 2pm (Eastern)/11 am (Pacific) and click here.
Rebecca Tushnet clerked for Chief Judge Edward R. Becker of the Third Circuit Court of Appeals in Philadelphia and Associate Justice David H. Souter of the United States Supreme Court and spent two years as an associate at Debevoise & Plimpton in Washington, DC, specializing in intellectual property. After two years at the NYU School of Law, she moved to Georgetown, where she teaches intellectual property, advertising law, and First Amendment law.
Her work currently focuses on the relationship between the First Amendment and false advertising law. She has advised and represented several fan fiction websites in disputes with copyright and trademark owners. She serves as a member of the legal team of the Organization for Transformative Works, a nonprofit dedicated to supporting and promoting fanworks, and is also an expert on the law of engagement rings.
Note that the webinar is limited to 100 seats, so watch with colleagues if possible. An archived copy will be available after the webinar.
This week, the U.S. Copyright Office issued exemptions under the Section 1201 rulemaking, something that I have whined about before. This time around, there is a growing number of the disenchanted, from all walks of life: farmers, video game enthusiasts, vidders, and software security engineers and researchers. The Internet of Things has made circumvention of technological protection measures (aka DRM) a more common concern because software is embedded in tractors, refrigerators, pacemakers, and even some litter boxes. More people who lawfully purchase a product may have to deal with the convoluted, ever-changing, and uncertain process that the 1201 rulemaking promises to bring.
Maybe you’ve seen the news about the auto industry and the medical device manufacturers. They are now part of the 1201 cabal because you have software in your Volvo and your medical implant. And welcome to the regulatory agencies! Pull up a chair! The Department of Transportation, the Environmental Protection Agency, and other government departments were asked to comment on emission standards and regulations that govern the use of the software inside your all-terrain vehicle. Folks who want a smaller government would have a field day with this stuff.
Will this absurdity continue? Senators Leahy and Grassley, the Chairman and Ranking Member of the Committee on the Judiciary, sent a letter to the Copyright Office asking it to look into the impact of copyright law on software-enabled devices. Pity the Copyright Office. This is getting so complicated that it’ll take months, no, years, for the Copyright Office to consider this very big issue. It points again to the disconnect between copyright law and the real world. The 1201 rulemaking was initially meant to limit unlawful access to motion pictures, music, and other copyrighted content—easy stuff like that. Now the 1201 rulemaking has grown well out of its hefty pants and just might affect everyone, yes, even you. Where does it end? Where does it even begin?
Jackie, Dennis, and I wondered if there might be something valuable the Partnership could work on to further web archiving efforts without duplicating others’ initiatives, so we had discussions with partner staff, attended meetings, read up on what others are doing, and then presented some options to the Partnership in the form of a survey.
There were 76 responses from 60 institutions in 6 countries. We asked respondents to indicate which of five topics they felt were important to advance and which they’d be interested in working on.
The two most important topics – and the two that most people would be willing to work on – are: [It’s so great when those two things align!]
- Metadata Guidelines, described as “Web archives often are hidden in silos, making access difficult. We could work on developing metadata guidelines to bridge the archival and bibliographic traditions so that records for live and/or harvested websites can appear in local catalogs, as well as in WorldCat and other aggregations.”
- Use of web archives, described as “Not enough is known about use of harvested websites. We might think about how to study users and potential users of web archives to find out what they want, what they want to do with them, and how they find and navigate what is available.”
We’re going to begin by launching a project related to metadata guidelines.
[If you work at a Partner institution and would like to be added to the web archiving listserv, send an email to Dennis Massie (firstname.lastname@example.org). If you’d like to work on metadata guidelines, send a message to Jackie Dooley (email@example.com).]
What I want to talk about in this blog post, though, is the open-ended responses to a question asking for additional thoughts. There were 50 very thoughtful responses that I will summarize here.
Several respondents had suggestions about metadata, including coming up with a way to describe the harvest approach to researchers (e.g., so they’ll know what was selected, how deep the harvest went, and what was not captured). There were suggestions to explore use of linked data models and standardized vocabularies. And there was urging to investigate integration of web archiving with existing tools, such as ArchivesSpace and Archivematica.
There were additional thoughts about studying use of web archives. It was pointed out that there are two very different types of use: one focuses on trends, big data, digital humanities, and social network analysis, while the other is more akin to “traditional” research – essentially looking at archived websites as records of what happened or what was published at a point in time. Several people urged us to consider whether users would actually use library systems to discover web archives. Since so many rely on the Internet Archive’s web archives, it was suggested that we study how they are queried and used.
Several additional topics were brought up:
Many stated the need for tools. There was concern about there being essentially only one tool for web harvesting (Archive-It). Some wished for tools to provide support throughout the workflow (to support automated quality analysis for capture, to make tasks such as description and quality assurance more efficient, and to automate the tasks associated with providing access to web archives). The other big wish was for tools that go beyond the harvest of static HTML web pages: tools to collect applications, embedded media, video, social media, non-public-facing content on websites, streaming media… And some expressed the need for tools that would capture structured data before it is transformed to HTML. There was a wish for a browser plug-in that would inform users as they look at a site whether there is an archived version.
Many advocated for advocacy, citing the need to communicate with website owners about the challenges of capture and ways they can help; addressing the issue of consent and deeds of gift in harvesting others’ web sites; working with the community (owners, users, archivists) on property rights, fair use, and other policy issues; promoting persistent URLs and evolving web archiving standards, such as ResourceSync; and working with the Internet Archive to see to what extent it is meeting library needs.
Some suggestions can be grouped under the topic of administrative needs: we should better understand how to make web archiving sustainable; we should share position descriptions; we should promote understanding about the types of investment that would be meaningful; we should explore trade-offs and relationships between large scale archiving efforts and targeted ones; we should improve metrics and assessment to inform financial/staffing allocations; and we should help to build the business case and strategy for a future state for web archiving.
Selection of what is to be archived is a big challenge. We can help to set evaluation criteria, help with appraisal of sites, and consider how the content of archived web sites affects appraisal decisions for both paper and electronic records in traditional archives. Respondents wanted help with deciding how deep the capture should go. Are there efficient approaches to continued archiving of web sites with added portions or connected to web sites with different directory addresses? What are the options for archiving huge sites? Another challenge was how to proceed when data or document limits are encountered. And we should help weigh selectivity against a broad swath approach:
- Is it worth the trouble of being selective?
- Is vacuuming it all up ultimately just as helpful?
- Should each institution take special care for a small subset?
- When is a scattershot approach acceptable?
- How can we make scoping rules easier to manage?
Not surprisingly, collaboration was another theme. Collaboration with web archivists across institutions, disciplines, borders is necessary to:
- develop collecting profiles, so we know what other institutions are collecting and we won’t duplicate work being done
- share workflows, QA procedures, metadata guidelines, and useful tools
- understand the existing roles of national services and others
- come up with a master harvest with different institutions archiving based on their particular collection needs
- encourage and improve use of web archives
- identify gaps
- and coordinate our work with other organizations and efforts.
We should also embed web archiving services with a community of practice to:
- get subject specialists/scholars involved in selecting materials for web archiving
- get researchers to use web archives in their research
- get faculty to use web archives in their teaching
So we got what we wanted (identification of two important projects that people are willing to work on) and so much more. Some of these ideas are ripe for others in the community to take on—and some are no doubt happening elsewhere. Some may need to percolate a bit more. And others may be ideas for future activities of the OCLC Research Library Partnership.
We are always happy for more input. If you’re working on something interesting or have suggestions, let us know!
About Ricky Erway
Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born-digital archives to research data curation.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week:
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
I had fun today. A colleague in Computer Science has been giving his C++ students an assignment to track down an article that is only available in print in the library. When we chatted about it earlier this year, I suggested that perhaps he could bring me in as a speaker to introduce the students to their liaison librarian. It was also a chance to get my foot further in the door with the faculty in the program.
But when I started putting together the supporting materials, I realized that the class was more than halfway through the year and that a standard instructional session might be a little low-energy for them. I wanted to do something memorable. I had just read The Martian by Andy Weir over the weekend, and our campus had hosted a visit from Chris Hadfield a few weeks before, so I thought that delivering a narrative in the style of The Martian might work.
Without further ado, I give you:
The Librarian: an intro for COSC 2947 (C++)
Open the Speaker Notes to follow the narrative!
The students were chuckling throughout the presentation, so I think I achieved my goals of increasing the energy level, presenting the material as something memorable, and introducing myself as someone approachable. Or at least giving the impression that I try to have a sense of humour.
As an aside, I'm kicking myself for using Google Slides instead of reveal.js. It's so much easier working with HTML + images instead of a browser-driven proprietary Flash-using-when-it-can mess. It is what it is, however.
The mission of Open Knowledge International is to open up all essential public interest information and see it utilized to create insight that drives change. To this end we work to create a global movement for open knowledge, supporting a network of leaders and local groups around the world; we facilitate coordination and knowledge sharing within the movement; we build collaboration with other change-making organisations both within our space and outside; and, finally, we prototype and provide a home for pioneering products.
A decade after its foundation, Open Knowledge International is ready for its next phase of development. We started as an organisation that led the quest for the opening up of existing data sets – and in today’s world most of the big data portals run on CKAN, an open source software product developed first by us.
Today, it is not only about opening up of data; it is making sure that this data is usable, useful and – most importantly – used, to improve people’s lives. Our current projects (OpenSpending, OpenTrials, School of Data, and many more) all aim towards giving people access to data, the knowledge to understand it, and the power to use it in our everyday lives.
Portfolios at Open Knowledge International
At Open Knowledge International, we are creating a new organisational structure that will help us to grow into our next phase of development. This will better enable us to support new and existing open knowledge initiatives to help people to improve their lives and the societies that they live in.
We are excited to be hiring for the roles of three Portfolio Managers, who will each lead a portfolio of products in a different stage of development:
- In the portfolio Planting the Seeds we focus on developing prototypes and early-stage products. When a new approach to the use of data can be tested, or the application of open data in a new field becomes more relevant, this is where we trial whether our ideas are sound and are able to generate wider traction. This portfolio is closely connected to our cutting-edge research work;
- The Growing the Trees portfolio focuses on those products that have proven to be viable and deserve broader investment to really affect change through innovative applications of open data. Examples might include initiatives such as our OpenTrials project developed with Ben Goldacre. We build these initiatives into platforms that shape the world. All of our products here are collaborative in nature, and we seek to develop partnerships with other organisations and stakeholders who share our interest in using data to improve the world;
- When products have sufficient traction from other organisations and communities they move onto the third portfolio, Harvesting the Fruits. In this portfolio we focus on a mature governance structure of the products that involve high-level buy-in from other key organisations. We seek to sustain the products together with those stakeholders and the focus is on building lasting partnerships, while ensuring that new innovative ideas can be generated from those mature products.
For each of these portfolios we are looking for an enthusiastic and passionate
Portfolio Manager
(flexible location, full time)
As a Portfolio Manager, we expect you to lead the strategic development of the portfolio, as well as monitoring and reporting progress on the portfolio. You will function as a product manager for existing and new products. You will develop and manage the budget of your portfolio, and will be responsible for staffing all projects together with project managers. You will collaborate closely with the other Portfolio Managers and the Portfolio Director, and support the CEO in fundraising. You understand how open licenses in software, content and data enable collaborative innovation and have demonstrable experience in these.
While these qualifications are similar for all the Portfolio Managers, we have defined a specific profile for each Portfolio Manager that matches the stage of development of the products within each portfolio. Please read through our role descriptions below and consider whether you are the kind of person who would thrive in an innovative, very dynamic environment; whether you excel more when executing on a few key initiatives, and really want to build those into highly successful products; or whether you are a better fit for building lasting partnerships and coalitions around products that have demonstrated their value to the world.
Portfolio Manager Planting the Seeds
- You are excited by the opportunities that new technology and the availability of data present, to help citizens and civil society organisations to shape the world around us – which could include, for example, social, democratic and environmental impacts
- You thrive on developing new concepts and ideas, and know what to do to develop those into early-stage products
- You know how to evaluate early-stage products over a period of 6-12 months, and how to develop clear metrics of success
- You understand how innovative projects are successfully executed and are not afraid to make tough decisions to cease activity
- You have practical and hands-on experience of working in an innovative tech-related environment, for example in a ‘lab’ or incubator
- You are able to handle multiple projects at the same time and have demonstrable skills in leading multiple teams
Portfolio Manager Growing the Trees
- You relish the opportunity to develop and oversee a portfolio of products that are built on data and can change the world
- You understand the opportunity that data, through online technology, offers to impact our lives
- You know how to build a sustainable open source software product, including how to build a sustainable network of contributors and stakeholders who take an active role in developing the product
- You can move a product out of prototype and roll it into multiple markets at the same time. For this, you use proven marketing techniques and you have the ability to tweak products according to customer and market needs
- You know how to work with Theories of Change and how to apply them to products’ development cycles to achieve the maximum value
- You know how to build partnerships around products and develop them into mature collaborative initiatives
Portfolio Manager Harvesting the Fruits
- You are a coalition builder who excels in fostering long-lasting partnerships with demonstrable value and impact
- You naturally develop products and partnerships into sustainable networks, and would know how to represent and coordinate with Open Knowledge International as one partner amongst others
- You support networks and partners in sharing responsibility for products, moving from full ownership by Open Knowledge International to collaboration and collective ownership
- You consider future sustainability for products, developing – together with networks – a roadmap for future roll-outs
- You are invested in partnerships and know how to work with diverse communities, including with volunteers
- You are enthusiastic about open knowledge, and would be able to represent Open Knowledge International in diverse networks and projects
Personally, you have a demonstrated commitment to working collaboratively, with respect and a focus on results over credit.
You are comfortable working with people from different cultural, social and ethnic backgrounds. You are happy to share your knowledge with others, and you find working in transparent and highly visible environments interesting and fun.
Rather than your formal education, we believe that your track record over the last 5 years speaks most clearly of your abilities. You communicate in English like a native speaker.
We demand a lot, but we offer a great opportunity as well: together with the other two Portfolio Managers and the Portfolio Director, this Portfolio Manager leads the strategic focus of Open Knowledge International. You will be at the heart of the development of projects and products, able to make a huge impact and shape our future.
We also encourage people who are looking to re-enter the workplace to apply, and are willing to adjust working hours to suit.
You should be based somewhere between time zones UTC-1 and UTC+3. You can work from home, with flexibility offered and required. You will be compensated with a market salary, in line with the parameters of a non-profit organisation.
Interested? Then send us a motivational letter and a one page CV via https://okfn.org/about/jobs/. Please indicate your current country of residence, as well as your salary expectations (in GBP) and your earliest availability.
Early application is encouraged, as we are looking to fill the positions as soon as possible. These vacancies will close when we find a suitable candidate.
If you have any questions, please direct them to Naomi Lillie, via naomi.lillie [at] okfn.org.
Galen Charlton: Books and articles thud so nicely: a response to a lazy post about gender in library technology
The sort of blog post that jumbles together a few almost randomly chosen bits on a topic, caps them off with an inflammatory title, then ends with “let’s discuss!” has always struck me as one of the lazier options in the blogger’s toolbox. Sure, if the blog has an established community, gently tweaking the noses of the commentariat may provide some weekend fun and a breather for the blogger. If the blog doesn’t have such a community, however, a post that invites random commenters to tussle works better if the blogger takes the effort to put together a coherent argument for folks to respond to. Otherwise, the assertion-jumble approach can leave the post so bad that it’s not even wrong.
Case in point: Jorge Perez’s post on the LITA blog yesterday, Is Technology Bringing in More Skillful Male Librarians?
It’s a short read, but here’s a representative quote:
[…] I was appalled to read that the few male librarians in our profession are negatively stereotyped into being unable to handle a real career and the male dominated technology field infers that more skillful males will join the profession in the future.
Are we supposed to weep for the plight of the male librarian, particularly the one in library technology? On reflection, I think I’ll just follow the lead of the scrivener Bartleby and move on. I do worry about many things in library technology: how money spent on library software tends to be badly allocated; how few libraries (especially public ones) are able to hire technology staff in the first place; how technology projects all too often get oversold; the state of relations between library technologists and other sorts of library workers; and yes, a collective lack of self-confidence that library technology is worth doing as a distinct branch of library work (as opposed to giving the game up and leaving it to our commercial, Google-ish “betters”).
I am also worried about gender balance (and balance on all axes) among those who work in library technology — but the last thing I worry about in that respect is the ability of men (particularly men who look like me) to secure employment and promotions building software for libraries. For example, consider Melissa Lamont’s article in 2009, Gender, Technology, and Libraries. With men accounting for about 65% of heads of library systems department positions and about 65% of authorship in various library technology journals… in a profession that is predominantly composed of women… no, I’m not worried that I’m a member of an underrepresented class. Exactly the opposite. And to call out the particular pasture of library tech I mostly play in: the contributor base of most large library open source software projects, Koha and Evergreen included, continues to skew heavily male.
I do think that library technology does better at gender balance than Silicon Valley as a whole.
That previous statement is, of course, damning with faint praise (although I suppose there could be some small hope that efforts in library technology to do better might spill over into IT as whole).
Back to Perez’s post. Some other things that raise my eyebrow: an infographic of a study of stereotypes of male librarians from 23 years ago. Still relevant? An infographic without a complete legend (leaving me free to conclude that 79.5% of folks in ALA-accredited library schools wear red socks ALL THE TIME). And, to top it off, a sentence that all too easily could be read as a homophobic joke — or perhaps as a self-deprecating joke where the deprecation comes from imputed effemination, which is no improvement. Playing around with stereotypes can be useful, but it requires effort to do well, which this post lacks.
Of course, by this point I’ve written over 500 words regarding Perez’s post, so I suppose the “let’s discuss!” prompt worked on me. I do think that LITA should be tackling difficult topics, but… I am disappointed.
LITA, you can do better. (And as a LITA member, perhaps I should put it this way: we can do better.)
I promised stuff to make satisfying thuds with. Sadly, what with the epublishing revolution, most of the thuds will be virtual, but we shall persevere nonetheless: there are plenty of people around with smart things to say about gender in library technology. Here are some links:
- Barbara I. Dewey, Transforming Knowledge Creation: An Action Framework for Library Technology Diversity. Code4Lib Journal, Issue 28, 2015-04-15.
- Roma Harris, Gender and Technology Relations in Librarianship (doi:10.2307/40324095)
- Roma Harris and Kim Lutton, Role representation in advertisements for library technology: Who is representing what and for whom?. From the Proceedings of the Annual Conference of CAIS/Actes du congrès annuel de l’ACSI, 1997.
- Melissa Lamont, Gender, Technology, and Libraries (doi:10.6017/ital.v28i3.3221). Yes, I already linked to this, but I want to also point out Lamont’s discussion of organizational culture.
- Lisa Rabey, Why (white) men should not (mostly) write about gender disparity in technology
- Bess Sadler and Chris Bourg, Feminism and the Future of Library Discovery. Code4Lib Journal, Issue 28, 2015-04-15.
- Cecily Walker, Moving the #Libtechgender Conversation Forward
- Andromeda Yelton, my first hackathon; or, gender, status, code, and sitting at the table.
- Becky Yoose, Your code does not exist in a vacuum. Presentation at the 2015 Code4Lib conference
I hope LITA will reach out to some of them.
- Editorial Response to “Is Technology Bringing in More Skillful Male Librarians?” by Brianna Marshall. I look forward to Mr. Perez’s forthcoming follow-up.
- Heidi Blackburn’s 2015 PhD dissertation, Factors That Influence Male Millennials to Become Professional Librarians.
- Swapped in a more direct link to Lisa Rabey’s post.
Smartphones didn’t capitalize on that whole “library without walls” promise like we expected. Instead, the library as a place became more tangible and responsive than ever. Our user-centric push to be at our patrons’ point of need — especially through mobile-first responsive web design — reinforces precisely what makes libraries hyperlocal. Our potential to interact through geolocation, beacons, cameras, and context is a pretty strange and newfangled opportunity — especially since an internet-ready device is predictably on almost every person — to iron out the kinks that complicate how patrons interface with the library.
I have a favorite hypothetical when I talk about designing around the internet of things. Pretend you happen to wander into the range of a beacon and you get this text from the library:
Hey, it’s pretty chilly this morning. We just made some coffee. If you have time, swing by and have a cup. Your Library, with love.
Cool, huh? This isn’t vaporware. Let’s walk through this. First, a user comes in range of a beacon or GPS coordinates, which triggers a notification. Second, an opening sentence based on the time of day and the weather is pieced together: “Hey, it’s pretty chilly this morning.” Then, third, the app makes a time- and weather-relevant suggestion to invite you into the building.
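Pieced together, that logic is small enough to fit in a few lines. This is purely a hypothetical sketch: the beacon trigger, the `greeting_for` helper, and the temperature threshold are all invented for illustration, not taken from any real library app.

```python
from datetime import datetime

def greeting_for(now, temp_f):
    """Piece together an opening line from time of day and weather (hypothetical logic)."""
    part = "morning" if now.hour < 12 else "afternoon" if now.hour < 18 else "evening"
    if temp_f < 50:
        return f"Hey, it's pretty chilly this {part}."
    return f"Hey, what a nice {part}."

def beacon_notification(now, temp_f):
    # Called when the user enters beacon or GPS range (detection itself not shown).
    opener = greeting_for(now, temp_f)
    return (f"{opener} We just made some coffee. "
            "If you have time, swing by and have a cup. Your Library, with love.")
```

A real deployment would hang this off a geofence or beacon-region callback and pull the temperature from a weather API; the point is just how little logic stands between “user nearby” and a warm, contextual message.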
This alone is pretty nifty. Were this suggestion actionable and you swiped to confirm and tapped through an order, then the library just compelled you on a journey with the promise that each touchpoint is pleasant. At this point, it’s on the library to deliver.
Even then, wouldn’t it be easier if all you had to do was talk back?
Good idea. I’ll be there in ten. I take my coffee black.
A Voice User Interface
I read Design for Voice Interfaces by Laura Klein and I am having fun thinking about the library applications for voice user interfaces (VUI). Voice recognition has been around, but it hasn’t quite lived up (yet) to the dream fanned by sci-fi. We’re getting there, though.
Some products in 1999 had around a 65 percent recognition rate, whereas today’s rates are closer to 92 percent. p. 27
Good speech recognition poses unique design challenges. Unlike visual interfaces, where the way we interact has precedent in physical objects (we push buttons, we turn pages and move things around in our hands, sometimes there’s even haptic feedback), there is no analog for VUI other than human conversation.
Skeuomorphic material design uses depth and animation to feel like shuffling paper around on a desk. For Siri and Cortana to succeed, they need to mimic consciousnesses that can parse complex meaning in human speech. The more artificial they seem by clearly misunderstanding, the poorer the user experience and the greater the failure.
We need to practice contextually aware design. If I say, ‘Make it warmer’ in my house, something should know if I mean the toast or the temperature. So, successful voice interfaces must understand context. Thomas Hebner, quoted on p. 6
So, yeah, context is key. In “Does the best library web design eliminate choice?” I wrote about how services that anticipate your needs are built on three (or so) pillars: context, behavior, and personal data.
However, it can only be a user experience for one when personal data can expose real preferences — Michael loves science fiction, westerns, prefers Overdrive audiobooks to other vendors and formats — to automatically skip the hunt-and-peck and tailor a unique service.
Unlike forms and other visual interfaces, the input for voice can be loaded with meaning that must buffer through a ton of logic. Whether your VUI has to open an app, search the web, text a friend, place a hold, or tell you about upcoming events (but not all 23 – that’d be too much) is determined by what it knows about you. That’s why Siri is a “personal assistant”: she knows your agenda and watches you sleep.
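That routing step can be sketched in a few lines. Everything here is hypothetical (the `user` profile, the intent names, the keyword matching), standing in for the much heavier natural-language parsing a real VUI would do:

```python
def route(utterance, user):
    """Route an utterance to an intent, shaped by what we know about the user (toy keyword matching)."""
    text = utterance.lower()
    if "hold" in text:
        # Place the hold in the user's preferred format instead of asking.
        return ("place_hold", user["preferred_format"])
    if "events" in text:
        # Don't list all 23 upcoming events; trim to the user's top interests.
        return ("list_events", user["interests"][:3])
    # No recognized library intent: fall back to a general web search.
    return ("web_search", text)

# A hypothetical stored profile, like the "Michael loves science fiction" example.
user = {
    "preferred_format": "Overdrive audiobook",
    "interests": ["science fiction", "westerns", "local history", "cooking"],
}
```

The same words from a different profile would route differently, which is the whole point: the personal data does the disambiguation, not the user.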
An uncanny valley of personalization can be overcome if the voice interface is useful enough.
For libraries, that killer app might be real close to home: our bread and butter — complex search.
“Ok, Library: I need a recent full-text article about design patterns that’s been peer reviewed.”
Footnotes
- I used Aaron Schmidt’s redesigned library-guy graphic as the logo in the “Ok Library” mockup from his post, “National Library Symbol History & Implications.”
- The card in the same mockup, which reads “Stories of Strength,” is from a screenshot from the Alvin Sherman Library’s Lists site, where librarians make lists (which is what librarians like to do).
I write a weekly newsletter called the Web for Libraries, chock-full of data-informed commentary about user experience design, including the bleeding-edge trends and web news I think user-oriented thinkers should know. Take a minute to sign up!
We’re now in the final stretch for the 2015 Global Open Data Index, and will be publishing the results in the very near future! As a community driven measurement tool, this year we have incorporated feedback we’ve received over the past several years to make the Index more useful as an instrument for civil society — particularly around what data should be measured and what attributes are important for each dataset.
As a crowdsourced survey, we have taken extra steps to ensure the measurement instrument is more reliable. We are aware that there is no perfect measurement that can be applied globally, but we aim to be as accurate as we possibly can. We have documented our processes year to year, and inevitably not everything has been perfect, but by engaging in this process of experimentation, trial and error, we hope the Global Open Data Index will continue to evolve as an innovative, grassroots, global tool for civil society to measure the state of open data.
The journey this year was long, but productive. Here is a recap of the steps we have taken in the long road to publishing the 2015 Index:
- Global consultation on new datasets — We sought your opinions and ideas for new themes that are important for civil society and should be added to the Index. As a result, we added 4 new datasets to this year’s Index: Government Procurement Tenders, Water Quality, Land Ownership and Weather Forecast.
- Consultation on methodology —The Index team refined the definitions of the datasets based on feedback from open data advocates, researchers and communities from around the world. We have tightened the definitions of the datasets to allow for greater accuracy and comparability.
- Submissions phase – The crowdsourced phase where submissions are made to the Index with the help of the great index community and our new local index coordinators.
- Quality Assurance of the data — We added a preliminary stage of QA this year to conduct a systematic review of the license and machine readable questions — the two attributes that have given past submitters the most trouble.
- Thematic review with experts — This year, instead of having complete submissions reviewed by country or regional reviewers, we deployed expert thematic reviewers. Thematic reviewers assessed the submissions for a given dataset across all 120 places included in this year’s Index, making sure that we compared the right datasets to one another and that submissions complied with the new definition of each dataset.
Now, we are in the final phase of assessing the submissions for this year’s Index. After conducting a lengthy review phase, we seek your help to understand if we have evaluated the submissions correctly before finalizing the Index and publishing this year’s scores. In the next two weeks, from today until November 6, we will open the Index again to your comments. We encourage everyone to comment on the Index, civil society and governments alike.
Before you comment on a submission, note that we allowed thematic reviewers to apply their own logic to their review based on their expertise and assessment of the entire body of submissions across all places. This logic was grounded in the published definitions for each dataset, but allowed for some subjective flexibility in order to maintain a consistent review and account for the challenges faced by submitters, particularly in the cases of the datasets that were added this year and those with substantial changes to their definitions. Please read this section carefully before commenting on submissions. Note two things:
- After careful consideration, we’ve omitted two datasets from the final scoring of the 2015 Index: public transport and health performance. We omitted public transport because 45 countries do not have a national-level public transport system, which accounts for 37% of the Index sample; this does not allow an equal comparison between places. We omitted health performance data because we asked for two different datasets and could record only one faithfully in the Index system, which made it almost impossible to score these entries as a unified submission. In both cases we will review the data, make it available for further investigation, and see how we can adjust and incorporate these important datasets into future indexes.
- In some places, our reviewers could not complete their evaluation and needed more information. We would appreciate your help providing more information on any of these submissions. Any entry that displays a number ‘1’ in an orange circle needs further attention.
Here is a summary of the reviewers’ approaches to evaluating submissions for each dataset included in the 2015 Index:
Reviewer: Mor Rubinstein
The stated description of the Government Budget dataset is as follows:
National government budget at a high level. This category looks at budgets, or the planned government expenditure for the upcoming year, and not the actual expenditure. To satisfy this category, the following minimum criteria must be met:
- Planned budget divided by government department and sub-department
- Updated once a year
- The budget should include descriptions of the different budget sections
Submissions that included data for both department AND sub-department/program were accepted. Submissions that included only department-level data were not accepted. Additionally, budget speeches that did not include detailed data about the estimated expenditures for the coming year were not accepted. Only datasets from an official source (e.g. the Ministry of Finance or equivalent agency) were accepted.
Reviewer: Tryggvi Björgvinsson
The stated description of the Government Spending dataset is as follows:
Records of actual (past) national government spending at a detailed transactional level; a database of contracts awarded or similar will not be considered sufficient. This data category refers to detailed ongoing data on actual expenditure. Data submitted in this category should meet the following minimum criteria:
- Individual records of transactions
- Date of each transaction
- Government office which made the transaction
- Name of vendor
- Amount of the transaction
- Updated on a monthly basis
Submissions that included aggregate data or simply procurement contracts (results of calls for tenders) were not accepted. In cases where aggregate data or procurement data was submitted or the submitter claimed that the data did not exist, an attempt was made to locate transactional data with a simple Google search and/or via IBP’s Open Budget Survey. If data was available for the previous year (or applicable recent budget cycle) the submission was adjusted accordingly and accepted.
Reviewer: Kamil Gregor
The stated description of the Election Results dataset is as follows:
This data category requires results by constituency / district for all major national electoral contests. To satisfy this category, the following minimum criteria must be met:
- Results for all major electoral contests
- Number of registered votes
- Number of invalid votes
- Number of spoiled ballots
- All data reported at the level of the polling station
Submissions that did not show data at the polling station level were omitted and marked as ‘Data does not exist’, even if votes are not counted at polling station level as a matter of policy. The reason for this is that the polling station level is the most granular level that allows election fraud to be monitored.
Reviewer: Rebecca Sentance
The stated description of the Company Register dataset is as follows:
List of registered (limited liability) companies. Submissions in this data category do not need to include detailed financial data such as balance sheets. To satisfy this category, the following minimum criteria must be met:
- Name of company
- Unique identifier of the company
- Company address
- Updated at least once a month
Data was marked as ‘unsure if it exists’ when the submitted dataset did not contain an address or a company ID. If the submission referenced a relevant government website that does not indicate the data exists, or if there is no evidence even of which government body would hold the data, the submission was changed to ‘data does not exist’. If it is clear that a governmental body collects company data, but there is no way of knowing what it consists of, where it is held, or how to access it, and no indication that it would fulfil our requirements, the submission was also marked as ‘data does not exist’.
Based on the definition, it was decided that a company register that is freely searchable by the public but requires entering a search term (a search application) did not count as free or publicly accessible. However, a company register that can be browsed page by page does present all of the data and is the type of dataset required for acceptance.
Reviewer: Zach Christensen
The stated description of the National Statistics dataset is as follows:
Key national statistics such as demographic and economic indicators (GDP, unemployment, population, etc.). To satisfy this category, the following minimum criteria must be met:
- GDP for the whole country, updated at least quarterly
- Unemployment statistics, updated at least monthly
- Population, updated at least once a year
For each submission, the reviewer checked for national accounts, unemployment, and population data as required by the description. It was found that most countries don’t have these data for the last year and very few had quarterly GDP figures or monthly unemployment figures. Submissions were only marked as ‘data does not exist’ if they did not have any national statistics more recent than 2010.
Reviewer: Kamil Gregor
The stated description of the Legislation dataset is as follows:
This data category requires all national laws and statutes to be available online, although it is not a requirement that information on legislative behaviour (e.g. voting records) be available. To satisfy this category, the following minimum criteria must be met:
- Content of the law / status
- If applicable, all relevant amendments to the law
- Date of last amendment
- Data updated at least quarterly
Submissions were reviewed to ensure the data met the criteria. Regularity of updating was assessed based on the date of the most recently submitted data.
Reviewer: Yaron Michl
The stated description of the Pollutant Emissions dataset is as follows:
Aggregate data about the emission of air pollutants, especially those potentially harmful to human health (although it is not a requirement to include information on greenhouse gas emissions). Aggregate means national-level or available for at least three major cities. In order to satisfy the minimum requirements for this category, data must be available for the following pollutants and meet the following minimum criteria:
- Particulate matter (PM) levels
- Sulphur oxides (SOx)
- Nitrogen oxides (NOx)
- Volatile organic compounds (VOCs)
- Carbon monoxide (CO)
- Updated at least once a week
- Measured either at a national level by regions or in at least 3 big cities
VOCs is a generic designation for many organic chemicals; when measuring VOCs it is possible to measure any one of a number of compounds, such as benzene or MTBE. Measurement of volatile organic compounds (VOCs) was ultimately not included in the data requirements because of this ambiguity and the fact that VOCs are rarely measured at a national level (see this link).
Carbon monoxide (CO) and nitrogen oxides (NOx) were also not considered a requirement because their main origin is usually transportation.
In addition, some countries publish air pollution using the Air Quality Index, a formula that translates air quality data into numbers and colors to help citizens understand when to take action to protect their health. Submissions that relied on the Air Quality Index were considered not to exist because it is not raw data.
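For readers unfamiliar with why an AQI value is not raw data: an index value is derived from a measured concentration by linear interpolation between fixed breakpoints, so the original measurement cannot be recovered from it. A minimal sketch, using three rows of the US EPA’s published 24-hour PM2.5 breakpoint table (check the current EPA documentation before relying on these figures, as breakpoints are revised over time):

```python
# Sketch: converting a raw PM2.5 concentration (µg/m³, 24-hour average)
# into an AQI value by linear interpolation between breakpoints.
PM25_BREAKPOINTS = [
    # (conc_low, conc_high, aqi_low, aqi_high)
    (0.0, 12.0, 0, 50),      # "Good"
    (12.1, 35.4, 51, 100),   # "Moderate"
    (35.5, 55.4, 101, 150),  # "Unhealthy for sensitive groups"
]

def pm25_aqi(conc):
    """Linear interpolation within the breakpoint row containing conc."""
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= conc <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo)
    raise ValueError("concentration outside the (truncated) table")
```

Because many different concentrations map to the same rounded index value, publishing only AQI discards the underlying measurements, which is the basis of the reviewer’s decision above.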
Government Procurement Tenders
Reviewer: Georg Neumann
The stated description of the Government Procurement Tenders dataset is as follows:
All tenders and awards of the national/federal government, aggregated by office. Monitoring tenders can help new groups to participate in tenders and increase government compliance. Data submitted in this category must be aggregated by office, updated at least monthly, and satisfy the following minimum criteria:
- Tenders: tender name, tender description, tender status
- Awards: award title, award description, value of the award, supplier’s name
Quality of published information varied strongly and was not evaluated here. As long as the minimum information was available the data was said to exist for a given place.
Thresholds for publication of this information vary strongly by country. For all EU countries, tenders above a specific amount, detailed here, must be published. This allowed all EU submissions to qualify as publishing open procurement data, even though some countries, such as Germany, do not publish award values for contracts below those thresholds, and others have closed systems for accessing specific information on contracts awarded.
In other countries not all sectors of government publish tenders and awards data. Submissions were evaluated to ensure that the main government tenders and contracts were made public, notwithstanding that data from certain ministries may have been missing.
Reviewer: Nisha Thompson
The stated description of the Water Quality dataset is as follows:
Data, measured at the water source, on the quality of water is essential for both the delivery of services and the prevention of diseases. In order to satisfy the minimum requirements for this category, data should be available on levels of the following by water source and be updated at least weekly:
- Fecal coliform
- Arsenic
- Fluoride levels
- Nitrates
- TDS (total dissolved solids)
If a country treats or distributes water, then there will be data regarding water quality, because all water treatment requires quality checks. Even though water quality is a local responsibility in most countries, very few countries have a completely decentralized system; usually the central government has a monitoring role, whether through an environmental protection agency, a ministry of the environment or a ministry of public health. If there is a monitoring role, the data does exist; if monitoring is completely decentralized, as in the UK, the submission was marked as ‘does not exist’ because there is no aggregation of the data. If data was not available daily or weekly, it wasn’t considered timely.
In some cases, all the parameters were accounted for except TDS. Even though TDS is a standard parameter, some countries collect only conductivity, which can be used to calculate TDS. In these cases, the submission was approved as is.
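The conductivity-to-TDS conversion mentioned above is a simple linear estimate: TDS (mg/L) ≈ k × EC (µS/cm), where k is an empirical factor that depends on the water’s ionic makeup, commonly quoted in the 0.5–0.7 range. A minimal sketch (the default k is an assumption for illustration):

```python
def tds_from_conductivity(ec_us_cm, k=0.65):
    """Estimate total dissolved solids (mg/L) from electrical
    conductivity (µS/cm). k is an empirical conversion factor,
    typically between 0.5 and 0.7; 0.65 here is an assumed default."""
    return ec_us_cm * k

print(tds_from_conductivity(1000))  # → 650.0 with the assumed k
```

Since k varies with local water chemistry, a reviewer accepting conductivity in place of TDS is accepting an estimate, which is why the substitution was allowed only when the other parameters were present.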
Reviewer: Codrina Maria Ilie
The stated description of the Land Ownership dataset is as follows:
Cadastre showing land ownership data on a map, including all metadata on the land. Cadastre data submitted in this category must include the following characteristics:
- Land borders
- Land owner’s name
- Land size
- National level
- Updated yearly
For various reasons, the land owner’s name attribute was widely unmet, and as such, lack of this data was not considered a factor in evaluating these submissions. Because this dataset depends on well-kept historic records (not always the case), on legislation (which can fluctuate), on very expensive activities that a government must undertake to keep the data up to date, and on the complexity of the data itself (sometimes the data that makes up a national cadastre is registered in different registries or systems), a first-year indexing exercise must not be considered exhaustive.
Reviewers: Neal Bastek & Stephen Gates
The stated description of the Weather dataset is as follows:
Five-day forecast of temperature, precipitation and wind, as well as recorded data for temperature, wind and precipitation for the past year. In order to satisfy the minimum requirements for this category, data submitted should meet the following criteria:
- Five-day forecast of temperature, updated daily
- Five-day forecast of wind, updated daily
- Five-day forecast of precipitation, updated daily
- Historical temperature data for the past year
Based on a general assessment of the submissions, a minimum threshold for claiming the data existed was set at forecast data for today + two days (three days) with a qualitative allowance made for arid regions substituting humidity data for precipitation data. The threshold for inclusion could also be met with four day forecasts that include temperature and precipitation data, and/or a generic statement using text or descriptive icons about conditions (e.g. windy, stormy, partly cloudy, sunny, fair, etc.).
Reviewer: Codrina Maria Ilie
The stated description of the Location dataset is as follows:
A database of postcodes/zipcodes and the corresponding spatial locations in terms of a latitude and a longitude (or similar coordinates in an openly published national coordinate system). If a postcode/zipcode system does not exist in the country, please submit a dataset of administrative borders. Data submitted in this category must satisfy the following minimum conditions:
- Zipcodes: address, coordinates (latitude, longitude), national level, updated once a year
- Administrative boundaries: border polygons, name of each polygon (city, neighborhood), national level, updated once a year
In cases in which a country has not adopted a postcode system, the location dataset is considered to be administrative boundaries. The Universal Postal Union – Postal Addressing Systems resource was used to identify the structure of a postcode for a given place [http://www.upu.int/en/activities/addressing/postal-addressing-systems-in-member-countries.html]. This tool proved particularly useful in identifying countries that do not use a postcode system.
In situations where countries only had a postcode search service, whether by postcode or address, the data was said not to exist. If the postcodes were not geocoded, submissions did not meet the Index requirements, due to the difficulty of geocoding such a dataset. On the other hand, if the postcode system mapped onto the smallest administrative boundary and that boundary was officially available, then, given how easily the geocoded postcodes could be obtained, the data was marked as ‘does exist’ for that submission.
National Map
Reviewer: Gil Zaretzer
The stated description of the National Map dataset is as follows:
This data category requires a high-level national map. To satisfy this category, the following minimum criteria must be met:
- Scale of 1:250,000 (1 cm = 2.5 km)
- Markings of national roads
- National borders
- Markings of streams, rivers, lakes and mountains
- Updated at least once a year
Only submissions from an official source, with original data, were considered. A link to Google Maps, which was often provided, does not satisfy the criteria for these submissions.
In cases where no link was provided in the submission, entries were marked as “unsure” if there was any indication that the data exists but is not available online, e.g. a national mapping service without a website.
Yesterday, after years of contention and literally millions of grassroots messages opposing it, the US Senate passed CISA, the Cybersecurity Information Sharing Act (S. 754), by a vote of 74 – 21. The Senate action comes despite broad public- and private-sector opposition, for the many reasons detailed in District Dispatch. As the NY Times reported, the move comes notwithstanding the fact that the bill would do nothing to prevent modern data breaches like those recently perpetrated by criminal rings and nation states against entities like Sony Pictures, the Office of Personnel Management, or the Department of Defense’s non-classified email servers. Upon CISA’s passage, ALA released a statement by President Sari Feldman underscoring that:
“CISA won’t prevent cyberattacks like the Office of Personnel Management breach and other high-profile incidents cited by its sponsors. It will, however, weaken the privacy of millions of Americans and expose library and other computer systems to potentially damaging “defensive measures.” Sadly, with CISA, Congress has again traded civil liberty for a mirage of security.”
Prior to approving the final version of CISA (as modified by a “Manager’s Amendment” that made some improvements to the bill as introduced), the Senate rejected a number of potentially constructive amendments. These included two that ALA and its many coalition partners particularly backed: one by Sen. Patrick Leahy (D-VT) to prevent FOIA from being weakened by a new and overbroad exemption, and another by Sen. Al Franken (D-MN) to better protect personal privacy by narrowing the statutory definition of what constitutes a “cyberthreat.” These proposals were rejected by votes of 37 – 59 and 35 – 60, respectively. Two other pro-privacy amendments were also defeated.
The fight to improve CISA, though now an even more uphill battle, will shift to informal talks between the House and Senate, which must reconcile the different approaches to “cybersecurity” reflected in their disparate bills. To become law, a single bill will need to be reapproved by both chambers of Congress and signed by the President (who backed S. 754). Further action is not anticipated, however, before late this year or early 2016.
- EFF Disappointed as CISA Passes Senate
- Privacy-infringing Cyber-surveillance Bill Passes in Senate
- CISA Security Bill Passes Senate With Privacy Flaws Unfixed
- Senate Passes Dangerous Cybersecurity Information Sharing Act
One of my main research interests is in user experience design; specifically, how people see and remember information. Certain aspects of “seeing” information are passive; that is, we see something without needing to do anything. This is akin to seeing a “Return Materials Here” sign over a book drop: you see that this area serves a function you need, but other than looking for it and finding it, you don’t have to do much else. But how much of this do we actually acknowledge, much less remember?
Countless times I’ve seen patrons fly past signs that tell them exactly where they need to find a certain book or when our library opens. It’s information they need but for some reason they haven’t gotten. So how can we make this more efficient?
I visited the Boston Museum of Science recently and participated in their Hall of Human Life exhibit. Now, anyone can participate in an exhibit, especially in a science museum: turn the crank to watch water flow! Push a button to light up the circulatory system! Touch a starfish! I’ll call this “active passivity”: you’re participating but you’re doing so at a bare minimum. What little information you’re receiving may or may not stick.
Who knew feet could be so interesting? (Photo courtesy of the Museum of Science, Boston)
The Hall of Human Life is different because it necessitates your input. You must give it data for the exhibit to be effective. For instance, I had to see how easily distracted I was by selecting whether I saw more red dots or blue dots while other images flashed across the screen. I had to position a virtual module on the International Space Station with only two joysticks to see how blue light affects productivity. I even had to take off my shoes and walk across a platform so I could measure the arch of my foot. All of my data is then compared with that of two hundred other museum-goers who gave their time and data, matched on my age, my sex, and myriad other factors such as how much time I spent sleeping the night before and whether or not I play video games.
But that’s not all of it. In order to do these things, you must wear a wristband with a barcode and a number on it. This stores your data and feeds it to each exhibit as well as keeps track of the data the exhibits give back to you. This way, you can see from home how many calories you burn while walking and how well you recognize faces out of a group.
Thus, in order for people to remember a bit of information, they need to experience it as much as possible. That’s all well and good for a science museum exhibit, but how would that work in a library, where almost all of our information is passively given? We need to take some things into consideration:
- The exhibit didn’t require participation, it invited it – I could’ve ignored the exhibit and kept on walking, but it was hard: there were bright colors, big pictures, lights, and sounds. It got your attention without demanding it. Since we humans love bright lights and pretty colors, the exhibit is asking us to come see what the fuss is about.
- The exhibit was accessible – I don’t necessarily mean ADA-type accessibility here (although it fit that, too). As I said before, the exhibit hall was bright and welcoming. In addition to being aesthetically pleasing, each station had a visual aide demonstrating what the exhibit was, how to participate, and how your results matched up. It directed you to look at different axes on a graph, for instance, and if it wanted to show you something in particular, it would highlight it. This made it easy for anyone of any age to come and play and – gasp – learn.
- The exhibit prompted you for your input – Not only did it prompt you to participate, it asked you questions: “Does the data we’ve collected match what we thought we’d get?” “Do you think age, sex, or experience will affect the results?” “Were your predictions right?” The exhibits asked you to make decisions before, during, and after the activity, and encouraged reflection.
You’re probably saying to yourself that as library staff we do try to invite participation, to be accessible, and to ask for input. But it’s not as effective as it should – or could – be. It’s not feasible for all library systems to get touch screens and interactive devices (yet), but we can mould our information to require less active passivity and more action. Using bright colors, welcoming imagery, and memorable, punchy explanations is a start. Some libraries already have interactive kiosks but may not offer a video guide to using them. Adding more lighting and windows can make a space more lively and inspire more focus in our patrons.
There’s still a lot more to learn about visual communication and how humans process and store information, and I certainly don’t claim to have all the answers. But these are the questions I’m starting to ask and starting to research, and by the looks of things, it’s not just libraries and museums that are doing the same.
A brief note to say that Hydra has accepted a generous offer from the Boston Public Library, Northeastern University, WGBH and the DPLA jointly to host Hydra Connect 2016. We’re now looking into possible dates and we’ll let you know what these are as soon as they’re finalized.
From Tim Donohue, DSpace Tech Lead
Winchester, MA: Many core DSpace developers have been concentrating their efforts on the upcoming DSpace 5.4 and 6.0 releases (links below). As a result, the deadline for UI prototype submissions has been extended to Friday, December 4. We still ask that you spend a maximum of 80 hours on your prototype (full guidelines may be found here). You are welcome to submit your prototype before the new deadline if it is already nearing completion.
From Lance Stuchell, Digital Preservation Librarian, University Library, University of Michigan, on behalf of the PDA 2016 Program Committee
Ann Arbor, Michigan: We are pleased to announce that the annual Personal Digital Archiving 2016 conference will be hosted at the University of Michigan in Ann Arbor on May 12-14, 2016.
Today I found the following resources and bookmarked them on Delicious.
- VersionPress WordPress meets Git, properly. Undo anything (including database changes), clone & merge your sites, maintain efficient backups, all with unmatched simplicity.
Digest powered by RSS Digest
That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Jackie Shieh of George Washington University, Naun Chew of Cornell and Dawn Hale of Johns Hopkins University. Information professionals want to repurpose, present and connect the data they have created and curated under century-old standards and practices by publishing library metadata in the linked data framework. Recent linked data efforts have highlighted the importance of identifiers — unique alphanumeric strings, associated with digital objects and resolvable globally over networks via specific protocols, that can be used to unambiguously find and identify a resource. Local identifiers cannot be shared or re-used. We need identifiers that are unchanging over time and independent of where the digital object is or will be stored — that is, “persistent”. Persistent identifiers help collections become accessible globally, as they can be used, shared and re-used.
The practice for assigning identifiers has been inconsistent. Focus group members noted maintaining identifiers, and the loss of semantics when mapping one identifier system to another, as particular challenges.
Identifiers for “works” have been problematic, as there is no consensus on what represents a distinct work. Two different workflows were mentioned: 1) find an OCLC work ID and add it to the local record, and 2) use local algorithms to cluster records in the local catalog, assign a local identifier, and then match that ID with external sources such as the OCLC work ID.
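A minimal illustration of the second workflow. The normalization key below is a deliberately crude stand-in for the “local algorithms” the participants described; real work-clustering uses far richer matching (uniform titles, contributors, edition statements, and so on):

```python
from collections import defaultdict

def work_key(record):
    """Crude normalization: lowercase title + author, stripped of
    punctuation. A stand-in for real record-clustering algorithms."""
    norm = lambda s: "".join(ch for ch in s.lower() if ch.isalnum())
    return (norm(record["title"]), norm(record["author"]))

def cluster_records(records):
    """Group catalog records into candidate 'works', each given a
    local identifier that could later be matched against external
    sources such as an OCLC work ID."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[work_key(rec)].append(rec)
    return {f"local-work-{i}": recs
            for i, (_, recs) in enumerate(sorted(clusters.items()), 1)}

records = [
    {"title": "Moby-Dick", "author": "Melville, Herman"},
    {"title": "Moby Dick", "author": "Melville, Herman"},
    {"title": "Walden", "author": "Thoreau, Henry David"},
]
print(len(cluster_records(records)))  # two candidate works
```

Even this toy shows why consensus is hard: whether two records are “the same work” depends entirely on the key you choose.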
The discussions were wide-ranging, but tended to focus on identifiers for personal names over other types of entities. The desire to present a comprehensive compilation of scholarly output on faculty profile pages has prompted a number of research libraries to roll out ORCIDs (Open Researcher and Contributor ID) for their faculty. ORCID is seen as a way to address the big gap that currently exists in the LC/NACO and other national authority files that do not customarily include authors of journal articles and other scholarly output. Authority files are used only within the library domain. Funding agencies have begun to require ORCIDs as part of the submission process. Few felt that current authority workflows would scale to cover all an institution’s researchers; some journal articles may have several hundred different “authors” listed from multiple countries. Some researchers are reluctant to use any identifier they are not already using. Faculty can be sensitive about keeping their data private and the potential of “surveillance” or “Big Brotherism” by their institution. Automated ways of comparing faculty output can be seen as threatening.
Some outstanding issues with name identifiers:
- Some researchers already have a half-dozen or more ORCIDs as well as other identifiers.
- Skeletal entries make it difficult to determine whether they represent the same or different people.
- ORCID relies on self-registration, so the deceased are not covered. To be comprehensive, more than one identifier system is needed.
- There’s an emerging need for a name reconciliation service that can link multiple identifiers representing the same person.
- For identifiers registered through VIVO, it’s unclear what happens when the person moves to a new institution, retires, or dies.
- Libraries’ data suppliers and system vendors need to support persistent identifiers.
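A name reconciliation service of the kind called for above must, at minimum, merge pairwise “same person” assertions into clusters of equivalent identifiers. A minimal sketch using union-find follows; the identifier values are made up for illustration:

```python
class Reconciler:
    """Union-find over identifier strings: each 'same as' assertion
    merges the clusters containing the two identifiers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def same_as(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def cluster(self, x):
        """All known identifiers asserted to be the same person as x."""
        root = self.find(x)
        return {i for i in self.parent if self.find(i) == root}

r = Reconciler()
# Hypothetical assertions linking identifiers from different systems:
r.same_as("orcid:0000-0000-0000-0001", "viaf:999000")
r.same_as("viaf:999000", "lcnaf:nx0000001")
r.same_as("orcid:0000-0000-0000-0002", "isni:0000 0000 0000 0002")
```

The hard part in practice is not the merging but deciding when an assertion is trustworthy, especially against the skeletal entries mentioned above.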
Identifiers for organizations are even more complex than those for persons, as organizations can merge, split, acquire other organizations, have multiple hierarchies, change locations, etc. The Representing Organizations in ISNI Task Group is documenting these issues and recommending ways to better represent organizations with International Standard Name Identifiers (ISNIs, ISO 27729). These identifiers are important for accurately reflecting researchers’ affiliations so that institutions can easily compile and report their scholarly output. Digital Science’s newly released GRID (Global Research Identifier Database) includes ISNI identifiers and maps institutions’ locations to GeoNames. GRID is seen as a way to help facilitate linking to and promoting the work of organizations.
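One concrete point of contact between the person and organization identifier systems: an ORCID iD is syntactically an ISNI, a 16-character string whose last character is an ISO 7064 MOD 11-2 check character, so both can be validated the same way. A sketch:

```python
def mod_11_2_check(base15):
    """ISO 7064 MOD 11-2 check character over the first 15 digits,
    as used by both ISNI and ORCID."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid(identifier):
    """Validate an ISNI or ORCID iD, with or without hyphens/spaces."""
    digits = identifier.replace("-", "").replace(" ", "").upper()
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return mod_11_2_check(digits[:15]) == digits[15]

# 0000-0002-1825-0097 is the example iD published in ORCID's documentation.
print(is_valid("0000-0002-1825-0097"))  # True
```

The checksum only catches transcription errors, of course; it says nothing about whether the identifier is registered or who it belongs to.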
Identifiers for data sets and other digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURLs, Handles, DOIs (Digital Object Identifiers), URIs, URNs and ARKs (Archival Resource Keys). Some are using DataCite to mint and publish DOIs. Resources can have multiple copies and versions, and change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets. Libraries want to be able to link related pieces such as preprints, supplementary data and images with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries are considering using the EZID service created by the California Digital Library to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links. Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets regardless of where the data sets are stored.
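The “multiple identifiers for the same object” problem starts at the level of syntax: the same DOI may circulate as a bare string, a doi: URI, or a resolver URL. Before any deduplication, a normalization step helps. A sketch covering DOIs and ARKs only (other schemes would need their own rules; the example values are hypothetical):

```python
import re

def canonicalize(raw):
    """Reduce common surface forms of a DOI or ARK to one canonical string.
    A sketch only: real identifier hygiene covers more schemes and edge cases."""
    s = raw.strip()
    # DOI: bare "10.prefix/suffix", "doi:...", or a doi.org resolver URL.
    # DOIs are case-insensitive, so lowercase the canonical form.
    m = re.search(r"(?:doi\.org/|^doi:|^)(10\.\d{4,9}/\S+)", s, re.IGNORECASE)
    if m:
        return "doi:" + m.group(1).lower()
    # ARK: "ark:/NAAN/name", possibly embedded in a resolver URL.
    m = re.search(r"(ark:/\d{5,9}/\S+)", s, re.IGNORECASE)
    if m:
        return m.group(1)
    return s  # unrecognized scheme: leave as-is

# Three surface forms of one hypothetical DOI collapse to a single key:
forms = [
    "10.5555/12345678",
    "doi:10.5555/12345678",
    "https://doi.org/10.5555/12345678",
]
canon = {canonicalize(f) for f in forms}
```

With identifiers normalized this way, the remaining (and much harder) problem is the semantic one: deciding which distinct identifiers really point to the same version of the same resource.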
About Karen Smith-Yoshimura
Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.