Updated February 3, 2015
- Total no. participating publishers & societies: 5,772
- Total no. voting members: 3,058
- % of non-profit publishers: 57%
- Total no. participating libraries: 1,926
- No. journals covered: 37,687
- No. DOIs registered to date: 72,062,095
- No. DOIs deposited in previous month: 471,657
- No. DOIs retrieved (matched references) in previous month: 41,726,414
- DOI resolutions (end-user clicks) in previous month: 134,057,984
We're all pretty excited about catching up with everyone at Code4Lib in Portland, Oregon next week. Karen Coombs, George Campbell and I will be going, along with Bruce Washburn and a couple of our other OCLC colleagues. Stop us and fill us in on what's new with you - we're anxious to hear about the projects you've been working on and what you'll be doing next. Or ask us about Developer House, our API Explorer, or whatever you'd like to know about OCLC Web services.
Last week, U.S. Senator Jack Reed (D-RI) joined Senate Appropriations Committee Chairman Thad Cochran (R-MS) in introducing the SKILLS Act (S.312). Key improvements to the program include expanding professional development to cover digital literacy, reading and writing instruction across all grade levels; focusing on coordination and shared planning time between teachers and librarians; and ensuring that books and materials are appropriate for students with special learning needs, including English learners.
The legislation would expand federal investment in school libraries so they can continue to offer students the tools they need to develop the critical thinking, digital, and research skills necessary for success in the twenty-first century.
“Effective school library programs are essential for educational success. Multiple education and library studies have produced clear evidence that school libraries staffed by qualified librarians have a positive impact on student academic achievement. Knowing how to find and use information are essential skills for college, careers, and life in general,” said Senator Reed, a member of the Senate Appropriations Committee, in a statement.
“Absent a clear federal investment, the libraries in some school districts will languish with outdated materials and technology, or cease to exist at all, cutting students off from a vital information hub that connects them to the tools they need to develop the critical thinking and research skills necessary for success,” Senator Reed continued. “This is a true equity issue, which is why I will continue to fight to sustain our federal investment in this area and why renewing and strengthening the school library program is so critical.”
“School libraries should be an integral part of our educational system,” said Chairman Cochran. “This bipartisan legislation is intended to ensure that school libraries are better equipped to offer students the reading, research and digital skills resources they need to succeed.”
The bipartisan SKILLS Act would further amend the Elementary and Secondary Education Act by requiring that states and school districts plan to address the development of effective school library programs to help students gain digital literacy skills, master the knowledge and skills in the challenging academic content standards adopted by the state, and graduate from high school ready for college and careers. Additionally, the legislation would broaden the focus of training, professional development and recruitment activities to include school librarians.
The American Library Association (ALA) last week sent comments (pdf) to the U.S. Senate Committee on Health, Education, Labor, and Pensions (HELP) Chairman Sen. Lamar Alexander and member Sen. Patty Murray on the discussion draft to reauthorize the Elementary and Secondary Education Act.
The post Sens. Reed and Cochran introduce school library bill appeared first on District Dispatch.
Library of Congress: The Signal: Conservation Documentation Metadata at MoMA – An NDSR Project Update
The following is a guest post by Peggy Griesinger, National Digital Stewardship Resident at the Museum of Modern Art.
As the National Digital Stewardship Resident at the Museum of Modern Art I have had the opportunity to work with MoMA’s newly launched digital repository for time-based media. Specifically, I have been tasked with updating and standardizing the Media Conservation department’s documentation practices. Their documentation needs are somewhat unique in the museum world, as they work with time-based media artworks that are transferred from physical formats such as VHS and U-matic tape to a variety of digital formats, each encoded in different ways. Recording these processes of digitization and migration is a huge concern for media conservators in order to ensure that the digital objects they store are authentic representations of the original works they processed.
It is my job to find a way of recording this information that adheres to standards and can be leveraged for indexing, searching and browsing. The main goal of this project is to integrate the metadata into the faceted browsing system that already exists in the repository. This would mean that, for example, a user could narrow down a results set to all artworks digitized using a particular make and model of a playback device. This would be hugely helpful in the event that an error were discovered with that playback device, making all objects digitized using it potentially invalid. We need the “process history metadata” (which records the technical details of tools used in the digitization or migration of digital objects) to be easily accessible and dynamic so that the conservators can make use of it in innovative and viable ways.
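The kind of query this faceting enables can be sketched in a few lines of Python. The records and field names below are invented for illustration only and are not MoMA's actual repository schema:

```python
# Hypothetical records: each artwork's process history notes the playback
# device used during digitization (field names are invented for illustration).
artworks = [
    {"title": "Work A", "device_make": "Sony", "device_model": "BVH-2000"},
    {"title": "Work B", "device_make": "Sony", "device_model": "BVU-950"},
    {"title": "Work C", "device_make": "Sony", "device_model": "BVH-2000"},
]

def facet(records, **criteria):
    """Narrow a result set to records matching every facet value."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

# If the BVH-2000 were found to be faulty, find every affected transfer:
suspect = facet(artworks, device_make="Sony", device_model="BVH-2000")
print([r["title"] for r in suspect])  # → ['Work A', 'Work C']
```

The point of structured process history metadata is exactly this: the device information becomes a queryable field rather than free text buried in a condition report.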
The first phase of this project involved doing in-depth research into existing standards that might be able to solve our documentation needs. Specifically, I needed to find a standardized way to describe – in technical detail – the process of digitizing and migrating various iterations of a time-based media work, or what we call the process history of an object. This work was complicated by the fact that I had little technical knowledge of time-based media. This meant that I not only had to research and understand a variety of metadata standards but I also had to simultaneously learn the technical language being used to express them.
Fortunately, my education in audiovisual technology developed naturally through my extensive interviews and collaborations with the media conservators at MoMA. In order to decide upon a metadata standard to use, I needed to learn very specifically the type of information the conservators wanted to express with this metadata, and how that information would be most effectively structured. This involved choosing artworks from the collection and going over, in great detail, how these objects were assessed, processed, and, if necessary, digitized. After selecting a few standards (namely PBCore, PREMIS, and reVTMD) I thought were worth pursuing in detail, I mapped this information into XML to see if the standards could, in fact, adequately express the information.
Before making a final decision on which standard or combination of standards to use, I organized a metadata experts meeting to get feedback on my project. The discussion at this meeting was immensely helpful in allowing me to understand my project in the wider scope of the metadata world. I also found it extremely helpful to get feedback from experts in the field who did not have much exposure to the project itself, so that they could catch any potential problems or errors that I might not be able to see from having worked so closely with the material for so long.
One important point that was brought up at the meeting was the need to develop detailed use cases for the process history metadata in the repository. I talked with the media conservators at MoMA to see what intended uses they had for this information. To get an idea of the specific types of uses they foresee for this metadata, we can look at the use case for accessing process history metadata. This seems simple on the surface, but we had a number of questions to answer: How do users navigate to this information? Is it accessed at the artwork level (including all copies and versions of an artwork) or at the file level? How is it displayed? Is every element displayed, or only select elements? Where is this information situated in our current system? The discussions I had with the media conservators and our digital repository manager allowed us to answer these questions and create clear and concise use cases.
Developing use cases was simplified by two things:
1) we already had a custom-designed digital repository into which this metadata would be ingested and
2) we had a very clear idea of the structure and content of this metadata.
This meant we were very aware of what we had to work with, and what our potential limitations were. It was therefore very simple for us to know which use cases would be simple fixes and which would require developing entirely new functionalities and user interfaces in the repository. Because we had a good idea of how simple or complex each use case would be, we could prescribe levels of desirability to each use case to ensure the most important and achievable use cases were implemented first.
The next step for this project will be to bring these use cases, as well as wireframes we have developed to reflect them, to the company responsible for developing our digital repository system. Through conversation with them we will begin the process of integrating process history metadata into the existing repository system.
As I pass the halfway point of my residency, I can look back on the work I have done with pride and look forward to the work still to come with excitement. I cannot wait to see this metadata fully implemented into MoMA’s time-based media digital repository as a dynamic resource for conservators to use and explore. Hopefully the tools we are in the process of creating will be useful to other institutions looking to make their documentation more accessible and interactive.
This semester, I have the exciting opportunity to work as an intern among the hum of computers and maze of cubicles at Indiana University’s Digital Library Program! My main projects include migrating two existing digital collections from TEI P4 to TEI P5 using XSLT. If you are familiar with XML and TEI, feel free to skim a bit! Otherwise, I’ve included short explanations of each and links to follow for more information.
Texts for digital archives and libraries are frequently marked up in a language called eXtensible Markup Language (XML), which looks and acts similarly to HTML. Marking up the texts allows them to be human- and machine-readable, displayed, and searched in different ways than if they were simply plain text.
The Text Encoding Initiative (TEI) Consortium “develops and maintains a standard for the representation of texts in digital form” (i.e. guidelines). Basically, if you wanted to encode a poem in XML, you would follow the TEI guidelines to mark up each line, stanza, etc. in order to make it machine-readable and consistent with the rest of the collection and the standard. In 2007, the TEI consortium unveiled an updated form of TEI called TEI P5, to replace the older P4 version.
However, many digital collections still operate under the TEI P4 guidelines and must be migrated over to P5 moving forward. Here is where XSLT and I come in.
eXtensible Stylesheet Language (XSL) Transformations are used to convert an XML document to another text document, such as (new) XML, HTML or text. In my case, I’m migrating from one type of XML document to another type of XML document, and the tool in between, making it happen, is XSLT.
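As a toy illustration of what such a migration does (though not of how XSLT expresses it), here is a Python sketch using the standard library's ElementTree. The root-element rename and the namespace are two real P4-to-P5 changes, but an actual migration handles far more, and production work would use XSLT as described above:

```python
import xml.etree.ElementTree as ET

# A toy P4-style fragment; real documents are much richer than this.
p4 = "<TEI.2><text><body><p>Hello</p></body></text></TEI.2>"

RENAME = {"TEI.2": "TEI"}                # P5 renamed the root element
TEI_NS = "http://www.tei-c.org/ns/1.0"   # P5 documents live in this namespace

root = ET.fromstring(p4)
for el in root.iter():
    el.tag = RENAME.get(el.tag, el.tag)  # apply any element renames...
    el.tag = "{%s}%s" % (TEI_NS, el.tag) # ...and move everything into the TEI namespace

ET.register_namespace("", TEI_NS)        # serialize with a default namespace
p5 = ET.tostring(root, encoding="unicode")
print(p5)
# → <TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>Hello</p></body></text></TEI>
```

An XSLT stylesheet expresses the same idea declaratively, with template rules instead of a loop, which is why it is the standard tool for this kind of document-to-document transformation.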
Many utilize custom XSLT to transform an XML representation of a text into HTML to be displayed on a webpage. The process is similar to using CSS to transform basic HTML into a stylized webpage. When working with digital collections, or even moving from XML to PDF, XSLT is an invaluable tool to have handy. Learning it can be a bit of an undertaking, though, especially adding to an already full work week.
I have free time, sign me up!
Here are some helpful tips I have been given (and discovered) in the month I’ve been learning XSLT to get you started:
- Register for a tutorial.
Lynda.com, YouTube, and Oracle provide tutorials to get your feet wet and see what XSLT actually looks like. Before registering for anything with a price, first see if your institution offers free tutorials. Indiana University offers an IT Training Workshop on XSLT each semester.
- Keep W3Schools bookmarked.
Their XSLT page acts as a self-guided tutorial, providing examples, function lists, and function implementations. I access it nearly every day because it is clear and concise, especially for beginners.
- Google is your best friend.
If you don’t know how to do something, Google it! Odds are someone before you didn’t have your exact problem, but they did have one like it. Looking over another’s code on StackOverflow can give you hints to new functions and expose you to more use possibilities. (This goes for learning every coding and markup language!)
- Create or obtain a set of XML documents and practice!
A helpful aspect of using Oxygen Editor (the most common software used to encode in XML) for your transformations is that you can see the results instantly, or at least see your errors. If you have one or more XML documents, figure out how to transform them to HTML and view them in your browser. If you need to go from XML to XML, create a document with recipes and simply change the tags. The more you work with XSLT, the simpler it becomes, and you will feel confident moving on to larger projects.
- Find a guru at your institution.
Nick Homenda, Digital Projects Librarian, is mine at IU. For my internship, he has built a series of increasingly difficult exercises, where I can dabble in and get accustomed to XSLT before creating the migration documents. When I feel like I’m spinning in circles, he usually explains a simpler way to get the desired result. Google is an unmatched resource for lines of code, but sometimes talking it out can make learning less intimidating.
Note: If textbooks are more your style, Mastering XSLT by Chuck White lays a solid foundation for the language. This is a great resource for users who already know how to program, especially in Java and the C varieties. White makes many comparisons between them, which can help strengthen understanding.
If you have found another helpful resource for learning and applying XSLT, especially an online practice site, please share it! Tell us about projects you have done utilizing XSLT at your institution!
This is a cross-post from the Open Knowledge Switzerland blog, see the original here.
It has been a big year for us in Switzerland. An openness culture spreading among civil administration, NGOs, SMEs, backed by the efforts of makers, supporters and activists throughout the country, has seen the projects initiated over the past three years go from strength to strength – and establish open data in the public eye.
Here are the highlights of what is keeping us busy – and information on how you can get involved in helping us drive Open Knowledge forward, no matter where you are based. Check out our Storify recap, or German- and French-language blogs for further coverage.
To see the Events Calendar for 2015, scroll on down.

2014 in review

#sports
Our hackdays went global, with Milan joining Basel and Sierre for a weekend of team spirit and data wrangling. The projects which resulted ranged from the highly applicable to the ludicrously inventive, and led us to demand better from elite sport. The event was a starting point for the Open Knowledge Sports Working Group, aiming to “build bridges between sport experts and data scientists, coaches and communities”. We’re right behind you, Rowland Jack!

#international
The international highlight of the year was a chance for a sizeable group of our members to meet, interact and make stuff with the Open Knowledge community at OK Festival Berlin. Unforgettable! Later in the year, the Global Open Data Index got journalists knocking on our doorstep. However, the recently opened timetable data is not as open as some would like to think – leading us to continue making useful apps with our own open Transport API, and the issuing of a statement in Errata.

#community
The yearly Opendata.ch conference attracted yet again a big crowd of participants to hear talks, participate in hands-on workshops, and launch exciting projects (e.g. Lobbywatch). We got some fantastic press in the media, with the public encouraged to think of the mountains of data as a national treasure. At our annual association meeting we welcomed three new Directors, and tightened up with the Wikimedia community inviting us to develop open data together.

#science
CERN’s launch of an open data portal made headlines around the world. We were excited and more than a little dazzled by what we found when we dug in – and could hardly imagine a better boost for the upcoming initiative OpenResearchData.ch. Improving data access and research transparency is, indeed, the future of science. Swiss public institutions like the National Science Foundation are taking note, and together we are making a stand to make sure scientific knowledge stays open and accessible on the Internet we designed for it.

#politics
Swiss openness in politics was waymarked in 2014 with a motion regarding Open Procurement Data passing through parliament, legal provisions to opening weather data, the City of Zürich and Canton of St.Gallen voting in commitments to transparency, and fresh support for accountability and open principles throughout the country. This means more work and new responsibility for people in our movement to get out there and answer tough questions. The encouragement and leadership on an international level is helping us enormously to work towards national data transparency, step by step.

#government
The Swiss Open Government Data Portal launched at OKCon 2013 has 1’850 datasets published on it as of January 2015, now including data from cantons and communes as well as the federal government. New portals are opening up on a cantonal and city level, more people are working on related projects and using the data in their applications to interact with government. With Open Government Data Strategy confirmed by the Swiss Federal Council in April, and established as one of the six priorities of the federal E-Government action plan, the project is only bound to pick up more steam in the years ahead.

#finance
With Open Budget visualisations now deployed for the canton of Berne and six municipalities – including the City of Zurich, which has officially joined our association – the finance interest group is quickly proving that it’s not all talk. Spending data remains a big challenge, and we look forward to continuing the fight for financial transparency. This cause is being boosted by interest and support from the next generation, such as the 29 student teams participating in a recent Open Data Management and Visualization course at the University of Berne.

#apps/#apis
We may be fast, but our community is faster. Many new open data apps and APIs have been released and enhanced by our community, such as WindUndWetter.ch and the SwissMetNet API, based on just-opened national weather data resulting from a partial revision of the Federal Act on Meteorology and Climatology. Talk about “hold your horses”: a city waste removal schedule app led to intense debate with officials over open data policy, the results making waves in the press and open data developers leading by doing.

#culture
An OpenGLAM Working Group started over the summer, and quickly formed into a dedicated organising committee for our first hackathon in the new year. Towards this at least a dozen Swiss heritage institutions are providing content, data, and expertise. We look forward to international participants virtually and on-location, and your open culture data!

What’s coming up in 2015
Even if we do half the things we did in ‘14, a big year is in store for our association. Chances are that it will be even bigger: this is the year when the elections of the Federal Council are happening for the first time since our founding. It is an important opportunity to put open data in the spotlight of public service. And we are going to be busy running multiple flagship projects at the same time in all the areas mentioned.
Here are the main events coming up – we will try to update this as new dates come in, but give us a shout if we are missing something:
- 21. January: Open Finance and Participatory Budgeting, Bern
- 3. February: FlashHack with OpenCorporates, Zurich
- 6. February: Data Canvas Visualization Challenge, Lift15, Geneva
- 21. February: International Open Data Day
- 27. & 28. February: Open Cultural Data Hackathon, Bern
- 05. & 06. June: Open Research Data Hackdays, Lausanne & Basel
- 01. July: Opendata.ch Conference 2015, Bern
- 04. & 05. September: Election Hackdays 2015, Lausanne & Zurich
So, happy new year! We hope you are resolved to make more of open data in 2015. The hardest part may be taking the first step, and we are here for sport and support.
There is lots going on, and the easiest way to get started is to take part in one of the events. Start with your own neighbourhood: what kind of data would you like to have about your town? What decisions are you making that could benefit from having a first-hand, statistically significant, visually impressive, and above all, honest and critical look at the issue?
Lots is happening online and offline, and if you express interest in a topic you’re passionate about, people are generally quick to respond with invitations and links. To stay on top of things we urge you to join our mailing list, follow us on social media, and check out the maker wiki and forum. Find something you are passionate about, and jump right in! Reach out if you have any questions or comments.
I thought I might take a break to post an amusing photo of something I wrote out today:
The photo is a first draft of a database schema for a revamp of how LibraryThing will do library subjects. All told, it has 26 tables. Gulp.
About eight of the tables do what a good cataloging system would do:
- Distinguishes the various subject systems (LCSH, Medical Subjects, etc.)
- Preserves the semantic richness of subject cataloging, including the stuff that never makes it into library systems.
- Breaks subjects into their facets (e.g., “Man-woman relationships — Fiction” has two subject facets).
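Faceting like this is mechanical once the data is structured: subdivided LCSH headings are conventionally joined with a double hyphen in MARC display form, so a minimal splitter looks like this (a Python sketch, illustrative only):

```python
def split_facets(heading, sep=" -- "):
    """Split an LCSH-style heading into its facets (main heading + subdivisions)."""
    return [facet.strip() for facet in heading.split(sep)]

# Subdivisions are conventionally joined with a double hyphen in display form:
print(split_facets("Man-woman relationships -- Fiction"))
# → ['Man-woman relationships', 'Fiction']
```

The hard part, of course, is not the string split but storing each facet as its own row so the pieces can be indexed, aliased, and recombined.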
Most of the tables, however, satisfy LibraryThing’s unusual core commitments: to let users do their own thing, like their own little library, but also to let them benefit from and participate in the data and contributions of others.(1) So it:
- Links to subjects from various “levels,” including book-level, edition-level, ISBN-level and work-level.
- Allows members to use their own data, or “inherit” subjects from other levels.
- Allows for members to “play librarian,” improving good data and suppressing bad data.(2)
- Allows for real-time, fully reversible aliasing of subjects and subject facets.
The last is perhaps the hardest. Nine years ago (!) I compared LibraryThing to the “Ship of Theseus,” a ship which is “preserved” although its components are continually changed. The same goes for much of its data, although “shifting sands” might be a better analogy. Accounting for this makes for some interesting database structures, and interesting programming. Not every system at LibraryThing does this perfectly. But I hope this structure will help us do that better for subjects.(3)
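To illustrate what reversible aliasing might look like (this is my own sketch, not LibraryThing's actual design), an alias table can redirect subjects without rewriting any records, so undoing a merge is just deleting a row:

```python
aliases = {}          # subject -> preferred subject (a hypothetical alias table)

def merge(subject, preferred):
    """Record that `subject` should now display as `preferred`."""
    aliases[subject] = preferred

def resolve(subject):
    """Follow alias links to the current preferred form (cycle-safe)."""
    seen = set()
    while subject in aliases and subject not in seen:
        seen.add(subject)
        subject = aliases[subject]
    return subject

merge("Cookery", "Cooking")   # a change LCSH itself made to this heading
print(resolve("Cookery"))     # → Cooking
del aliases["Cookery"]        # fully reversible: no record was ever rewritten
print(resolve("Cookery"))     # → Cookery
```

Because the underlying records never change, the "Ship of Theseus" problem becomes tractable: the components stay put and only the lookup layer shifts.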
Weird as all this is, I think it’s the way things are going. At present most libraries maintain their own data, which, while generally copied from another library, is fundamentally siloed. Like an evolving species, library records descend from each other; they aren’t dynamically linked. The data inside the records are siloed as well, trapped in a non-relational model. The profession that invented metadata, and indeed invented sharing metadata, is, at least as far as its catalogs go, far behind.
Eventually that will end. It may end in a “Library Goodreads,” every library sharing the same data, with global changes possible, but reserved for special catalogers. But my bet is on a more LibraryThing-like future, where library systems will both respect local cataloging choices and, if they like, benefit instantly from improvements made elsewhere in the system.
When that future arrives, we’ve got the schema!

1. I’m betting another ten tables are added before the system is complete.
2. The system doesn’t presume whether changes will be made unilaterally, or voted on. Voting, like much else, exists in a separate system, even if it ends up looking like part of the subject system.
3. This is a long-term project. Our first steps are much more modest–the tables have an order-of-use, not shown. First off we’re going to duplicate the current system, but with appropriate character sets and segmentation by thesaurus and language.
I think there are two keys to why I was a successful electrical engineer, when I did not (initially) succeed as a computer scientist—despite being more interested in the latter, to begin with, and despite wanting to pursue the latter now.

The first key: invisible struggle, no displays of fallibility
I went to the University of Virginia as an undergrad. I transferred into the Engineering School a year in, which put me approximately one semester behind my peers. I chose Electrical Engineering (EE) instead of Computer Science (CS), even though it was a CS major who convinced me to switch. You see, I fell for a lot of the misconceptions laid out in Unlocking the Clubhouse: despite evidence to the contrary (I earned a high enough grade in the class to be hired as a teaching assistant for CS 101 in my second semester of college), I didn’t believe I could compete* with people who had been programming for their whole lives; and I vastly over-estimated how many people really fell into that bucket.
Also, because nobody told me that programming is hard for everyone when they start, I didn’t think CS was a field where I could be successful. I didn’t see everyone around me struggling, the way I did in my first EE class (which, to be fair, was pretty hellish).
I think that points to an important difference between my EE and CS education: I saw other EEs fail as often as I failed. Although it sounds that way, this isn’t me being modest, or feeling like an impostor, or anything else; I worked very hard and did very well. But I also know that a large part of my success comes from my peers and me taking time outside of class to teach each other the things the faculty didn’t see fit to impart; the homework assignments were too hard for us to do otherwise, because we were (deliberately?) not taught how to solve our homework problems in class. This is a common experience in engineering and, apparently, in physics — this Medium article does a fantastic job of explaining what’s wrong with teachers refusing to teach (though it comes with a trigger warning: it was written in the aftermath of a professor sexually harassing students).
So, one key to how my EE experience differs from CS is that I got to see my peers struggling, and it got me past my initial concern that they had all been tearing apart VCRs and putting them back together since the age of 10. (It was a very specific fear: I remember, it was VCRs, specifically, not watches or robots or anything else. Perhaps that points to a lack of imagination on my part.)
In CS, all of the assigned work was individual, and the focus on the school’s Honor Code meant that we were afraid to work together. I saw other CS students in the computer lab, but I didn’t know they were struggling as hard as I was. Even after working as a TA and helping people through their struggles, it took me more than a decade to internalize the fact that CS, like most things, is hard for beginners.
So, key one: In CS I kept believing the “everyone has been programming forever” lie, combined with the “I am not naturally good at this, and other people are” lie. In EE it was actively disproven, pretty much immediately.

The second key: starting with ‘hello world’
But there was one other key to my success as an electrical engineering student: I took the “intro to EE for non EEs” course that they were piloting at the time—even though, unfortunately (for them), most of my colleagues didn’t join me in taking it. In that class we got an introduction to the broader field, with short descriptions of the various sub-fields of EE and beginner-level introductions to concepts we would later be taking in-depth classes on. The portion of the class dealing with information theory and signal processing gave me the background to understand several really difficult subjects when they were introduced (poorly) in 300-level classes, and that confidence (bolstered by the experience of explaining it to some of my peers) ultimately led me to double-specialize in “Communications” (by which I mean wireless engineering, signal processing, etc.), along with “Computers/Digital” (processor and chip design, etc.).
I would probably not have become a wireless engineer without that experience.
CS, on the other hand, had nothing like that. CS 101 was “Hey, here’s how you program really simple stuff in C++. Also, ignore half of what you’re typing.” It wasn’t “Here are the sub-fields of computer science,” or “Here are introductory-level explanations of some of the important stuff we’ll talk about later,” both of which would have been better.
CS 101 should be an introduction to the field of computer science and computer programming, not a first programming course. It should consist of a little Boolean logic, maybe some control flow (i.e. loops), and some basic information about data structures; then, “here’s what an algorithm is”; then, some high-level information about computer networks; then, maybe slip in something about software testing and/or version control; and, finally, it should definitely include an exploration of the differences between web programming, DevOps, middleware, and math-heavy CS research. Not only would that class help people understand the field and how they might like to be part of it; it would also improve interview questions, later on. (Seriously, front-end developers don’t need to know how to implement QuickSort!)
There are lots of important changes we should be making to the way CS is taught, but when we’re looking at how to find and retain students for a four-year major, I think adding a high-level class before beginning programming would help tremendously. It’s certainly better than the then-popular (and, I sincerely hope, now-outmoded) practice of making the second programming course into a “weeding” class—a course so hard that half the students quit or fail, then change majors. And I think that, in the process of designing the intro course’s curriculum, the CS faculty might find themselves rethinking the whole major. So, yes, you could say I’m proposing a band-aid, and I agree; but it might also be a first step to structural change.
*In an environment where grades are issued on a curve, education is a competition. Assignments and tests were so hard at UVA’s engineering school that one time I got 38% on a midterm, and that translated to an A.
John Miedema: Lila is cognitive writing technology built on top of software like Evernote. Key differences.
Writers everywhere benefit from content management software like Evernote. Evernote can collect data from multiple devices and locations and organize it into a single writing repository. Evernote is beautiful software. For the last few years, I have been using Google Drive to collect notes. Recently I tried Evernote again, and I am impressed enough to switch. Notebooks, tags, collaboration, web clipping, related searches. All very nice.
Lila is cognitive writing technology built on top of software like Evernote. Here are some key differences between the products:
1. Evernote users read long-form content manually, decide if it is relevant, and then write notes to integrate it into their project. Lila will pre-read content for users and embed relevant notes (slips) in the context of the user’s writing. This will save the writer lots of reading and evaluation time.
2. Evernote users get “related searches” from a very limited number of web sources. Lila will perform open web searches for related content.
3. Evernote users can visualize a limited number of connections between notes. I have yet to get any utility out of this. Lila will use natural language search to generate a vast number of connections between notes, allowing a user to quickly understand complex relationships between notes.
4. Evernote users can use tags to construct a hierarchical organization of content. Notebooks can have only one sub-level of categorization, essentially chapters, but many writers need additional levels of classification. Tags can be ordered hierarchically, and if you prefix them with a number they will sort in linear order. You can use tags for hierarchical classification, but it creates problems:
- If you want both categories and tags, you will have to use a naming convention to split tags into two types.
- Numbering tags causes them to lose type-ahead look-up functionality, i.e., you have to start by typing the number. This is a problem because the numbers can be expected to change often.
- If you decide to insert a category in the middle of two tags, you have to manually re-number all the tags below.
- Tags are shared between Notebooks. Maybe that works for tags? Not for hierarchical sectioning of a single work.
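The renumbering problem is easy to demonstrate. A minimal sketch (the tag names are hypothetical) of what inserting a new section costs you when tags are numbered by hand:

```python
# Numbered tags sort linearly, but inserting a section means manually
# renumbering every tag below the insertion point.
def insert_tag(tags, position, title):
    """Insert a new numbered tag and renumber all that follow."""
    titles = [t.split(" ", 1)[1] for t in tags]
    titles.insert(position, title)
    return [f"{i + 1:02d} {t}" for i, t in enumerate(titles)]

chapters = ["01 Introduction", "02 Methods", "03 Results"]
print(insert_tag(chapters, 1, "Background"))
# ['01 Introduction', '02 Background', '03 Methods', '04 Results']
```

Every tag after the insertion point gets a new name, which is exactly what breaks type-ahead look-up and forces the manual renumbering described above.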
None of these problems are technically insurmountable. I hope Evernote comes out with enhancements soon. I would like to build Lila on top of Evernote. Lila has something to add. To be cognitive means an inherent ability to automate hierarchical classification. Lila will be able to suggest hierarchical views, different ways of understanding the data, different choices for what could be a table of contents.
Last night [some days ago now], the Arbitration Committee for English Wikipedia reached a final decision on the case: https://en.wikipedia.org/wiki/Wikipedia:Arbitration/Requests/Case/GamerGate#Final_decision. The Committee chose to issue one complete site ban for a male editor, citing a pattern of disruptive behavior that included more than 20 lesser sanctions since 2006. No other Wikipedia editors received site-wide bans.
We can confirm that in addition to a single site-wide ban, the Committee issued and endorsed nearly 150 warnings, sanctions, or topic bans to other editors from various sides of the case. We can clarify that of the eleven Committee-issued topic bans, only one was applied to an editor who identifies as female. All of the sanctioned editors have the right to appeal in the future; over the years, the Committee has granted appeals when sanctions are found to no longer be necessary.
Some reporting portrayed this case as a referendum on Gamergate itself, or as a purge of women or feminist voices from Wikipedia. That mischaracterizes the case, the role of the volunteer Arbitration Committee, and the nature of their findings.
The Committee does not consider the content of articles, it only focuses on the behavior of editors. This decision was also not a purge. Only one user has been removed from Wikipedia. Finally, it is not intended as a referendum on Gamergate — what is right, what is wrong, and its place in broader discourse — and should not be understood that way. That discussion may be necessary, but it is better suited for another forum.
Wikipedia is an encyclopedia. It is also the largest free knowledge resource in human history — and it is written by people from all over the world, often from very different backgrounds, who may hold differing points of view. This is made possible thanks to a fundamental principle of mutual respect: respectful discourse, and respect for difference and diversity.
The Wikimedia Foundation offers resources for programs and outreach with our partners across the global Wikimedia movement, and engages people who have been underrepresented in traditional encyclopedias. These include women, people of color, people from the Global South, immigrant communities, and members of the LGBTQ community. They are invaluable contributors to our community and partners in our mission.
For Wikipedia to represent the sum of all knowledge, it has to be a place where people can collaborate and disagree constructively even on difficult topics. It has to be a place that is welcoming for all voices. This is essential to ensuring people are free to focus on being creative and constructive, and contributing to this remarkable collective human achievement.
For more on our stance on this issue, please see a blog post we released this week: https://blog.wikimedia.org/2015/01/27/civility-wikipedia-gamergate/.
I am sorry for jumping the gun on this, and I am deeply sorry for mischaracterizing the situation. I hope that this post can help to rectify any damage I may have unwittingly caused. Mea culpa.
On Monday of this week, legislators introduced two bipartisan Freedom of Information Act (FOIA) bills in both the U.S. House (H.R. 653) and the U.S. Senate (S.337). Representative Darrell Issa (R-CA) introduced H.R. 653, with Elijah Cummings (D-MD) and Mike Quigley (D-IL) cosponsoring. The bill was referred to the House Committee on Oversight and Government Reform.
Action in the Senate was slightly more interesting; not only did Senator John Cornyn (R-TX) introduce S. 337 with Patrick Leahy (D-VT) and Charles Grassley (R-IA)—the ranking member and chair of the Senate Judiciary Committee—cosponsoring, but the Senate Judiciary Committee today passed the bill out of committee!
Earlier today, ALA joined with forty-six other groups to state our support for these bills (pdf) and to thank these men for introducing them. As the letter states, “Public oversight is critical to ensuring accountability, and the reforms embodied in both the FOIA Oversight and Implementation Act (H.R. 653), introduced by Representatives Issa and Cummings, and the FOIA Improvement Act of 2015 (S.337), introduced by Senators Cornyn and Leahy, are necessary to enable that oversight.”
It’s exciting to see this legislation introduced and moving so early in the 114th Congress, and we will keep you informed as things move forward!
Today Federal Communications Commission (FCC) Chairman Tom Wheeler will circulate his network neutrality proposal to fellow Commissioners in preparation for a February 26 vote. While we can’t read the detailed draft as it is not yet public, the Chairman did outline his plans in a Wired op-ed and fact sheet released yesterday. To paraphrase our Vice President, this is a Big Deal.
“I am submitting to my colleagues the strongest open Internet protections ever proposed by the FCC. These enforceable, bright-line rules will ban paid prioritization, and the blocking and throttling of lawful content and services,” Chairman Wheeler writes.
Today, American Library Association (ALA) President Courtney Young responded: “I am very pleased that Chairman Wheeler’s outlined proposal matches the network neutrality principles ALA and nearly a dozen library and higher education groups called for last July. America’s libraries collect, create and disseminate essential information to the public over the Internet, and enable our users to create and distribute their own digital content and applications. Network neutrality is essential to meeting our mission in serving America’s communities and preserving the Internet as a platform for free speech, innovation, research and learning for all.”
In a nutshell, the proposal:
- Asserts FCC authority under both Title II of the Communications Act and Section 706 of the Telecommunications Act of 1996 to provide the strongest possible legal foundation for network neutrality rules;
- Applies network neutrality protections to both fixed and mobile broadband (which the ALA, Association of Research Libraries and EDUCAUSE advocated for—unsuccessfully—in the 2010 Open Internet Order and in our most recent filings);
- Prohibits blocking or degrading access to legal content, applications, services and non-harmful devices; as well as banning paid prioritization, or favoring some content over other traffic;
- Allows for reasonable network management while enhancing transparency rules regarding how Internet service providers (ISPs) are doing this;
- Creates a general Open Internet standard for future ISP conduct;
- Identifies major provisions of Title II that will apply and others that will be subject to forbearance (i.e., not enforced).
Among the provisions that will be enforced are sections that assert no “unjust and unreasonable practices” (Sections 201 and 202), protect consumer privacy (Section 222), protect people with disabilities (Sections 225 and 226) and parts of Section 254, which includes the E-rate program and other Universal Service Fund (USF) programs. After the recent successful completion of E-rate program modernization to better enable affordable access to high-capacity broadband through libraries and schools, ALA has a particular interest in safeguarding FCC authority related to the Universal Service Fund. We agree the new Order should not automatically apply any new USF fees, but we would like to better understand how a partial application of Section 254 will work in practice. We’re reaching out to the FCC on this question now.
As always, more information on libraries and network neutrality is available on the ALA website and we’ll keep blogging here on the Dispatch.
The post It’s a Big Deal: FCC Chairman outlines strong network neutrality protections appeared first on District Dispatch.
Open Knowledge Foundation: Open Knowledge Belgium: Bringing Together Open Communities, Policy Makers & Industry
Open Knowledge Belgium to host The Second Edition of Open Belgium in Namur on Feb 23rd, 2015! Register Today!
On 23 February, Open Knowledge Belgium is hosting the second edition of Open Belgium, an event expected to attract over 200 people, coming together to learn and discuss the growing open knowledge movement in Belgium. This year Open Knowledge Belgium is hosting the conference, together with our Walloon colleagues and partners, at the Palais des Congrès in Namur.
The jam-packed programme is not to be missed! With over 35 speakers, the objective of the day is to unpack challenges, explore opportunities and learn about technological developments as they relate to Open Data and Open Knowledge. The event presents an ideal opportunity to exchange best practices with national and international experts. The conference program includes:
The conference will open with a panel discussion on the state-of-play of open data and open knowledge in Belgium, followed by a series of keynote talks and eight participatory workshops!
A panel discussion on Open data in Belgium, with representatives from the federal and regional governments.
A Series of Keynotes
- Jörgen Gren of DG Connect on the future of Open Data in Europe
- Dimitri Brosens on the Research Institute for Nature and Forest (INBO) becoming an open research institute
- Thomas Hermine (Nextride) and Antoine Patris (TEC) on how opening up Walloon public transport data offers new opportunities and economic value.
Eight Participatory Workshops:
Following the keynotes, participants will have the opportunity to participate in eight workshops focused on specific themes and organised by national and international experts.
- Open Transport, from data source to journey planner (moderated by Pieter Colpaert)
- Open Culture, tackling barriers with benefits (Barbara Dierickx)
- Open Tools, using tools to release the full Open Data potential (Philippe Duchesne)
- Open Tourism, the importance of framing the scheme online efforts (Raf Buyle)
- OpenStreetMap, the importance of working with communities (Ben Abelshausen)
- Open Science, going beyond open access (Gwen Franck)
- Local Open Data efforts in Belgium (Wouter Degadt)
- Emerging Open Data business models (Tanguy De Lestre).
Open Knowledge Belgium will close the day with networking drinks on a rooftop terrace overlooking the city of Namur.
Practical information and registration
- Date and Location: Monday, February 23, 2015 in [Namur Palais des Congrès](http://2015.openbelgium.be/practical/)
- Admission: € 130 – [Register online](http://2015.openbelgium.be/registrations/)
- Contact the organisers: firstname.lastname@example.org
Code4Lib is an annual, volunteer-organized conference focused on the intersection of technology and cultural heritage. DPLA is participating heavily in Code4Lib 2015, taking place on February 9 – 12 in Portland, Oregon. Here’s a handy guide detailing some of the key places they’ll be and how you can connect with them.
- Monday, February 9 (9 AM – noon): Tom Johnson (DPLA Metadata and Platform Architect) will lead a Linked Data Workshop with Karen Estlund (University of Oregon).
- Monday, February 9 (1:30 – 4:30 PM): Tom Johnson, Mark Matienzo (DPLA Director of Technology), Mark Breedlove (DPLA Technology Specialist), Audrey Altman (DPLA Technology Specialist), Gretchen Gueguen (DPLA Data Services Coordinator), and Amy Rudersdorf (DPLA Assistant Director for Content) will lead an introductory workshop on the DPLA API.
- Wednesday, February 11 (4:30 PM): Audrey Altman, Mark Breedlove, and Gretchen Gueguen will present on DPLA’s new ingestion system. The presentation is entitled, “Heiðrún: DPLA’s Metadata Harvesting, Mapping and Enhancement System.”
Beyond these formal opportunities to connect, these folks are eager to chat and answer questions about timely topics including the Community Reps application, DPLAfest 2015, and DPLA’s recent work upgrading its ingestion system.
In addition to staff participation, and in keeping with its broader commitment to diversity, DPLA has also supported Code4Lib 2015 by helping to sponsor one of the Code4Lib 2015 Diversity Scholarships.
Questions about where specific DPLA staffers will be at Code4Lib 2015? Drop one of us a line!
Backblaze now have over 41K drives ranging from 1.5TB to 6TB spinning. Their data for a year consists of 365 daily tables each with one row for each spinning drive, so there is a lot of it, over 12M records. The 4TB disk generation looks good:
We like every one of the 4 TB drives we bought this year. For the price, you get a lot of storage, and the drive failure rates have been really low. The Seagate Desktop HDD.15 has had the best price, and we have a LOT of them. Over 12 thousand of them. The failure rate is a nice low 2.6% per year. Low price and reliability is good for business.
The HGST drives, while priced a little higher, have an even lower failure rate, at 1.4%. It’s not enough of a difference to be a big factor in our purchasing, but when there’s a good price, we grab some. We have over 12 thousand of these drives.
It’s too soon to tell about the 6TB generation:
Currently we have 270 of the Western Digital Red 6 TB drives. The failure rate is 3.1%, but there have been only 3 failures. ... We have just 45 of the Seagate 6 TB SATA 3.5 drives, although more are on order. They’ve only been running a few months, and none have failed so far.
What grabbed all the attention was the 3TB generation:
The HGST Deskstar 5K3000 3 TB drives have proven to be very reliable, but expensive relative to other models (including similar 4 TB drives by HGST). The Western Digital Red 3 TB drives' annual failure rate of 7.6% is a bit high but acceptable. The Seagate Barracuda 7200.14 3 TB drives are another story.
Their 1163 Seagate 3TB drives with an average age of 2.2 years had an annual failure rate (AFR) over 40% in 2014. Backblaze's economics mean that they can live with a reasonably high failure rate:
Double the reliability is only worth 1/10th of 1 percent cost increase. ...
Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary - maybe $5,000 total? The 30,000 drives costs you $4m.
The $5k/$4m means the Hitachis are worth 1/10th of 1 per cent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more).
Moral of the story: design for failure and buy the cheapest components you can. :-)
40% AFR is really high, but labor to replace the failed drives would still have cost less than $8/drive. The cost isn't the interesting aspect of this story. The drives would have failed at some point anyway, incurring the replacement labor cost. The 40% AFR just meant the labor cost, and the capital cost of new drives, was incurred earlier than expected, reducing the return on the investment in purchasing those drives.
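The arithmetic in the quoted passage is easy to check. A quick sketch, using only the numbers Backblaze gives:

```python
# Replacement-labor economics from the quote: 30,000 drives, 2% AFR,
# 15 minutes of work per drive swap, a $4M fleet.
drives = 30_000
fleet_cost = 4_000_000  # dollars

def swap_hours(afr, minutes_per_swap=15):
    """Annual hours of labor spent swapping failed drives."""
    return drives * afr * minutes_per_swap / 60

print(swap_hours(0.02))                     # 150.0 hours: ~one employee-month of 8-hour days
print(swap_hours(0.02) - swap_hours(0.01))  # 75.0 hours saved by halving the AFR, ~2 weeks
print(5_000 / fleet_cost)                   # 0.00125: the "1/10th of 1 percent" in the quote
```

So a drive that is twice as reliable is worth only about a tenth of a percent more per unit, which is why a 40% AFR hurts the return on the purchase far more than it hurts the labor budget.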
Alas, there is a long history of high failure rates among particular batches of drives. An experience similar to Backblaze's at Facebook is related here, with an AFR over 60%. My first experience of this was nearly 30 years ago in the early days of Sun Microsystems. Manufacturing defects, software bugs, mishandling by distributors, vibration resonance, there are many causes for these correlated failures. It is the correlated failures that make the interesting connection with the Self-Repairing Disk Arrays paper.
The first thing to note about the paper is that Paris et al are not dealing with Backblaze-scale arrays:
These solutions are not difficult to implement in installations that have trained personnel on site round-the-clock. When this is not the case, disk repairs will have to wait until a technician can service the failed disk. There are two major disadvantages to this solution. First, it introduces an additional delay, which will have a detrimental effect on the reliability of the storage system. Second, the cost of the service call is likely to exceed that of the equipment being replaced.
[Image: a 4-slot Drobo]
The first problem with the paper is that there has been a technological solution to this problem for a decade, since Data Robotics (now Drobo) introduced the Drobo. I've been using them ever since. They are available in configurations from 4 to 12 slots, and in all cases when a drive fails the light by its slot flashes red. All that is needed is to pull out the failed drive and push in a replacement disk of the same size or bigger. The Drobo's firmware handles hot-swapping and recovers the failed drive's data with no human intervention. No technician, and much less than 15 minutes per drive, needed.
The second problem is that although the paper's failure model is based on 2013 failure data from Backblaze, it appears to assume that the failures are uncorrelated. The fact that errors in storage systems are correlated has been known since at least the work of Talagala at Berkeley in 1999. Correlated failures such as those of the 3TB Seagate drives at Backblaze in 2014 would invalidate the paper's claim that:
we have shown that several complete two-dimensional disk arrays with n parity disks, n(n - 1)/2 data disks, and less than n(n + 1)/2 data disks could achieve a 99.999 percent probability of not losing data over four years.
A 99.999 percent probability would mean that only 1 in 100,000 arrays would lose data in 4 years. But the very next year's data from their data source would probably have caused most of the arrays to lose data. When designing reliable storage, the failure model needs to be pessimistic, not average. And it needs to consider correlated failures, which is admittedly very hard to do.
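How sensitive such five-nines claims are to the assumed failure rate is easy to illustrate. Here is a minimal sketch (the 15-drive geometry, 2-failure tolerance, and one-week repair window are my assumptions for illustration, not the paper's exact model) that treats failures as independent and asks how often more drives fail within one repair window than the array can tolerate:

```python
from math import comb

def p_data_loss(n_drives, afr, window_days, tolerated):
    """Probability that more than `tolerated` of n_drives fail within one
    repair window, assuming independent failures at the given AFR."""
    # Per-drive probability of failing within the window, derived from the AFR.
    p = 1 - (1 - afr) ** (window_days / 365)
    survive = sum(comb(n_drives, k) * p**k * (1 - p)**(n_drives - k)
                  for k in range(tolerated + 1))
    return 1 - survive

# A 15-drive array tolerating 2 failures, with a one-week repair window:
print(p_data_loss(15, 0.02, 7, 2))  # tiny: consistent with a five-nines design point
print(p_data_loss(15, 0.40, 7, 2))  # tens of thousands of times larger at 40% AFR
```

Even under the (optimistic) independence assumption, plugging in the 40% AFR that Backblaze actually observed blows the loss probability up by roughly four orders of magnitude; correlated failures would make it worse still.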
[Thanks to Geneva Henry, University Librarian and Vice Provost for Libraries at the George Washington University, for contributing this guest blog post.]
While many may think of the scholarly record as the products surrounding scholarly works that are eventually disseminated, usually through publications, there is another aspect to the scholarly record that people at academic institutions – especially administrators – care about. This can be thought of as the campus scholarly record that frames the identity of an institution. In considering this perspective, there is an even more compelling reason to consider how the many activities surrounding scholarly dissemination are captured and managed. The libraries at academic institutions are arguably the obvious leaders to assume responsibility for managing these resources; libraries have been the stewards of the scholarly record for a very long time. But librarians must now recognize the changing nature of the elements of that record and take a proactive role in its capture and preservation. Moreover, they have a responsibility to the many campus stakeholders who have an interest in these resources for differing and sometimes conflicting purposes.
Research activities and early dissemination of findings have changed with the proliferation of social media and the Web. Scholars can exchange information via blog posts, twitter messages, Facebook posts and every other means of social media available, with feedback from colleagues helping to refine the final formal publication. The traditional methods of peer review are now being further enhanced through web-based prepublications and blogs where reviewers from anywhere can provide less formal feedback to authors. For an increasing number of scholars, social media is the new preprint. Data is posted and shared, comments are exchanged, methods are presented and questioned, revisions happen and the process can continue, even after the “formal” publication has been released in a more traditional form. This requires librarians to think about how they’re preserving their websites and social media outputs that now need to be part of the scholarly record as well as the overall campus record of scholarship.
The campus is full of stakeholders who have an interest in this new, constantly evolving record. Some would like all of this information fully exposed to publicize the work being done, while others feel that there are limits to how much should be made available for everyone to view. Systems such as VIVO and Elements provide platforms that will highlight faculty activities to provide more visibility into the research activities on campus. Sponsored research offices want insights into what people are doing so that they can match research opportunities with relevant researchers and help with identifying partners at other institutions. Media relations staff want to identify experts as media inquiries come in related to current issues happening in the world. Academic departments are interested in showcasing the scholarly record of their faculty in order to attract more graduate students and new faculty to their departments. Promotion and tenure committees want a full understanding of all of the activities of faculty members, including their service activities; increasingly, social media is blurring the line between scholarship and service as one feeds into the other.
Faculty members, the source of creating these resources, are understandably confused. Their attitudes and perceptions range from excited to worried, from protective to open. Their activities on social media do not always relate cleanly to a single scholarly record and will often be mixed with personal, non-scholarly information they may not want the world to see (e.g. pictures of their dinner, political commentaries, stories of their family vacation). This mixed landscape helps to fuel the legal concerns of an institution’s general counsel and the image consciousness of the public relations folks who are cautious about what might end up in the public with the exercising of academic freedom.
Circling back, now, to the library as the logical keeper of the academic record, it is important to realize that there is a vast range of stakeholders that the records serve. These stakeholders become partners with the library in helping to determine what information will be kept, what will be exposed and what needs to remain in restricted access. Partnerships with campus IT units that manage security and authoritative feeds from enterprise systems are critical. Sometimes some stakeholders will ask that exposed information be “redacted” from its online availability and librarians must be able to intelligently communicate the limits of successfully removing this from the world wide web.
The change in the scholarly record raises many questions and will continue to present challenges for libraries and academic institutions. As faculty change institutions, who will be responsible for managing their record of scholarship that is disseminated through social media so that it is preserved long-term? Constantly changing methods for communicating and sharing knowledge will require a scholarly record that can readily accommodate innovations. What will the scholarly record of the future be, and what should be captured? While we don’t have a crystal ball to help with this prediction, we do have a good barometer surrounding us in our libraries every day: study your students and how they communicate.
In the two years since the launch of FundRef we have been helping participating publishers with their implementations and listening to their feedback. As is often the case with new services, we have found that some of our original assumptions now need tweaking, and so the FundRef Advisory Group (made up of representatives from a dozen or so publishers and funding agencies) has been discussing the next phase of FundRef. I'd like to share some of our findings and proposals for improving the service.
When CrossRef launched FundRef, the FundRef Registry - the openly available taxonomy of standardised funder names that is central to the project - contained around 4,000 international funders. In the past 24 months this has doubled to over 8,000, thanks to input from funders and publishers and the ongoing work of the team at Elsevier who curate and update the list. There are over 170,000 articles with properly identified funders. Unfortunately, there are also over 400,000 articles with a funder name that hasn't been matched to the Registry and doesn't have a Funder Identifier. While a number of publishers are routinely supplying Funder IDs in all of their deposits, some are only managing to supply Funder IDs in as few as 30% of cases. Funder IDs are critical to FundRef - they allow us to collate and display the data accurately. Analysis shows that the deposits we are receiving without IDs fall into roughly three categories:
- Funder names that are in the Registry but have not been matched to an ID
- Entries into the funder name field that are clearly grant numbers, program names, or free-form text that has been entered or extracted incorrectly
- Funders that are not yet listed in the Registry.
At the outset we expected most of the deposits with no IDs to be a result of the third of these use cases. What we are finding, however, is that the vast majority are a result of the first two. Delving into this a little more and talking to publishers about their processes and experiences, we have identified the following reasons:
- Where authors are asked to input funding data on submission or acceptance of their paper, the margin for error appears to be quite high. They are not used to being asked for this data, and so very clear instructions are needed to stress its importance and ensure that they understand what it is they are being asked for. Authors should be strongly encouraged to pick their funding sources from the list in the FundRef Registry, but presenting a list of 8000+ funder names in a navigable, straightforward way is not without its challenges. Back in 2013 CrossRef worked with UI experts to develop a widget that publishers and their vendor partners could use - either outright or as a guideline - for collecting data from authors. Two years down the line we are reviewing this UI to see how we can further encourage authors to select the canonical funder name and only enter a free-form funder if it is genuinely missing from the Registry. Even with the most intuitive of interfaces, however, some authors will copy and paste an alternative name, or enter a program name instead of a funding body. Editorial and production staff should be aware of FundRef requirements and incorporate this metadata into their routine reviews.
- Some publishers have opted to extract funding data from manuscripts instead of asking authors to supply it in a form. This is perfectly acceptable - after all, the information is usually right there in the paper's acknowledgements. However, this process also needs to be accompanied by a certain amount of QA. We are seeing instances of grant numbers being extracted instead of funders, funder names that are concatenated into a single field, and funder names that are 100% accurate but have simply not been matched with IDs ahead of deposit. (In the CrossRef database we currently have 16,989 FundRef deposits that contain the name "National Natural Science Foundation of China" but have no accompanying ID. These are clearly slipping through the QA net.)
So what are we going to do to try and improve things?
Firstly, we are undergoing a review of our own UI and talking with vendors about changes that might encourage better data input by authors. We are also going to find out more about what processes are being undertaken by the publishers that are depositing consistently accurate data, and share these with the publishing community as a set of best practices. Whether publishers are asking authors or extracting the data from manuscripts, an element of QA seems to be critical to ensure the integrity of the data being deposited.
Secondly, we are going to start on some data "tidying" tasks at our end. Traditionally, CrossRef has not altered or corrected any of the data that publishers deposit: we provide error reports and ask that they make the corrections themselves. But with FundRef there seem to be a few quick wins - those 16,000 instances of the National Natural Science Foundation of China could easily and without ambiguity be matched to the correct FundRef ID (http://dx.doi.org/10.13039/501100001809), along with other names that have some very obvious minor discrepancies ("&" in place of "and", "US" instead of "U.S."). Cleaning up these deposits and adding the Funder IDs should result in a significant increase in the amount of FundRef data that is retrievable through FundRef Search and our Search API (and by extension, CHORUS Search).
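Those "obvious minor discrepancies" lend themselves to straightforward normalization before matching. A hypothetical sketch of the idea (the registry dictionary and the rules here are illustrative only, not CrossRef's actual matching code):

```python
import re

# A tiny stand-in for the FundRef Registry: canonical name -> Funder ID.
REGISTRY = {
    "national natural science foundation of china": "10.13039/501100001809",
}

def normalize(name):
    """Collapse trivial variants so a deposited funder name can be
    matched against a canonical Registry entry."""
    name = name.strip().lower()
    name = name.replace("&", "and")
    name = re.sub(r"\s+", " ", name)       # collapse runs of whitespace
    name = name.replace(" u.s. ", " us ")  # treat "U.S." and "US" alike
    return name

def match_funder_id(deposited_name):
    """Return a Funder ID for a deposited name, or None if unmatched."""
    return REGISTRY.get(normalize(deposited_name))

print(match_funder_id("National  Natural Science Foundation of China "))
# 10.13039/501100001809
```

Even a rule set this simple would unambiguously recover the 16,000 un-ID'd "National Natural Science Foundation of China" deposits; anything it cannot match can still fall back to the error-report workflow.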
We are also asking publishers to continue to review their own processes and procedures to see where improvements can be made, as the success of FundRef ultimately depends on the quality of the data that is fed into it.
The LITA Blog features original content by LITA members on technologies and trends relevant to librarians. The writers represent a variety of perspectives, from library students to public, academic, and special librarians.
The blog also delivers announcements about LITA programming, conferences, and other events, and serves as a place for LITA committees to share information back with the community if they so choose.
Sharing on the LITA blog ensures a broad audience for your content. Four recent LITA blog posts (authored by Brianna Marshall, Michael Rodriguez, Bryan Brown, and John Klima) have been picked up by American Libraries Direct – and most posts have been viewed hundreds of times and shared dozens of times on social media. John Klima’s post on 3D printers has been shared 40 times from the LITA Twitter account and another 40 times directly from the blog (a cumulative record), Bryan Brown’s post on MOOCs has been viewed over 800 times (also a record as of this writing), and Michael Rodriguez’s post on web accessibility was shared over 60 times direct from the blog (another record).
Anyone can write a guest post for the LITA Blog, even non-LITA members, as long as the topic is relevant. Would you like to write a guest post or share posts reflecting the interests of your committee or interest group? Contact blog editor Brianna Marshall at briannahmarshall(at)gmail(dot)com or Mark Beatty at mbeatty(at)ala(dot)org.