In September the Library held its annual Designing Storage Architectures for Digital Collections meeting. The meeting brings together technical experts from the computer storage industry with decision-makers from a wide range of organizations with digital preservation requirements to explore the issues and opportunities around the storage of digital information for the long-term. I always learn quite a bit during the meeting and more often than not encounter terms and phrases that I’m not familiar with.
One I found particularly interesting this time around was the term “anti-entropy.” I’ve been familiar with the term “entropy” for a while, but I’d never heard “anti-entropy.” One definition of “entropy” is a “gradual decline into disorder.” So is “anti-entropy” a “gradual coming-together into order”? It turns out that the term has a long history in information science and is key to understanding some very important digital preservation processes regarding file storage, file repair and fixity checking.
The “entropy” we’re talking about when we talk about “anti-entropy” might also be called “Shannon Entropy” after the legendary information scientist Claude Shannon. His ideas on entropy were elucidated in a 1948 paper called “A Mathematical Theory of Communication” (PDF), developed while he worked at Bell Labs. For Shannon, entropy was the measure of the unpredictability of information content. He wasn’t necessarily thinking about information in the same way that digital archivists think about information as bits, but the idea of the unpredictability of information content has great applicability to digital preservation work.
“Entropy” represents the idea of the “noise” that begins to slip into information processes over time. It made sense that computer science would co-opt the term, and in that context “anti-entropy” has come to mean “comparing all the replicas of each piece of data that exist (or are supposed to) and updating each replica to the newest version.” In other words, what information scientists call “bit flips” or “bit rot” are examples of entropy in digital information files, and anti-entropy protocols (a subtype of “gossip” protocols) use methods to ensure that files are maintained in their desired state. This is an important concept to grasp when designing digital preservation systems that take advantage of multiple copies to ensure long-term preservability, LOCKSS being the most obvious example of this.
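To make the quoted definition concrete, here is a minimal Python sketch of one anti-entropy pass: it compares every replica of every item and brings each replica up to the newest version. The `Replica` class, the integer version numbers and the in-memory stores are all assumptions made for illustration; real systems exchange this information over a network and often use checksums or vector clocks rather than a simple counter.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical storage node holding key -> (version, value)."""
    name: str
    store: dict = field(default_factory=dict)


def anti_entropy(replicas):
    """Compare all replicas and update each one to the newest version of every key.

    Returns the number of entries that had to be written (repairs made).
    """
    # Pass 1: find the newest (version, value) seen anywhere for each key.
    newest = {}
    for replica in replicas:
        for key, (version, value) in replica.store.items():
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)

    # Pass 2: overwrite any replica entry that is missing or out of date.
    repaired = 0
    for replica in replicas:
        for key, entry in newest.items():
            if replica.store.get(key) != entry:
                replica.store[key] = entry
                repaired += 1
    return repaired


a = Replica("a", {"x": (1, "old"), "y": (3, "y3")})
b = Replica("b", {"x": (2, "new")})
c = Replica("c")
repairs = anti_entropy([a, b, c])
print(repairs, a.store == b.store == c.store)
```

Note that "newest version wins" only repairs stale or missing copies. Detecting a copy that has silently rotted at the same version needs a content comparison, such as checksums or voting among copies.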
Anti-entropy and gossip protocols are the means to ensure the automated management of digital content that can take some of the human overhead out of the picture. Digital preservation systems invoke some form of content monitoring in order to do their job. Humans could do this monitoring, but as digital repositories scale up massively, the idea that humans can effectively monitor the digital information under their control with something approaching comprehensiveness is a fantasy. Thus, we’ve got to be able to invoke anti-entropy and gossip protocols to manage the data.
An excellent introduction to how gossip protocols work can be found in the paper “GEMS: Gossip-Enabled Monitoring Service for Scalable Heterogeneous Distributed Systems.” The authors note three key parameters to gossip protocols: monitoring, failure detection and consensus. Not coincidentally, LOCKSS “consists of a large number of independent, low-cost, persistent Web caches that cooperate to detect and repair damage to their content by voting in ‘opinion polls’” (PDF). In other words, gossip and anti-entropy.
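The idea of caches voting to detect and repair damage can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the actual LOCKSS polling protocol: the `poll_and_repair` function, the cache names and the simple-majority rule are assumptions made for the example.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fixity value for one copy (SHA-256 hex digest)."""
    return hashlib.sha256(content).hexdigest()


def poll_and_repair(copies: dict) -> list:
    """Hold an 'opinion poll' over copies (cache name -> bytes).

    The digest held by a strict majority is treated as correct, and every
    dissenting copy is overwritten with a good one. Returns the repaired names.
    """
    votes = Counter(digest(content) for content in copies.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(copies) // 2:
        raise ValueError("no majority; cannot decide which copy is good")

    good = next(c for c in copies.values() if digest(c) == winner)
    damaged = [name for name, c in copies.items() if digest(c) != winner]
    for name in damaged:
        copies[name] = good
    return damaged


copies = {
    "cache1": b"Call me Ishmael.",
    "cache2": b"Call me Ishmael.",
    "cache3": b"Call me Ishmaek.",  # one flipped character stands in for bit rot
}
repaired = poll_and_repair(copies)
print(repaired)
```

With only two copies that disagree there is no majority, so the sketch refuses to guess; this is one reason such systems want many independent copies.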
I’ve only just encountered these terms, but they’ve been around for a long while. David Rosenthal, the chief scientist of LOCKSS, has been thinking about digital preservation storage and sustainability for a long time and he has given a number of presentations at the LC storage meetings and the summer digital preservation meetings.
LOCKSS is the most prominent example in the digital preservation community of the exploitation of gossip protocols, but these protocols are widely used in distributed computing. If you really want to dive deep into the technology that underpins some of these systems, start reading about distributed hash tables, consistent hashing, versioning, vector clocks and quorum in addition to anti-entropy-based recovery. Good luck!
One of the more hilarious anti-entropy analogies was recently supplied by the Register, which suggested that a new tool that supports gossip protocols “acts like [a] depressed teenager to assure data reliability” and “constantly interrogates itself to make sure data is ok.”
You learn something new every day.
This issue of The Web for Libraries was mailed Wednesday, October 29th, 2014. Want to get the latest from the cutting-edge web made practical for libraries and higher ed every Wednesday? You can subscribe here!
The UX Bandwagon
Is it a bad thing? Throw a stone and you’ll hit a user experience talk at a library conference (or even a whole library conference). There are books, courses, papers, more books, librarians who understand the phrase “critical rendering path,” this newsletter, this podcast, interest groups, and so on.
It is the best fad that could happen for library perception. The core concept behind capital-u Usability is continuous data-driven decision making that invests in the library’s ability to iterate upon itself. Usability testing that stops is usability testing done wrong. What’s more, libraries concerned with UX are thus concerned about measurable outward perception (marketing, which libraries used to suck at) that can be neither haphazard nor half-assed. This bandwagon values experimentation, permits change, and increases the opportunities to create delight.
Latest Podcast: A High-Functioning Research Site with Sean Hannan
Sean Hannan talks about designing a high-functioning research site for the Johns Hopkins Sheridan Libraries and University Museums. It’s a crazy fast API-driven research dashboard mashing up research databases, LibGuides, and a magic, otherworldly carousel actually increasing engagement. Research tools are so incredibly difficult to build well, especially when libraries rely so heavily on third parties, that I’m glad to have taken the opportunity to pick Sean’s brain. You can catch this and every episode on Stitcher, iTunes, or on the Web.
Top 5 Problems with Library Websites – a Review of Recent Usability Studies
Emily Singley looked at 16 library website usability studies over the past two years and broke down the biggest complaints. Can you guess what they are?
“Is the semantic web still a thing?”
Jonathan Rochkind sez: “The entire comment, and, really the entire thread, are worth a read. There seems to be a lot of energy in libraryland behind trying to produce “linked data”, and I think it’s important to pay attention to what’s going on in the larger world here.
Especially because much of the stated motivation for library “linked data” seems to have been: “Because that’s where non-library information management technology is headed, and for once let’s do what everyone else is doing and not create our own library-specific standards.” It turns out that may or may not be the case ….”
Let’s draw a line. There are libraries that blah-blah “take content seriously” enough in that they pare down the content patrons don’t care about, ensure that hours and suchlike are findable, that their #libweb is ultimately usable. Then there are libraries that dive head-first into content creation. They podcast, make lists, write blogs, etc. For the latter, the library without a content strategy is going to be a mess, and I think these suggestions by James Deer on Smashing Magazine are really helpful.
New findings: For top ecommerce sites, mobile web performance is wildly inconsistent
I’m working on a new talk and maybe even a #bigproject about treating library web services and apps as e-commerce – because, think about it, what a library website does and what a web-store wants you to do isn’t too dissimilar. That said, I think we need to pay a lot of attention to stats that come out of e-commerce. Every year, Radware studies the mobile performance of the top 100 ecommerce sites to see how they measure up to user expectations. Here’s the latest report.
These are a few gems I think particularly important to us:
- 1 out of 4 people worldwide owns a smartphone
- On mobile, 40% will abandon a page that takes longer than 3 seconds to load
- Slow pages are the number one issue that mobile users complain about. 38% of smartphone users have screamed at, cursed at, or thrown their phones when pages take too long to load.
- The median page is 19% larger than it was one year ago
There is also a lot of ink dedicated to sites that serve m-dot versions to mobile users, mostly making the point that this is ultimately dissatisfying and, moreover, tablet users definitely don’t want that m-dot site.
I recently circulated a petition to start a new interest group within LITA, to be called the Patron Privacy Technologies IG. I’ve submitted the formation petition to the LITA Council, and a vote on the petition is scheduled for early November. I also held an organizational meeting with the co-chairs; I’m really looking forward to what we all can do to help improve how our tools protect patron privacy.
But enough about the IG, let’s talk about the petition! To be specific, let’s talk about when the signatures came in.
I’ve been on Twitter since March of 2009, but a few months ago I made the decision to become much more active there (you see, there was a dearth of cat pictures on Twitter, and I felt it my duty to help do something about it). My first thought was to tweet the link to a Google Form I created for the petition. I did so at 7:20 a.m. Pacific Time on 15 October:
— Galen Charlton (@gmcharlt) October 15, 2014
Also, if you are interested in being co-chair of the LITA Patron Privacy Tech IG, please indicate on the petition or drop me a line.
— Galen Charlton (@gmcharlt) October 15, 2014
Since I wanted to gauge whether there was interest beyond just LITA members, I also posted about the petition on the ALA Think Tank Facebook group at 7:50 a.m. on the 15th.
By the following morning, I had 13 responses: 7 from LITA members, and 6 from non-LITA members. An interest group petition requires 10 signatures from LITA members, so at 8:15 on the 16th, I sent another tweet, which got retweeted by LITA:
— Galen Charlton (@gmcharlt) October 16, 2014
By early afternoon, that had gotten me one more signature. I was feeling a bit impatient, so at 2:28 p.m. on the 16th, I sent a message to the LITA-L mailing list.
That opened the floodgates: 10 more signatures from LITA members arrived by the end of the day, and 10 more came in on the 17th. All told, a total of 42 responses to the form were submitted between the 15th and the 23rd.
The petition didn’t ask how the responder found it, but if I make the assumption that most respondents filled out the form shortly after they first heard about it, I arrive at my bit of anecdata: over half of the petition responses were inspired by my post to LITA-L, suggesting that the mailing list remains an effective way of getting the attention of many LITA members.
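The "over half" estimate can be checked with a little arithmetic. The sketch below uses only the counts given in this post and treats every response that arrived after the LITA-L message as prompted by it, which is the same assumption stated above and therefore an upper bound; it also assumes the one afternoon signature is the only response between the morning count and the list post. The variable names are just for illustration.

```python
total_responses = 42           # all form responses, 15th through 23rd
before_list_post = 13 + 1      # 13 by the morning of the 16th, plus 1 more that afternoon

# Everything else arrived after the 2:28 p.m. LITA-L message on the 16th.
after_list_post = total_responses - before_list_post
share = after_list_post / total_responses

print(f"{after_list_post} of {total_responses} responses ({share:.0%}) came after the LITA-L post")
```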
By the way, the petition form is still up for folks to use if they want to be automatically subscribed to the IG’s mailing list when it gets created.
DuraSpace News: The Society of Motion Picture and Television Engineers (SMPTE) Archival Technology Medal Awarded to Neil Beagrie
From William Kilbride, Digital Preservation Coalition
Heslington, York. At a ceremony in Hollywood on October 23, 2014, the Society of Motion Picture and Television Engineers® (SMPTE®) awarded the 2014 SMPTE Archival Technology Medal to Neil Beagrie in recognition of his long-term contributions to the research and implementation of strategies and solutions for digital preservation.
ALA and the Association of Research Libraries (ARL) renewed their opposition to a petition filed by the Coalition of E-book Manufacturers seeking a waiver from complying with disability legislation and regulation (specifically Sections 716 and 717 of the Communications Act as Enacted by the Twenty-First Century Communications and Video Accessibility Act of 2010). Amazon, Kobo, and Sony are the members of the coalition, and they argue that they do not have to make their e-readers’ Advanced Communications Services (ACS) accessible to people with print disabilities.
Why? The coalition argues that because basic e-readers (Kindle, Sony Reader, Kobo E-Reader) are primarily used for reading and have only rudimentary ACS, they should be exempt from CVAA accessibility rules. People with disabilities can buy other more expensive e-readers and download apps in order to access content. To ask the Coalition to modify their basic e-readers, they say, is a regulatory burden that will raise consumer prices, ruin the streamlined look of basic e-readers, and inhibit innovation (I suppose for other companies and start-ups that want to make even more advanced inaccessible readers).
We believe denying the Coalition’s petition will not only increase access to ACS, but also increase access to more e-content for more people. As we note in our FCC comments: “Under the current e-reader ACS regime proposed by the Coalition and tentatively adopted by the Commission, disabled persons must pay a ‘device access tax.’ By availing oneself of one of the ‘accessible options’ as suggested by the Coalition, a disabled person would pay at minimum $20 more a device for a Kindle tablet that is heavier and has less battery life than a basic Kindle e-reader.” Surely it is right that everyone ought to be able to buy and use basic e-readers just like everybody has the right to drink from the same water fountain.
This decision will rest on the narrow question of whether or not ACS is offered, marketed and used as a co-primary purpose in these basic e-readers. We believe the answer to that question is “yes,” and we will continue our advocacy to support more accessible devices for all readers.
Recently, we've joined forces to submit a proposal to the Knight Foundation's News Challenge, whose theme is "How might we leverage libraries as a platform to build more knowledgeable communities?" Here are some excerpts:
Abstract
Project Gutenberg (PG) offers 45,000 public domain ebooks, yet few libraries use this collection to serve their communities. Text quality varies greatly, metadata is all over the map, and it's difficult for users to contribute improvements. We propose to use workflow and software tools developed and proven for open source software development (GitHub) to open up the PG corpus to maintenance and use by libraries and librarians. The result, GITenberg, will include MARC records, covers, OPDS feeds and ebook files to facilitate library use. A version-controlled fork-and-merge workflow, combined with a change-triggered back-end build environment, will allow scalable, distributed maintenance of the greatest works of our literary heritage.

Description
Libraries need metadata records in MARC format, but in addition they need to be able to select from the corpus those works which are most relevant to their communities. They need covers to integrate the records with their catalogs, and they need a level of quality assurance so as not to disappoint patrons. Because this sort of metadata is not readily available, most libraries do not include PG records in their catalogs, resulting in unnecessary disappointment when, for example, a patron wants to read Moby Dick from the library on their Kindle.

Progress
43,000 books and their metadata have been moved to the git version control software; this will enable librarians to collaboratively edit and control the metadata. The GITenberg website, mailing list and software repository have been launched at https://gitenberg.github.io/ . Software for generating MARC records and OPDS feeds has already been written.

Background
Modern software development teams use version control, continuous integration, and workflow management systems to coordinate their work. When applied to open-source software, these tools allow diverse teams from around the world to collaboratively maintain even the most sprawling projects.
Anyone wanting to fix a bug or make a change first forks the software repository, makes the change, and then makes a "pull request". A best practice is to submit the pull request with a test case verifying the bug fix. A developer charged with maintaining the repository can then review the pull request and accept or reject the change. Often, there is discussion asking for clarification. Occasionally versions remain forked and diverge from each other. GitHub has become the most popular site for this type of software repository because of its well-developed workflow tools and integration hooks. The leaders of this team recognized the possibility of using GitHub for the maintenance of ebooks, and we began the process of migrating the most important corpus of public domain ebooks, Project Gutenberg, onto GitHub, thus the name GITenberg. Project Gutenberg has grown over the years to 50,000 ebooks, audiobooks, and related media, including all the most important public domain works of English language literature. Despite the great value of this collection, few libraries have made good use of this resource to serve their communities. There are a number of reasons why. The quality of the ebooks and the metadata around the ebooks is quite varied. MARC records, which libraries use to feed their catalog systems, are available for only a subset of the PG collection. Cover images and other catalog enrichment assets are not part of PG. To make the entire PG corpus available via local libraries, massive collaboration among librarians and ebook developers is essential. We propose to build integration tools around GitHub that will enable this sort of collaboration to occur.
- Although the PG corpus has been loaded into GITenberg, we need to build a backend that automatically converts the version-controlled source text into well-structured ebooks. We expect to define a flavor of MarkDown or Asciidoc which will enable this automatic, change-triggered building of ebook files (EPUB, MOBI, PDF). (MarkDown is a human-readable plain text format used on GitHub for documentation; MarkDown for ebooks is being developed independently by several teams of developers. Asciidoc is a similar format that works nicely for ebooks.)
- Similarly, we will need to build a parallel backend server that will produce MARC and XML formatted records from version-controlled plain-text metadata files.
- We will generate covers for the ebooks using a tool recently developed by NYPL and include them in the repository.
- We will build a selection tool to help libraries select the records best suited to their libraries.
- Using a set of "cleaned up" MARC records from NYPL, and adding custom cataloguing, we will seed the metadata collection with ~1000 high quality metadata records.
- We will provide a browsable OPDS feed for use in tablet and smartphone ebook readers.
- We expect that the toolchain we develop will be reusable for creation and maintenance of a new generation of freely licensed ebooks.
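As a rough illustration of the second item in the list above (producing XML records from version-controlled plain-text metadata files), here is a minimal Python sketch. The `key: value` file layout, the function names and the generic `<record>` element are assumptions invented for this example; they are not the GITenberg file format, and a real backend would emit MARC or a standard XML schema.

```python
import xml.etree.ElementTree as ET


def parse_metadata(text: str) -> dict:
    """Parse hypothetical 'key: value' metadata lines into a dict.

    Blank lines, '#' comments and lines without a colon are skipped.
    Keys are assumed to be single words so they can serve as element names.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue
        fields[key.strip().lower()] = value.strip()
    return fields


def to_xml(fields: dict) -> str:
    """Serialize the parsed fields as a simple XML record."""
    record = ET.Element("record")
    for key, value in fields.items():
        ET.SubElement(record, key).text = value
    return ET.tostring(record, encoding="unicode")


source = """\
# metadata for one ebook
title: Moby Dick: or, the Whale
author: Melville, Herman
"""
fields = parse_metadata(source)
print(to_xml(fields))
```

Because the metadata lives in a plain-text file, a correction to a title or author name shows up as an ordinary one-line diff in a pull request, and the record can be regenerated whenever the file changes.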
The rest of the proposal is on the Knight News Challenge website. If you like the idea of GITenberg, you can "applaud" it there. The "applause" is not used in the judging of the proposals, but it makes us feel good. There are lots of other interesting and inspiring proposals to check out and applaud, so go take a look!
DPLA: Building the newest DPLA student exhibition, “From Colonialism to Tourism: Maps in American Culture”
Two groups of MLIS students from the University of Washington’s Information School took part in a DPLA pilot called the Digital Curation Program during the 2013-2014 academic year. The DPLA’s Amy Rudersdorf worked with iSchool faculty member Helene Williams as we created exhibits for the DPLA for the culminating project, or Capstone, in our degree program. The result is the newest addition to DPLA’s exhibitions, called “From Colonialism to Tourism: Maps in American Culture.”
My group included Kili Bergau, Jessica Blanchard, and Emily Felt; we began by choosing a common interest from the list of available topics, and became “Team Cartography.” This project taught us about online exhibit creation and curation of digital objects, copyright and licensing, and took place over two quarters. The first quarter was devoted to creating a project plan and learning about the subject matter. We asked questions including: What is Cartography? What is the history of American maps? How are they represented within the DPLA collections?
As we explored the topic, the project became less about librarianship and more about our life as historians. Cartography, or the creation of maps, slowly transformed into the cultural “maps in history” as we worked through the DPLA’s immense body of aggregated images. While segmenting history and reading articles to learn about the pioneers, the Oregon Trail, the Civil War, and the 20th Century, we also learned about the innards of the DPLA’s curation process. We learned how to use Omeka, the platform for creating the exhibitions, and completed forms for acquiring usage rights for the images we would use in our exhibit.
One of the greatest benefits of working with the team was the opportunity to investigate niche areas among the broad topics, as well as leverage each other’s interests to create one big fascinating project. With limited time, we soon had to focus on selecting images and writing the exhibit narrative. We wrote, and revised, and wrote again. We waded through hundreds of images to determine which were the most appropriate, and then gathered appropriate metadata to meet the project requirements.
Our deadline for the exhibit submission was the end of the quarter, and our group was ecstatic to hear the night of the Capstone showcase at the UW iSchool event that the DPLA had chosen our exhibit for publication. Overjoyed, we celebrated remotely, together. Two of us had been in Seattle, one in Maine, and I had been off in a Dengue Fever haze in rural Cambodia (I’m better now).
Shortly after graduation in early June, Helene asked if I was interested in contributing further to this project: over the summer, I worked with DPLA staff to refine the exhibit and prepare it for public release. Through rigorous editing, some spinning of various themes in new directions, and a wild series of conversations over Google Hangouts about maps, maps, barbecue, maps, libraries, maps, television, movies, and more maps, the three of us had taken the exhibition to its final state.
Most experiences in higher education, be they on the undergrad or graduate levels (sans PhD), fail to capture a sense of endurance and longevity. The exhibition was powerful and successful throughout the process from many different angles. For me, watching its transformation from concept to public release has been marvelous, and has prepared me for what I hope are ambitious library projects in my future.
A huge thanks to Amy Rudersdorf for coordinating the program, Franky Abbott for her work editing and refining the exhibition, Kenny Whitebloom for Omeka wrangling, and the many Hubs and their partners for sharing their resources.
All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
Following last week’s Open Access Week blog series, we continue our celebration of community efforts in this field. Today we give the microphone to Dr. Salua Nassabay from Open Knowledge Ireland in a great account from Ireland, originally posted on the Open Knowledge Ireland blog.
In Ireland, awareness of OA has increased within the research community nationally, particularly since institutional repositories have been built in each Irish university. Advocacy programmes and funder mandates (IRCSET, SFI, HEA) have had a positive effect; but there is still some way to go before the majority of Irish researchers will automatically deposit their papers in their local OA repository.
Brief Story
In summer 2004, the Irish Research eLibrary (IReL) was launched, giving online access to a wide range of key research journals. The National Principles on Open Access Policy Statement were launched on Oct 23rd 2012 at the Digital Repository of Ireland Conference by Sean Sherlock, Minister of State, Department of Enterprise, Jobs & Innovation and Department of Education & Skills with responsibility for Research & Innovation. The policy consists of a ‘Green way’ mandate and encouragement to publish in ‘Gold’ OA journals. It aligns with the European policy for Horizon 2020. OA at the national level is managed by the National Steering Committee on OA Policy; see Table 3.
A Committee of Irish research organisations is working in partnership to coordinate activities and to combine expertise at a national level to promote unrestricted, online access to outputs which result from research that is wholly or partially funded by the State:
National Principles on Open Access Policy Statement
Definition of OA
Reaffirm: freedom of researchers; increase visibility and access; support international interoperability, link to teaching and learning, and open innovation.
Defining Research Outputs:
“include peer-reviewed publications, research data and other research artefacts which feed the research process”.
General Principle (1): all researchers to have deposit rights for an OA repository.
Deposit: post-print/publisher version and metadata; peer-reviewed journal articles and conference publications. Others where possible; at time of acceptance for publication; in compliance with national metadata standards.
General Principle (2): Release: immediate for metadata; respect publisher copyright, licensing and embargo (not normally exceeding 6 months/12 months).
Green route policy – not exclusive
Research data linked to publications.
Infrastructure and sustainability: depositing once, harvesting, interoperability and long-term preservation.
Advocacy and coordination: mechanisms for and monitoring of implementation, awareness raising and engagement for ALL.
Exploiting OA and implementation: preparing metadata and national value-added metrics.
Table 1. National Principles on Open Access Policy Statement. https://www.dcu.ie/sites/default/files/communications/pdfs/PatriciaClarke2014.pdf and http://openaccess.thehealthwell.info/sites/default/files/documents/NationalPrinciplesonOAPolicyStatement.pdf
There are seven universities in Ireland (http://www.hea.ie/en/about-hea). These universities received government funding to build institutional repositories and to develop a federated harvesting and discovery service via a national portal. It is intended that this collaboration will be expanded to embrace all Irish research institutions in the future. OA repositories are currently available in all Irish universities and in a number of other higher education institutions and government agencies:
Dublin Business School; Dublin City University; Dublin Institute of Technology; Dundalk Institute of Technology; Mary Immaculate College; National University of Ireland Galway; National University of Ireland, Maynooth; Royal College of Surgeons in Ireland; Trinity College Dublin; University College Cork; University College Dublin; University of Limerick; Waterford Institute of Technology
Table 2. Currently available repositories in Ireland
OA Ireland’s statistics show more than 58,859 OA publications in 13 repositories, distributed as shown in Figures 1 and 2.
Figure 1. Publications in repositories. From rian.ie (date: 16/9/2014). http://rian.ie/en/stats/overview
Some samples of Irish OA journals are:
- Crossings: Electronic Journal of Art and Technology: http://crossings.tcd.ie;
- Economic and Social Review: http://www.esr.ie;
- Journal of the Society for Musicology in Ireland: http://www.music.ucc.ie/jsmi/index.php/jsmi;
- Journal of the Statistical and Social Inquiry Society of Ireland: http://www.ssisi.ie;
- Minerva: an Internet Journal of Philosophy: http://www.minerva.mic.ul.ie//;
- The Surgeon: Journal of the Royal Colleges of Surgeons of Edinburgh and Ireland: http://www.researchgate.net/journal/1479-666X_The_surgeon_journal_of_the_Royal_Colleges_of_Surgeons_of_Edinburgh_and_Ireland;
- Irish Journal of Psychological Medicine: http://www.ijpm.ie/1fmul3lci60?a=1&p=24612705&t=21297075.
Figure 2. Publications by document type. From rian.ie (date: 16/9/2014). http://rian.ie/en/stats/overview
Institutional OA policies:
Health Research Board (HRB) - Funders
Science Foundation Ireland (SFI) – Funders
Higher Education Authority (HEA) – Funders
Department of Agriculture, Food and Marine (DAFM) – Funders
Yes effective 2013
Environmental Protection Agency (EPA) – Funders
Marine Institute (MI) – Funders
Irish Research Council (IRC) – Funders
Teagasc – Funders
Institute of Public Health in Ireland (IPH) – Funders
Irish Universities Association (IUA) – Researchers
Representative body for Ireland’s seven universities:
Yes effective 2010
Health Service Executive (HSE) – Researchers
Yes effective 2013
Institutes of Technology Ireland (IOTI) – Researchers
Dublin Institute of Technology (DIT) – Researchers
Royal College of Surgeons in Ireland (RCSI) – Researchers
Consortium of National and University Libraries (CONUL) – Library and Repository
IUA Librarians’ Group (IUALG) - Library and Repository
Digital Repository of Ireland (DRI) - Library and Repository
Website and Repository: http://www.dri.ie
DRI Position Statement on Open Access for Data: http://dri.ie/sites/default/files/files/dri-position-statement-on-open-access-for-data-2014.pdf
EdepositIreland - Library and Repository
*IRC: Some exceptions like books. See policy.
*Teagasc: Material in the repository is licensed under the Creative Commons Attribution-NonCommercial Share-Alike License
*DIT: Material that is to be commercialised, or which can be regarded as confidential, or the publication of which would infringe a legal commitment of the Institute and/or the author, is exempt from inclusion in the repository.
*RCSI: Material in the repository is licensed under the Creative Commons Attribution-NonCommercial Share-Alike License
Table 3. Institutional OA Policies in Ireland
Funder OA policies:
Major research funders in Ireland
Department of Agriculture, Fisheries and Food: http://www.agriculture.gov.ie/media/migration/research/DAFMOpenAccessPolicy.pdf
IRCHSS (Irish Research Council for Humanities and Social Sciences): No Open Access policies as yet.
Enterprise Ireland: No Open Access policies as yet.
IRCSET (Irish Research Council for Science, Engineering and Technology): OA Mandate from May 1st 2008: http://roarmap.eprints.org/63/
HEA (Higher Education Authority): OA Mandate from June 30th 2009: http://roarmap.eprints.org/95/
Marine Institute: No Open Access policies as yet
HRB (Health Research Board): OA Recommendations, Policy: http://roarmap.eprints.org/76/
SFI (Science Foundation Ireland): OA Mandate from February 1st 2009: http://roarmap.eprints.org/115/
Table 4. Open Access funders in Ireland.
Figure 3. Public sources of funds for Open Access. From rian.ie (date: 16/9/2014), http://rian.ie/en/stats/overview
Infrastructural support for OA:
Open Access organisations and groups
Open Access projects and initiatives. The Open Access to Irish Research Project. Associated National Initiatives
RIAN Steering Group. IUA (Irish Universities Association) Librarians’ Group (Coordinating body). RIAN is the outcome of a project to build online open access to institutional repositories in all seven Irish universities and to harvest their content to the national portal.
NDLR (National Digital Learning Repository): http://www.ndlr.ie
National Steering Group on Open Access Policy. See Table 3
RISE Group (Research Information Systems Exchange)
Irish Open Access Repositories Support Project Working Group. ReSupIE: http://www.irel-open.ie/moodle/
Repository Network Ireland is a newly formed group of Repository managers, librarians and information: http://rni.wikispaces.com
Digital Repository Ireland DRI is a trusted national repository for Ireland’s humanities and social sciences data @dri_ireland
Table 5. Open Access infrastructural support.
Challenges and ongoing developments
Ireland already has considerable expertise in developing Open Access to publicly funded research, aligned with international policies and initiatives, and is now seeking to strengthen its approach to support international developments on Open Access led by the European Commission, Science Europe and other international agencies.
The greatest challenge is the increasing pressure faced by publishers in a fast-changing environment.
The launch of Ireland’s national Open Access policy has put Ireland ahead of many European partners. Irish research organisations are particularly successful in the following areas of research: Information and Communication Technologies, Health and Food, Agriculture, and Biotechnology.
- Repository Network Ireland / http://rni.wikispaces.com
- Open Access Scholarly Publishers / http://oaspa.org/blog/
- OpenDoar – Directory of Repositories / http://www.opendoar.org
- OpenAire – Open Access Infrastructure for research in Europe / https://www.openaire.eu
- Repositories Support Ireland / http://www.resupie.ie/moodle/
- UCD Library News / http://ucdoa.blogspot.ie
- Trinity’s Open Access News / http://trinity-openaccess.blogspot.ie
- RIAN / http://rian.ie/en/stats/overview
Contact person: Dr. Salua Nassabay firstname.lastname@example.org
https://www.openknowledge.ie; twitter: @OKFirl
The following is a guest post by Barrie Howard, IT Project Manager at the Library of Congress.
Last month the Digital Preservation Outreach and Education (DPOE) Program wrapped up the “2014 DPOE Training Needs Assessment Survey” in an effort to get a sense of the state of digital preservation practice and understand more about what capacity exists for organizations and professionals to effectively preserve digital content. This survey is a follow-up to a similar survey that was conducted in 2010, and mentioned in a previous blog post.
The 17-question survey was open for seven weeks to relevant organizations and received 436 responses, which is excellent considering summer vacation schedules and survey fatigue. The questions addressed issues like primary function (library, archive, museum, etc.), staff size and responsibilities, collection items, preferred training content and delivery options and financial support for professional development and training.
Response rates from libraries, archives, museums, and historical societies were similar in 2010 and 2014, with a notable increase this year in participation from state governments. There was good geographic coverage, including responses from organizations in 48 states, DC and Puerto Rico (see below), and none of the survey questions were skipped by any of the respondents.
The most significant takeaways are: 1) an overwhelming expression of concern that respondents ensure their digital content is accessible for 10 or more years (84%); and 2) evidence of a strong commitment to support employee training opportunities (83%). Other important discoveries reveal changes in staff size and configuration over the last four years. There was a marked 6% decrease in staff size at smaller organizations ranging from 1-50 employees, and a slight 2% drop in staff size at large organizations with over 500 employees. In comparison, medium-size organizations reported a 4% uptick in the staff range of 51-200 and 3% for the 201-500 tier. There was a substantial 13% increase across all organizations in paid full-time or part-time professional staff with practitioner experience, and a 5% drop in organizations reporting no staff at all. These findings suggest positive trends across the digital preservation community, which bode well for the long-term preservation of our collective cultural heritage.
One survey question tackled the issue of what type of digital content is held by each institution. While reformatted material digitized from collections already held has the highest frequency across all respondents (83%), born-digital content created by and for your organization trails close behind (76.4%). Forty-five percent of all respondents reported that their institution had deposited digital materials managed for other individuals or institutions. These results reflect prevailing trends, and it will be interesting to see how things change between now and the next survey.
The main purpose of the survey was to collect data about the training needs of these organizations, and half a dozen questions were devoted to this task. Interestingly, while online training is trending across many sectors to meet the constraints of reduced travel budgets, the 2014 survey results find that respondents still value intimate, in-person workshops. In-person training often comes at a higher price than online, and the survey attempted to find out how much money an employee would receive annually for training. Not surprisingly, the largest group of respondents (25%) didn’t know, and, equally important, another 24% reported a modest budget range of $0-$250.
When given the opportunity to be away from their place of employment, respondents preferred half or full-day training sessions over 2-3 days or week-long intensives. They showed a willingness to travel off-site up to a 100-mile radius of their places of work. There was a bias towards training on applicable skills, rather than introductory material on basic concepts, and respondents identified training investments that result in an increased capacity to work with digital objects and metadata management as the most beneficial outcome for their organization.
DPOE currently offers an in-person, train-the-trainer workshop, and is exploring options for extending the workshop curriculum to include online delivery options for the training modules. These advancements will address some of the issues raised in the survey, and may include regularly scheduled webinars, on-demand videos and pre- and post-workshop videos. The 2014 survey results will be released in a forthcoming report, which will be made available in November, so keep a watchful eye on the DPOE website and The Signal for the report and subsequent DPOE training materials as they become available.
Most of the time, if you see “Washington”, “November” & “$” in the same article, you are probably reading about Elections, Campaign Finance Reform, Super-PACs, Attack Ads, and maybe even Criminal Investigations.
This is not one of those articles.
Today I’m here to remind you that on November 13th, you can “Win, Win! Win!!!” big prizes if you have a tough Lucene/Solr question that manages to Stump The Chump!
- 1st Prize: $100 Amazon gift certificate
- 2nd Prize: $50 Amazon gift certificate
- 3rd Prize: $25 Amazon gift certificate
To enter: just email your tough question to our panel of judges via email@example.com any time until the day of the session. Even if you won’t be able to attend the conference in D.C., you can still participate — and maybe win a prize — by emailing in your tricky questions.
The post Economic Stimulus from Washington: Prizes for Stumping The Chump! appeared first on Lucidworks.
Winchester, MA DuraSpace’s open source projects—DSpace, Fedora, and VIVO—are officially launching the nominations phase of the Leadership Group elections to expand the community's role in setting strategic direction and priorities for each project.
New Project Governance
A step toward establishing an operational, long-term preservation system shared across the academy
A few years ago, it seemed as if everyone was talking about the semantic web as the next big thing. What happened? Are there still startups working in that space? Are people still interested?
Note that “linked data” is basically talking about the same technologies as “semantic web”, it’s sort of the new branding for “semantic web”, with some minor changes in focus.
The top-rated comment in the discussion says, in part:
A bit of background, I’ve been working in environments next to, and sometimes with, large scale Semantic Graph projects for much of my career — I usually try to avoid working near a semantic graph program due to my long histories of poor outcomes with them.
I’ve seen uncountably large chunks of money put into KM projects that go absolutely nowhere and I’ve come to understand and appreciate many of the foundational problems the field continues to suffer from. Despite a long period of time, progress in solving these fundamental problems seem hopelessly delayed.
The semantic web as originally proposed (Berners-Lee, Hendler, Lassila) is as dead as last year’s roadkill, though there are plenty out there that pretend that’s not the case. There’s still plenty of groups trying to revive the original idea, or like most things in the KM field, they’ve simply changed the definition to encompass something else that looks like it might work instead.
The reasons are complex but it basically boils down to: going through all the effort of putting semantic markup with no guarantee of a payoff for yourself was a stupid idea.
The entire comment, and, really the entire thread, are worth a read. There seems to be a lot of energy in libraryland behind trying to produce “linked data”, and I think it’s important to pay attention to what’s going on in the larger world here.
Especially because much of the stated motivation for library “linked data” seems to have been: “Because that’s where non-library information management technology is headed, and for once let’s do what everyone else is doing and not create our own library-specific standards.” It turns out that may or may not be the case. If your motivation for library linked data was “so we can be like everyone else,” that simply may not be an accurate motivation; everyone else doesn’t seem to be heading there in the way people hoped a few years ago.
On the other hand, some of the reasons that semantic web/linked data have not caught on are commercial and have to do with business models.
One of the reasons that whole thing died was that existing business models simply couldn’t be reworked to make it make sense. If I’m running an ad driven site about Cat Breeds, simply giving you all my information in an easy to parse machine readable form so your site on General Pet Breeds can exist and make money is not something I’m particularly inclined to do. You’ll notice now that even some of the most permissive sites are rate limited through their API and almost all require some kind of API key authentication scheme to even get access to the data.
It may be that libraries and other civic organizations, without business models predicated on competition, may be a better fit for implementation of semantic web technologies. And the sorts of data that libraries deal with (bibliographic and scholarly) may be better suited for semantic data as well compared to general commercial business data. It may be that at the moment libraries, cultural heritage, and civic organizations are the majority of entities exploring linked data.
Still, the coarsely stated conclusion of that top-rated HN comment is worth repeating:
going through all the effort of putting semantic markup with no guarantee of a payoff for yourself was a stupid idea.
Putting data into linked data form simply because we’ve been told that “everyone is doing it”, without carefully understanding the use cases such reformatting is supposed to benefit and making sure that it does, risks undergoing great expense for no payoff. Especially when everyone is not in fact doing it.
GIGO
Taking the same data you already have and reformatting as “linked data” does not necessarily add much value. If it was poorly controlled, poorly modelled, or incomplete data before, it still is, even in RDF. You can potentially add a lot more value and more additional uses of your data by improving the data quality than by working to reformat it as linked data/RDF. The idea that simply reformatting it as RDF would add significant value was predicated on the idea of an ecology of software and services built to use linked data, software and services exciting enough that making your data available to them would result in added value. That ecology has not really materialized, and it’s hardly clear that it will (and to the extent it does, it may only be if libraries and cultural heritage organizations create it; we are unlikely to get a free ride on more general tools from a wider community).
But please do share your data
To be clear, I still highly advocate taking the data you do have and making it freely available under open (or public domain) license terms. In whatever formats you’ve already got it in. If your data is valuable, developers will find a way to use it, and simply making the data you’ve already got available is much less expensive than trying to reformat it as linked data. And you can find out if anyone is interested in it. If nobody’s interested in your data as it is, I think it’s unlikely the amount of interest will be significantly greater after you model it as ‘linked data’. The ecology simply hasn’t arisen to make using linked data any easier or more valuable than using anything else (in many contexts and cases, it’s more troublesome and challenging than less abstract formats, in fact).
Following the bandwagon vs doing the work
Part of the problem is that modelling data is inherently a context-specific act. There is no universally applicable model — and I’m talking here about the ontological level of entities and relationships, what objects you represent in your data as distinct entities and how they are related. Whether you model it as RDF or just as custom XML, the way you model the world may or may not be useful or even usable by those in different contexts, domains and businesses. See “Schemas aren’t neutral” in the short essay by Cory Doctorow linked to from that HN comment. But some of the linked data promise is premised on the idea that your data will be both useful and integrate-able nearly universally with data from other contexts and domains.
These are not insoluble problems, they are interesting problems, and they are problems that libraries as professional information organizations rightly should be interested in working on. Semantic web/linked data technologies may very well play a role in the solutions (although it’s hardly clear that they are THE answer).
It’s great for libraries to be interested in working on these problems. But working on these problems means working on these problems, it means spending resources on investigation and R&D and staff with the right expertise and portfolio. It does not mean blindly following the linked data bandwagon because you (erroneously) believe it’s already been judged as the right way to go by people outside of (and with the implication ‘smarter than’) libraries. It has not been.
For individual linked data projects, it means being clear about what specific benefits they are supposed to bring to use cases you care about — short and long term — and what other outside dependencies may be necessary to make those benefits happen, and focusing on those too. It means understanding all your technical options and considering them in a cost/benefit/risk analysis, rather than automatically assuming RDF/semantic web/linked data and as much of it as possible.
It means being aware of the costs and the hoped for benefits, and making wise decisions about how best to allocate resources to maximize chances of success at those hoped for benefits. Blindly throwing resources into taking your same old data and sharing it as “linked data”, because you’ve heard it’s the thing to do, does not in fact help.
Filed under: General
From October 13 - 16, 2014, I had the opportunity to go to (and the privilege to present at) Islandora Camp Colorado (http://islandora.ca/camps/co2014). These were four fairly intensive days, including a last day workshop looking to the future with Fedora Commons 4.x. We had a one day introduction to Islandora, a day of workshops, and a final day of community presentations on how Libraries (and companies that work with Libraries such as ours) are using Islandora. The future looks quite interesting for the relationship between Fedora Commons and Drupal.
- The new version of Islandora allows you to regenerate derivatives on the fly. You can specify which datastreams are derivatives of (what I am calling) parent datastreams. As a result, the new feature allows you to regenerate a derivative through the UI or possibly via Drush, which is something the Colorado Alliance is working to have working with the ...
Last updated October 28, 2014. Created by Peter Murray on October 28, 2014.
Important note: this is not a required upgrade from 1.2.x. Only new users, those wanting to try out 14.04, or DuraCloud account holders need this release.
Today I found the following resources and bookmarked them:
- ZenHub.io ZenHub provides a project management solution to GitHub with customizable task boards, peer feedback, file uploads, and more.
- Thingful Thingful® is a search engine for the Internet of Things, providing a unique geographical index of connected objects around the world, including energy, radiation, weather, and air quality devices as well as seismographs, iBeacons, ships, aircraft and even animal trackers. Thingful’s powerful search capabilities enable people to find devices, datasets and realtime data sources by geolocation across many popular Internet of Things networks
- Zanran Numerical Data Search Zanran helps you to find ‘semi-structured’ data on the web. This is the numerical data that people have presented as graphs and tables and charts. For example, the data could be a graph in a PDF report, or a table in an Excel spreadsheet, or a barchart shown as an image in an HTML page. This huge amount of information can be difficult to find using conventional search engines, which are focused primarily on finding text rather than graphs, tables and bar charts.
- Gwittr Gwittr is a Twitter API based search website. It allows you to better search any Twitter account for older tweets, linked web pages and pictures.
- ThingLink Easily create interactive images and videos for your websites, infographics, photo galleries, presentations and more!
- NFAIS: Innovation for Today’s Chemical Researchers
- How Search Works
- NFAIS: Making the Most of Published Literature
We asked our LITA Midwinter Workshop Presenters to tell us a little more about themselves and what to expect from their workshops in January. This week, we’re hearing from Wayne Johnston, who will be presenting the workshop:
Developing mobile apps to support field research
(For registration details, please see the bottom of this blog post)
LITA: Can you tell us a little more about you?
Wayne: I am currently Head of Research Enterprise and Scholarly Communication at the University of Guelph Library. Prior to joining the Library I worked for the United Nations in both New York and Geneva. My international experience includes work I’ve done in Ghana, Nepal, Croatia and Canada’s Arctic.
LITA: Who is your target audience for this workshop?
Wayne: I think this workshop will be most relevant to academic librarians who are supporting research activity on their campuses. It may be of particular interest to those working in research data management. Beyond that, anyone interested in mobile technology and/or open source software will find the workshop of interest.
LITA: How much experience with programming do attendees need to succeed in the workshop?
Wayne: None whatsoever. Some experience with examples of field research undertaken by faculty and/or graduate students would be useful.
LITA: If you were a character from the Marvel or Harry Potter universe, which would it be, and why?
Wayne: How about the Silver Surfer? By living vicariously through the field research I support I feel that I glide effortlessly to the far corners of the world.
LITA: Name one concrete thing your attendees will be able to take back to their libraries after participating in your workshop.
Wayne: You will be equipped to enable researchers on your campus to dispense with paper data collection and discover new efficiencies and data security by using mobile technology.
LITA: What kind of gadgets/software do your attendees need to bring?
Wayne: Nothing required but any mobile devices would be advantageous. If possible, have an app that enables you to read QR codes.
LITA: Respond to this scenario: You’re stuck on a desert island. A box washes ashore. As you pry off the lid and peer inside, you begin to dance and sing, totally euphoric. What’s in the box?
Wayne: A bottle of craft beer.
http://alamw15.ala.org/
Registration start page: http://alamw15.ala.org/rates
LITA Workshops registration descriptions: http://alamw15.ala.org/ticketed-events#LITA
When you start the registration process and BEFORE you choose the workshop, you will encounter the Personal Information page. On that page there is a field to enter the discount promotional code: LITA2015, as in the example below. If you do so, then when you get to the workshop selection page the discounted price of $235 is automatically displayed and entered. The discounted total will be reflected in the Balance Due line on the payment page. Please contact the LITA Office if you have any registration questions.