The Code4Lib 2015 Program Committee is happy to announce that voting is now open for prepared talks.
To vote, visit http://vote.code4lib.org/election/33, review the proposals, and assign points to those presentations you would like to see on the program this year.
You will need to log in with your code4lib.org account in order to vote. If you have any issues with your account, please contact Ryan Wick at firstname.lastname@example.org.
Voting will end on Tuesday, November 25, 2014 at 11:59:59 PM PT (GMT-8).
The top 10 proposals are guaranteed a slot at the conference. The Program Committee will curate the remainder of the program in an effort to ensure diversity in program content and presenters. Community votes will still weigh heavily in these decisions.
The final list of presentations will be announced in early- to mid-December.
For more information about Code4Lib 2015, visit
France may not have any money left for its universities but it does have money for academic publishers.
While university presidents learn that their funding is to be reduced by EUR 400 million, the Ministry of Research has decided, in great secrecy, to pay EUR 172 million to Elsevier, the world leader in scientific publishing.
In an exclusive piece published by the French news outlet Rue89 (Le Monde press group), Open Knowledge France members and open science evangelists Pierre-Carl Langlais and Rayna Stamboliyska released the agreement between the French Ministry and Elsevier. The post originally appeared here, in French.

The Work of Volunteers
The scientific publishing market is an unusual sector: those who create value are never remunerated. Instead, they often pay to see their work published. Authors receive no direct financial gain from their articles, and peer review is conducted voluntarily.
This enormous amount of work is indirectly funded by public money. Writing articles and participating in peer review are part of the expected activities of researchers, expected activities that lead to further research funding from the taxpayer.
Scientific publishing is centred around several privately-held publishing houses who own the journals where scientific research is published. Every journal has an editorial review board who receive potential contributions which are then sent to volunteer scientists for peer review. It is on the basis of comments and feedback from the peer review process that a decision is made whether an article is to be published or rejected and returned to the author(s).
When an article is accepted, the authors usually sign their copyright over to the publisher, which then sells access to the work; alternatively, authors can choose to make their work available to everyone, which often involves paying a fee. In some cases journals are paid only for the service of publishing an article, which is thenceforth free to the reader. But some journals operate a mixed ‘hybrid’ model: authors pay to publish some articles, while their library still pays to purchase the rest of the journal. This is called ‘double dipping’, and while publishers claim they take it into account in their journal pricing, the secrecy around publisher contracts and the lack of data make it impossible to tell where money is flowing.
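The ‘double dipping’ arithmetic can be sketched in a few lines of Python. All figures below are hypothetical, invented purely for illustration:

```python
# Hypothetical figures illustrating "double dipping": the library's
# subscription price is not reduced when authors at the same
# institution also pay to make individual articles open.
subscription = 15000   # EUR, annual subscription to one hybrid journal
apc = 2500             # EUR, article processing charge per open article
open_articles = 10     # articles the institution paid to make open

total_paid = subscription + apc * open_articles
print(f"The institution pays EUR {total_paid} for one journal year")
```

Without pricing transparency, there is no way to verify whether the subscription line ever shrinks to offset the APC line.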
Huge Profit Margins
This is important because access to these journals is rarely cheap, and publishers sell access primarily to academic libraries and research laboratories. In other words, financial resources for the publication of scientific papers come from funds granted to research laboratories; access to the journals these papers are published in is purchased by the same institutions. In both cases, the purchases are subsidised by the public.
The main actors in scientific publishing generate considerable income. In fact, the sector is dominated by an oligopoly with “the big four” sharing most of the global pie:
- The Dutch Elsevier
- The German Springer
- The American Wiley
- The English Informa
They draw huge profits: from 30% to 40% annual net profit in the case of Elsevier and Springer.
In other words, these four major publishers resell to universities content that the institutions themselves have produced.
In this completely closed market, competition does not exist and tacit agreement is the rule: subscription prices have soared for thirty years, even as the cost of publishing, in the era of electronic publishing, has never been lower. For example, an annual subscription to Elsevier’s journal ‘Brain Research’ costs a whopping EUR 15,000.
The Ministry Shoulders This Policy
The agreement between France and Elsevier amounted to ca. EUR 172 million for 476 universities and hospitals.
The first payment (approximately EUR 34 million of public money) was paid in full in September 2014. In return, 476 public institutions will have access to a body of about 2,000 academic journals.
This published research was mainly financed by public funds. Therefore in the end, we will have paid to Elsevier twice: once to publish, a second time to read.
This is not a blip. The agreement between Elsevier and the government is established policy. In March 2014, Geneviève Fioraso, Minister of Higher Education and Research, presented the main foci of her political agenda to the Academy of Sciences, two of which involve privileged interactions with Elsevier. This is the first time that negotiating the right to read for hundreds of public research institutions and universities has been managed at the national level.
One could argue in favour of the Ministry’s benevolence vis-à-vis public institutions to the extent it supports this vital commitment to research. Such an argument would, however, fail to highlight multiple issues. Among these, we would pinpoint the total opacity in the choice of supplier (why Elsevier in particular?) and the lack of competitive pitch between several actors (for such an amount, open public tendering is required). The major problem preventing competition is the monopolistic hold of publishers over knowledge: no-one else has the right to sell that particular article on cancer research that a researcher in Paris requires for their work, so there is little choice but to continue paying the individual publishers under the current system. Their hold only expires with copyright, which lasts 70 years from the death of the last author and is therefore entirely incompatible with the timeline of scientific discovery.
Prisoners of a game with pre-set rules, the negotiators (the Couperin consortium and the Bibliographic Agency for Higher Education, abbreviated as ABES) have not had much room for negotiation. As noted, a competitive pitch did not happen. Article 4 of the agreement is explicit: “Market for service provision without publication and without prior competition, negotiated with a particular tenderer for reasons connected with the protection of exclusive distribution rights.”
Thus a strange setup materialises that lets Elsevier keep its former customers in its pocket: research organisations that already have a contract with the publisher can only join the national license if they accept a cost increase (from 2.5% to 3.5%). Those without a previous contract are unaffected.
How Many Agreements of the Sort?
To inflate the bill even more, Elsevier sells bundles of journals (its ‘flagship journals’): “No title considered as a ‘flagship journal’ (as listed in Annex 5) can be withdrawn from the collection the subscribers can access” (art. 6.2). These ‘flagship journals’ cannot all claim outstanding impact factors. Moreover, they are not equally relevant across disciplines and scientific institutions.
The final price has been reduced from the initial February estimate: “only” EUR 172 million instead of EUR 188 million. Yet this discount does not appear to be a gift from Elsevier: numerous institutions have withdrawn from the national license, and of the 642 partners in February, only 476 remain in the final deal.
Needless to say, the situation is outrageous. Yet it is just one agreement with one of several vendors. A recent report by the French Academy of Sciences [http://www.academie-sciences.fr/presse/communique/rads_241014.pdf] alluded to a total of EUR 105 million annually dedicated to acquiring access to scientific publications. This figure, however, falls far below reality. The French agreement with Elsevier grants access to publications only to some of the research institutions and universities in France, and yet this one publisher already takes EUR 33-35 million per year. The actual costs plausibly reach a total of EUR 200-300 million.
An alternative exists.
Elsewhere in Europe…
An important international movement has emerged and developed promoting and defending a free and open access to scientific publications. The overall goal is to make this content accessible and reusable to anyone.
As a matter of fact, researchers have no interest whatsoever in maintaining the current system. Copyright in scholarly publishing does not remunerate authors and thus constitutes a fiction whose main purpose is to perpetuate the publisher’s rights. Not only does this enclosure limit access to scientific publications; it also prevents researchers from reusing their own work, as they often concede their copyright when signing publication agreements.
The main barrier to opening up access to publications appears to stem from the government. No action is taken for research to be released from the grip of oligopolistic publishers. Assessment of publicly funded research focuses on journals referred to as “qualifying” (that is, journals mainly published by big editors). Some university departments even consider that open access publications are, by default, “not scientific”.
Several European Countries lead the way:
- Germany has passed a law limiting publishers’ exclusive rights to one year. Once the embargo has expired, researchers are free to republish their work and allow open access to it. More details here.
- Negotiations have been halted in Elsevier’s home base, the Netherlands. Even though Elsevier pays most of its taxes there, the Dutch government fully supports the demands of researchers and librarians, aiming to open up the whole corpus of Dutch scientific publications by 2020. More details here.
The most chilling potential effect of the Elsevier deal is removing, for five years, any possible collective incentive to an ambitious French open access policy. French citizens will continue to pay twice for research they cannot read. And the government will sustain a closed and archaic editorial system whose defining feature is to single-handedly limit the right to read.
For astronomers, it might be once in a few million years when a key comet comes back around. For a soccer-crazed world, it’s every four years until the World Cup is back in play. For library-focused copyright “geeks” the not-really-magic-at-all interval is 3 years. That’s how often librarians, educators, disabled persons, internet security researchers, technologists, businesses and anyone else who needs access to copyrighted digital information secured with digital locks has to apply for special exemptions from Section 1201 of the Digital Millennium Copyright Act (DMCA). That’s the part of the law that prohibits the “circumvention” of “technological protection measures” (TPMs) employed by the owners of copyrighted works to block unauthorized access to them.
Unfortunately, TPMs also block perfectly lawful uses that don’t require prior authorization, such as fair uses for education or journalism or converting text to speech for the print disabled. Nonetheless, it’s still an actual crime under the DMCA to circumvent TPMs even if the eventual use of the protected material is perfectly legal unless a specific exemption to circumvent (i.e., pick the digital lock) for that otherwise lawful use has been granted! To make matters worse, even once the expensive and time consuming case for an exemption has been made successfully, the recipient of the exemption has to make the case in full to the U.S. Copyright Office all over again every three years.
ALA, in tandem with other library organizations, has actively participated in this so-called “Triennial Rulemaking” process at every opportunity since the DMCA was passed in 1998. This time around, in conjunction with the Association of College and Research Libraries (ACRL) and Association of Research Libraries (ARL), ALA is pursuing the renewal of two critical exemptions. One has permitted the circumvention of TPMs that otherwise would prevent educators from incorporating film excerpts into their lectures and curricula. The second allows TPMs to be worked around so that e-reader devices may be freely used by the print disabled to convert digitally “locked” text into accessible speech. Many others have filed for a wide range of other exemptions to facilitate all kinds of valuable commercial and non-commercial activities. Our friends at the Electronic Frontier Foundation have helpfully created and will maintain this online repository for all filings made in the 2015 Rulemaking.
The latest Triennial Rulemaking is just getting going, but libraries will continue to lobby Congress and the Copyright Office for sensible changes to the rulemaking process that would alleviate some of the significant and senseless burdens placed upon those who seek exemption renewal, particularly in the absence of any opposition. With luck, it won’t take until a comet comes around again for such common-sense changes to be made.
The post Libraries again fight for exemptions from “Digital Locks” copyright law appeared first on District Dispatch.
Today we are delighted to put out our formal announcement for a new Executive Director. In our announcement about changes in leadership in September we had already indicated we would be looking to recruit a new senior executive and we are now ready to begin the formal process.
We are very excited to have this opportunity to bring someone new on board. Please do share this with your networks and especially anyone in particular you think would be interested. We emphasize that we are conducting a world-wide search for the very best candidates, although the successful candidate would ideally be able commute to London or Berlin as needed.
Full role details are below – to apply or to download further information on the required qualifications, skills and experience for the role, please visit http://www.perrettlaver.com/candidates quoting reference 1841. The closing date for applications is 9am (GMT) on Monday, 8th December 2014.

Role Details
Open Knowledge is a multi-award winning international not-for-profit organisation. We are a network of people passionate about openness, using advocacy, technology and training to unlock information and enable people to work with it to create and share knowledge. We believe that by creating an open knowledge commons and developing tools and communities around this we can make a significant contribution to improving governance, research and the economy. We’re changing the world by promoting a global shift towards more open ways of working in government, arts, sciences and much more. We don’t just talk about ideas, we deliver extraordinary software, events and publications.
We are currently looking for a new Executive Director to lead the organisation through the next exciting phase of its development. Reporting into the Board of Directors, the Executive Director will be responsible for setting the vision and strategic direction for the organisation, developing new business and funding opportunities and directing and managing a highly motivated team. S/he will play a key role as an ambassador for Open Knowledge locally and internationally and will be responsible for developing relationships with key stakeholders and partners.
The ideal candidate will have strong visionary and strategic skills, exceptional personal credibility, a strong track record of operational management of organisations of a similar size to Open Knowledge, and the ability to influence at all levels both internally and externally. S/he will be an inspiring, charismatic and engaging individual, who can demonstrate a sound understanding of open data and content. In addition, s/he must demonstrate excellent communication and stakeholder management skills as well as a genuine passion for, and commitment to, the aims and values of Open Knowledge.
To apply or to download further information on the required qualifications, skills and experience for the role, please visit http://www.perrettlaver.com/candidates quoting reference 1841. The closing date for applications is 9am (GMT) on Monday, 8th December 2014.
The role is flexible in terms of location but ideally will be within commutable distance of London or Berlin (relocation is possible) and the salary will be competitive with market rate.
We are proud to announce an updated screencast which demos the increased functionality and updated user interface of the PeerLibrary website. This screencast debuted at the Mozilla Festival in October as part of our science fair presentation. The video showcases an article by Paul Dourish and Scott D. Mainwaring entitled “Ubicomp’s Colonial Impulse” as well as the easy commenting and discussion features which PL emphasizes. One of the MozFest conference attendees actually recognized the article which drew him towards our booth and into a conversation with our team. Check out the new screencast and let us know what you think!
Mozilla Festival brings developers, educators, and tech enthusiasts from a variety of fields together with the common goal of promoting and building the open web. Among others, some of the sessions most relevant to PeerLibrary’s goals included “Community Building” and “Science and the Web”. A delegation from the PeerLibrary team presented at the science fair on the first evening of the conference. This provided an opportunity to reconnect with some of our UK based supporters and contributors as well as introduce the platform to hundreds of MozFest attendees. We received valuable feedback from the web dev community and have a slew of new features and improvements to consider implementing in the coming months. Another phenomenal conference and we’re already looking forward to MozFest 2015!
Winchester, MA: More than 400 delegates made the trip to Melbourne, Victoria, Australia in October to learn about current best practices in research support and to share innovative examples and ideas at the eResearch Australasia Conference. The annual conference focuses on how information and communications technologies help researchers collect, manage and reuse information.
Last updated November 10, 2014. Created by Peter Murray on November 10, 2014.
The first Islandora Camp of 2015 will be in Vancouver, BC from February 16 - 18, for our West Coast Islandorians and anyone else who would like to see beautiful British Columbia while learning about Islandora. Many thanks to our sponsor Simon Fraser University for making this camp possible!
If you have any questions about this or future camps, please contact us.
Our projects range in scope from fast-moving prototypes to long-term innovations. The best way to get a feel for what we do is by looking at some of our current efforts.
Perma.cc, a web archiving service that is powered by libraries
H2O, a platform for creating, sharing and adapting open course materials
Amber, a server side plugin to keep links working on blogs and websites
What you’ll do
- Work with our multi-disciplinary team to build elegant web tools
- Contribute to our broad vision for the Internet, libraries, and society
- Rely on your good design sense and user-centricity
- Create beautiful graphics and use modern web technologies to share them
- Have fun while producing meaningful work with fantastic folks
This is a term limited position running through Spring and Summer semesters (January-August 2015).
All nominees have been contacted, and the 19 (!) nominees included in this election are all potentially available to speak. The top two available vote recipients will be invited to be our keynote speakers this year. Voting will end on Tuesday, November 18, 2014 at 8:00 PM PDT.
When rating nominees, please consider whether they are likely to be an excellent contributor in each of the following areas:
1) Appropriateness. Is this speaker likely to convey information that is useful to many members of our community?
2) Uniqueness. Is this speaker likely to cover themes that may not commonly appear in the rest of the program?
3) Contribution to diversity. Will this person bring something rare, notable, or unique to our community, through unusual experience or background?
If you have any issues with your code4lib.org account, please contact Ryan Wick at
This is my first R script, wordcloud.r:
#!/usr/bin/env Rscript

# wordcloud.r - output a wordcloud from a set of files in a given directory
# Eric Lease Morgan <email@example.com>
# November 8, 2014 - my first R script!

# configure
MAXWORDS    = 100
RANDOMORDER = FALSE
ROTPER      = 0

# require
library( NLP )
library( tm )
library( methods )
library( RColorBrewer )
library( wordcloud )

# get input; needs error checking!
input <- commandArgs( trailingOnly = TRUE )

# create and normalize corpus
corpus <- VCorpus( DirSource( input[ 1 ] ) )
corpus <- tm_map( corpus, content_transformer( tolower ) )
corpus <- tm_map( corpus, removePunctuation )
corpus <- tm_map( corpus, removeNumbers )
corpus <- tm_map( corpus, removeWords, stopwords( "english" ) )
corpus <- tm_map( corpus, stripWhitespace )

# do the work
wordcloud( corpus, max.words = MAXWORDS, random.order = RANDOMORDER, rot.per = ROTPER )

# done
quit()
Given the path to a directory containing a set of plain text files, the script will generate a wordcloud.
Like Python, R has a library well-suited to text mining: tm. Its approach to text mining (or natural language processing) is both similar and dissimilar to Python’s. The two are similar in that both aim to provide a means for analyzing large volumes of text; they are dissimilar in that they use different underlying data structures to get there. R might be more for the analytic person. Think statistics. Python may be more for the “literal” person, all puns intended. I will see if I can exploit the advantages of both.
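For a rough sense of the comparison, the normalization steps of the R script can be sketched in plain Python. There is no tm equivalent in the standard library, so the hypothetical word_frequencies helper and its tiny stop list below merely stand in for tm_map and stopwords, and the sketch stops at a frequency table rather than drawing a cloud:

```python
import re
from collections import Counter

# A tiny stop list standing in for tm's stopwords( "english" ).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def normalize(text):
    """Mirror the tm_map pipeline: lowercase, drop punctuation and numbers."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # remove punctuation and digits
    return [w for w in text.split() if w not in STOPWORDS]

def word_frequencies(docs, max_words=100):
    """Count words across a corpus, keeping at most max_words terms."""
    counts = Counter()
    for doc in docs:
        counts.update(normalize(doc))
    return counts.most_common(max_words)

corpus = [
    "Text mining in R and Python is fun.",
    "R is for the analytic person; Python is for the literal person.",
]
print(word_frequencies(corpus, max_words=5))
```

A real pipeline would use a fuller stop list and tokenizer, but the shape of the work, a chain of normalizing transformations ending in a term-frequency structure, is the same in both languages.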
Not coincidentally, LOCKSS “consists of a large number of independent, low-cost, persistent Web caches that cooperate to detect and repair damage to their content by voting in ‘opinion polls’” (PDF). In other words, gossip and anti-entropy.

The main use for gossip protocols is to disseminate information in a robust, randomized way, by having each peer forward information it receives from other peers to a random selection of other peers. As the function of LOCKSS boxes is to act as custodians of copyrighted information, this would be a very bad thing for them to do.
It is true that LOCKSS peers communicate via an anti-entropy protocol, and it is even true that the first such protocol they used, the one I implemented for the LOCKSS prototype, was a gossip protocol in the sense that peers forwarded hashes of content to each other. Alas, that protocol was very insecure. Some of the ways in which it was insecure related directly to its being a gossip protocol.
An intensive multi-year research effort in cooperation with Stanford's CS department to create a more secure anti-entropy protocol led to the current protocol, which won "Best Paper" at the 2003 Symposium on Operating System Principles. It is not a gossip protocol in any meaningful sense (see below the fold for details): peers never forward information they receive from other peers, and all interactions are strictly pair-wise and private.
For the TRAC audit of the CLOCKSS Archive we provided an overview of the operation of the LOCKSS anti-entropy protocol; if you are interested in the details of the protocol this, rather than the long and very detailed paper in ACM Transactions on Computer Systems (PDF), is the place to start.
According to Wikipedia, a gossip protocol is one that satisfies the following conditions:
- The core of the protocol involves periodic, pairwise, inter-process interactions.
- The information exchanged during these interactions is of bounded size.
- When agents interact, the state of at least one agent changes to reflect the state of the other.
- Reliable communication is not assumed.
- The frequency of the interactions is low compared to typical message latencies so that the protocol costs are negligible.
- There is some form of randomness in the peer selection. Peers might be selected from the full set of nodes or from a smaller set of neighbors.
- Due to the replication there is an implicit redundancy of the delivered information.
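Those conditions can be made concrete with a toy simulation (the peer count, fanout, and helper names below are invented for illustration): each informed peer forwards what it knows to a random selection of others, which is precisely the forwarding behaviour the LOCKSS protocol avoids.

```python
import random

def gossip_round(state, neighbors, fanout=2):
    """One round: every informed peer forwards the rumor to a few random peers."""
    informed = [p for p, knows in state.items() if knows]
    for peer in informed:
        targets = random.sample(neighbors[peer], min(fanout, len(neighbors[peer])))
        for target in targets:
            state[target] = True  # redundant deliveries are simply absorbed
    return state

# Fully connected network of 10 peers; only peer 0 starts with the rumor.
peers = list(range(10))
neighbors = {p: [q for q in peers if q != p] for p in peers}
state = {p: (p == 0) for p in peers}

rounds = 0
while not all(state.values()):
    state = gossip_round(state, neighbors)
    rounds += 1

print(f"All peers informed after {rounds} rounds")
```

The randomness and redundancy are what make gossip robust for dissemination, and also what make it unsuitable for nodes whose job is to hold, not spread, copyrighted content.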
The redundancy of preserved content in a LOCKSS network is a higher-level concept than the details of individual peer communication. The current protocol is a peer-to-peer consensus protocol.
I’ve been working with Linked Data off and on for a while now, but the last year has really been my deepest dive into it. Much of that dive involved writing a PHP library to interact with the WorldCat Discovery API. Since I started seeing how much could be done with Linked Data in discovery, I’ve been re-adjusting my worldview and acquiring a new skill set for working with Linked Data. This meant understanding the whole concept of triples and the subject, predicate, object nomenclature. In our recent blog posts on the WorldCat Discovery API, we touched on some of the basics of Linked Data. We also mentioned some tools for working with Linked Data in Ruby.
Library of Congress: The Signal: Digital Preservation Capabilities at Cultural Heritage Institutions: An Interview With Meghan Banach Bergin
The following is a guest post by Jefferson Bailey of Internet Archive and co-chair of the NDSA Innovation Working Group.
In this edition of the Insights Interview series we talk with Meghan Banach Bergin, Bibliographic Access and Metadata Coordinator, University of Massachusetts Amherst Libraries. Meghan is the author of a Report on Digital Preservation Practices at 148 Institutions. We discuss the results of her research and the implications of her work for digital preservation policies in general and at her institution in particular.
Jefferson: Thanks for talking with us today. Tell us about your sabbatical project.
Meghan: Thank you, I’m honored to be interviewed for The Signal blog. The goal of my sabbatical project last year was to investigate how various institutions are preserving their digital materials. I decided that the best way to reach a wide range of institutions was to put out a web-based survey. I was thrilled at the response. I received responses from 148 institutions around the world, roughly a third each were large academic libraries, smaller academic libraries and non-academic institutions (including national libraries, state libraries, public libraries, church and corporate archives, national parks archives, historical societies, research data centers and presidential libraries).
It was fascinating to learn what all of these different institutions were doing to preserve our cultural heritage for future generations. I also conducted phone interviews with 12 of the survey respondents from various types of institutions, which gave me some additional insight into the issues involved in the current state of the digital preservation landscape.
Jefferson: What made you choose this topic for your sabbatical research? What specific conclusions or insight did you hope to gain in conducting the survey?
Meghan: We have been working to build a digital preservation program over the last several years at the University of Massachusetts Amherst Libraries and I thought I could help move it forward by researching what other institutions are doing in terms of active, long-term preservation of digital materials. I was hoping to find systems or models that would work for us at UMass or for the Five Colleges Consortium.
Jefferson: How did you go about putting together the survey? Were there any specific areas that you wanted to focus on?
Meghan: I had questions about a lot of things, so I brainstormed a list of everything I wanted to know. When I reviewed the resulting list, four main areas of inquiry emerged: solutions, services, staffing and cost. I wanted to know what systems were being used for digital preservation and what digital preservation services were being offered, particularly at academic institutions. Here at UMass we currently offer research data curation services and digital preservation consulting services, but we don’t have a specific budget or staff devoted to digital preservation, which was why I also wanted to know what kind of staffing other institutions had devoted to their digital preservation programs and the cost of those programs.
Jefferson: What surprised you about the responses? Or what commonalities did you find in the answers that you hadn’t considered when writing the questions?
Meghan: I was surprised at the sheer volume and variety of tools and technologies being used to preserve digital materials. I think this shows that we are in an experimental phase and that everyone is trying to figure out what solutions will work best for different kinds of digital collections and materials, as well as what solutions will work best given the available staffing, skill sets and resources at their institutions. It also shows that there is a lot of development happening right now, and this makes me feel optimistic about the future of the digital preservation field.
Jefferson: Did any themes or trends emerge from reading people’s responses?
Meghan: Some common themes did emerge. Most people reported that budgets are tight and that they are trying to manage digital preservation with existing staff that also have other primary job responsibilities aside from digital preservation. Almost everyone I talked to said that they thought they needed additional staff. Also, most of those interviewed were not completely satisfied with the systems and tools they were using. One person said, “No system is perfect right now. It’s a matter of getting a good enough system.” Others mentioned various issues such as difficulties with interoperability between systems and tools, lack of functionality such as the ability to capture technical or preservation metadata or to migrate file formats, and struggles with implementation and use of the systems. People were using multiple systems and tools in an effort to get all of the different functionality they were looking for. One archivist described their methods as “piecemeal” and said that “It would be good if we could make these different utilities more systematic. Right now every collection is its own case and we need an overall solution.”
Jefferson: Your summary report does a nice job balancing the technical and managerial issues involved with digital preservation. Could you tell us a little bit more about what those are and what your survey revealed in these areas?
Meghan: The survey, and the follow-up phone interviews, highlighted the fact that people are dealing with a wide range of technical issues, including storage cost and capacity, the complexities of web archiving and video preservation, automating processes, the need for a technical infrastructure to support long-term digital preservation, the complexity of preserving a wide variety of formats, and keeping up with standards, trends, and technology, especially when there aren’t overall agreed-upon best practices. The managerial issues mainly centered around staffing levels, staff skill sets and funding.
Jefferson: I was curious to see that while 90% of respondents had “undertaken efforts to preserve digital materials” only 25% indicated they had a “written digital preservation policy.” What do you think accounts for this discrepancy? And, having recently contributed to writing a policy yourself, what would you say to those just starting to consider it?
Meghan: We were inspired to write our policy by Nancy McGovern’s Digital Preservation Management workshop, and we used an outline she provided at the workshop. It was time consuming, and I think that’s why a lot of institutions have decided to skip writing a policy and just proceed straight to actually doing something to preserve their digital materials. This approach has its merits, but we felt like writing the policy gave us the opportunity to wrap our heads around the issues, and having the policy in place provides us with a clearer path forward.
Some of the things we felt were important to define in our policy were the scope of what we wanted to preserve and the roles and responsibilities of the various stakeholders. To those who are just starting to consider writing a digital preservation policy, I would recommend forming a small group to talk through the issues and looking at lots of examples of policies from other institutions. Also, I would suggest looking at Library of Congress Junior Fellow Madeline Sheldon’s report Analysis of Current Digital Preservation Policies: Archives, Libraries and Museums.
Jefferson: Your survey also delved into both staffing and services being provided by institutions. Tell us a bit about some of your findings in those areas and, for staffing, how they compare to the NDSA Staffing Survey (PDF).
Meghan: Almost everyone said that they didn’t have enough staff. One librarian said, “No one is dedicated to working on digital preservation. It is hard to fulfill my main job duties and still find time to devote to working on digital preservation efforts.” Another stated that, “Digital preservation gets pushed back a lot, because our first concern is patron requests, getting collections in and dealing with immediate needs.” My survey results echoed the NDSA staffing survey findings in that almost every institution felt that digital preservation was understaffed, and that most organizations are retraining existing staff to manage digital preservation functions rather than hiring new staff. As far as services, survey respondents reported offering various digital preservation services such as consulting, education and outreach. However, most institutions are at the stage of just trying to raise awareness about the digital preservation services they offer.
Jefferson: Your conclusion poses a number of questions about the path forward for institutions developing digital preservation programs. How does the future look for your institution and what advice would you give to institutions in a similar place as far as program development?
Meghan: I think the future of our digital preservation program at UMass Amherst looks very positive. We have made great advances toward digital preservation over the last decade. We have implemented an institutional repository to manage and provide access to the scholarly output of the University, created a digital image repository to replace the old slide library and developed a Fedora-based repository system to manage and preserve our digital special collections and archives. We wrote our digital preservation policy to guide us in our path forward.
We are planning to join a LOCKSS PLN to preserve the content in our institutional repository; we just joined HathiTrust, which should provide digital preservation for the materials we have digitized through the Internet Archive; and we are working with the Five Colleges to test and possibly implement new digital preservation tools and technologies. It helps to have the support of the administration at your institution, which we are very fortunate to have. My guess is that we will see an increase in collaboration in the future, so my advice would be to pay attention to the development of national-level collaborative digital preservation initiatives and to think about state or regional opportunities to work together on digital preservation efforts.
Jefferson: Finally, after conducting the survey and writing your sabbatical report, how do you feel about the current state of digital preservation?
Meghan: I think it’s really encouraging to see institutions trying different technologies and experimenting with what will work even with limited resources and uncertainty over what the best solution might be. Despite the many challenges, we aren’t just throwing our hands up in the air and doing nothing. We are trying different things, sharing the results of our efforts with each other, and learning as a community. It’s an exciting time of innovation in the digital preservation field!
Below is my first Python script, concordance.py:
#!/usr/bin/env python2

# concordance.py - do KWIC search against a text
#
# usage: ./concordance.py <file> <word>
#
# Eric Lease Morgan <firstname.lastname@example.org>
# November 5, 2014 - my first real python script!

# require
import sys
import nltk

# get input; needs sanity checking
file = sys.argv[ 1 ]
word = sys.argv[ 2 ]

# do the work
text = nltk.Text( nltk.word_tokenize( open( file ).read( ) ) )
text.concordance( word )

# done
quit()
Given the path to a plain text file as well as a word, the script will output no more than twenty-five lines containing the given word. It is a keyword-in-context (KWIC) search engine, one of the oldest text mining tools in existence.
The script is my first foray into Python scripting. While Perl is cool (and “kewl”), it behooves me to learn the language of others if I expect good communication to happen. This includes others using my code and me using the code of others. Moreover, Python comes with a library (module) called the Natural Language Toolkit (NLTK), which makes it relatively easy to get my feet wet with text mining in this environment.
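For readers curious about what a concordance does under the hood, below is a minimal sketch of the KWIC idea in plain Python, with no NLTK dependency. The `kwic` function, its parameters, and the five-token context window are illustrative choices, not part of the script above:

```python
def kwic(text, word, width=30, max_lines=25):
    """Return up to max_lines keyword-in-context lines for word,
    showing a few tokens of context on either side of each match."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # compare case-insensitively, ignoring trailing punctuation
        if token.lower().strip('.,;:!?"\'') == word.lower():
            left = ' '.join(tokens[max(0, i - 5):i])
            right = ' '.join(tokens[i + 1:i + 6])
            lines.append(left.rjust(width)[-width:] + ' ' + token + ' ' + right[:width])
            if len(lines) == max_lines:
                break
    return lines

sample = "I went to the pond. The pond was frozen."
for line in kwic(sample, 'pond'):
    print(line)
```

NLTK's `concordance()` adds niceties such as tokenization-aware matching and fixed-width display, but the core operation is the same: find each occurrence of the word and print it with its surrounding context.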
One of the features of Islandora Camp is the camp t-shirt given to all attendees. Every camp has its own logo. This is the logo that won a free registration for our last Islandora Camp, in Denver:
We want to give a free registration and a couple of extra t-shirts to the iCampBC attendee who comes up with the best logo to represent our first trip to western Canada.
Entries will be accepted through January 3rd, 2015. Entries will be put up on the website for voting and a winner will be selected and announced January 10th, 2015.
Here are the details to enter:

The Rules:
- Camp Registration is not necessary to enter; anyone with an interest in Islandora is welcome to send in a design - however, the prize is a free registration, so you'll have to be able to come to camp to claim it.
- Line art and text are acceptable; photographs are not.
- You are designing for the front of the shirt for an area up to 12 x 12 inches. Your design must be a single image.
- Your design may be up to four colours. The t-shirt colour will be determined in part by the winning design.
- By entering the contest you agree that your submission is your own work. The design must be original, unpublished, and must not include any third-party logos (other than the Islandora logo, which you are free to use in your design) or copyrighted material.
The Prize:
- One free registration to Islandora Camp BC (or a refund if you are already registered)
- An extra t-shirt with your awesome logo
- Bragging rights
- Please submit the following by email to email@example.com:
- Your full name
- A brief explanation of your logo idea
- Your logo entry as an attachment. Minimum 1000 x 1000 pixels. High-resolution images in .eps or .ai format are preferred. We will accept .png and .jpg for the contest, but the winner must be able to supply a high resolution VECTOR art version of their entry if it is selected as the winner. Don't have a vector program? Try Inkscape - it's free!
- Entries will be accepted through January 3rd, 2015.
- Multiple entries allowed.
- Submissions will be screened by the Islandora Team before posting to the website for voting.
- By submitting your design, you grant permission for your design to be used by the Islandora project, including but not limited to website promotions, printed materials and (of course) t-shirt printing.
- We reserve the right to alter your image as necessary for printing requirements and/or incorporate the name and date of the camp into the final t-shirt design. You are free to include these yourself as part of your logo.
- The Islandora Team reserves the right to make the final decision.
Thank you and good luck!
Deployment of the latest software release was unsuccessful. We have restored the software to the previous release. We will let you know when another installation date is established.