Feed aggregator

Library of Congress: The Signal: From the Field: More Insight Into Digital Preservation Training Needs

planet code4lib - Mon, 2015-01-26 15:45

The following is a guest post by Jody DeRidder, Head of Digital Services at the University of Alabama Libraries.  This post reports on efforts in the digital preservation community that align with the Library’s Digital Preservation Outreach & Education (DPOE) Program. Jody, among many other accomplishments, has completed one of the DPOE Train-the-Trainer workshops and delivered digital preservation training online to the Association of Southeastern Research Libraries (ASERL).

Jody DeRidder

As previously discussed on The Signal, DPOE has conducted two surveys to better understand the digital preservation capacities of cultural heritage institutions. The respondents provide insight into their digital preservation practice, what types of training are necessary to address their staffing needs and preferences for the best delivery options of training events. Between the 2010 and 2014 DPOE surveys, I conducted an interim survey in 2012 to identify the digital preservation topics and types of materials most important to webinar attendees and their institutions. A comparison of the information uncovered by these three surveys provides insight into changing needs and priorities, and indicates what type of training is most needed and in what venues.

In terms of topics, technical training (to assist practitioners in understanding and applying techniques) is the clear top preference in all three surveys. In the 2010 DPOE survey, the highest percentage of respondents (32%) ranked technical training as their top choice. This was echoed in the 2014 DPOE survey as well. In my 2012 survey, this question was represented by multiple options. (Each of the rankings referenced is the percentage of participants who considered training in this topic to be extremely important.) The top two selected were training in “methods of preservation metadata extraction, creation, and storage” (77%) and “determining what metadata to capture and store” (68%). Both of these could easily be considered technical training.

Other technical training options included:

  • File conversion and migration issues (59%).
  • Validating files and capturing checksums (54%).
  • Monitoring status of files and media (53%).
  • How to inventory content to be managed for preservation (42%).

These preferences are echoed in the DPOE 2014 survey, where respondents identified training investments that result in “an increased capacity to work with digital objects and metadata management” as the most beneficial outcome with a three-year horizon.

In the 2010 DPOE survey, the need for “project management,” “management and administration,” and “strategic planning” followed “technical training” in priority (in that order). By 2014, this had shifted a bit: “strategic planning” led “management and administration,” followed by “project management.” Last in importance to participants in both surveys was fundamentals (described as “basic knowledge for all levels of staff”).

Has the need for strategic planning increased? Topics in the 2012 survey that related to management included:

  • Planning for provision of access over time (the third highest ranking: 65%).
  • Developing your institution’s preservation policy and planning team (51%).
  • Legal issues surrounding access, use, migration, and storage (43%).
  • Self-assessment and external audits of your preservation implementation (34%).

Strategic planning might include the following topics from the 2012 survey:

  • Developing selection criteria, and setting the scope for what your institution commits to preserving (52%).
  • Selecting file formats for archiving (45%).
  • Selecting storage options and number of copies (44%).
  • Security and disaster planning at multiple levels of scope (33%).
  • Business continuity planning (28%).

Thus it seems that in the 2012 survey, strategic planning was still secondary to management decisions, but that may have shifted, as indicated in the DPOE 2014 survey. A potential driving force for this shift could well be the increased investment in digital preservation in recent years.

When asked in 2010 about the types of digital content in organizational holdings, 94% of the respondents to the DPOE survey selected reformatted material digitized from collections, and 39.5% indicated deposited digital materials. In 2014 the reformatted content had dropped to 83%, deposited digital materials had increased to 44%, and a new category, “born digital,” was selected by over 76% of participants. Within these categories, digital images, PDFs and audiovisual materials were the most selected types of content, followed closely by office files. Research data and websites were secondary contenders, with architectural drawings third, followed by geospatial information and finally “other.”

From the 2012 survey, with the numbers representing percentages of the types of content in organizational holdings.

In the 2012 survey, participants were only asked to rank categories of digital content in terms of importance for preservation at their institution. Within this, 65% selected born-digital special collections materials as extremely important; 63% selected born-digital institutional records, and 61% selected digitized (reformatted) collections. “Other” was selected by 47%, and comments indicate that most of this was audiovisual materials, followed by state archives content and email records. The lowest categories selected were digital scholarly content (institutional repository or grey lit, at 37%); digital research data (34%), and web content (31%).

Clearly, preservation of born-digital content has now become a priority to survey respondents over the past few years, though concern for preservation of reformatted content continues to be strong. As the amount of born-digital content continues to pour into special collections and archives, the pressure to meet the burgeoning challenge for long-term access is likely to increase.

In both the 2010 and 2014 DPOE surveys, an overwhelming majority of participants (84%) expressed the importance of ensuring digital content is accessible for 10 years or more. Training is a critical requirement to support this process. While the 2012 survey focused only on webinars as an option, both of the DPOE surveys indicated that respondents preferred small, in-person training events, on-site or close to home. However, webinars were the second choice in both 2010 and 2014, and self-paced, online courses were the third choice in 2014. As funding restrictions on travel and training continue, an increased focus on webinars and nearby workshops will be best-suited to furthering the capacity for implementing long-term access for valuable digital content.

In the interest of high impact for low cost, the results of these surveys can help to fine-tune digital preservation training efforts in terms of topics, content and venues in the coming months.

Open Knowledge Foundation: Launching Open Data Day Coalition Micro-Grant Scheme: Apply Today!

planet code4lib - Mon, 2015-01-26 15:44

OPEN DATA DAY 2015 is coming, and a coalition of partners has come together to provide a limited number of micro-grants designed to support communities in organising ODD activities all over the world!

Open Data Day (ODD) is one of the most exciting events of the year. As a volunteer led event, with no organisation behind it, Open Data Day provides the perfect opportunity for communities all over the world to convene, celebrate and promote open data in ways most relevant to their fellow citizens. This year, Open Data Day will take place on Saturday, the 21st of February 2015, and a coalition of partners has come together to help make the event bigger (and hopefully better) than it has ever been before!

While Open Data Day has always been a volunteer led initiative, organising an event often comes with quite a hefty price tag. From hiring a venue, to securing a proper wifi connection, to feeding and caffeinating the volunteer storytellers, data wranglers and developers who donate their Saturday to ensuring that open data empowers citizens in their communities, there are costs associated with convening people! Our Open Data Day Coalition is made of open data, open knowledge and open access organisations who are interested in providing support for communities organising ODD activities. This idea emerged from an event that was organised in Kenya last year, where a small stipend helped local organisers create an amazing event, exposing a number of new people to open data. This is exactly what we are trying to achieve on Open Data Day!

As such, this year, for the first time ever, we are proud to announce the availability of a limited number of micro grants of up to $300 to help communities organise amazing events without incurring prohibitive personal costs. The coalition will also provide in-kind support in the form of mentorship and guidance, or simply by providing a list of suggested activities proven effective at engaging new communities!

The coalition consists of the following organisations (in alphabetical order): Caribbean Open Institute, Code for Africa, DAL, E-Democracy, ILDA, NDI, Open Access Button, Open Coalition, Open Institute, Open Knowledge, Sunlight Foundation and Wikimedia UK. Want to join? Read on.

Applying for a Microgrant!

Any group or organisation from any country can apply. Given the different focus areas of our partners, grants in Latin America will be handled and awarded by ILDA. In the Caribbean, the Caribbean Open Institute will handle the process. Finally, the Partnership for Open Data will focus on other low- to mid-income countries. Of course, in order to ensure that we are able to award the maximum number of grants, we will coordinate this effort!

You can find the application form here. The deadline to apply is February 3rd and we aim to let you know whether your grant was approved ASAP.

Currently, we have one micro grant, provided by The Sunlight Foundation, for a group organising open data day activities in a high income country. We would love to provide additional support for groups organising in any country; as such, if you are interested in helping us find (or have!) additional funding (or other forms of in-kind support such as an event space!), do get in touch (see below how to join the coalition). We will make sure to spread the word far and wide once we have additional confirmed support!

How to Apply for an Open Data Day Micro Grant

If you are organising an event and would like additional support, apply here. If your grant is approved, you will be asked to provide us with bank transfer details and proof of purchase. If it is not possible for you to make the purchases in advance and be reimbursed, we will be sure to find an alternative solution.

Is this your first Open Data Day event? Fear not! In addition to the grant itself, our coalition of partners is here to provide you with the support you need to ensure that your event is a success. Whether you need help publicising the event, deciding what to do, or some tips on event facilitation, we are here to help!


All groups who receive support will be asked to add their event to the map by registering it here, as well as by adding it to the list of events on the Open Data Day wiki.

After the event, event organisers will be asked to share a short blog post or video discussing the event! What data did you work with? How many people attended? Are you planning to organise additional events? We’d also love to hear what you learned, what the challenges were, and what you would have done differently.

You can publish this in any language, but if possible, we would love an English translation that we can share in a larger blog series about Open Data Day. If you would like to have your event included in our summary blog series but are not comfortable writing in English, write to us at local [at] okfn [dot] org and we will help you translate (or connect you with someone who can!).

What To Do Now

The next step is to start organising your event so that you can apply for your micro-grant ASAP! We are aware that we are a bit late getting started and that communities will need time to organise! As such, we aim to let you know whether your grant has been approved ASAP, and ideally by February 6th, 2015. If February 3rd proves to be too tight a deadline, we will extend!

Finally, if you need inspiration for what to do on the day, we are building a menu of suggested activities on the Open Data Day wiki. Go here for inspiration or add your ideas and inspire others! For further inspiration and information, check out the Open Data Day website, which the community will be updating and improving as we move closer to the big day. If you need help, reach out to us at local [at] okfn [dot] org, or check in with one of the other organisations in the coalition.

Interested in joining the coalition?

We have a limited number of grants available and expect large demand! If you are interested in joining the coalition and have financial and/or in-kind support available, do get in touch and help us make Open Data Day 2015 the largest open data hackday our community and the world has ever seen!

Patrick Hochstenbach: Homework assignment #4 Sketchbookskool

planet code4lib - Mon, 2015-01-26 13:40
Filed under: Doodles Tagged: fudenosuke, onion, portrait, sketchbookskool, watercolor

Cynthia Ng: Making Web Services Accessible With Universal Design

planet code4lib - Mon, 2015-01-26 05:38
This was presented as a webinar for the Education Institute on Thursday, January 22, 2015. This presentation is mostly an amalgamation of the Access 2014 and LibTechConf 2014 presentations. There are a couple of small sections (namely analytics, how ever did I forget about that?) that have been added, but a lot of it is … Continue reading Making Web Services Accessible With Universal Design

DuraSpace News: DuraCloud Services Presentations Set for PASIG–Early Bird Registration Fast Approaching!

planet code4lib - Mon, 2015-01-26 00:00

San Diego, CA  The upcoming 2015 Preservation and Archiving Special Interest Group (PASIG) event will be held March 11-13 on the campus of UC San Diego. The organizers are bringing together an international group of experts in a wide range of fields, dedicated to providing timely, useful information.

Roy Tennant: Wikipedia’s Waterloo?

planet code4lib - Sun, 2015-01-25 23:50

If you are involved in technology at all, you no doubt have heard about GamerGate. Normally at this point I would say that if you hadn’t heard about it, go read about it and come back.

But that would be foolish.

You would likely never come back. Perhaps it would be from disgust at how women have been treated by many male gamers. Perhaps it would be because you can’t believe you have just wasted hours of your life that you are never getting back. Or perhaps it is because you disappeared down the rat hole of controversy and won’t emerge until either hunger or your spouse drags you out. Whatever. You aren’t coming back. So don’t go before I explain why I am writing about this.

Wikipedia has a lot to offer. Sure, it has some gaping holes you could drive a truck through: just about any controversial subject can end up with a sketchy page as warring factions battle it out, and the lack of pages on women worthy of them is striking.

You see, it is well known that Wikipedia has a problem with female representation — both with the percentage of pages devoted to deserving women as well as the number of editors building the encyclopedia.

So perhaps it shouldn’t come as a surprise that Wikipedia has now sanctioned the editors trying to keep a GamerGate Wikipedia page focused on what it is really all about — the misogynistic actions of a number of male gamers. But the shocking part to me is that it even extends beyond that one controversy into really dangerous muzzling territory. According to The Guardian, these women editors* have been banned from editing “any other article about ‘gender or sexuality, broadly construed'”.

I find that astonishingly brutal. Especially for an endeavor that tries to pride itself on an egalitarian process.

Get your act together, Wikipedia.


* My bad. Editors were banned. They are not necessarily women. Or even feminists.

Nicole Engard: Bookmarks for January 25, 2015

planet code4lib - Sun, 2015-01-25 20:30

Today I found the following resources and bookmarked them:

  • Krita Open Source Software for Concept Artists, Digital Painters, and Illustrators

Digest powered by RSS Digest

The post Bookmarks for January 25, 2015 appeared first on What I Learned Today....

Related posts:

  1. Governments Urging the use of Open Source
  2. eXtensible Catalog (XC) gets more funding
  3. Evaluating Open Source

Casey Bisson: Photo hipster: playing with 110 cameras

planet code4lib - Sun, 2015-01-25 18:44

After playing with Fuji Instax and Polaroid (with The Impossible Project film) cameras, I realized I had to do something with Kodak. My grandfather worked for Kodak for years, and I have many memories of the stories he shared of that work. He retired in the late 70s, just as the final seeds of Kodak’s coming downfall were being sown, but well before anybody could see them for what they were.

The most emblematic Kodak camera and film I could think of was the 110 cartridge film type, and that’s what I used to capture this picture of Cliff Pearson and Millicent Prancypants.

I bought two cameras and a small bundle of film from various eBay sellers. They look small in the following photo, but they’re significantly larger and less pocketable than even my iPhone 6 plus.

Developing is $4 per cartridge at Adolph Gasser’s, but they can’t print or scan the film there, so that had me looking for other solutions. I couldn’t find a transparency scanner that had film holders for 110 film. That isn’t surprising, but it did leave me wondering and hesitant long enough to look for other ways to capture this film. For these shots I re-photographed them with my EOS M:

John Miedema: Writing has changed with digital technology, but much is the same. Pirsig’s slip-based writing system was inspired by information technology.

planet code4lib - Sun, 2015-01-25 16:41

Writing has changed with digital technology, but much is the same. The Lila writing technology builds on both the dynamic and static features.

Writers traditionally spend considerable time reading individual works closely and carefully. The emergence of big data and analytic technologies causes a shift toward distant reading, the ability to analyze a large volume of text in terms of statistical patterns. Lila uses these technologies to select relevant content for deeper reading.

Writing, as always, occurs in many locations, from a car seat to a coffee shop to a desk. Digital technology makes it easier to aggregate text from these different locations. Existing technologies like Evernote and Google Drive can gather these pieces for Lila to perform its cognitive functions.

Writing is performed on a variety of media. In the past it might have been napkins, stickies and binder sheets. Today it includes a greater variety, from cell phone notes to email and word processor documents. Lila can only analyze digital media. It is understood that there is still much text in the world that is not digital. Going forward, text will likely always be digital.

Writing tends to be more fragmented today, occurring in smaller units of text. Letter length is replaced with cell phone texts, tweets, and short emails. The phrase “too long; didn’t read” is used on the internet for overly long statements. Digital books are shorter than print books. Lila is expressly designed around a “slip” length unit of text, from at least a tweet length for a subject line, up to a few paragraphs. It would be okay to call a slip a note. Unlike tweets, there will be no hard limit on the number of characters.

A work is written by one or many authors. Print magazines and newspapers are compilations of multiple authors’ work, as are many websites. Books still tend to be written by a single author, but Lila’s function of compiling content into views will make it easier for authors to collaborate on a work with the complexity and coherence of a book.

In the past, the act of writing was more isolated. There was a clear separation between authors and readers. Today, writing is more social. Authors blog their way through books and get immediate feedback. Readers talk with authors during their readings. Fans publish their own spin on book endings. Lila extends reading and writing capabilities. I have considered additional capabilities with regard to publishing drafts to the web for feedback and iteration. A WordPress integration perhaps.

Pirsig’s book, Lila, was published in 1991, not long after the advent of the personal computer and just at the dawn of the web. His slip-based writing system used print index cards, but he deliberately chose that unit of text over pages because it allowed for “more random access.” He also categorized some slips as “program” cards, instructions for organizing other slips. As cards about cards, they were powerful, he said, in the way that John von Neumann explained the power of computers: “the program is data and can be treated like any other data.” Pirsig’s slip-based writing system was no doubt inspired by the developments in information technology.

Alf Eaton, Alf: Exploring a personal Twitter network

planet code4lib - Sun, 2015-01-25 13:59
PDF version
  1. Fetch the IDs of users I follow on Twitter, using vege-table:

    var url = '';
    var params = {
      screen_name: 'invisiblecomma',
      stringify_ids: true,
      count: 5000
    };

    var collection = new Collection(url, params);

    collection.items = function(data) {
      return data.ids;
    };

    collection.next = function(data) {
      if (!data.next_cursor) {
        return null;
      }
      params.cursor = data.next_cursor_str;
      return [url, params];
    };

    return collection.get('json');
  2. Using similar code, fetch the list of users that each of those users follows.

  3. Export the 10,000 user IDs with the highest intra-network follower counts.

  4. Fetch the details of each Twitter user:

    return Resource('', { user_id: user_id })
      .get('json')
      .then(function(data) {
        return data[0];
      });
  5. Process those two CSV files into a list of pairs of connected identifiers suitable for import into Gephi.

  6. In Gephi, drag the “Topology > In Degree Range” filter into the Queries section, and adjust the range until a small enough number of users with the most followers is visible:

  7. Set the label size to be larger for users with more incoming links:

  8. Set the label colour to be darker for users with more incoming links:

  9. Apply the ForceAtlas 2 layout, then the Expansion layout a few times, then the Label Adjust layout:

  10. Switch to the Preview window and adjust the colour and opacity of the edges and labels appropriately. Hide the nodes, set the label font to Roboto, then export to PDF.

  11. Use ImageMagick to convert the PDF to JPEG: convert -density 200 twitter-foaf.pdf twitter-foaf.jpg

It would probably be possible to automate this whole sequence - perhaps in a Jupyter Notebook. The part that takes the longest is fetching the data from Twitter, due to the low API rate limits.
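As a step toward that automation, step 5 above (turning the fetched follow lists into an edge list Gephi can import) could be sketched in Python with only the standard library. The CSV layout and file names here are assumptions for illustration, not the author's actual files:

```python
import csv

def write_edge_list(following_csv, out_csv, keep_ids):
    """Filter (follower, followed) ID pairs down to the network of
    interest and write a Source,Target CSV for Gephi's importer."""
    keep = set(keep_ids)
    with open(following_csv, newline="") as src, \
         open(out_csv, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["Source", "Target"])
        for row in csv.reader(src):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            follower, followed = row[0], row[1]
            # keep only edges where both endpoints are inside the network
            if follower in keep and followed in keep:
                writer.writerow([follower, followed])
```

Gephi's spreadsheet importer recognises the Source/Target header and builds a directed graph from the rows, after which the In Degree Range filtering described above applies unchanged.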

Mark E. Phillips: What do we put in our BagIt bag-info.txt files?

planet code4lib - Sat, 2015-01-24 23:03

The UNT Libraries makes heavy use of the BagIt packaging format throughout our digital repository infrastructure. I’m of the opinion that BagIt has contributed more toward moving digital preservation forward in the last ten years than any other single technology, service or specification. The UNT Libraries uses BagIt for our Submission Information Packages (SIP), our Archival Information Packages (AIP), our Dissemination Information Packages (DIP), and our local Access Content Packages (ACP).

For those that don’t know BagIt,  it is a set of conventions for packaging content into a directory structure in a consistent and repeatable way.  There are a number of other descriptions of BagIt that do a very good job of describing the conventions and some of the more specific bits of the specification.
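For illustration, a minimal bag might be laid out like this on disk (file names are hypothetical; only the overall layout is prescribed by the specification):

```
mybag/
    bagit.txt           (bag declaration: BagIt version and encoding)
    bag-info.txt        (optional bag-level metadata)
    manifest-md5.txt    (a checksum line for each payload file)
    data/               (the payload itself)
        image0001.tif
```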

There are a number of great tools for creating, modifying and validating BagIt bags; my favorite for a long time has been bagit-python from the Library of Congress. (To be honest, I usually use Ed Summers’ fork, which I grab from here.)

The BagIt specification includes a metadata file stored in the root of a bag; this file is called bag-info.txt. The specification defines a number of fields for this file, which are stored as key/value pairs in the format:

key: value

I thought it might be helpful for those new to using BagIt bags to see what kinds of information we are putting into these bag-info.txt files,  and also explain some of the unique fields that we are adding to the file for managing items in our system.  Below is a typical bag-info.txt file from one of our AIPs in the Coda Repository.

Bag-Size: 28.32M
Bagging-Date: 2015-01-23
CODA-Ingest-Batch-Identifier: f2dbfd7e-9dc5-43fd-975a-8a47e665e09f
CODA-Ingest-Timestamp: 2015-01-22T21:43:33-0600
Contact-Email:
Contact-Name: Mark Phillips
Contact-Phone: 940-369-7809
External-Description: Collection of photographs held by the University of North Texas Archives that were taken by Junebug Clark or other family members. Master files are tiff images.
External-Identifier: ark:/67531/metadc488207
Internal-Sender-Identifier: UNTA_AR0749-002-0016-0017
Organization-Address: P. O. Box 305190, Denton, TX 76203-5190
Payload-Oxum: 29666559.4
Source-Organization: University of North Texas Libraries
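Because bag-info.txt is plain key/value text, it is easy to read programmatically. A minimal standard-library sketch (not part of the UNT toolchain) that also handles the spec's indented continuation lines:

```python
def parse_baginfo(text):
    """Parse bag-info.txt content into a dict of field -> value.

    Per the BagIt spec, a line beginning with whitespace continues the
    previous field's value. This sketch keeps only the last occurrence
    of a repeated field, although the spec allows fields to repeat.
    """
    info = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t" and last_key:
            # continuation line: append to the previous value
            info[last_key] += " " + line.strip()
            continue
        key, _, value = line.partition(":")
        last_key = key.strip()
        info[last_key] = value.strip()
    return info
```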

In the example above, several of the fields are boilerplate, and others are machine generated.

Field                          How we create the value
Bag-Size                       Machine
Bagging-Date                   Machine
CODA-Ingest-Batch-Identifier   Machine
CODA-Ingest-Timestamp          Machine
Contact-Email                  Boilerplate
Contact-Name                   Boilerplate
Contact-Phone                  Boilerplate
External-Description           Changes per “collection”
External-Identifier            Machine
Internal-Sender-Identifier     Machine
Organization-Address           Boilerplate
Payload-Oxum                   Machine
Source-Organization            Boilerplate
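Of the machine-generated fields, Payload-Oxum deserves a note: the BagIt spec defines it as the total payload size in octets and the payload file count, joined by a dot. A quick standard-library sketch of how such a value could be computed (the directory name is hypothetical):

```python
import os

def payload_oxum(payload_dir):
    """Return 'octet-count.file-count' for everything under data/."""
    total_bytes = 0
    file_count = 0
    for root, _dirs, files in os.walk(payload_dir):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
            file_count += 1
    return "%d.%d" % (total_bytes, file_count)
```

A validator can compare this value against the stored Payload-Oxum to detect missing or truncated files cheaply, without re-hashing every payload file.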

You can tell from looking at the example bag-info.txt file above that some of the fields are self-explanatory. I’m going to run through a few of the fields that are either non-standard or that we made explicit decisions about as we implemented BagIt.

CODA-Ingest-Batch-Identifier is a UUID for each batch of content added to our Coda Repository. This helps us identify other items that may have been added during a specific run of our ingest process, which is helpful for troubleshooting.

CODA-Ingest-Timestamp is the timestamp when the AIP was added to the Coda Repository.

External-Description will change for each collection that gets processed; it has just enough information about the collection to help jog someone’s memory about where this item came from and why it was created.

External-Identifier is the ARK identifier assigned to the item on ingest into one of the Aubrey systems, where we access the items or manage the descriptive metadata.

Internal-Sender-Identifier is the locally important (often not unique) identifier for the item as it is being digitized or collected. It often takes the shape of an accession number from our University Special Collections, or the folder name of a newspaper issue.

We currently have 1,070,180 BagIt bags in our Coda Repository, and they have been instrumental in our ability to scale our digital library infrastructure and to verify that each item remains exactly as it was when we added it to our collection.

If you have any specific questions for me let me know on twitter.

John Miedema: Writing non-fiction is mostly reading, thinking, and sorting; the rest is just keystrokes. Lila is for writing non-fiction; poetry, not so much.

planet code4lib - Sat, 2015-01-24 16:31

Writing non-fiction is mostly reading, thinking, and sorting; the rest is just keystrokes. And style. Think clearly and the rest comes easy. Lila is designed to extend human writing capabilities by performing cognitive work:

  1. The work of reading, especially during the early research phase. Writers can simply drop unread digital content onto disk, and Lila will convert it into manageable chunks — slips. These slips are shorter than the full length originals, making them quicker to evaluate. More important, these slips are embedded in the context of relevant content written by the author; context is meaning, so unread content will be easier to evaluate.
  2. The work of analyzing content and sorting it into the best view, using visualization. As Pirsig said, “Instead of asking ‘Where does this metaphysics of the universe begin?’ – which was a virtually impossible question – all he had to do was just hold up two slips and ask, ‘Which comes first?'” This work builds a table of contents, a hierarchical view of the content. Lila will show multiple views so the author can choose the best one.
  3. The ability to uncover bias and ensure completeness of thought. Author bias may filter out content when reading, but Lila will compel a writer to notice relevant content.

Lila’s cognitive abilities depend on the author’s engagement in a writing project, generating content that guides the above work. Lila is designed expressly for the writing of non-fiction; poetry, not so much. The cognitive work is performed in most kinds of writing, though, so Lila will aid with other kinds of writing as well. Both fiction and creative non-fiction still require substantial stylistic work after Lila has done her part.

CrossRef: CrossRef Indicators

planet code4lib - Fri, 2015-01-23 21:13

Updated January 20, 2015

Total no. participating publishers & societies 5736
Total no. voting members 3022
% of non-profit publishers 57%
Total no. participating libraries 1926
No. journals covered 37,469
No. DOIs registered to date 71,820,143
No. DOIs deposited in previous month 648,271
No. DOIs retrieved (matched references) in previous month 46,260,320
DOI resolutions (end-user clicks) in previous month 134,057,984

CrossRef: New CrossRef Members

planet code4lib - Fri, 2015-01-23 21:06

Updated January 20, 2015

Voting Members
All-Russia Petroleum Research Exploration Institute (VNIGRI)
Barbara Budrich Publishers
Botanical Research Institute of Texas
Faculty of Humanities and Social Sciences, University of Zagreb
Graduate Program of Management and Business, Bogor Agricultural University
IJSS Group of Journals
IndorSoft, LLC
Innovative Pedagogical Technologies LLC
International Network for Social Network Analysts
Slovenian Chemical Society
Subsea Diving Contractor di Stefano Di Cagno Publisher
The National Academies Press
Wisconsin Space Grant Consortium

Represented Members
Artvin Coruh Universitesi Orman Fakultesi Dergisi
Canadian Association of Schools of Nursing
Indian Society for Education and Environment
Journal for the Education of the Young Scientist and Giftedness
Kastamonu University Journal of Forestry Faculty
Korean Society for Metabolic and Bariatric Surgery
Korean Society of Acute Care Surgery
The Korean Ophthalmological Society
The Pharmaceutical Society of Korea
Uludag University Journal of the Faculty of Engineering
YEDI: Journal of Art, Design and Science

Last updated January 12, 2015

Voting Members
Association of Basic Medical Sciences of FBIH
Emergent Publications
Kinga - Service Agency Ltd.
Particapatory Educational Research (Per)
Robotics: Science and Systems Foundation
University of Lincoln, School of Film and Media and Changer Agency
Uniwersytet Przyrodniczy w Poznaniu (Poznan University of Life Sciences)
Voronezh State University
Wyzsza Szkola Logistyki (Poznan School of Logistics)

Represented Members
Journal of the Faculty of Engineering and Architecture of Gazi University
Korean Insurance Academic Society
Korean Neurological Association
Medical Journal of Suleyman Demirel University

CrossRef: Upcoming CrossRef Webinars

planet code4lib - Fri, 2015-01-23 20:38

Introduction to CrossCheck
Date: Tuesday, Jan 27, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossRef Text and Data Mining
Date: Thursday, Jan 29, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossRef Technical Basics
Date: Wednesday, Feb 11, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Patricia Feeney

CrossCheck: iThenticate Admin Webinar
Date: Thursday, Feb 19, 2015
Time: 7:00 am (San Francisco), 10:00 am (New York), 3:00 pm (London)
Moderator: iThenticate

Introduction to CrossRef
Date: Wednesday, Mar 4, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Patricia Feeney

Introduction to CrossCheck
Date: Tuesday, Mar 17, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 3:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossRef Technical Basics
Date: Wednesday, Mar 18, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 3:00 pm (London)
Moderator: Patricia Feeney

Introduction to CrossRef Text and Data Mining
Date: Thursday, Mar 19, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 3:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossCheck
Date: Tuesday, May 5, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossRef Text and Data Mining
Date: Thursday, May 7, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossCheck
Date: Tuesday, July 21, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Introduction to CrossRef Text and Data Mining
Date: Thursday, July 23, 2015
Time: 8:00 am (San Francisco), 11:00 am (New York), 4:00 pm (London)
Moderator: Rachael Lammey

Ed Summers: Library of Alexandria v2.0

planet code4lib - Fri, 2015-01-23 19:16

In case you missed it, Jill Lepore has written a superb article for the New Yorker about the Internet Archive and archiving the Web in general. The story of the Internet Archive is largely the story of its creator Brewster Kahle. If you’ve heard Kahle speak you’ve probably heard the Library of Alexandria v2.0 metaphor before. As a historian Lepore is particularly tuned to this dimension of the story of the Internet Archive:

When Kahle started the Internet Archive, in 1996, in his attic, he gave everyone working with him a book called “The Vanished Library,” about the burning of the Library of Alexandria. “The idea is to build the Library of Alexandria Two,” he told me. (The Hellenism goes further: there’s a partial backup of the Internet Archive in Alexandria, Egypt.)

I’m kind of embarrassed to admit that until reading Lepore’s article I never quite understood the metaphor…but now I think I do. The Web is on fire and the Internet Archive is helping save it, one HTTP request and response at a time. Previously I couldn’t get past the image of this vast collection of Web content the Internet Archive is building as yet another centralized collection of valuable material that, like v1.0, is vulnerable to disaster or, more likely, as Heather Phillips writes, creeping neglect:

Though it seems fitting that the destruction of so mythic an institution as the Great Library of Alexandria must have required some cataclysmic event like those described above – and while some of them certainly took their toll on the Library – in reality, the fortunes of the Great Library waxed and waned with those of Alexandria itself. Much of its downfall was gradual, often bureaucratic, and by comparison to our cultural imaginings, somewhat petty.

I don’t think it can be overstated: like the Library of Alexandria before it, the Internet Archive is an amazingly bold and priceless resource for human civilization. I’ve visited the Internet Archive on multiple occasions, and each time I’ve been struck by how unlikely it is that such a small and talented team has been able to build and sustain a service with such impact. It’s almost as if it’s too good to be true. I’m nagged by the thought that perhaps it is.

Herbert van de Sompel is quoted by Lepore:

A world with one archive is a really bad idea.

Van de Sompel and his collaborator Michael Nelson have repeatedly pointed out just how important it is for there to be multiple archives of Web content, and for there to be a way for them to be discoverable, and work together. Another thing I learned from Lepore’s article is that Brewster’s initial vision for the Internet Archive was much more collaborative, which gave birth to the International Internet Preservation Consortium, which is made up of 32 member organizations who do Web archiving.

A couple of weeks ago one prominent IIPC member, the California Digital Library, announced that it was retiring its in-house archiving infrastructure and outsourcing its operation to Archive-It, the subscription web archiving service from the Internet Archive.

The CDL and the UC Libraries are partnering with Internet Archive’s Archive-It Service. In the coming year, CDL’s Web Archiving Service (WAS) collections and all core infrastructure activities, i.e., crawling, indexing, search, display, and storage, will be transferred to Archive-It. The CDL remains committed to web archiving as a fundamental component of its mission to support the acquisition, preservation and dissemination of content. This new partnership will allow the CDL to meet its mission and goals more efficiently and effectively and provide a robust solution for our stakeholders.

I happened to tweet this at the time:

good news for ArchiveIt and CDL, but probably bad news for web archiving in general

— Ed Summers (@edsu)

January 14, 2015

Which at least inspired some mirth from Jason Scott, who is an Internet Archive employee, and also a noted Internet historian and documentarian.

@edsu bwa ha ha

— Jason Scott (@textfiles)

January 14, 2015

Jason is also well known for his work with ArchiveTeam, which quickly mobilizes volunteers to save content on websites that are being shutdown. This content is often then transferred to the Internet Archive. He gets his hands dirty doing the work, and inspires others to do the same. So I deserved a bit of derisive laughter for my hand-wringing.

But here’s the thing. What does it mean if one of the pre-eminent digital library organizations needs to outsource its Web archiving operation? And what if, as the announcement indicates, Harvard, MIT, Stanford, UCLA, and others are not far behind? Should we be concerned that the technical expertise and infrastructure for doing this work is becoming consolidated in a single organization? What does it say about our Web archiving tools that it is more cost-effective for CDL to outsource this work?

The situation isn’t as dire as it might sound, since Archive-It subscribers retain the right to download their content and store it themselves. How many institutions do that with regularity isn’t well known (at least to me). But Web content isn’t like paper that you can put in a box in a climate-controlled room and return to years hence. As Matt Kirschenbaum has pointed out:

the preservation of digital objects is logically inseparable from the act of their creation — the lag between creation and preservation collapses completely, since a digital object may only ever be said to be preserved if it is accessible, and each individual access creates the object anew

Can an organization download their WARC content, not provide any meaningful access to it, and say that it is being preserved? I don’t think so. You can’t do digital preservation without thinking about some kind of access to make sure things are working and people can use the stuff. If the content you are accessing is on a platform somewhere else that you have no control over you should probably be concerned.
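To make the point concrete, here is a minimal sketch (not a full WARC parser) of reading the header of a single WARC record, illustrating that downloaded WARC content is only meaningfully preserved if you have working tooling to open and inspect it. Real work should use a library such as warcio; the record below is a hand-made, hypothetical example.

```python
# A hypothetical, minimal WARC record header (CRLF-delimited, per the format).
SAMPLE = (
    "WARC/1.0\r\n"
    "WARC-Type: response\r\n"
    "WARC-Target-URI: http://example.org/\r\n"
    "WARC-Date: 2015-01-23T19:16:00Z\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

def parse_warc_header(text):
    """Return (version, headers-dict) for one WARC record header block."""
    lines = text.split("\r\n")
    version = lines[0]          # e.g. "WARC/1.0"
    headers = {}
    for line in lines[1:]:
        if not line:            # blank line ends the header block
            break
        name, _, value = line.partition(": ")
        headers[name] = value
    return version, headers

version, headers = parse_warc_header(SAMPLE)
print(version, headers["WARC-Target-URI"])
```

Even this toy exercise shows why "a box of WARC files" is not preservation by itself: someone has to keep software like this running, against real crawl data, for the content to remain accessible.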

I’m hopeful that this partnership between CDL, Archive-It, and other organizations will prove fruitful and lead to improved tools. But I’m worried that it will let organizations simply outsource the expertise and infrastructure of web archiving, while reinforcing what is already a huge single point of failure. David Rosenthal of Stanford University notes that diversity is a vital component of digital preservation:

Media, software and hardware must flow through the system over time as they fail or become obsolete, and are replaced. The system must support diversity among its components to avoid monoculture vulnerabilities, to allow for incremental replacement, and to avoid vendor lock-in.

I’d like to see more Web archiving classes in iSchools and computer science departments. I’d like to see improved and simplified tools for doing the work of Web archiving. Ideally I’d like to see more in-house crawling and access of web archives, not less. I’d like to see more organizations like the Internet Archive that are not just technically able to do this work, but are also bold enough to collect what they think is important to save on the Web and make it available. If we can’t do this together I think the Library of Alexandria metaphor will be all too literal.

Islandora: Islandora Conference: Registration Now Open

planet code4lib - Fri, 2015-01-23 18:52

The Islandora Foundation is thrilled to invite you to the first-ever Islandora Conference, taking place August 3 - 7, 2015 in the birthplace of Islandora: Charlottetown, PEI.

This full-week event will consist of sessions from the Islandora Foundation and its interest groups, community presentations, and two full days of hands-on Islandora training, and will end with a Hackfest where we invite you to make your mark on the Islandora code and work together with your fellow Islandorians to complete projects selected by the community.

Our theme for the conference is Community - the Islandora community, the community of people our institutions serve, the community of researchers and librarians and developers who work together to curate digital assets, and the community of open source projects that work together and in parallel.

Registration is now open, with an Early Bird rate available until the end of March. Institutional rates are also available for groups of three or more.

For more information or to sign up for the conference, please visit our conference website.

Thank you,

The Islandora Team

M. Ryan Hess: Your Job Has Been Robot-sourced

planet code4lib - Fri, 2015-01-23 18:15

“People are racing against the machine, and many of them are losing that race…Instead of racing against the machine, we need to learn to race with the machine.”

- Erik Brynjolfsson, Innovation Researcher

Libraries are busy making lots of metadata and data networks. But who are we making this for anyway? Answer: The Machines

I spent the last week catching up on what the TED Conference has to say on robots, artificial intelligence and what these portend for the future of humans…all with an eye on the impact on my own profession: librarians.

A digest of the various talks would go as follows:

    • Machine learning and AI capabilities are advancing at an exponential rate, just as forecast
    • Robots are getting smarter and more ubiquitous by the year (Roomba, Siri, Google self-driving cars, drone strikes)
    • Machines are replacing humans at an increasing rate and impacting unemployment rates

The experts are personally torn on the rise of the machines, noting that there are huge benefits to society, but that we are facing a future where almost every job will be at risk of being taken by a machine. Jeremy Howard used words like “wonderful” and “terrifying” in his talk about how quickly machines are getting smarter (quicker than you think!). Erik Brynjolfsson (quoted above) shared a mixed optimism about the prospects this robotification holds for us, saying that a major retooling of the workforce and even the way society shares wealth is inevitable.

Personally, I’m thinking this is going to be more disruptive than the Industrial Revolution, which stirred up some serious feelings as you may recall: Unionization, Urbanization, Anarchism, Bolshevikism…but also some nice stuff (once we got through the riots, revolutions and Pinkertons): like the majority of the world not having to shovel animal manure and live in sod houses on the prairie. But what a ride!

This got me thinking about the end game the speakers were loosely describing and how it relates to libraries. In their estimation, we will see many, many jobs disappear in our lifetimes, including lots of knowledge worker jobs. Brynjolfsson says the way we need to react is to integrate new human roles into the work of the machines. For example, having AI partners that act as consultants to human workers. In this scenario (already happening in healthcare with IBM Watson), machines scour huge datasets and then give their advice/prognosis to a human, who still gets to make the final call. That might work for some jobs, but I don’t think it’s hard to imagine that being a little redundant at some point, especially when you’re talking about machines that may even be smarter than their human partner.

But still, let’s take the typical public-facing librarian, already under threat by the likes of an ever-improving Google. As I discussed briefly in Rise of the Machines, services like Google, IBM Watson, Siri and the like are only getting better and will likely, and possibly very soon, put the reference aspect of librarianship out of business altogether. In fact, because these automated information services exist on mobile/online environments with no library required, they will likely exacerbate the library relevance issue, at least as far as traditional library models are concerned.

Of course, we’re quickly re-inventing ourselves (read how in my post Tomorrow’s Tool Library on Steroids), but one thing is clear, the library as the community’s warehouse and service center for information will be replaced by machines. In fact, a more likely model would be one where libraries pool community resources to provide access to cutting-edge AI services with access to expensive data resources, if proprietary data even exists in the future (a big if, IMO).

What is ironic is that technical services librarians are actually laying the groundwork for this transformation of the library profession. Every time technical services librarians work out a new metadata schema, mark up digital content with micro-data, write a line of RDF, enhance SEO of their collections or connect a record to linked data, they are really setting the stage for machines to not only index knowledge, but understand its semantic and ontological relationships. That is, they’re building the infrastructure for the robot-infused future. Funny, that.
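As a hedged illustration of the kind of groundwork described above, here is a sketch of a hypothetical digital-collection item described with Dublin Core terms and serialized as JSON-LD — exactly the sort of machine-readable record that lets search engines and AI services understand a collection rather than merely index it. The item URI and field values are invented for the example.

```python
import json

# Hypothetical collection item described with Dublin Core terms (dcterms).
record = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "@id": "https://example.org/collections/item/42",  # invented URI
    "dc:title": "Letters from the 1918 Influenza Epidemic",
    "dc:creator": "Unknown",
    "dc:date": "1918",
    "dc:subject": ["Influenza", "Epidemics", "Correspondence"],
}

def to_jsonld(rec):
    """Serialize the record so any crawler -- human-built or machine -- can parse it."""
    return json.dumps(rec, indent=2, sort_keys=True)

print(to_jsonld(record))
```

A record like this, embedded in a collection page or exposed via an API, is precisely what "building the infrastructure for the machines" looks like in practice.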

As Brynjolfsson suggests, we will have to create new roles where we work side-by-side with the machines, if we are to stay employed.

On this point, I’d add that we very well could see that human creativity still trumps machine logic. It might be that this particular aspect of humanity doesn’t translate into code all that well. So maybe the robots will be a great liberation and we all get to be artists and designers!

Or maybe we’ll all lose our jobs, unite in anguish with the rest of the unemployed 99% and decide it’s time the other 1% share the wealth so we can all live off the work of our robots, bliss out in virtual reality and plan our next vacations to Mars.

Or, as Ray Kurzweil would say, we’ll just merge with the machines and trump the whole question of unemployment, let alone mortality.

Or we could just outlaw AI altogether and hold back the tide permanently, like they did in Dune. Somehow that doesn’t seem likely…and the machines probably won’t allow it. LOL

Anyway, food for thought. As Yoda said: “Difficult to see. Always in motion is the future.”

Meanwhile, speaking of movies…

If this subject intrigues you, Hollywood is also jumping into this intellectual meme, pushing out several robot and AI films over the last couple of years. If you’re interested, here’s my list of the ones I’ve watched, ordered by my rating (good to less good).

  1. Her: Wow! Spike Jonze gives his quirky, moody, emotion-driven interpretation of the AI question. Thought provoking and compelling in every regard.
  2. Black Mirror, S02E01 – Be Right Back: Creepy to the max and coming to a bedroom near you soon!
  3. Automata: Bleak but interesting. Be sure NOT to read the expository intro text at the beginning. I kept thinking this was unnecessary to the film and ruined the mystery of the story. But still pretty good.
  4. Transcendence: A play on Ray Kurzweil’s singularity concept, but done with explosions and Hollywood formulas.
  5. The Machine: You can skip it.

Two more are on my must watch list: Chappie and Ex Machina, both of which look like they’ll be quality films that explore human-robot relations. They may be machines, but I love when we dress them up with emotions…I guess that’s what you should expect from a human being. :)

FOSS4Lib Updated Packages: Repox

planet code4lib - Fri, 2015-01-23 15:45

Last updated January 23, 2015. Created by Peter Murray on January 23, 2015.

REPOX is a framework for managing data spaces. It comprises several channels for importing data from data providers, services for transforming data between schemas according to user-specified rules, and services for exposing the results externally. This tailored version of REPOX aims to give all TEL and Europeana partners a simple way to import, convert and expose their bibliographic data via OAI-PMH, by the following means:

  • Cross-platform: developed in Java, so it can be deployed on any operating system that has an available Java virtual machine.
  • Easy deployment: distributed with an easy installer, which includes all the required software.
  • Support for several data formats and encodings: supports the UNIMARC and MARC21 schemas, and encodings in ISO 2709 (including several variants), MarcXchange or MARCXML. During the course of the TELplus project, support will be added for other encodings required by the partners.
  • Data crosswalks: offers crosswalks for converting UNIMARC and MARC21 records to simple Dublin Core as well as to TEL-AP (the TEL Application Profile). A simple user interface makes it possible to customize these crosswalks and create new ones for other formats.
Package Type: Metadata Manipulation
Development Status: Production/Stable
Operating System: Browser/Cross-Platform
Technologies Used: Dublin Core, MARC21, MARCXML, OAI, Tomcat
Programming Language: Java
Database: MySQL, PostgreSQL
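From a harvester's point of view, "exposing bibliographic data via OAI-PMH" boils down to answering standard protocol requests. As a hedged sketch, the snippet below builds an OAI-PMH ListRecords request URL; the endpoint is hypothetical (not REPOX's actual handler path), and `oai_dc` is simply the Dublin Core metadata prefix the protocol mandates as a baseline.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org/repox/OAIHandler"  # hypothetical endpoint

def list_records_url(base, metadata_prefix, oai_set=None):
    """Build an OAI-PMH ListRecords request URL for a given metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set:
        params["set"] = oai_set  # optional selective-harvesting set
    return base + "?" + urlencode(params)

url = list_records_url(BASE_URL, "oai_dc")
print(url)
```

A harvester would fetch such a URL, parse the XML response, and follow `resumptionToken`s to page through the full collection — which is what makes the crosswalks to Dublin Core described above useful in practice.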

FOSS4Lib Upcoming Events: 2015 VIVO implementation Fest

planet code4lib - Fri, 2015-01-23 15:31
Date: Monday, March 16, 2015 - 08:00 to Wednesday, March 18, 2015 - 17:00
Supports: VIVO

Last updated January 23, 2015. Created by Peter Murray on January 23, 2015.

The i-Fest will be held March 16-18 and is being hosted by the Oregon Health and Science University Library in Portland, Oregon.

For further details about the i-Fest program, registration, travel, and accommodations, visit the blog post at


Subscribe to code4lib aggregator