
Mark E. Phillips: Metadata Edit Events: Part 3 – What

planet code4lib - Mon, 2015-03-30 02:16

This is the third post in a series related to metadata event data that we collected from January 1, 2014 to December 31, 2014 for the UNT Libraries Digital Collections.  We collected 94,222 metadata editing events during this time.

The first post was about the when of the events: when they occurred, on which days of the week, and at which hours of the day.

The second post touched on the who of the events: who were the main metadata editors, how edits were distributed among the different users, and how the number of edits per month, day, and hour was distributed.

This post will look at the what of the events data.  What were the records that were touched,  what collections or partners did they belong to and so on.

Of the total 94,222 edit events there were 68,758 unique metadata records edited.

By using the helpful st program we can quickly get the statistics for these 68,758 unique metadata records.  By choosing the “complete” stats we get the following data.

N       min  q1  median  q3  max  sum     mean     stddev    stderr
68,758  1    1   1       1   45   94,222  1.37034  0.913541  0.0034839

With this we can see that there is a mean of 1.37 edits per record over the entire dataset with the maximum number of edits for a record being 45.
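
The statistics that st reports can be approximated with Python's standard library. A minimal sketch with toy data (st's quantile conventions may differ slightly from `statistics.quantiles`):

```python
import statistics

def summary(counts):
    """Rough equivalent of the st tool's "complete" output for a
    list of per-record edit counts."""
    ordered = sorted(counts)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "N": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "sum": sum(ordered),
        "mean": statistics.mean(ordered),
        "stddev": statistics.stdev(ordered),
    }

# Toy data: most records edited once, a few edited several times.
stats = summary([1, 1, 1, 2, 3, 1, 1, 5])
print(stats["mean"])  # 1.875
```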

The full distribution of edits per record is presented in the table below.

Number of Edits  Instances
1                53,213
2                9,937
3                3,519
4                1,089
5                489
6                257
7                111
8                60
9                30
10               13
11               14
12               7
13               5
14               5
15               1
16               2
17               1
19               1
21               1
26               1
30               1
45               1

Of the 68,758 records edited, 53,213 (77%) were edited only once, with two and three edits per record accounting for 9,937 (14%) and 3,519 (5%) records respectively. From there things level off very quickly to under 1% of the records.
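
The edits-per-record distribution above is a straightforward double count: first count edits per record, then count how many records share each count. A sketch with hypothetical record identifiers:

```python
from collections import Counter

# Hypothetical event log: one record identifier per edit event.
events = ["rec1", "rec2", "rec2", "rec3", "rec1", "rec1", "rec4"]

edits_per_record = Counter(events)                 # record -> number of edits
distribution = Counter(edits_per_record.values())  # edits -> number of records

total_records = len(edits_per_record)
for n_edits, instances in sorted(distribution.items()):
    pct = 100 * instances / total_records
    print(f"{n_edits} edit(s): {instances} record(s) ({pct:.0f}%)")
```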

When indexing these edit events in Solr I also merged the events with additional metadata from the records.  By doing so we have a few more facets to take a look at, specifically how the edit events are distributed over partner, collection, resource type and format.
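
Solr returns each facet field as a flat alternating value/count list, which takes a small amount of unpacking. A sketch of parsing one such response, using partner counts from this post (the field name `partner` and the response shape shown are illustrative assumptions about this particular index):

```python
import json

# A trimmed, hypothetical example of a Solr facet response: Solr packs
# facet_fields as a flat [value, count, value, count, ...] list.
response = json.loads("""
{"facet_counts": {"facet_fields": {
  "partner": ["UNTGD", 21932, "OKHS", 10377, "UNTA", 9481]
}}}
""")

flat = response["facet_counts"]["facet_fields"]["partner"]
partner_counts = dict(zip(flat[::2], flat[1::2]))  # pair values with counts
print(partner_counts)
```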


Partner

There are 167 partner institutions represented in the edit event dataset.

The top ten partners by number of edit events are presented in the table below.

Partner Code  Partner Name                                                    Edit Count  Unique Records Edited  Unique Collections
UNTGD         UNT Libraries Gov Docs Department                               21,932      14,096                 27
OKHS          Oklahoma Historical Society                                     10,377      8,801                  34
UNTA          UNT Libraries Special Collections                               9,481       6,027                  25
UNT           UNT Libraries                                                   7,102       5,274                  27
PCJB          Private Collection of Jim Bell                                  5,504       5,322                  1
HMRC          Houston Metropolitan Research Center at Houston Public Library  5,396       2,125                  5
HPUL          Howard Payne University Library                                 4,531       4,518                  4
UNTCVA        UNT College of Visual Arts and Design                           4,296       3,464                  5
HSUL          Hardin-Simmons University Library                               2,765       2,593                  6
HIGPL         Higgins Public Library                                          1,935       1,130                  3

In addition to the number of edit events,  I have added a column for the number of unique records for each of the institutions.  The same data is presented in the graph below.

Graph showing the edit event count and unique record count for each of the institutions with the most edit events

The larger the difference between the Edit Count and the Unique Records Edited, the more repeatedly that partner edited the same records.

The final column in the table above shows the number of different collections belonging to each partner that had records edited. Taking UNTGD as an example, there are 27 different collections that held records edited during the year.

Collection Code  Collection Name                                         Edit Events  Records Edited
TLRA             Texas Laws and Resolutions Archive                      8,629        5,187
TXPT             Texas Patents                                           7,394        4,636
TXSAOR           Texas State Auditor’s Office: Reports                   2,724        1,223
USCMC            United States Census Map Collection                     1,779        1,695
USTOPO           USGS Topographic Map Collection                         490          458
TRAIL            Technical Report Archive and Image Library              287          279
CRSR             Congressional Research Service Reports                  271          270
FCCRD            Federal Communications Commission Record                211          208
NACA             National Advisory Committee for Aeronautics Collection  62           62
WWPC             World War Poster Collection                             49           49
WWI              World War One Collection                                41           41
USDAFB           USDA Farmers’ Bulletins                                 21           19
ATOZ             Government Documents A to Z Digitization Project        19           18
WWII             World War Two Collection                                19           19
ACIR             Advisory Commission on Intergovernmental Relations      14           13
NMAP             World War Two Newsmaps                                  12           12
TR               Texas Register                                          12           8
TXPUB            Texas State Publications                                12           12
GAORT            Government Accountability Office Reports                10           10
BRAC             Defense Base Closure and Realignment Commission         4            4
OTA              Office of Technology Assessment                         4            4
GDCC             CyberCemetery                                           2            2
FEDER            Federal Communications Commission Record                1            1
GSLTX            General and Special Laws of Texas                       1            1
TXHRJ            Texas House of Representatives Journals                 1            1
TXSS             Texas Soil Surveys                                      1            1
UNTGOV           Government Documents General Collection                 1            1

This set of data is a bit easier to see with a simple graph. I’ve plotted the ratio of edit events to records for each collection as a simple line graph.

UNT Government Documents Edits to Record Ratios for each collection.

You can look at the graph above and quickly see which of the collections have a higher edit-to-record ratio, with the Texas State Auditor’s Office: Reports collection having the most edits per record, at a ratio of over 2. Many of the other collections are much closer to 1, or one edit per record.


Collection

The edit events occur in 266 different collections in the UNT Libraries’ Digital Collections. As with the 167 partners above, that is too many to fit in a table, so I’m just going to list the top ten in the table below.

Collection Code  Collection Name                                     Edit Events  Unique Records
TLRA             Texas Laws and Resolutions Archive                  8,629        5,187
ABCM             Abilene Library Consortium                          8,481        8,060
TDNP             Texas Digital Newspaper Program                     7,618        6,305
TXPT             Texas Patents                                       7,394        4,636
OKPCP            Oklahoma Publishing Company Photography Collection  5,799        4,729
JBPC             Jim Bell Texas Architecture Photograph Collection   5,504        5,322
TCO              Texas Cultures Online                               5,490        2,208
JJHP             John J. Herrera Papers                              5,194        1,996
UNTETD           UNT Theses and Dissertations                        4,981        3,704
UNTPC            University Photography Collection                   4,509        3,232

Again plotting the ratio of edit events to the number of unique records gives us the graph below.
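
The edit-to-record ratio behind these graphs is just edit events divided by unique records. A sketch using a few collections from the table above:

```python
# Edit events and unique records for a few collections from the table above.
collections = {
    "TLRA": (8629, 5187),
    "ABCM": (8481, 8060),
    "TCO":  (5490, 2208),
    "JJHP": (5194, 1996),
}

ratios = {code: events / records for code, (events, records) in collections.items()}
for code, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{code}: {ratio:.2f} edits per record")
```

Sorting by the ratio rather than raw edit counts is what surfaces collections like JJHP, whose records were revisited repeatedly.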

Edit Events to Record Ratio grouped by Collection

You can quickly see the two collections that averaged over two edit events for each of the records that were edited during the last year,  meaning if a record was edited,  most likely it was edited at least two times.  Other collections like the Jim Bell Photography Collection or the Abilene Library Consortium Collection appear to have only been edited one time per record on average,  so when the edit was complete, it wasn’t revisited for additional editing.

Resource Type

The UNT Libraries makes use of a locally controlled vocabulary for its resource types. You can view all of the available resource types here.

If you group the edit events and their associated records by resource type, you get the following table.

Resource Type       Edit Events  Unique Records
image_photo         31,702       24,384
text_newspaper      11,598       10,176
text_leg            8,633        5,191
text_patent         7,480        4,667
physical-object     5,591        4,921
text_etd            4,986        3,709
text                4,311        2,511
text_letter         4,276        2,136
image_map           3,542        3,160
text_report         3,375        1,822
image_artwork       1,217        1,042
text_article        1,060        758
video               931          461
sound               719          694
text_legal          687          341
text_journal        549          288
text_book           476          422
image_presentation  430          313
image_postcard      429          180
image_poster        427          321
text_paper          423          312
text_pamphlet       303          199
text_clipping       275          149
text_yearbook       91           66
dataset             54           19
image_score         49           37
collection          41           34
image               34           20
website             22           20
text_chapter        17           14
text_review         13           11
text_poem           3            1
specimen            1            1

By calculating the edit-event-to-record ratio and plotting that you get the following graph.

Edit Events to Record Ratio grouped by Resource Type.

In the graph above I presented the data in the same order as it appears in the table just above the chart. You can see that the highest ratio is for our single text_poem record, which was edited three different times. Other notably high ratios are for postcards and datasets, though several others are at or close to a 2-to-1 ratio of edits to records.


Format

The final way we are going to look at the “what” data is by format. Again the UNT Libraries uses a controlled vocabulary for formats, which you can view here. I’ve once again faceted on the format field and present the total number of edit events and unique records for each of the five format types in the system.

Format   Edit Events  Unique Records
text     48,580       32,770
image    43,477       34,436
video    931          461
audio    720          695
website  22           20

Converting the ratio of events-to-records into a bar graph results in the graph below.

Edit Events to Record Ratio grouped by Format

It looks like we edit video files more times per record than any of the other formats, with text and then image coming in behind.


There are almost endless combinations of collections, partners, resource types, and formats that can be put together, and the data deserves further analysis to see if there are patterns we should pay attention to. But that’s more for another day.

This is the third in a series of posts related to metadata edit events in the UNT Libraries’ Digital Collections. Check back for the next installment.

As always feel free to contact me via Twitter if you have questions or comments.

DuraSpace News: TOMORROW: Washington D.C. Fedora User Group Meeting, March 31 - April 1

planet code4lib - Mon, 2015-03-30 00:00

Washington, DC  The Washington D.C. Fedora User Group Meeting will get underway tomorrow, Mar. 31 at the USDA National Agriculture Library. Day one presentations include updates on DuraSpace and Fedora 4, Fedora at the National Agriculture Library, Fedora at the University of Maryland Libraries, an Islandora Update and Specifying the Fedora API, and Short Presentations and a Project Roundtable. View the agenda here.


Mita Williams: The Setup

planet code4lib - Sun, 2015-03-29 21:33
For this post, I’m going to pretend that the editors of the blog, The Setup (“a collection of nerdy interviews asking people from all walks of life what they use to get the job done”) asked me for a contribution. But in reality, I’m just following Bill Denton’s lead.

It feels a little self-indulgent to write about one’s technology purchases so before I describe my set up, let me explain why I’m sharing this information.

Some time back, in preparation for a session I was giving on Zotero for my university’s annual technology conference, I realized that before going into how to use Zotero, I had to address why. I recognized that I was asking students and faculty, likely already time-strapped and overburdened, to abandon long-standing practices that were working for them in order to switch to Zotero for their research.

Before my presentation, I asked on Twitter when and why faculty would change their research practices. Most of the answers were on the cynical side, but some gave me room to maneuver, namely this one: “when I start a new project.” And there’s a certain logic to this approach. If you were starting graduate school and knew you would have to prepare for comps and produce a thesis at the end of the process, wouldn’t you want to conscientiously design your workflow from the start to capture what you learn in a way that’s searchable and reusable?

My own sabbatical is over, and oddly enough it is now, at its end, that I feel most like I’m starting all over again in my professional work. So I’m using that New Project feeling to fuel some self-reflection on my own research process, bring some mindfulness to my online habits, and bring deliberate design to My Setup.

There’s another reason why I’m thinking about the deliberate design of research practice. As libraries start venturing into the space of research service consultation, I believe that librarians need to follow best practices for ourselves if we hope to develop expertise in this area.

As well, I think we need to be more conscious of how and when our practices are not in line with our values. It’s simply not possible to live completely without hypocrisy in this complicated world, but that doesn’t mean we can’t strive for praxis. It’s difficult for me to take seriously accusations that hackerspaces are neoliberal when they’re being stated by a person cradling a Macbook or iPhone. That being said, I greatly rely on products from Microsoft, Amazon, and Google, so I'm in no position to cast stones.

I just want to care about the infrastructures we’re building….

And with that, here’s my setup!


There are three computers that I spend my time on: the family computer in the kitchen (a Dell desktop running Windows 7), my work computer (another Dell desktop running Windows 7), and my Thinkpad X1 Carbon laptop, which I got earlier this year. Grub turned my laptop into a dual-boot machine that lets me switch between Ubuntu and Windows 7. I feel I need a Windows environment so I can run ESRI products and other software that is only available for Mac or Windows, if need be.

I have a Nexus 4 Android phone made by LG and a Kindle DX as my ebook reader. I don’t own a tablet or an mp3 player.

World Backup Day is March 31st. I need to get myself an external drive for backups (Todo1).


After getting my laptop, the first thing I did was investigate password managers to find which one would work best for me. I ended up choosing LastPass and felt the benefits immediately. Using a password manager has saved me so much pain and aggravation, and my passwords are now (almost) all unique. Next, I need to set up two-factor authentication for the services I haven’t gotten around to yet (Todo2).

With work being done on three computers, it’s not surprising that I tend to work online. My browser of choice is Mozilla but I will flip to Chrome from time to time. I use the sync functionality on both so my bookmarks are automatically updated and the same across devices. I use SublimeText as my text editor for code, GIMP as my graphics editor, and QGIS for my geospatial needs.

This draft, along with much of my other writing and presentations, is on Google Drive. I spend much of my time in Gmail and Google Calendar. Years ago I downloaded all my email using Mozilla Thunderbird, but I have not set up a regular backup strategy for these documents (Todo3). I’ve toyed with using Dropbox to back up Drive but think I’m better off with an external drive. I have a Dropbox account because people occasionally share documents with me through it, but at the moment I only use it to back up my kids’ Minecraft games.

From 2007 to 2013, I used delicious to capture and share the things I read online. Then delicious tried to be the new Pinterest and made itself unusable (although it has since reverted back to close to its original form) and so I switched to Evernote (somewhat reluctantly because I missed the public aspect of sharing bookmarks).   I’ve grown to be quite dependent on Evernote to save my outboard brain. I use IFTTT to post the links from my Twitter faves to delicious which are then imported automatically into Evernote.  I also use IFTTT to automatically backup my Tumblr posts to Evernote, my Foursquare check-ins saved to Evernote (and Google Calendar) and my Feedly saved posts to Evernote. Have I established a system to back up my Evernote notes on a regular basis? No, no I have not (Todo4).

The overarching idea I have come up with is that the things I write are backed up on my Google Drive account, and the library of things I have read or saved for future reading (ha!) is saved on Evernote. To this end, I use IFTTT to save my Tweets to a Google Spreadsheet, and my Blogger and WordPress posts are automatically saved to Google Drive (still a work in progress; Todo 5). My ISP is Dreamhost but I am tempted to jump ship to Digital Ocean.

My goal is to have at least one backup for the things I’ve created. So I use IFTTT to save my Instagram posts to Flickr. My Flickr posts are just a small subset of all the photos that are automatically captured and saved on Google Photos. No, I have not backed up these photos (Todo 6), but I have, since 2005, printed the best of my photos on an annual basis into beautiful softcover books, first using QOOP and later through Blurb. My Facebook photos and status updates from 2006 to 2013 have been printed in a lovely hardcover book using MySocialBook. One day I would like to print a book of the best of my blogged writings using Blurb, if just as a personal artifact.

Speaking of books, because I’m one of the proud and the few to own a KindleDX, I use it to read PDFs and most of my non-fiction reading. When I stumble upon a longread on the web, I use Readability’s Send to Kindle function so I can read it later without eyestrain. I’m inclined to buy the books that I used in my writing and research as Kindle ebooks because I can easily attach highlighted passages from these books to my Zotero account. My ebooks are backed up in my calibre library. I also use Goodreads to keep track of my reading because I love knowing what my friends are into.

I subscribe to Rdio and, for those times that I actually spend money on owning music, I try to use Bandcamp. I’m an avid listener of podcasts and use BeyondPod for this purpose. Our Sonos system lets us play music from all these services, as well as TuneIn, in the living room. The music I used to listen to on CD is now sitting on an unused computer running Windows XP, and I know that if I don’t get my act together and transfer those files to an external drive soon, they will be gone for good… if they haven’t already become inaccessible (*gulp*) (Todo 8).

For my “Todo list” I use Google Keep, which also captures my stray thoughts when I’m away from paper or my computer. Google Keep has an awesome feature that will trigger reminders based on your location.

So that’s My Setup. Let me know if you have any suggestions or can see some weaknesses in my workflow. Also, I’d love to learn from your Setup.

And please please please call me out if I don’t have a sequel to this post called The Backup by the time of next year's World Backup Day.

Nicole Engard: Bookmarks for March 29, 2015

planet code4lib - Sun, 2015-03-29 20:30

Today I found the following resources and bookmarked them on Delicious.

Digest powered by RSS Digest

The post Bookmarks for March 29, 2015 appeared first on What I Learned Today....

Related posts:

  1. No more Delicious?
  2. Can you say Kebberfegg 3 times fast
  3. Are you backing up?

John Miedema: I’m a bit of a classification nut. It comes from my Dutch heritage. How do you organize files and emails into folders?

planet code4lib - Sat, 2015-03-28 18:05

I’m a bit of a classification nut. It comes from my Dutch heritage — those Dutchies are always trying to be efficient with their tiny bits of land. It’s why I’m drawn to library science too. I think a lot about the way I organize computer files and emails into folders. It provides insight into the way all classification works, and of course ties into my Lila project. I’d really like to hear about your own practices. Here’s mine:

  1. Start with a root folder. When an activity starts, I put a bunch of files into a root folder (e.g., a Windows directory or a Gmail label).
  2. Sort files by subject or date. As the files start to pile up in a folder, I find stuff by sorting files by subject or date using application sorting functions (e.g., Windows Explorer).
  3. Group files into folders by subject. When there are a lot of files in a folder, I group files into different folders. The subject classification is low level, e.g., Activity 1, Activity 2. Activities that expire are usually grouped together into an ‘archive’ folder.
  4. Develop a model. Over time the folder and file structure can get complex, making it hard to find stuff. I often resort to search tools. What helps is developing a model that reflects my work, e.g., Client 1, Client 2. Different levels correspond to my workflow, e.g., 1. Discovery, 2. Scoping, 3. Estimation, etc. The model is really a taxonomy, an information architecture. I can use the same pattern for each new activity.
  5. Classification always requires tinkering. I’ve been slowly improving the way I organize files into folders for as long as I’ve been working. Some patterns get reused over time, others get improved. Tinkering never ends.
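
The reusable pattern in step 4 can even be scaffolded mechanically: define the workflow stages once and stamp out the same sub-folder tree for each new client or activity. A small sketch (the client and stage names are illustrative):

```python
from pathlib import Path
import tempfile

# Workflow stages from the taxonomy; each client gets the same sub-folders.
STAGES = ["1-discovery", "2-scoping", "3-estimation"]

def scaffold(root, clients):
    """Create the client/stage folder tree under root."""
    for client in clients:
        for stage in STAGES:
            Path(root, client, stage).mkdir(parents=True, exist_ok=True)

with tempfile.TemporaryDirectory() as root:
    scaffold(root, ["client-1", "client-2"])
    made = sorted(p.relative_to(root).as_posix()
                  for p in Path(root).rglob("*") if p.is_dir())
print(made[0])  # client-1
```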

(I will discuss the use of tagging later. Frankly, I find manual tagging hopeless.)

Mark E. Phillips: Metadata Edit Events: Part 2 – Who

planet code4lib - Sat, 2015-03-28 15:53

In the previous post I started to explore the metadata edit events dataset generated from 94,222 edit events from 2014 for the UNT Libraries’ Digital Collections.  I focused on some of the information about when these edits were performed.

This post focuses on the “who” of the dataset.

Altogether, 193 unique users edited metadata in one of the systems that comprise the UNT Libraries’ Digital Collections: The Portal to Texas History, the UNT Digital Library, and the Gateway to Oklahoma History.

The top ten most frequent editors of metadata in the system are responsible for 57% of the overall edits.

Username     Edit Events
htarver      15,451
aseitsinger  10,105
twarner      4,655
mjohnston    4,143
atraxinger   3,905
cwilliams    3,490
sfisher      3,466
thuang       3,327
mphillips    2,669
sdillard     2,518

The overall distribution of edits per user looks like this.

Distribution of edits per user for the Edit Event Dataset

As you can see it shows the primary users of the system and then very quickly tapers down to the “long tail” of users who have a lower number of edit events.

Here is a quick look at the number of unique users active on each day of the week across the entire dataset.

Sun  Mon  Tue  Wed  Thu  Fri  Sat
40   95   122  122  123  97   39

There is a swell for Tue, Wed, and Thu in the table above. The pattern is pretty consistent: a given day of the week has either 39-40, 95-97, or 122-123 unique users.

Looking at how unique users were spread across the year, grouped into months, we get the following table and graph.

Month      Unique Users
January    54
February   73
March      64
April      61
May        44
June       40
July       48
August     50
September  50
October    84
November   49
December   36

Unique Editors Per Month

There were some spikes throughout the year, most likely related to a metadata class in the UNT College of Information that uses the edit system as part of its teaching; these account for the October and February spikes in the number of unique users. Other than that, we are consistently over 40 unique users per month, with a small dip for the December holiday season when school is not in session.

In the previous post we had a heatmap with the number of edit events distributed over the hours of the day and the days of the week.  I’ve included that graph below.

94,222 edit events plotted to the time and day they were performed

I was curious to see how the unique number of editors mapped to this same type of graph,  so that is included below.

Unique editors distribution across day of the week and hour of the day.

User Status

Of the 193 unique metadata editors in the dataset, 135 (70%) of the users were classified as Non-UNT-Employee and  58 (30%) were classified as UNT-Employee. For the edit events themselves, 75,968 (81%) were completed by users classified with a status of UNT-Employee  and 18,254 (19%) by users classified with the status of Non-UNT-Employee.

User Rank

Rank       Edit Events  % of Total Edits (n=94,222)  Unique Users  % of Total Users (n=193)
Librarian  22,466       24%                          16            8%
Staff      12,837       14%                          13            7%
Student    41,800       44%                          92            48%
Unknown    17,119       18%                          72            37%

You can see that 44% of all of the edits in the dataset were completed by users who were students. Librarians and Staff members accounted for 38% of the edits.

This is the second in a series of posts related to metadata edit events in the UNT Libraries’ Digital Collections. Check back for the next installment.

As always feel free to contact me via Twitter if you have questions or comments.

Ed Summers: The Adventure of Experiment

planet code4lib - Sat, 2015-03-28 11:50

Love of certainty is a demand for guarantees in advance of action. Ignoring the fact that truth can be bought only by the adventure of experiment, dogmatism turns truth into an insurance company. Fixed ends upon one side and fixed “principles” — that is authoritative rules — on the other, are props for a feeling of safety, the refuge of the timid, and the means by which the bold prey upon the timid.

John Dewey in Human Nature and Conduct (p. 237)

Nicole Engard: Bookmarks for March 27, 2015

planet code4lib - Fri, 2015-03-27 20:30

Today I found the following resources and bookmarked them on Delicious.

Digest powered by RSS Digest

The post Bookmarks for March 27, 2015 appeared first on What I Learned Today....

Related posts:

  1. Herding Cattle
  2. Google Floor Plans
  3. Planning to Travel?

DPLA: DPLAfest in Light of SEA 101

planet code4lib - Fri, 2015-03-27 19:18

In my social media feeds yesterday, I saw some friends and acquaintances say that they were reconsidering their attendance at DPLAfest, scheduled to be held in Indianapolis, IN, April 17-18, in light of the recent signing of SEA 101, or the “Religious Freedom Restoration Act,” into law by Governor Pence of Indiana.  I must admit that as an openly gay employee at DPLA, I had an immediate and strong negative reaction.  I was unhappy about my organization spending money in a place that would allow businesses not to serve me simply because I am gay.

However, after more thought and a night of sleep, I have come to a different conclusion.  The passing of this law should make us all want to attend DPLAfest even more than we might have before.  We should want to support our hosts and the businesses in Indianapolis who are standing up against this law, and we should make it clear that our money will only be spent in places that welcome all.

At DPLA, we have already begun to diligently ensure that all the venues we are supporting welcome all of the DPLA staff and community.  Messages like these have already helped put our mind at ease about a number of our scheduled activities:

Stickers like the one below are going to help us know which businesses to support while we are in Indianapolis:

At DPLAfest, we will also have visible ways to show that we are against this kind of discrimination, including enshrining our values in our Code of Conduct.  We encourage you to use this as an opportunity to let your voice and your dollars speak.  Let’s use this as a time to support those businesses and venues that support true freedom, all while enjoying each other’s company and a great conference lineup!


Emily Gore

DPLA Director for Content

HangingTogether: Round of 16: The plot thickens … and so do the books

planet code4lib - Fri, 2015-03-27 17:29

OCLC Research Collective Collections Tournament


Our second round of competition is complete, and only eight conferences remain standing! And yes, our tournament Cinderella, Big South, is still with us! Details below, but here are the Round of 16 results:

[Click to enlarge]

Competition in this round was on book length – which conference has the thickest books?* Big South, continuing its magical tournament run, ended up with the thickest books of all the conferences, averaging about 292 pages and ousting the powerful Big Ten from the tournament! West Coast also continues on to the next round, with a convincing victory over the Ivy Leaguers! Summit League, Ohio Valley, Atlantic 10, Missouri Valley, and Big Sky will also move on to the Round of 8. Conference USA and American Athletic had the tightest battle, with Conference USA coming out on top by less than 10 pages!

While Big South had the thickest books of all the conferences competing in this round (averaging about 292 pages), the Ivy League had the thinnest books, averaging about 225 pages. Does this surprise you? It turns out that the larger the size of the collective collection, the thinner the books. Take a look at this:

[Click to enlarge]

Big South had the smallest collective collection among the conferences competing in this round; the Ivy League had the largest. As the chart shows, there is a pretty strong correlation between collection size and the percentage of the collection accounted for by books with fewer than 100 pages. Got any ideas why? Put them in the comments!
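
The strength of that relationship can be quantified with a plain Pearson correlation coefficient. A sketch with made-up illustrative numbers (the actual WorldCat figures behind the chart are not reproduced here):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Made-up data: collective collection size (millions of holdings) vs.
# share of print books under 100 pages.
sizes = [1.2, 3.5, 6.0, 9.8, 14.1]
short_share = [0.08, 0.11, 0.14, 0.18, 0.21]
r = pearson(sizes, short_share)
```

An r value near 1 would confirm that bigger collective collections carry proportionally more thin books.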

By the way, in case you were wondering, the average length of a print book in WorldCat is about 255 pages.

Bracket competition participants: Remember, if the conference you chose has been ousted from the tournament, do not despair! If no one picked the tournament Champion, all entrants will be part of a random drawing for the big prize!

The Round of 8 is next, where the tournament field will be reduced to just four conferences! Results will be posted March 31.


*Average number of pages per print book in conference collective collection. Data is current as of January 2015.

[Click to enlarge]

More information:

Introducing the 2015 OCLC Research Collective Collections Tournament! Madness!

OCLC Research Collective Collections Tournament: Round of 32 Bracket Revealed!

Round of 32: Blow-outs, buzzer-beaters, and upsets!

About Brian Lavoie

Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.


Sean Chen: Waving a Dead Fish

planet code4lib - Fri, 2015-03-27 16:05

I’ve been using Vagrant & Virtualbox for development on my OS X machines for my solo projects. But in an effort to get an intern started up on developing a front-end to a project I started a while ago I ran into a really strange problem getting Vagrant working on Windows.

So as a tale of caution for whatever robot wants to pick up this bleg.

Bootcamp partition on a Mid-2010 MacBook Pro. Running a dormant OS X and a full Windows 7. The Windows 7 is the main environment:

I use the Git Bash shell, since it has SSH, to stand up the boxes with vagrant init and vagrant up.

And then stuck (similar to Vagrant stuck connection timeout retrying):

==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
    default: Adapter 2: hostonly
==> default: Forwarding ports...
    default: 22 => 2222 (adapter 1)
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
    default: SSH address:
    default: SSH username: vagrant
    default: SSH auth method: private key
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...
    default: Error: Connection timeout. Retrying...

Well, we booted into the VM with a head and it looked like booting got interrupted by some sort of kernel panic due to:

Spurious ACK on isa0060/serio0. Some program might be trying to access hardware directly.

Ok makes sense…the machine isn’t booting up and there has to be a reason why.

Long story short: the Windows 7 partition didn't have hardware virtualization enabled, and there is no BIOS setting or switch anywhere to turn it on. So what do you do:

How to enable hardware virtualization on a MacBook?

Like waving a dead fish in front of your computer.

  • Boot into OSX.
  • System Preferences > select the Startup Disk preference pane
  • Select the Boot Camp partition with Windows
  • Restart into the Boot Camp partition
  • Magic

Go figure

FOSS4Lib Recent Releases: Goobi - 1.11.0

planet code4lib - Fri, 2015-03-27 14:10

Last updated March 27, 2015. Created by Peter Murray on March 27, 2015.

Package: Goobi
Release Date: Wednesday, March 25, 2015

LITA: Making LibGuides Into Library Websites

planet code4lib - Fri, 2015-03-27 12:00

Welcome to Part 2 of my two-part series introducing LibGuides CMS for use as a website. Read Part 1 (with comments from Springshare!). This companion piece was released February 27.

Why LibGuides?

LibGuides logo (© Springshare)

We can design surprisingly good websites with LibGuides 2.0 CMS. WordPress and Drupal are free and open source, but Springshare, the maker of LibGuides, also delivers reliable hosting and support for two grand a year. Moreover, even folks clueless about coding can quickly learn to maintain a LibGuides-based website because (1) the interface is drag-and-drop, fill-in-the-box intuitive, and (2) many academic librarians create research guides as part of their liaison duties and are already familiar with the system. Most importantly, libraries can customize LibGuides-based websites as extensively or minimally as available talent and time permit, without sacrificing visual appeal or usability–or control of the library’s own site.

LibGuides-Based Websites

There are some great LibGuides-based websites out there. Springshare has compiled exemplars across various library sectors here and here. Below are screenshots showing what you can do.

Albuquerque and Bernalillo County (ABC) Library homepage

The Albuquerque and Bernalillo County (ABC) Library is that rare public library that uses LibGuides. The homepage is beautifully laid out, with tons of neat customizations and a carousel that actually enhances UX, despite the load time. One of my favorite LibGuides sites!

World Maritime University Library homepage

The World Maritime University Library, run by the United Nations, has a beautifully minimalist blue-and-white look – classic Scandinavian. Like Google, the logo and search box are front and center; everything else is placed discreetly in tabs at the top and bottom of the homepage.

John S. Bailey Library, American College of Greece

The American College of Greece’s John S. Bailey Library is text-heavy, but its navigation is as clear as the Aegean Sea. Note the absence of a federated search box, which, unless the algorithms are of search-engine caliber, tends to produce results that undergraduates find bewildering.

Even if you have other priorities or limited coding skills, you can still create a quality LibGuides-based website without major customizations to the stylesheets. Hillsborough Community College Library and Harrison College both do nice jobs, albeit with LibGuides 1.0. Walters State Community College did hardly any deep customizing of LibGuides 2.0, but its site is perfectly functional.

Walters State Community College Library homepage

My Library’s Website

Moving the Hodges University Library to LibGuides has followed a three-stage agile process.

1. September 2014. We upgraded the existing LibGuides CMS to LibGuides 2.0 and reorganized and enhanced existing content. Review my February 27 post for more on this first stage.

Hodges University Library’s faculty support page

2. January 2015. We rolled out the new library homepage and associated pages, which unified the library’s entire web presence under LibGuides. Previously our homepage was designed and run by the university’s IT department using Microsoft SharePoint (ugh), so students could only access the homepage by signing into the university intranet–dreadful for accessibility. We also shuffled DNS records and redirects so that the homepage has a much cleaner URL than previously. The new site can be accessed by anyone from anywhere without logging into anything. #librarianwin

3. June 2015. We will roll out the next major iteration of our website, integrating OCLC’s new and improved WorldCat discovery layer, our new LibAnswers virtual reference service, and our revamped website to build better UX. The page header and federated search box will be optimized for mobile devices, as the rest of the site already is. Our motto? Continual improvement!

Have you used LibGuides as a website? What is your experience?

Nicole Engard: Bookmarks for March 26, 2015

planet code4lib - Thu, 2015-03-26 20:30

Today I found the following resources and bookmarked them on Delicious.

  • Booktype: Lets you produce beautiful, engaging books in minutes. Booktype is free and open source software that helps you write and publish print and digital books.

Digest powered by RSS Digest

The post Bookmarks for March 26, 2015 appeared first on What I Learned Today....

Related posts:

  1. CA Law to Produce Open Source Textbooks
  2. Espresso Book Machine
  3. E-book reading on the rise

FOSS4Lib Upcoming Events: Northeast Fedora User Group Meeting

planet code4lib - Thu, 2015-03-26 19:59
Date: Monday, May 11, 2015 - 08:00 to Tuesday, May 12, 2015 - 17:00
Supports: Fedora Repository

Last updated March 26, 2015. Created by Peter Murray on March 26, 2015.

From the announcement:

A Northeast Fedora User Group meeting will be held at Yale University on May 11-12. Monday, May 11 will be an unconference-style format with a lightning round in the afternoon. Tuesday, May 12 will focus on Fedora 4 training led by Andrew Woods.

Please register for this event by April 3 here:

DPLA: Girl Scout super stars

planet code4lib - Thu, 2015-03-26 19:22

Unless you haven’t been out of your house for the past month, you know that it’s Girl Scout cookie season. The girls out tugging boxes of cookies around the neighborhood are learning all sorts of skills they’ll use later in life as political leaders, entertainers, astronauts, and athletes. Literally. For proof, check out this list of 25 of the most famous Girl Scouts while enjoying the last of your Thin Mints and Caramel Delights…until next year.

Madeleine Albright, former US Secretary of State

Marian Anderson, singer

Lucille Ball, comedian and film studio executive

Lynda Carter, actress and star of “Wonder Woman.”

Rosalynn Carter, former First Lady

Chelsea (and Hillary) Clinton

Katie Couric, journalist

Sandra Day O’Connor, former Supreme Court Justice

Queen Elizabeth II

Carrie Fisher, actress

Dorothy Hamill, figure skater

Jackie Joyner-Kersee, Olympic athlete

Dorothy Lamour, actress and singer

Shari Lewis, puppeteer and children’s entertainer

Christa McAuliffe, teacher aboard the Space Shuttle Challenger

Michelle Obama, First Lady

Nancy Reagan, former First Lady

Sally Ride, astronaut

Chita Rivera, actress, dancer, and singer

Gloria Steinem, political activist

Martha Stewart, businesswoman

Shirley Temple, actress

Mary Tyler Moore, actress

Dionne Warwick, singer

Venus Williams, tennis player


Banner image from Digital Commonwealth, Boston Public Library.

FOSS4Lib Recent Releases: Jpylyzer - 1.14.1

planet code4lib - Thu, 2015-03-26 15:50

Last updated March 26, 2015. Created by Peter Murray on March 26, 2015.

Package: Jpylyzer
Release Date: Wednesday, March 25, 2015

FOSS4Lib Recent Releases: Siegfried - 1.0

planet code4lib - Thu, 2015-03-26 15:45

Last updated March 26, 2015. Created by Peter Murray on March 26, 2015.

Package: Siegfried
Release Date: Sunday, March 22, 2015

FOSS4Lib Updated Packages: Siegfried

planet code4lib - Thu, 2015-03-26 15:44

Last updated March 26, 2015. Created by Peter Murray on March 26, 2015.

Siegfried is a PRONOM-based file format identification tool.

Key features are:

  • complete implementation of PRONOM (byte and container signatures)
  • reliable results (siegfried is tested against Ross Spencer’s skeleton suite and QA tested against DROID and FIDO output)
  • fast matching without limiting the number of bytes scanned
  • detailed information about the basis for format matches
  • simple command line interface with a choice of outputs (YAML, JSON, CSV)
  • a built-in server for integrating with workflows and language inter-op
  • power options including debug mode, signature modification, and multiple identifiers.
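To make the command-line interface concrete, a typical session might look like the sketch below. The sf binary name is siegfried's own; the file paths are illustrative, and exact flag names can vary by release, so check sf -help for your version:

```shell
# Identify a single file; sf prints YAML by default.
sf manuscript.pdf

# Choose JSON or CSV output instead.
sf -json manuscript.pdf
sf -csv scans/

# Start the built-in server for workflow integration.
sf -serve localhost:5138
```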
Package Type: Data Preservation and Management
Operating System: Linux, Mac, Windows
Programming Language: Go

