Feed aggregator

Library of Congress: The Signal: “Elementary!” A Sleuth Activity for Personal Digital Archiving

planet code4lib - Mon, 2014-12-15 17:15

Sherlock Holmes and Doctor Watson. “The Adventure of Silver Blaze,” in The Strand Magazine. Illustration by Sidney Paget (1860-1908). On Wikimedia.

As large institutions and organizations continue to implement preservation processes for their digital collections, a smattering of self-motivated information professionals are trying to reach out to the rest of the world’s digital preservation stakeholders —  individuals and small organizations — to help them manage their digital collections.

Part of that challenge is just making people aware that:
1. Their digital possessions are at risk of becoming inaccessible
2. They need to take responsibility for preserving their own stuff
3. The preservation process is easy.

The Library of Congress offers personal digital archiving resources and takes an active role in outreach. [Watch for the announcement of Personal Digital Archiving 2015 next April in New York City.] And we are always happy to discover novel approaches by our colleagues to teaching personal digital archiving. Consider the work of one group of information professionals from Georgia.

The Society of Georgia Archivists, the Atlanta Chapter of ARMA International and the Georgia Library Association have collaborated on a curriculum for a personal digital archiving workshop that addresses the basic problems and solutions. Among the steps they outline, they emphasize the need to make files “findable.”

To that end they devised an activity called “Find the Person in the Personal Digital Archive” (the activity data set and all the workshop materials are free and available for download, reuse and remixing). The premise is simple and the game is fun but it drives home an important message about organizing your files. The producers created a folder filled with files and sub-folders — messy, disorganized files; pointless sub-folders; mis-named files; highly personal files mixed with business files; encrypted files and obsolete file formats, many sourced from the Open Preservation Foundation’s Format Corpus — and they invite people to participate in a forensics activity, to look through all the files and directories and try to piece together some information about the owner of the files.

Courtesy of the Society of Georgia Archivists.

As the user looks through the folder, there are questions to answer, such as “How would you describe the contents?”, “How did the creator of the archive name and arrange the files?” and “How do the features of the archive (such as file names, organization scheme, file format, etc.) make some of the records easy to understand and some of them impossible to understand?”

Though the goal is to deduce the identity and fate of the owner through various clues and “Aha!” moments, in doing the activity the user ends up making judgments about what is useful (like descriptive file and folder names) and what is not (files called “things.xml” and “untitled.txt”). Poring over a fake mess such as this drives home a point: how do you organize your own personal stuff? If someone, such as a loved one, had to go through your digital files, how easy or difficult would it be for them to find specific files and make sense of it all? Are you leaving a mess for someone else to trudge through?

Wendy Hagenmaier, the outreach manager for the Society of Georgia Archivists, is one of the workshop producers. Hagenmaier wanted to reach beyond her community to demystify digital archives stewardship and explain to the general public why digital preservation matters and how they can preserve their own stuff. She researched other like-minded organizations in Georgia to find interested parties for the workshop. “This topic really seems to be taking off in public libraries,” said Hagenmaier, “and genealogists are very much interested in personal digital archiving, though I don’t know if the topic comes up in their circles on its own.”

Michelle Kirk, president of the Atlanta Chapter of ARMA, gives a presentation. Photo courtesy of the Society of Georgia Archivists.

Hagenmaier — and her colleagues Michelle Kirk, Cathy Miller and Oscar Gittemeier — geared the workshop toward information professionals and encouraged the workshop attendees to go out and teach the workshop to others so that the message will reach the general public in a sort of trickle-down effect. So far she has presented the curriculum at a “train the trainer” webinar, a workshop and at a Georgia State Archives genealogy event.

The Society of Georgia Archivists also offers a Personal Digital Archiving Workshop Outreach Grant to help information professionals in Georgia promote the idea that librarians, archivists and records managers are a source of expertise for assisting individuals (the public, family members, students, corporate employees, etc.) with their personal digital archiving needs. The grant will be given to individuals who apply for the grant after hosting and teaching a workshop at their institutions or in their communities, using the curriculum materials designed by SGA, GLA and Atlanta ARMA.

Hagenmaier is fervent about getting the word out to people, making them aware that they casually create and use digital stuff in their everyday lives, so digital stewardship could and should be just as casual and effortless. She feels that knowledge of digital stewardship will empower people and assure them that their digital files can be safe if they keep them safe. She said that in the course of her work she sees in people a fear of the unknown, a huge anxiety about the fate of digital files. To illustrate her point she cites a moment during her genealogy conference presentation when she asked a group of genealogists, “How many of you think you will be able to access your digital files in ten years?” No one raised a hand.

“They are hopeful but not confident,” said Hagenmaier. “Personal digital archiving is still foreign to people. It is important for us to just get the word out that they can preserve their own stuff.”

OCLC Dev Network: WorldCat Registry RDF Interface Write Operations to be Decommissioned

planet code4lib - Mon, 2014-12-15 17:00

The WorldCat Registry RDF interface write operations will be decommissioned in January. Developers will continue to have read-only access to the WorldCat Registry via both the RDF and SRU/XSD interfaces.

LITA: Making Connections in the New Year

planet code4lib - Mon, 2014-12-15 16:25

This new year, make a resolution to be more proactive, network and update your professional skills. Resolve to attend a professional conference, discussion or symposium!

Flickr, 2010

GameDevHacker Conference
New York, January 28

The GameDevHacker conference is just around the corner. Combining the wits of three segments of the gaming industry, the gamers, developers and hackers, the conference aims at discussing future developments. The tagline for next year’s conference is “Past Trends and Future Bets.”


The Creativity Workshop
New York, February 20 – 23 & April 17 – 20

Do you have writer’s block, want to create dynamic programming or transform the way you view digital arts? The Creativity Workshop is geared toward professionals in the sciences, business, arts and humanities. Two 4-day workshops will be held in spring 2015.


2015 National STEM Education Conference
Cleveland, April 16-17

The typical STEMcon audience includes educators in the K-12 arena. However, if altering the current landscape of STEM education is important to you, STEMcon may be a great venue to voice those concerns. Participants will, among other topics, discuss using educational technology and bridging gender and ethnic divides in the science, technology, engineering and math fields.


8th Annual Emerging Technologies for Online Learning Symposium
Dallas, April 22 – 24

Perhaps you may not be interested in STEM education at the K-12 level, but almost everyone in the information field has either facilitated or participated in online learning technologies. Web-based technology will continue to alter delivery of instruction to students around the world. Network, share and learn about new trends with participants from around the nation.


Educause Annual Conference
Indianapolis and Virtual, October 27 – 30

If travel is an issue, Educause will hold a virtual conference in October of next year. The conference is geared toward IT professionals in higher education, but can be useful for students and novice practitioners. More information will be published in the spring of 2015.


Have a happy New Year, LITAblog readers!

Ed Summers: Languages on Twitter.

planet code4lib - Mon, 2014-12-15 16:12

There have been some interesting visualizations of languages in use on Twitter, like this one done by Gnip and published in the New York Times. Recently I’ve been involved in some research on a particular topical collection of tweets. One angle that’s been particularly relevant for this dataset is language. When perusing some of the tweet data we retrieved from Twitter’s API we noticed that there were two lang properties in the JSON. One was attached to the embedded user profile stanza, and the other was a top level property of the tweet itself.

We presumed that the user profile language was the language the user (who submitted the tweet) had selected, and that the second language on the tweet was the language of the tweet itself. The first is what Gnip used in its visualization. Interestingly, Twitter’s own documentation for the /get/statuses/:id API call only shows the user profile language.

When you send a tweet you don’t indicate what language it is in. For example you might indicate in your profile that you speak primarily English, but send some tweets in French. I can only imagine that detecting language for each tweet isn’t a cheap operation for the scale that Twitter operates at. Milliseconds count when you are sending 500 million tweets a day, in real time. So at the time I was skeptical that we were right…but I added a mental note to do a little experiment.

This morning I noticed my friend Dan had posted a tweet in Hebrew, and figured now was as good a time as any.

אנחנו נתגבר

— Dan Chudnov (@dchud) December 4, 2014

I downloaded the JSON for the tweet from the Twitter API and sure enough, the user profile had language en and the tweet itself had language iw, which is the deprecated ISO 639-1 code for Hebrew (the current code is he). Here’s the raw JSON for the tweet; search for lang:

{
  "contributors": null,
  "truncated": false,
  "text": "\u05d0\u05e0\u05d7\u05e0\u05d5 \u05e0\u05ea\u05d2\u05d1\u05e8",
  "in_reply_to_status_id": null,
  "id": 540623422469185537,
  "favorite_count": 2,
  "source": "<a href=\"\" rel=\"nofollow\">Tweetbot for Mac</a>",
  "retweeted": false,
  "coordinates": null,
  "entities": {
    "symbols": [],
    "user_mentions": [],
    "hashtags": [],
    "urls": []
  },
  "in_reply_to_screen_name": null,
  "id_str": "540623422469185537",
  "retweet_count": 0,
  "in_reply_to_user_id": null,
  "favorited": true,
  "user": {
    "follow_request_sent": false,
    "profile_use_background_image": true,
    "profile_text_color": "333333",
    "default_profile_image": false,
    "id": 17981917,
    "profile_background_image_url_https": "",
    "verified": false,
    "profile_location": null,
    "profile_image_url_https": "",
    "profile_sidebar_fill_color": "DDFFCC",
    "entities": {
      "description": {
        "urls": []
      }
    },
    "followers_count": 1841,
    "profile_sidebar_border_color": "BDDCAD",
    "id_str": "17981917",
    "profile_background_color": "9AE4E8",
    "listed_count": 179,
    "is_translation_enabled": false,
    "utc_offset": -18000,
    "statuses_count": 14852,
    "description": "",
    "friends_count": 670,
    "location": "Washington DC",
    "profile_link_color": "0084B4",
    "profile_image_url": "",
    "following": true,
    "geo_enabled": false,
    "profile_banner_url": "",
    "profile_background_image_url": "",
    "name": "Dan Chudnov",
    "lang": "en",
    "profile_background_tile": true,
    "favourites_count": 1212,
    "screen_name": "dchud",
    "notifications": false,
    "url": null,
    "created_at": "Tue Dec 09 02:56:15 +0000 2008",
    "contributors_enabled": false,
    "time_zone": "Eastern Time (US & Canada)",
    "protected": false,
    "default_profile": false,
    "is_translator": false
  },
  "geo": null,
  "in_reply_to_user_id_str": null,
  "lang": "iw",
  "created_at": "Thu Dec 04 21:47:22 +0000 2014",
  "in_reply_to_status_id_str": null,
  "place": null
}
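For anyone who wants to pull the two lang properties out of tweet JSON like this, here is a minimal Python sketch. The function name, the trimmed sample string and the small table of deprecated codes are my own illustration, not part of Twitter's API. It assumes the tweet JSON has already been downloaded as text.

```python
import json

# Deprecated ISO 639-1 codes mapped to their current replacements.
DEPRECATED_CODES = {"iw": "he", "in": "id", "ji": "yi"}


def tweet_languages(raw_json):
    """Return (profile_lang, tweet_lang) for one tweet's JSON text.

    profile_lang is the lang inside the embedded user stanza;
    tweet_lang is the top level lang, normalized to the current code.
    """
    tweet = json.loads(raw_json)
    profile_lang = (tweet.get("user") or {}).get("lang")
    tweet_lang = tweet.get("lang")
    tweet_lang = DEPRECATED_CODES.get(tweet_lang, tweet_lang)
    return profile_lang, tweet_lang


# Trimmed-down stand-in for the full tweet JSON (hypothetical sample).
sample = '{"user": {"screen_name": "dchud", "lang": "en"}, "lang": "iw"}'
print(tweet_languages(sample))
```

Keeping the two values separate makes it easy to compare the language a user selected with the language detected for each tweet.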

Although tweets are short they certainly can contain multiple languages. I was curious what would happen if I tweeted two words, one in English and one in French.

testing, essai

— Ed Summers (@edsu) December 15, 2014

When I fetched the JSON data for this tweet the language of the tweet was indicated to be pt, or Portuguese! As far as I know neither testing nor essai is Portuguese.

This made me think perhaps the tweet was a bit short so I tried something a bit longer, with the number of words in each language being equal.

Désolé for le noise, je suis just seeing how détection de la language works.

— Ed Summers (@edsu) December 15, 2014

This one came across with lang fr. So having the text be a bit longer helped in this case. Admittedly this isn’t a very sound experiment, but it seems interesting and useful to see that Twitter is detecting language in tweets. It isn’t perfect, but that shouldn’t be surprising at all given the nature of human language. It might be useful to try a more exhaustive test using a more complete list of languages to see how it fares. I’m adding another mental note…

Islandora: Islandora and Fedora 4

planet code4lib - Mon, 2014-12-15 16:09

Now that Fedora 4 has a production release, we at the Islandora Foundation would like to share our plans to integrate so that Islandora users, new and existing, can take advantage of the expanded performance and functionality of this major update to Fedora.

The details of our proposed plan and budget can be found in our Fedora 4 Prospectus and Project Plan, prepared by the Fedora 4 Interest Group. In short, we have established Drupal 7.x as the front end for a Fedora 4 prototype, and will commence development in January 2015 with an update and demo of the new system to be ready in time for the Open Repositories conference in June. To reach this goal, we have identified funding needs and begun soliciting support from the Islandora community. The primary use for these funds will be a dedicated developer to serve as Technical Lead on the project and oversee our effort to get an initial product out to the community. The developer for this position was recently selected, and we are very pleased to announce that Daniel Lamb of discoverygarden will be spearheading the technical development of Fedora 4/Islandora 7 integration. 

Taking the lead on overall management of the project will be Project Director Nick Ruest, two-time Islandora Release Manager and heavy contributor to the Islandora stack, with project support from the Islandora Foundation. We will be calling for volunteers from the community to join in the effort and we plan to hold one or two hackfests in the coming year. While developers, both individual and as in-kind donations from supporter institutions, are vital to this project, we are also very much in need of non-developer volunteers to test, document, and provide use cases to determine the features and scope of the update. A call for volunteers will go out in late January or early February. In the meantime, if you have questions or would like to commit funds or developer time to support the project, please contact us, or ask on the mailing list.

For more information about how Islandora will work with Fedora 4, please check out this recent webinar hosted by discoverygarden. In the coming months, Nick Ruest and Daniel Lamb will be providing more detailed plans and reports here, and the Fedora 4 Interest Group will move its mandate from being a group to get the community moving towards Fedora 4 integration to a group that will work through the integration by soliciting feedback on proposed use cases and implementations. In addition, the group will work with the greater Fedora community towards a generic Fedora 3.x to Fedora 4 migration scenario.

FOSS4Lib Recent Releases: CollectionSpace - 4.1.1

planet code4lib - Mon, 2014-12-15 14:56

Last updated December 15, 2014. Created by Peter Murray on December 15, 2014.

Package: CollectionSpace
Release Date: Monday, December 15, 2014

FOSS4Lib Recent Releases: Service-Proxy - 0.40

planet code4lib - Mon, 2014-12-15 14:18
Package: Service-Proxy
Release Date: Monday, December 1, 2014

Last updated December 15, 2014. Created by Peter Murray on December 15, 2014.

0.40 Mon Dec 1 13:04:00 CEST 2014

- mutable request, adds mechanism for modifying request on the fly, MKSP-138
- adds one more RIS field (KW) to RIS export, MKSP-140
- changes mappings for three RIS fields (SP,VL,IS), MKSP-140
- adds option to override default RIS-to-Pazpar2 field mappings, MKSP-140
- removes handling of obsolete Identity field 'proxyPattern', MKSP-133
- fixes null pointer exception in request parameter handling, MKSP-139

ACRL TechConnect: How I Work (Margaret Heller)

planet code4lib - Mon, 2014-12-15 14:00

Editor’s Note: This post is part of ACRL TechConnect’s series by our regular and guest authors about The Setup of our work.


Margaret Heller, @margaret_heller

Location: Chicago, IL

Current Gig: Digital Services Librarian, Loyola University Chicago

Current Mobile Device: iPhone 5s. It took me years and years of thinking to finally buy a smart phone, and I did it mainly because my iPod Touch and slightly smart phone were both dying so it could replace both.

Current Computer:

Work: Standard issue Dell running Windows 7, with two monitors.

Home: Home built running Windows 7, in need of an upgrade that I will get around to someday.

Current Tablet: iPad 3, which I use constantly. One useful tip is that I have the Adobe Connect, GoToMeeting, Google Hangout, and Lync apps which really help with participating in video calls and webinars from anywhere.

One word that best describes how you work: Tenaciously

What apps/software/tools can’t you live without?

Outlook and Lync are my main methods of communicating with other library staff. I love working at a place where IMing people is the norm. I use these both on desktop and on my phone and tablet. I love that a recent upgrade means that we can listen to voice mails in our email.

Firefox is my normal work web browser. I tend to use Chrome at home. The main reason for the difference is synced bookmarks. I have moved my bookmarks between browsers so many times that I have some of the original sites I bookmarked when I first used Netscape in the late 90s. Needless to say, very few of the sites still exist, but it reminds me of old hobbies and interests. I also don’t need the login to stream shows from my DVR in my bookmark toolbar at work.

Evernote I use for taking meeting notes, conference notes, recipes, etc. I usually have it open all day at work.

Notepad++ is where I do most of my code writing.

OpenRefine is my favored tool for bulk editing metadata, closely aligned with Excel since I need Excel to get data into our institutional repository.

Filezilla is my favored FTP client.

WriteMonkey is the distraction free writing environment I use on my desktop computer (and how I am writing this post). I use Editorial on my iPad.

Spotify and iTunes for music and podcasts.

RescueTime for staying on track with work–I get an email every Sunday night so I can swear off social media for the next week. (It lasts about a day).

FocusBooster makes a great Pomodoro timer.

Zotero is my constant lifesaver when I can’t remember how to cite something, and the only way I stay on track with writing posts for ACRL TechConnect.

Feedly is my RSS reader, and most of the time I stay on top of it.

Instapaper is key to actually reading rather than skimming articles, though of course I am always behind on it.

Box (and Box Sync) is our institutional cloud file storage service, and I use it extensively for all my collaborative projects.

Asana is how we keep track of ongoing projects in the department, and I use it for prioritizing personal projects as well.

What’s your workspace like?: A large room in the basement with two people full time, and assorted student workers working on the scanner. We have pieces of computers sitting around, though we moved out an old server rack that was taking up space. (Servers are no longer located in the library but in the campus data centers). My favorite feature is the white board wall behind my desk, which provides enough space to sketch out ideas in progress.

I have a few personal items in the office: a tea towel from the Bodleian Library in Oxford, a reproduction of an antique map of France, Belgium, & Holland, a photo of a fiddlehead fern opening, and small stone frogs to rearrange while I am talking on the phone. I also have a photo of my baby looking at a book, though he’s so much bigger now I need to add additional photos of him. My desk has an in tray, an out tray, and a book-cart-shaped business card holder I got at a long ago ALA conference. I am a big proponent of a clean desk, though the later in the semester it gets the more likely I am to have extra papers, but it’s important to my focus to have an empty desk.

There’s usually a lot going on in here and no natural light, so I go outside to work in the summer, or sometimes just to another floor in the building to enjoy the lake view and think through problems.

What’s your best time-saving trick?: Document and schedule routine tasks so I don’t forget steps or when to take care of them. I also have a lot of rules and shortcuts set up in my email so I can process email very quickly and not work out of my inbox. Learn the keyboard shortcuts! I can mainly get through Gmail without touching the mouse and it’s great.

What’s your favorite to-do list manager?: Remember the Milk is how I manage tasks. I’ve been using it for years for Getting Things Done. I pay for it, and so currently have access to the new version which is amazing, but I am sworn to secrecy about its appearance or features. I have a Google Doc presentation I use for Getting Things Done weekly reviews, but just started using an Asana project to track all my ongoing projects in one place without overwhelming Remember the Milk or the Google Doc. It tells me I currently have 74 projects. A few more have come in that I haven’t added yet either.

Besides your phone and computer, what gadget can’t you live without?: For a few more weeks, my breast pump, which I am not crazy about, but it makes the hard choices of parenting a little bit easier. I used to not be able to live without my Nook until I cut my commute from an hour on the train to a 20 minute walk, so now I need earbuds for the walk. I am partial to Pilot G2 pens, which I use all the time for writing ideas on scrap paper.

What everyday thing are you better at than everyone else?: Keeping my senses of humor and perspective available for problem solving.

What are you currently reading?: How to be a Victorian by Ruth Goodman (among other things). So far I have learned how Victorians washed themselves, and it makes me grateful for central heating.

What do you listen to while you work?: Podcasts (Roderick on the Line is required listening), mainly when I am doing work that doesn’t require a lot of focus. I listen mostly to full albums on Spotify (I have a paid account), though occasionally will try a playlist if I can’t decide what to listen to. But I much prefer complete albums, and try to stay on top of new releases as well as old favorites.

Are you more of an introvert or an extrovert?: A shy extrovert, though I think I should be an introvert based on the popular perception. I do genuinely like seeing other people, and get restless if I am alone for too long.

What’s your sleep routine like?: I try hard to get in bed at 9:30, but by 10 at the latest. Or ok, maybe 10:15. Awake at 6 or whenever the baby wakes up. (He mostly sleeps through the night, but sometimes I am up with him at 4 until he falls asleep again). I do love sleeping though, so chances to sleep in are always welcome.

Fill in the blank: I’d love to see _________ answer these same questions. Occasional guest author Andromeda Yelton.

What’s the best advice you’ve ever received?: You are only asked to be yourself. Figure out how you can best help the world, and work towards that rather than comparing yourself to others. People can adjust to nearly any circumstance, so don’t be afraid to try new things.

State Library of Denmark: Changing field type in Lucene/Solr

planet code4lib - Mon, 2014-12-15 12:06
The problem

We have 25 shards of 900GB / 250M documents. It took us 25 * 8 days = roughly half a year to build them. Three fields did not have DocValues enabled when we built the shards:

  • crawl_date (TrieDateField): Unknown number of unique values, 256M values.
  • links_domains (multi value Strings): 3M unique values, 675M references.
  • links_hosts (multi value Strings): 6M unique values, 841M references.

We need DocValues on those fields for faceting. Not just because of speed and memory, but because Solr is technically unable to do faceting without it, at least on the links_domains & links_hosts fields: The internal structures for field cache faceting do not allow for the number of references we have in our index.

The attempted solution

Faced with the daunting task of re-indexing all shards, Hoss at Stump the Chump got the challenge of avoiding doing so. He suggested building a custom Lucene FilterReader with on-the-fly conversion, then using that to perform a full index conversion. Eureka, DVEnabler was born.

DVEnabler takes an index and a list of which fields to adjust, then writes a corrected index. It is still very much Here Be Dragons and requires the user to be explicit about how the conversion should be performed. Sadly the Lucene index format does not contain the required information for a more automatic conversion (see SOLR-6005 for a status on that). Nevertheless it seems to have reached its first usable incarnation.

We tried converting one of our shards with DVEnabler. The good news is that it seemed to work: Our fields were converted to DocValues, we could perform efficient faceting and casual inspection indicated they had the right values. Proper test pending. The bad news is that the conversion took 2 days! For comparison, a non-converting plain optimize took just 8 hours.

Performance breakdown

Our initial shard building is extremely CPU-heavy: 8 days with 24 cores running 40 Tika-processes at 90%+ CPU utilization. The 8 real time days is 192 CPU core days. Solr merge/optimize is single-threaded, so the conversion to DocValues takes 2 CPU core days, or just 1/100 of the CPU resources needed for full indexing.

At the current time it is not realistic to make the conversion multi-threaded, to take advantage of the 24 cores. But it does mean that we can either perform multiple conversions in parallel or use the machine for building new shards, while converting the old ones. Due to limited local storage, we can run 2 conversions in parallel, while moving unconverted & converted indexes to and from the machine. This gives us an effective conversion speed of 1 shard / 1 day.
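The numbers above can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic using the figures from this post. The total conversion time is derived here, not measured, and it assumes that moving indexes to and from the machine does not add to the wall clock time.

```python
SHARDS = 25
BUILD_DAYS_PER_SHARD = 8    # wall clock days to build one shard
CORES = 24                  # cores kept busy during the build
CONVERSION_DAYS = 2         # single-threaded DocValues conversion of one shard
PARALLEL_CONVERSIONS = 2    # limited by local storage

# CPU cost of building versus converting a single shard.
build_core_days = BUILD_DAYS_PER_SHARD * CORES           # 192
conversion_core_days = CONVERSION_DAYS * 1               # 2, one core only
cpu_ratio = conversion_core_days / build_core_days       # about 1/100

# Wall clock time for all shards.
shards_per_day = PARALLEL_CONVERSIONS / CONVERSION_DAYS  # 1.0
total_conversion_days = SHARDS / shards_per_day          # 25.0
total_rebuild_days = SHARDS * BUILD_DAYS_PER_SHARD       # 200

print(f"Build: {build_core_days} core days, convert: {conversion_core_days} core days")
print(f"CPU ratio: {cpu_ratio:.4f}")
print(f"Convert all shards: {total_conversion_days:.0f} days, rebuild: {total_rebuild_days} days")
```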

FOSS4Lib Upcoming Events: Islandora Conference

planet code4lib - Sun, 2014-12-14 21:56
Date: Monday, August 3, 2015 - 08:00 to Friday, August 7, 2015 - 17:00
Supports: Islandora, Fedora Repository, Drupal

Last updated December 14, 2014. Created by Peter Murray on December 14, 2014.

August 3 - 7, 2015, we invite Islandorians from the world over to join us in the birthplace of Islandora (Charlottetown, PEI) for a week of great food, (hopefully) beautiful weather, and all the Islandora you can handle.

Mark E. Phillips: What is a use?

planet code4lib - Sun, 2014-12-14 13:28

One of the metrics that we use for the various digital library systems that we run at work is the idea of an item “use”.

This post will hopefully explain a bit more about how a use is calculated and presented.

The different digital library systems that we operate (The Portal to Texas History, the UNT Digital Library, and the Gateway to Oklahoma History) make use of Google Analytics to log and report on access to these systems.  Below is a screenshot of the Google Analytics data for the last month related to The Portal to Texas History.

Google Analytics Screenshot for The Portal to Texas History

From Google Analytics we are able to get a rough idea of the number of users, sessions, and pageviews as well as a whole host of information that is important for running a large website like a digital library.

There are a number of features of Google Analytics that we can take advantage of that allow us to understand how users are interacting with our systems and interfaces.

One of the challenges we have with this kind of analytics is the fact that it collects information when triggered by JavaScript on the page. This can happen when the page is loaded or when something is clicked on the page. The reason that this is sometimes not enough for our reporting is the fact that much of the content in our various digital libraries is linked to directly by outside resources, either embedded in discussion forums or by directing users directly to the PDF representation of the item.

A few years ago we decided to start accounting for this kind of usage of our systems in addition to the data that Google Analytics provides. In order to do this we developed a set of scripts that we run each night that work on the previous day’s worth of log files on the application servers that serve our digital library content. These log files are aggregated to a single place, parsed, and then filtered to leave us with the information we are interested in for the day. The resulting data are the unique uses that an item has had from a given IP address during a 30 minute window. This allows us to report on uses of theses and dissertations that may be linked to directly from a Google search result, or possibly an image that was embedded in another site’s blog post that pertains to one of our digital libraries.
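To make the idea of a 30 minute window a little more concrete, here is a rough Python sketch. It is not the code we run. The function name and the sample data are made up, and it assumes one particular reading of the window: a request from an IP address for an item counts as a new use only if 30 minutes or more have passed since the last use counted for that same IP address and item.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)


def count_uses(requests):
    """Count uses per item from (timestamp, ip, item_id) tuples."""
    last_counted = {}  # (ip, item_id) -> timestamp of the last counted use
    uses = {}          # item_id -> number of uses
    for timestamp, ip, item_id in sorted(requests):
        key = (ip, item_id)
        previous = last_counted.get(key)
        if previous is None or timestamp - previous >= WINDOW:
            uses[item_id] = uses.get(item_id, 0) + 1
            last_counted[key] = timestamp
    return uses


# Hypothetical, already parsed and filtered log entries for one day.
requests = [
    (datetime(2014, 12, 13, 10, 0), "192.0.2.1", "item-a"),
    (datetime(2014, 12, 13, 10, 10), "192.0.2.1", "item-a"),    # same window, not counted
    (datetime(2014, 12, 13, 10, 45), "192.0.2.1", "item-a"),    # new window, counted
    (datetime(2014, 12, 13, 10, 5), "198.51.100.7", "item-a"),  # different IP, counted
    (datetime(2014, 12, 13, 10, 5), "192.0.2.1", "item-b"),     # different item, counted
]
print(count_uses(requests))
```

Requests for all the pages or files that belong to one item would need to be mapped to the same item identifier before this step.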

Once we have the data for a given object we are able to aggregate that usage information to the collection and partner level to which the item belongs. This allows us to show information about usage at the collection or partner level. Finally the item use information is aggregated at the system level so that you can see the information for The Portal to Texas History, UNT Digital Library, or The Gateway to Oklahoma History in one place.
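The roll-up from item to collection, partner and system can be sketched in Python as well. Again this is an illustration and not our production code. The identifiers are invented, and for simplicity it assumes each item belongs to exactly one collection, one partner and one system.

```python
from collections import Counter


def aggregate_uses(item_uses, item_info):
    """Roll item-level use counts up to collection, partner and system totals.

    item_uses: item_id -> use count for the day
    item_info: item_id -> (system, partner, collection)
    """
    by_collection, by_partner, by_system = Counter(), Counter(), Counter()
    for item_id, count in item_uses.items():
        system, partner, collection = item_info[item_id]
        by_collection[collection] += count
        by_partner[partner] += count
        by_system[system] += count
    return by_collection, by_partner, by_system


# Hypothetical daily item counts and item metadata.
item_uses = {"item-a": 3, "item-b": 1, "item-c": 5}
item_info = {
    "item-a": ("system-1", "partner-1", "collection-1"),
    "item-b": ("system-1", "partner-1", "collection-2"),
    "item-c": ("system-2", "partner-2", "collection-3"),
}
by_collection, by_partner, by_system = aggregate_uses(item_uses, item_info)
print(dict(by_collection), dict(by_partner), dict(by_system))
```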

Item page in the UNT Digital Library

The above image shows how an end user can see the usage data for an item on the item’s about page. This shows up in the “Usage” section which displays total usage, uses in the last 30 days, and then uses yesterday.

Usage Statistics for item in the UNT Digital Library

If a user clicks on the stats tab they are taken to the item’s stats page. They can see the most recent 30 days or select from a month or year in the table below the graph.

Referral data for item in the UNT Digital Library

A user can view the referral traffic for a selected month or year by clicking on the referral tab.

Collection Statistics for the UNT Scholarly Works Repository in the UNT Digital Library

Each item use is also aggregated to the collection and partner level.

System statistics for the UNT Digital Library

And finally a user is able to view statistics for the entire system.  At this time we have usage data for the systems going back to 2009 when we switched over to our current architecture.

I will probably write another post detailing the specifics of what we do and don’t count when we are calculating a “use”.  So more later.

Ranti Junus: Digital Collections and Accessibility

planet code4lib - Sat, 2014-12-13 06:06

[This is a crosspost from the Digital Scholarship Collaborative Sandbox blog from the MSU Libraries.  The original blog post can be read there.  Do visit the blog and read the other posts written by my colleagues as well.]

Like many other academic libraries, our collection consists of not only print materials, but also electronic collections. Typical electronic resources can be those we subscribe to through a vendor (ProQuest, JSTOR, Elsevier, etc.), or ones that we produce in-house such as

We digitize a lot of stuff. The Library was busy working on digitization projects even before I joined in 2001, from the Making of the Modern Michigan, the Cookbooks project and Sliker Collection, and Sunday School Books to, more recently, historic images from Chicago Tribunes. Or consider other digital collections from other institutions such as the New York Public Library, the Library of Congress, Smithsonian National Museum of Natural History, the World Digital Library, or the Digital Public Library of America (DPLA). There are a lot of digital collections produced by various libraries, archives, museums, and other institutions.

The typical outcomes from these digitization projects are images, metadata, and text, represented either as an image of printed or handwritten material or as a transcript. We then create a Web presence for these outcomes, including features like search, browse, and perhaps some additional application to display and interact with the images. User interaction with these digital collections should be straightforward: users should be able to visit the site, search or browse, and read the information presented on the page with ease. We also want to make the presentation of these collections pleasing to the eye, with background color or images, font type and color, and consistent placement of the images with the associated metadata (image on the top with metadata on the bottom, or image on the left with metadata on the right, or whatever design decision we make to present the collection). We also want to make sure that our institution’s branding is visible. So we add the banner, image or logo of our institution, some navigation so visitors can also go to our main website, and footers to provide visitors with contact information, acknowledgement of the funder, a link to the privacy statement, etc.

Eventually, we produce a set of rich interfaces, chock full of images, text, and links. And probably some audio, too, for a sound project.

Given the ubiquitous nature of digital collections, the goal that these collections would be used as part of scholarly activities, and the library’s mission to disseminate the information as widely as possible, there is one aspect that many of us need to address when we plan for a digitization project: how do people with disabilities access these collections without getting lost? Can they also get the same access and benefit of our collections if they only rely on their screen readers (or refreshable Braille, or any other assistive technology)? Can people move around our website easily using just a keyboard (for those with hand-coordination difficulty who cannot use a mouse)?

Consider these questions when you begin working on any digital humanities project. Data visualization is now being used a lot. Sighted users can review the image representations easily; we can distinguish the information by shape and color. Mundane data that used to be presented as text can now have a pretty face. Information can be conveyed faster because we can see the charts and colors right away without having to go through lengthy text. But how can those who rely on sound infer the information from those charts? Can color-blind people distinguish the color palette that you use? How are you going to explain the conclusion of your charts “verbally”? These are areas that have yet to be addressed fully. We still have a lot of work to do.

Some resources:


Galen Charlton: Our move, by the numbers

planet code4lib - Sat, 2014-12-13 02:14

We’ve landed in Atlanta, having completed our move from Seattle driving cross-country.  Here are some numbers to ponder, with apologies to Harper’s magazine.

  • Humans: 2
  • Cats: 3
  • Miles as the car rolls: 3,600
  • Miles per gallon: 42.1
  • Average speed of the car: 174,720 furlongs per fortnight
  • Seconds spent pondering whether to use furlongs or smoots for the previous measure: 15
  • Cracked windshields: 1
  • Cats who forgot that if the tail is visible, the cat is visible: 1
  • Mornings that the cats were foiled by platform beds: 5
  • Mornings that the cats were foiled by an air mattress: 2
  • Mornings that the humans were foiled by a bed with an underneath: 2
  • Number of cats disappointed that said beds turned out to be moveable: 3
  • Hours spent experiencing the thrills of Los Angeles rush hour traffic: 3
  • Calls from a credit card fraud monitoring department: 1
  • Hotel hot tubs dipped into: 2
  • Restaurant restrooms with disconcerting signs: 1
  • Progress of feline excavation to China: no report
  • Fueling stops: 10
  • Net timezone difference: +3.0
  • Number of moving company staff involved: 9
  • Host cats consternated by the arrival of three interlopers: 4
  • Cats who decided to spend a few hours under the covers to bring down the number of whelms: 1
  • Tweets sent using the #SEAtoATL hashtag, including this post’s tweet: 23
  • Nights spent in California: 2
  • Nights spent in Texas: 3
  • Humans and cats happy to have arrived: 5

District Dispatch: How many pizzas can you get for $1.5 billion?

planet code4lib - Fri, 2014-12-12 23:09

Photo by Robert D Peyton via Flickr

Yesterday the Federal Communications Commission (FCC) brought the E-rate modernization proceeding to a conclusion with all the bravado it deserved. To a packed room, including library directors, teachers, a superintendent, a school principal and a handful of school students from D.C. public schools, the FCC staff presented the E-rate Order that the Chairman had circulated to his colleagues three weeks ago.

We at the American Library Association (ALA) had a pretty good understanding of what would be included in yesterday’s Order through our numerous briefings from the Chairman’s staff. Thankfully, it looks like things have not changed since the last update before the Commission’s sunshine period began (all hush-hush negotiations among the Commissioners when they no longer take public comment but finish digesting the public record to make sure the final item is commensurate with that record and the Commission’s own data and goals).

Despite interruptions during the open meeting from net neutrality protesters, the meeting and subsequent vote went forward smoothly. Richard Reyes-Gavilan spoke eloquently about the many services the D.C. Public Library provides all D.C. residents. These run the gamut from “basic human services such as applying for health benefits and communicating with loved ones” to those that change lives, as shown in the story from The Washington Post that he related:

“[A]bout a 69-year old man who was set on rebuilding his life after 40 years of incarceration. With few other places to turn, he began taking computer classes at the Martin Luther King Jr. (MLK) Library. He learned to apply for jobs online and now he’s employed full-time at the University of the District of Columbia.”

Reyes-Gavilan also showed the Commission where libraries with high-capacity broadband are heading by describing the DC Tech Meetup, held monthly in the MLK Library, where “hundreds of technologists gather to pitch ideas, demonstrate new products, and network with potential collaborators and funders.”

While Richard spoke on behalf of libraries during the meeting, he was joined by colleague Nicholas Kerelchuck, manager, MLK Library Digital Commons, as well as Andrea Berstler, executive director, Wicomico Public Library, and Rose Dawson, director of Libraries, Alexandria Library. Each of them could add similar examples of why high-capacity broadband is the currency of libraries today and why what the Commission accomplished today can make a difference for so many libraries across the country.

What’s in the Order?

The shiny object that tops the list of all the press headlines and most of the organizations’ statements issued yesterday, including ALA’s, is the additional $1.5 billion that will be added to the program, immediately increasing available funding to $3.9 billion (plus annual inflation) from here on out.

Equally important to the additional funding are the policy changes the Commission adopted to address the library broadband capacity issues, geared to help more libraries get the speeds they desperately need.

Throughout the E-rate proceeding, ALA pushed the Commission to take up this issue. Yesterday we were rewarded for our efforts. While we do not have the Order in hand yet to read the exact details, FCC staff described the changes (pdf) during the meeting. They will “maximize the options schools and libraries have for purchasing affordable high-speed broadband” and include:

  • Additional flexibility for libraries and schools seeking to purchase high-speed broadband by suspending the amortization requirements for new construction and allowing applicants to pay for the non-discounted portion over multiple years;
  • In 2016, equalizing the Commission’s treatment of dark and lit fiber and allowing for self-construction when these options are the most cost-effective solutions;
  • Providing up to a 10 percent match to state funding for construction with special consideration for tribal libraries and schools; and
  • Requiring carriers that receive Connect America funding to offer high-speed broadband to libraries and schools located in their service areas.

Larra, Marijke, and Alan with Chairman Wheeler

These rule changes have very real potential for libraries that have been struggling to increase broadband capacity. But do not fear. We at ALA, with the help of the E-rate task force, our telecommunications subcommittee, Bob Bocher, an OITP Fellow, and our consultants, are rolling up our sleeves to determine the most effective outreach and support activities. We will also be working with the Public Library Association (PLA), the Chief Officers of State Library Agencies (COSLA), the Association for Rural and Small Libraries (ARSL), and other groups to make sure we address the most pressing concerns in the coming weeks.

What do pizza and E-rate have in common?

A number of my E-rate posts refer to my kids (they can tell you what universal service is, what the Senate Commerce Committee does, the difference between category 1 and 2, and what amortization means; we’re working on what a form 470 is, poor things). At least one of my posts refers to food. This one is both.

My son texts me, “what’s for dinner?” while we at the office are reflecting on the events of the day. So, because my fridge has been empty (really empty) for the last several weeks, I text back a math problem. How many pizzas can you buy for $1.5 billion? Well, about 150 million, which may be any teenager’s dream come true. But we won’t get any, no matter how big the check is, if we don’t place the order. The same holds true for libraries and E-rate. It would not matter if we had an additional $150 billion unless libraries place their order for E-rate eligible services, and in a big way.

It is true that the changes the Commission made to the program between the July Order and yesterday’s Order will take some real effort to navigate and it’s true that there are wrinkles that need ironing out by the Commission and USAC. This next year will undoubtedly be rocky while those who support E-rate applicants struggle to make sense of the changes and figure out how best to support individual libraries. It takes a village, and libraries should well understand that phrase if the stories I have been collecting are truly indicative of the tenacity of librarians today.

The Commission removed barriers in its rules to open the door for more libraries to get more funding for more broadband. It is now up to the library community to walk through those doors.

As always, more in the near future. Next post could very well have cocktails involved.

The post How many pizzas can you get for $1.5 billion? appeared first on District Dispatch.

Code4Lib: Code4Lib 2015 Diversity Scholarships

planet code4lib - Fri, 2014-12-12 21:35

The Code4Lib Scholarship Committee will award 5 diversity scholarships based on merit and need. Each scholarship will provide up to $1,000 to cover travel costs and conference fees for a qualified attendee to attend the 2015 Code4Lib Conference, which will be held in Portland, Oregon, from February 9 - 12, 2015.

Applications are due by December 31, 2014 at 5 PM EST (see below for more details).


To qualify for a scholarship, an applicant must be interested in actively contributing to the mission and goals of the Code4Lib Conference.

  • Two scholarships will be awarded to any woman or transgender person.
  • Two scholarships will be awarded to any person of Hispanic or Latino, Black or African-American, Asian, Native Hawaiian or Pacific Islander, or American Indian or Alaskan Native descent.
  • One scholarship will be awarded to the best remaining candidate who meets any of the previously mentioned eligibility requirements.

Eligible applicants may apply based on multiple criteria, but no applicant will receive more than one scholarship. Past winners of any Code4Lib scholarship are not eligible for a scholarship.

The scholarship recipients will be selected based upon their merit and financial needs.

Registration spots are being held for scholarship recipients. If you can attend only if you receive a scholarship, there is no need to register for the conference at this point. Scholarship recipients will receive a special link for free registration, or will be reimbursed if they have already registered.

Scholarship recipients are required to write and submit a brief trip report to the Code4Lib 2015 Scholarships Committee by April 1, 2015 to be posted to the Code4Lib wiki. The report should address: (a) what kind of experience they had at the conference, (b) what they have learned, (c) what suggestions they have for future attendees and conference organizers.

All reimbursement forms and receipts must be received by May 26, 2015.


To apply, please send an email to Francis Kayiwa with the subject heading Code4Lib 2015 Diversity Scholarship Application containing the following (combined into a single attached PDF, if possible):

  1. A brief letter of interest, which:
    • Identifies your eligibility for a diversity scholarship
    • Describes your interest in the conference and how you intend to
    • Discusses your merit and needs for the scholarship
  2. A resume or CV
  3. Contact information for two professional or academic references

The application deadline is Dec. 31, 2014, 5pm EST. The scholarship committee will notify successful candidates the week of Jan. 9, 2015.


We would like to thank our sponsors for supporting the Code4Lib 2015 Diversity Scholarships. All sponsors have left it up to the discretion of the Code4Lib 2015 Scholarship Committee for how to award these diversity scholarships.


For more information on the Code4Lib Conference, please see the conference website. You can see write-ups of previous Code4Lib Conferences:

Nicole Engard: Bookmarks for December 12, 2014

planet code4lib - Fri, 2014-12-12 20:30

Today I found the following resources and bookmarked them:

  • TAGS: TAGS is a free Google Sheet template which lets you set up and run automated collection of search results from Twitter.

The post Bookmarks for December 12, 2014 appeared first on What I Learned Today....
