Feed aggregator

Suzanne Chapman: Confab presentation – Short, True, Meaningful (pick 2)

planet code4lib - Tue, 2016-02-16 21:08

I gave an Ignite-style presentation at the 2014 Confab Higher Ed conference. Not sure if I’ll ever do a timed, auto-advancing presentation again, but it was super fun!

Short, True, Meaningful (pick 2) – Confab Higher Ed Lightning Talk

Description: Librarians are great at being exhaustively thorough – a trait that’s wonderful for all kinds of traditional library activities and services – but not so great for coming up with concise web content or link labels. After many lengthy conversations debating the value of being concise and user-friendly over being exhaustively thorough, I introduced a new strategy. Using the classic “cheap, fast, good” decision triangle, I created my own version using short, true, and meaningful to help visually demonstrate the tradeoffs.

ACRL TechConnect: Low Expectations Distributed: Yet Another Institutional Repository Collection Development Workflow

planet code4lib - Tue, 2016-02-16 20:38

Anyone who has worked on an institutional repository for even a short time knows that collecting faculty scholarship is not a straightforward process, no matter how nice your workflow looks on paper or how dedicated you are. However, keeping expectations for the process manageable (not necessarily low, as in my clickbaity title) and constantly simplifying and automating it can make the process easier to run, and therefore make it work better. I’ve written before about some ways in which I’ve automated my process for faculty collection development, as well as how I’ve used lightweight project management tools to streamline processes. My newest technique for faculty scholarship collection development brings together pieces of all those to greatly improve our productivity.

Allocating Your Human and Machine Resources

First, here is the personnel situation we have for the institutional repository I manage. Your own circumstances will certainly vary, but I think institutions of all sizes will have some version of this distribution. I manage our repository as approximately half my position, and I have one graduate student assistant who works about 10-15 hours a week. From week to week we only average about 30-40 hours total to devote to all aspects of the repository, of which faculty collection development is only a part. We have 12 librarians who are liaisons with departments and do the majority of the outreach to faculty and promotion of the repository, but a limited amount of the collection development except for specific parts of the process. While they are certainly welcome to do more, in reality, they have so much else to do that it doesn’t make sense for them to spend their time on data entry unless they want to (and some of them do). The breakdown of work is roughly that the liaisons promote the repository to the faculty and answer basic questions; I answer more complex questions, develop procedures, train staff, make interpretations of publishing agreements, and verify metadata; and my GA does the simple research and data entry. From time to time we have additional graduate or undergraduate student help in the form of faculty research assistants, and we have a group of students available for digitization if needed.

Those are our human resources. The tools that we use for the day-to-day work include Digital Measures (our faculty activity system), Excel, OpenRefine, Box, and Asana. I’ll say a bit about what each of these is and how we use it below. By far the most important innovation for our faculty collection development workflow has been integration with the Faculty Activity System, which is how we refer to Digital Measures on our campus. Many colleges and universities have some type of faculty activity system or are in the process of implementing one. These generally are adopted for purposes of annual reports, retention, promotion, and tenure reviews. I have been at two different universities working on adopting such systems, and as you might imagine, it’s a slow process with varying levels of participation across departments. Faculty do not always like these systems for a variety of reasons, and so there may be hesitation to complete profiles even when required. Nevertheless, we felt in the library that this was a great source of faculty publication information that we could use for collection development for the repository and the collection in general.

We now have a required question about including the item in the repository on every item the faculty member enters in the Faculty Activity System. If a faculty member is saying they published an article, they also have to say whether it should be included in the repository. We started this in late 2014, and it revolutionized our ability to reach faculty and departments who never had participated in the repository before, as well as to simplify the lives of faculty who were eager participants but now only had to enter data in one place. Of course, there are still a number of people whom we are missing, but this is part of keeping your expectations low: if you can’t reach everyone, focus your efforts on the people you can. And anyway, we are now so swamped with submissions that we can’t keep up with them, which is a good if unusual problem to have in this realm. Note that the process I describe below is basically the same as when we analyze a faculty member’s CV (which I described in my OpenRefine post), but we spend relatively little time doing that these days since it’s easier for most people to just enter their material in Digital Measures and select that they want to include it in the repository.

The ease of integration between your own institution’s faculty activity system (assuming it exists) and your repository certainly will vary, but in most cases it should be possible for the library to get access to the data. It’s a great selling point for the faculty to participate in the system for your Office of Institutional Research or similar office who administers it, since it gives faculty a reason to keep it up to date when they may be in between review cycles. If your institution does not yet have such a system, you might still discuss a partnership with that office, since your repository may hold extremely useful information for them about research activity of which they are not aware.

The Workflow

We get reports from the Faculty Activity System on roughly a quarterly basis. Faculty member data entry tends to bunch around certain dates, so we focus on end of semesters as the times to get the reports. The reports come by email as Excel files with information about the person, their department, contact information, and the like, as well as information about each publication. We do some initial processing in Excel to clean them up, remove duplicates from prior reports, and remove irrelevant information. It is amazing how many people see a field like “Journal Title” as a chance to ask a question rather than provide information. We focus our efforts on items that have actually been published, since the vast majority of people have no interest in posting pre-prints and those who do prefer to post them in arXiv or similar. The few people who do know about pre-prints and don’t have a subject archive generally submit their items directly. This is another way to lower expectations of what can be done through the process. I’ve already described how I use OpenRefine for creating reports from faculty CVs using the SHERPA/RoMEO API, and we follow a similar but much simplified process since we already have the data in the correct columns. Of course, following this process doesn’t tell us what we can do with every item. The journal title may be entered incorrectly so the API call didn’t pick it up, or the journal may not be in SHERPA/RoMEO. My graduate student assistant fills in what he is able to determine, and I work on the complex cases. As we are doing this, the Excel spreadsheet is saved in Box so we have the change history tracked and can easily add collaborators.
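For readers who would rather script the initial clean-up than do it by hand in Excel, here is a minimal sketch of the "remove duplicates from prior reports and keep only published items" step. It is an illustration only, not the process described in this post: the column names (`title`, `journal`, `status`) and the sample rows are hypothetical, and it assumes each report has already been exported to a list of rows.

```python
import re

def citation_key(row):
    """Build a normalized (title, journal) key so near-identical entries match."""
    def norm(text):
        # Lowercase and collapse punctuation/whitespace runs into single spaces.
        return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()
    return (norm(row.get("title")), norm(row.get("journal")))

def new_published_items(current_rows, prior_rows):
    """Return rows from the current report that are published and not yet seen."""
    seen = {citation_key(r) for r in prior_rows}
    fresh = []
    for row in current_rows:
        if (row.get("status") or "").strip().lower() != "published":
            continue  # skip submitted, in-press, and similar items
        key = citation_key(row)
        if key in seen:
            continue  # already handled in a prior report (or earlier in this one)
        seen.add(key)
        fresh.append(row)
    return fresh

# Hypothetical sample data standing in for exported report rows.
prior = [
    {"title": "Open Access in Practice", "journal": "Journal of Examples", "status": "Published"},
]
current = [
    {"title": "Open access in practice.", "journal": "Journal of Examples", "status": "Published"},
    {"title": "A New Study", "journal": "Example Review", "status": "Published"},
    {"title": "A New Study", "journal": "Example Review", "status": "Published"},
    {"title": "Work in Progress", "journal": "Is this the right field?", "status": "Submitted"},
]

for row in new_published_items(current, prior):
    print(row["title"])
```

Matching on a normalized title and journal is a deliberately crude choice; it will miss entries where the journal title was entered incorrectly, which is the same kind of case that still needs a person to look at it.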

A view of how we use Asana for managing faculty collection development workflows.

At this point, we are ready to move to Asana, which is a lightweight project management tool ideal for several people working on a group of related projects. Asana is far more fun and easy to work with than Excel spreadsheets, and this helps us work together better to manage workload and see where we are with all our on-going projects. For each report (or faculty member CV), we create a new project in Asana with several sections. While it doesn’t always happen in practice, in theory each citation is a task that moves between sections as it is completed, and finally checked off when it is either posted or moved off into some other fate not as glamorous as being archived as open access full text. The sections generally cover posting the publisher’s PDF, contacting publishers, reminders for followup, posting author’s manuscripts, or posting to SelectedWorks, which is our faculty profile service that is related to our repository but mainly holds citations rather than full text. Again, as part of the low expectations, we focus on posting final PDFs of articles or book chapters. We add books to a faculty book list, and don’t even attempt to include full text for these unless someone wants to make special arrangements with their publisher–this is rare, but again the people who really care make it happen. If we already know that the author’s manuscript is permitted, we don’t add these to Asana, but keep them in the spreadsheet until we are ready for them.

We contact publishers in batches, trying to group citations by journal and publisher to increase efficiency so we can send one letter to cover many articles or chapters. We note to follow up with a reminder in one month, and then again a month after that. Usually the second notice is enough to catch the attention of the publisher. As they respond, we move each citation to the section for posting the publisher’s PDF or the section for posting the author’s manuscript or, if posting is not permitted at all, to the section for posting to SelectedWorks. While we’ve tried several different procedures, I’ve determined it’s best for the liaison librarians to ask just for author’s accepted manuscripts for items after we’ve verified that no other version may be posted. And if we don’t ever get them, we don’t worry about it too much.
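The batching and reminder schedule described above could be sketched roughly as follows. This is a hypothetical illustration, not a description of how Asana or the actual spreadsheet is set up: the field names (`publisher`, `journal`, `title`) and the sample citations are invented, and "one month" is approximated as 30 days.

```python
from collections import defaultdict
from datetime import date, timedelta

def batch_by_publisher(citations):
    """Group citation titles by publisher, then journal, so one letter covers many items."""
    batches = defaultdict(lambda: defaultdict(list))
    for c in citations:
        batches[c["publisher"]][c["journal"]].append(c["title"])
    return batches

def reminder_dates(sent_on, count=2, interval_days=30):
    """Return follow-up dates at roughly monthly intervals after a letter is sent."""
    return [sent_on + timedelta(days=interval_days * n) for n in range(1, count + 1)]

# Hypothetical citations awaiting a permissions request.
citations = [
    {"publisher": "Example Press", "journal": "Journal A", "title": "Paper 1"},
    {"publisher": "Example Press", "journal": "Journal A", "title": "Paper 2"},
    {"publisher": "Example Press", "journal": "Journal B", "title": "Paper 3"},
    {"publisher": "Sample House", "journal": "Journal C", "title": "Paper 4"},
]

batches = batch_by_publisher(citations)
for publisher, journals in sorted(batches.items()):
    total = sum(len(titles) for titles in journals.values())
    print(f"{publisher}: {total} item(s) across {len(journals)} journal(s)")

# Two reminders for a letter sent on a given day.
for reminder in reminder_dates(date(2016, 2, 16)):
    print("Follow up on", reminder.isoformat())
```

Grouping by publisher first and journal second mirrors the goal of sending one letter per publisher while still listing items journal by journal within it.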


I hope you’ve gotten some ideas from this post about your own procedures and new tools you might try. Even more, I hope you’ll think about which pieces of your procedures are really working for you, and discard those that aren’t working any more. Your own situation will dictate which those are, but let’s all stop beating ourselves up about not achieving perfection. Make sure to let your repository stakeholders know what works and what doesn’t, and if something that isn’t working is still important, work collaboratively to figure out a way around that obstacle. That type of collaboration is what led to our partnership with the Office of Institutional Research to use the Digital Measures platform for our collection development, and that in turn has led to other collaborative opportunities.


Suzanne Chapman: 2016 UX+Library Conferences

planet code4lib - Tue, 2016-02-16 17:35

There are so many good conferences for UX+Library folks this year!

Note: this isn’t intended to be an exhaustive list, just conferences I’ve attended and enjoyed or conferences I’ve heard good things about. But if you have suggestions, please comment below. 

(each section in chronological order)

Library conferences focused on web technology, design, user research, or assessment

Code4Lib

“An annual gathering of technologists from around the world, who largely work for and with libraries, archives, and museums and have a commitment to open technologies.”

  • Next occurrence: March 7-10, 2016
  • Location: Philadelphia, PA
IOLUG (Indiana Online Users Group)

Theme: “DIY UX: Innovate. Create. Design.”

From the call for proposals “What strategies and/or tools do you use to make library resources, webpages, spaces, marketing materials, etc. more user-friendly? What has proven successful for your organization? What problems surrounding user experience have you encountered, and what solutions have you devised? What best practices or recent research can you share about user experience? We encourage presentations that are practical, hands-on, and include take-awayable tools, techniques, and/or strategies that librarians can implement to improve their resources and services for students, patrons, faculty, etc.”

  • Next occurrence: May 20, 2016
  • Location: Indianapolis, IN
Computers in Libraries

“Libraries are changing,—building creative spaces with a focus on learning and creating; engaging audiences in different ways with community and digital managers; partnering with different community organizations in new and exciting ways. Computers in Libraries has always highlighted and showcased creative and innovative practices in all types of libraries, but this year with our theme, Library Labs: Research, Innovation & Imagination, we plan to feature truly transformative and cutting-edge research, services, and practices along with the strategies and technologies to support them.”

  • Next occurrence: March 8-10, 2016
  • Location: Washington, DC
Designing for Digital

“Designing for Digital is a two-day conference packed with intensive, hands-on workshops and informative sessions meant to bring together colleagues working on user experience, discovery, design and usability projects inside and outside of libraries, drawing expertise from the tech and education communities, as well as from peers. This exposure will allow information professionals to bring lessons home to their institutions and to think differently about designing our digital future.”

  • Next occurrence: April 6-7, 2016
  • Location: Austin, TX

UXLibs

The one and only library conference focused entirely on UX. Last year offered an interactive format with wonderful keynotes, hands-on ethnographic technique exercises, and a team challenge. This year will feature more individual sessions highlighting projects from around the world with the theme: nailed, failed, and derailed.

  • Next occurrence: June 23-24, 2016
  • Location: Manchester, England

Access

“Access is Canada’s annual library technology conference. It brings librarians, technicians, developers, programmers, and managers together to discuss cutting-edge library technologies. Access is a single stream conference featuring in-depth analyses, panel discussions, poster presentations, lightning talks, hackfest, and plenty of time for networking and social events.”

  • Next occurrence: October 4-7, 2016
  • Location: Fredericton, New Brunswick
LITA Forum

“The LITA Forum is the annual gathering of about 300 technology-minded information professionals. It is the conference where technology meets the practicality of daily information operations in archives, libraries, and other information services. The Forum is an ideal place to interact with fellow library technologists. Attendees are working at the cutting edge of library technology and are interested in making connections with technically-inclined colleagues and learn about new directions and projects in libraries.”

  • Next occurrence: TBD (likely November)
  • Location: TBD
Library Assessment

Theme is “Building Effective, Sustainable, Practical Assessment”.

“The conference goal is to build and further a vibrant library assessment community by bringing together interested practitioners and researchers who have responsibility or interest in the broad field of library assessment. The event provides a mix of invited speakers, contributed papers, short papers, posters, and pre- and post-conference workshops that stimulate discussion and provide workable ideas for effective, sustainable, and practical library assessment.”

  • Next occurrence: October 31–November 2, 2016
  • Location: Arlington, VA
DLF (Digital Library Federation)

“Strategy meets practice at the Digital Library Federation (DLF). Through its programs, working groups, and initiatives, DLF connects the vision and research agenda of its parent organization, the Council on Library and Information Resources (CLIR), to an active and exciting network of practitioners working in digital libraries, archives, labs, and museums. DLF is a place where ideas can be road-tested, and from which new strategic directions can emerge.”

  • Next occurrence: November 7-9, 2016
  • Location: Milwaukee, WI
Higher Ed conferences focused on web technology, design, or user research

Web Con

“An affordable two-day conference for web designers, developers, content managers, and other web professionals within higher ed and beyond.”

  • Next occurrence: April 27-28, 2016
  • Location: Urbana-Champaign, IL

HighEdWeb

“HighEdWeb is the annual conference of the Higher Education Web Professionals Association, created by and for all higher education Web professionals—from programmers to marketers to designers to all team members in-between—who want to explore the unique Web issues facing colleges and universities.”

  • Next occurrence: October 16-19, 2016
  • Location: Memphis, TN

edUi

“Focusing on the universal methods and tools of user interface and user experience design, as well as the unique challenges of producing websites and applications for large institutions, edUi is a perfect opportunity for web professionals at institutions of learning—including higher education, K-12 schools, libraries, museums, government, and local and regional businesses—to develop skills and share ideas.”

  • Next occurrence: TBD (likely November 2016)
  • Location: TBD
And a Bunch of Professional Industry Conferences & Events





Library of Congress: The Signal: Blurred Lines, Shapes, and Polygons, Part 2: An Interview with Frank Donnelly, Geospatial Data Librarian

planet code4lib - Tue, 2016-02-16 16:10

The following is a guest post by Genevieve Havemeyer-King, National Digital Stewardship Resident at the Wildlife Conservation Society Library & Archives. She participates in the NDSR-NYC cohort. This post is Part 2 on Genevieve’s exploration of stewardship issues for preserving geospatial data. Part 1 focuses on specific challenges of archiving geodata.

Frank Donnelly, GIS Librarian at Baruch College CUNY, was generous enough to let me pick his brain about some questions that came up while researching the selection and appraisal of geospatial data sets for my National Digital Stewardship Residency.

Baruch College’s NYC Geodatabase

Donnelly maintains the Newman Library’s geospatial data resources and repository, creates research guides for learning and exploring spatial data, and also teaches classes in open-source GIS software. In my meeting with him, we talked about approaches to GIS data curation in a library setting, limitations of traditional archival repositories, and how GIS data may be changing – all topics which have helped me think more flexibly about my work with these collections and my own implementation of standards and best practices for geospatial data stewardship.

Genevieve: How do you approach the selection of GIS materials?

Frank: As a librarian, much of my material selection is driven by the questions I receive from members of my college (students, faculty, and staff). In some cases these are direct questions (i.e. can we purchase or access a particular dataset), and in other cases it’s based on my general sense of what people’s interests are. I get a lot of questions from folks who are interested in local, neighborhood data in NYC for either business, social science, or public policy-based research, so I tend to focus on those areas. I also consider the sources of the questions – the particular departments or centers on campus that are most interested in data services – and try to anticipate what would interest them.

I try to obtain a mix of resources that would appeal to novice users for basic requests (canned products or click-able resources) as well as to advanced users (spatial databases that we construct so researchers using GIS can use it as a foundation for their work). Lastly, I look at what’s publicly accessible and readily usable, and what’s not. For example, it was challenging to find well-documented and public sources for geospatial datasets for NYC transit, so we decided to generate our own out of the raw data that’s provided.

Genevieve: On the limitations of the Shapefile, is the field growing out of this format? And do the limitations affect your ability to provide access?

Frank: People in the geospatial community have been grumbling about shapefiles for quite some time now, and have been predicting or calling for their demise. There are a number of limitations to the format in terms of maximum file size, limits on the number of attribute fields and on syntax used for field headers, lack of Unicode support, etc. It’s a rather clunky format as you have several individual pieces or files that have to travel together in order to function. Despite attempts to move on – ESRI has tried to de-emphasize them by moving towards various geodatabase formats, and various groups have promoted plain text formats like GML, WKT, and GeoJSON – the shapefile is still with us. It’s a long-established open format that can work in many systems, and has essentially become an interchange format that will work everywhere. If you want to download data from a spatial database or off of many web-based systems, those systems can usually transform and output the data to the shapefile format, so there isn’t a limitation in that sense. Compared to other types of digital data (music, spreadsheet files) GIS software seems to be better equipped at reading multiple types of file formats – just think about how many different raster formats there are. As other vector formats start growing in popularity and longevity – like GeoJSON or perhaps Spatialite – the shapefile may be eclipsed in the future, but its construction is simple enough that shapefiles should continue to be accessible.

Genevieve: Do you think that a digital repository designed for traditional archives can or should incorporate complex data sets like those within GIS collections? Do you have any initial ideas or approaches to this?

Frank: This is something of an age-old debate within libraries: whether the library catalog should contain just books or should it also contain other formats like music, maps, datasets, etc. My own belief is that people who are looking for geospatial datasets are going to want to search through a catalog specifically for datasets; it doesn’t make sense to wade through a hodgepodge of other materials, and the interface and search mechanisms for datasets are fundamentally different than the interface that you would want or need when searching for documents. Typical digital archive systems tend to focus on individual files as objects – a document, a picture, etc. Datasets are more complex as they require extensive metadata (for searchability and usability), help documentation and codebooks, etc. If the data is stored in large relational or object-oriented databases, that data can’t be stored in a typical digital repository unless you export the data tables out into individual delimited text files. That might be fine for small datasets or generally good for ensuring preservation, but if you have enormous datasets – imagine if you had every data table from the US Census – it would be completely unfeasible.

For digital repositories I think it’s fine for small individual datasets, particularly if they are being attached to a journal article or research paper where analysis was done. But in most instances I think it’s better to have separate repositories for spatial and large tabular datasets. As a compromise you can always generate metadata records that can link you from the basic repository to the spatial one if you want to increase find-ability. Metadata is key for datasets – unlike documents (articles, reports, web pages) you have no text to search through, so keyword searching goes out the window. In order to find them you need to rely on searching metadata records or any help documents or codebooks associated with them.

Genevieve: How do you see selection and preservation changing in the future, if/when you begin collecting GIS data created at Baruch?

Frank: For us, the big change will occur when we can build a more robust infrastructure for serving data. Right now we have a web server where we can drop files and people can click on links to download layers or tables one by one. But increasingly it’s not enough to just have your own data website floating out there; in order to make sure your data is accessible and findable you want to appear in other large repositories. Ideally we want to get a spatial database up and running (like PostGIS) where we can serve the data out in a number of ways – we can continue to serve it the old fashioned way but would also be able to publish out to larger repositories like the OpenGeoportal. A spatial database would allow us to grant users access to our data directly through a GIS interface, without having to download and unzip files one by one.

DPLA: RootsTech Wrap Up

planet code4lib - Tue, 2016-02-16 16:00

The DPLA booth ready for visitors at RootsTech 2016.

Earlier this month, DPLA’s Director for Content Emily Gore and Manager of Special Projects Kenny Whitebloom headed west to Salt Lake City to represent DPLA at RootsTech, the largest family history event in the world.  DPLA had a booth in the Exhibit Hall and hosted two sessions, through which we were able to introduce our portal, collections, and resources to over a thousand genealogists and family researchers.

We love connecting with new audiences and were thrilled to have the opportunity to touch base with genealogists and family researchers to chat about their needs, interests, and questions about what DPLA has to offer.  DPLA stands out in the field as a free public resource that allows researchers to search the collections of almost 1,800 libraries, archives, and museums around the country all at once.  Nearly everyone who passed through our booth was excited about DPLA as a research resource — there was so much interest we ran out of brochures!

We also found that family researchers had great questions for us: Can you search family names?  Does DPLA have things like newspapers, letters, or yearbooks? What about essential documents like birth and death records?  Answer: Yes! We have collected content in each of these categories from our network of hubs, but what was perhaps most exciting to family researchers about DPLA was the opportunity to dig deeper and add context to the lives of our ancestors.

Our presentation sessions allowed us to go even further.  We welcomed all levels of researchers, from beginners to pros, and Emily’s slides below demonstrate a few of the ways that DPLA collections hold vast potential for family historians.  For example, family bibles, like that of the Whitehead family, can be an invaluable source of birth and death information, particularly for the years before official state documentation.  Looking for Civil War-era ancestors?  Try searching for regimental records, veterans’ association photos, and scrapbooks:


The App Library also holds some valuable tools for family researchers.  Here, Emily shows how DPLA by County and State might be particularly helpful to zero in on a specific place or region that your family hails from.  Or, try cross-searching DPLA and Europeana to connect to family resources in Europe!


Thanks to everyone who stopped by our sessions and booth at RootsTech and welcome to the DPLA community!

Let’s stay connected about how DPLA can best serve genealogists and family researchers and we’ll hope to see you at RootsTech 2017!

Karen Coyle: More is more

planet code4lib - Tue, 2016-02-16 14:26
Here's something that drives me nuts:

These are two library catalog displays for Charles Darwin's "The origin of species". One shows a publication date of 2015, the other a date of 2003. Believe me that neither of them anywhere lets the catalog user know that these are editions of a book first published in 1859. Nor do they anywhere explain that this book can be considered the founding text for the science of evolutionary biology. Imagine a user coming to the catalog with no prior knowledge of Darwin (*) - they might logically conclude that this is the work of a current scientist, or even a synthesis of arguments around the issue of evolution. From the second book above one could conclude that Darwin hangs with Richard Dawkins, maybe they have offices near each other in the same university.

This may seem absurd, but it is no more absurd than the paucity of information that we offer to users of our catalogs. The description of these books might be suitable to an inventory of the warehouse, but it's hardly what I would consider to be a knowledge organization service. The emphasis in cataloging on description of the physical item may serve librarians and a few highly knowledgeable users, but the fact that publications are not put into a knowledge context makes the catalog a dry list of uninformative items for many users. There are, however, cataloging practices that do not consider describing the physical item the primary purpose of the catalog. One only needs to look at archival finding aids to see how much more we could tell users about the collections we hold. Another area of more enlightened cataloging takes place in the non-book world.

The BIBFRAME AV Modeling Study was commissioned by the Library of Congress to look at BIBFRAME from the point of view of libraries and archives whose main holdings are not bound volumes. The difference between book cataloging and the collections covered by the study is much more than a difference in the physical form of the library's holdings. What the study revealed to me was that, at least in some cases, the curators of the audio-visual materials have a different concept of the catalog's value to the user. I'll give a few examples.

The Online Audiovisual Catalogers have a concept of primary expression, which is something like first edition for print materials. The primary expression becomes the representative of what FRBR would call the work. In the Darwin example, above, there would be a primary expression that is the first edition of Darwin's work. The AV paper says "...the approach...supports users' needs to understand important aspects of the original, such as whether the original release version was color or black and white." (p.13) In our Darwin case, including information about the primary expression would place the work historically where it belongs.

Another aspect of the AV cataloging practice that is included in the report is their recognition that there are many primary creator roles. AV catalogers recognize a wider variety of creation than standards like FRBR and RDA allow. With a film, for example, the number of creators is both large and varied: director, editor, writer, music composer, etc. The book-based standards have a division between creators and "collaborators" that not all agree with, in particular when it comes to translators and illustrators. Although some translations are relatively mundane, others could surely be elevated to a level of being creative works of their own, such as translations of poetry.

The determination of primary creative roles and roles of collaboration are not ones that can be made across the board; not all translators should necessarily be considered creators, not all sound incorporated into a film deserves to get top billing. The AV study recognizes that different collections have different needs for description of materials. This brings out the tension in the library and archives community between data sharing and local needs. We have to allow communities to create their own data variations and still embrace their data for linking and sharing. If, instead, we go forward with an inflexible data model, we will lose access to valuable collections within our own community.

(*) You, dear reader, may live in a country where the ideas of Charles Darwin are openly discussed in the classroom, but in some of the United States there are or have been in the recent past restrictions on imparting that information to school children.

Islandora: Islandora CLAW Lessons: Starting March 1st

planet code4lib - Tue, 2016-02-16 13:49

Looking ahead to Fedora 4? Interested in working with Islandora CLAW? Want to help out but don't know where to start? Want to adopt it and need some training? CLAW Committer Diego Pino will be giving a several-part series of lessons on how to develop in the CLAW project, starting March 1st at 11AM EST and continuing weekly until you're all CLAW experts. These will be held as interactive lessons via Google Hangouts (class size permitting). Registration is completely free but spaces may be limited. Sign up here to take part.

Journal of Web Librarianship: OAI-PMH Harvested Collections and User Engagement

planet code4lib - Tue, 2016-02-16 09:26
DeeAnn Allison

Journal of Web Librarianship: A Review of “Electronic Resource Management”

planet code4lib - Tue, 2016-02-16 09:26
Charlie Sicignano

Journal of Web Librarianship: A Review of “Learning JavaScript Design Patterns”

planet code4lib - Tue, 2016-02-16 09:25
John Rodzvilla

FOSS4Lib Recent Releases: CollectionSpace - 4.3

planet code4lib - Mon, 2016-02-15 22:36

Last updated February 15, 2016. Created by Peter Murray on February 15, 2016.

Package: CollectionSpace
Release Date: Monday, February 15, 2016

LITA: Level Up – Gamification for Promotion in the Academic Library

planet code4lib - Mon, 2016-02-15 15:59
Kirby courtesy of Torzk

Let me tell you the truth- I didn’t begin to play games until my late twenties. In my youth, I resisted the siren call of Super Nintendo and Sega Genesis. As an adult, I studiously avoided Playstation and XBox. When the Wii came out, I caved. I am very glad I did, because finding games in my twenties proved to be a tremendous stress reducer, community builder, and creative outlet. I cannot imagine completing my MLIS while working full-time and planning my wedding without Super Smash Bros.

It was a time in my life when I really needed to punch something.

In case you are wondering, I specialize as Kirby and I am a crusher. Beyond video games, I like board games (mainly cooperative ones, like Pandemic) and trivia. Lately, I have also been toying with getting into Dungeons & Dragons because what I really need is more hobbies.

More to the point – This isn’t the first time I’ve talked about the value of gamification or my interest in it on the LITA Blog, but this is a first for me in that I am offering gamification as a tool towards a specific professional goal, namely promotion in the academic library.

A quick note: gamification doesn’t necessarily require technology, though I do recommend apps for this process. My key aim in writing this blog post is to offer a recommended starting place for academic librarians who want to apply gamification in their professional lives.

SuperBetter by Jane McGonigal

In the course of pursuing promotion in an academic library or seeking professional development opportunities in the workplace, it can be easy to feel overwhelmed, isolated, and even paralyzed. What if, instead of binging on Girl Scout cookies and listening to sad Radiohead (this may just be me), we chose to work gamefully? What if we framed promotion as a mission for an epic win, with quests, battles, and rewards along the way?

In her book, SuperBetter: A Revolutionary Approach to Getting Stronger, Happier, Braver, and more Resilient — Powered by the Science of Games (phew), Jane McGonigal boldly posits, “Work ethic is not a moral virtue that can be cultivated simply by wanting to be a better person. It’s actually a biochemical condition that can be fostered, purposefully, through activity that increases dopamine levels in the brain.”

She goes on to provide seven rules for implementing her SuperBetter method which are:

  • Challenge yourself.
  • Collect and activate power-ups.
  • Find and battle bad guys.
  • Seek out and complete quests.
  • Recruit allies.
  • Adopt a secret identity.
  • Go for the epic win.

Gamification is still something most of us are figuring out how to incorporate into library programming and services; however, I can think of no better way to begin to understand gamification as a learning theory than to apply it towards your work. Seeing how gamification can help you structure the steps it takes to be promoted in your library will offer inspiration. In the process, you will naturally think of ways to apply gamification to instruction, collection engagement, and other library outreach.

Think of the promotion process through the lens of SuperBetter’s rules. Quests might include identifying and contacting collaborators (allies) for your research project or a coach/mentor for your promotion process. You might make a spreadsheet of conferences you want to present at in the next two or three years. Is there a particularly impressive journal where you would like to publish? That’s a fine quest.

Make sure that as you complete these quests, all part of your effort for the eventual “epic win,” you track your efforts. The road to promotion is one that requires a well-rounded portfolio of activities, and gamifying each will keep you on track. Remember that each quest you complete provides you with a power-up, leaving you with more professional clout and experience, extending your network and leaving you SuperBetter. The quest is its own reward.

Habit RPG – I am a Level 10 Mage with a Panda Companion!

One tool I have found tremendously helpful for framing my own quests towards my promotion is Habit RPG, an app I have mentioned in previous posts. With Habit RPG, I can put all my quests and daily tasks in an already gamified context where I earn fancy armor and other gear for my avatar. SuperBetter has an app component which also looks great. Whether or not you are interested in investigating an app, I would encourage you to read SuperBetter, which is an excellent starting place for thinking about gamification and provides plenty of example and starter quests. Not a reader? No problem. Jane McGonigal has a TED Talk which sums up the ideas very neatly.

Ultimately, the road to tenure can feel lonely. The solo nature of the pursuit means that no one’s experience is exactly the same. However, by approaching the process through gamification, you can put the joy back into the job. Get questing, and let me know your thoughts on gamifying promotion.

Suzanne Chapman: Web Content Debt is the New Technical Debt

planet code4lib - Mon, 2016-02-15 14:37

We worry a lot about “technical debt” in the IT world. The classic use of this metaphor describes the phenomenon where messy code is written in the interest of quick execution, causing a debt that will need to be repaid (time spent fixing the code later) or it will accumulate interest (additional work on the system will be complicated by the messy code). “Technical debt” is also used more broadly to describe the ongoing maintenance of legacy systems that we spend a great deal of time just keeping alive.

Technical & content debt holds us back from doing new and better things.

But in addition to technical debt, organizations (like libraries) with large websites have a growing problem with what I’ve started calling “content debt.” And like with “deferred maintenance” of buildings (the practice of postponing repairs to save costs), allowing too much technical debt and/or content debt will result in costing you much more in the long run. Beyond the costs, the big problem with technical & content debt is that they hold us back from doing new and better things.

Take for example the website I’m working on right now that currently has over 16,000 pages of content that were created by hundreds of different people over many many years. Redesigning this website isn’t just a matter of developing a new CMS with a more modern design and then hiring a room full of interns to copy and paste from the old CMS to the new CMS. We also need to look closely at all of the existing pages to evaluate what needs to be done differently this time around to ensure a more user-friendly and future-friendly site. It’s no easy task to detangle this mass of pages and the organic processes that generated them.

Some might say that you should just set the old stuff aside and start from scratch, but if you don’t take the time to discover what’s causing your problems, you’ve little chance of not replicating them. The Wikipedia page for technical debt offers some common causes for technical debt, many of which also fit with my concept of content debt. Here’s my revised version to help illustrate the similarities:

  • Business pressures: organization favors speed of releasing code [or content] as more important than a complete and quality release.
  • Lack of shared standards/best practices: no shared standards/best practices for developers [or content authors] means inconsistent quality of output.
  • Lack of alignment to standards: code [or content] standards/best practices aren’t followed.
  • Lack of knowledge: despite trying to follow standards, the developer [or content author] still doesn’t have the skills to do it properly.
  • Lack of collaboration: code [or content] activities aren’t done together, and new contributors aren’t properly mentored.
  • Lack of process or understanding: organization is blind to debt issues and makes code [or content] without understanding long-term implications beyond the immediate need.
  • Parallel development: parallel efforts, in isolation, to create similar code [or content] result in debt for time to merge things later (e.g., multiple units creating their own (redundant) pages about how to renew books, where to pay fines, how to use ILL, etc.).
  • Delayed refactoring: as a project evolves and issues with code [or content] become unwieldy, the longer remediation is delayed and the more code [or content] is added, the exponentially larger the debt becomes.
  • Lack of a test suite: results in release of messy code [or content] (e.g., I once worked on a large website with no pre-release environment for testing or training which resulted in a TON of published pages that said things like “looks like I can put some text here”).
  • Lack of ownership: outsourced software [or content] efforts result in extra effort to fix and recreate properly (e.g., content outsourced to interns).
  • Lack of leadership: technical [or UX/content strategy] leadership isn’t present or doesn’t train/encourage/enforce adherence to coding [or content] standards/best practices.

I also find this list useful because when talking about content issues, there’s a risk of seeming judgmental towards the individuals who made said content– but the reality is that there are tons of factors that lead to this “debt” situation. Approaching the problem from all the angles will lead to a more well-rounded solution.

Open Knowledge Foundation: Introducing Viderum

planet code4lib - Mon, 2016-02-15 10:00

Ten years ago, Rufus started CKAN as an “apt-get for data” in order to enable governments and corporations to provide their data as truly open data. Today, CKAN is used by countless open data publishers around the globe and has become the de facto standard.

With CKAN as the technical foundation, Open Knowledge has offered commercial services to governments and public institutions within its so-called Services division for many years. Some of the most prominent open data portals around the world have been launched by the team, including,,,, and—most recently—

Today, we’re spinning off this division into its own company: Viderum.

We’re doing this because we want to bring a stronger focus to further development and promotion of these services without distracting from Open Knowledge’s core mission as an advocate for openness and transparency. We’ve also heard from our customers that they are asking for a commercial-grade service offering that is best realized in an organization dedicated to that end.

Viderum’s mission will be simple: to make the world’s public data discoverable and accessible to everyone. They will provide services and products to further expand the reach of open data around the world.

Says CEO of Viderum Sebastian Moleski:

“I’m personally very excited about this opportunity to bring open data publishing to the next level. In all reality, the open data revolution has only just begun. As it moves further, it is imperative to build on core principles of openness and interoperability. When it comes to open data, there is no good reason to use closed, proprietary, and expensive solutions that tie governments and public institutions to particular vendors. Viderum will help prove that point again and again.”

As a first step in fulfilling their mission, Viderum is offering a cloud-based, multi-tenant solution to host CKAN that has been live since mid-November. This allows anyone to get their own CKAN instance and publish data without the hassle, cost, and learning curve involved in setting one up individually. By lowering technological barriers, we believe there are now even more reasons for governments, institutions, and local authorities to publish open data for everyone’s use.

Viderum have set up an office in Berlin and are currently hiring developers! If you know anyone who’s passionate about building software and the infrastructure for open data around the world, please pass the link along to them.

To find out more about Viderum, check out their website, read the FAQ or contact the team at


FOSS4Lib Recent Releases: Koha - 3.22.3

planet code4lib - Mon, 2016-02-15 09:53
Package: Koha
Release Date: Friday, February 12, 2016

Last updated February 15, 2016. Created by David Nind on February 15, 2016.

Koha 3.22.3 is a security release. It includes one security fix, four enhancements and 57 bug fixes.

As this is a security release, we strongly recommend anyone running Koha 3.22.* to upgrade.

See the release announcements for the details:

Terry Reese: MarcEdit: Thinking about Charactersets and MARC

planet code4lib - Sun, 2016-02-14 23:48

The topic of charactersets is likely something most North American catalogers rarely give a second thought to. Our tools and systems are all built around a very anglo-centric world-view that assumes data is primarily structured in MARC21 and recorded in either MARC-8 or UTF-8. However, when you get outside of North America, the question of characterset, and even MARC flavor for that matter, becomes much more relevant. While many programmers and catalogers who work with library data would like to believe that most data follows a fairly regular set of common rules and encodings – the reality is that it doesn’t. While MARC21 is the primary MARC encoding for North American and many European libraries – it is just one of 40+ different flavors of MARC, and while MARC-8 and UTF-8 are the predominant charactersets in libraries coding in MARC21, move outside of North America and OCLC, and you will run into Big5, Cyrillic (codepage 1251), Central European (codepage 1250), ISO-5426, Arabic (codepage 1256), and a range of many other localized codepages in use today. So while UTF-8 and MARC-8 are the predominant encodings in countries using MARC21, a large portion of the international metadata community still relies on localized codepages when encoding their library metadata. And this can be a problem for any North American library looking to utilize metadata encoded in one of these local codepages, or share data with a library utilizing one of these local codepages.

For years, MarcEdit has included a number of tools for handling this soup of character encodings – tools that work at different levels to allow the tool to handle data from across the spectrum of different metadata rules, encodings, and markups.  These get broken into two different types of processing algorithms.

Characterset Identification:

This algorithm is internal to MarcEdit and vital to how the tool handles data at a byte level. When working with file streams for rendering, the tool needs to decide if the data is in UTF-8 or something else (for mnemonic processing) – otherwise, the data won’t render correctly in the graphical interface. For a long time (and honestly, this is still true today), the byte in the LDR of a MARC21 record that indicates if a record is encoded in UTF-8 or something else simply hasn’t been reliable. It’s getting better, but a good number of systems and tools simply forget (or ignore) this value. But more important for MarcEdit, this value is only useful for MARC21. This encoding byte is set in a different field/position within each different flavor of MARC. In order for MarcEdit to be able to handle this correctly, a small, fast algorithm needed to be created that could reliably identify UTF-8 data at the binary level. And that’s what’s used – a heuristic algorithm that reads bytes to determine if the characterset might be UTF-8 or something else.

Might be? Sadly, yes. There is no way to auto detect characterset. It just can’t happen. Each codepage reuses the same codepoints; they just assign different characters to those codepoints based on which encoding is in use. So, a tool won’t know how to display textual data without first knowing the set of codepoint rules that data was encoded under. It’s a real pain in the backside.
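To make the ambiguity concrete, here is a minimal Python sketch (Python rather than MarcEdit’s C#, chosen only for brevity). It decodes the same three bytes under two of the codepages mentioned above. The sample bytes and the helper name `decode_under` are illustrative assumptions, not anything taken from MarcEdit.

```python
# Illustrative only: the same bytes read as different text under different codepages.
def decode_under(raw: bytes, codecs: list[str]) -> dict[str, str]:
    """Decode `raw` under each named codec and return the text keyed by codec name."""
    return {name: raw.decode(name) for name in codecs}


sample = bytes([0xE0, 0xE1, 0xE2])  # hypothetical sample bytes
results = decode_under(sample, ["cp1251", "cp1252"])
for name, text in results.items():
    print(name, text)
```

Under codepage 1251 these bytes are the Cyrillic letters абв; under codepage 1252 they are àáâ. Nothing in the bytes themselves says which reading is the intended one.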

To solve this problem, MarcEdit uses the following code in an identification function:

    int x = 0;
    int lLen = 0;
    try
    {
        while (x < p.Length)
        {
            // Plain 7-bit ASCII byte: valid in every candidate encoding, so skip it.
            if (p[x] <= 0x7F) { x++; continue; }
            // Otherwise the lead byte announces the length of a multi-byte sequence.
            else if ((p[x] & 0xE0) == 0xC0) { lLen = 2; }
            else if ((p[x] & 0xF0) == 0xE0) { lLen = 3; }
            else if ((p[x] & 0xF8) == 0xF0) { lLen = 4; }
            else if ((p[x] & 0xFC) == 0xF8) { lLen = 5; }  // legacy 5-byte form
            else if ((p[x] & 0xFE) == 0xFC) { lLen = 6; }  // legacy 6-byte form
            else { return RET_VAL_ANSI; }

            // Every remaining byte of the sequence must look like 10xxxxxx.
            while (lLen > 1)
            {
                x++;
                if (x >= p.Length || (p[x] & 0xC0) != 0x80) { return RET_VAL_ERR; }
                lLen--;
            }
            iEType = RET_VAL_UTF_8;
            x++;
        }
    }
    catch (System.Exception)
    {
        iEType = RET_VAL_ERROR;
    }
    return iEType;

This function allows the tool to quickly evaluate any data at a byte level and identify if that data might be UTF-8 or not.  Which is really handy for my usage.
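For readers who don’t work in C#, a rough Python sketch of the same idea follows. It is not MarcEdit’s code: it leans on Python’s built-in strict UTF-8 decoder, which is stricter than the function above (it rejects 5- and 6-byte forms and overlong sequences), and it compares the result with the MARC21 leader byte mentioned earlier (Leader position 09, where `a` means UCS/Unicode and a blank means MARC-8). The function names and the sample leaders are hypothetical.

```python
def declared_encoding(record: bytes) -> str:
    """Report what MARC21 Leader/09 claims: 'a' is UCS/Unicode, blank is MARC-8."""
    flag = record[9:10]
    if flag == b"a":
        return "utf-8"
    if flag == b" ":
        return "marc-8"
    return "unknown"


def is_valid_utf8(data: bytes) -> bool:
    """True when every byte sequence in `data` is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def leader_disagrees(record: bytes) -> bool:
    """True when the leader claims UTF-8 but the bytes are not valid UTF-8."""
    return declared_encoding(record) == "utf-8" and not is_valid_utf8(record)


# Hypothetical records: a 24-byte leader followed by a name in two encodings.
utf8_leader = b"00000nam a2200000 a 4500"
good = utf8_leader + "Dvořák".encode("utf-8")
bad = utf8_leader + "Dvořák".encode("cp1250")  # flagged as UTF-8, but isn't
print(leader_disagrees(good), leader_disagrees(bad))
```

A check like this can only report that the flag and the bytes disagree. It says nothing about which encoding the data is actually in, and pure ASCII data passes as valid UTF-8 whatever the leader says.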

Character Conversion

MarcEdit has also included a tool that allows users to convert data from one character encoding to another.

This tool requires users to identify the original characterset encoding for the file to be converted. Without that information, MarcEdit would have no idea which set of rules to apply when shifting the data around based on how characters have been assigned to their various codepoints. Unfortunately, a common problem that I hear from librarians, especially librarians in the United States who don’t have to deal with this problem regularly, is that they don’t know the file’s original characterset encoding, or how to find it. It’s a common problem – especially when retrieving data from some Eastern European publishers and Asian publishers. In many of these cases, users send me files, and based on my experience looking at different encodings, I can make a couple of educated guesses and generally figure out how the data might be encoded.
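As a rough illustration of why the source encoding has to be supplied, here is a minimal Python sketch of the conversion step itself. It is an assumption-laden simplification, not MarcEdit’s implementation: the `convert` helper is hypothetical, it handles only text that Python’s standard codecs understand (Python ships no MARC-8 codec), and it ignores the MARC record structure entirely.

```python
def convert(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode `data` from a known `source` codepage into `target`.

    The caller must supply `source`; decoding with the wrong codepage usually
    succeeds silently and produces the wrong characters.
    """
    return data.decode(source).encode(target)


original = "Москва".encode("cp1251")   # hypothetical field content
converted = convert(original, "cp1251")
print(len(original), len(converted))   # byte lengths differ after conversion
```

Because the byte length changes (6 bytes in codepage 1251, 12 in UTF-8 for this sample), a real MARC conversion also has to recalculate the record length and directory offsets, which are counted in bytes.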

Automatic Character Detection

Obviously, it would be nice if MarcEdit could provide some kind of automatic characterset detection.  The problem is that this is a process that is always fraught with errors.  Since there is no way to definitively determine the characterset of a file or data simply by looking at the binary data – we are left having to guess.  And this is where heuristics comes in again.

Current generation web browsers automatically set character encodings when rendering pages. This is something that they do based on the presence of metadata in the header, information from the server, and a heuristic analysis of the data prior to rendering. This is why everyone has seen pages that the browser believes are in one character set but are actually in another, making the data unreadable when it renders. However, as sad as this may be, the process that browsers currently use is the best we’ve got.

And so, I’m going to be pulling this functionality into MarcEdit. Mozilla has made the algorithm that they use public, and some folks have ported that code into C#. The library can be found on GitHub here: I’ve tested it – it works pretty well, though it is not even close to perfect. Unfortunately, this type of process works best when you have lots of data to evaluate – but most MARC records are just a few thousand bytes, which just isn’t enough data for a proper analysis. However, it does provide something – and maybe that something will provide a way for users working with data in an unknown character encoding to actually figure out how their data might be encoded.

The new character detection tools will be added to the next official update of MarcEdit (all versions).

And as I noted – this will be added to give users one more tool for evaluating their records. While detection may still only be a best guess – it’s likely a pretty good guess.

The MARC8 problem

Of course, not all is candy and unicorns. MARC8, the lingua franca for a wide range of ILS systems and libraries – well, it complicates things. Unlike many of the localized codepages that are actually well defined standards and in use by a wide range of users and communities around the world – MARC-8 is not. MARC8 is essentially a made up encoding – it simply doesn’t exist outside of the small world of MARC21 libraries. To a heuristic parser evaluating character encoding, MARC-8 looks like one of four different charactersets: USASCII, Codepage 1252, ISO-8859, and UTF8. The problem is that MARC-8, as an escape-based encoding, reuses parts of a couple of different encodings. This really complicates the identification of MARC-8, especially in a world where other encodings may (and probably will) be present. To that end, I’ve had to add a secondary set of heuristics that will evaluate data after detection so that if the data is identified as one of these four types, some additional evaluation is done looking specifically for MARC-8’s fingerprints. This allows, most of the time, for MARC8 data to be correctly identified, but again, not always. It just looks too much like other standard character encodings. Again, it’s a good reminder that this tool is just a best guess at the characterset encoding of a set of records – not a definitive answer.
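The post doesn’t say which fingerprints MarcEdit looks for, so the following Python sketch is purely an assumption about one plausible signal: MARC-8 switches character sets with escape sequences, an ESC byte (0x1B) followed by a designator such as `(`, `)`, `,`, `-`, `$`, or one of `g`, `b`, `p`, `s`. The pattern and the function name `has_marc8_escapes` are hypothetical, and the check is deliberately loose.

```python
import re

# ESC (0x1B) followed by the first byte of a MARC-8 character set designator.
# Assumption for illustration only; not MarcEdit's actual heuristic.
MARC8_ESCAPE = re.compile(rb"\x1b[gbps(),$-]")


def has_marc8_escapes(data: bytes) -> bool:
    """True when `data` contains something shaped like a MARC-8 escape sequence."""
    return MARC8_ESCAPE.search(data) is not None


# Hypothetical field: switch to basic Cyrillic, some bytes, switch back to ASCII.
field = b"Title \x1b(NLMN\x1b(B end"
print(has_marc8_escapes(field), has_marc8_escapes(b"Plain ASCII title"))
```

Even this signal is only a hint. Other escape-based encodings such as ISO-2022-JP use similar sequences, and a MARC-8 record that contains only Latin text may have no escape sequences at all, which fits the point above that the result remains a best guess.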

Honestly, I know a lot of people would like to see MARC as a data structure retired. They write about it, talk about it, hope that BibFrame might actually do it. I get their point – MARC as a structure isn’t well suited for the way we process metadata today. Most programmers simply don’t work with formats like MARC, and fewer tools exist that make MARC easy to work with. Likewise, most evolving metadata models recognize that metadata lives within a larger context, and are taking advantage of semantic linking to encourage the linking of knowledge across communities. These are things libraries would like in their metadata models as well, and libraries will get there, though I think in baby steps. When you consider the train-wreck RDA adoption and development was for what we got out of it (at a practical level) – making a radical move like BibFrame will require a radical change (and maybe an event that causes that change).

But I think that there is a bigger problem that needs more immediate action. The continued reliance on MARC8 actually poses a bigger threat to the long-term health of library metadata. MARC, as a structure, is easy to parse. MARC8, as a character encoding, is essentially a virus, one that we are continuing to let corrupt our data and lock it away from future generations. The sooner we can toss this encoding to the trash heap, the better it will be for everyone – especially since we are likely the passing of one generation away from losing the knowledge of how this made up character encoding actually works. And when that happens, it won’t matter how the record data is structured – because we won’t be able to read it anyway.


Suzanne Chapman: UX photo booth 2011 (My ideal library…)

planet code4lib - Sat, 2016-02-13 23:15

A few weeks ago I helped out again with the MLibrary Undergraduate library’s annual “Party for Your Mind” event to welcome the students back and introduce new students to the library.

Like last year, I did a photo booth where I asked the students to complete the sentence “My ideal library ______” and like last year, I got a lovely combination of silly and serious responses. Quiet/Loud and food/sleeping were again popular themes!

My ideal library… loud & fun!

See the full set here.

Suzanne Chapman: MLibrary – by the numbers

planet code4lib - Sat, 2016-02-13 23:14

Last fall I created some graphics for a slide show for our annual library reception event to demonstrate some of what we do via stats and graphics. This was such a fun side project and I couldn’t have done it without the data gathering help of Helen Look and others.

Full set here:

