Feed aggregator

DuraSpace News: Integrative Health Care Case Reports Now Widely Accessible in DSpaceDirect

planet code4lib - Tue, 2014-12-02 00:00

Winchester, MA – Martha Menard, Director of the Crocker Institute, is responsible for day-to-day operations, repository maintenance, and the overall design of the CaseRe3 Repository for Integrative Health Care Case Reports. She chose DSpace over Fedora for the original implementation in 2011.

District Dispatch: Publishers Weekly honors ALA leadership for library ebook advocacy

planet code4lib - Mon, 2014-12-01 21:35

Sari Feldman and Bob Wolven. Photo by Publishers Weekly.

Today, Publishers Weekly lauded American Library Association (ALA) Digital Content Working Group former co-chairs Sari Feldman and Bob Wolven in the publication’s annual “Publishing People of 2014” recognition for their role in advocating for fair library ebook lending practices.

From 2011–2014, Feldman, who is the incoming ALA president and the executive director of the Cuyahoga County Public Library in Ohio, and Wolven, who is the associate university librarian at Columbia University, led meetings with some of the world’s largest book publishers.

In the Publishers Weekly article, Andrew Albanese writes:

Publishers say discussions with ALA leaders and the DCWG have been instrumental in moving their e-book programs forward. And more importantly, direct lines of communication are now established between publishing executives and library leaders—which Feldman says is unprecedented—and those open lines will prove vital as the digital discussion moves beyond questions of basic access to e-books.

Congratulations Sari and Bob for your well-deserved recognition!

The post Publishers Weekly honors ALA leadership for library ebook advocacy appeared first on District Dispatch.

Shelley Gullikson: Weekly user tests: Finding games

planet code4lib - Mon, 2014-12-01 21:16

Our library has a pretty fantastic game collection of over 100 board games and almost 700 video games. But finding them? Well, it’s pretty easy if you know the exact title of what you want. But a lot of people just want to browse a list. And to get a list of all the video games you can borrow, you have two options:

  • Do a search for “computer games” in Summon, then go to the Content Type facet on the left, click “more,” and limit to “Computer File.”
  • Go to the library catalogue, select “Call & Other Numbers,” and then under “Other Call Number” enter GVD if you want video games, or GVC if you want to see titles available through our Steam account. After that, you get a really useful results screen to browse:

And if you want board games, the content type in Summon is “Realia.”


Obviously, this is ripe for improvement, but how best to improve? User testing!

We set up in the lobby (mostly – see postscript) and asked passing students if they had 2 minutes to answer a question and get a chocolate. We told them that we wanted to improve access to our game collection (alternating “video game” and “board game”) and so wanted to know what they would do to find out what games the library had. We had a laptop with the library website up, ready for them to use.

No one clicked anywhere on the page. No one mentioned the catalogue. They all would search Summon or Google or else ask someone in the library.

We asked them to tell us what search terms they would use, so now we can make sure that those Google and Summon searches bring them to a page that gives them what they want. For Summon, that likely means using Best Bets, and since everyone was consistent in the search terms they’d use, Best Bets should work out okay.

Once we have all that ready, we can test again to see if this will work smoothly for our users. Or if we really do have to tell them about “computer file” and “realia.” [shudder]


When we did testing last December, we set up in our Discovery Centre, a really cool and noisy space where students do a lot of collaborative work. We didn’t have to hustle too much to get participants; students would see our chocolate, come over to find out how to get some, do the test and that was that.

During our tests in the lobby this term, it’s been pretty much all hustle, and even after all these weeks I still don’t really like approaching people (I feel like the credit card lady at the airport that everyone tries to avoid). I kept thinking that we should head up to the Discovery Centre again for that gentler “display the chocolate and they will come” approach.

Well, we tried it today and got exactly one person in 20 minutes, despite lots of traffic. So we went back down to the lobby and got to the “we’re not learning anything new” mark in 15 minutes.

I’ll just have to learn to love the hustle.

District Dispatch: CopyTalk webinar update

planet code4lib - Mon, 2014-12-01 21:03

The next free copyright webinar (60 minutes) is on December 4 at 2pm Eastern Time. This installment of CopyTalk is entitled, “Introducing the Statement of Best Practices in Fair Use of Collections Containing Orphan Works for Libraries, Archives, and Other Memory Institutions” presented by Dave Hansen (UC Berkeley and UNC Chapel Hill) and Peter Jaszi (American University).

CopyTalks are scheduled for the first Thursday of even-numbered months.

Two earlier webinars were recorded and archived:

  • From August 7, 2014: International copyright (with Janice Pilch from Rutgers University Library)
  • From October 2, 2014: Open licensing and the public domain: Tools and policies to support libraries, scholars, and the public (with Tim Vollmer from Creative Commons)


The post CopyTalk webinar update appeared first on District Dispatch.

OCLC Dev Network: Opening Up Developer House

planet code4lib - Mon, 2014-12-01 20:30

We're excited to kick off our second Developer House here at the OCLC Developer Network by welcoming 12 library technologists for a week of brainstorming, learning and coding: Bilal Khalid, Bill Jones, Candace Lebel, Emily Flynn, Francis Kayiwa, Janina Sarol, Jason Thomale, Rachel Maderik, Sarah Johnston, Scott Hanrath, Shawn Denny, and Steelsen Smith. This is such a talented group—each person has terrific skills on their own—together they will be unstoppable.

SearchHub: Solr on YARN

planet code4lib - Mon, 2014-12-01 19:44

One of the most important evolutions in the big data landscape is the emergence of best-of-breed distributed computing frameworks. Gone are the days when every big data problem looks like a nail for the MapReduce hammer. Have an iterative machine learning job? Use Spark. Need to perform deep analytics on billions of rows? MapReduce. Need ad-hoc query capabilities? Solr is best. Need to process a stream of tweets in real time? Spark Streaming or Storm. To support this multi-purpose data processing platform, Hadoop 2 introduced YARN (Yet Another Resource Negotiator), which separates resource management and job scheduling from data processing in a Hadoop cluster. In contrast, in Hadoop v1, MapReduce computation was tightly coupled with the cluster resource manager. Put simply, YARN allows different jobs to run in a Hadoop cluster, including MapReduce, Spark, and Storm. In this post, I introduce an open source project developed at Lucidworks for running SolrCloud clusters on YARN.

Economies of Scale

Before we get into the details of how it works, let’s understand why you might want to run SolrCloud on YARN. The main benefit of running distributed applications like Solr on YARN is improved operational efficiency. Designing, deploying, and managing a large-scale Hadoop cluster is no small feat. Deploying a new cluster requires investing in dedicated hardware, specialists to install and configure it, and performance / stability testing. In addition, you need to secure the cluster and actively monitor its health. There’s also training employees on how to use and develop solutions for Hadoop. In a nutshell, deploying a Hadoop cluster is a major investment that can take months or even years. The good news is that adding computing capacity to an existing cluster is much easier than deploying a new cluster. Consequently, it makes good business sense to leverage economies of scale by running as many distributed applications on YARN as possible.

If a new application requires more resources, it’s easy to add more HDFS and data nodes. Once a new application is deployed on YARN, administrators can monitor it from one centralized tool. As we’ll see below, running Solr on YARN is very simple, in that a system administrator can deploy a SolrCloud cluster of any size using a few simple commands. Another benefit of running Solr on YARN is that businesses can deploy temporary SolrCloud clusters to perform background tasks like re-indexing a large collection. Once the re-index job is completed and the index files are safely stored in HDFS, YARN administrators can shut down the temporary SolrCloud cluster.

Nuts and Bolts

The following diagram illustrates how Solr on YARN works.

Step 1: Run the SolrClient application

Prior to running the SolrClient application, you need to upload the Solr distribution bundle (solr.tgz) to HDFS. In addition, the Solr YARN client JAR (solr-yarn.jar) also needs to be uploaded to HDFS, as this is needed to launch the SolrMaster application on one of the nodes in the cluster (step 2 below).

  hdfs dfs -put solr-yarn.jar solr/
  hdfs dfs -put solr.tgz solr/

SolrClient is a Java application that uses the YARN Java API to launch the SolrMaster application in the cluster. Here is an example of how to run the SolrClient:

  hadoop jar solr-yarn.jar \
    -nodes=2 \
    -zkHost=localhost:2181 \
    -solr=hdfs://localhost:9000/solr/solr.tgz \
    -jar=hdfs://localhost:9000/solr/solr-yarn.jar \
    -memory 512 \
    -hdfs_home=hdfs://localhost:9000/solr/index_data

This example requests Solr to be deployed into two YARN containers in the cluster, each having 512M of memory allocated to the container. Notice that you also need to give the ZooKeeper connection string (-zkHost) and the location where Solr should create indexes in HDFS (-hdfs_home). Consequently, you need to set up a ZooKeeper ensemble before deploying Solr on YARN; running Solr with the embedded ZooKeeper is not supported for YARN clusters.

The SolrClient application blocks until it sees SolrCloud running in the YARN cluster.

Step 2: Allocate a container to run SolrMaster

The SolrClient application tells the ResourceManager it needs to launch the SolrMaster application in a container in the cluster. In turn, the ResourceManager selects a node and directs the NodeManager on the selected node to launch the SolrMaster application. A NodeManager runs on each node in the cluster.

Step 3: SolrMaster requests containers to run SolrCloud nodes

The SolrMaster performs three fundamental tasks: 1) requests N containers (-nodes) for running SolrCloud nodes from the ResourceManager, 2) configures each container to run the Solr start command, and 3) waits for a shutdown callback to gracefully shut down each SolrCloud node.

Step 4: Solr containers allocated across the cluster

When setting up container requests, the SolrMaster adds the path to the Solr distribution bundle (solr.tgz) as a local resource for each container. When the container is allocated, the NodeManager extracts solr.tgz on the local filesystem and makes it available as ./solr. This allows us to simply execute the Solr start script using ./solr/bin/solr. Notice that other applications, such as Spark, may live alongside Solr in a different container on the same node.

Step 5: SolrCloud node connects to ZooKeeper

Finally, as each Solr node starts up, it connects to ZooKeeper to join the SolrCloud cluster. In most cases, it makes sense to configure Solr to use the HdfsDirectoryFactory via the -hdfs_home parameter on the SolrClient (see step 1), as any files created locally in the container will be lost when the container is shut down. Once the SolrCloud cluster is running, you interact with it using the Solr APIs.

Shutting down a SolrCloud cluster

One subtle aspect of running SolrCloud in YARN is that the application master (SolrMaster) needs a way to tell each node in the cluster to shut down gracefully. This is accomplished using a custom Jetty shutdown hook. When each Solr node is launched, the IP address of the SolrMaster is stored in a Java system property: yarn.acceptShutdownFrom. The custom shutdown handler will accept a Jetty stop request from this remote address only. In addition, the SolrMaster computes a secret Jetty stop key that only it knows, to ensure it is the only application that can trigger a shutdown request.

What’s Next?

Lucidworks is working to get the project migrated over to the Apache Solr project. In addition, we’re adding YARN awareness to the Solr Scale Toolkit and plan to add YARN support for Lucidworks Fusion in the near future.
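The shutdown handshake described above is easy to sketch: a node accepts a stop request only if it comes from the SolrMaster’s address (as advertised via the yarn.acceptShutdownFrom system property) and carries the master’s secret stop key. The class and method names below are hypothetical illustrations of that check, not the actual Lucidworks implementation:

```java
// Hypothetical sketch of the shutdown-authorization check: accept a stop
// request only from the known master address, and only with the secret key.
public class ShutdownGuard {
    private final String masterAddress; // e.g. read from -Dyarn.acceptShutdownFrom
    private final String stopKey;       // secret computed by the SolrMaster

    public ShutdownGuard(String masterAddress, String stopKey) {
        this.masterAddress = masterAddress;
        this.stopKey = stopKey;
    }

    /** True only when both the remote address and the presented key match. */
    public boolean acceptShutdown(String remoteAddress, String presentedKey) {
        return masterAddress.equals(remoteAddress) && stopKey.equals(presentedKey);
    }

    public static void main(String[] args) {
        ShutdownGuard guard = new ShutdownGuard("10.0.0.5", "s3cret");
        System.out.println(guard.acceptShutdown("10.0.0.5", "s3cret")); // true
        System.out.println(guard.acceptShutdown("10.0.0.9", "s3cret")); // false
    }
}
```

Requiring both conditions means a stray Jetty stop request from another container on the same node, or from the master without its key, is ignored.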

The post Solr on YARN appeared first on Lucidworks.

pinboard: Untitled (

planet code4lib - Mon, 2014-12-01 19:35
RT @no_reply: #code4lib's community scholarship "angel fund" is a few hundred dollars short of funding a second scholarship.

LibUX: 016: Putting the User First with Courtney Greene McDonald

planet code4lib - Mon, 2014-12-01 18:37

Courtney Greene McDonald is the author of Putting the User First: 30 Strategies for Transforming Library Services and The Anywhere Library: a Primer for the Mobile Web, and she is the chair of the editorial board for Weave: Journal of Library User Experience.

We gushed about the 2014 SEFLIN Virtual Conference (recordings), how awesome it is that there is now a peer-reviewed journal in our specific field, and a lot more.

When you think about something like Facebook …, they change everything, people get mad, but it’s very sticky. Amazon is very sticky. Google, very sticky. Libraries were in an environment for a very long time where they were sticky.

This also finishes up our first season! There will be a couple of bonus episodes to round out the year, and in January we will be coming back atcha with improvements to the audio quality, format, and a series of ten podcasts about the nitty-gritty and red tape of in-house UX.

The post 016: Putting the User First with Courtney Greene McDonald appeared first on LibUX.

LITA: Tell your LITA story

planet code4lib - Mon, 2014-12-01 18:12

Building on ALA Midwinter 2014’s #becauseLITA initiative, members of LITA’s membership development committee want to pull together a short video that captures your response to one of the following prompts:

  • What was your best LITA moment?
  • How has LITA made your life awesome?
  • What interests you most about LITA?

That means we want YOU to participate! Yes, I know – sounds like a lot of pressure to talk on camera, but it’s really not that bad. Plus you’ll get everlasting appreciation from the LITA crew for helping out!

In particular, we are looking to hear the perspectives of LITA members who are students, new professionals and/or new to LITA, and longstanding LITA members.


  • Length can range from as brief as a Vine (6 seconds) up to two minutes, though be warned that we may only use a portion of what you submit. Please keep it short and sweet!
  • Include your name, institution, how long you’ve been a LITA member, and anything else you’d like us to know.
  • Please get it to us by Monday, December 15 so we can work on editing over winter break. Imagine how satisfied you’ll feel to check this off your pre-holiday to-do list!
  • Email videos (or questions) to Brianna at briannahmarshall [at] gmail [dot] com.

Thanks for participating and we can’t wait to see what you come up with!

Jonathan Rochkind: “More library mashups”, with Umlaut chapter

planet code4lib - Mon, 2014-12-01 15:29

I received my author’s copy of More Library Mashups, edited by Nicole Engard.  I notice the publisher’s site is still listing it as “pre-order”, but I think it’s probably available for purchase (in print or e).

Publisher’s site (with maybe cheaper “pre-order” price?)


It’s got a chapter in it by me about Umlaut.

I’m hoping it attracts some more attention and exposure for Umlaut, and maybe gets some more people trying it out.

Consider asking your employing library to purchase a copy of the book for the collection! It looks like it’s got a lot of interesting stuff in it, including a chapter by my colleague Sean Hannan on building a library website by aggregating content services.

Filed under: General

ACRL TechConnect: This Is How I (Attempt To) Work

planet code4lib - Mon, 2014-12-01 14:50

Editor’s Note: ACRL TechConnect blog will run a series of posts by our regular and guest authors about The Setup of our work. The first post is by TechConnect alum Becky Yoose.

Ever wondered how several of your beloved TechConnect authors and alumni manage to Get Stuff Done? In conjunction with The Setup, this is the first post in a series of TechConnect authors, past and present, to show off what tools, tips, and tricks they use for work.

I have been tagged by @nnschiller in his “This is how I work” post. Normally, I just hide when these chain-letter-type events come along, but this time I’ll indulge everyone and dust off my blogging skills. I’m Becky Yoose, Discovery and Integrated Systems Librarian, and this is how I work.

Location: Grinnell, Iowa, United States

Current Gig: Assistant Professor, Discovery and Integrated Systems Librarian; Grinnell College

Current Mobile Device: Samsung Galaxy Note 3, outfitted with an OtterBox Defender cover. I still mourn the discontinuation of the Droid sliding keyboard models, but the oversized screen and stylus make up for the lack of tactile typing.

Current Computer:

Work: HP EliteBook 8460p (due to be replaced in 2015); boots Windows 7

Home: Betty, my first build; dual boots Windows 7 and Ubuntu 14.04 LTS

eeepc 901, currently b0rked due to misjudgement on my part about appropriate xubuntu distros.

Current Tablet: iPad 2, supplied by work.

One word that best describes how you work:

Don’t panic. Nothing to see here. Move along.

What apps/software/tools can’t you live without?

Essential work computer software and tools, in no particular order:

  • Outlook – email and meetings make up the majority of my daily interactions with people at work and since campus is a Microsoft shop…
  • Notepad++ – my Swiss army knife for text-based duties: scripts, notes, and everything in between.
  • PuTTY - Great SSH/Telnet client for Windows.
  • Marcedit – I work with library metadata, so Marcedit is essential on any of my work machines.
  • MacroExpress and AutoIt – Two different Windows automation apps: MacroExpress handles more simple automation (opening programs, templating/constant data, simple workflows involving multiple programs) while AutoIt gives you more flexibility and control in the automation process, including programming local functions and more complex decision-making processes.
  • Rainmeter and Rainlendar – These two provide customized desktop skins that give you direct or quicker access to specific system information, functions, or, in Rainlendar’s case, application data.
  • Pidgin – MPOW uses both LibraryH3lp and AIM for instant messaging services, and I use IRC to keep in touch with #libtechwomen and #code4lib channels. Being able to do all three in one app saves time and effort.
  • Jing – While the Snipping Tool in Windows 7 is great for taking screenshots for emails, Jing has proven useful for both basic screenshots and screencasts for troubleshooting systems issues with staff and library users. The ability to save and share screencasts is also valuable when working with vendors to troubleshoot problems.
  • CCleaner – Not only does it empty your recycling bin and temporary files/caches, the various features available in one spot (program lists, registry fixes, startup program lists, etc.) make CCleaner an efficient way to do housekeeping on my machines.
  • Janetter (modified code for custom display of Twitter lists) – Twitter is my main information source for the library and technology fields. One feature I use extensively is the List feature, and Janetter’s plugin-friendly setup allows me to customize not only the display but also what is displayed in the list feeds.
  • Firefox, including these plugins (not an exhaustive list):

For server apps, the main app (beyond PuTTY or vSphere) that I need is Nagios, to monitor the library’s virtual Linux server farm. I’m also partial to nano, vim, and apt.

As one of the very few tech people on staff, I need a reliable system to track and communicate technical issues with both library users and staff. Currently the Libraries is piggybacking on ITS’ ticketing system KBOX. Despite being fit into a somewhat inflexible existing structure, it has worked well for us, and since we don’t have to maintain the system, all the better!

Web services: The Old Reader, Gmail, Google Drive, Skype, Twitter. I still mourn the loss of Google Reader.

For physical items, my tea mug. And my hat.

What’s your workspace like?

Take a concrete box, place it in the dead center of the library, cut out a door in one side, place the door opening three feet from the elevator door, cool it to a consistent 63-65 degrees F., and you have my office. Spending 10+ hours a day during the week in this office means a bit of modding is in order:

  • Computer workstation set up: two HP LA2205wg 22 inch monitors (set to appropriate ergonomic distances on desk), laptop docking station, ergonomic keyboard/mouse stand, ergonomic chair. Key word is “ergonomic”. I can’t stress this enough with folks; I’ve seen friends develop RSIs on the job years ago and they still struggle with them today. Don’t go down that path if you can help it; it’s not pretty.
  • Light source: four lamps of varying size, all with GE Daylight 6500K 15 watt light bulbs. I can’t do the overhead lights due to headaches and migraines, so these lamps and bulbs help make an otherwise dark concrete box a little brighter.
  • Three cephalopods, a starfish, a duck, a moomin, and cats of various materials and sizes
  • Well stocked snack/emergency meal/tea corner to fuel said 10+ hour work days
  • Blankets, cardigans, shawls, and heating pads to deal with the cold

When I work at home during weekends, I end up in the kitchen with the laptop on the island, giving me the option to sit on the high chair or stand. Either way, I have a window to look at when I need a few seconds to think. (If my boss is reading this – I want my office window back.)

What’s your best time-saving trick?

Do it right the first time. If you can’t do it right the first time, then make the path to getting it right as efficient and painless as you possibly can. Alternatively, build a time machine to prevent those disastrous metadata and systems decisions made in the past that you’re dealing with now.

What’s your favorite to-do list manager?

The Big Picture from 2012

I have tried online to-do list managers, such as Trello; however, I have found that physical managers work best for me. In my office I have a to-do management system that consists of three types of lists:

  • The Big Picture List (2012 list pictured above)- four big post it sheets on my wall, labeled by season, divided by months in each sheet. Smaller post it notes are used to indicate which projects are going on in which months. This is a great way to get a quick visual as to what needs to be completed, what can be delayed, etc.
  • The Medium Picture List – a mounted whiteboard on the wall in front of my desk. Here specific projects are listed with one to three action items that need to be completed within a certain time, usually within one to two months.
  • The Small Picture List – written on discarded Choice review cards, the perfect size to quickly jot down things that need to be done either today or in the next few days.

Besides your phone and computer, what gadget can’t you live without?

My wrist watch, set five minutes fast. I feel self-conscious if I go out of the house without it.

What everyday thing are you better at than everyone else?

I’d like to think that I’m pretty good with adhering to Inbox Zero.

What are you currently reading?

The practice of system and network administration, 2nd edition. Part curiosity, part wanting to improve my sysadmin responsibilities, part wanting to be able to communicate better with my IT colleagues.

What do you listen to while you work?

It depends on what I am working on. I have various stations on Pandora One and a selection of iTunes playlists to choose from depending on the task on hand. The choices range from medieval chant (for long form writing) to thrash metal (XML troubleshooting).

Realistically, though, the sounds I hear most are email notifications, the operation of the elevator that is three feet from my door, and the occasional TMI conversation between students who think the hallway where my office and the elevator are located is deserted.

Are you more of an introvert or an extrovert?

An introvert blessed/cursed with her parents’ social skills.

What’s your sleep routine like?

I turn into a pumpkin at around 8:30 pm, sometimes earlier. I wake up around 4:30 am most days, though I do cheat and stay in bed until around 5:15 am, checking email and news feeds and looking at my calendar to prepare for the coming day.

Fill in the blank: I’d love to see _________ answer these same questions.

You. Also, my cats.

What’s the best advice you’ve ever received?

Not advice per se, but life experience. There are many things one learns when living on a farm, including responsibility, work ethic, and realistic optimism. You learn to integrate work and life since, on the farm, work is life. You work long hours, but you also have to rest whenever you can catch a moment. If nothing else, living on a farm teaches you that no matter how long you put off doing something, it has to be done. The earlier, the better, especially when it comes to shoveling manure.

DuraSpace News: VIVO Project Strategy Meeting at Northwestern

planet code4lib - Mon, 2014-12-01 00:00

This week a 15 member VIVO Strategy Team is meeting at the Northwestern University Library on the Evanston campus to review issues and set goals for the VIVO project.

Representatives include members of the VIVO Leadership Group, the VIVO Steering Group, the VIVO Management Team and others who serve on the Strategy Team. Meeting goals include:

Manage Metadata (Diane Hillmann and Jon Phipps): Why You Should Come to the Jane-athon

planet code4lib - Sun, 2014-11-30 19:59

I know many of you are puzzled by this event, so do take a look at a rundown of the plans on the RDA Toolkit Blog.

Not so surprisingly, we were inspired by the notion of a hackathon, but it had to be focused on something other than computer code and application building. All of us have heard conflicting opinions about whether RDA can be fully functional, whether FRBR works and will benefit users, or whether it’s just all too complicated. The big gap in addressing these questions has been the challenge in doing something hands-on instead of the usual sage-on-the-stage doling out large piles of handouts. There are still realities that need to be recognized, as we take a hands-on look at RDA and build some real RDA data.

The first of these realities is that RDA has been in development for a hell of a long time, and the rules (the part that gets the most attention, and some think really IS the whole of RDA) started out as AACR3. As one who’s been watching this space (from the outside and the inside) since the beginning, I can confirm that the notion of AACR3 is a historical artifact, with nothing to do with what RDA has become.

I’ve been ranting and railing for years (too many to count) that RDA must be more than rules. And it is–see the RDA Registry for evidence of that. This leads me to the second reality: all of us are learning as we go. The first iteration of the RDA Vocabularies, developed by the DCMI/RDA Task Group after a famous meeting in London in the Spring of 2007, were never published. The published version, much improved, was released early in 2013 along with the new RDA Registry. The learning-by-doing was happening in a lot of other standards-focused groups: IFLA and W3C for example. FRBR, an essential part of the RDA model, was evolving along with RDA, and that fact led to a couple of interesting compromises, still working themselves out.

I can promise you that the Jane-athon will reflect all of those realities, and in addition build out the community familiar with the lessons yet to be learned. There won’t be any papering over of gaps, downplaying of issues, or anything like that. At the Jane-athon we will demonstrate that building real RDA records in the context of FRBR is not a future dream, it’s happening now. What you will see as a participant is the reality–the ability to work within a FRBR flow, to import MARC records and see the system map them into FRBR constructs, to create links with NAF information, and view the results as a tree that highlights the relationships.

Perhaps most important, we want to have fun with this. There will be no quizzes, no grades, no transcripts. That’s why we chose to focus on two sets of materials with great potential to benefit from a FRBR-based approach. Early in the day you’ll walk through the business of creating cataloging for Blade Runner resources (original book by Philip K. Dick), translations, film, etc. After that we’ll turn the group loose on Jane Austen (with some made-ahead basic data). After the flurry of data creation, we’ll be looking at the results, highlighting issues that come up, and not incidentally, getting some feedback from the participants about the tools, processes, and the beta-Jane-athon in its entirety.

We know (and welcome the fact) that not everyone attending will be a cataloger, much less all that familiar with RDA. There will be a place and a role for everyone who wants to learn more, and to dig in and get their hands [virtually] dirty. There is no need to cram for this event, or to study the RDA rules or cataloging before you come. If you want to get a bit familiar with RIMMF before you come, by all means take a look at the site, download the software, and play. The only requirement is an open mind and some excitement about the possibilities (some trepidation is okay too).

Once you have registered for ALA Midwinter in Chicago, you can sign up for the Jane-athon. The Jane-athon is already available as a paid addition to the full registration for ALA Midwinter in Chicago.

Please feel free to use the comments portion of this post to ask questions, or use the RDA-L list to bring up questions and concerns.

We hope to see you there!

Terry Reese: MarcEdit Automatic Field Translation Plug-in

planet code4lib - Sat, 2014-11-29 17:27

While experimenting with doing automatic language translation using the Microsoft Translation API, I got a couple of questions from users asking if this same process could be applied to doing automatic field translation to create localized searching indexes of subject terms.  The specific use case proposed was the generation of a single 653 that included automated translations of the 650$a.  Since this is likely a pretty specific use case with a limited audience, I’ve created this process as a plug-in.  If you are interested in seeing how this works, please see the following video:

If you have questions, let me know.
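The field-translation idea behind the plug-in can be sketched independently of MarcEdit: take each 650$a heading, run it through a translation function, and collect the results into a single 653. Everything below is a hypothetical illustration — the stub translator stands in for a real service such as the Microsoft Translator API, and the output line merely mimics a MarcEdit-style mnemonic field:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: build one 653 field from translated 650$a headings.
public class FieldTranslator {
    // Stand-in for a real translation service (e.g. the Microsoft Translator API).
    static final UnaryOperator<String> TO_SPANISH = s ->
        s.equals("Libraries") ? "Bibliotecas" : s;

    /** Join translated headings into a single 653 line with repeated $a subfields. */
    static String build653(List<String> headings, UnaryOperator<String> translate) {
        StringBuilder sb = new StringBuilder("=653  \\\\"); // blank indicators, mnemonic style
        for (String h : headings) {
            sb.append("$a").append(translate.apply(h));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build653(List.of("Libraries"), TO_SPANISH));
    }
}
```

In the real plug-in the translation step calls out to the Microsoft Translation API, so each heading costs a network round trip; batching headings per record keeps that overhead manageable.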



LibUX: Customer Journey Maps Have the Biggest Impact

planet code4lib - Fri, 2014-11-28 23:46

We probably have other things to worry about besides usability testing, like where to find time to test, but even small-scale budget usability testing will make a positive impact. So when time is short, which usability testing method has the most bang for its buck?

The following chart shows the popularity of various testing methods by companies trying to optimize conversion rates.

“Which of the following methods do you currently use to improve conversion rates?”

In this 2014 Conversion Optimization Report by Econsultancy, of the companies that used customer journey analysis, copy optimization and segmentation, 95% saw an improvement in their website conversion, compared to an average of 72% among other respondents (my emphasis).

This post is part of a new series that we hope you can use as a reference to make smart usability decisions. If you’re interested in more of the same, follow @libuxdata on Twitter, or continue the conversation in our Facebook group.

The post Customer Journey Maps Have the Biggest Impact appeared first on LibUX.

Open Knowledge Foundation: Competition now open – enter your app and win 5,000 euro

planet code4lib - Fri, 2014-11-28 13:55

This is a cross-post by Ivonne Jansen-Dings, originally published on the Apps4Europe blog, see the original here.

With 10 Business Lounges happening throughout Europe this year, Apps for Europe is trying to find the best open data applications and startups that Europe has to offer. We invite all developers, startups and companies that use open data as a resource to join our competition and win a spot at the International Business Lounge @ Future Everything in February 2015.

Last year’s winner has shown the potential of using open data to enhance a company and expand its services. Since the international Business Lounge at Future Everything last year, they have reached new cities and raised almost 140.000,- in crowdfunding. A true success story!

Over the past years many local, regional and national app competitions in Europe have been organized to stimulate developers and companies to build new applications with open data. Apps for Europe has taken this to the next level: by adding Business Lounges to local events, we introduce the world of open data development to that of investors, accelerators, incubators and more.

Thijs Gitmans, Peak Capital: “The Business Lounge in Amsterdam had a professional and personal approach. I am invited to this kind of meetings often, and the trigger to actually go or cancel last minute 99% of the time has to do with proper, timely and personal communication.”

The Apps for Europe competitions will run from 1 September to 31 December 2014, with the final at Future Everything in Manchester, UK, on 26-27 February 2015.

Read more about Apps4Europe here.

DPLA: Giving Thanks: Top 10 Colonial Facial Hair Inspirations

planet code4lib - Fri, 2014-11-28 07:30

This week was a time for people to give thanks—this includes showing some gratitude for some awe-inspiring beards and mustaches. In a continuation of our “Movember” series, we’re throwing it back to the colonial era for some facial hair inspiration.

 This week: Styles of the early settlers. 

  • The first landing of the Pilgrims, 1620.
  • Governor Berkley and Nathaniel Bacon.
  • Sir Walter Raleigh.
  • Landing of the Pilgrims.
  • Embarkation of the Pilgrims.
  • Early settlers on their way to church.
  • Embarkation of the “Pilgrim Fathers.”
  • Colonists reaching Connecticut.
  • The landing of the colonists.
  • A Pilgrim parting from his family.

And a bonus: Just because your mustache is strong doesn’t mean your hairstyle should suffer. Try one of these colonial wig styles to complement your new look.

“The Colonists At Home,” wig styles.

Terry Reese: MarcEdit 6 Update

planet code4lib - Fri, 2014-11-28 06:27

Happy Thanksgiving to those celebrating.  Rather than overindulging in food, my family and I spent our day relaxing and enjoying some down time together.  After everyone went to bed, I had a little free time and decided to wrap up the update I’ve been working on.  This update includes the following changes:

  • Language File changes.
  • Export/Delete Selected Records: UI changes
  • Biblinker — updated the tool to provide support for linking to FAST headings when available in the record
    • Updated the fields processed (targeted to ignore uncontrolled or local items)
  • Z39.50 Client — fixed a bug where, with multiple databases selected in a single search, blank data would be returned when the number of results exceeded the data limit.
  • RDA Helper bug fix — corrected an error where, under certain conditions, bracketed data would be incorrectly parsed.
  • Miscellaneous UI changes to support language changes


The language file changes represent a change in how internationalization of the interface works.  Master language files are now hosted on GitHub, with new files added on update.  The language files are automatically generated, so they are not as good as if they were done by an individual – though some individuals are looking at the files and providing updates.  My hope is that this process of automated language generation, coupled with human intervention, will significantly help non-English speakers.  But I guess time will tell.

The download can be found by using the automated update tool in MarcEdit, or downloading the update from:

pinboard: Code4LibBC Day 1: Lightning Talks Part 1 | Learning LibTech

planet code4lib - Thu, 2014-11-27 21:40
RT @TheRealArty: Code4LibBC Day 1: Lightning Talks Part 1 #c4lbc #code4lib

Lukas Koster: Analysing library data flows for efficient innovation

planet code4lib - Thu, 2014-11-27 12:24

In my work at the Library of the University of Amsterdam I am currently taking a step forward by actually taking a step back from a number of forefront activities in discovery, linked open data and integrated research information towards a more hidden, but also more fundamental enterprise in the area of data infrastructure and information architecture. All for a good cause, for in the end a good data infrastructure is essential for delivering high quality services in discovery, linked open data and integrated research information.
In my role as library systems coordinator I have become more and more frustrated with the huge amounts of time and effort spent on moving data from one system to another and shoehorning one record format into the next, only to fulfill the necessary everyday services of the university library. Not only is it not possible to invest this time and effort productively in innovative developments, but this fragmented system and data infrastructure is also completely unsuitable for fundamental innovation. Moreover, information provided by current end user services is fragmented as well. Systems are holding data hostage. I have mentioned this problem before in a SWIB presentation. The issue was also recently touched upon in an OCLC Hanging Together blog post: “Synchronizing metadata among different databases” .

Fragmented data (SWIB12)

In order to avoid confusion in advance: when using the term “data” here, I am explicitly not referring to research data or any other specific type of data. I am using the term in a general sense, including what is known in the library world as “metadata”. In fact this is in line with the usage of the term “data” in information analysis and system design practice, where data modelling is one of the main activities. Research datasets as such are to be treated as content types like books, articles, audio and people.

It is my firm opinion that libraries have to focus on making their data infrastructure more efficient if they want to keep up with the ever changing needs of their audience and invest in sustainable service development. For a more detailed analysis of this opinion see my post “(Discover AND deliver) OR else – The future of the academic library as a data services hub”. There are a number of different options to tackle this challenge, such as starting completely from scratch, which would require huge investments in resources for a long time, or implementing some kind of additional intermediary data warehouse layer while leaving the current data source systems and workflows in place. But for all options to be feasible and realistic, a thorough analysis of a library’s current information infrastructure is required. This is exactly what the new Dataflow Inventory project is about.

The project is being carried out within the context of the short term Action Plans of the Digital Services Division of the Library of the University of Amsterdam, and specifically the “Development and improvement of information architecture and dataflows” program. The goal of the project is to describe the nature and content of all internal and external datastores and dataflows between internal and external systems in terms of object types (such as books, articles, datasets, etc.) and data formats, thereby identifying overlap, redundancy and bottlenecks that stand in the way of efficient data and service management. We will be looking at dataflows in both front and back end services for all main areas of the University Library: bibliographic, heritage and research information. Results will be a logical map of the library data landscape and recommendations for possible follow up improvements. Ideally it will be the first step in the Cleaning-Reconciling-Enriching-Publishing data chain as described by Seth van Hooland and Ruben Verborgh in their book “Linked Data for Libraries, Archives and Museums”.
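The kind of overlap the inventory should surface can be illustrated with a toy model (system and business object names invented for illustration): record which business objects each datastore holds, then flag the objects that live in more than one place.

```python
# Toy sketch of the inventory's goal: map each datastore to the business
# objects it holds, then flag objects stored redundantly in several places.
# All names here are invented for illustration.

datastores = {
    "Aleph catalogue": {"book", "holding"},
    "Primo index": {"book", "article"},
    "Research info system": {"person", "dataset"},
}

def redundant_objects(stores):
    """Return {business object: [datastores]} for objects held in
    more than one datastore."""
    seen = {}
    for store, objects in stores.items():
        for obj in sorted(objects):
            seen.setdefault(obj, []).append(store)
    return {obj: places for obj, places in seen.items() if len(places) > 1}

print(redundant_objects(datastores))
# {'book': ['Aleph catalogue', 'Primo index']}
```

Overlap by itself is not necessarily a problem (an index legitimately copies the catalogue), but making it explicit is what lets the follow-up recommendations distinguish deliberate replication from accidental redundancy.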

The first phase of this project is to decide how to describe and record the information infrastructure in such a form that the data map can be presented to various audiences in a number of ways, and at the same time can be reused in other contexts in the long run, for instance when designing new services. For this we need a methodology and a tool.

At the university library we do not have any thorough experience with describing an information infrastructure on an enterprise level, so in this case we had to start with a clean slate. I am not at all sure that we came up with the right approach in the end. I hope this post will trigger some useful feedback from institutions with relevant experience.

Since the initial and primary goal of this project is to describe the existing infrastructure instead of a desired new situation, the first methodological area to investigate appears to be Enterprise Architecture (interesting to see that Wikipedia states “This article appears to contain a large number of buzzwords“). Because it is always better to learn from other people’s experiences than to reinvent all four wheels, we went looking for similar projects in the library, archive and museum universe. This proved to be rather problematic. There was only one project we could find that addresses a similar objective, and I happened to know one of the project team members. The Belgian “Digital library system’s architecture study” (English language report here)” was carried out for the Flemish Public Library network Bibnet, by Rosemie Callewaert among others. Rosemie was so kind to talk to me and explain the project objectives, approaches, methods and tools used. For me, two outcomes of this talk stand out: the main methodology used in the project is Archimate, which is an Enterprise Architecture methodology, and the approach is completely counter to our own approach: starting from the functional perspective as opposed to our overview of the actual implemented infrastructure. This last point meant we were still looking at a predominantly clean slate.
Archimate also turned out to be the method of choice used by the University of Amsterdam central enterprise architecture group, whom we also contacted. It became clear that in order to use Archimate efficiently, it is necessary to spend a considerable amount of time on mastering the methodology. We looked for some accessible introductory information to get started. However, the official Open Group Archimate website is not as accessible as desired, in more than one way. We managed to find some documentation anyway, for instance the direct link to the Archimate specification and the free document “Archimate made practical”. After studying this material we found that Archimate is a comprehensive methodology for describing business, application and technical infrastructure components, but we also came to the conclusion that for our current short term project presentation goals we needed something that could be implemented fairly soon. We will keep Archimate in mind for the intermediate future. If anybody is interested, there is a good free open source modelling tool available, Archi. Other Enterprise Architecture methodologies, like Business Process Modelling, focus more on workflows than on existing data infrastructures. Turning to system design methods like UML (Unified Modelling Language), we see similar drawbacks.

An obvious alternative technique to consider is Dataflow Diagramming (DFD) (what’s in a name?), part of the Structured Design and Structured Analysis methodology, which I had used in previous jobs as systems designer and developer. Although DFD’s are normally used for describing functional requirements on a conceptual level, with some tweaking they can also be used for describing actual system and data infrastructures, similar to the Archimate Application and Infrastructure layers. The advantage of the DFD technique is that it is quite simple. Four elements are used to describe the flow of information (dataflows) between external entities, processes and datastores. The content of dataflows and datastores can be specified in more detail using a data dictionary. The resulting diagrams are relatively easy to comprehend. We decided to start with using DFD’s in the project. All we had left to do was find a good and not too expensive tool for it.
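As a rough illustration of these four elements (all names invented), a DFD can be modeled as plain data, together with one of the method's basic well-formedness rules: a legal dataflow always has a process on at least one end, so data never flows directly between two datastores or two external entities.

```python
# Minimal sketch of the four DFD element types, with a consistency check:
# every dataflow must have a process on at least one end. Names invented.

entities = {"Patron"}                      # external entities
processes = {"Catalogue", "Discover"}      # processes
datastores = {"Aleph DB", "Primo index"}   # datastores

# Dataflows as (source, target, data elements carried).
dataflows = [
    ("Patron", "Discover", ["query"]),
    ("Aleph DB", "Primo index", ["book"]),   # illegal: no process involved
    ("Catalogue", "Aleph DB", ["book"]),
]

def invalid_flows(flows):
    """Return (source, target) pairs that violate the DFD rule."""
    return [
        (src, tgt)
        for (src, tgt, _) in flows
        if src not in processes and tgt not in processes
    ]

print(invalid_flows(dataflows))  # [('Aleph DB', 'Primo index')]
```

Even a check this small is one of the payoffs of a repository-based model over a plain drawing: the diagram becomes data that can be validated and queried.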

Basic DFD structure

There are basically two types of tools for describing business processes and infrastructures: drawing tools, focused on creating diagrams, and repository based modelling tools, focused on reusing the described elements. The best known drawing tool must be Microsoft Visio, because it is part of the widely used Office suite. There are a number of other commercial and free tools, among which a free Google Drive extension. Although most drawing tools cover a wide range of methods and techniques, they don’t usually support reuse of elements with consistent characteristics in other diagrams. Also, diagrams are just drawings: they can’t be used for generating data definition scripts or basic software modules, for reverse engineering, or for flexible reporting. Repository based tools can do all these things. Reuse, reporting, generation, reverse engineering, and import and export are exactly the features we need. We also wanted a tool that supports a number of other methods and techniques for use in other areas of modelling, design and development. There are some interesting free or open source tools, like OpenModelSphere (which supports UML, ERD data modelling and DFD), and a range of commercial tools. To cut a long story short, we selected the commercial design and management tool Visual Paradigm because it supports a large number of methodologies with an extensive feature set, in a number of editions, for reasonable fees. An additional advantage is the online shared teamwork repository.

After acquiring the tool we had to configure it the way we wanted to use it. We decided to try and align the available DFD model elements to the Archimate elements so it would in time be possible to move to Archimate if that would prove to be a better method for future goals. Archimate has Business Service and Business Process elements on the conceptual business level, and Application Component (a “system”), Application Function (a “module”) and Application Service (a “function”) elements on the implementation level.

Basic Archimate Structure

In our project we will mainly focus on the application layer, but with relations to the business layer. Fortunately, the DFD method supports a hierarchical process structure by means of the decomposition mechanism, so the two hierarchical structures Business Service – Business Process and Application Component – Application Function – Application Service can be modeled using DFD. There is an additional direct logical link between a Business Process and the Application Service that implements it. By adding the “stereotypes” feature from the UML toolset to the DFD method in Visual Paradigm, we can effectively distinguish between the five process types (for instance by colour and attributes) in the DFD.

Archimate DFD alignment

So in our case, a DFD process with a “system” stereotype represents a top level Business Service (“Catalogue”, “Discover”, etc.) and a “process” process within “Cataloguing” represents an activity like “Describe item”, “Remove item”, etc. On the application level a “system” DFD process (Application Component) represents an actual system, like Aleph or Primo, a “module” (Application Function) a subsystem like Aleph CAT or Primo Harvesting, and a “function” (Application Service) an actual software function like “Create item record”.
A DFD datastore is used to describe the physical permanent and temporary files or databases used for storing data. In Archimate terms this would probably correspond with a type of “Artifact” in the Technical Infrastructure layer, but that might be subject for interpretation.
Finally an actual dataflow describes the data elements that are transferred between external entities and processes, between processes, and between processes and datastores, in both directions. In DFD, the data elements are defined in the data dictionary in the form of terms in a specific syntax that also supports optionality, selection and iteration, for instance:

  • book = title + (subtitle) + {author} + publisher + date
  • author = name + birthdate + (death date)
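A mini-parser for this notation (a sketch of our own, not part of any tool) shows how mechanically the optionality and iteration markers can be read: "+" separates elements, "(x)" marks x as optional, and "{x}" marks x as repeating.

```python
# Hypothetical mini-parser for the data dictionary notation shown above.
# "+" separates elements, "(x)" = optional, "{x}" = iteration (repeating).

def parse_term(definition):
    """Split 'name = a + (b) + {c}' into (name, [(element, kind), ...])."""
    name, _, body = definition.partition("=")
    elements = []
    for part in body.split("+"):
        part = part.strip()
        if part.startswith("(") and part.endswith(")"):
            elements.append((part[1:-1].strip(), "optional"))
        elif part.startswith("{") and part.endswith("}"):
            elements.append((part[1:-1].strip(), "iteration"))
        else:
            elements.append((part, "required"))
    return name.strip(), elements

print(parse_term("book = title + (subtitle) + {author} + publisher + date"))
# ('book', [('title', 'required'), ('subtitle', 'optional'),
#           ('author', 'iteration'), ('publisher', 'required'),
#           ('date', 'required')])
```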

In Archimate there is a difference in flows in the Business and Application layers. In the Business layer a flow can be specified by a Business Object, which indicates the object types that we want to describe, like “book”, “person”, “dataset”, “holding”, etc. The Business Object is realised as one or more Data Objects in the Application Layer, thereby describing actual data records representing the objects transferred between Application Services and Artifacts. In DFD there is no difference between a business flow and a dataflow. In our project we particularly want to describe business objects in dataflows and datastores to be able to identify overlap and redundancies. But besides that we are also interested in differences in data structure used for similar business objects. So we do have to distinguish between business and data objects in the DFD model. In Visual Paradigm this can be done in a number of ways. It is possible to add elements from other methodologies to a DFD, with links between dataflows or datastores and the added external elements. Data structures like this can also be described in Entity Relationship Diagrams, UML Class Diagrams or even RDF Ontologies.
We haven’t decided on this issue yet. For the time being we will employ the Visual Paradigm Glossary tool to implement business and data object specifications using Data Dictionary terms. A specific business object (“book”) will be linked to a number of different dataflows and datastores, but the actual data objects for that one business object can be different, both in content and in format, depending on the individual dataflows and datastores. For instance a “book” Business Object can be represented in one datastore as an extensive MARC record, and in another as a simple Dublin Core record.
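The distinction matters in practice: the same “book” Business Object can surface as two very different data objects. The sketch below (field choices ours, purely illustrative) reduces a MARC-style record to a minimal Dublin Core one, the kind of crosswalk the inventory should make visible rather than hide.

```python
# Hypothetical illustration: one "book" business object, two data objects.
# A MARC-style record in one datastore, a flat Dublin Core record in another.
# The field mapping is our own, for illustration only.

marc_record = {
    "245": {"a": "Linked Data for Libraries, Archives and Museums"},  # title
    "100": {"a": "Van Hooland, Seth"},                                # author
    "260": {"c": "2014"},                                             # date
}

def marc_to_dc(marc):
    """Reduce a MARC-style dict to a flat Dublin Core dict; missing
    source fields come through as None."""
    return {
        "dc:title": marc.get("245", {}).get("a"),
        "dc:creator": marc.get("100", {}).get("a"),
        "dc:date": marc.get("260", {}).get("c"),
    }

print(marc_to_dc(marc_record))
```

The reduction is lossy by design, which is exactly why the data map needs to record both the business object and the concrete data object per datastore: a Dublin Core copy cannot be used to regenerate the richer MARC original.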

Example bibliographic dataflows

After having determined method, tool and configuration, the next step is to start gathering information about all relevant systems, datastores and dataflows and describing this in Visual Paradigm. This will be done by invoking our own internal Digital Services Division expertise, reviewing applicable documentation, and most importantly interviewing internal and external domain experts and stakeholders.
Hopefully the resulting data map will provide so much insight that it will lead to real efficiency improvements and really innovative services.


Subscribe to code4lib aggregator