Feed aggregator

OCLC Dev Network: Today at Developer House: Data strategy, lightning talks, oh my!

planet code4lib - Thu, 2014-12-04 02:30

Today’s developer house activities were a real adventure. The morning was filled with an overview of OCLC’s data strategy and plans for exposing entities. The latter half of the morning saw a bevy of lightning talks, ranging from user experience to Hadoop, that inspired lots of great conversations over lunch.

SearchHub: Infographic: Gender Gap – Women in Technology

planet code4lib - Thu, 2014-12-04 01:00
Women who choose careers in technology and other STEM fields are pivotal to technological innovation, but they are increasingly relegated to the sidelines inside their own organizations. Here’s a snapshot of the gender gap in technology and how it compares to the rest of the workforce – and why we should reprogram the gender balance:

The post Infographic: Gender Gap – Women in Technology appeared first on Lucidworks.

DuraSpace News: CALL for OR2015 Scholarship Programme Applicants

planet code4lib - Wed, 2014-12-03 00:00

Indianapolis, IN  The Tenth International Conference on Open Repositories (OR2015) will take place on June 8-11, 2015 in Indianapolis (Indiana, USA). The organizers are pleased to invite you to apply to the 2015 Scholarship Programme.

District Dispatch: Put your library on Digital Inclusion map before Dec 12!

planet code4lib - Tue, 2014-12-02 21:19

Last call! Add your voice now to a nationally representative study of public libraries and the roles they play in community digital inclusion. Participate in the 2014 Digital Inclusion Survey by December 12 to add your library to interactive community maps and support efforts to educate policymakers and the media about modern library services, resources and infrastructure.

Participation in the survey can also help your library identify the impacts of public computer and Internet access on your community and demonstrate library contributions to community digital inclusion efforts. The study is funded by the Institute of Museum and Library Services, and conducted by the American Library Association (ALA), the Information Policy & Access Center (iPAC) at University of Maryland, the International City/County Management Association (ICMA), and Community Attributes International (CAI).

Find your community on our new interactive map here and check out the rest of our tools and resources here. With your help we can further build on these tools and products with the 2014 survey results. To participate, go to and follow the Take Survey Now button. The survey is open until December 12. (By participating you can also register to win one of three Kindles!)

Questions? E-mail Thank you!

The post Put your library on Digital Inclusion map before Dec 12! appeared first on District Dispatch.

OCLC Dev Network: Four Projects Started at Developer House

planet code4lib - Tue, 2014-12-02 20:30

Developer House is underway!  We have four teams working on four different projects.  Each of us has the same goal: we will have fun developing these projects, and we will have working code to demonstrate on Friday morning.

FOSS4Lib Upcoming Events: CollectionSpace: Getting it up and running at your museum

planet code4lib - Tue, 2014-12-02 19:24
Date: Monday, February 9, 2015 - 12:00 to 17:00
Supports: CollectionSpace

Last updated December 2, 2014. Created by Peter Murray on December 2, 2014.

This workshop is designed for anyone interested in or tasked with the technical setup and configuration of CollectionSpace for use in any collections environment (museum, library, special collection, gallery, etc.). For more information about CollectionSpace, visit

HangingTogether: Gifts for archivists (and librarians)?

planet code4lib - Tue, 2014-12-02 17:44

Last year we asked on the ArchiveGrid blog for suggestions for gifts for archivists — and we were blown away by the number (and quality!) of suggestions (posted in 24 fun and practical gifts for archivists). This year, we’re moving the conversation over to HangingTogether and extending the fun to librarians. So, librarians and archivists, what would you like as a gift? We’ll assemble the best of the best and post them in a week or two. Then it’s up to you to leave the link for that special someone to find. Or use it to treat your colleagues. We look forward to your suggestions in the comments below!

[Untitled, Anacostia family c. 1950. Smithsonian Institution]

About Merrilee Proffitt


David Rosenthal: Henry Newman's Farewell Column

planet code4lib - Tue, 2014-12-02 16:00
Henry Newman has been writing a monthly column on storage technology for Enterprise Storage Forum for 12 years, and he's decided to call it a day. His farewell column is entitled Follow the Money: Picking Technology Winners and Losers and it starts:
I want to leave you with a single thought about our industry and how to consistently pick technology winners and losers. This is one of the biggest lessons I’ve learned in my 34 years in the IT industry: follow the money.

It's an interesting read. Although Henry has been a consistent advocate for tape for "almost three decades", he uses tape as an example of the money drying up. He has a table showing that the LTO media market is less than half the size it was in 2008. He estimates that the total tape technology market is currently about $1.85 billion, whereas the disk technology market is around $35 billion.
Following the money also requires looking at the flip side and following the de-investment in a technology. If customers are reducing their purchases of a technology, how can companies justify increasing their spending on R&D? Companies do not throw good money after bad forever, and at some point they just stop investing.

Go read the whole thing and understand why Henry's regular column will be missed, and how perceptive the late Jim Gray was when in 2006 he stated that Tape is Dead, Disk is Tape, Flash is Disk.

Open Knowledge Foundation: Introducing Open Education Data

planet code4lib - Tue, 2014-12-02 15:29

Open education data is a relatively new area of interest with only dispersed pockets of exploration having taken place worldwide. The phrase ‘open education data’ remains loosely defined but might be used to refer to:

  • all openly available data that could be used for educational purposes
  • open data that is released by education institutions

Understood in the former sense, open education data can be considered a subset of open education resources (OERs) where data sets are made available for use in teaching and learning. These data sets might not be designed for use in education, but can be repurposed and used freely.

In the latter sense, the interest is primarily around the release of data from academic institutions about their performance and that of their students. This could include:

  • Reference data such as the location of academic institutions
  • Internal data such as staff names, resources available, personnel data, identity data, budgets
  • Course data, curriculum data, learning objectives
  • User-generated data such as learning analytics, assessments, performance data, job placements
  • Benchmarked open data in education that is released across institutions and can lead to change in public policy through transparency and raising awareness.

Last week I gave a talk at the LTI NetworkED Seminar series run by the London School of Economics Learning Technology and Innovation Department, introducing open education data. The talk ended up being a very broad overview of how we can use open data sets to meet educational needs and the challenges and opportunities this presents, such as issues around monitoring and privacy. Prior to giving the talk I was interviewed for the LSE blog.

A video of the talk is available on the CLTSupport YouTube Channel and embedded below.

DPLA: Open Technical Advisory Committee Call: Wednesday, December 3, 2:00 PM Eastern

planet code4lib - Tue, 2014-12-02 14:00

The DPLA Technical Advisory Committee will lead an open committee call on Wednesday, December 3 at 2:00 PM Eastern. To register, complete the short registration form available via the link below. Agenda topics:

  1. AWS migration
  2. Ingestion development
  3. Frontend usability assessment work
  4. Recent open source contributions (non-DPLA-specific projects) by tech team members
  5. Upcoming events with DPLA tech team participation
  6. DPLA Hubs application
  7. Questions, comments, and open discussion

Islandora: Research Data in Islandora

planet code4lib - Tue, 2014-12-02 13:01

The idea of storing research data in Islandora has come up a fair bit lately at camps and on the listserv, so here is a little overview of the current state of tools and projects that touch on the topic:

  • Combining the Compound Solution Pack with the Binary Solution Pack can get your data into Islandora and make it browsable. The Binary SP, which is still in development, can accommodate any kind of data with a barebones ingestion that adds only the objects necessary for Fedora. The Compound SP can be used to 'attach' these files to a parent object more suitable to display and browsing, such as a PDF or image.
  • Islandora Scholar contains tools for disseminating information about citations. When used in conjunction with the Entities Solution Pack (recently offered to the Islandora Foundation and likely to be in the 7.x-1.5 release next year), it can manage authority records for scholars and projects.
  • The Data Solution Pack, being developed by Alex Garnett at Simon Fraser University, uses Ethercalc to display and manipulate data from XLSX, XLS, ODS, and CSV sources in a spreadsheet viewer.
  • Simon Fraser also has a Research Data Repository environment with SFUdora, which supports DDI and desktop synchronization. It is demonstrated here by Alex Garnett.
  • Research data can also be handled by using existing solution packs in novel ways. One of the first Islandora projects at UPEI involved storing electron microscope images with the Large Image Solution Pack, which was perfectly suited to storing and presenting such massive files. UPEI has also employed the Image Annotation Solution Pack to steward and annotate goat anatomy photos for veterinary students.
  • A quantum chemist at UPEI is updating the Chemistry Solution Pack to work with Islandora 7.x.
  • UPEI is also developing a Biosciences Solution Pack to serve their biodiversity and bioscience wetlab.
  • The UPEI team is developing integration of the DCC Data Management Planning Tool into the Islandora stack, with work nearly complete.
  • CNR IPSP and CNR IRCrES in Italy are using Islandora to store, preserve, and make accessible scientific data produced by the Institute of Plant Virology and the Institute of Plant Protection of the Italian National Research Council with the V2P2 project. This repository handles data relating to plant, microorganism, and virus interactions.
  • The University of Toronto Scarborough has begun a project for Learning in Neural Circuits. More projects are in development, such as Eastern Himalaya Research Network and Mediating Israel. A broader Research Commons service is also in the works.
  • The Smithsonian Institution uses a heavily customized Islandora instance called SIdora for field research data.

Are you working with research data in Islandora? Are you planning to? Contact us and share your story.

Ed Summers: Inter-face

planet code4lib - Tue, 2014-12-02 01:49

Image from page 315 of “The elements of astronomy; a textbook” (1919)

Every document, every moment in every document, conceals (or reveals) an indeterminate set of interfaces that open into alternate spaces and temporal relations.

Traditional criticism will engage this kind of radiant textuality more as a problem of context than a problem of text, and we have no reason to fault that way of seeing the matter. But as the word itself suggests, “context” is a cognate of text, and not in any abstract Barthesian sense. We construct the poem’s context, for example, by searching out the meanings marked in the physical witnesses that bring the poem to us. We read those witnesses with scrupulous attention, that is to say, we make our detailed way through the looking glass of the book and thence to the endless reaches of the Library of Babel, where every text is catalogued and multiple cross-referenced. In making the journey we are driven far out into the deep space, as we say these days, occupied by our orbiting texts. There objects pivot about many different points and poles, the objects themselves shapeshift continually and the pivots move, drift, shiver, and even dissolve away. Those transformations occur because “the text” is always a negotiated text, half perceived and half created by those who engage with it.

Radiant Textuality by Jerome McGann

DuraSpace News: Integrative Health Care Case Reports Now Widely Accessible in DSpaceDirect

planet code4lib - Tue, 2014-12-02 00:00

Winchester, MA  Martha Menard, Director of the Crocker Institute, is responsible for day-to-day operations, repository maintenance and overall design of the CaseRe3 Repository for Integrative Health Care Case Reports. She chose DSpace over Fedora for the original implementation in 2011.

District Dispatch: Publishers Weekly honors ALA leadership for library ebook advocacy

planet code4lib - Mon, 2014-12-01 21:35

Sari Feldman and Bob Wolven. Photo by Publishers Weekly.

Today, Publishers Weekly lauded American Library Association (ALA) Digital Content Working Group former co-chairs Sari Feldman and Bob Wolven in the publication’s annual “Publishing People of 2014” recognition for their role in advocating for fair library ebook lending practices.

From 2011–2014, Feldman, who is the incoming ALA president and the executive director of the Cuyahoga County Public Library in Ohio, and Wolven, who is the associate university librarian at Columbia University, led meetings with some of the world’s largest book publishers.

In the Publishers Weekly article, Andrew Albanese writes:

Publishers say discussions with ALA leaders and the DCWG have been instrumental in moving their e-book programs forward. And more importantly, direct lines of communication are now established between publishing executives and library leaders—which Feldman says is unprecedented—and those open lines will prove vital as the digital discussion moves beyond questions of basic access to e-books.

Congratulations Sari and Bob for your well-deserved recognition!

The post Publishers Weekly honors ALA leadership for library ebook advocacy appeared first on District Dispatch.

Shelley Gullikson: Weekly user tests: Finding games

planet code4lib - Mon, 2014-12-01 21:16

Our library has a pretty fantastic game collection of over 100 board games and almost 700 video games. But finding them? Well, it’s pretty easy if you know the exact title of what you want. But a lot of people just want to browse a list. And to get a list of all the video games you can borrow, you have two options:

  • Do a search for “computer games” in Summon and then go to Content Type facet on the left, click “more” and then limit by “Computer File”
  • Go to the library catalogue, select “Call & Other Numbers” and then under “Other Call Number” enter GVD if you want video games, or GVC if you want to see titles available through our Steam account. After that, you get a really useful results screen to browse:

And if you want board games, the content type in Summon is “Realia.”


Obviously, this is ripe for improvement, but how best to improve? User testing!

We set up in the lobby (mostly – see postscript) and asked passing students if they had 2 minutes to answer a question and get a chocolate. We told them that we wanted to improve access to our game collection (alternating “video game” and “board game”) and so wanted to know what they would do to find out what games the library had. We had a laptop with the library website up, ready for them to use.

No one clicked anywhere on the page. No one mentioned the catalogue. They all would search Summon or Google or else ask someone in the library.

We asked them to tell us what search terms they would use, so now we can make sure that those Google searches and Summon searches will bring them to a page that will give them what they want. For Summon, that likely means using Best Bets, but everyone was consistent with the search terms they’d use, so Best Bets should work out okay.

Once we have all that ready, we can test again to see if this will work smoothly for our users. Or if we really do have to tell them about “computer file” and “realia.” [shudder]


Postscript: When we did testing last December, we set up in our Discovery Centre, a really cool and noisy space where students do a lot of collaborative work. We didn’t have to hustle too much to get participants; students would see our chocolate, come over to find out how to get some, do the test, and that was that.

During our tests in the lobby this term, it’s been pretty much all hustle, and even after all these weeks I still don’t really like approaching people (I feel like the credit card lady at the airport that everyone tries to avoid). I kept thinking that we should head up to the Discovery Centre again for that gentler “display the chocolate and they will come” approach.

Well, we tried it today and got exactly one person in 20 minutes, despite lots of traffic. So we went back down to the lobby and got to the “we’re not learning anything new” mark in 15 minutes.

I’ll just have to learn to love the hustle.

District Dispatch: CopyTalk webinar update

planet code4lib - Mon, 2014-12-01 21:03

The next free copyright webinar (60 minutes) is on December 4 at 2pm Eastern Time. This installment of CopyTalk is entitled, “Introducing the Statement of Best Practices in Fair Use of Collections Containing Orphan Works for Libraries, Archives, and Other Memory Institutions” presented by Dave Hansen (UC Berkeley and UNC Chapel Hill) and Peter Jaszi (American University).

CopyTalks are scheduled for the first Thursday of even-numbered months.

Two earlier webinars were recorded and archived:

From August 7, 2014: International copyright (with Janice Pilch from Rutgers University Library)

From October 2, 2014: Open licensing and the public domain: Tools and policies to support libraries, scholars, and the public (with Tim Vollmer from Creative Commons).


The post CopyTalk webinar update appeared first on District Dispatch.

OCLC Dev Network: Opening Up Developer House

planet code4lib - Mon, 2014-12-01 20:30

We're excited to kick off our second Developer House here at the OCLC Developer Network by welcoming 12 library technologists for a week of brainstorming, learning and coding: Bilal Khalid, Bill Jones, Candace Lebel, Emily Flynn, Francis Kayiwa, Janina Sarol, Jason Thomale, Rachel Maderik, Sarah Johnston, Scott Hanrath, Shawn Denny, and Steelsen Smith. This is such a talented group: each person has terrific skills on their own, and together they will be unstoppable.

SearchHub: Solr on YARN

planet code4lib - Mon, 2014-12-01 19:44
One of the most important evolutions in the big data landscape is the emergence of best-of-breed distributed computing frameworks. Gone are the days when every big data problem looks like a nail for the MapReduce hammer. Have an iterative machine learning job? Use Spark. Need to perform deep analytics on billions of rows? MapReduce. Need ad-hoc query capabilities? Solr is best. Need to process a stream of tweets in real time? Spark Streaming or Storm. To support this multi-purpose data processing platform, Hadoop 2 introduced YARN (Yet Another Resource Negotiator), which separates resource management and job scheduling from data processing in a Hadoop cluster. In contrast, in Hadoop v1, MapReduce computation was tightly coupled with the cluster resource manager. Put simply, YARN allows different jobs to run in a Hadoop cluster, including MapReduce, Spark, and Storm. In this post, I introduce an open source project developed at Lucidworks for running SolrCloud clusters on YARN.

Economies of Scale

Before we get into the details of how it works, let’s understand why you might want to run SolrCloud on YARN. The main benefit of running distributed applications like Solr on YARN is improved operational efficiency. Designing, deploying, and managing a large-scale Hadoop cluster is no small feat. Deploying a new cluster requires investing in dedicated hardware, specialists to install and configure it, and performance and stability testing. In addition, you need to secure the cluster and actively monitor its health. There’s also training employees on how to use and develop solutions for Hadoop. In a nutshell, deploying a Hadoop cluster is a major investment that can take months or even years. The good news is that adding computing capacity to an existing cluster is much easier than deploying a new cluster. Consequently, it makes good business sense to leverage economies of scale by running as many distributed applications on YARN as possible.
If a new application requires more resources, it’s easy to add more HDFS and data nodes. Once a new application is deployed on YARN, administrators can monitor it from one centralized tool. As we’ll see below, running Solr on YARN is very simple: a system administrator can deploy a SolrCloud cluster of any size using a few simple commands. Another benefit of running Solr on YARN is that businesses can deploy temporary SolrCloud clusters to perform background tasks like re-indexing a large collection. Once the re-index job is completed and the index files are safely stored in HDFS, YARN administrators can shut down the temporary SolrCloud cluster.

Nuts and Bolts

The following diagram illustrates how Solr on YARN works.

Step 1: Run the SolrClient application

Prior to running the SolrClient application, you need to upload the Solr distribution bundle (solr.tgz) to HDFS. In addition, the Solr YARN client JAR (solr-yarn.jar) also needs to be uploaded to HDFS, as it is needed to launch the SolrMaster application on one of the nodes in the cluster (step 2 below).

  hdfs dfs -put solr-yarn.jar solr/
  hdfs dfs -put solr.tgz solr/

SolrClient is a Java application that uses the YARN Java API to launch the SolrMaster application in the cluster. Here is an example of how to run the SolrClient:

  hadoop jar solr-yarn.jar \
    -nodes=2 \
    -zkHost=localhost:2181 \
    -solr=hdfs://localhost:9000/solr/solr.tgz \
    -jar=hdfs://localhost:9000/solr/solr-yarn.jar \
    -memory 512 \
    -hdfs_home=hdfs://localhost:9000/solr/index_data

This example requests Solr to be deployed into two YARN containers in the cluster, each with 512 MB of memory allocated to the container. Notice that you also need to supply the ZooKeeper connection string (-zkHost) and the location where Solr should create indexes in HDFS (-hdfs_home). Consequently, you need to set up a ZooKeeper ensemble before deploying Solr on YARN; running Solr with the embedded ZooKeeper is not supported for YARN clusters.
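For scripted or repeated deployments, the launch invocation above can be assembled programmatically. The following is a minimal sketch, assuming only the flags shown in the example; build_solr_yarn_command is a hypothetical helper for illustration, not part of the solr-yarn project:

```python
def build_solr_yarn_command(nodes, zk_host, solr_tgz, client_jar, memory_mb, hdfs_home):
    """Assemble the `hadoop jar` argument list that launches SolrMaster on YARN.

    Hypothetical helper; mirrors the flags from the SolrClient example above.
    """
    return [
        "hadoop", "jar", "solr-yarn.jar",
        "-nodes=%d" % nodes,            # number of YARN containers running Solr
        "-zkHost=%s" % zk_host,         # external ZooKeeper ensemble (required)
        "-solr=%s" % solr_tgz,          # Solr distribution bundle in HDFS
        "-jar=%s" % client_jar,         # solr-yarn.jar in HDFS
        "-memory", str(memory_mb),      # memory per container, in MB
        "-hdfs_home=%s" % hdfs_home,    # where Solr creates indexes in HDFS
    ]

cmd = build_solr_yarn_command(
    nodes=2,
    zk_host="localhost:2181",
    solr_tgz="hdfs://localhost:9000/solr/solr.tgz",
    client_jar="hdfs://localhost:9000/solr/solr-yarn.jar",
    memory_mb=512,
    hdfs_home="hdfs://localhost:9000/solr/index_data",
)
print(" ".join(cmd))
```

A list like this could be handed to a process runner (e.g. subprocess) on a machine with the Hadoop client installed.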
The SolrClient application blocks until it sees SolrCloud running in the YARN cluster.

Step 2: Allocate a container to run SolrMaster

The SolrClient application tells the ResourceManager it needs to launch the SolrMaster application in a container in the cluster. In turn, the ResourceManager selects a node and directs the NodeManager on the selected node to launch the SolrMaster application. A NodeManager runs on each node in the cluster.

Step 3: SolrMaster requests containers to run SolrCloud nodes

The SolrMaster performs three fundamental tasks: 1) it requests N containers (-nodes) for running SolrCloud nodes from the ResourceManager, 2) it configures each container to run the Solr start command, and 3) it waits for a shutdown callback to gracefully shut down each SolrCloud node.

Step 4: Solr containers allocated across the cluster

When setting up container requests, the SolrMaster adds the path to the Solr distribution bundle (solr.tgz) as a local resource for each container. When the container is allocated, the NodeManager extracts solr.tgz on the local filesystem and makes it available as ./solr. This allows us to simply execute the Solr start script using ./solr/bin/solr. Notice that other applications, such as Spark, may live alongside Solr in a different container on the same node.

Step 5: SolrCloud node connects to ZooKeeper

Finally, as each Solr node starts up, it connects to ZooKeeper to join the SolrCloud cluster. In most cases, it makes sense to configure Solr to use the HdfsDirectoryFactory via the -hdfs_home parameter on the SolrClient (see step 1), as any files created locally in the container will be lost when the container is shut down. Once the SolrCloud cluster is running, you interact with it using the Solr APIs.

Shutting down a SolrCloud cluster

One subtle aspect of running SolrCloud in YARN is that the application master (SolrMaster) needs a way to tell each node in the cluster to shut down gracefully. This is accomplished using a custom Jetty shutdown hook.
When each Solr node is launched, the IP address of the SolrMaster is stored in a Java system property: yarn.acceptShutdownFrom. The custom shutdown handler will accept a Jetty stop request from this remote address only. In addition, the SolrMaster computes a secret Jetty stop key that only it knows, to ensure it is the only application that can trigger a shutdown request.

What’s Next?

Lucidworks is working to get the project migrated over to the Apache Solr project. In addition, we’re adding YARN awareness to the Solr Scale Toolkit and plan to add YARN support for Lucidworks Fusion in the near future.
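The shutdown gating described above boils down to two checks: the stop request must come from the SolrMaster's address, and it must carry the secret stop key. This is a hypothetical sketch of that logic, not the project's actual handler code:

```python
def accept_shutdown(request_addr, request_key, allowed_addr, stop_key):
    """Decide whether a Solr node should honor a Jetty stop request.

    Illustrative only: allowed_addr stands in for the yarn.acceptShutdownFrom
    system property, and stop_key for the secret computed by the SolrMaster.
    """
    if request_addr != allowed_addr:
        return False  # request did not originate from the SolrMaster
    return request_key == stop_key  # secret stop key must match exactly

# Request from the SolrMaster with the right key is accepted:
print(accept_shutdown("10.0.0.5", "s3cret", "10.0.0.5", "s3cret"))  # True
# Same key from any other address is rejected:
print(accept_shutdown("10.0.0.9", "s3cret", "10.0.0.5", "s3cret"))  # False
```

Requiring both checks means neither a leaked stop key nor a spoofed-looking request from another host is sufficient on its own.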

The post Solr on YARN appeared first on Lucidworks.

pinboard: Untitled (

planet code4lib - Mon, 2014-12-01 19:35
RT @no_reply: #code4lib's community scholarship "angel fund" is a few hundred dollars short of funding a second scholarship.

LibUX: 016: Putting the User First with Courtney Greene McDonald

planet code4lib - Mon, 2014-12-01 18:37

Courtney Greene McDonald is the author of Putting the User First: 30 Strategies for Transforming Library Services and The Anywhere Library: A Primer for the Mobile Web, and she is the chair of the editorial board for Weave: Journal of Library User Experience.

We gushed about the 2014 SEFLIN Virtual Conference (recordings), how awesome it is that there is now a peer-reviewed journal in our specific field, and a lot more.

When you think about something like Facebook …, they change everything, people get mad, but it’s very sticky. Amazon is very sticky. Google, very sticky. Libraries were in an environment for a very long time where they were sticky.

This also finishes up our first season! There will be a couple of bonus episodes to round out the year, and in January we will be coming back atcha with improvements to the audio quality and format, and a series of ten podcasts about the nitty-gritty and red tape of in-house UX.

The post 016: Putting the User First with Courtney Greene McDonald appeared first on LibUX.

