Planet Code4Lib
SearchHub: Lasso Some Prizes by Stumping The Chump in Austin Texas

Wed, 2015-09-30 18:53

Professional Rodeo riders typically only have a few seconds to prove themselves and win big prizes. But you’ve still got two whole weeks to prove you can Stump The Chump with your tough Lucene/Solr questions, and earn both bragging rights and one of these prizes…

  • 1st Prize: $100 Amazon gift certificate
  • 2nd Prize: $50 Amazon gift certificate
  • 3rd Prize: $25 Amazon gift certificate

You don’t have to know how to rope a steer to win, just check out the session information page for details on how to submit your questions. Even if you can’t make it to Austin to attend the conference, you can still participate — and do your part to humiliate me — by submitting your questions.

To keep up with all the “Chump” related info, you can subscribe to this blog (or just the “Chump” tag).

The post Lasso Some Prizes by Stumping The Chump in Austin Texas appeared first on Lucidworks.

HangingTogether: Services built on usage metrics

Wed, 2015-09-30 17:39

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Corey Harper of New York University and Stephen Hearn of University of Minnesota. They had posited that in an environment more oriented toward search than toward browse indexing, new kinds of services will rely on non-bibliographic data, usage metrics and data analysis techniques. Metrics can be about usage data—such as how frequently items have been borrowed, cited, downloaded or requested—or about bibliographic data—such as where, how and how often search terms appear in the bibliographic record. Some kinds of use data are best collected on a larger scale than most catalogs provide.

These usage metrics could be used to build a wide range of library services and activities. Among the possible services noted: collection management; identifying materials for offsite storage; deciding which subscriptions to maintain; comparing citations in researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services; and measuring the impact of library usage on research or student success. What if libraries emulated Amazon with “People who accessed <this title> also accessed <these titles>” or “People in the same course as you are accessing <these titles>”?
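The Amazon-style idea is simple enough to sketch in a few lines of Python. This is a toy illustration, not any library's actual implementation; the loan records and titles are invented:

```python
from collections import Counter, defaultdict

def co_access_recommendations(loans, title, top_n=3):
    """Recommend titles that co-occur with `title` in the same patrons'
    loan histories. `loans` is a list of (patron_id, title) pairs, a
    stand-in for real circulation records."""
    by_patron = defaultdict(set)
    for patron, t in loans:
        by_patron[patron].add(t)
    counts = Counter()
    for titles in by_patron.values():
        if title in titles:
            counts.update(titles - {title})
    return [t for t, _ in counts.most_common(top_n)]

loans = [
    ("p1", "Moby-Dick"), ("p1", "Billy Budd"),
    ("p2", "Moby-Dick"), ("p2", "Billy Budd"), ("p2", "Typee"),
    ("p3", "Typee"),
]
print(co_access_recommendations(loans, "Moby-Dick"))  # ['Billy Budd', 'Typee']
```

A real deployment would run the same counting over aggregated, anonymized circulation logs rather than an in-memory list.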

Harvard Innovation Lab’s StackLife aggregates such usage data for library titles as the number of check-outs (broken down by faculty, undergraduates, and graduate students, with faculty checkouts weighted differently), the number of ILL requests, and the frequency with which the title is placed on course reserves, and then assigns a “Stack Score” to each title. A search on a subject then displays a heat map graphic with the higher scores shown in darker hues, as shown in the accompanying graphic, and can serve as a type of recommender service. The StackLife example inspired other suggestions for possible services, such as aggregating holdings and circulation data across multiple institutions—or even across countries—with Amazon sales data, and weighting scores if the author was affiliated with the institution. A recent Pew study found that personal recommendations dominated book recommendations. Could libraries capture and aggregate faculty and student recommendations mentioned in blogs and tweets?
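A weighted score of this kind could be sketched as follows. The weights and signal names here are hypothetical; StackLife's actual formula is not given in this post:

```python
# Hypothetical weights per usage signal; StackLife's real weighting differs.
WEIGHTS = {
    "faculty_checkouts": 3.0,
    "grad_checkouts": 2.0,
    "undergrad_checkouts": 1.0,
    "ill_requests": 2.5,
    "course_reserves": 4.0,
}

def stack_score(usage):
    """Combine a title's usage signals into a single score."""
    return sum(WEIGHTS[k] * usage.get(k, 0) for k in WEIGHTS)

title_usage = {"faculty_checkouts": 2, "undergrad_checkouts": 10, "course_reserves": 1}
print(stack_score(title_usage))  # 20.0
```

The scores would then be binned into hues for the heat-map display.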

The University of Minnesota conducted a study[i] to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention. The results suggested a strong correlation between using academic library services and resources—particularly database logins, book loans, electronic journal logins, and library workstation logins— and higher grade point averages.

Some of the challenges raised in the focus group discussions included:

Difficulties in analyzing usage data: The different systems and databases libraries use present challenges in both gathering and analyzing the data. A number of focus group members are interested in visualizing usage data, and at least a couple are using Tableau to do so. Libraries have, with difficulty, harvested citations and measured which titles are available in their repositories, but it is even more difficult to demonstrate which resources would not have been available without the library. The variety of resources also means that the people who analyze the data are scattered across the campus in different functional areas. Interpreting Google Analytics to determine patterns of usage over time and the effect of curricular changes is particularly difficult.

Aggregating usage data across campus: Tools that allow selectors to choose titles to send to remote storage by circulation data and classification range (to assess the impact on a particular area of stacks) can be hampered when storage facilities use a different integrated library system.

Anonymizing data to protect privacy: Aggregating metrics across institutions may help anonymize data but hinders analysis of performance at an individual institution. Anonymizing data may also prevent breaking down usage metrics by demographics (e.g., professors vs. grad students vs. undergraduates). Even when demographic data is captured as part of campus log-ins, libraries cannot know the demographics of people accessing their resources who are not affiliated with their institution.
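One common approach to this problem is to replace patron identifiers with a keyed one-way hash before usage events enter the analytics store, so events can still be linked per patron without exposing identities. A minimal sketch; the salt handling is illustrative rather than a full privacy design, since hashed identifiers can still be re-identified if the key leaks:

```python
import hashlib
import hmac

SECRET_SALT = b"rotate-me-regularly"  # kept out of the analytics dataset

def pseudonymize(patron_id: str) -> str:
    """Keyed one-way hash: the same patron always maps to the same token,
    but the token cannot be reversed without the secret key."""
    return hmac.new(SECRET_SALT, patron_id.encode(), hashlib.sha256).hexdigest()[:16]

event = {"patron": pseudonymize("jdoe@example.edu"), "action": "checkout"}
```

Rotating the salt periodically limits how long any one pseudonym can be tracked, at the cost of longitudinal analysis.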

Difficulties in correlating library use with academic performance or impact: Some focus group members questioned whether it was even possible to correlate library use with academic performance. (“Are we busting our heads to collect something that doesn’t tell us anything?”) On the other hand, we can at least start making some decisions based on the data we do have, and perhaps libraries’ concern with being “scientific” is not warranted.

Data outside the library’s control: Much usage data lies outside the library’s control (for example, Google Scholar and Elsevier). Only vendors have access to electronic database logs. Relevance ranking for electronic resources licensed from vendors is a “black box”.

Inconsistent metadata: Inconsistent metadata can dilute the reliability of usage statistics. Examples cited included: the same author represented in multiple ways; varying metadata due to changes in cataloging rules over time; and different romanization schemes used for non-Latin script materials. The biggest issue is that most libraries’ metadata comes from external sources, so the library has no control over its quality. The low quality of metadata for e-resources from some vendors remains a common issue; misplaced identifiers for ebooks were cited as a serious problem. Focus group members have pointed vendors to the OCLC cross-industry white paper, Success Strategies for Electronic Content Discovery and Access, without much success. Threats to cancel a subscription unless the metadata improves often prove empty when the library’s own selectors object. Libraries do some bulk editing of the metadata, for example reconciling name forms with the LC name authority file (or outsourcing this work) and adding language and format codes in the fixed fields. The best sign of a “reliable vendor” is that they get their metadata from OCLC. It’s important for vendors to view metadata as “community property.”
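Reconciling name forms, one of the bulk-editing tasks mentioned above, can be roughly approximated with a normalization key. This is a naive sketch; real reconciliation matches against LC name authority records rather than string keys:

```python
import re
import unicodedata

def name_key(name: str) -> str:
    """Naive match key for author name forms: strip diacritics,
    life dates, punctuation, and case differences."""
    # Decompose accented characters and drop the combining marks.
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    name = re.sub(r"\d{4}-?(\d{4})?", "", name)         # drop life dates
    name = re.sub(r"[^\w\s]", "", name).lower()         # drop punctuation
    return " ".join(name.split())

print(name_key("Dostoyevsky, Fyodor, 1821-1881"))  # dostoyevsky fyodor
print(name_key("Dostoyevský, Fyodor"))             # dostoyevsky fyodor
```

Keys like these collapse trivial variants; genuinely different romanization schemes still require authority-file matching.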

[i] Krista M. Soria, Jan Fransen, Shane Nackerud.  Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention. Journal of Academic Librarianship, 40 (2014), 84-91. doi:10.1016/j.acalib.2013.12.002


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.


Jonathan Rochkind: Just curious: Do you think there is a market for additional Rails contractors for libraries?

Wed, 2015-09-30 14:54

Fellow library tech people and other library people who read this blog, what do you think?

Are there libraries who would be interested in hiring a Rails contractor/consultant to do work for them, of any kind?

I know Data Curation Experts does a great job with what they do — do you think there is work for more than just them, whether on Blacklight/Hydra or other Rails?

Any sense of it, from where you work or what you’ve heard?

I’m just curious, thinking about some things.

Filed under: General

District Dispatch: ALA Urges Passage of Digital Learning Equity Act

Wed, 2015-09-30 14:02

Digital section of the Martin Luther King, Jr.,
Memorial Library, Washington, D.C.

ALA is urging passage of The Digital Learning Equity Act of 2015, H.R. 3582, which was introduced in the U.S. House of Representatives last week by Congressman Peter Welch (D-VT) and co-sponsored by David McKinley (R-WV) with the support of ALA and the education community.

ALA President Sari Feldman issued a statement applauding Reps. Welch and McKinley for co-sponsoring H.R. 3582 and warning that: “Students in every classroom and every corner of the nation need Congress to close the homework gap. ALA urges Congress to quickly pass and send the Digital Learning Equity Act to the President.”

The legislation addresses the growing digital divide and the learning gaps between students with and without access to the Internet at home. Increasingly, students find it necessary to complete homework using the Internet. Many students gather at public libraries after school, gain access before school or at lunch, or simply go without access, often resulting in these students falling behind. Allowing students access to laptops only addresses part of the digital divide—if students cannot access the Internet, they cannot do their homework research.

H.R. 3582 recognizes that at-home access is critical to homework completion, authorizes an innovative grant program for schools to promote student access, prioritizes rural and high-density, low-income schools, and requires the FCC to study the growing homework gap. ALA signed a joint letter with several education representatives to support H.R. 3582 and urge its quick passage.

The legislation recognizes that libraries can provide access tools and online tools, as well as research help and guidance for students, complementing the work of teachers and school librarians. As noted in the September/October issue of American Libraries, libraries are quickly moving to provide tools for greater Internet access. Libraries across the country, including New York, Kansas City, San Mateo County (CA), Chicago, and Washington County, Maine, are already allowing patrons to check out mobile Wi-Fi hotspots.

Additional benefits of increased access include providing parents opportunities to complete their education, obtain certifications, and apply for jobs.

The Senate companion, S. 1606, was introduced in the Senate this past June by Senator Angus King (I-ME) and is being considered by the Senate Health, Education, Labor and Pensions Committee. H.R. 3582 was referred to the House Education and Workforce Committee.

ALA will continue to support efforts to broaden access to the Internet and calls on Congress to quickly pass the Digital Learning Equity Act.

The post ALA Urges Passage of Digital Learning Equity Act appeared first on District Dispatch.

LITA: Creative Commons Crash Course, a LITA webinar

Wed, 2015-09-30 14:00

Attend this interesting and useful LITA webinar:

Creative Commons Crash Course

Wednesday, October 7, 2015
1:30 pm – 3:00 pm Central Time
Register Online, page arranged by session date (login required)

Since the first versions were released in 2002, Creative Commons licenses have become an important part of the copyright landscape, particularly for organizations that are interested in freely sharing information and materials. Participants in this 90 minute webinar will learn about the current Creative Commons licenses and how they relate to copyright law.

This webinar will follow up on Carli Spina’s highly popular Ignite Session at the 2015 ALA Mid Winter conference. Carli will explain how to find materials that are Creative Commons-licensed, how to appropriately use such items and how to apply Creative Commons licenses to newly created materials. It will also include demonstrations of some important tools that make use of Creative Commons-licensed media. This program will be useful for librarians interested in instituting a Creative Commons licensing policy at their institutions, as well as those who are interested in finding free media for use in library materials.

Carli Spina

Carli Spina is the Emerging Technologies and Research Librarian at the Harvard Law School Library. There she is responsible for teaching research and technology classes, as well as working on technology projects and creating online learning objects. She has presented both online and in-person on copyright and technology topics. Carli also offers copyright training and assistance to patrons and staff and maintains a guide to finding and understanding Creative Commons and public domain materials. Prior to becoming a librarian, she worked as an attorney at an international law firm. You can find more information about her work, publications, and presentations at

Register for the Webinar

Full details
Can’t make the date but still want to join in? Registered participants will have access to the recorded webinar.


  • LITA Member: $45
  • Non-Member: $105
  • Group: $196

Registration Information:

Register Online, page arranged by session date (login required)
Mail or fax form to ALA Registration
call 1-800-545-2433 and press 5

Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty,

LITA: To tweet or not to tweet: scholarly engagement and Twitter

Wed, 2015-09-30 14:00
by Colleen Simon, via Flickr

I’ve been thinking a lot about scholarly engagement on Twitter lately, especially after reading Bonnie Stewart‘s latest blog post, “The morning after we all became social media gurus.” Based on her research and writing for her thesis, she weighs exactly what we as academic librarians and LIS professionals are getting out of digital scholarly engagement and how we measure that influence in terms of metrics. I’d like to unpack this topic a bit and open it up to a wider reader discussion in the comments section, after the jump!

Debating the merits of networked scholarship via Twitter is a topic that has been bouncing around in journal articles and blog posts for the past 8 years or so. But as notions of scholarly publication and information dissemination change, it’s worth returning to this topic in order to assess how our presence as academic librarians and LIS professionals is changing as well. Addressing social media training in her blog post, Stewart poses the question, “Are the workshops helping…or just making people feel pressured to Do Another Thing in a profession currently swamped by exhortations to do, show, and justify?” I am both a lizard person Millennial and early-career librarian, so navigating through Twitter is easy for me, but not in the sense of establishing myself professionally. I feel that I’ve only just gotten the hang of professional information dissemination, and am learning more every day about how what we as information specialists tweet out reaches others in our community and what we get back from that.

But how do we understand and frame the practical benefits of digital and networked scholarship through Twitter specifically? The number of times a single tweet is cited? How many followers, retweets, or favorites a professional has?

The pros of using Twitter as a form of scholarly networking are very clear to me – being able to contribute to the conversation in one’s field, creating new professional connections, and having an open venue in which to speak on scholarly matters – to name a few.

But the more tangential aspects are where it gets a lot grayer for me. How do we view ownership of tweets and replies to tweets? Does the citation of a viral tweet hold as much weight as a citation to an article published in a scholarly journal? How do we weigh the importance of scholarly tweets when we sometimes have to parse them out between the pictures of our pets being adorable? (I mean personally I see them as being equally important.)

This is all to say that if and/or when Twitter and other social media venues become a default environment for digital scholarship, should there be more of an effort to incorporate social media and networked scholarship as the norm that all “successful” LIS professionals should be doing, or is this just another signifier of the DIY-style of skill-building that librarianship is experiencing today as Erin Leach has written in her blog post? Should academic institutions be providing more workshops to train and guide professionals to use Twitter as professional development? What is the mark of a truly connected scholar and information specialist? I have a lot of questions.

I’ll round out my post with a quote from Jesse Stommel from his presentation New-form scholarship and the public digital humanities: “It isn’t that a single tweet constitutes scholarship, although in rare cases one might, but rather that Twitter and participatory media more broadly disperses the locus of scholarship, making the work less about scholarly products and more about community presence and engagement.” Community presence and engagement are such important factors in how I see academic librarians, LIS professionals, and information specialists using Twitter and connecting in the field.

So to open this up to you the readers, how do you measure your digital identity as a scholar or professional? How much weight do you give to digital networked scholarship? 

Ed Summers: Seminar Week 4

Wed, 2015-09-30 04:00

In this week’s seminar we left the discussion of information and began looking at the theory of design writ large, with a few focused readings, and a lecture from Professor Findlater. One of the key things I took from the lecture was the distinction between User Experience and Usability. The usability of an object speaks to its functionality, and how easy it is for people to use it. User Experience, on the other hand, is more of an affective measure of how users perceive or think about the device, which may not always be congruent with its usability. It’s an interesting distinction, which I wasn’t really conscious of before.

Speaking of distinctions, we spent a fair bit of time talking about the first chapter in Don Norman’s The Design of Everyday Things. Norman is well known for popularizing the term affordance in the HCI community. He borrowed the term from the psychologist James Gibson. We read from Norman’s revised edition from 2013–the original was published in 1988. In chapter 1 it seems that Norman has a bit of an axe to grind because of how affordance had been used in the literature to denote a relation between an object and an organism, as well as a perceived affordance or what he now calls a signifier. This might seem like splitting hairs a bit, and perhaps a bit quixotic after the term affordance has been out in the wild for 25 years–but for me it made some sense. I know not everyone did, but I appreciated his sometimes acerbic tone, especially when he was berating General Electric for its continuously flawed refrigerator controls.

A good example came up in class of a student who recently moved and needed to buy a couch. She is tall, and often likes sleeping on a couch. So she was looking for a couch that would be easy to sleep on. Specifically she wanted a couch with arm rests that could also serve as pillows, because she is tall. For example compare these two couches:

The second couch affords being used for sleeping by a tall person. The affordance is a specific relation between the tall person and the couch, not a specific feature of the couch. The distinction that Norman is making here is that the affordance is different from the perception of the affordance, or signifier, which in this case is the type of arm rest.

Being able to distinguish between the relation between the object and the person, and the sign of the relationship seems important–particularly in cases where the affordance exists but is not known, or when it appears that the affordance exists, but it actually does not. I’m thinking of controls in a user interface that appear to do one thing, but do something else, or do that expected thing as well as something unexpected. I can see why HCI people would have reacted negatively to Norman telling them they were using the term wrong. But since I’m new to the field I don’t have that baggage I guess :-)

A few other things that came up in discussion that I will note down here to follow up on at some point are:

Writing Workshop

In the second part of the class we had a writing workshop where we discussed writing we liked, writing strategies (since we’ll be doing lots of that in the PhD program), as well as ways to get started writing when you are stuck.

We all brought in examples of writing we liked, and talked briefly about them. I brought in Jackson & Barbrow (2015) as an example of what I thought was a well written paper. I like it because it is strongly grounded in ethnographic research (admittedly something I want to learn more about), and discusses a topic area that I am interested in: ecology, measurement and standards. I thought the paper did a good job of studying behavior around standards at different scales: the individual team of researchers out in the field, and the large scale national collection of data. Jackson included numerous quotes from individuals he observed during the study, which added additional authentic voices to the paper. He also quoted Borges in a useful/artful way. I thought it was also very interesting how standards were presented as things that we should consider as a design factor in Human Computer Interaction. The very idea that seemingly invisible and dull things like standards (how they live, and what they omit) could be useful in design is a fascinating idea to me. I liked the illustration that while measuring the environment seems like a precise science, it has a human/social component to it that is extremely important. Finally, I’ll admit it: I brought the paper because it won an honorable mention best paper award at CHI – and I’m a fan of Jackson’s work.

Writing Strategies

After we all talked about our favorite papers we discussed writing strategies or techniques to be aware of:

  • illustrations are important, captions are important
  • formulas as visualization
  • subheadings to make things easier for the reader
  • start a section with the main point, so people can find their way back
  • template, protocol for experimentation: group study
  • goal is sometimes to help people replicate
  • need a clear method: this is a must
  • need to be able to justify the decision
  • supplementary information (datasets, interview transcripts, coding, etc)
  • talk about failure
  • make clear the real-world implications
  • state in the first paragraph that the research is important
  • reflection on findings

Ways to get started

We also talked about ways to get started writing when it is difficult to get going:

  • look for similar papers: content & venue
  • keep track of good examples of papers
  • may find some that work as templates
  • talk about the paper, get feedback all along the way ; talk to co-authors
  • talk to the people that you are citing
  • start with what’s easiest: sometimes lit review, or methods, depending
  • what is the story: what do you want people to remember, how do they get highlighted ; the turns
  • where do you like writing?
  • find your voice, style, etc.
  • give yourself time
  • the time of day is important
  • stretches of time are important (take breaks)
  • plan/outline
  • be clear and concise
  • use evidence

Here are my random notes I took while doing the readings for this week. They haven’t been cleaned up much. But I thought I’d keep them here just in case I need to remember something later.

When doing the readings we were asked to pay particular attention to these questions:

  • What problem or research question does it address? Is that an important problem/question?
  • What is the research contribution?
  • Is the overall research approach used (e.g., lit survey, interview study, experiment) appropriate?
  • Are there any threats to the validity of the research? For example, the sampling method was for the researcher to ask all their friends to participate (let’s hope we don’t see this!).
  • Is the research properly grounded in past work?
  • Are there any presentation issues?

Norman (2013), Chapter 1

  • Psychopathology: the scientific study of mental disorders, and their causes (potentially in the environment)
  • Is it possible to learn how a product is supposed to be used?
  • “All artificial things are designed.”
  • Types: Industrial, Interaction, Experience (last two seem to blend a bit)
  • Rules followed by a machine are only known by its designers – and sometimes not even then (Nick’s work on Algorithmic Accountability)
  • The importance of blaming machines for the problems, not the person. Human centered.
  • “If only people would read the instructions everything would be all right.” Haha: RTFM.
  • Wow, he studied the usability of a nuclear reactor and Three Mile Island. No pressure!
  • Need to design for the way people are not the way you (the designer) want them to be.
  • In Human Centered Design (HCD) focus on breakdown, not on when things are going right. Winograd & Flores (Computers and Cognition).
  • HCD cuts across the three types of design.
  • Affordances: relationship between a physical object and a person (or agent). An affordance is not a property of the object, it is a relationship.
    • J J Gibson came up with the word.
    • Some are perceivable, and some are invisible.
  • Norman introduces: affordances, signifiers, constraints, mappings, feedback, and conceptual model.
  • Signifiers are signals that draw attention to affordances, or make them visible.
    • They can be accidental or unintentional. Markers of use (a path through the woods)
    • Some signifiers are the perceived affordances, useful in cases where the affordance isn’t easily perceived.
    • They must be perceivable, or else they are failing to function.
    • Some perceived affordances (signifiers) can appear real, but are not. (used in games, and other places to illustrate constraints?)
    • If signifiers are signals for affordances, I guess affordances are the signified?
  • Mappings: a relationship between the control and the result.
    • some are natural, some feel natural (but in fact are not)
  • Feedback communicates the results of an action: there can be too little feedback, and too much feedback (backseat driver).
  • Conceptual models are explanations of how something works. Often they are simplified, and there are more than one. They are often inferred from using a device.
  • System Image: the bundle of all these things – akin to an actual brand?
  • Hardest part of design is getting people to abandon their disciplinary viewpoint and to think about the person who is using the device (antidisciplinarity)

Crystal & Ellington (2004)

  • This paper is a review of the literature on the topic of task analysis, with a view to the future, so not so much a formal research study really.
  • It seems like a pretty thorough review, but wondering if it is a regurgitation of Kuutti and Bannon (1991) that is mentioned in the conclusion. Although I guess Crystal reviews content past 1991, so maybe it’s an update of sorts?
  • It’s kind of amusing how they use their own task analysis breakdown to present the illustration of the field. I guess this would be a conceptual model (CTA)?
  • My impression was that more could’ve been done to define notion of usability which is thrown in at the end.
  • I liked how they wanted to compose and integrate the research on task-analysis, and not validate a particular viewpoint, but show the options that are available, and suggest cross-pollination between the models. Not sure if this is inter-disciplinary or not.

Oulasvirta, Kurvinen, & Kankainen (2003)

  • conclusions are inconclusive
  • hypotheses are introduced too late
  • good grounding in the literature
  • they seemed to have two different variables at play: documentation provided and environment
  • admittedly limited analysis of the design documents

This study is built upon a useful foundation of existing research on bodystorming, and seems to provide a useful introduction to the concept. It also usefully highlights through concrete examples how bodystorming is different from brainstorming. The goal of the study seems to be to explore two hypotheses, which are introduced at the end of the paper (I think they should’ve been included earlier):

  1. to speed up the design process
  2. to increase awareness of contextual factors in design

The authors mentioned a third hypothesis, which was to evaluate whether bodystorming on site provided immediate feedback about design ideas. But to me this seemed like a very minor variation on the speed of the design process.

They attempted to study these questions by analyzing the designs generated by 4 different case studies where bodystorming was used and a more traditional brainstorming case study. The setting of the bodystorming was varied: on site, similar site, office space, office space with acting. It also seemed like different types of documentation (user stories and design questions) were used in each scenario. This seemed to be changing more than one variable, complicating the ability to draw conclusions. The authors mention that the results were somewhat complicated because designs from one of the bodystorming sessions seemed to inform other sessions. This was strange because they mention elsewhere that the case studies included different participants; but they couldn’t all have been different if this sort of learning took place?

The findings themselves were inconclusive, and admittedly somewhat shallow. Although some of the anecdotes regarding site accessibility, level of exhaustion, inspiration and memorability seem like they would be fruitful to explore in a more controlled manner. It felt like this study was trying to do an experiment, but really did a much better job of presenting the ideas of bodystorming in the context of the literature, and providing a useful set of case studies to delineate how it could be used.


Crystal, A., & Ellington, B. (2004). Task analysis and human-computer interaction: Approaches, techniques, and levels of analysis. AMCIS 2004 Proceedings, 391.

Jackson, S. J., & Barbrow, S. (2015). Standards and/as innovation: Protocols, creativity, and interactive systems development in ecology. In CHI 2015. Association for Computing Machinery. Retrieved from

Norman, D. A. (2013). The design of everyday things: Revised and expanded edition. Basic Books.

Oulasvirta, A., Kurvinen, E., & Kankainen, T. (2003). Understanding contexts by being there: Case studies in bodystorming. Personal and Ubiquitous Computing, 7(2), 125–134.

DuraSpace News: Tell a DSpace Story to Introduce People, Ideas and Innovation

Wed, 2015-09-30 00:00

The Telling DSpace Stories work group got underway this fall. The goal is to introduce DSpace community members and the work they are doing by sharing each other’s stories. The first five stories are now available to answer questions about how others have implemented DSpace at several types of institutions in different parts of the world:

LITA: It’s a Brave New Workplace

Tue, 2015-09-29 20:08

LITA Blog Readers, I’ve got a new job. For the past month I’ve been getting my sea legs at the University of Houston’s M.D. Anderson Library. As CORC (Coordinator of Online Resources and Collections), my job is supporting data-driven collection decisions and processes. I know, it’s way cool.

M.D. Anderson – Ain’t she a beaut?

I have come to realize that the most challenging aspect of adapting to a new workplace may well be learning new technologies and adjusting to familiar technologies used in slightly different ways. I’m text mining my own notes for clues and asking a ton of questions, but switching from Trello to Basecamp has been rough.

No, let’s be honest, the most challenging thing has been navigating the throngs of undergrads on a crowded campus. Before working remotely for years, I worked at small nonprofits, graduated from a teeny, tiny liberal arts college, and grew up in a not-big Midwestern town. You may notice a theme.

No worries, I’m doing fine. The tech is with me.

In upcoming installments of Brave New Workplace I’ll share methods for organization, prioritization, acculturation, and technology adaptation in a new workplace. While I’ll focus on library technologies and applications, I’ll also be turning a tech-focused approach to workplace culture questions. Spoiler alert: I’m going to encourage you to build your own CRM for your coworkers and their technology habits. Be prepared.

And stay tuned! Brave New Workplace will return on October 16th.




SearchHub: How StubHub De-Dupes with Apache Solr

Tue, 2015-09-29 19:07
As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting StubHub engineer Neeraj Jain’s session on de-duping in Solr.

StubHub handles a large number of events and related documents, and its use of Solr has grown from search for events and tickets to content ingestion. One of the major challenges in content ingestion systems is detecting and removing duplicates without compromising quality or performance. We present a solution that involves spatial searching, a custom update handler, a custom geodist function, and more, to solve the de-duplication problem. In this talk, we’ll present the design and implementation details of the custom modules and APIs, discuss some of the challenges we faced and how we overcame them, and present a comparison analysis between the old and new systems used for de-duplication.

Neeraj Jain is an engineer working with StubHub Inc in San Francisco. He has a special interest in the search domain and has been working with Solr for over 4 years. He also has an interest in mobile app development; he works as a freelancer and has applications on the Google Play store and iTunes store that are built using Solr. Neeraj has a Masters in Technology degree from the Indian Institute of Technology, Kharagpur.

Deduplication Using Solr: Presented by Neeraj Jain, Stubhub from Lucidworks

Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr, on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…
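The abstract above describes a custom update handler for de-duplication. As a point of comparison, Solr also ships a stock signature-based de-duplication chain; the solrconfig.xml fragment below is a minimal sketch of that built-in approach (not StubHub’s custom spatial pipeline), and the `name` and `venue` fields are illustrative assumptions:

```xml
<!-- Built-in near-duplicate detection: documents whose chosen fields
     hash to the same signature overwrite one another on ingest. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- field that stores the computed signature -->
    <str name="signatureField">id</str>
    <!-- true = an incoming duplicate replaces the earlier document -->
    <bool name="overwriteDupes">true</bool>
    <!-- fields hashed together to form the signature (illustrative) -->
    <str name="fields">name,venue</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain is activated by pointing an update handler at it via the `update.chain` parameter, and fuzzier near-duplicate matching can be had by swapping `Lookup3Signature` for `TextProfileSignature`.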

The post How StubHub De-Dupes with Apache Solr appeared first on Lucidworks.

FOSS4Lib Recent Releases: ArchivesSpace - 1.4.0

Tue, 2015-09-29 18:07

Last updated September 29, 2015. Created by Peter Murray on September 29, 2015.

Package: ArchivesSpace
Release Date: Monday, September 28, 2015

DPLA: Humanities for All of Us: The NEH at 50

Tue, 2015-09-29 17:00

Sunday night there wasn’t a cloud in the sky in Boston, and so we were fortunate to get a clear view of the rare supermoon eclipse. I took a telescope out to the backyard with my kids and we worked to line up the equipment, and then we chatted about astronomy, optics, and physics, umbras and penumbras. A moment of science? Yes, but ultimately much more.

The eclipse lasted for several hours, and the science part was quickly dispatched. That left plenty of time for greater thoughts to play out, as we were awed by the spectacle. My mind drifted to Mary Shelley, Lord Byron, and their poetry and prose from the creepily dark summer of 1816; the apocalyptic paintings of John Martin; the impact of eclipses on ancient Jerusalem; and entreaties against the fearful night in the Book of Common Prayer (so evocatively described by Alan Jacobs in his recent “biography” of that Anglican prayerbook). In short, I experienced the lunar eclipse simultaneously through the lenses of the telescope and the humanities.

Undoubtedly others had different literary, artistic, philosophical, religious, or historical thoughts come to mind. (As well as less highbrow allusions: for some reason I also thought of Space: 1999.) But it’s impossible to imagine our experience of a lunar eclipse without the framing of our shared culture. We are humans, not machines, and we do not experience daily life—or awe-inspiring events—mechanically. We are constantly applying our understanding of the past, of writing and interpretations, of the spirit and art, to what we see and do.

The National Endowment for the Humanities has been supporting and broadly communicating that profound understanding for 50 years. Their anniversary website shows the incredible breadth and depth of their programs, projects, and topics. The NEH has not stood still, either; the establishment of the Office of Digital Humanities a decade ago, for instance, catalyzed an incipient field and led to productive commerce between the humanities and many other fields, including the sciences.

And the NEH has been a leading supporter of the Digital Public Library of America, which we hope will serve as a storehouse of shared—and open—culture for the next 50 years and beyond. We salute the National Endowment for the Humanities on their fiftieth, and thank them once again for underwriting the full range of human experience.

FOSS4Lib Recent Releases: VuFind - 2.5

Tue, 2015-09-29 16:38

Last updated September 29, 2015. Created by Demian Katz on September 29, 2015.

Package: VuFind
Release Date: Tuesday, September 29, 2015

Eric Lease Morgan: My water collection predicts the future

Tue, 2015-09-29 16:37

As many of you may or may not know, I collect water, and it seems as if my water collection predicts the future, sort of.

Since 1979 or so, I’ve been collecting water. [1] The purpose of the collection was to enable me to see and experience different parts of the world whenever I desired. As the collection grew and my computer skills developed, I frequently used the water collection as a kind of Guinea pig for digital library projects. For example, my water collection was once manifested as a HyperCard stack complete with the sound of running water in the background. For a while my water collection was maintained in a FileMaker database that generated sets of HTML. Quite a number of years ago I migrated everything to MySQL and embedded images of the water bottles in fields of the database. This particular implementation also exploited XML and XSLT to dynamically make the content available on the Web. (There was even some RDF output.) After that I included geographic coordinates in the database. This made it easy for me to create maps illustrating whence the water came. To date, there are about two hundred and fifty waters in my collection, but active collecting has subsided in the past few years.

But alas, this past year I migrated my co-located host to a virtual machine. In the process I moved all of my Web-based applications — dating back more than two decades — to a newer version of the LAMP stack, and in the process I lost only a single application — my water collection. I still have all the data, but the library used to integrate XSLT into my web server (AxKit) simply would not work with Apache 2.0, and I have not had the time to re-implement a suitable replacement.

Concurrently, I have been negotiating a two-semester-long leave-of-absence from my employer. The “leave” has been granted and commenced a few weeks ago. The purpose of the leave is two-fold: 1) to develop my skills as a librarian, and 2) to broaden my experience as a person. The first part of my leave is a month-long vacation, and that vacation begins today. For the first week I will paint in Tuscany. For the second week I will drink coffee in Venice. During the third week I will give a keynote talk at ADLUG in Rome. [2] Finally, during the fourth week I will learn how to make croissants in Provence. After the vacation is over I will continue to teach “XML 101” to library school graduate students at San Jose State University. [3] I will also continue to work for the University of Notre Dame on a set of three text mining projects (EEBO, JSTOR, and HathiTrust). [4, 5, 6]

As I was getting ready for my “leave” I was rooting through my water collection, and I found four different waters, specifically from: 1) Florence, 2) Venice, 3) Rome, and 4) Nice. As I looked at the dates of when the water was collected, I realized I will be in those exact same four places, on those exact same four days, exactly thirty-three years after I originally collected them. My water collection predicted my future. My water collection is a sort of model of me and my professional career. My water collection has sent me a number of signs.

This “leave-of-absence” (which is not really a leave nor a sabbatical, but instead a temporary change to adjunct faculty status) is a whole lot like going to college for the first time. “Where in the world am I going? What in the world am I going to do? Who in the world will I meet?” It is both exciting and scary at the same time. It is an opportunity I would be foolish to pass up, but it is not as easy as you might imagine. That said, I guess I am presently an artist- and librarian-at-large. I think I need new, albeit temporary, business cards to proclaim my new title(s).

Wish me luck, and “On my mark. Get set. Go!”

  1. blog postings describing my water collection –
  2. ADLUG –
  3. “XML 101” at SJSU –
  4. EEBO browser –
  5. JSTOR browser –
  6. HathiTrust browser –

Open Knowledge Foundation: Open: A Short Film about Open Government, Open Data and Open Source

Tue, 2015-09-29 14:20

This is a guest post from Richard Pietro the writer and director of Open.

If you’re reading this, you’re likely familiar with the terms Open Government, Open Data, and Open Source. You probably understand how civic engagement is being radically transformed through these movements.

Therein lies the challenge: How can we reach everyone else? The ones who haven’t heard these terms and have little interest in civic engagement.

Here’s what I think: Civic engagement is a bad brand. If we’re to capture the attention of more people, we need to change its brand for the better.

When most people think of civic engagement, they probably imagine people in a community meeting somewhere yelling at each other. Or, maybe they picture a snooze-fest municipal planning and development consultation. Who has time to fit that in with everything else going on in their lives? I think most people would prefer to invest their spare time on something they’re passionate about; not sitting in a stuffy meeting! (If stuffy meetings ARE your passion, that’s cool too!)

Civic engagement is seen as dry and boring, or meant solely for the hyper-informed, hyper-engaged, policy-wonk. Between these two scenarios, you feel your voice will never be heard – so why bother? Civic engagement has bad PR. It isn’t viewed as fun for most people. Plus, I think there’s also an air of elitism, especially when it’s spoken as a right, duty, privilege, or punishment (judges issue community service as a punishment).

That’s why I’ve adopted a different perspective: Civic Engagement as Art. This was motivated by Seth Godin’s book “Linchpin,” where he suggests that art shouldn’t only be thought of as fine art. Rather, he argues that art is a product of passion; art is creating something, and that’s what civic engagement is all about – creating something in your community that comes from passion.

I’m hoping that Open will introduce Open Government, Open Data, and Open Source to new people simply because it is being done in a new way. My intention is to begin changing the civic engagement brand by having fun with it.

For example, I call myself an Open Government Fanboy, so Open uses as many pop-culture and “fanboy-type” references as we could squeeze in. As a matter of fact, I call the film a “spoofy adaptation” of The Matrix. What we did was take the scene where Morpheus explains to Neo the difference between the “Real World” and the “Matrix” and adapt it to the “Open World” versus the “Closed World.” We also included nods to Office Space, The Simpsons, Monty Python, and Star Trek.

As a bonus, I’m hoping that these familiar themes and references will make it easier for “newbies” to understand the Open Government, Open Data, and Open Source space.

So, without further Apu (Simpsons fans will get it), I give you Open – The World’s first short film on Open Government, Open Data, and Open Source.

Watch Open


Writer and Director: Richard Pietro
Screenplay: Richard Pietro & Rick Weiss
Executive Producers: Keith Loo and Bruce Chau
Cinematographers: Gord Poon & Mike Donis
Technical Lead: Brian Wong
Composer and Sound Engineer: GARU
Actors: Mish Tam & Julian Friday

HangingTogether: Data Management and Curation in 21st Century Archives – Part 2

Tue, 2015-09-29 13:08

I attended the 79th Annual Meeting of the Society of American Archivists (SAA) last month in Cleveland, Ohio and was invited to participate on the Research Libraries Roundtable panel on Data Management and Curation in 21st Century Archives. Dan Noonan, e-Records/Digital Resources Archivist, moderated the discussion. Wendy Hagenmaier, Digital Collections Archivist, Georgia Tech Library and Sammie Morris, Director, Archives and Special Collections & University Archivist, Purdue University Libraries joined me on the panel. Between the three of us there was a nice variety of perspectives given our different experiences and interests.

I discussed my presentation in an earlier blog post – Part 1: Managing and Curating Data with Reuse in Mind. In this post I highlight key points from Wendy and Sammie’s presentations. What made an impression on me was whether and how they and their colleagues came to value each other’s complementary skills, experience, and expertise needed to manage and curate data.

Do you value complementarities?

Wendy discussed her collaboration with Lizzy Rolando, Research Data Librarian, Georgia Tech Library. She likened their experience to Susan and Sharon from Hayley Mills’ 1961 film The Parent Trap. Wendy described herself and Lizzy as “twins separated by silly professional silos”. Working together they found several areas of convergence and divergence around workflows, copyright, data integrity, security and reusability, and funding curation. Wrestling with their differences has changed Wendy’s thoughts about archival theory and practice. She has been inspired to place more emphasis on being a proactive partner during data creation; considering what a network-based, non-exclusive ownership model of archives might look like; identifying best practices for capturing dynamic cloud-based files and systems; ensuring born-digital collections are actually reusable; and creating pathways for products of reuse to be preserved and related back to the original record. She also is wondering how to leverage federal data sharing mandates to advocate for the resources required to build repositories and systems needed to provide access to born-digital archives.

Sammie discussed strategies to convince stakeholders that archivists should actively participate in data management and curation activities given their expertise in collecting, preserving, and providing access to unique collections. Like data, archival materials are under-described, often lack context, and are frequently complex, unpublished raw primary sources that present a plethora of management issues from privacy and intellectual property rights affecting access to preservation and security needs of one-of-a-kind materials. Archivists’ experiences with creating collecting policies, selecting and appraising unique collections for long-term value, negotiating privacy and copyright issues, and creating secure and trusted repositories can prove invaluable for data curation planning and decision making. A key strategy she used was articulating how archival theory and practice could be used to help institutions meet the ISO 16363 requirements for establishing trustworthy digital repositories.

I was not surprised about the amount of convincing Sammie had to do with campus stakeholders, because my research suggests the same thing when it comes to librarians. However, as an outsider looking in, I must admit I was surprised that librarians were included in the group of campus stakeholders that needed convincing. Although archivists and librarians have different areas of expertise, I thought they would have proactively joined forces to seize on the value of their complementarities. The work archivists and librarians could accomplish together, given their areas of expertise would seem to strengthen the argument that they have major roles in planning and implementing e-Research support on campus. Wendy’s presentation reinforced this thought, but her collaboration with Lizzy was expected as part of her job responsibilities.

It made me wonder how many archivist-librarian pairings exist on campuses engaged in e-Research support. If you are actively working together in an archivist-librarian pairing please comment or respond to this blog post. Tell us what sparked your collaboration. How has it changed your thinking about your professional practice? What have been your strategies for a successful collaboration? What value has it added? Are you finding that you’re stronger together?

About Ixchel Faniel

Ixchel M. Faniel is a Research Scientist at OCLC. She is currently working on projects examining data reuse within academic communities to identify how contextual information about the data that supports reuse can best be created and preserved. She also examines librarians' early experiences designing and delivering research data services with the objective of informing practical, effective approaches for the larger academic community.


DuraSpace News: Introducing the VIVO Community Pages

Tue, 2015-09-29 00:00

Hidden treasures are even better when they are discovered. The VIVO community wiki pages are one of those treasures. This section of the DuraSpace wiki offers the VIVO community a wealth of information, best practices and valuable resources that can assist institutions in implementing, managing and sharing VIVO data and resources. Here are highlights of what you will find in the VIVO Community pages.

Considering VIVO

DuraSpace News: ¿Hablas español?

Tue, 2015-09-29 00:00

Winchester, MA  If you would like to keep up with DuraSpace news, events, opportunities and initiatives in Spanish please subscribe to DuraSpace Informe. This new bi-monthly newsletter will be published for the first time in the beginning of October and will feature current strategic information of interest to Spanish speaking users of DSpace, Fedora and VIVO open source projects.

DuraSpace News: Coming soon in Converis, from Thomson Reuters

Tue, 2015-09-29 00:00

From Danielle Pokusa, Thomson Reuters