planet code4lib

Planet Code4Lib - http://planet.code4lib.org

Code4Lib Journal: Introduction to Text Mining with R for Information Professionals

Tue, 2016-07-19 15:08
The 'tm: Text Mining Package' in the open source statistical software R has made text analysis techniques easily accessible to both novice and expert practitioners, providing useful ways of analyzing and understanding large, unstructured datasets. Such an approach can yield many benefits to information professionals, particularly those involved in text-heavy research projects. This article will discuss the functionality and possibilities of text mining, as well as the basic setup necessary for novice R users to employ the RStudio integrated development environment (IDE). Common use cases, such as analyzing a corpus of text documents or spreadsheet text data, will be covered, as well as the text mining tools for calculating term frequency, term correlations, clustering, creating wordclouds, and plotting.

Code4Lib Journal: Data for Decision Making: Tracking Your Library’s Needs With TrackRef

Tue, 2016-07-19 15:08
Library services must adapt to changing patron needs. These adaptations should be data-driven. This paper reports on the use of TrackRef, an open source and free web program for managing reference statistics.

Code4Lib Journal: Are games a viable solution to crowdsourcing improvements to faulty OCR? – The Purposeful Gaming and BHL experience

Tue, 2016-07-19 15:08
The Missouri Botanical Garden and partners from Dartmouth, Harvard, the New York Botanical Garden, and Cornell recently wrapped up a project funded by IMLS called Purposeful Gaming and BHL: engaging the public in improving and enhancing access to digital texts (http://biodivlib.wikispaces.com/Purposeful+Gaming). The goals of the project were to significantly improve access to digital texts through the applicability of purposeful gaming for the completion of data enhancement tasks needed for content found within the Biodiversity Heritage Library (BHL). This article will share our approach in terms of game design choices and the use of algorithms for verifying the quality of inputs from players as well as challenges related to transcriptions and marketing. We will conclude by giving an answer to the question of whether games are a successful tool for analyzing and improving digital outputs from OCR and whether we recommend their uptake by libraries and other cultural heritage institutions.

Code4Lib Journal: From Digital Commons to OCLC: A Tailored Approach for Harvesting and Transforming ETD Metadata into High-Quality Records

Tue, 2016-07-19 15:08
The library literature contains many examples of automated and semi-automated approaches to harvesting electronic theses and dissertations (ETD) metadata from institutional repositories (IR) into the Online Computer Library Center (OCLC). However, most of these approaches could not be implemented with the institutional repository software Digital Commons for various reasons, including proprietary schema incompatibilities and high-level programming expertise requirements our institution did not want to pursue. Only one semi-automated approach found in the library literature met our requirements for implementation, and even though it catered to the particular needs of the DSpace IR, it could be adapted to other IR software with further customization. The following paper presents an extension of this semi-automated approach, originally created by Deng and Reese, customized and adapted to address the particular needs of the Digital Commons community and updated to integrate the latest Resource Description & Access (RDA) content standards for ETDs. Advantages and disadvantages of this workflow are discussed as well.

Code4Lib Journal: Metadata Analytics, Visualization, and Optimization: Experiments in statistical analysis of the Digital Public Library of America (DPLA)

Tue, 2016-07-19 15:08
This paper presents the concepts of metadata assessment and “quantification” and describes preliminary research results applying these concepts to metadata from the Digital Public Library of America (DPLA). The introductory sections provide a technical outline of data pre-processing, and propose visualization techniques that can help us understand metadata characteristics in a given context. Example visualizations are shown and discussed, leading up to the use of "metadata fingerprints" -- D3 Star Plots -- to summarize metadata characteristics across multiple fields for arbitrary groupings of resources. Fingerprints are shown comparing metadata characteristics for different DPLA "Hubs" and also for used versus not used resources based on Google Analytics "pageview" counts. The closing sections introduce the concept of metadata optimization and explore the use of machine learning techniques to optimize metadata in the context of large-scale metadata aggregators like DPLA. Various statistical models are used to predict whether a particular DPLA item is used based only on its metadata. The article concludes with a discussion of the broad potential for machine learning and data science in libraries, academic institutions, and cultural heritage.

David Rosenthal: More on Terms of Service

Tue, 2016-07-19 15:00
When Jefferson Bailey & I finished writing My Web Browser's Terms of Service I thought I was done with the topic, but two recent articles brought it back into focus. Below the fold are links, extracts and comments.

In Ticking all the boxes, The Economist writes about an interesting legal theory underpinning a rash of cases in New Jersey:
The suits seek to exploit the Truth-in-Consumer Contract, Warranty and Notice Act, enacted in New Jersey 35 years ago. This was intended to prevent companies that do business in the state from using contracts, notices or signs to limit consumer rights protected by law. These suits:
generally include allegations that online terms violate consumers’ rights to seek damages as protected by New Jersey law and fail to explain which provisions cover New Jersey. ... plaintiffs need not show injury or loss in order to sue but merely prove violation of the TCCWNA. The risks to companies are significant:
the TCCWNA entitles each successful plaintiff to at least $100 in damages, plus fees to lawyers and so on. If a website has millions of visitors, the costs to a company could be staggering. But they are balanced by longer-term risks to consumers:
A growing number of firms, emboldened by favourable Supreme Court rulings, have adopted clauses that limit class-action suits. Consumers are instead restricted to resolving disputes individually, in arbitration. The TCCWNA cases may inspire more firms to add such caveats. That might limit frivolous suits. But consumers with grave complaints would be unable to sue, either. In the end lawsuits over restrictive contracts may make them more restrictive still. An example of this trend is Pokemon Go:
to play Pokemon Go, you have to accede to a binding arbitration clause, surrendering your right to sue and promising only to seek redress for any harms that the company visits upon you in a system of secretive, one-sided shadow courts paid for by corporations where class actions are not permitted and the house always wins. ... Pokemon joins a small but growing movement of online services that strip their customers of their legal rights as a condition of sale, including Google Fiber and Airbnb. It could be worse, in this case you can send an email:
within 30 days of creating your account, and include in the body "a clear declaration that you are opting out of the arbitration clause in the Pokémon Go terms of service." In The Biggest Lie on the Internet: Ignoring the Privacy Policies and Terms of Service Policies of Social Networking Services, Jonathan Obar and Anne Oeldorf-Hirsch report on:
an empirical investigation of privacy policy (PP) and terms of service (TOS) policy reading behavior. An experimental survey (N=543) assessed the extent to which individuals ignore PP and TOS when joining a fictitious social networking site, NameDrop. Results reveal 74% skipped PP, selecting ‘quick join.’ For readers, average PP reading time was 73 seconds, and average TOS reading time was 51 seconds. Based on average adult reading speed (250-280 words per minute), PP should have taken 30 minutes to read, TOS 16 minutes. Among the clauses that almost all experimental subjects missed were ones requiring:
data sharing with the NSA and employers, and ... providing a first-born child as payment. The thing is, consumers are probably being rational in ignoring the mandatory arbitration and the terms of service. Even with a class action, the terms of service are so stacked against the consumer that a win is highly unlikely, and if it happens the most the consumer can expect is a mountain of paperwork asking for proofs that they almost certainly don't possess, in order to stake a claim to the crumbs left over after the class action lawyers get paid.

A social network that started suing its members for the kinds of things everyone does would be out of business very quickly, so the details of the terms are pretty irrelevant compared to the social norms of the network. The privacy terms are perhaps more important, but if you care about privacy the last thing you should be doing is using a social network.

Eric Lease Morgan: How not to work during a sabbatical

Tue, 2016-07-19 14:43

This presentation — given at Code4Lib Midwest (Chicago, July 14, 2016) — outlines the various software systems I wrote during my recent tenure as an adjunct faculty member at the University of Notre Dame. (This presentation is also available as a one-page PDF handout designed to be duplex printed and folded in half as if it were a booklet.)

  • How rare is rare? – In an effort to determine the “rarity” of items in the Catholic Portal, I programmatically searched WorldCat for specific items, counted the number of times each was held by libraries in the United States, and recorded the list of the holding libraries. Through the process I learned that most of the items in the Catholic Portal are “rare”, but I also learned that “rarity” can be defined as the triangulation of scarcity, demand, and value. Thus the “rare” things may not be rare at all.
  • Image processing – By exploiting the features and functions of an open source library called OpenCV, I started exploring ways to evaluate images in the same way I have been evaluating texts. By counting & tabulating the pixels in an image it is possible to create ratios of colors, do facial recognition, or analyze geometric composition. Through these processes it may be possible to supplement art history and criticism. For example, one might be able to ask things like, “Show me all of the paintings from Picasso’s Rose Period.”
  • Library Of Congress Name Authorities – Given about 125,000 MARC authority records, I wrote an application that searched the Library Of Congress (LOC) Name Authority File, and updated the local authority records with LOC identifiers, thus making the local authority database more consistent. For items that needed disambiguation, I created a large set of simple button-based forms allowing librarians to choose the most correct name.
  • MARC record enrichment – Given about 500,000 MARC records describing ebooks, I wrote a program that found the richest OCLC record in WorldCat and then merged the found record with the local record. Ultimately the local records included more access points and thus proved to be more useful in a library catalog setting.
  • OAI-PMH processing – I finally got my brain around the process of harvesting & indexing OAI-PMH content into VUFind. Whoever wrote the original OAI-PMH applications for VUFind did a very good job, but there is a definite workflow to the process. Now that I understand the workflow it is relatively easy to ingest metadata from things like ContentDM, but issues with the way Dublin Core is implemented still make the process challenging.
  • EEBO/TCP – Given the most beautiful TEI mark-up I’ve ever seen, I have systematically harvested the Early English Books Online (EEBO) content from the Text Creation Partnership (TCP) and done some broad & deep but also generic text analysis against subsets of the collection. Readers are able to search the collection for items of interest, save the full text to their own space for analysis, and have a number of rudimentary reports done against the result. This process allows the reader to see the corpus from a “distance”. Very similar work has been done against subsets of content from JSTOR as well as the HathiTrust.
  • VIAF Lookup – Given about 100,000 MARC authority records, I wrote a program to search VIAF for the most appropriate identifier and associate it with the given record. Through the process I learned two things: 1) how to exploit the VIAF API, and 2) how to exploit the Levenshtein algorithm. Using the latter I was able to make automated and “intelligent” choices when it came to name disambiguation (a sketch of the general idea follows this list). In the end, I was able to accurately associate more than 80% of the authority names with VIAF identifiers.
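The write-up above does not include the actual code, but a minimal sketch of the general idea (picking the VIAF candidate whose heading is closest to the local heading by Levenshtein distance) might look like the following PHP. The function name, the shape of the candidate list, and the distance threshold are illustrative assumptions, not details taken from the project.

<?php
// Hypothetical sketch: choose the best VIAF candidate for a local authority heading
// by minimum Levenshtein distance. The threshold and data shapes are assumptions.
function pick_viaf_match( $local_heading, array $candidates, $max_distance = 3 ) {
    $best_id       = null;
    $best_distance = PHP_INT_MAX;
    foreach ( $candidates as $viaf_id => $viaf_heading ) {
        // Compare case-insensitively so "Austen, Jane" and "austen, jane" are treated alike.
        $distance = levenshtein( strtolower( $local_heading ), strtolower( $viaf_heading ) );
        if ( $distance < $best_distance ) {
            $best_distance = $distance;
            $best_id       = $viaf_id;
        }
    }
    // Accept the match only when the headings are close enough; otherwise defer to a human.
    return ( $best_distance <= $max_distance ) ? $best_id : null;
}

// Usage: $candidates would be an id => heading array built from a VIAF search response.
// $viaf_id = pick_viaf_match( 'Austen, Jane, 1775-1817', $candidates );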

My tenure as an adjunct faculty member was very much akin to a one-year education, except for a fifty-five-year-old. I did many of the things college students do: go to class, attend sporting events, go on road trips, make friends, go to parties, go home for the holidays, write papers, give oral presentations, eat too much, drink too much, etc. Besides the software systems outlined above, I gave four or five professional presentations, attended & helped coordinate five or six professional meetings, taught an online, semester-long, graduate-level class on the topic of XML, took many different classes (painting, sketching, dance, & language) many times, lived many months in Chicago, Philadelphia, and Rome, visited more than two dozen European cities, painted about fifty paintings, bound & filled about two dozen hand-made books, and took about three thousand photographs. The only thing I didn’t do is take tests.

Islandora: Islandora's Technical Lead: Daniel Lamb

Mon, 2016-07-18 22:07

Following on our earlier announcement, the Islandora Foundation would like to announce that Daniel Lamb will be our Technical Lead, starting August 1st. The Islandora Technical Lead is responsible for providing leadership, technical guidance, coordination, and support to the open source community in the interests of releasing ongoing enhancements and maintenance to Islandora. 

Danny comes to us from discoverygarden, Inc, where he has spent the better part of four years working with Islandora. He is uniquely suited to taking on the role of Islandora’s Technical Lead as both a long-time committer to the Islandora 7.x-1.x project and the primary architect of Islandora CLAW. He is also an experienced presenter and Islandora Camp instructor, an official Committer on both Islandora teams, and has long been a leader in the Islandora community. We did a Meet Your Developer interview with Danny back in 2014 if you want to learn more about his background and his approach to development.

 

LITA: Transmission #7 – A Special Transmission

Mon, 2016-07-18 18:21

Hi, everyone! Due to technical challenges and delays, I am reopening the Begin Transmission survey and feedback form (below). Join the conversation! Thanks for your help.


LITA: Did you attend the recent ALA Annual 2016 conference in Orlando FL?

Mon, 2016-07-18 16:12

If so please complete our LITA conference programs survey evaluation at:

bit.ly/litaan16evals

We hope you had the best ALA Annual conference, and that attending useful, informative and fun LITA programs was an important part of your conference experience. If so please take a moment to complete our evaluation survey. Your responses are very important to your colleagues who are planning programming for next year's ALA Annual, as well as LITA year-round continuing education sessions.

To complete your survey it might also help to check back at the

Full schedule of LITA programs and meetings.

And recall other details at the LITA @ ALA Annual page.

Thank you and we hope to see you at the

LITA Forum in Fort Worth, TX, November 17-20, 2016

Islandora: Islandora is Getting a Technical Lead

Mon, 2016-07-18 14:28

The Islandora Foundation could not be more pleased to announce that it will be hiring a Technical Lead to start in August, 2016. The Islandora Technical Lead is responsible for providing leadership, technical guidance, coordination, and support to the open source community in the interests of releasing ongoing enhancements and maintenance to Islandora. Together with the Project & Community Manager and Islandora's governing committees, the Technical Lead ensures that Islandora software fulfills the mission of the project and the needs of community members.

The Technical Lead creates an inclusive, welcoming, open team environment based on a meritocracy of committers, contributors, documentation specialists, technical trainers, and other volunteer resources. They strive to recruit new members to the team from the larger community of volunteers. 

Hiring a Technical Lead has been a long-term goal of the Islandora Foundation since its launch in 2013, and we could not have gotten here without the support of our wonderful members, both institutions and individuals. While you may note that our membership funding has not quite reached the $160,000 minimum that we set as a goal, we have the opportunity to top up that funding with support from a grant for 2016/2017. We trust in our community to bring our membership funding up to close that gap over the next year so that the Islandora Foundation will remain sustainable going forward.

This is a huge step forward for the Islandora Foundation and the community that it serves. As we enter our fourth year as an independent non-profit, we look to that community for direction on where the project will go next. Adding a Technical Lead to our team will provide an invaluable resource to help achieve our goals, but the role of the Islandora community and the many wonderful volunteers within it will remain paramount. Islandora is for the community, by the community - just now with a Technical Lead for that community to work with.

 

District Dispatch: Dear RNC and DNC: Libraries are essential to achieving national priorities

Mon, 2016-07-18 14:14

Today, the Republican National Convention (RNC) kicks off in Cleveland, and the Democratic National Convention (DNC) begins next Monday in Philadelphia. In the latest installment of the Policy Revolution! initiative, ALA submitted comments to the Republican and Democratic Party Platform Committees. A party platform is a formal set of value statements and goals to guide the direction of a political party. Final discussion and ratification of the platforms will take place during the respective conventions.

ALA’s submission is based on a large body of prior work. At the most fundamental level, such comments are informed by internal ALA policies, approved by ALA’s Council. In terms of our work more specifically targeted to the national public policy arena, we completed the National Policy Agenda for Libraries in June 2015 to provide the strategic policy direction for libraries, under the auspices of a Library Advisory Committee that included a number of library community organizations in addition to ALA.

At this point in the process, the primary goal is to showcase how libraries contribute to the broad range of national goals of importance to the major political parties. Given the economic unease around the country, ALA comments highlighted the roles of libraries in advancing economic opportunity. The comments also address several issues that are prominent in the campaigns, such as national infrastructure, veterans, and education and learning, among others.

There will be some library presence at the conventions. In Cleveland, the Cuyahoga County (Ohio) Public Library will be streaming briefings organized by The Atlantic. In Philadelphia, Siobhan Reardon, President and Director, Free Library of Philadelphia, will serve on a panel on getting online and digital inclusion that will be keynoted by FCC Commissioner Mignon Clyburn.

Susan Hildreth, Tony Sarmiento and Alan Inouye (L-R) discuss the challenges and opportunities presented by the upcoming national elections.

OITP held a number of sessions at the 2016 ALA Annual Conference to provide briefings and obtain guidance from the ALA community for our future policy direction.  In particular, we held one public panel session moderated by Mark Smith, Director, Texas State Library and Archives Commission with panelists Susan Hildreth, Executive Director, Peninsula (Calif.) Library System; Tony Sarmiento, Executive Director, Senior Service America; and me. Thanks to those who attended our session or one of our meetings and provided advice.

Of course, the political conventions only mark the beginning of the actual presidential campaigns, so there is much more work to be done in the months leading into the election, the transition to the next President and the first 100 days of the new Administration. We will be developing and disseminating much more information and homing in on specific recommendations. So here’s a question for you: If you could say one (or two) things to the presidential candidates about the value of libraries to our respective communities, what would you highlight? We’d like to hear from you—via the comments option on District Dispatch or send email directly to me at ainouye@alawash.org.

The post Dear RNC and DNC: Libraries are essential to achieving national priorities appeared first on District Dispatch.

LibUX: Create Once, Publish Everywhere

Mon, 2016-07-18 05:35

In 2014 I persuaded my library to build another website. No, it wasn’t a redesign, nor a new entity built from the ground up to replace what we had. This was another website — a second one.

Ours is a unique joint-use academic and public facility. Divide this library’s users into its broadest audiences and there are still plenty to account for: faculty, undergraduate and graduate students (local, distant), alumni, the public – whatever that means.

Chunking the latter into one big patronbase isn’t particularly useful, but the allocation of the homepage’s real estate constricted our ability to finely tune it. Our incentive to accommodate the academic community crowded out our ability to accommodate the audience who cared about events and entertainment – and this is precisely where our usability studies drew the line. Public cardholders appreciated the site but asked for more prominent access to new popular materials and programs, while students and faculty were pretty clear about what they didn’t want.

So, I talked colleagues into spinning off a new website — different look and feel, tone, even domain — just for public library services. They weren’t shy about voicing concerns about the increased workload involved in doubling up and maintaining two sets of content, and about whether this decision would, for example, obscure research services from the public or programming from the faculty. Content locked away in a silo is, after all, locked away in a silo. There’s risk that a graduate student using the academic library website might not see that a favorite author is visiting when that event is only posted for the public.

Right. Big problem, but not one exactly unique to this project. Libraries have been suffering these pain points for years. Assuaging this grief is exactly the selling point for discovery layers. That “library website” that we refer to in the singular is more like an organism of microsites and applications: there is the catalog, a static homepage hosted by the institution or county, maybe a one-off Drupal site on a server the library controls, subject guides, an event management system, a room reservation system, and an iPhone app. Silos are a library’s present and future.

The increasing device complexity and capability of the web reinforces silos and will continue to do so. As libraries approach their mobile moment, library websites that try to do too much will fail, whereas sites and apps that focus on doing just one thing well will succeed. It’s this sentiment that recommends developers consider breaking functionality out among multiple apps: that there is a point at which an app can be too feature-rich.

The Kano model can illustrate that some features have a negative impact on customer satisfaction.

Everything is designed. Few things are designed well. — Brian Reed

Libraries are actually in a good position to benefit from this strategy. So much of the library web presence is already single-purposed that it wouldn’t take much to retrofit. Rather than roll the catalog into the same template as the homepage, it can be embraced as a standalone web app with its own unique purpose-driven user interface. This isn’t about going off-brand, but without the pressure of inheriting a mega-menu from the homepage, the library can be more judicious with the catalog’s design. This makes sense for library services when patrons are task-driven. Time is better spent optimizing for engagement rather than making the sites identical.

Not to mention silos aren’t inherently bad for discovery. Organizing web content in the way that news sites have sports sections is sound. Robots and crawlers have an easier time indexing content when there is a delineated path in which similar content is clustered and interlinked. The machines are cool with it. What makes discovery tricky for humans is that content on one site isn’t usually available on another. If patrons visit the library with a task in mind — “I want to find something to read”, “I need to renew my items”, “I want to see events”, or “I need to write a paper” — then there isn’t much incentive to browse outside of that content silo.

Libraries can’t depend on patrons just happening onto the event calendar after picking through the databases, nor can they depend on cramming everything on, or funneling everyone through, the front page. Getting found is going to get harder. If an institution has the ability and incentive to build an app, stakeholders want that option to be on the table without dramatically impacting workflow.  Libraries will need to be able to grow, adapt, and iterate without having to fuss over content.

A copeing mechanism

I knew a standalone, public-themed, public-toned, public-specific library website would better resonate with, well, the public. If we were better able to fine-tune the content for the audience, patrons would be more likely to engage with library services for a longer time. This allows more opportunity to introduce new services, promote databases, maybe increase circulation.

At the same time, by relieving the pressure from just one homepage, the library can also better serve academic patrons. The opportunity to increase engagement all around won this gamble the stakeholder support it needed, but not if it dramatically strained workflow or blocked any potential content from any user. We needed to change how we approached content so that it was possible to share one item across all platforms, but at the same time prevent the need to micromanage which piece of content appeared where.

In 2009, Daniel Jacobson, Director of Application Development for NPR, wrote a series of posts on Programmable Web about the NPR API beginning with “C.O.P.E.: Create Once, Publish Everywhere.” To meet the content demand for native apps on various platforms and microsites, including the NPR affiliate sites, the team wrote an API, which made it easier to fetch and repurpose media. This today is an important principle for addressing the challenges of a future-friendly web.

For most libraries it’s not going to be realistic to control all the content from one system, yet consolidating what’s possible will make it easier to manage over time. Starting with some static pages on the institutional web server, over which we had limited control, we began migrating this old content into a WordPress multisite, with which staff were already familiar.

There were specific types of content we intended to broadcast to the four corners: notifications and schedules, databases, events, reader’s advisory in the form of lists and staff picks, guides, and instructional videos. If the library’s success was determined by usage, turnout, and circulation, on the web that success very much depends on the ability to spotlight this content at the patron’s point of engagement.

A content system as-is doesn’t cut it. Popular content management systems like WordPress and Drupal are wonderful, but to meet libraries’ unique and portable needs these need a little rigging. If an institution hopes to staff-source content and expect everyone to use the system, then tailoring the CMS to the needs and culture of the library is an important step.

Subject specialists were creating guides and videos. Librarians involved with programming (both academic and public) were creating events. Others maintained departmental info, policies, schedules.

To ensure consistent and good content from folks better suited to create it, it is unfair and counterproductive to present a system with too steep a learning curve. I admit to being naive and surprised to see how strange and unfamiliar WordPress could be for those who don’t spend all day in it. De-jargoning the content system is no less important than de-jargoning the content.

Plus, these systems require tweaking to make content sufficiently modular. WordPress’s default editor–title, tags and categories, a featured image, and a blank slate text box–doesn’t fly for a content-type like an event, which requires start and end times, dates, all-day or multi-day options.
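As an illustration of the sort of rigging involved, here is a minimal sketch of registering an event-like content type and its date fields in WordPress. The post type and meta key names ('library_event', 'event_start', 'event_end') are assumptions for illustration, not the names used on the site described here; a real build might instead lean on an events plugin or a custom-fields plugin.

<?php
// Hypothetical sketch: an "event" content type plus the date fields the stock editor lacks.
// Post type and meta key names are illustrative assumptions.
add_action( 'init', function () {
    register_post_type( 'library_event', array(
        'label'        => 'Events',
        'public'       => true,
        'show_in_rest' => true, // expose the type through the REST API
        'supports'     => array( 'title', 'editor', 'thumbnail' ),
    ) );

    foreach ( array( 'event_start', 'event_end', 'event_all_day' ) as $key ) {
        register_post_meta( 'library_event', $key, array(
            'type'         => 'string',
            'single'       => true,
            'show_in_rest' => true, // so the fields travel with API output
        ) );
    }
} );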

Moreover, the blank slate is intimidating.

Rigid content requirements and a blank-slate WYSIWYG don’t scale. When demanding content is detail oriented enough to have instructions, the stock editor can be replaced with smaller custom fields, which like any form element can be required before the post can be published.

Here’s an example: a self-hosted video actually requires multiple video formats to be cross-browser compatible and captions to be accessible. Publishing without captions violates accessibility guidelines, but without being able to ensure that the captions exist, it is inevitable that at some point an inaccessible video will go live. Breaking the content editor into smaller, manageable chunks allows for fine control, checks and balances, and has the added opportunity to insert instructions at each step to streamline the process.
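A minimal sketch of how that requirement could be enforced, assuming a 'video' post type and a 'video_captions_url' meta field (both names are illustrative, not the library's actual implementation): a save_post hook that knocks a captionless video back to draft.

<?php
// Hypothetical sketch: keep a "video" post out of the published state until a captions
// file has been attached. The post type and meta key are assumptions for illustration.
function libux_require_video_captions( $post_id, $post ) {
    if ( wp_is_post_revision( $post_id ) || 'publish' !== $post->post_status ) {
        return;
    }
    $captions = get_post_meta( $post_id, 'video_captions_url', true );
    if ( empty( $captions ) ) {
        // Unhook first so the corrective update below does not re-trigger this callback.
        remove_action( 'save_post_video', 'libux_require_video_captions', 10 );
        wp_update_post( array( 'ID' => $post_id, 'post_status' => 'draft' ) );
        add_action( 'save_post_video', 'libux_require_video_captions', 10, 2 );
    }
}
add_action( 'save_post_video', 'libux_require_video_captions', 10, 2 );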

A cross-system, cross-department, controlled vocabulary is key. When we first started to think about sharing content between public and academic library websites, we knew that on some level all content would need to be filterable by either terms “public” or “academic.” We’re not going to publish something twice, so the public library website will have to know that it needs “public” content.

This was an addicting train of thought. We could go hog wild if new pages knew what kind of content to curate. What would it take then to create a page called “gardening” and make it a hub for all library content about the topic? It needs to be dynamic so it can stay current without micromanagement. It needs to populate itself with gardening book lists, staff-picks, upcoming “gardening” events, agricultural resources and databases – assuming the content exists. Isn’t this just a subject search for “gardening”?

If a library can assign one or two taxonomies that could be applied to all sorts of disparate content, then the query a site makes for the API could match categories regardless of their content type. The taxonomy has to be controlled and enforced so that it is consistent, and when possible can be built right into the post editor. Using WordPress, custom taxonomies can be tied to multiple types without fuss.

register_taxonomy(
    'your-taxonomy',
    // Add any content type here that 'your-taxonomy' can be used with
    array( 'databases', 'events', 'reviews', 'items', 'video' ),
    array( /* other taxonomy options omitted for brevity */ )
);

I created two taxonomies: "Library Audience," which lets us filter content for the type of patron – academic, public, teen, children and family, distance student, etc. – and "Subject," which lets us filter by subject. The no-red-tape way to create a global "Subject" taxonomy was to just use the subjects that the library’s electronic resources use, a standardized vocabulary overseen by a committee. In our specific case, database subjects actually boil down to a three-letter designation. So while users see "Business," the slug passed around behind the scenes is "zbu."

Here is what a query against our eventual API for “business” looked like:

https://example-library.org/api/get_content/?taxonomy=subjects&slug=zbu

Content is then liberated by an API. Content management systems like WordPress or Drupal already have an API of sorts: the RSS feed. Technically, any site can ingest the XML and republish whatever content is made available, but it won’t include the custom fields and content types added to the CMS. This isn’t an uncommon need, so both WordPress and Drupal have REST APIs – which is a little beyond the scope of this writeup.

These enable the programmatic fetch and push of content from one platform into another.
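To make that concrete, here is a minimal sketch of a custom read-only endpoint built on the WordPress REST API. The namespace, route, parameter names, and post type list are illustrative assumptions; this is not the code behind the /api/get_content/ URL shown earlier, just the same idea expressed with current WordPress APIs.

<?php
// Hypothetical sketch: a read-only endpoint that returns posts of several types filtered
// by an arbitrary taxonomy term, e.g. /wp-json/library/v1/content?taxonomy=subjects&slug=zbu
add_action( 'rest_api_init', function () {
    register_rest_route( 'library/v1', '/content', array(
        'methods'             => 'GET',
        'permission_callback' => '__return_true', // public, read-only content
        'callback'            => function ( WP_REST_Request $request ) {
            $query = new WP_Query( array(
                'post_type' => array( 'databases', 'events', 'reviews', 'items', 'video' ),
                'tax_query' => array(
                    array(
                        'taxonomy' => sanitize_key( $request['taxonomy'] ),
                        'field'    => 'slug',
                        'terms'    => sanitize_title( $request['slug'] ),
                    ),
                ),
            ) );
            // Return only the fields the front end needs to build a placeholder.
            return array_map( function ( $post ) {
                return array(
                    'id'    => $post->ID,
                    'title' => get_the_title( $post ),
                    'link'  => get_permalink( $post ),
                );
            }, $query->posts );
        },
    ) );
} );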

In LibGuides — an increasingly capable library-specific content management system — our content creators can use HTML5 data attributes as hooks in the template that help determine the audience and type of content to grab. It creates a little empty placeholder, like an ad spot, to be populated by a relevant event (if any), book lists, relevant resources, past or upcoming workshops, and more.

At the time this article was originally written in summer 2014 it looked a little like

<span data-content="event" data-audience="public" data-subject="comics"></span>

in which librarians decided what type of content (e.g., an event) went where on their page. For each placeholder, based off its parameters, a script builds and submits the API query string using jQuery:

$.getJSON( '//www.example-library.org/api/get_event/?taxonomy=audience&slug=public&term=comics' )
    .success( function( response ) {
        // do something with the matching event
    });

We have since largely traded jQuery for Angular. When there is a placeholder it’s a tad more agnostic

<li ng-repeat="ad in ads">{{ ad.title }} {{ ad.etc }}</li>

but more often than not we just weasel it in there using attributes such as audience and type, which unless otherwise specified will take their values from the page.

Not random, but library events that make sense on the pages where they appear.

Remember that just a few years ago many libraries rushed to create mobile sites but then struggled to maintain two sets of content, and that the follow-up, responsive web design, is a long process involving a lot of stakeholders – many haven’t gotten this far because of the red tape. The landscape of the web will only get weirder. There are and will continue to be new markets, new corners of the internet where libraries will want to be.

Libraries that can C.O.P.E. will be able to grow, iterate, and evolve. Libraries that can’t, won’t.

The post Create Once, Publish Everywhere appeared first on LibUX.

Galen Charlton: Cats who reside in story

Sun, 2016-07-17 22:54

The tragedy of keeping house with cats is that their lives are so short in comparison to our own.

On Friday, Marlene and I put Sophie to rest; today, LaZorra. Four years ago, we lost Erasmus; before that, Scheherazade and Jennyfur. At the moment, we have just one, Amelia. It was a relief that she got a clean bill of health on Saturday… but she is nonetheless sixteen years old. The inexorability of time weighs heavily on me today.

I have no belief that there is any continuation of thought or spirit or soul after the cessation of life; the only persistence I know of for our cats is in the realm of story. And it is not enough: I am not good enough with words to capture and pin down the moment of a cat sleeping and purring on my chest or how the limbs of our little feline family would knot and jumble together.

Words are not nothing, however, so I shall share some stories about the latest to depart.

LaZorra was named after the white “Z” on her back, as if some bravo had decided to mark her before she entered this world. LaZorra was a cat of great brain, while her brother Erasmus was not. We would joke that LaZorra had claimed not only her brain cells, but those of her daughters Sophia and Amelia. (Who were also Erasmus’ children; suffice it to say that I thought I had more time to spay LaZorra than was actually the case).

Although she was a young mother, LaZorra was a good one. Scheherazade was alive at the time and also proved to be a good auntie-cat.

Very early on, a pattern was set: Sophie would cuddle with her father Rasi; Mellie with her mother Zorrie. LaZorra would cuddle with me; as would Erasmus; per the transitive property, I ended up squished.

But really, it took only one cat to train me. For a while LaZorra had a toy that she would drag to me when she wanted me to play with her. I always did; morning, afternoon, evening, at 2 in the morning…

“NO!”

Well, that was Marlene reminding me that once I taught a cat that I could be trained to play with her at two a.m., there would be no end of it—nor any rest for us—so I did not end up being perfectly accommodating.

But I came close. LaZorra knew that she was due love and affection; that her remit included unlimited interference with keyboards and screens. And in the end, assistance when she could no longer make even the slight jump to the kitchen chair.

When we lost Erasmus to cancer, Marlene and I were afraid that Sophie would inevitably follow. For her, Rasi was her sun, moon, and stars. We had Erasmus euthanized at home so that the others would know that unlike the many trips for chemo, that this time he was not coming back. Nonetheless, Sophie would often sit at the door, waiting for her daddy to come back home.

She never stopped doing that until we moved.

It was by brush and comb, little by little as she camped out on the back of the couch, that I showed her that humans might just possibly be good for something (though not as a substitute for her daddy-cat). It is such a little thing, but I hold it as one of my personal accomplishments that I helped her look outward again.

Eventually those little scritches on the back of the couch became her expected due: we learned that we were to pay the Sophie-toll every time we passed by her.

Both LaZorra and Sophie were full of personality—and thus, they were often the subjects of my “Dear Cat” tweets. I’ll close with a few of them.

Butter to LaZorra was as mushrooms to hobbits:

Dear cat:

There is no butter left.
There was no butter.
Butter never exists.
Butter? What's that?
(May I eat breakfast in peace?)

Love,
G

— Galen Charlton (@gmcharlt) July 9, 2016

At times, she was a little too clever for her own good:

Dear cat:

Precision in eating the pill pocket around the pill is NOT a virtue.

Love,
Galen

— Galen Charlton (@gmcharlt) April 26, 2016

Sophie was the only cat I’ve known to like popping bubblewrap:

Dear cat:

Carry on, carry on. No need to stop popping that bubble-wrap on my account.

Er, why are you looking so guilty?

Love,

Galen

— Galen Charlton (@gmcharlt) March 20, 2016

Sophie apparently enjoyed the taste of cables:

Dear cat:

If you persist in chewing on that cable, I will be unable to post your picture on the internet, and that is just not allo

— Galen Charlton (@gmcharlt) November 1, 2015

LaZorra was the suitcase-inspector-in-chief:

Dear cat,

Out of the suitcase, please. It suffices for me to bring your legend to #alamw16; your fur is optional.

Love,
Galen

— Galen Charlton (@gmcharlt) January 7, 2016

And, of course, they could be counted on to help with computation:

Dear cat:

I'm not sure even Google can turn up useful results for the query "ssser xzssfcvvvd|{PO? (99999"

Love,

Galen#ActualTranscript

— Galen Charlton (@gmcharlt) December 9, 2014

They both departed this world with pieces of our hearts in their claws.

FOSS4Lib Recent Releases: VuFind Harvest - 2.1.0

Sun, 2016-07-17 17:08

Last updated July 17, 2016. Created by Peter Murray on July 17, 2016.

Package: VuFind Harvest
Release Date: Thursday, July 14, 2016

FOSS4Lib Updated Packages: VuFind Harvest

Sun, 2016-07-17 17:07

Last updated July 17, 2016. Created by Peter Murray on July 17, 2016.

VuFindHarvest contains OAI-PMH harvesting logic. This is part of the VuFind project (https://vufind.org) but may be used as a stand-alone tool or incorporated into other software dealing with metadata harvesting.

Package Type: Metadata Manipulation
License: GPLv2
Package Links: Releases for VuFind Harvest
Programming Language: PHP
Open Hub Link: https://www.openhub.net/p/vufindharvest

LibUX: 63% of web traffic originates from a smartphone

Sun, 2016-07-17 00:49

More library users than not have a touch screen that connects to the Internet in their pocket. In the United States, 63% of web traffic originates from a smartphone. This number aligns with the Pew Research Center report most recently updated in October 2014, which adds context by noting that as of last year more than “90% of American adults own a cell phone,” and 42% own a tablet. Of them, 34% “go online mostly using their phones, and not using some other device such as a desktop or laptop computer.”

These trends hearken back to an industry prediction coined by Googler Luke Wroblewski called “the mobile moment.” That is, the point at which mobile traffic to an organization’s website eclipses traditional non-mobile traffic, such as from a desktop.

The pressure for libraries to make their websites mobile friendly increases from quarter to quarter as the upward trajectory makes itself more distinct.

2015 Q2 Mobile Overview Report

The post 63% of web traffic originates from a smartphone appeared first on LibUX.

LibUX: 21% of people start scrolling before the page finishes loading

Sun, 2016-07-17 00:38

21% of people start scrolling before the page finishes loading. They might just scroll past the upcoming events or new resources at the top of the page that libraries are trying to promote. Whoops. Chalk that up to “above the fold” irony.


2015 Q2 Mobile Overview Report

The post 21% of people start scrolling before the page finishes loading appeared first on LibUX.

LibUX: Average load time on mobile

Sun, 2016-07-17 00:30

The average load time on mobile is about 4 seconds. Type of connection—3G, 4G, LTE, WiFi, or something else—isn’t part of the report, but we know the majority of the devices are smartphones using Android or iOS. Okay, you could have guessed that. But the potential speed difference between 3G and WiFi is enormous, and we should be interested in how that breaks down. While overall web traffic from tablets makes up less than a third, the Adobe Digital Index 2014 U.S. Mobile Benchmark reported that 93% of this traffic is over WiFi, enough—I think—to skew MOVR’s load time average. My gut feeling is that a four second load for mobile devices is optimistic if those devices aren’t on WiFi but—I digress.

This is an important benchmark when considered with other data showing the importance of web page speed in terms of user behavior. For instance, the 2014 Radware Mobile State of the Union suggests that almost half of mobile users expect pages to load in just two seconds, and of them a whopping 40% will abandon a page taking longer than three. So, if the average mobile user is already in a perpetual state of uggggh come onnnnn then trying to connect to a slow library website or database isn’t doing much for his or her opinion of us.

2015 Q2 Mobile Overview Report

The post Average load time on mobile appeared first on LibUX.

David Rosenthal: What is wrong with science?

Sat, 2016-07-16 22:31
This is a quick post to flag two articles well worth reading.

The 7 biggest problems facing science, according to 270 scientists by Julia Belluz, Brad Plumer, and Brian Resnick at Vox is an excellent overview of some of the most serious problems, with pointers to efforts to fix them. Their 7 are:
  • Academia has a huge money problem:
    In the United States, academic researchers in the sciences generally cannot rely on university funding alone to pay for their salaries, assistants, and lab costs. Instead, they have to seek outside grants. "In many cases the expectations were and often still are that faculty should cover at least 75 percent of the salary on grants," writes John Chatham, ... Grants also usually expire after three or so years, which pushes scientists away from long-term projects. Yet as John Pooley ... points out, the biggest discoveries usually take decades to uncover and are unlikely to occur under short-term funding schemes.
  • Too many studies are poorly designed:
    An estimated $200 billion — or the equivalent of 85 percent of global spending on research — is routinely wasted on poorly designed and redundant studies, according to meta-researchers who have analyzed inefficiencies in research. We know that as much as 30 percent of the most influential original medical research papers later turn out to be wrong or exaggerated.
  • Replicating results is crucial — and rare:
    A 2015 study looked at 83 highly cited studies that claimed to feature effective psychiatric treatments. Only 16 had ever been successfully replicated. Another 16 were contradicted by follow-up attempts, and 11 were found to have substantially smaller effects the second time around. Meanwhile, nearly half of the studies (40) had never been subject to replication at all.
  • Peer review is broken:
    numerous studies and systematic reviews have shown that peer review doesn’t reliably prevent poor-quality science from being published.
  • Too much science is locked behind paywalls:
    "Large, publicly owned publishing companies make huge profits off of scientists by publishing our science and then selling it back to the university libraries at a massive profit (which primarily benefits stockholders)," Corina Logan, an animal behavior researcher at the University of Cambridge, noted. "It is not in the best interest of the society, the scientists, the public, or the research." (In 2014, Elsevier reported a profit margin of nearly 40 percent and revenues close to $3 billion.)
  • Science is poorly communicated:
    Science journalism is often full of exaggerated, conflicting, or outright misleading claims. If you ever want to see a perfect example of this, check out "Kill or Cure," a site where Paul Battley meticulously documents all the times the Daily Mail reported that various items — from antacids to yogurt — either cause cancer, prevent cancer, or sometimes do both.
    ...
    Indeed, one review in BMJ found that one-third of university press releases contained either exaggerated claims of causation (when the study itself only suggested correlation), unwarranted implications about animal studies for people, or unfounded health advice.
  • Life as a young academic is incredibly stressful:
    A 2015 study at the University of California Berkeley found that 47 percent of PhD students surveyed could be considered depressed
Amen to all of those. Gina Kolata at the New York Times limits So Many Research Scientists, So Few Openings as Professors to the overproduction of Ph.D.s:
Dr. Larson and his colleagues calculated R0s for various science fields in academia. There, R0 is the average number of Ph.D.s that a tenure-track professor will graduate over the course of his or her career, with an R0 of one meaning each professor is replaced by one new Ph.D. The highest R0 is in environmental engineering, at 19.0. It is lower — 6.3 — in biological and medical sciences combined, but that still means that for every new Ph.D. who gets a tenure-track academic job, 5.3 will be shut out. In other words, Dr. Larson said, 84 percent of new Ph.D.s in biomedicine “should be pursuing other opportunities” — jobs in industry or elsewhere, for example, that are not meant to lead to a professorship. Again, amen. A friend of mine spotted this problem years ago and has been making a business of advising grad students and post-docs how to transition to "real work".
