Feed aggregator

Library of Congress: The Signal: The NDSR Boston Residents Reflect on their “20% Projects”

planet code4lib - Wed, 2015-05-20 12:39

The following is a guest post by the entire group of NDSR-Boston residents as listed below. For their final posting, the residents present an overview of their individual professional development projects.

Rebecca Fraimow (WGBH)

One of the best things about this year’s NDSR in Boston is the mandate to dedicate 20% of our time to projects outside of the specific bounds of our institution. Taking coursework, attending conferences, creating workshops — it’s all the kind of stuff that’s invaluable in the archival profession but is often hard to make time for on top of a full-time job, and I really appreciated that NDSR explicitly supported these efforts.

While I definitely took advantage of the time for my own personal professional development — investing time in Python and Ruby on Rails workshops and Harvard’s CopyrightX course, as well as presentations at AMIA, Code4Lib, Personal Digital Archives, NEA and NDSR-NE — the portion of my 20% that I’ve most appreciated is the opportunity to expand the impact of the program beyond the bounds of the immediate NDSR community. With the support of the rest of the Boston cohort, I partnered with my WGBH mentor, Casey Davis, to lead a series of workshops on handling audiovisual analog and digital material for students at the Simmons School of Library and Information Science. It was fantastic to get a chance to share the stuff I’ve learned with the next generation of archivists (and, who knows, maybe some of the next round of NDSR residents!).

As a cohort, we’ve also teamed up to design workflows and best practice documents for the History Project — a Boston-based, volunteer-run LGBT archive with a growing collection of digitized and born-digital items. This project is also, I think, a really great example of the ways that the program can make an impact outside of the relatively small number of institutions that host residents, and illustrates how valuable it is to keep expanding the circle of digital preservation knowledge.

Samantha Dewitt (Tufts University)

The NDSR residency has been a terrific experience for me, with the Tufts project proving to be a very good fit. Having been completely preoccupied with the subject of open science and research data management these past nine months, I am finding it hard to let go of the topic, and I intend to continue working on this particular corner of the digital preservation puzzle. These days, data sharing and research data management frequently arise as topics of conversation in relation to research universities. Consequently, I had little trouble finding ways to add digital data preservation to my “20%” time. I looked forward to sharing the subject with my NDSR cohort whenever possible!

In November, our group attended a seminar on data curation at the Harvard-Smithsonian Center for Astrophysics. Several weeks later, I was able to meet with Dr. Micah Altman (MIT) to explore the subject of identifying and managing confidential information in research. Also in November, the Boston Library Consortium & Digital Science held a workshop at Tufts on Better Understanding the Research Information Management Landscape. Mark Hahnel, founder of Figshare, and Jonathan Breeze, CEO of Symplectic, spoke. This spring, Eleni Castro, research coordinator and data scientist at Harvard, met with our group to discuss the university’s new Dataverse 4.0 beta. Finally, in April, I was excited to be able to attend the Research Data Access and Preservation Summit in Minneapolis, MN. It has been a busy nine months!

Joey Heinen (Harvard Libraries)

The “20%” component of the National Digital Stewardship Residency is a great way for us to expand our interests, learn more about emerging trends and practices in the field and also to stay connected to any interests that might not align with our projects. My 20% involved a mixture of continuing education opportunities, organizing talks and tours and contributing to group projects which serve specific institutions or the field at large. For continuing education I learned some of the basics of Python programming through the Data Scientist Training for Librarians at Harvard.

For talks and tours, I organized a visit to the Northeast Document Conservation Center (largely to learn about the IRENE Audio Preservation System) and visits with the Harvard Art Museums’ Registration and Digital Infrastructure and Emerging Technologies departments. I also co-organized an event entitled “Catching Waves: A Panel Discussion on Sustainable Digital Audio Delivery” (WebEx recording available soon on Harvard Library’s YouTube channel). For developing resources, I participated in the AMIA/DLF 2014 Hack Day in a group that developed a tool for comparing the output of three A/V characterization tools (see the related blog post), and I also designed digital imaging and audio digitization workflows for the History Project.

Finally, I participated in NDSR-specific panels at the National Digital Stewardship Alliance – Northeast meeting (NDSA-NE) and the Spring New England Archivists conference as well as individually at the recent American Institute for Conservation of Historic and Artistic Works conference. All in all I am pleased with the diversity of the projects and my level of engagement with both the local and national preservation communities. (As a project update, here is the most recent iteration of the Format Migration Framework (pdf)).

Tricia Patterson (MIT Libraries)

Two weeks left to go! And I ended up doing so much more than I initially anticipated during my residency. My project largely focused on documenting, in diagrams and text, the low-level workflows of our digitization and digital preservation management processes, some of the results of which can be seen on the Digital Preservation Management workshop site. But beyond the core of the project, so much else was accomplished. I helped organize both an MIT host event and a field trip to the JFK Library and Museum for my NDSR compadres. Joey Heinen and I co-organized a panel on sustainable digital audio delivery, replete with stellar panelists from both MIT and Harvard. I collaborated with my NDSR peers on a side assignment for the History Project. I also shared my work with colleagues at many different venues: presenting at the New England Music Library Association, giving a brown bag talk at MIT, writing on our group blog, being accepted to present with my MIT colleagues at the International Association of Sound and Audiovisual Archives conference, and, in the final days of my residency, presenting at the Association for Recorded Sound Collections conference.

All in all, a lot has been crammed into nine brief months: engaging in hands-on experience, enhancing my technological and organizational knowledge, forging connections in the digital preservation community and beyond. It really ended up being a vigorous and dynamic catapult into the professional arena of digital preservation. Pretty indispensable, I’d say!

Jen LaBarbera (Northeastern University)

Though my project focused specifically on creating workflows and roadmaps for various kinds of digital materials, I found myself becoming more and more intrigued by the conceptual challenges of digital preservation for the digital humanities. Working on this project as part of a residency meant that I had some flexibility and was given the time and encouragement to pursue topics of interest, even if they were only indirectly related to my project at Northeastern University.

As a requirement of the residency, each resident had to plan and execute an event at their host institution, and we were given significant latitude to define that event. Instead of doing the standard tour and in-person demonstration of my work at Northeastern, Giordana Mecagni and I chose to reach out to some folks in our library-based Digital Scholarship Group to host a conversation exploring the intersections between digital preservation and digital humanities. The response from the Boston digital humanities and library community was fantastic; people were eager to dive into this conversation and talk about the challenges and opportunities presented in preserving the scholarly products of the still fairly new world of digital humanities. We had a stellar turnout from digital humanities scholars and librarians from all over the Boston area, from institutions within the NDSR Boston cohort and beyond. We didn’t settle on any concrete answers in our conversation, but we were able to highlight the importance of digital preservation within the digital humanities world.

My experience with NDSR Boston will continue to be informative and influential as I move on to the next step in my career, as the lead archivist at Lambda Archives of San Diego in sunny southern California. From the actual work on my project at Northeastern to the people we met through our “20%” activities – e.g. touring NEDCC, attending Rebecca’s AV archiving workshops at Simmons, working with the History Project to develop digital preservation plans and practices – I feel much more prepared to responsibly preserve and make available the variety of formats of digital material that will inevitably come my way in my new position at this LGBTQ community archive.

DPLA: Developing and implementing a technical framework for interoperable rights statements

planet code4lib - Wed, 2015-05-20 12:10

Farmer near Leakey, holding a goat he has raised. Near San Antonio, 1973. National Archives and Records Administration. http://dp.la/item/234c76f4c6cc16488ddedbe69a7da297

Within the Technical Working Group of the International Rights Statements Working Group, we have been focusing our efforts on identifying a set of requirements and a technically sound and sustainable plan to implement the rights statements under development. Now that two of the Working Group’s white papers have been released, we realized it was a good time to build on the introductory blog post by our Co-Chairs, Emily Gore and Paul Keller. Accordingly, we hope this post provides a good introduction to our technical white paper, Recommendations for the Technical Infrastructure for Standardized International Rights Statements, and more generally, how our thinking has changed throughout the activities of the working group.

The core requirements

The Working Group realized early on that there was the need for a common namespace for rights statements in the context of national and international projects that aggregate cultural heritage objects. We saw the success of the work undertaken to develop and implement the Europeana Licensing Framework, but realized that a more general framework was needed so it could be leveraged beyond the Europeana community. In addition, we established that there was a clear need to develop persistent, dereferenceable URIs to provide human- and machine-readable representations.

In non-technical terms, this identifies a number of specific requirements. First, the persistence requirement means that the URIs need to remain the same over time, so we can ensure that they can be accessed consistently over the long term. The “dereferenceability” requirement states that when we request a rights statement by its URI, we need to get a representation back for it, either human-readable or machine-readable depending on how it’s requested. For example, if a person enters a rights statement’s URI in their web browser’s address bar, they should get an HTML page in response that presents the rights statement’s text and more information. By comparison, if a piece of software or a metadata ingestion process requests the rights statement by its URI, it should get a machine-readable representation (say, using the linked data-compatible JSON-LD format) that it can interpret and reuse in some predictable way.
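
As a rough illustration of what dereferenceability means in practice, the sketch below uses Python’s requests library to ask for the same example rights statement URI twice, once as a browser would and once as an ingestion process would. The URI shown is only a placeholder, not one of the finalized statements.

```python
# Illustrative sketch of content negotiation on a rights statement URI.
# The URI below is a placeholder, not a finalized rights statement.
import requests

statement_uri = "http://rightsstatements.org/vocab/InC/1.0/"  # example only

# Browser-style request: expect a human-readable HTML page in return.
html_resp = requests.get(statement_uri, headers={"Accept": "text/html"})
print(html_resp.status_code, html_resp.headers.get("Content-Type"))

# Machine-style request: expect a linked data representation (e.g. JSON-LD)
# that an ingestion process can interpret and reuse predictably.
data_resp = requests.get(statement_uri, headers={"Accept": "application/ld+json"})
print(data_resp.status_code, data_resp.headers.get("Content-Type"))
```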

Beyond these requirements, we also identified the need for the machine-readable representation to provide specific kinds of additional information where appropriate, such as the name of the statement, the version of the statement, and, where applicable, the jurisdiction where the rights statement applies. Finally, and most importantly, we needed a consistent way to provide translations of these statements that met the above requirements for dereferenceability, since they are intended to be reused by a broad international community of implementers.

Data modeling and implementation

After some discussion, we decided the best implementation for these rights statements was to develop a vocabulary implemented using the Resource Description Framework (RDF) and the Simple Knowledge Organization System (SKOS) standards. These standards are broadly used throughout the Web, and are both used within the Europeana Data Model  and the DPLA Metadata Application Profile. We are also looking at the Creative Commons Rights Expression Language (ccREL) and Open Digital Rights Language (ODRL) models to guide our development. At this stage, we have a number of modeling issues still open, such as which properties to use for representing various kinds of human-readable documentation or providing guidance on how to apply community-specific constraints and permissions. Deciding whether (and how) rights statements can be extended in the future is also an intriguing point. We are looking for feedback on all these topics!
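
To make the modeling approach concrete, here is a small, purely illustrative sketch (using Python’s rdflib) of how a rights statement might be expressed as a SKOS concept; the URI, labels, and version property are assumptions for the example, not the Working Group’s finalized model.

```python
# Purely illustrative: one way a rights statement could be modeled as a
# SKOS Concept. The URI, labels, and properties are example assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS

DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.bind("skos", SKOS)
g.bind("dct", DCT)

stmt = URIRef("http://rightsstatements.org/vocab/InC/1.0/")  # example URI only
g.add((stmt, RDF.type, SKOS.Concept))
g.add((stmt, SKOS.prefLabel, Literal("In Copyright", lang="en")))
# Translations would be added as further language-tagged prefLabels.
g.add((stmt, SKOS.definition, Literal("An example definition of the statement.", lang="en")))
g.add((stmt, DCT.hasVersion, Literal("1.0")))

print(g.serialize(format="turtle"))
```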

As part of the process, we have been managing our draft implementation of the RightsStatements.org data model in a GitHub repository to allow for ease of collaboration across the technical subgroup. As the proposed rights statements become finalized following the discussion period on the two white papers, we will be working to provide a web server to host the machine-readable and human-readable versions of the rights statements in accordance with our requirements. To guide our implementation, we are building on the Best Practice Recipes for Publishing RDF Vocabularies with a slight modification to allow for better support for the multilingual requirements of the Working Group. Advice from the technical experts in our community is also highly welcome on this approach.

The end of the public feedback period has been set for Friday, June 26, 2015, but the Technical Working Group will try to answer comments on the white paper on a regular basis, in the hope of establishing a continuous, healthy stream of discussion.

Acknowledgements

The technical work on implementing the rights statements has been deeply collaborative, and would not have been possible without the dedicated efforts of the members of the Technical Working Group:

  • Bibliothèque Nationale de Luxembourg: Patrick Peiffer
  • Digital Public Library of America: Tom Johnson and Mark Matienzo
  • Europeana Foundation: Valentine Charles and Antoine Isaac
  • Kennisland: Maarten Zeinstra
  • University of California San Diego: Esmé Cowles
  • University of Oregon: Karen Estlund
  • Florida State University: Richard J. Urban

Library Tech Talk (U of Michigan): Quality in HathiTrust (Re-Posting)

planet code4lib - Wed, 2015-05-20 00:00

This is a re-posting of a HathiTrust blog post. HathiTrust receives well over a hundred inquiries every month about quality problems with page images or OCR text of volumes in HathiTrust. That’s the bad news. The good news is that in most of these cases, there is something they can do about it. A new blog post is intended to shed some light on the thinking and practices about quality in HathiTrust.

LibUX: 019: Links Should Open in the Same Window

planet code4lib - Tue, 2015-05-19 22:39

Where should links open – and does it matter? In this episode of the podcast, we explore the implications on the net user experience of such a seemingly trivial preference.

Links

You can listen to LibUX on Stitcher, find us on iTunes, or subscribe to the straight feed. Consider signing up for our weekly newsletter, the Web for Libraries.

The post 019: Links Should Open in the Same Window appeared first on LibUX.

SearchHub: Infographic: The Woes of the CIOs

planet code4lib - Tue, 2015-05-19 17:28
It’s tough out there for CIOs. They’re getting it from all sides and from all directions. Let’s take a look at the unique challenges CIOs face in trying to keep their organizations competitive and effective:

The post Infographic: The Woes of the CIOs appeared first on Lucidworks.

Islandora: Fedora 4 Project Update IV

planet code4lib - Tue, 2015-05-19 15:31

As the project entered its fourth month, work continued on migration planning and mapping, migration-utils, and Drupal integration.

Migration work was split between working on migration-utils, migration mappings, data modeling (furthering Portland Common Data Model compliance), and working with the Islandora (Fedora 4 Interest Group), Fedora (Fedora Tech meetings), and Hydra (Hydra Metadata Working Group) communities on the preceding items. In addition, the Audit Service -- a key requirement of an Islandora community fcrepo3 -> fcrepo4 migration -- completed the second phase of its project. Community stakeholders are currently reviewing and providing feedback.

Work on migration-utils focused mainly on applying a number of mappings (outlined here) to the utility, adding support for object-to-object linking, and providing documentation on how to use the utility. This work can be demonstrated by building the Islandora 7.x-2.x Vagrant box, cloning the migration-utils repository, and pointing migration-utils at a fcrepo3 native filesystem or a directory of exported FOXML.

As for object modeling and inter-community work, an example of this work is the image below of a sample Islandora Large Image object modeled in the Portland Common Data Model. This model will continue to evolve as the communities work together in the various Hydra Metadata Working Group sub-working groups.

On the Drupal side of things, work was started on Middleware Services, a middleware layer that will use the Fedora 4 REST API and the Drupal Services modules to provide an API for the majority of interactions between the two systems. In addition, a few Drupal modules have been created to leverage it: islandora_basic_image, islandora_collection, and islandora_dcterms.
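
As a rough sketch of the kind of Fedora 4 REST interaction such a middleware layer would wrap (the endpoint, path, and use of Python here are illustrative assumptions for a default local install, not the project’s actual code):

```python
# Illustrative only: creating and reading a Fedora 4 container over its
# REST API. The base URL and path are assumptions for a local install.
import requests

FEDORA_BASE = "http://localhost:8080/fcrepo/rest"  # path varies by deployment

# Create (or update) a container at a known path; 201 indicates creation.
resp = requests.put(FEDORA_BASE + "/islandora/demo-object")
print(resp.status_code)

# Fetch the container's RDF description as JSON-LD for downstream use.
resp = requests.get(FEDORA_BASE + "/islandora/demo-object",
                    headers={"Accept": "application/ld+json"})
print(resp.text)
```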

In addition, the team has been exploring options for RDF integration and support in Drupal, as well as how to handle editing (Islandora XML Forms) the various descriptive metadata schemas the community uses. This is captured in a few issues in the issue queue: #27 & #28. Due to the importance of the issue, a special Fedora 4 Interest Group meeting was held to discuss how to proceed with this functionality in Islandora 7.x-2.x. The group’s consensus was to solicit use cases from the community to better understand how to proceed with 7.x-2.x.

Work will continue on the migration and Drupal sides of the project into May.

David Rosenthal: How Google Crawls Javascript

planet code4lib - Tue, 2015-05-19 15:00
I started blogging about the transition the Web is undergoing from a document to a programming model, from static to dynamic content, some time ago. This transition has very fundamental implications for Web archiving; what exactly does it mean to preserve something that is different every time you look at it? Not to mention the vastly increased cost of ingest, because executing a program takes far more computation, potentially an unlimited amount, than simply parsing a document.

The transition has big implications for search engines too; they also have to execute rather than parse. Web developers have a strong incentive to make their pages search engine friendly, so although they have enthusiastically embraced Javascript they have often retained a parse-able path for search engine crawlers to follow. We have watched academic journals adopt Javascript, but so far very few have forced us to execute it to find their content.

Adam Audette and his collaborators at Merkle | RKG have an interesting post entitled We Tested How Googlebot Crawls Javascript And Here’s What We Learned. It is aimed at the SEO (Search Engine Optimization) world but it contains a lot of useful information for Web archiving. The TL;DR is that Google (but not yet other search engines) is now executing the Javascript in ways that make providing an alternate, parse-able path largely irrelevant to a site’s ranking. Over time, this will mean that the alternate paths will disappear, and force Web archives to execute the content.
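
To make the parse-versus-execute distinction concrete, here is a rough sketch (the tooling choice of Python with requests and Selenium, and the URL, are illustrative only): the first fetch sees only the static HTML a parser would get, while the second sees the DOM after the page’s Javascript has run.

```python
# Illustrative comparison of parsing a page versus executing it.
# Assumes a local browser and matching Selenium driver are installed.
import requests
from selenium import webdriver

url = "https://example.org/"  # placeholder URL

static_html = requests.get(url).text   # what a parse-only crawler sees
print("static length:", len(static_html))

driver = webdriver.Firefox()
driver.get(url)
rendered_html = driver.page_source     # DOM after scripts have executed
print("rendered length:", len(rendered_html))
driver.quit()
```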

District Dispatch: Ending “bulk collection” of library records on the line in looming Senate vote

planet code4lib - Tue, 2015-05-19 13:14

Image Source: PolicyMic

Last week, the House of Representatives voted overwhelmingly, 338 to 88, for passage of the latest version of the USA FREEDOM Act, H.R. 2048. The bill — and the battle to achieve the first meaningful reform of the USA PATRIOT Act since it was enacted 14 years ago — now shifts to the Senate. There, the outcome may well turn on the willingness of individual voters to overwhelm Congress with demands that USA FREEDOM either be passed without being weakened, or that the now infamous “library provision” of the PATRIOT Act (Section 215) and others slated for expiration on June 1 simply be permitted to “sunset” as the Act provides if Congress takes no action. Now is the time for all librarians and library supporters — for you — to send that message to both of your US Senators. Head to the action center to find out how.

For the many reasons detailed in yesterday’s post, ALA and its many private and public sector coalition partners have strongly urged Congress to pass the USA FREEDOM Act of 2015 without weakening its key, civil liberties-restoring provisions. Already a finely-tuned compromise that delivers fewer privacy protections than last year’s Senate version of the USA FREEDOM Act, this year’s bill simply cannot sustain further material dilution and retain ALA’s (and many other groups’) support. The Obama Administration also officially endorsed and called for passage of the bill.

Unfortunately, the danger of the USA FREEDOM Act being blocked entirely or materially weakened is high. The powerful leader of the Senate, Mitch McConnell of Kentucky, is vowing to bar consideration of H.R. 2048 and, instead, to provide the Senate with an opportunity to vote only on his own legislation (co-authored with the Chair of the Senate Intelligence Committee) to reauthorize the expiring provisions of the PATRIOT Act with no privacy-protecting or other changes whatsoever. Failing the ability to pass that bill, Sen. McConnell and his allies have said that they will seek one or more short-term extensions of the PATRIOT Act’s expiring provisions.

Particularly in light of last week’s ruling by a federal appellate court that the government’s interpretation of its “bulk collection” authority under Section 215 was illegally broad in all key respects, ALA and its partners from across the political spectrum vehemently oppose any extension without meaningful reform of the USA PATRIOT Act of any duration.

The looming June 1 “sunset” date provides the best leverage since 2001 to finally recalibrate key parts of the nation’s surveillance laws to again respect and protect library records and all of our civil liberties. Please, contact your Senators now!

Additional Resources

House Judiciary Committee Summary of H.R. 2048

Statement of Sen. Patrick Leahy, lead sponsor of S. 1123 (May 11, 2015)

Open Technology Institute Comparative Analysis of select USA FREEDOM Acts of 2014 and 2015

“Patriot Act in Uncharted Legal Territory as Deadline Approaches,” National Journal (May 10, 2015)

“N.S.A. Collection of Bulk Call Data Is Ruled Illegal,” New York Times (May 7, 2015)

The post Ending “bulk collection” of library records on the line in looming Senate vote appeared first on District Dispatch.

LITA: Call for Writers

planet code4lib - Tue, 2015-05-19 13:00
meme courtesy of Michael Rodriguez

The LITA blog is seeking regular contributors interested in writing easily digestible, thought-provoking blog posts that are fun to read (and hopefully to write!). The blog showcases innovative ideas and projects happening in the library technology world, so there is a lot of room for contributor creativity. Possible post formats could include interviews, how-tos, hacks, and beyond.

Any LITA member is welcome to apply. Library students and members of underrepresented groups are particularly encouraged to apply.

Contributors will be expected to write one post per month. Writers will also participate in peer editing and conversation with other writers – nothing too serious, just be ready to share your ideas and give feedback on others’ ideas. Writers should expect a time commitment of 1-3 hours per month.

Not ready to become a regular writer but you’d like to contribute at some point? Just indicate in your message to me that you’d like to be considered as a guest contributor instead.

To apply, send an email to briannahmarshall at gmail dot com by Friday, May 29. Please include the following information:

  • A brief bio
  • Your professional interests, including 2-3 example topics you would be interested in writing about
  • If possible, links to writing samples, professional or personal, to get a feel for your writing style

Send any and all questions my way!

Brianna Marshall, LITA blog editor

Hydra Project: ActiveFedora 8.1.0 released

planet code4lib - Tue, 2015-05-19 08:21

We are pleased to announce the release of ActiveFedora 8.1.0.  This release:

– Patches casting behavior – see https://github.com/projecthydra/active_fedora/wiki/Patching-ActiveFedora-7.x-&-8.x-Casting-Behavior for detailed information on the problem.
– Fixes rsolr patch-level dependency introduced by 35189fc.

Details can be found at:  https://github.com/projecthydra/active_fedora/releases/tag/v8.1.0

Thanks, as always, to the team!

District Dispatch: EFF chief to keynote Washington Update session at Annual Conference

planet code4lib - Tue, 2015-05-19 06:56

Cindy Cohn, Legal Director and General Counsel for the EFF. Photographed by Erich Valo.

For decades, the Electronic Frontier Foundation (EFF) and the American Library Association (ALA) have stood shoulder to shoulder on the front lines of the fight for privacy online, at the library and in many other spheres of our daily lives. EFF Executive Director Cindy Cohn will discuss that proud shared history and the uncertain future of personal privacy during this year’s 2015 ALA Annual Conference in San Francisco. The session, titled “Frenetic, Fraught and Front Page: An Up-to-the-Second Update from the Front Lines of Libraries’ Fight in Washington,” takes place from 8:30 to 10:00 a.m. on Saturday, June 27, 2015, at the Moscone Convention Center in room 2001 of the West building.

Before becoming EFF’s Executive Director in April of 2015, Cohn served as the award-winning group’s legal director and general counsel from 2000–2015. In 2013, National Law Journal named Cohn one of the 100 most influential lawyers in America, noting: “If Big Brother is watching, he better look out for Cindy Cohn.” In 2012, the Northern California Chapter of the Society of Professional Journalists awarded her the James Madison Freedom of Information Award.

During the conference session, Adam Eisgrau, managing director of the ALA Office of Government Relations, will provide an up-to-the-minute insight from the congressional trenches of key federal privacy legislation “in play,” including the current status of efforts to reform the USA PATRIOT Act, the Freedom of Information Act (FOIA), as well as copyright reform, net neutrality and federal library funding. Participants will have the opportunity to pose questions to the speakers.

Speakers
  • Cindy Cohn, executive director, Electronic Frontier Foundation
  • Adam Eisgrau, managing director, Office of Government Relations, American Library Association

View all ALA Washington Office conference sessions

The post EFF chief to keynote Washington Update session at Annual Conference appeared first on District Dispatch.

Cynthia Ng: Accessible Format Production: Overview on Creating Accessible Formats

planet code4lib - Tue, 2015-05-19 01:49
I have been meaning to post a series of posts on how to create accessible formats, so here’s the overview. The Overall Process: Scan the print material. Run through OCR, creating a Text-readable PDF. Edit the PDF to make an Accessible PDF. Convert PDF (or EPUB) to document format. Edit the document to make it … Continue reading Accessible Format Production: Overview on Creating Accessible Formats

DuraSpace News: Finish Off Your Digital Preservation To-do List with ArchivesDirect

planet code4lib - Tue, 2015-05-19 00:00

Winchester, MA  Everyone has a different set of priorities when it comes to planning for digital preservation. Here are some examples of items that might appear on a typical digital preservation to-do list:

1. leverage hosted online service to manage preservation process

2. apply different levels of preservation to different types of content

3. do more than back up content on spare hard drives

4. keep copies in multiple locations

5. make sure content remains viable

SearchHub: Lucidworks Fusion 1.4 Now Available

planet code4lib - Mon, 2015-05-18 19:34
We’ve just released Lucidworks Fusion 1.4! This version is a Short-Term Support, or “preview” release of Fusion. There are a lot of new features in this version. Some of the highlights are:

Security

Fusion has always provided fine-grained security control on top of Solr. In version 1.4, we’ve significantly enhanced our integration with enterprise security systems.

Kerberos

We now support setting up Fusion as a Kerberos-protected service. You will be able to authenticate to Kerberos in your browser or API client, and instead of providing a password to Fusion, Fusion will validate you and allow (or disallow) access via Kerberos mechanisms.

LDAP Group Mapping

We’ve enriched our LDAP directory integration. In the past, we’ve been able to use LDAP to authenticate users and perform document-level security trimming. We can now additionally determine the user’s LDAP group memberships, and use those memberships to assign them to Fusion roles.

Alerting

We’ve introduced pipeline stages to send alerts, one each in the indexing and query pipelines. With these stages, you can send emails or Slack messages in response to documents passing through those pipelines. Emails are fully templated, so you can customize the content and include data from matching documents. And you’ll soon also be able to add other alerting methods besides email and Slack. A simple use for these is to set up notifications whenever a document matching a set of queries is crawled. Look for a post from our CTO (who wrote the code!) published here for more info on using alerting.

Logstash Connector

We add new connector integrations to Fusion all the time, but the Logstash Connector deserves special note. For those of you collecting machine data, it’s been possible to configure Logstash to ship logs to a Fusion pipeline or Solr index. The new Fusion Logstash Connector does this too, but makes it easier to install, configure, and manage. We include an embedded Logstash installation, so that you can start, stop, and edit your Logstash configuration right from the Fusion Datasource Admin GUI. You can use any standard Logstash plugin (including the network listeners, file tailing inputs, grok filter, or other format filters), and Fusion will automatically send the results into Fusion. There, you can do further Fusion pipeline processing, simple field mappings, or just index straight into Solr.

Apache Spark

Fusion is now including Apache Spark and the ability to use Spark to run complex analytic jobs. For now, the Fusion event aggregations and signals extractions can run in Spark for faster processing. In future releases, we expect to allow you to write and run more types of jobs in Spark, taking advantage of any of Spark’s powerful features and rich libraries.

Solr 5.x

As of Fusion 1.4, we officially support running Fusion against Solr 5.x clusters. We will still ship with an embedded Solr 4.x installation until we have validated repackaging and upgrades for existing Fusion/Solr 4.x customers, but new customers are free to install Solr 5.x, start it up in the SolrCloud cluster mode (bin/solr start -c), and use Fusion and all Fusion features with the new version.

As you can see, we’re quickly adding new capabilities to Fusion and these latest features are just a preview of what’s on the way. Stay tuned for much more! Download Lucidworks Fusion, read the release notes, or learn more about Lucidworks Fusion.

The post Lucidworks Fusion 1.4 Now Available appeared first on Lucidworks.

Jonathan Rochkind: Yahoo YBoss spell suggest API significantly increases pricing

planet code4lib - Mon, 2015-05-18 15:53

For a year or two, we’ve been using the Yahoo/YBoss/YDN Spelling Service API to provide spell suggestions for queries in our homegrown discovery layer. (Which provides UI to search the catalog via Blacklight/Solr, as well as an article search powered by EBSCOHost api).

It worked… good enough, despite doing a lot of odd and wrong things. But mainly it was cheap: $0.10 per 1000 spell suggest queries, according to this cached price sheet from April 24, 2015.

However, I got an email today that they are ‘simplifying’ their pricing by charging for all “BOSS Search API” services at $1.80 per 1000 queries, starting June 1.

That’s an 18x increase. Previously we paid about $170 a year for spell suggestions from Yahoo, peanuts, worth it even if it didn’t work perfectly. That’s 1.7 million queries for $170, pretty good. (Honestly, I’m not sure if that’s still making queries it shouldn’t be, in response to something other than user input. For instance, we try to suppress spell check queries on paging through an existing result set, but perhaps don’t do it fully).

But 18x $170 is $3060.  That’s a pretty different value proposition.

Anyone know of any decent cheap spell suggest APIs? It looks like maybe Microsoft Bing has a poorly documented one. Not sure.

Yeah, we could roll our own in-house spell suggestion based on a local dictionary or corpus of some kind: aspell, or Solr’s built-in spell suggest service based on our catalog corpus. But we don’t only use this for searching the catalog, and even for the catalog I previously found that these web-search-based APIs provided better results than a local-corpus-based solution. The local solutions seemed to false positive (provide a suggestion when the original query was ‘right’) and false negative (refrain from providing a suggestion when it was needed) more often than the web-based APIs, as well as, of course, being more work for us to set up and maintain.
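
For reference, a “roll our own” Solr query might look roughly like the sketch below; the host, core name, and /spell request handler are assumptions based on Solr’s example spellcheck configuration, not our actual setup.

```python
# Rough sketch of querying Solr's built-in spellcheck component.
# Host, core name, and the /spell handler are assumptions.
import requests

SOLR_SPELL_URL = "http://localhost:8983/solr/catalog/spell"

params = {
    "q": "reccommended",        # the possibly-misspelled user query
    "spellcheck": "true",
    "spellcheck.collate": "true",
    "wt": "json",
}
resp = requests.get(SOLR_SPELL_URL, params=params).json()
print(resp.get("spellcheck", {}).get("suggestions"))
```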


Filed under: General

Manage Metadata (Diane Hillmann and Jon Phipps): What’s up with this Jane-athon stuff?

planet code4lib - Mon, 2015-05-18 15:13

The RDA Development Team started talking about developing training for the ‘new’ RDA, with a focus on the vocabularies, in the fall of 2014. We had some notion of what we didn’t want to do: we didn’t want yet another ‘sage on the stage’ event, we wanted to re-purpose the ‘hackathon’ model from a software focus to data creation (including a major hands-on aspect), and we wanted to demonstrate what RDA looked like (and could do) in a native RDA environment, without reference to MARC.

This was a tall order. Using RIMMF for the data creation was a no-brainer: the developers had been using the RDA Registry to feed new vocabulary elements into their software (effectively becoming the RDA Registry’s first client), and were fully committed to FRBR. Deborah Fritz had been training librarians and others on RIMMF for years, gathering feedback and building enthusiasm. It was Deborah who came up with the Jane-athon idea, and the RDA Development group took it and ran with it. Using the Jane Austen theme was a brilliant part of Deborah’s idea. Everybody knows about JA, and the number of spin-offs, rip-offs and re-tellings of the novels (in many media formats) made her work a natural for examining why RDA and FRBR make sense.

One goal stated everywhere in the marketing materials for our first Jane outing was that we wanted people to have fun. All of us have been part of the audience and on the dais for many information sessions, for RDA and other issues, and neither position has ever been much fun, useful as the sessions might have been. The same goes for webinars, which, as they’ve developed in library-land tend to be dry, boring, and completely bereft of human interaction. And there was a lot of fun at that first Jane-athon–I venture to say that 90% of the folks in the room left with smiles and thanks. We got an amazing response to our evaluation survey, and the preponderance of responses were expansive, positive, and clearly designed to help the organizers to do better the next time. The various folks from ALA Publishing who stood at the back and watched the fun were absolutely amazed at the noise, the laughter, and the collaboration in evidence.

No small part of the success of Jane-athon 1 rested with the team leaders at each table, and the coaches going from table to table helping out with puzzling issues, ensuring that participants were able to create data using RIMMF that could be aggregated for examination later in the day.

From the beginning we thought of Jane 1 as the first of many. In the first flush of success as participants signed up and enthusiasm built, we talked publicly about making it possible to do local Jane-athons, but we realized that our small group would have difficulty doing smaller events with less expertise on site to the same standard we set at Jane-athon 1. We had to do a better job in thinking through the local expansion and how to ensure that local participants get the same (or similar) value from the experience before responding to requests.

As a step in that direction CILIP in the UK is planning an Ag-athon on May 22, 2015 which will add much to the collective experience as well as to the data store that began with the first Jane-athon and will be an increasingly important factor as we work through the issues of sharing data.

The collection and storage of the Jane-athon data was envisioned prior to the first event, and the R-Balls site was designed as a place to store and share RIMMF-based information. Though a valuable step towards shareable RDA data, rballs have their limits. The data itself can be curated by human experts or available with warts, depending on the needs of the user of the data. For the longer term, RIMMF can output RDF statements based on the rball info, and a triple store is in development for experimentation and exploration. There are plans to improve the visualization of this data and demonstrate its use at Jane-athon 2 in San Francisco, which will include more about RDA and linked data, as well as what the created data can be used for, in particular, for new and improved services.

So, what are the implications of the first Jane-athon’s success for libraries interested in linked data? One of the biggest misunderstandings floating around libraryland in linked data conversations is that it’s necessary to make one and only one choice of format, and eschew all others (kind of like saying that everyone has to speak English to participate in LOD). This is not just incorrect, it’s also dangerous. In the MARC era, there was truly no choice for libraries–to participate in record sharing they had to use MARC. But the technology has changed, and rapidly evolving semantic mapping strategies [see: dcpapers.dublincore.org/pubs/article/view/3622] will enable libraries to use the most appropriate schemas and tools for creating data to be used in their local context, and others for distributing that data to partners, collaborators, or the larger world.

Another widely circulated meme is that RDA/FRBR is ‘too complicated’ for what libraries need; we’re encouraged to ‘simplify, simplify’ and assured that we’ll still be able to do what we need. Hmm, well, simplification is an attractive idea, until one remembers that the environment we work in, with evolving carriers, versions, and creative ideas for marketing materials to libraries, is getting more complex than ever. Without the specificity to describe what we have (or have access to), we push the problem out to our users to figure out on their own. Libraries have always tried to be smarter than that, and that requires “smart”, not “dumb”, metadata.

Of course, the corollary to the ‘too complicated’ argument is the notion that a) we’re not smart enough to figure out how to do RDA and FRBR right, and b) complex means more expensive. I refuse to give space to a), but b) is an important consideration. I urge you to take a look at the Jane-athon data and consider the fact that Jane Austen wrote very few novels, but they’ve been re-published with various editions, versions and commentaries for almost two centuries. Once you add the ‘based on’, ‘inspired by’ and the enormous trail created by those trying to use Jane’s popularity to sell stuff (“Sense and Sensibility and Sea Monsters” is a favorite of mine), you can see the problem. Think of a pyramid with a very expansive base, and a very sharp point, and consider that the works that everything at the bottom wants to link to don’t require repeating the description of each novel every time in RDA. And we’re not adding notes to descriptions that are based on the outdated notion that the only use for information about the relationship between “Sense and Sensibility and Sea Monsters” and Jane’s “Sense and Sensibility” is a human being who looks far enough into the description to read the note.

One of the big revelations for most Jane-athon participants was to see how well RIMMF translated legacy MARC records into RDA, with links between the WEM levels and others to the named agents in the record. It’s very slick, and most importantly, not lossy. Consider that RIMMF also outputs in both MARC and RDF, and you see something of a missing link (if not the Golden Gate Bridge).

Not to say there aren’t issues to be considered with RDA as with other options. There are certainly those, and they’ll be discussed at the Jane-In in San Francisco as well as at the RDA Forum on the following day, which will focus on current RDA upgrades and the future of RDA and cataloging. (More detailed information on the Forum will be available shortly).

Don’t miss the fun, take a look at the details and then go ahead and register. And catalogers, try your best to entice your developers to come too. We’ll set up a table for them, and you’ll improve the conversation level at home considerably!

LITA: Tech Yourself Before You Wreck Yourself – Volume 6

planet code4lib - Mon, 2015-05-18 14:00

What’s new with you TYBYWYers? I’m sure you’ve been setting the world on fire with your freshly acquired tech skills. You’ve been pushing back the boundaries of the semantic web. Maybe the rumors are true and you’re developing a new app to better serve your users. I have no doubt you’re staying busy.

If you’re new to Tech Yourself, let me give you a quick overview. Each installment, produced monthly-ish, offers a curated list of tools and resources for library technologists at all levels of experience. I focus on webinars, MOOCs, and other free/low-cost options for learning, growing, and increasing tech proficiency. Welcome!

Worthwhile Webinars:

Texas State Library and Archives – Tech Tools With Tine – One Hour of Arduino – May 29, 2015 – I’ve talked about this awesome ongoing tech orientation series before, and this installment on Arduino promises to be an exciting time!

TechSoup for Libraries – Excel at Everything! (Or At Least Make Better Spreadsheets) – May 21, 2015 – I will confess I am obsessed with Excel, and so I take every free class I find on the program. Hope to see you at this one!

Massachusetts Library System – Power Searching: Databases and the Hidden Web – May 28, 2015 – Another classic topic, and worth revisiting!

I Made This:

LYRASIS – LYRASIS eGathering – May 20th, 2015

Shameless self-promotion, but I’m going to take three paragraphs to draw your attention to an online conference which I’ve organized. I know! I am proud of me too.

eGathering 2015

But not as proud as I am of the impressive and diverse line-up of speakers and presentations that comprise the 2015 eGathering. The event is free, online, and open to you through the generosity of LYRASIS members. Register online today and see a Keynote address by libtech champion Jason Griffey, followed by 6 workshop/breakout sessions, one of which is being hosted by our very own LITA treasure, Brianna Marshall. Do you want to learn ’bout UX from experts Amanda L. Goodman and Michael Schofield? Maybe you’re more interested in political advocacy and the library from EveryLibrary‘s John Chrastka? We have a breakout session for you.

Register online today! All registrants will receive an archival copy of the complete eGathering program following the event. Consider it my special gift to you, TYBYWYers.

Tech On!

TYBYWY will return June 19th!

DPLA: A DPLA of Your Very Own

planet code4lib - Mon, 2015-05-18 13:48

This guest post was written by Benjamin Armintor, Programmer/Analyst at Columbia University Libraries and a 2015 DPLA + DLF Cross-Pollinator Travel Grant awardee.

I work closely with the Hydra and Blacklight platforms in digital library work, and have followed the DPLA project with great interest as a potential source of data to drive Blacklight sites. I think of frameworks like Blacklight as powerful tools for exploring what can be done with GLAM data and resources, but it’s difficult to get started without data and resources to point them at. I had experimented with mashups of OpenLibrary data and public domain MARC cataloging, but the DPLA content was uniquely rich and varied, had a well-designed API, and carried with it a decent chance that an experimenter would be affiliated with some of the entries in the index.

Blacklight was designed to draw its data from Solr, but the DPLA API itself is so close to a NoSQL store that it seemed like a natural fit to the software. Unfortunately, it’s hard to make time for projects like that, and as such the DPLA+DLF Cross-Pollinator travel grant was a true boon.
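
For anyone curious about what pointing a Blacklight-style front end at the DPLA API involves, here is a minimal query sketch in Python; the parameters are illustrative, and you would need to request your own api_key from DPLA.

```python
# Minimal, illustrative DPLA API query; replace the placeholder api_key.
import requests

DPLA_ITEMS = "https://api.dp.la/v2/items"
params = {"q": "jane austen", "page_size": 5, "api_key": "YOUR_API_KEY"}

data = requests.get(DPLA_ITEMS, params=params).json()
print(data.get("count"))
for doc in data.get("docs", []):
    print(doc.get("sourceResource", {}).get("title"))
```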

Attending DPLAfest afforded me a unique opportunity to work with the DPLA staff on a project to quickly build a Blacklight site against DPLA data, and thanks to their help and advice I was able to push along a Blacklight engine that incorporated keyword and facet searches and thumbnail images of the entire DPLA corpus—an impressive 10 million items!—by the end of the meeting. The progress we made was enthusiastically received by the Blacklight and Hydra communities: I began receiving contributions and installation reports before the meeting was over. I’ve since made progress moving the code along from a conference demonstration to a fledgling project; the community contributions helped find bugs and identify gaps in basic Blacklight functionality, which I’ve slowly been working through. I’m also optimistic that I’ve recruited some of the other DPLAfest attendees to contribute, as an opportunity to learn more about the DPLA API, Blacklight, and Ruby on Rails. Check the progress on the project that started at DPLAfest on GitHub.

LITA: Storify of LITA’s First UX Twitter Chat

planet code4lib - Mon, 2015-05-18 11:18

LITA’s UX Interest Group did a fantastic job moderating the first ever UX Twitter Chat on May 15th. Moderators Amanda (@godaisies) and Haley (@hayleym1218) presented some deep questions, and great conversations grew organically from there. There were over 220 tweets during the 1-hour chat.

The next UX Twitter Chat will take place on Friday, May 29th, from 2-3 p.m. EDT, with moderator Bohyun (@bohyunkim). Use #litaux to participate. See this post for more info. Hope you can join us!

Here’s the Storify of the conversation from May 15th.

Patrick Hochstenbach: Brush inking exercise

planet code4lib - Sun, 2015-05-17 10:22
Filed under: Comics Tagged: art, cartoon, cat, ink, inking, mouse
