Planet Code4Lib - http://planet.code4lib.org

Library of Congress: The Signal: APIs: How Machines Share and Expose Digital Collections

Thu, 2016-01-07 19:01

By DLR German Aerospace Center (Zwei Roboterfreunde / Two robot friends) [CC BY 2.0], via Wikimedia Commons.

Kim Milai, a retired school teacher, was searching on ancestry.com for information about her great grandfather, Amohamed Milai, when her browser turned up something she had not expected: a page from the Library of Congress’s Chronicling America site displaying a scan of the Harrisburg Telegraph newspaper from March 13, 1919. On that page was a story with the headline, “Prof. Amohamed Milai to Speak at Second Baptist.” The article was indeed about her great grandfather, who was an enigmatic figure within her family, but…”Professor!?” Milai said. “He was not a professor. He exaggerated.” Whether it was the truth or an exaggeration, it was, after all, a rare bit of documentation about him, so Milai printed it out and added another colorful piece to the mosaic of her family history. But she might never have found that piece had it not been for ancestry.com’s access to Chronicling America’s collections via an API.

Application Programming Interfaces (APIs) are not new. API-based interactions are part of the backdrop of modern life. For example, your browser, an application program, interfaces with web servers. Another example is when an ATM screen enables you to interact with a financial system. When you search online for a flight, the experience involves multiple API relationships: your travel site or app communicates with individual airlines’ sites which, in turn, query their systems and pass their schedules and prices back to your travel site or app. When you book the flight, your credit card system gets involved. But all you see during the process are a few screens, while in the background, at each point of machine-to-machine interaction, servers rapidly communicate with each other, across their boundaries, via APIs. But what exactly are they?

Chris Adams, an information technology specialist at the Library of Congress, explained to me that APIs can be considered a protocol – or a set of rules governing the format of messages exchanged between applications. This allows either side of the exchange to change without affecting other parties as long as they continue to follow the same rules.
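
A minimal sketch can make that idea of a protocol concrete. In the hypothetical Python client below (the endpoint and field names are invented for illustration), the client depends only on the agreed-upon rules of the exchange, a URL pattern and a JSON response shape, never on how the server behind it is built:

    import json
    import urllib.request

    # Hypothetical endpoint and field names, used only to illustrate the idea
    # of an API as a contract between two programs.
    def get_item_title(item_id):
        url = "https://api.example.org/items/%s" % item_id
        with urllib.request.urlopen(url) as response:
            record = json.load(response)
        return record["title"]

    # The server can be rewritten in any language or moved to new hardware; as
    # long as the URL pattern and the JSON shape stay the same, this client
    # keeps working.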

World Digital Library, Library of Congress.

Adams created the APIs for the World Digital Library, an international project among approximately 190 libraries, archives and museums. The World Digital Library’s API documentation describes what to expect from the API and explains how to build tools to access the WDL’s collections. Adams said, “The APIs declare that we publish all of our data in a certain format, at a certain location and ‘here’s how you can interact with it.’ ” Adams also said that an institution’s digital-collections systems can and should evolve over time but their APIs should remain stable in order to provide reliable access to the underlying data. This gives outside users the stability they need to build tools which use those APIs, and it frequently saves time even within the same organization: for example, the front-end or user-visible portion of a website can be improved rapidly without the need to touch the complex back-end application running on the servers.

HathiTrust Digital Library. Hathitrust.org.

So, for us consumers, the experience of booking a flight or buying a book online just seems like the way things ought to be. And libraries, museums, government agencies and other institutions are coming around to “the way things ought to be” and beginning to implement APIs to share their digital collections in ways that consumers have come to expect.

Another example of implementation, similar to the WDL’s, is how HathiTrust uses APIs among shared collections. For example, a search of HathiTrust for the term “Civil War” queries the collections of all of their 110 or so consortium partners (the Library of Congress is among them) and the search results include a few million items, which you can filter by Media, Language, Country and a variety of other facets. Ultimately it may not matter to you which institutions you got your items from; what matters is that you got an abundance of good results for your search. To many online researchers, it’s the stuff that matters, not so much which institution hosts the collection.

That doesn’t mean the online collaboration of cultural institutions diminishes the eminence of any individual institution. Each object in the search results — of HathiTrust, WDL and similar resources — is clearly tagged with metadata and information about where the original material object resides, and so the importance of each institution’s collections becomes more widely publicized. APIs help cultural institutions increase their value — and their web traffic — by exposing more of their collections and sharing more of their content with the world.

The increasing use of APIs does not mean that institutions who want them are required to write code for them. David Brunton, a supervisory IT specialist at the Library of Congress, said that most people are using time-tested APIs instead of writing their own, and, as a result, standardized APIs are emerging. Brunton said, “Other people have already written the code, so it’s less work to reuse it. And most people don’t have infinite programming resources to throw at something.”

Example 1. Adding the Library of Congress search engine to Firefox.

Brunton cites OpenSearch as an example of a widely used, standardized API. OpenSearch helps search engines and clients communicate, by means of a common set of formats, to perform search requests and publish results for syndication and aggregation. He gave an example of how to view it in action by adding a Library of Congress search engine to the Firefox browser.

“In Firefox, go to www.loc.gov and look in the little search box at the top of the browser,” Brunton said. “A green plus sign (+) pops up next to ‘Search.’ If you click on the little green Plus sign, one of the things you see in the menu is ‘Add the Library of Congress search.’ [Example 1.] When you click on that, the Library’s search engine gets added into your browser and you can search the Library’s site from a non-Library page.”
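
Behind that browser feature is a small, standardized XML file called an OpenSearch description document. As a rough sketch (the URL below is a placeholder, not the Library’s actual address; a site normally advertises the real location in a <link rel="search"> tag in its HTML), a client could fetch and read one like this:

    import urllib.request
    import xml.etree.ElementTree as ET

    # Placeholder address for an OpenSearch description document.
    url = "https://www.example.org/opensearch.xml"
    ns = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

    with urllib.request.urlopen(url) as response:
        description = ET.parse(response).getroot()

    # The ShortName is what the browser shows in its search-engine menu; the
    # Url template tells the client how to construct a search request.
    print(description.find("os:ShortName", ns).text)
    print(description.find("os:Url", ns).get("template"))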

As institutions open up more and more of their online digital collections, Chris Adams sees great potential in using another API, the International Image Interoperability Framework (IIIF), as a research tool. IIIF enables users to, among other things, compare and annotate side-by-side digital objects from participating institutions without the need for each institution to run the same applications or specifically enable each tool used to view the items. Adams points to an example of how it works by means of the Mirador image viewer. Here is a demonstration:

  1. Go to http://iiif.github.io/mirador/ and, at the top right of the page, click “Demo.” The subsequent page, once it loads, should display two graphics side by side – “Self-Portrait Dedicated to Paul Gauguin” in the left window and “Buddhist Triad: Amitabha Buddha Seated” in the right window. [Example 2.]

    Example 2. Mirador image viewer demo.

  2. Click on the thumbnails at the bottom of each window to change the graphic in the main windows.
  3. In the left window, select the grid symbol in the upper left corner and, in the drop down menu, select “New Object.” [Example 3.]

    Example 3. Select New Object.

  4. The subsequent page should display thumbnails of sample objects from different collections at Harvard, Yale, Stanford, BnF, the National Library of Wales and e-codices. [Example 4.]

    Example 4. Thumbnails from collections.

  5. Double-click a new object and it will appear in the left image viewer window.
  6. Repeat the process for the right viewer window.

To see how it could work with the WDL collections:

  1. Go to http://iiif.github.io/mirador/ and click “Demo” at the top right of the page. The subsequent page will again display the two graphics side by side.
  2. Open a separate browser window or tab.
  3. Open “The Sanmai-bashi Bridges in Ueno.”
  4. Scroll to the bottom of the page and copy the link displayed under “IIIF Manifest.” The link URL is http://www.wdl.org/en/item/11849/manifest
  5. Go back to the Mirador graphics page, select the grid symbol in the left window and, in the drop down menu, select “New Object.”
  6. In the subsequent page, in the field that says “Add new object from URL…” paste the IIIF Manifest URL. [Example 5.]

    Example 5. “Add new object from URL…”

  7. Press the “enter/return” key on your keyboard. “The Sanmai-bashi Bridges in Ueno” should appear at the top of the list of collections. Double-click one of the three thumbnails to add it to the left graphics viewer window.
  8. For the right window in the graphics viewer page, use another sample from WDL, “The Old People Mill,” and copy its IIIF Manifest URL from the bottom of the page (http://www.wdl.org/en/item/11628/manifest).
  9. Return to the graphics viewer page, select the grid symbol in the right window and, in the drop down menu, select “New Object.”
  10. In the subsequent page, in the field that says “Add new object from URL…,” paste the IIIF Manifest URL and press the “enter/return” key. “The Old People Mill” should appear at the top of the list of collections. Double-click to add it to the right graphics viewer window.

This process can be repeated using any tool which supports IIIF, such as the Universal Viewer, and new tools can be built by anyone without needing to learn a separate convention for each of the many digital libraries in the world which support IIIF.
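
For the curious, the IIIF manifests used above are ordinary JSON documents that any script can consume. Here is a small sketch, assuming a IIIF Presentation 2.x-style manifest, that fetches the WDL manifest quoted earlier and prints its labels:

    import json
    import urllib.request

    # The manifest URL quoted in the walk-through above.
    url = "http://www.wdl.org/en/item/11849/manifest"

    with urllib.request.urlopen(url) as response:
        manifest = json.load(response)

    # In a Presentation 2.x manifest, canvases (the individual views) hang off
    # of sequences; key names differ in version 3.
    print(manifest.get("label"))
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            print("  ", canvas.get("label"))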

Adams said that implementing an API encourages good software design and data management practices. “The process of developing an API can encourage you to better design your own site,” Adams said. “It forces you to think about how you would split responsibilities.” As programmers rush to meet deadlines, they often face the temptation of solving a problem in the simplest way possible at the expense of future flexibility; an API provides a natural point to reconsider those decisions. This encourages code which is easier to develop and test, and makes it cheaper to expand server capacity as the collections grow and user traffic increases.

Meanwhile, the APIs themselves should remain unchanged, clarifying expectations on both sides, essentially declaring, “I will do this. You must do that. And then it will work.”

APIs enable a website like HathiTrust, the Digital Public Library of America or Europeana to display a vast collection of digital objects without having to host them all. APIs enable a website like Chronicling America or the World Digital Library to open up its collections to automated access by anyone. In short, APIs enable digital collections to become part of a collective, networked system where they can be enjoyed — and used — by a vast international audience of patrons.

“Offering an API allows other people to reuse your content in ways that you didn’t anticipate or couldn’t afford to do yourself,” said Adams. “That’s what I would like for the library world, those things that let other people re-use your data in ways you didn’t even think about.”

Islandora: Islandora CLAW Community Sprint 003: January 18 - 29

Thu, 2016-01-07 18:44

The Islandora community is kicking off the new year with our third volunteer sprint on the Islandora CLAW project. Continuing with our plan for monthly sprints, this third go-around will continue some of the tickets from the second sprint, put a new focus on developing a Collection service in PHP, and put more work into PCDM. To quote CLAW Committer Jared Whiklo, we shall PCDMize the paradigm.

This sprint will be developer focused, but the team is always happy to help new contributors get up to speed if you want to take part in the project. If you have any questions about participating in the sprint, please do not hesitate to contact CLAW Project Director, Nick Ruest. A sign-up sheet for the sprint is available here, and the sprint will be coordinated via a few Skype meetings and a lot of hanging around on IRC in the #islandora channel on freenode.

Villanova Library Technology Blog: Foto Friday: Reflection

Thu, 2016-01-07 16:58

“Character is like a tree and reputation like a shadow.
The shadow is what we think of it; the tree is the real thing.”
— Abraham Lincoln

Photo and quote contributed by Susan Ottignon, research support librarian: languages and literatures team.



Villanova Library Technology Blog: A New Face in Access Services

Thu, 2016-01-07 14:33

Cordesia (Dee-Dee) Pope recently joined Falvey’s staff as a temporary Access Services specialist reporting to Luisa Cywinski, Access Services team leader. Pope described her duties as “providing superb assistance to Falvey Memorial Library’s patrons.”

Pope, a native of Philadelphia, attended the PJA School where she earned an associate’s degree in paralegal studies and business administration. She has approximately 10 years of experience as a paralegal.

When asked about hobbies and interests she says, “I enjoy spending time with my two children, reading books of every genre, watching movies and learning new things.”



LITA: A Linked Data Journey: Interview with Julie Hardesty

Thu, 2016-01-07 14:00

Image Courtesy of Marcin Wichary under a CC BY 2.0 license.

Introduction

This is part four of my Linked Data Series. You can find the previous posts in my author feed. I hope everyone had a great holiday season. Are you ready for some more Linked Data goodness? Last semester I had the pleasure of interviewing Julie Hardesty, metadata extraordinaire (and analyst) at Indiana University, about Hydra, the Hydra Metadata Interest Group, and Linked Data. Below is a bio and a transcript of the interview.

Bio:

Julie Hardesty is the Metadata Analyst at Indiana University Libraries. She manages metadata creation and use for digital library services and projects. She is reachable at jlhardes@iu.edu.

The Interview

Can you tell us a little about the Hydra platform?

Sure and thanks for inviting me to answer questions for the LITA Blog about Hydra and Linked Data! Hydra is a technology stack that involves several pieces of software – a Blacklight search interface with a Ruby on Rails framework and Apache Solr index working on top of the Fedora Commons digital repository system. Hydra is also referred to when talking about the open source community that works to develop this software into different packages (called “Hydra Heads”) that can be used for management, search, and discovery of different types of digital objects. Examples of Hydra Heads that have come out of the Hydra Project so far include Avalon Media System for time-based media and Sufia for institutional repository-style collections.

What is the Hydra Metadata Interest Group and your current role in the group?

The Hydra Metadata Interest Group is a group within the Hydra Project that is aiming to provide metadata recommendations and best practices for Hydra Heads and Hydra implementations so that every place implementing Hydra can do things the same way using the same ontologies and working with similar base properties for defining and describing digital objects. I am the new facilitator for the group and try to keep the different working groups focused on deliverables and responding to the needs of the Hydra developer community. Previous to me, Karen Estlund from Penn State University served as facilitator. She was instrumental in organizing this group and the working groups that produced the recommendations we have so far for technical metadata and rights metadata. In the near-ish future, I am hoping we’ll see a recommendation for baseline descriptive metadata and a recommendation for referring to segments within a digitized file, regardless of format.

What is the group’s charge and/or purpose? What does the group hope to achieve?

The Hydra Metadata Interest Group is interested in working together on base metadata recommendations, as a possible next step following the successful community data modeling effort, the Portland Common Data Model. The larger goals of the Metadata Interest Group are to identify models that may help Hydra newcomers and further interoperability among Hydra projects. The scope of this group will concentrate primarily on using Fedora 4. The group is ambitiously interested in best practices and helping with technical, structural, descriptive, and rights metadata, as well as Linked Data Platform (LDP) implementation issues.

The hope is to make recommendations for technical, rights, descriptive, and structural metadata such that the Hydra software developed by the community uses these best practices as a guide for different Hydra Heads and their implementations.

Can you speak about how Hydra currently leverages linked data technologies?

This is where keeping pace with the work happening in the open source community is critical and sometimes difficult to do if you are not an active developer. What I understand is that Fedora 4 implements the W3C’s Linked Data Platform specification and uses the Portland Common Data Model (PCDM) for structuring digital objects and relationships between them (examples include items in a collection, pages in a book, tracks on a CD). This means there are RDF statements that are completely made of URIs (subject, predicate, and object) that describe how digital objects relate to each other (things like objects that contain other objects; objects that are members of other objects; objects ordered in a particular way within other objects). This is Linked Data, although at this point I think I see it as more internal Linked Data. The latest development work from the Hydra community is using those relationships through the external triple store to send commands to Fedora for managing digital objects through a Hydra interface. There is an FAQ on Hydra and the Portland Common Data Model that is being kept current with these efforts. One outcome would be digital objects that can be shared at least between Hydra applications.
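
(To make that kind of statement concrete, here is a minimal sketch using Python’s rdflib. The PCDM namespace is the real one; the collection and object URIs are invented for illustration, and this is not Hydra’s actual code:)

    from rdflib import Graph, Namespace, URIRef

    PCDM = Namespace("http://pcdm.org/models#")
    g = Graph()
    g.bind("pcdm", PCDM)

    # Hypothetical URIs: a collection and one of its members.
    collection = URIRef("http://example.org/collections/photographs")
    item = URIRef("http://example.org/objects/42")

    # A triple made entirely of URIs: "this collection has this member."
    g.add((collection, PCDM.hasMember, item))
    print(g.serialize(format="turtle"))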

For descriptive metadata, my understanding is that Hydra is not quite leveraging Linked Data… yet. If URIs are used in RDF statements that are stored in Fedora, Hydra software is currently still working through the issue of translating that URI to show the appropriate label in the end user interface, unless that label is also stored within the triple store. That is actually a focus of one of the metadata working groups, the Applied Linked Data Working Group.

What are some future, anticipated capabilities regarding Hydra and linked data?

That capability I was just referring to is one thing I think everyone hopes happens soon. Once URIs can be stored for all parts of a statement, such as “this photograph has creator Charles W. Cushman,” and Charles W. Cushman only needs to be represented in the Fedora triple store as a URI but can show in the Hydra end-user interface as “Charles W. Cushman” – that might spawn some unicorns and rainbows.

Another major effort in the works is implementing PCDM in Hydra. Implementation work is happening right now on the Sufia Hydra Head with a base implementation called Curation Concerns being incorporated into the main Hydra software stack as its own Ruby gem. This involves Fedora 4’s understanding of PCDM classes and properties on objects (and implementing Linked Data Platform and ordering ontologies in addition to the new PCDM ontology). Hydra then has to offer interfaces so that digital objects can be organized and managed in relation to each other using this new data model. It’s pretty incredible to see an open source community working through all of these complicated issues and creating new possibilities for digital object management.

What challenges has the Hydra Metadata Interest Group faced concerning linked data?

We have an interest in making use of Linked Data principles as much as possible since that makes our digital collections that much more available and useful to the Internet world. Our recommendations are based around various RDF ontologies due to Fedora 4’s capabilities to handle RDF. The work happening in the Hydra Descriptive Metadata Working Group to define a baseline descriptive metadata set and the ontologies used there will be the most likely to want Linked Data URIs used as much as possible for those statements. It’s not an easy task to agree on a baseline set of descriptive metadata for various digital object types, but there is precedent in both the Europeana Data Model and the DPLA Application Profile. I would expect we’ll follow along similar lines but it is a process to both reach consensus and have something that developers can use.

Do you have any advice for those interested in linked data?

I am more involved in the world of RDF than in the world of Linked Data at this point. Using RDF like we do in Hydra does not mean we are creating Linked Data. I think Linked Data comes as a next step after working in RDF. I am coming from a metadata world heavily involved in XML and XML schemas so to me this isn’t about getting started with Linked Data, it’s about understanding how to transition from XML to Linked Data (by way of RDF). I watch for reports on creating Linked Data and, more importantly, transitioning to Linked Data from current metadata standards and formats. Conferences such as Code4Lib (coming up in March 2016 in Philadelphia), Open Repositories (in Dublin, Ireland in June 2016) and the Digital Library Federation Forum (in Milwaukee in November 2016) are having a lot of discussion about this sort of work.

Is there anything we can do locally to prepare for linked data?

Recommended steps I have gleaned so far include cleaning the metadata you have now – syncing up names of people, places, and subjects so they are spelled and named the same across records; adding authority URIs whenever possible, which makes transformation to RDF with URIs easier later; and considering the data model you will move to when describing things using RDF. If you are using XML schemas right now, there isn’t necessarily a 1:1 relationship between XML schemas and RDF ontologies, so it might require introducing multiple RDF ontologies and creating a local namespace for descriptions that involve information that is unique to your institution (you become the authority). Lastly, keep in mind the difference between Linked Data and Linked Open Data and be sure, if you are getting into publishing Linked Data sets, that you are making them available for reuse and aggregation – it’s the entire point of the Web of Data that was imagined by Tim Berners-Lee when he first discussed Linked Data and RDF (http://www.w3.org/DesignIssues/LinkedData.html).

Conclusion

A big thank you to Julie for sharing her experiences and knowledge. She provided a plethora of resources during the interview, so go forth and explore! As always, please feel free to leave a comment or contact Julie/me privately. Until next time!

Ed Summers: Craft and Computation

Thu, 2016-01-07 05:00

Cheatle and Jackson’s paper provides an interesting view into how the furniture artist Wendell Castle uses 3D scanning and digital fabrication tools in his work. Usefully (for me) the description is situated in the larger field of human-computer interaction, and computer supported work, which I’m trying to learn more about. It’s worth checking out if you are interested in a close look at how a small furniture studio (that has built an international reputation for craftsmanship) uses 3D scanning and robotics to do its work.

One fascinating piece of the story is the work of the studio director, Marvin Pallischeck (Marv), who adapted a CNC machine designed for pick-and-place work in the US Postal Service to serve as a milling machine. This robot is fed 3D scans of prototypes created by Castle along with material (wood) and then goes to work. The end result isn’t a completed piece, but one that a woodcarver can then work with further to get it into shape. The 3D scanning is done by an offsite firm that specializes in scanning wood. They deliver a CAD file that needs to be converted to a CAM file. The CAM file then needs to be adjusted to control the types of cutters and feed speeds that are used, to fit the particular wood being worked on.

The work is also iterative: the robot successively works on the parts of the whole piece, getting closer and closer with Marv’s help. The process resists full automation:

“At the end of the day, it’s the physical properties of the material that drives our process”, says Marv as he describes the way the wood grain of a Castle piece can be read to determine the orientation of the tree’s growth within the forest. “I always say, this tree is now dead, but its wood is not - and it’s important to know that going into this.” Bryon understands this in a similar way, “There’s a lot of tension in wood. When you start cutting it up, that tension is released, free to do as it will. And form changes. Things crack, they bend, and warp”

There is also an impact on the clients’ perception of the work: its authenticity and authorship. On the theoretical side, Cheatle and Jackson are drawing attention to how the people, their creative processes, the computers and the materials they are working with are all part of a network. As with Object Oriented Ontology (Bogost is cited), the lines between the human and the non-human objects begin to get fuzzy, and complicated. More generally, the interviews and ethnographic work point to the work of Wanda Orlikowski.

These arguments build in turn on a broader body of work around materiality and social life growing in the organizational and social sciences. Orlikowski finds that materiality is integral to organizational life and that developing new ways of dealing with material is critical if one is to understand the multiple, emergent, shifting and interdependent technologies at the heart of contemporary practice (Orlikowski, 2007). Orlikowski sees humans and technology as bound through acts of ‘recursive intertwining’ or ‘constitutive entanglement’ that eschew pre-ordered hierarchies or dualisms. Rather, human actors and technological practices are enmeshed and co-constituted, emerging together from entangled networks that are always shifting and coemerging in time.

I think this is an angle I’m particularly interested in exploring with respect to Web archiving work: the ways in which traditional archival materials (paper, film, audio, photographs, etc.) and processes are challenged by the material of the Web. With respect to this work by Cheatle and Jackson: the ways in which our automated tools (crawlers, viewers, inventory/appraisal tools) have been designed (or not) to fit the needs of archivists; how archivists, the medium of the Web, and the archival tools/processes are entangled; and how an understanding of this entanglement can inform the design of new archival tools.

Orlikowski, W. J. (2007). Sociomaterial practices: Exploring technology at work. Organization Studies, 28(9), 1435–1448. http://doi.org/10.1177/0170840607081138

Terry Reese: MarcEdit updates

Thu, 2016-01-07 04:30

I noted earlier today that I’d be making a couple of MarcEdit updates.  You can see the change logs here:

Please note – if you use the Linked data tools, it is highly recommended that you update.  This update was done in part to make the interactions with LC more efficient on all sides.

You can get the download from the automated update mechanism in MarcEdit or from the downloads page: http://marcedit.reeset.net/downloads

Questions, let me know.

–tr

LITA: Jobs in Information Technology: January 6, 2016

Thu, 2016-01-07 00:38

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week:

Cadence Group, Metadata and Systems Librarian, Greenbelt, MD

California State University, Fullerton, Systems Librarian, Fullerton, CA

The Samuel Roberts Noble Foundation, Inc., Manager, Library & Information Management Services, Ardmore, OK

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Villanova Library Technology Blog: The Curious ‘Cat: Favorite Falvey Memory from 2015?

Wed, 2016-01-06 21:18

This week, the Curious ‘Cat asks library staff, “As 2015 comes to an end, do you have a favorite Falvey memory from this past year?”

Melanie Wood—“The service dogs that were in here for the stress busters: that was definitely a highlight. … The ancient Bassett hound [“Oliver”] was my favorite.”

 

Stephen Spatz—“My favorite moment might have been when that marketer hid all those Kind bars in the stacks.”

Raamaan McBride—“Was Millicent hired this year? That would be my favorite 2015 memory.”

 

Nora Ramos—[A pipe burst in the Falvey West stacks the morning of Aug. 12, which caused water to rain down on books shelved on the first and ground floors.] “This wasn’t a very good memory. I worked very hard; it was awful. It was the worst thing that happened this year. … I think everybody—custodial—did a beautiful job, no matter what. … They were able to solve the problem in time. I hope it never happens again.”

Barb Haas—[A pipe burst in the Falvey West stacks the morning of Aug. 12, which caused water to rain down on books shelved on the first and ground floors.] “I wouldn’t say it was a favorite event, but it certainly was a memorable event. … Also, we moved [thousands of library items] from Garey Hall to our new remote-storage facility. That was huge; that was a huge project.”

Dave Uspal—“I work in the tech department at Falvey and, as such, mostly come up from the basement for meals and meetings (after which it’s back down to the basement). This year, however, I got out of the basement more since I was privileged to work with several classes as part of our Aurelius Digital Scholarship initiative. Normally, working in the tech department you don’t see the bigger picture of what you do, but getting a chance to work directly with students on a series of Digital Humanities projects was a great opportunity for me to get out and see the greater campus. The students this year were great—intelligent and creative—and I thought the projects turned out fantastic (to be unveiled Spring 2016!). I would love to do this again next year.”



LibUX: The “Phablet” Will be the Dominant Form Factor

Wed, 2016-01-06 19:11

This is the seventh year Flurry has reported on mobile app usage, showing continuing trends of mobile growth, the importance and conscious popularity of personalization, how media consumption is shifting from television and PCs to smartphones, and the rapid growth of mobile commerce.

Of interest to designers and web teams might be the changing size of the handheld screen, which way back in early iPhone days constituted an easy media query at something-something 320px, but now is of course a little bit more nebulous and a whole lot bigger.

At least in terms of engagement metrics, the phablet is doing really well:

The picture got much clearer when we looked at year-over-year growth in time spent and cut that by form factor. Time spent on phablets grew 334% year-over-year (2.9 times more than the average), compared to 117% for all form factors. With time spent on mobile surpassing that on television, and phablets posting astonishing growth in media consumption, it appears that the cable industry will find in the phablet and its apps its long-awaited digital nemesis.

This post is part of a nascent library UX data collection we hope you can use as a reference to make smart decisions. If you’re interested in more of the same, follow @libuxdata on Twitter, or continue the conversation on our Facebook group. You might also think about signing up for the Web for Libraries weekly.



Patrick Hochstenbach: Sktchy Portraits

Wed, 2016-01-06 18:44
Filed under: Comics, Sketchbook Tagged: copic, illustration, portrait, sktchyapp

Eric Lease Morgan: XML 101

Wed, 2016-01-06 18:05

This past Fall I taught “XML 101” online to library school graduate students. This posting echoes the scripts of my video introductions, and I suppose this posting could also be used as a very gentle introduction to XML for librarians.

Introduction

I work at the University of Notre Dame, and my title is Digital Initiatives Librarian. I have been a librarian since 1987. I have been writing software since 1976, and I will be your instructor. Using materials and assignments created by the previous instructors, my goal is to facilitate your learning of XML.

XML is a way of transforming data into information. It is a method for marking up numbers and text, giving them context, and therefore a bit of meaning. XML includes syntactical characteristics as well as semantic characteristics. The syntactical characteristics are really rather simple. There are only five or six rules for creating well-formed XML, such as: 1) there must be one and only one root element, 2) element names are case-sensitive, 3) elements must be closed properly, 4) elements must be nested properly, 5) attributes must be quoted, and 6) there are a few special characters (&, <, and >) which must be escaped if they are to be used in their literal contexts. The semantics of XML is much more complicated, and they denote the intended meaning of the XML elements and attributes. The semantics of XML are embodied in things called DTDs and schemas.

Again, XML is used to transform data into information. It is used to give data context, but XML is also used to transmit this information in a computer-independent way from one place to another. XML is also a data structure in the same way MARC, JSON, SQL, and tab-delimited files are data structures. Once information is encapsulated as XML, it can be unambiguously transmitted from one computer to another where it can be put to use.

This course will elaborate upon these ideas. You will learn about the syntax and semantics of XML in general. You will then learn how to manipulate XML using XML-related technologies called XPath and XSLT. Finally, you will learn library-specific XML “languages” to learn how XML can be used in Library Land.

Well-formedness

In this, the second week of “XML 101 for librarians”, you will learn about well-formed XML and valid XML. Well-formed XML is XML that conforms to the five or six syntactical rules. (XML must have one and only one root element. Element names are case sensitive. Elements must be closed. Elements must be nested correctly. Attributes must be quoted. And there are a few special characters that must be escaped, namely &, <, and >.) Valid XML is XML that is not only well-formed but also conforms to a named DTD or schema. Think of valid XML as semantically correct.
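
A quick way to see these rules in action is to hand some mark-up to any XML parser, which will refuse anything that is not well-formed. A minimal sketch in Python, using only the standard library:

    import xml.etree.ElementTree as ET

    well_formed = "<greeting><to>World</to><note>Hello &amp; welcome</note></greeting>"
    not_well_formed = "<greeting><to>World</greeting></to>"  # improperly nested

    for document in (well_formed, not_well_formed):
        try:
            ET.fromstring(document)
            print("well-formed")
        except ET.ParseError as error:
            print("not well-formed:", error)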

Jennifer Weintraub and Lisa McAulay, the previous instructors of this class, provide more than a few demonstrations of how to create well-formed as well as valid XML. Oxygen, the selected XML editor for this course is both powerful and full-featured, but using it efficiently requires practice. That’s what the assignments are all about. The readings supplement the demonstrations.

DTD’s and namespaces

DTD’s, schemas, and namespaces put the “X” in XML. They make XML extensible. They allow you to define your own elements and attributes to create your own “language”.

DTD’s — document type definitions — and schemas are the semantics of XML. They define what elements exist, what order they appear in, what attributes they can contain, and just as importantly, what the elements are intended to contain. DTD’s are older than schemas and not as robust. Schemas are XML documents themselves and go beyond DTD’s in that they provide the ability to define the types of data that elements and attributes contain.

Namespaces allow you, the author, to incorporate multiple DTD and schema definitions into a single XML document. Namespaces provide a way for multiple elements of the same name to exist concurrently in a document. For example, two different DTD’s may contain an element called “title”, but one DTD refers to a title as in the title of a book, and the other refers to “title” as if it were an honorific.
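
The sketch below illustrates the “title” example with two made-up namespaces; the prefix-to-URI mapping is what lets both elements live in the same document without colliding:

    import xml.etree.ElementTree as ET

    # Two invented vocabularies that both define a "title" element.
    document = """<record xmlns:bib="http://example.org/bibliographic"
                          xmlns:hon="http://example.org/honorifics">
      <bib:title>Moby Dick</bib:title>
      <hon:title>Dr.</hon:title>
    </record>"""

    root = ET.fromstring(document)
    ns = {"bib": "http://example.org/bibliographic",
          "hon": "http://example.org/honorifics"}

    print(root.find("bib:title", ns).text)  # the title of a book
    print(root.find("hon:title", ns).text)  # an honorific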

Schemas

Schemas are a more intelligent alternative to DTDs. While DTDs define the structure of XML documents, schemas do it with more exactness. While DTDs only allow you to define elements, the number of elements, the order of elements, attributes, and entities, schemas allow you to do these things and much more. For example, they allow you to define the types of content that go into elements or attributes. Strings (characters). Numbers. Lists of characters or numbers. Boolean (true/false) values. Dates. Times. Etc. Schemas are XML documents in and of themselves, and therefore they can be validated just like any other XML document with a pre-defined structure.

The reading and writing of XML schemas is very librarian-ish because it is about turning data into information. It is about structuring data so it makes sense, and it does this in an unambiguous and computer-independent fashion. It is too bad our MARC (bibliographic) standards are not as rigorous.
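
As a small illustration of that added rigor, the sketch below (using the lxml library, which would need to be installed) declares a “year” element whose content must be a year, then validates two candidate documents against it:

    from io import BytesIO
    from lxml import etree

    schema_document = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="year" type="xs:gYear"/>
    </xs:schema>"""

    schema = etree.XMLSchema(etree.parse(BytesIO(schema_document)))

    good = etree.fromstring(b"<year>1776</year>")
    bad = etree.fromstring(b"<year>seventeen seventy-six</year>")

    print(schema.validate(good))  # True: the content really is a year
    print(schema.validate(bad))   # False: a string is not a gYear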

RelaxNG, Schematron, and digital libraries

The first is yet another technology for modeling your XML, and it is called RelaxNG. This third modeling technology is intended to be more human-readable than schemas and more robust than DTDs. Frankly, I have not seen RelaxNG implemented very many times, but it behooves you to know it exists and how it compares to other modeling tools.

The second is Schematron. This tool too is used to validate XML, but instead of returning “ugly” computer-looking error messages, its errors are intended to be more human-readable and describe why things are the way they are instead of just saying “Wrong!”

Lastly, there is an introduction to digital libraries and trends in their current development. More and more, digital libraries are really and truly implementing the principles of traditional librarianship, complete with collection, organization, preservation, and dissemination. At the same time, they are pushing the boundaries of the technology and stretching our definitions. Remember, it is not so much the technology (the how of librarianship) that is important, but rather the why of libraries and librarianship. The how changes quickly. The why changes slowly, albeit sometimes too slowly.

XPath

This week is all about XPath, and it is used to select content from your XML files. It is akin to navigating a computer’s filesystem from the command line in order to learn what is located in different directories.

XPath is made up of expressions which return values of true, false, strings (characters), numbers, or nodes (subsets of XML files). XPath is used in conjunction with other XML technologies, most notably XSLT and XQuery. XSLT is used to transform XML files into other plain text files. XQuery is akin to the structured query language of relational databases.

You will not be able to do very much with XML other than read or write it, unless you understand XPath. An understanding of XPath is essential if you want to do truly interesting things with XML.
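
Here is a minimal sketch of those kinds of expressions, evaluated with the lxml library against a tiny, made-up document:

    from lxml import etree

    doc = etree.fromstring(
        "<letters>"
        "<letter year='1776'><persname>John Adams</persname></letter>"
        "<letter year='1777'><persname>Abigail Adams</persname></letter>"
        "</letters>")

    # XPath expressions can return node sets, numbers, or booleans.
    print(doc.xpath("//persname/text()"))              # ['John Adams', 'Abigail Adams']
    print(doc.xpath("count(//letter)"))                # 2.0
    print(doc.xpath("boolean(//letter[@year=1776])"))  # True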

XSLT

This week you will be introduced to XSLT, a programming language used to transform XML into other plain text files.

XML is all about information, and it is not about use nor display. In order for XML to be actually useful — to be applied towards some sort of end — specific pieces of data need to be extracted from XML or the whole of the XML file needs to be converted into something else. The most common conversion (or “transformation”) is from some sort of XML into HTML for display in a Web browser. For example, bibliographic XML (MARCXML or MODS) may be transformed into a sort of “catalog card” for display, or a TEI file may be transformed into a set of Web pages, or an EAD file may be transformed into a guide intended for printing. Alternatively, you may want to transform the bibliographic data into a tab-delimited text file for a spreadsheet or an SQL file for a relational database. Along with other sets of information, an XML file may contain geographic coordinates, and you may want to extract just those coordinates to create a KML file — a sort of map file.

XSLT is a programming language but not like most programming languages you may know. Most programming languages are “procedural” (like Perl, PHP, or Python), meaning they execute their commands in a step-wise manner. “First do this, then do that, then do the other thing.” This can be contrasted with “declarative” programming languages where events occur or are encountered in a data file, and then some sort of execution happens. There are relatively few declarative programming languages, but LISP is/was one of them. Because of the declarative nature of XSLT, the apply-templates command is so important. The apply-templates command sort of tells the XSLT processor to go off and find more events.

Now that you are beginning to learn XSLT and combining it with XPath, you are beginning to do useful things with the XML you have been creating. This is where the real power is. This is where it gets really interesting.
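
A minimal sketch of that declarative style, again using lxml: the stylesheet below never says “loop over the names”; it simply declares what to do whenever a persname element is encountered, and apply-templates sends the processor off to find them:

    from lxml import etree

    xml = etree.fromstring(
        "<names><persname>John Adams</persname>"
        "<persname>Abigail Adams</persname></names>")

    stylesheet = etree.fromstring("""<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/names">
        <ul><xsl:apply-templates/></ul>
      </xsl:template>
      <xsl:template match="persname">
        <li><xsl:value-of select="."/></li>
      </xsl:template>
    </xsl:stylesheet>""")

    transform = etree.XSLT(stylesheet)
    print(transform(xml))  # an HTML-ish list, one li per persname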

TEI — Text Encoding Initiative

TEI is a granddaddy when it comes to XML “languages”. It started out as a different form of mark-up, a mark-up called SGML, and SGML was originally a mark-up language designed at IBM for the purposes of creating, maintaining, and distributing internal documentation. Now-a-days, TEI is all but a hallmark of XML.

TEI is a mark-up language for any type of literature: poetry or prose. Like HTML, it is made up of head and body sections. The head is the place for administrative, bibliographic, and provenance metadata. The body is where the poetry or prose is placed, and there are elements for just about anything you can imagine: paragraphs, lines, headings, lists, figures, marginalia, comments, page breaks, etc. And if there is something you want to mark-up, but an element does not explicitly exist for it, then you can almost make up your own element/attribute combination to suit your needs.

TEI is quite easily the most well-documented XML vocabulary I’ve ever seen. The community is strong and sustainable, albeit small (if not tiny). The majority of the community is academic and very scholarly. Next to a few types of bibliographic XML (MARCXML, MODS, OAIDC, etc.), TEI is probably the most commonly used XML vocabulary in Library Land, with EAD being a close second. In libraries, TEI is mostly used for the purpose of marking up transcriptions of various kinds: letters, runs of out-of-print newsletters, or parts of a library special collection. I know of no academic journals marked up in TEI, no library manuals, nor any catalogs designed for printing and distribution.

TEI, more than any other type of XML designed for literature, is designed to support the computed critical analysis of text. But marking something up in TEI in a way that supports such analysis is extraordinarily expensive in terms of both time and expertise. Consequently, based on my experience, there are relatively very few such projects, but they do exist.

XSL-FO

As alluded to throughout this particular module, XSL-FO is not easy, but despite this fact, I sincerely believe it is an under-utilized tool.

FO stands for “Formatting Objects”, and it in and of itself is an XML vocabulary used to define page layout. It has elements defining the size of a printed page, margins, running headers & footers, fonts, font sizes, font styles, indenting, pagination, tables of contents, back-of-the-book indexes, etc. Almost all of these elements and their attributes use a syntax similar to the syntax of HTML’s cascading stylesheets.

Once an XML file is converted into an FO document, you are expected to feed the FO document to an FO processor, and the FO processor will convert the document into something intended for printing — usually a PDF document.

FO is important because not everything is designed nor intended to be digital. Digital everything is a misnomer. The graphic design of a printed medium is different from the graphic design of computer screens or smart phones. In my opinion, important XML files ought to be transformed into different formats for different mediums. Sometimes those mediums are screen-oriented. Sometimes it is better to print something, and printed somethings last a whole lot longer. Sometimes it is important to do both.

FO is another good example of what XML is all about. XML is about data and information, not necessarily presentation. XSL transforms data/information into other things — things usually intended for reading by people.

EAD — Encoded Archival Description

Encoded Archival Description (or EAD) is the type of XML file used to enumerate, evaluate, and make accessible the contents of archival collections. Archival collections are often the raw and primary materials of new humanities scholarship. They are usually “the papers” of individuals or communities. They may consist of all sorts of things from letters, photographs, manuscripts, meeting notes, financial reports, audio cassette tapes, and now-a-days computers, hard drives, or CDs/DVDs. One thing, which is very important to understand, is that these things are “collections” and not intended to be used as individual items. MARC records are usually used as a data structure for bibliographically describing individual items — books. EAD files describe an entire set of items, and these descriptions are more colloquially called “finding aids”. They are intended to be read as intellectual works, and the finding aids transform collections into coherent wholes.

Like TEI files, EAD files are comprised of two sections: 1) a header and 2) a body. The header contains a whole lot or very little metadata of various types: bibliographic, administrative, provenance, etc. Some of this metadata is in the form of lists, and some of it is in the form of narratives. More than TEI files, EAD files are intended to be displayed on a computer screen or printed on paper. This is why you will find many XSL files transforming EAD into either HTML or FO (and then to PDF).

RDF

RDF is an acronym for Resource Description Framework. It is a data model intended to describe just about anything. The data model is based on an idea called triples, and as the name implies, the triples have three parts: 1) subjects, 2) predicates, and 3) objects.

Subjects are always URIs (think URLs), and they are the things described. Objects can be URIs or literals (words, phrases, or numbers), and objects are the descriptions. Predicates are also always URIs, and they denote the relationship between the subjects and the objects.
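
A minimal sketch of a couple of triples, built with Python’s rdflib and using the (real) Dublin Core element set for the predicates; the subject URI is invented for illustration:

    from rdflib import Graph, Literal, Namespace, URIRef

    DC = Namespace("http://purl.org/dc/elements/1.1/")
    g = Graph()
    g.bind("dc", DC)

    # Subject (a URI), predicate (a URI), object (a literal).
    subject = URIRef("http://example.org/documents/declaration")
    g.add((subject, DC.title, Literal("Declaration of Independence")))
    g.add((subject, DC.date, Literal("1776")))

    # RDF was originally serialized as XML; Turtle is a terser alternative.
    print(g.serialize(format="xml"))
    print(g.serialize(format="turtle"))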

The idea behind RDF was this: Describe anything and everything in RDF. Reuse as many of the URIs used by other people as possible. Put the RDF on the Web. Allow Internet robots/spiders to harvest and cache the RDF. Allow other computer programs to ingest the RDF, analyze it for the similar uses of subjects, predicates, and objects, and in turn automatically uncover new knowledge and new relationships between things.

RDF is/was originally expressed as XML, but the wider community had two problems with RDF. First, there were no “killer” applications using RDF as input, and second, RDF expressed as XML was seen as too verbose and too confusing. Thus, the idea of RDF languished. More recently, RDF is being expressed in other forms such as JSON and Turtle and N3, but there are still no killer applications.

You will hear the term “linked data” in association with RDF, and linked data is the process of making RDF available on the Web.

RDF is important for libraries and “memory” or “cultural heritage” institutions, because the goal of RDF is very similar to the goals of libraries, archives, and museums.

MARC

The MARC standard has been the bibliographic bread & butter of Library Land since the late 1960’s. When it was first implemented it was an innovative and effective data structure used primarily for the production of catalog cards. With the increasing availability of computers, somebody got the “cool” idea of creating an online catalog. While logical, the idea did not mature with a balance of library and computing principles. To make a long story short, library principles prevailed and the result has been and continues to be painful for both the profession as well as the profession’s clientele.

MARCXML was intended to provide a pathway out of this morass, but since it was designed from the beginning to be “round-tripable” with the original MARC standard, all of the short-comings of the original standard have come along for the ride. The Library of Congress was aware of these short-comings, and consequently MODS was designed. Unlike MARC and MARCXML, MODS has no character limit and its field names are human-readable, not based on numeric codes. Given that MODS is a flavor of XML, all of this is a giant step forward.

Unfortunately, the library profession’s primary access tools — the online catalog and “discovery system” — still heavily rely on traditional MARC for input. Consequently, without a wholesale shift in library practice, the intellectual capital the profession so dearly wants to share is figuratively locked in the 1960’s.

Not a panacea

XML really is an excellent technology, and it is most certainly apropos for the work of cultural heritage institutions such as libraries, archives, and museums. This is true for many reasons:

  1. it is computing platform independent
  2. it requires a minimum of computer technology to read and write
  3. to some degree, it is self-documenting, and
  4. especially considering our profession, it is all about data, information, and knowledge

On the other hand, it does have a number of disadvantages, for example:

  1. it is verbose — not necessarily succinct
  2. while easy to read and write, it can be difficult to process
  3. like all things computer program-esque, it imposes a set of syntactical rules, which people can sometimes find frustrating
  4. its adoption as standard has not been as ubiquitous as desired

To date you have learned how to read, write, and process XML and a number of its specific “flavors”, but you have by no means learned everything. Instead you have received a more than adequate introduction. Other XML topics of importance include:

  • evolutions in XSLT and XPath
  • XML-based databases
  • XQuery, a standardized method for querying sets of XML similar to the standard query language of relational databases
  • additional XML vocabularies, most notably RSS
  • a very functional way of making modern Web browsers display XML files
  • XML processing instructions as well as reserved attributes like xml:lang

In short, XML is not a panacea, but it is an excellent technology for library work.

Summary

You have all but concluded a course on XML in libraries, and now is a good time for a summary.

First of all, XML is one of culture’s more recent attempts at formalizing knowledge. At its root (all puns intended) is data, such as a number like 1776. Through mark-up we might say this number is a year, thus turning the data into information. By putting the information into context, we might say that 1776 is when the Declaration of Independence was written and a new type of government was formed. Such generalizations fall into the realm of knowledge. To some degree, XML facilitates the transformation of data into knowledge. (Again, all puns intended.)

Second, understand that XML is also a data structure defined by the characteristics of well-formedness. By that I mean XML has one and only one root element. Elements must be opened and closed in a hierarchical manner. Attributes of elements must be quoted, and a few special characters must always be escaped. The X in XML stands for “extensible”, and through the use of DTDs and schemas, specific XML “flavors” can be specified.

With this under your belts you then experimented with at least a couple of XML flavors: TEI and EAD. The former is used to mark up literature. The latter is used to describe archival collections. You then learned about the XML transformation process through the application of XSL and XPath, two rather difficult technologies to master. Lastly, you made strong efforts to apply the principles of XML to the principles of librarianship by marking up sets of documents or creating your own knowledge entity. It is hoped you have made a leap from mere technology to system. It is not about Oxygen nor graphic design. It is about the chemistry of disseminating data as unambiguously as possible for the purposes of increasing the sphere of knowledge. With these things understood, you are better equipped to practice librarianship in the current technological environment.

Finally, remember, there is no such thing as a Dublin Core record.

Epilogue — Use and understanding

This course in XML was really only an introduction. You were expected to read, write, and transform XML. This process turns data into information. All of this is fine, but what about knowledge?

One of the original reasons texts were marked up was to facilitate analysis. Researchers wanted to extract meaning from texts. One way to do that is to do computational analysis against text. To facilitate computational analysis, people thought it was necessary for essential characteristics of a text to be delimited. (It is/was thought computers could not really do natural language processing.) How many paragraphs exist? What are the names in a text? What about places? What sorts of quantitative data can be statistically examined? What main themes does the text include? All of these things can be marked up in a text and then counted (analyzed).

Now that you have marked up sets of letters with persname elements, you can use XPath to not only find persname elements but count them as well. Which document contains the most persnames? What are the persnames in each document? Tabulate their frequency. Do this over a set of documents to look for trends across the corpus. This is only a beginning, but entirely possible given the work you have already done.
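
A minimal sketch of that kind of tabulation, using lxml and assuming a hypothetical directory of marked-up letters:

    from collections import Counter
    from glob import glob
    from lxml import etree

    tally = Counter()
    for path in glob("letters/*.xml"):  # hypothetical directory of marked-up letters
        tree = etree.parse(path)
        # local-name() sidesteps namespace prefixes, which vary between projects.
        names = tree.xpath("//*[local-name()='persname']/text()")
        tally.update(name.strip() for name in names)

    # The most frequently mentioned people across the whole corpus.
    for name, count in tally.most_common(10):
        print(count, name)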

Libraries do not facilitate enough quantitative analysis against our content. Marking things up in XML is a good start, but let’s go to the next step. Let’s figure out how the profession can move its readership from discovery to analysis — towards use & understanding.

Eric Lease Morgan: Mr. Serials continues

Wed, 2016-01-06 16:42

The (ancient) Mr. Serials Process continues to support four mailing list archives, specifically, the archives of ACQNET, Colldv-l, Code4Lib, and NGC4Lib, and this posting simply makes the activity explicit.

Mr. Serials is/was a process I developed quite a number of years ago as a method for collecting, organizing, and archiving electronic journals (serials). The process worked well for a number of years, until electronic journals were no longer distributed via email. Now-a-days, Mr. Serials only collects the content of a few mailing lists. That’s okay. Things change. No big deal.

On the other hand, from a librarian’s and archivist’s point-of-view, it is important to collect mailing list content in its original form — email. Email uses the SMTP protocol. The communication sent back and forth, between email server and client, is well-structured albeit becoming verbose. Probably “the” standard for saving email on a file system is called mbox. Given an mbox file, it is possible to use any number of well-known applications to read/write mbox data. Heck, all you need is a text editor. Increasingly, email archives are not available from mailing list applications, and if they are, then they are available only to mailing list administrators and/or in a proprietary format. For example, if you host a mailing list on Google, can you download an archive of the mailing list in a form that is easily and universally readable? I think not.

Mr. Serials circumvents this problem. He subscribes to mailing lists, saves the incoming email to mbox files, and processes the mbox files to create searchable/browsable interfaces. The interfaces are not hugely aesthetically appealing, but they are more than functional, and the source files are readily available. Just ask.

Most recently both the ACQNET and Colldv-l mailing lists moved away from their hosting institutions to servers hosted by the American Library Association. This is not the first time these lists have moved. It probably won’t be the last, but since Mr. Serials continues to subscribe to these lists, comprehensive archives persevere. Score a point for librarianship and the work of archives. Long live Mr. Serials.

Terry Reese: Heads Up: MarcEdit Linked Data Components Update (all versions) scheduled for this evening

Wed, 2016-01-06 16:18

A heads up to those folks using MarcEdit and using the following components:

  • Validate Headings
  • Build Links
  • Command-Line tool using the build links option

These components rely on MarcEdit’s linked data framework to retrieve semantic data from a wide range of vocabulary services.  I’ll be updating the shared component behind these tools in order to improve performance and how they interact with the Library of Congress’s id.loc.gov service.  This will provide a noticeable improvement on the MarcEdit side (with response time cut by a little over two-thirds) and will make MarcEdit much more friendly to the LC id.loc.gov service.  Given the wide range of talks at Midwinter this year discussing experimentations related to embedding semantic data into MARC records and the role MarcEdit is playing in that work – I wanted to make sure this was available prior to ALA.

Why the change

When MarcEdit interacts with id.loc.gov, its communications are nearly always just HEAD requests.  This is because, over the past year or so, the folks at LC have been incredibly responsive, adding to their response headers nearly all the information someone might need if they are just interested in looking up a controlled term and finding out:

  1. Whether it exists
  2. Its preferred label
  3. Its URI

Prior to the header lookup, this had to be done using a different API, which resulted in two requests – one to the API, and then one to the XML representation of the document for parsing.  By moving the most important information into the document headers (X- headers), I can minimize the amount of data I’m having to request from LC.  And that’s a good thing – because LC tends to have strict guidelines around how often and how much data you are allowed to request from them at any given time.  In fact, were it not for LC’s willingness to allow me to bypass those caps when working with this service, a good deal of the new functionality being developed into the tool simply wouldn’t exist.  So, if you find the linked data work in MarcEdit useful, you shouldn’t be thanking me – this work has been made possible by LC and their willingness to experiment with id.loc.gov. 

Anyway – the linked data tools have been available in MarcEdit for a while, and they are starting to generate significant traffic on the LC side of things.  Adding the Validate Headings tool only exacerbated this – enough so that LC has been asking if I could do some things to help throttle the requests coming from MarcEdit.  So, we are working on some options – but in the meantime, LC noticed something odd in their logs.  While MarcEdit only makes HEAD requests, and only processes the information from that request – they were seeing 3 requests showing up in their logs. 

Some background on the LC service — it performs a lot of redirection.  One request to the label service results in ~3 redirects.  All the information MarcEdit needs is found in the first request, but when looking at the logs, they could see MarcEdit following the redirects, resulting in 2 more HEAD requests for data that the tool was simply throwing away.  This means that in most cases, a single request for information was generating 3 HEAD requests – and if you take a file of 2,000 records, with ~5 headings to be validated (on average) – that means MarcEdit would generate ~30,000 requests (10,000 x 3).  That’s not good – and when LC approached me to ask why MarcEdit was asking for the other data files, I didn’t have an answer.  It wasn’t until I went to the .NET documentation that the answer became apparent.

As folks should know, MarcEdit is developed using C#, which means it utilizes .NET.  Network interactions are handled in the System.Net namespace – specifically, by the System.Net.HttpWebRequest class.  Here’s the function:

public System.Collections.Hashtable ReadUriHeaders(string uri, string[] headers)
{
    System.Net.ServicePointManager.DefaultConnectionLimit = 10;
    System.Collections.Hashtable headerTable = new System.Collections.Hashtable();

    uri = System.Uri.EscapeUriString(uri);
    //after escape -- we need to catch ? and &
    uri = uri.Replace("?", "%3F").Replace("&", "%26");

    System.Net.WebRequest.DefaultWebProxy = null;
    System.Net.HttpWebRequest objRequest = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(MyUri(uri));
    objRequest.UserAgent = "MarcEdit 6.2 Headings Retrieval";
    objRequest.Proxy = null;

    //Changing the default timeout from 100 seconds to 30 seconds.
    objRequest.Timeout = 30000;

    //System.Net.HttpWebResponse objResponse = null; //.Create(new System.Uri(uri));
    objRequest.Method = "HEAD";

    try
    {
        using (var objResponse = (System.Net.HttpWebResponse)objRequest.GetResponse())
        {
            //objResponse = (System.Net.HttpWebResponse)objRequest.GetResponse();
            if (objResponse.StatusCode == System.Net.HttpStatusCode.NotFound)
            {
                foreach (string name in headers)
                {
                    headerTable.Add(name, "");
                }
            }
            else
            {
                foreach (string name in headers)
                {
                    if (objResponse.Headers.AllKeys.Contains(name))
                    {
                        string orig_header = objResponse.Headers[name];
                        byte[] b = System.Text.Encoding.GetEncoding(28591).GetBytes(orig_header);
                        headerTable.Add(name, System.Text.Encoding.UTF8.GetString(b));
                    }
                    else
                    {
                        headerTable.Add(name, "");
                    }
                }
            }
        }
        return headerTable;
    }
    catch (System.Exception p)
    {
        foreach (string name in headers)
        {
            headerTable.Add(name, "");
        }
        headerTable.Add("error", p.ToString());
        return headerTable;
    }
}

It’s a pretty straightforward piece of code – the tool looks up a URI, reads the headers, and outputs a hash of the values.  There doesn’t appear to be anything in the code that would explain why MarcEdit was generating so many requests (this function is only called once per item).  But looking at the documentation – well, there is.  The HttpWebRequest object has a property, AllowAutoRedirect, and it’s set to true by default.  This tells the component that a web request can be automatically redirected, up to the value set in MaximumAutomaticRedirections.  Since every request to the LC service generates redirects, MarcEdit was following them and just tossing the data.  So that was my problem.  Allowing redirects is a fine assumption to make for a lot of things – but for my purposes, not so much.  It’s an easy fix – I added a parameter to the function signature, one that is set to false by default, and then use that value to set the AllowAutoRedirect bit.  This way I can allow redirects when I need them, but turn them off by default when I don’t (which is almost always).  Once finished, I tested against LC’s service and they confirmed that this reduced the number of HEAD requests.  On my side, I noticed that things were much, much faster.  On the LC side, they are pleased because MarcEdit generates a lot of traffic, and this should help to reduce and focus that traffic.  So win, win, all around.
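In code terms, the fix boils down to something like the sketch below; the method name, signature, and defaults are illustrative rather than MarcEdit’s actual implementation, and error handling is omitted:

// A minimal sketch of the AllowAutoRedirect change described above.
// Names and defaults are illustrative, not MarcEdit's actual code.
public static class HeadRequestSketch
{
    public static System.Net.WebHeaderCollection ReadHeaders(string uri, bool allowRedirects = false)
    {
        var objRequest = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(uri);
        objRequest.Method = "HEAD";

        // The key line: do not follow the ~3 redirects the id.loc.gov label
        // service issues, since the first response already carries the headers
        // the caller needs.
        objRequest.AllowAutoRedirect = allowRedirects;

        using (var objResponse = (System.Net.HttpWebResponse)objRequest.GetResponse())
        {
            return objResponse.Headers;
        }
    }
}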

What does this mean

So what this means is that I’ll be posting an update this evening.  It will include a couple of tweaks based on feedback from the update this past Sunday – but most importantly, it will include this change.  If you use the linked data tools or the Validate Headings tool, you will want to update.  I’ve updated MarcEdit’s user agent string, so LC will now be able to tell if a user is running a version of MarcEdit that includes the fix.  If you aren’t, and you are generating a lot of traffic, don’t be surprised if they ask you to update. 

The other thing that I think it shows (and this I’m excited about) is that LC really has been incredibly accommodating when it has come to using this service.  Rather than telling me that MarcEdit needed to start following LC’s data request guidelines for the id.loc.gov service (which would make this service essentially useless), they worked with me to figure out what was going on so we could find a solution that everyone is happy with.  And like I said, we are both aware that as more users hit the service, there will be a need to throttle those requests globally, so we are talking about how that might be done. 

For me, this type of back and forth has been incredibly refreshing and somewhat new.  It certainly has never happened when I’ve spoken to any ILS vendor or data provider (save for members of the Koha and OLE communities) – and it gives me some hope that just maybe we can all come together and make this semantic web thing actually work.  The problem with linked data is that unless there is trust – trust in the data and trust in the service providing the data – it just doesn’t work.  And honestly, I’ve had concerns that in Library land there are very few services that I feel you could actually trust (and that includes OCLC at this point).  Service providers are slowly wading in – but these types of infrastructure components take resources – lots of resources – and they are invisible to the user…or rather, when they are working, they are invisible.  Couple that with the fact that these services are infrastructure components, not profit engines, and it’s not a surprise that so few services exist, and that the ones that do are not designed to support real-time, automated lookup.  When you realize that this is the space we live in right now, it makes me appreciate the folks at LC, and especially Nate Trail, all the more.  Again, if you happen to be at ALA and find these services useful, you really should let them know.

Anyway – I started the process to run tests and then build this morning before heading off to work.  So, sometime this evening, I’ll be making this update available.  However, given that these components are becoming more mainstream and making their way into authority workflows – I wanted to give a heads up.

Questions – let me know.

–tr

Islandora: Islandora Camp BC 2016!

Wed, 2016-01-06 15:11

Islandora is going back to British Columbia this year! July 18 - 20, Vancouver will become the second city to host a repeat Islandora Camp, following our wonderful first BC camp in February 2015. Our host this time is the British Columbia Electronic Library Network, a partnership between the Province of British Columbia and its post-secondary libraries. Registration and a Call for Proposals will be coming in a couple of months, on the camp webpage. British Columbia has a very active community of Islandora users and we hope they will join us as we return to their neighbourhood. This will also serve as both the West Coast and Canadian Islandora camp for 2016. Seating is limited, so watch for our announcement when registration opens!

We hope we'll see you here in July!

Photo by: JamesZ_Flickr

LITA: 3D Printer Handyman’s Toolbox

Wed, 2016-01-06 13:00

On this site, we have discussed how 3D Printers can enhance various aspects of your library’s programming and how to create important partnerships for implementation. Indeed, 3D Printers can improve the library experience for all involved. However, what happens when that printer comes to a screeching/beeping halt? After two years of maintaining our printers, Makerbot Replicator 2 and Tinkerine Ditto Pro, and thanks to the kind donations of library patrons, I have assembled a toolbox that has eased daily maintenance and disassembly.

The post is broken up into sections covering tools for the following aspects:

  • Plate
  • Fine Tuning Prints
  • Gripping
  • Disassembly

Each section also looks at pricing for these tools and alternatives.

Tools for: Plate

THE TAPE

Not all of us can afford to wait for flexible platforms and we must make do with laying down some painter’s tape to ease the object removal. At first we would use standard-sized tape which would require about 8 strips to fully cover either platform. A few months ago someone graciously donated a roll of 3M ScotchBlue Painter’s Tape Superwide Roll that has made it just a two strip process:

As you can see, the Superwide covers the majority of the Makerbot plate and a single strip of regular sized painter’s tape finishes the job.

Layout is also important. We place the thinner strip at the forefront, where both printers clear their extruders before starting a print job. This allows us to replace the highly used section of the platform at near-daily intervals, while the larger portion is only replaced when there is significant wear. This is also a cost-saving measure, as the Superwide variety can run nearly $50 per roll, so minimizing its replacement is vital. This method allows me to stretch two rolls over four months, despite the printers actively turning out 3-4 jobs per day. If you are tight on funds, 3DXTech’s XL Blue Painters Tape is also worth considering; it is half the price but has mixed reviews.

THE SCRAPER

Both of our printers use 1.75mm PLA filament, and while that does give some flexibility in removing items from the platform, it can still be a pain to remove flat, thin objects. Paint scraper to the rescue! At first we tried a cheap plastic scraper set (the red ones), but their edges were too soft. Upgrading to a metal 1.5 inch scraper provided much better results. While it does have a tendency to damage the tape, thus reducing the re-usability of said tape, it provides enough strength to wedge a gap in mere seconds. As for pricing, metal 1.5 inch scrapers can be found for less than $5.

Tools for: Fine Tuning Prints

For print jobs that required a bit of polish, we turned to three trusty friends: scissors, X-ACTO knife, and sandpaper.

The small scissor allows you to cut through thin pieces, such as supports or excess filament. The X-ACTO knife is called in for situations like stubborn raft pieces. Finally, the sandpaper can smooth out imperfections. Combined, all three will give your print a much finer look and prevent broken nails. Additionally, all of these pieces are easy to come by and should be no more than a few dollars each.

Tools for: Gripping

These tools are recommended for any situation that requires a delicate and firm grip. For the most part this occurs when small bits of filament are left in and around the extruder. In these situations we use a wide range of tweezers and pincers. These tools are easily found in any craft store or online. We found the flat head tweezers to provide the best grip on filament. Once again, these tools are quite affordable at around $3-5 each. Pick up one for now and add to your collection as required.

 

Tools for: Disassembly

The web likely has an abundance of video and text resources on how to fix the exact issue with your printer. Resist the temptation of these “quick fixes” and check your manufacturer’s support site first. I found quite a few answers to our Makerbot Replicator 2 problems on their excellent support site. However, you should also be aware that the day will come when you will need to disassemble the magical device.

For the most part, 3D Printers come with everything you need. What they can lack are clear instructions on how to disassemble them, let alone which tools you need, and even such great resources as ifixit.com fail to cover this vital maintenance aspect. It should be noted that the following tools are only needed if your warranty is already void. Some providers’ warranties will not even let you open simple housing areas. Read those warranty guidelines, which might be conveniently located in the bottom of the now-discarded box, or on their support sites. All clear?

SCREWDRIVERS – Phillips & Flathead

Alright, so the first thing you want to purchase is a screwdriver set that contains a variety of bit sizes. This allows you to tackle devices whose manufacturer decided that a different size was required for the shell, extruder, panels, board, and warning label. Again, you should really read any and all support documentation before attempting to use these tools. Our aptly named Precision Screwdriver Set, which covers 1.0mm through 3.0mm, has opened up a few areas on the machines. Just do a quick search on Amazon and find a set that fits your budget and needs, such as the $7 Herco HE826 Precision Screwdriver Set or the $5 Stanley 66-039 6-Piece Set.

SCREWDRIVERS – Hex

We also ran into areas, namely the extruder, which required a screwdriver with hex bits. For these advanced areas (did I mention you should really read those support documents?) we turned to the 54 Bit Driver Kit from iFixit.com’s store. This set also includes a flexible driver that made it easy to work with in the printer’s cramped areas.

The set is perfect for 3D Printers as it also contains flathead and Phillips bits of all sizes; it can easily cover the majority of your disassembly needs. At $25 it is definitely the priciest of the three sets, but I highly recommend it due to the quality of the tools and the diverse sizes.

Final Cost

After all that, you are probably asking yourself: “how much is this going to cost me?” Let me break it down for you in a handy table.

RECOMMENDED TOOLBOX

Item                                          Cost
3M ScotchBlue Painter’s Tape Superwide Roll   $50
Metal 1.5 inch Scraper                        $5
Small Scissor                                 $5
X-Acto #2 Knife                               $6
Flat Tweezer                                  $5
54 Bit Driver Kit                             $25
Total                                         $96

ALTERNATIVE TOOLBOX

Item                                          Cost
3DXTech’s XL Blue Painters Tape               $25
Metal 1.5 inch Scraper                        $5
Small Scissor                                 $5
X-Acto #2 Knife                               $6
Flat Tweezer                                  $5
Stanley 66-039 6-Piece Screwdriver Set        $6
Total                                         $52

The contents and size of your 3D Printer toolbox will come down to your needs and the model you use. I am allowed some freedom in disassembling our printers to fix small issues, like filament jams, and the high use of our machines means I am changing out the plate tape every few hours. Both of these requirements are reflected in the higher-quality (and thus higher-priced) emphasis of the plate and disassembly sections. You might find that your printer needs finer tweezers to reach certain areas, or different tools for other functions. A 3D Printer is a massive financial and time investment, so remember to save some funds to ease your interaction with it.

Meredith Farkas: Reputation is everything: On problematic editing and sponsored content

Wed, 2016-01-06 02:04

First, full disclosure: I am a columnist for American Libraries. They pay me to write columns every other month in which I state my opinion on various things relating somewhat to technology. What I’m writing here is my own opinion and represents me alone.

As a professional librarian and a writer, I take my credibility very seriously. I have made every effort to be ethical and true to myself, even when doing so has bitten me in the ass. I have never wanted anyone to doubt that what I am saying or writing is motivated by anything other than my desire to serve the profession. I’ve said no to sponsorship offers on my blog, to paid blogging gigs, and to an offer to buy the Library Success Wiki. I turned down a lucrative job offer at a certain vendor many years ago because I knew that the second I took the job, I would lose some of my professional credibility and I honestly valued it more than money. I really don’t believe that makes me a better person than anyone else. Maybe a more vain one?

Because of that though, I completely understand why Patricia Hswe and Stewart Varner were not happy about the fact that their American Libraries article had been edited to include quotes from a specific vendor: Gale Cengage. We are only as good as our reputations; our credibility. Everything in an article reflects upon its author(s); the work of the editor is invisible to the reader. So without the context provided by Varner in his blog post (and good for them for sticking up for themselves), we might assume that the authors were perfectly comfortable with promoting one specific vendor in their efforts to serve library digital humanities efforts. And I’m sure neither of them wanted to look like Gale Cengage shills, especially not for what one gets paid to write an article in our profession. If any one of us feels they can be bought by a vendor, he or she should hold out for more than t-shirts, swag, a nice dinner, or a few bucks (and I’ve written about the problematic relationships librarians have with vendors in the past, so I won’t rehash that here).

When I wrote an article for The New Republic, the title was changed to “The Next Librarian of Congress Should Be an Actual Librarian.” I didn’t even make that argument in the article! The minute I saw the title, I knew people would have issues with it because “actual librarian” is a rather loaded term (that a magazine editor would not know about). While I got some crap about it, it wasn’t anything that might impact the way I am seen by others. So I really do feel for Patricia and Stewart, because something like this (without their explanation) could impact their reputations.

My experiences with the ALA Publishing Staff have always been positive. Only once or twice did I ever receive pushback on a column, and that was a very long time ago. Only once did anything get changed without first reviewing it with me, and that was just the title. After talking with my editor (who was new to working with me at the time), it never happened again. They have let me publish what I want, even when it’s not pro-vendor (case in point). Yes, a column is different than a journalistic article, because I am speaking my opinions as me, but when the person writing the journalistic article is a professional in our field, their reputation is also on the line with that article.

If you actually read the article, I think you’ll find that the two quotes from a Gale Cengage rep are not particularly incendiary, nor do they promote Gale Cengage all that strongly, but I also understand that’s not the point. This whole thing brings up two important issues for me:

1. They added content to the article that misrepresented the authors without checking with them.
2. In adding that content, they are promoting a particular vendor in what is supposed to be a journalistic article. It reeks of sponsored content.

I think if the magazine wants to publish articles that are favorable toward a particular vendor, it should not use librarians to do it, as it compromises their integrity (frankly, I think it compromises journalistic integrity too, but I’m just going to worry about my fellow librarians right now). But really, it just shouldn’t do that at all. If this is supposed to be a real publication, sponsored content (or anything biased favorably toward a particular vendor) should be clearly marked as such and should not look exactly like a regular article. If American Libraries or ALA gained materially in some way in exchange for the inclusion of this article (and I have no idea if they did), it should be clearly stated. This is basic journalistic ethics. David Weinberger wrote an interesting piece for HBR.com (a site which I hate to link to on moral grounds) on the ethics of sponsored content, and the Poynter Center has some good information on the ethics of the phenomenon as well. As Weinberger says, lots of media outlets (even our professional ones) are desperate for revenue and need vendor support to stay afloat, but it’s still important for sponsored content to adhere to certain ethical standards. And one of them certainly is that it should be made crystal clear that it is sponsored content.

I’m not going to demand that anyone at American Libraries be nailed to a cross, nor do I think anyone else should. They are good people who I believe really are motivated to do a good job and who made a regrettably poor decision in this case. I see the mob mentality on social media and the enjoyment people seem to take when they’ve found someone who has misbehaved, and it makes me queasy. I do think we need to be clear and rational in our criticisms so that they actually take what we’re saying seriously. I get the sense that the folks at American Libraries don’t understand why what they did was wrong, and I think saying terrible things about them or their publication (which is insulting to everyone else who writes for it) will not make that clear. I’m glad this has received attention from the bodies within ALA that oversee ALA publications, and I really hope action will be taken. I hope that American Libraries will review its editorial policies and practices.

I’ve enjoyed writing for American Libraries and have enjoyed the conversations that have been sparked from my columns. I know lots of people in our profession don’t read blogs and I’m honored to have a platform to share things I passionately believe in to a whole other audience. My friend Lisa Hinchliffe (who is running for ALA President — cue the shameless but uncompensated plug) made a comment on Twitter that made me realize how something like this could potentially reflect poorly on me and other American Libraries authors:

@joejanes @librarianmer I trust what you say but …your readers may not trust that any more … really big prob for all AL authors now.

— Lisa Hinchliffe (@lisalibrarian) January 5, 2016

That does worry me a bit. But I think highly of the people who work at American Libraries, and I hope they will take steps to make sure that the entire publication reflects strong ethical standards and safeguards the credibility of its authors.

NYPL Labs: Free for All: NYPL Enhances Public Domain Collections For Sharing and Reuse

Tue, 2016-01-05 23:07

Today we are proud to announce that out-of-copyright materials in NYPL Digital Collections are now available as high-resolution downloads. No permission required, no hoops to jump through: just go forth and reuse! 

The release of more than 180,000 digitized items represents both a simplification and an enhancement of digital access to a trove of unique and rare materials: a removal of administration fees and processes from public domain content, and also improvements to interfaces — popular and technical — to the digital assets themselves. Online users of the NYPL Digital Collections website will find more prominent download links and filters highlighting restriction-free content; while more technically inclined users will also benefit from updates to the Digital Collections API enabling bulk use and analysis, as well as data exports and utilities posted to NYPL's GitHub account. These changes are intended to facilitate sharing, research and reuse by scholars, artists, educators, technologists, publishers, and Internet users of all kinds. All subsequently digitized public domain collections will be made available in the same way, joining a growing repository of open materials.
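For a sense of what API-level access can look like, here is a rough sketch of a search call. The endpoint path, query parameters, and token header format are assumptions drawn from NYPL’s public API documentation and should be verified against the current docs (and the utilities on NYPL’s GitHub account) before use:

using System;
using System.Net;

class NyplApiSketch
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // The header format and endpoint below are assumptions based on NYPL's
            // public API documentation; check the current docs and substitute a real token.
            client.Headers.Add("Authorization", "Token token=\"YOUR_API_TOKEN\"");

            // Hypothetical search: public domain items matching a keyword.
            string url = "http://api.repo.nypl.org/api/v1/items/search?q=fifth+avenue&publicDomainOnly=true";

            Console.WriteLine(client.DownloadString(url));
        }
    }
}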

To encourage novel uses of our digital resources, we are also now accepting applications for a new Remix Residency program. Administered by the Library's digitization and innovation team, NYPL Labs, the residency is intended for artists, information designers, software developers, data scientists, journalists, digital researchers, and others to make transformative and creative uses of digital collections and data, and the public domain assets in particular. Two projects will be selected, receiving financial and consultative support from Library curators and technologists.

To provide further inspiration for reuse, the NYPL Labs team has also released several demonstration projects delving into specific collections, as well as a visual browsing tool allowing users to explore the public domain collections at scale. These projects, which suggest just a few of the myriad investigations made possible by fully opening these collections, include:

  • a "mansion builder" game, exploring floor plans of grand turn-of-the-century New York apartments; 
  • a then-and-now comparison of New York's Fifth Avenue, juxtaposing 1911 wide angle photographs with Google Street View; and
  • a "trip planner" using locations extracted from mid-20th century motor guides that listed hotels, restaurants, bars, and other destinations where Black travelers would be welcome.

The public domain release spans the breadth and depth of NYPL's holdings, including the Library's rich New York City collection, historic maps, botanical illustrations, unique manuscripts, photographs, ancient religious texts, and more. Materials include:

Visit nypl.org/publicdomain for information about the materials related to the public domain update and links to all of the projects demonstrating creative reuse of public domain materials. Go forth, and reuse!

 

Jonathan Rochkind: American Libraries adds Gale quotes in without author’s knowledge

Tue, 2016-01-05 22:38

From a blog post by Patricia Hswe and Stewart Varner.

TL;DR: Patricia Hswe and I wrote an article for American Libraries and the editors added some quotes from a vendor talking about their products without telling us. We asked them to fix it and they said no.

I guess Gale Cengage paid the ALA for placement or something? I can’t think of any other reason the ALA would commission an article which has the hard requirement of including quotes from Gale PR staff in it?

Sounds to me like one can’t trust the ALA to be an objective representative of our profession, if they’re accepting payment in return for quoting vendor PR staff in articles in their publication that are ostensibly editorial.

What they’ll do differently next time is make sure the authors are on the same page, or just use their own in-house authors instead of librarians. They’d rather use librarians because it makes the article look better; hey, that’s what Gale is (presumably) paying them for. Heck, they can probably get some librarians to go along with it too, alas.

I actually started to Google wondering “Hey, is American Libraries actually published by Gale Cengage? Because that would explain things…” before remembering “Wait a second, American Libraries is published by the ALA… aren’t they supposed to represent their members, not vendors?”



District Dispatch: ALA submits comments on Family Engagement and Early Learning

Tue, 2016-01-05 21:48

Parents and children reading together at the library. (Westport Library)

Yesterday, were we getting ready for ALA’s Midwinter conference that starts this week? Uh, no.

In fact we were putting the finishing touches on comments to the U.S. Departments of Health and Human Services and Education in response to their call for comments on their joint draft policy statement on family engagement and early learning. Their draft policy statement includes specific Principles of Effective Family Engagement Practices and recommendations for state and local action. The draft also reviews relevant research and includes an appendix with resources for planning engagement programs, professional development, and families.

We’re still reeling from the quick turnaround time but thanks to input and guidance from ALA’s youth divisions (AASL, ALSC, and YALSA) as well as PLA, we were able to develop strong examples of national, state, and local library programs and initiatives that support early learning, foster lifelong learning habits, and seek to engage parents and other caregivers during these important early years.

Given the historic commitment of libraries to foster learning at every stage in life for people from all backgrounds, we urged the Departments to recognize libraries as critical partners in advancing early learning opportunities and robust family engagement. Specifically, ALA recommended the Departments:

  • Include relevant library research and resources related to early learning and family engagement in the Principles to provide a stronger foundation for action at local, state and national levels;
  • Systematically review existing federal program guidelines and regulations to identify, coordinate, and increase opportunities to foster family engagement and rich learning experiences. Where appropriate, libraries and other government or non-profit entities should be eligible entities for relevant funding to support these efforts;
  • Convene national stakeholders to explore how they can support, coordinate, and invest in early learning programs to maximize impact at the federal level. The convening should develop and publicly release recommendations for action;
  • Advocate for mapping existing state and local assets and leveraging existing organizations before creating new entities so as to maximize efficiencies and effectiveness. Partnerships should be encouraged where possible to build capacity across entities; and
  • Explicitly include libraries as a resource for professional development and in-service training opportunities related to family engagement and early learning to leverage expertise and build stronger state and local connections.

Our interest in the policy statement stems from two efforts underway at OITP. The first is the Policy Revolution! initiative, which includes early learning and early literacy as one of the core areas of the National Policy Agenda for Libraries (pdf). The second is through the lens of the Program on Youth & Technology, which, among other issues, looks at the role of the library in providing informal learning opportunities rooted in STEAM and computational thinking activities.

