
Jonathan Rochkind: Non-digested asset names in Rails 4: Your Options

planet code4lib - Fri, 2014-10-03 16:48

Rails 4 removes the ability to produce non-digest-named assets in addition to digest-named assets (i.e. ‘application.css’ in addition to ‘application-810e09b66b226e9982f63c48d8b7b366.css’).

There are a variety of ways to work around this by extending asset compilation. After researching and considering them all, I chose to use a custom Rake task that uses the sprockets manifest.json file. In this post, I’ll explain the situation and the options.

The Background

The Rails asset pipeline, powered by sprockets, compiles (sass, coffeescript, others), aggregates (combines multiple source files into one file for performance purposes), and post-processes (minimization, gzip’ing) your assets.

It produces assets to be delivered to the client that are fingerprinted with a digest hash based on the contents of the file — such as ‘application-810e09b66b226e9982f63c48d8b7b366.css’.  People (and configuration) often refer to this filename-fingerprinting as “digested assets”.

The benefit of this is that because the asset filenames are guaranteed to change if their content changes, the individual files can be cached indefinitely, which is great. (You still probably need to adjust your web server configuration to take advantage of this, which you may not be doing).

In Rails 3, a ‘straight’-named copy of the assets (e.g. `application.js`) was also produced, alongside the fingerprinted digest-named assets.

Rails 4 stopped doing this by default, and also took away any ability to do it even as a configurable option. While I can’t find the thread now, I recall seeing discussion that in Rails 3, the production of non-digest-named assets was accomplished by actually asking sprockets to compile everything twice, which made asset compilation take roughly twice as long as it should. Which is indeed a problem.

Rather than looking to fix the Sprockets API to make it possible to compile the file once but simply write it twice, the Rails devs decided there was no need for the straight-named files at all, and simply removed the feature.

Why would you need straight-named assets?

Extensive and combative discussion on this feature change occurred in sprockets-rails issue #49.

The title of this issue reveals one reason people wanted the non-digest-named assets: “breaks compatibility with bad gems”. This mainly applies to gems that supply javascript which needs to generate links to assets, but wasn’t written to look up the current digest-named URLs. It’s really about javascript, not ‘gems’; it can apply to javascript you’ve included without gemifying it too.

The Rails devs expressing opinions on this issue believed (at least initially) that these ‘bad gems’ should simply be fixed; accommodating them was the wrong thing to do, as it eliminates the ability to cache-forever the assets they refer to.

I think they under-estimate the amount of work it can take to fix these ‘bad’ JS dependencies, which are often included through multi-level dependency trees (requiring getting patches accepted by multiple upstreams), and it also basically requires wrapping all JS assets in rubygems that apply sprockets/rails-specific patches on top, instead of, say, just using bower.

I think there’s a good argument for accommodating JS assets which the community has not yet had the time/resources to make respect the sprockets fingerprinting. Still, it is definitely preferable, and always at least theoretically possible, to make all your JS respect sprockets asset fingerprinting — and in most of my apps, I’ve done that.

But there’s other use cases: like mine!

I have an application that needs to offer a Javascript file at a particular stable URL, as part of its API — think JS “widgets”.

I want it to go through the asset pipeline, for source control, release management, aggregation, SASS, minimization, etc. The suggestion to just “put it in /public as a static asset” is no good at all. But I need the current version available at a persistent URL.

In Rails 3, this Just Worked, since the asset pipeline created a non-digested name. In Rails 4, we need a workaround. I don’t need every asset to have a non-digest-named version, but I do need a whitelist of a few that are part of my public API.

I think this is a pretty legitimate use case, and not one that can be solved by ‘fixing bad gems’. I have no idea if Rails devs recognize it or not.

(It’s been suggested that HTML emails linking to CSS stylesheets (or JS?) is another use case. I haven’t done that and don’t understand it well enough to comment. Oh, and other people want em for their static 500 error pages.)

Possible Workaround Options

So that giant Github Issue thread? At first it looks like just one of those annoying ones with continual argument by uninformed people that will never die, and eventually @rafaelfranca locked it. But it’s also got a bunch of comments with people offering their solutions, and is the best aggregation of possible workarounds to consider — I’m glad it wasn’t locked sooner. Another example of how GitHub qualitatively improves open source development — finding this stuff on a listserv would have been a lot harder.

The Basic Rake Task

Early in the thread, Rails core team member @guilleiguaran suggested a Rake task, which simply looks in the file system for fingerprinted assets and copies them over to the un-digest-named version. Rails core team member @rafaelfranca later endorsed this approach too. 
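A minimal sketch of that kind of task, in plain Ruby (the digest pattern and paths here are my assumptions, not the core team's exact code; Rails 3/4-era sprockets fingerprints were 32-character MD5 hex digests):

```ruby
# Naive approach: scan public/assets for fingerprinted files and copy
# each one to its non-digested name.
require "fileutils"

# Matches "application-810e09b66b226e9982f63c48d8b7b366.css" style names.
DIGEST_RE = /\A(.+)-[0-9a-f]{32}(\.\w+(?:\.gz)?)\z/

# Returns the plain name for a digested filename, or nil if not digested.
def undigested_name(filename)
  m = DIGEST_RE.match(filename)
  m && "#{m[1]}#{m[2]}"
end

Dir.glob("public/assets/**/*").each do |path|
  next unless File.file?(path)
  plain = undigested_name(File.basename(path))
  FileUtils.cp(path, File.join(File.dirname(path), plain)) if plain
end

# In a Rakefile you'd wrap the copy loop in a task and hook it in with
# something like:
#   Rake::Task["assets:precompile"].enhance do
#     Rake::Task["assets:copy_non_digested"].invoke
#   end
```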

The problem is it won’t work. I’ve got nothing against a rake task solution. It’s easy to wire things up so your new rake task automatically gets called every time after `rake assets:precompile`, no problem!

The problem is that a deployed Rails app may have multiple fingerprinted versions of a particular asset file around, representing multiple releases. And really you should set things up this way —  because right after you do a release, there may be cached copies of HTML (in browser caches, or proxying caches including a CDN) still around, still referencing the old version with the old digest fingerprint. You’ve got to keep it around for a while.

(How long? Depends on the cache headers on the HTML that might reference it. The fact that sprockets only supports keeping around a certain number of releases, and not releases made within a certain time window, is a different discussion. But, yeah, you need to keep around some old versions).

So it’s unpredictable which of the several versions you’ve got hanging around the rake task is going to copy to the non-digest-named version; there’s no guarantee it’ll be the latest one. (Maybe it depends on their lexicographic sort?) That’s no good.

Enhance the core-team-suggested rake task?

Before I realized this problem, I had already spent some time trying to implement the basic rake task, add a whitelist parameter, etc. So I tried to keep going with it after realizing this problem.

I figured, okay, there are multiple versions of the asset around, but sprockets and rails have to know which one is the current one (to serve it to the current application), so I must be able to use sprockets ruby API in the rake task to figure it out and copy that one.

  • It was kind of challenging to figure out how to get sprockets to do this, but eventually it was sort of working.
  • Except I started to get worried that I might be triggering the double compilation that Rails 3 did, which I didn’t want to do, and got confused about even figuring out whether I was doing it.
  • And I wasn’t really sure if I was using sprockets API meant to be public or internal. It didn’t seem to be clearly documented, and sprockets and sprockets-rails have been pretty churny, I thought I was taking a significant risk of it breaking in future sprockets/rails version(s) and needing continual maintenance.

Verdict: Nope, not so simple, even though it seems to be the rails-core-endorsed solution. 

Monkey-patch sprockets: non-stupid-digest-assets

Okay, so maybe we need to monkey-patch sprockets, I figured.

@alexspeller provides a gem to monkey-patch Sprockets to support non-digested-asset creation, the unfortunately combatively named non-stupid-digest-assets.

If someone else has already figured it out and packaged it in a gem, great! Maybe they’ll even take on the maintenance burden of keeping it working with churny sprockets updates!

But non-stupid-digest-assets just takes the same kind of logic from that basic rake task — another pass through all the assets post-compilation — and implements it with a sprockets monkeypatch instead of a rake task. It does add a whitelist. I can’t quite figure out if it’s still subject to the same might-end-up-with-older-version-of-asset problem.

There’s really no benefit to using a monkey patch instead of a rake task doing the same thing, and it has increased risk of breaking with new Rails releases. Some have already reported it not working with the Rails 4.2 betas — I haven’t investigated myself to see what’s up with that, and @alexspeller doesn’t seem to be in any hurry to either.

Verdict: Nope. non-stupid-digest-assets ain’t as smart as it thinks it is. 

Monkey-patch sprockets: The right way?

If you’re going to monkey-patch sprockets and take on forwards-compat risk, why not actually do it right, and make sprockets simply write the compiled file to two different file locations (and/or use symlinks) at the point of compilation?

@ryana suggested such code. I’m not sure how tested it is, and I’d want to add the whitelist feature.

At this point, I was too scared of the forwards-compatibility-maintenance risks of monkey patching sprockets, and realized there was another solution I liked better…

Verdict: It’s the right way to do it, but carries some forwards-compat maintenance risk as an unsupported monkey patch

Use the Manifest, Luke, erm, Rake!

I had tried and given up on using the sprockets ruby api to determine ‘current digest-named asset’.  But as I was going back and reading through the Monster Issue looking for ideas again, I noticed @drojas suggested using the manifest.json file that sprockets creates, in a rake task.

Yep, this is where sprockets actually stores info on the current digest-named-assets. Forget the sprockets ruby api, we can just get it from there, and make sure we’re making a copy (or symlinking) the current digested version to the non-digested name.
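A sketch of that manifest-driven approach (this isn't my exact task; the whitelist contents and manifest location are illustrative, and the structure is as Sprockets 2.x writes it — a `manifest-<hash>.json` in `public/assets` whose "assets" hash maps logical names to current digested filenames):

```ruby
# Copy each whitelisted asset's current digested file to its plain name,
# using the manifest as the source of truth for "current".
require "json"
require "fileutils"

def copy_non_digested(manifest, assets_dir, whitelist)
  manifest.fetch("assets", {}).each do |logical, digested|
    next unless whitelist.include?(logical)
    FileUtils.cp(File.join(assets_dir, digested), File.join(assets_dir, logical))
  end
end

# In the rake task you'd locate and parse the manifest roughly like this
# (the glob pattern is an assumption about the manifest filename):
#   manifest_path = Dir.glob("public/assets/manifest*.json").first
#   manifest = JSON.parse(File.read(manifest_path))
#   copy_non_digested(manifest, "public/assets", ["widget.js"])
```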

But are we still using private api that may carry maintenance risk with future sprockets versions?  Hey, look, in a source code comment Sprockets tells us “The JSON is part of the public API and should be considered stable.” Sweet!

Now, even if sprockets devs remember one of them once said this was public API (I hope this blog post helps), and even if sprockets is committed to semantic versioning, that still doesn’t mean it can never change. In fact, the way some of rubydom treats semver, it doesn’t even mean it can’t change soon and frequently; it just means they’ve got to update the sprockets major version number when it changes. Hey, at least that’d be a clue.

But note that changes can happen in between Rails major releases. Rails 4.1 uses sprockets-rails 2.x which uses sprockets 2.x. Rails 4.2 — no Rails major version number change — will use sprockets-rails 3.x which, oh, still uses sprockets 2.x, but clearly there’s no commitment on Rails not to change sprockets-rails/sprockets major versions without a Rails major version change.

Anyway, what can you do, you pays your money and you takes your chances. This solution seems pretty good to me.

Here’s my rake task, just a couple dozen lines of code, no problem.

 Verdict: Pretty decent option, best of our current choices

The Redirect

One more option is using a redirect to take requests for the non-digest-named asset, and redirect it to the current digest-named asset.

@Intrepidd suggests using rack middleware to do that. I think it would also work to just use a Rails route redirect with a lambda. (I’m kind of allergic to middleware.) Same difference either way as far as what your app is doing.
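A route-level sketch might look something like this (the `/widget.js` path is made up; `asset_path` is the standard Rails helper, which returns the digest-named path when digests are enabled; note that route redirects default to 301, so you'd want a temporary status here):

```ruby
# config/routes.rb -- a sketch, not production code. Requests for the
# stable URL get redirected to the current digest-named asset.
get "/widget.js", to: redirect(status: 302) { |_params, _request|
  ActionController::Base.helpers.asset_path("widget.js")
}
```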

I didn’t really notice this one until I had settled on The Manifest. It requires two HTTP requests every time a client wants the asset at the persistent URL, though. The first one will touch your app and needs a short cache time; it will then redirect to the digest-named asset, which will be served directly by the web server and can be cached forever. I’m not really sure if the performance implications are significant; it probably depends on your use cases and request volume. @will-r suggests it won’t work well with CDNs, though.

Verdict: Meh, maybe, I dunno, but it doesn’t feel right to introduce the extra latency

The Future

@rafaelfranca says Rails core has changed their mind and are going to deal with “this issue” “in some way”. Although I don’t think it made it into Rails 4.2 after all.

But what’s “this issue” exactly? I dunno, they are not sharing what they see as the legitimate use cases to handle, and requirements on legitimate ways to handle em.

I kinda suspect they might just be dealing with the “non-Rails JS that needs to know asset URLs” issue, and considering some complicated way to automatically make it use digest-named assets without having to repackage it for Rails.  Which might be a useful feature, although also a complicated enough one to have some bug risks (ah, the story of the asset pipeline).

And it’s not what I need, anyway; there are other use cases than the “non-Rails JS” one that need non-digest-named assets.

I just need sprockets to produce parallel non-digested asset filenames for certain whitelisted assets. That really is the right way to handle it for my use case. Yes, it means you need to know the implications and how to use cache headers responsibly. If you don’t give me enough rope to hang myself, I don’t have enough rope to climb the rock face either. I thought Rails target audience was people who know what they’re doing?

It doesn’t seem like this would be a difficult feature for sprockets to implement (without double compilation!). @ryana’s monkeypatch seems like pretty simple code that is most of the way there. It’s the feature that I need.

I considered making a pull request to sprockets (the first step, then probably sprockets-rails, needs to support passing on the config settings).  But you know what, I don’t have the time or psychic energy to get in an argument about it in a PR; the Rails/sprockets devs seem opposed to this feature for some reason.  Heck, I just spent hours figuring out how to make my app work now, and writing it all up for you instead!

But, yeah, just add that feature to sprockets, pretty please.

So, if you’re reading this post in the future, maybe things will have changed, I dunno.

Filed under: General

Library of Congress: The Signal: The Library of Congress Wants You (and Your File Format Ideas)

planet code4lib - Fri, 2014-10-03 16:24

“Uncle Sam Needs You” painted by James Montgomery Flagg

In June of this year, the Library of Congress announced a list of formats it would prefer for digital collections. This list of recommended formats is an ongoing work; the Library will be reviewing the list and making revisions for an updated version in June 2015. Though the team behind this work continues to put a great deal of thought and research into listing the formats, there is still one more important component needed for the project: the Library of Congress needs suggestions from you.

This request is not half-hearted. As the Library increasingly relies on the list to identify preferred formats for acquisition of digital collections, no doubt other institutions will adopt the same list. It is important, therefore, that as the Library undertakes this revision of the recommended formats, it conducts a public dialog about them in order to reach an informed consensus.

This public dialog includes librarians, library students, teachers, vendors, publishers, information technologists — anyone and everyone with an opinion on the matter and a stake in preserving digital files. Collaboration is essential for digital preservation. No single institution can know everything and do everything alone. This is a shared challenge.

Librarians, what formats would you prefer to receive your digital collections in? What file formats are easiest for you to process and access? Publishers and vendors, what format do you think you should create your digital publications in if you want your stuff to last and be accessible into the future? The time may come when you want to re-monetize a digital publication, so you want to ensure that it is accessible.

Those are general questions, of course. Let’s look at the specific file formats the Library has selected so far. The preferred formats are categorized by:

  • Textual Works and Musical Compositions
  • Still Image Works
  • Audio Works
  • Moving Image Works
  • Software and Electronic Gaming and Learning
  • Datasets/Databases

Take, for example, digital photographs. Here is the list of formats the Library would most prefer to receive for digital preservation:

  • TIFF (uncompressed)
  • JPEG2000 (lossless) (*.jp2)
  • PNG (*.png)
  • JPEG/JFIF (*.jpg)
  • Digital Negative DNG (*.dng)
  • JPEG2000 (lossy) (*.jp2)
  • TIFF (compressed)
  • BMP (*.bmp)
  • GIF (*.gif)

Is there anything you think should be changed in that list? If so, why? Or should anything be added to this list? There’s a section on metadata on that page. Does it say enough? Or too little? Is it clear enough? Should the Library add some description about adding photo metadata into the photo files themselves?

Please look over the file categories that interest you and tell us what you think. Help us shape a policy that will affect future digital collections, large and small. Be as specific as you can.

Email your questions and comments to the digital preservation experts below. Your emails will be confidential; they will not be published on this blog post. So don’t be shy. We welcome all questions and comments, great and small.

Send general email about preferred formats to Theron Westervelt (thwe at). Send email about specific categories to:

  • Ardie Bausenbach (abau at for Textual Works and Musical Compositions
  • Phil Michel (pmic at for Still Image Works
  • Gene DeAnna (edea at for Audio Works
  • Mike Mashon (mima at for Moving Image Works
  • Trevor Owens (trow at for Software and Electronic Gaming and Learning
  • Donna Scanlon (dscanlon at for Datasets/Databases

They are all very nice people who are up to their eyeballs in digital-preservation work and would appreciate hearing your fresh perspective on the subject.

One last thing. The recommended formats are just that: recommended. It is not a fixed set of standards. And the Library of Congress will not reject any digital collection of value simply because the file formats in the collection might not conform to the recommended formats.

Jason Ronallo: The Lenovo X240 Keyboard and the End/Insert Key With FnLk On as a Software Developer on Linux

planet code4lib - Fri, 2014-10-03 16:12

As a software developer I’m using keys like F5 a lot. When I’m doing any writing, I use F6 a lot to turn off and on spell correction underlining. On the Lenovo X240 the function keys are overlaid on the same keys as volume and brightness control. This causes some problems for me. Luckily there’s a solution that works for me under Linux.

To access the function keys you have to also press the Fn key. If most of what you’re doing is reloading a browser and not using the volume control, then this is a problem, so they’ve created a function lock which is enabled by pressing the Fn and Esc/FnLk key. The Fn key lights up and you can press F5 without using the Fn modifier key.

That’s all well and good until you get to another quirk of this keyboard: the Home, End, and Delete keys are in the same function-key row, and the End key doubles as the Insert key. When function lock is on, the End key becomes an Insert key. I don’t ever use the Insert key on a keyboard, so I understand why they combined the End/Insert key. But in this combination it doesn’t work for me as a software developer. I’m continually going between something that needs to be reloaded with F5 and an editor where I need to quickly go to the end of a line in a program.

Luckily there’s a pretty simple answer to this if you don’t ever need to use the Insert key. I found the answer on askubuntu.

All I needed to do was run the following:

xmodmap -e "keycode 118 = End"

And now even when the function keys are locked, the End/Insert key always behaves as End. To make this permanent so the mapping gets loaded when X11 starts, add xmodmap -e "keycode 118 = End" to your ~/.xinitrc.

Jason Ronallo: Questions Asked During the Presentation Websockets For Real-time And Interactive Interfaces At Code4lib 2014

planet code4lib - Fri, 2014-10-03 16:12

During my presentation on WebSockets, there were a couple points where folks in the audience could enter text in an input field that would then show up on a slide. The data was sent to the slides via WebSockets. It is not often that you get a chance to incorporate the technology that you’re talking about directly into how the presentation is given, so it was a lot of fun. At the end of the presentation, I allowed folks to anonymously submit questions directly to the HTML slides via WebSockets.

I ran out of time before I could answer all of the questions that I saw. I’ll try to answer them now.

Questions From Slides

You can see in the YouTube video at the end of my presentation (at 1h38m26s) that the following questions came in. (Full presentation starts here.) Some lines that came in were not questions at all. For those that are really questions, I’ll answer them now, even if I already answered them.

Are you a trained dancer?

No. Before my presentation I was joking with folks about how little of a presentation I’d have, at least for the interactive bits, if the wireless didn’t work well enough. Tim Shearer suggested I just do an interpretive dance in that eventuality. Luckily it didn’t come to that.

When is the dance?

There was no dance. Initially I thought the dance might happen later, but it didn’t. OK, I’ll admit it, I was never going to dance.

Did you have any efficiency problems with the big images and chrome?

On the big video walls in Hunt Library we often use Web technologies to create the content and Chrome for displaying it on the wall. For the most part we don’t have issues with big images or lots of images on the wall. But there’s a bit of a trick happening here. For instance, when we display images for My #HuntLibrary on the wall, they’re just images from Instagram, so only 600x600px. We initially didn’t know how these would look blown up on the video wall, but they end up looking fantastic. So you don’t necessarily need super high resolution images to make a very nice looking display.

Upstairs on the Visualization Wall, I display some digitized special collections images. While the possible resolution on the display is higher, the current effective resolution is only about 202px wide for each MicroTile. The largest image is then only 404px wide. In this case we are also using a Djatoka image server to deliver the images. Djatoka has an issue with the quality of its scaling between quality levels, where the algorithm chosen can make the images look very poor. How I usually work around this is to pick the quality level that is just above the width required to fit whatever design. Then the browser scales the image down and does a better job making it look OK than the image server would. I don’t know which of these factors affects the look on the Visualization Wall the most, but some images have a stair-stepping look on some lines. This especially affects line drawings with diagonal lines, while photographs can look totally acceptable. We’ll keep looking for how to improve the look of images on these walls, especially in the browser.

Have you got next act after Wikipedia?

This question is referring to the adaptation of Listen to Wikipedia for the Immersion Theater. You can see video of what this looks like on the big Hunt Library Immersion Theater wall.

I don’t currently have solid plans for developing other content for any of the walls. Some of the work that I and others in the Libraries have done early on has been to help see what’s possible in these spaces and begin to form the cow paths for others to produce content more easily. We answered some big questions. Can we deliver content through the browser? What templates can we create to make this work easier? I think the next act is really for the NCSU Libraries to help more students and researchers to publish and promote their work through these spaces.

Is it lunchtime yet?

In some time zone somewhere, yes. Hopefully during the conference lunch came soon enough for you and was delicious and filling.

Could you describe how testing worked more?

I wish I could think of some good way to test applications that are destined for these kinds of large displays. There’s really no automated testing that is going to help here. BrowserStack doesn’t have a big video wall that they can take screenshots on. I’ve also thought that it’d be nice to have a webcam trained on the walls so that I could make tweaks from a distance.

But Chrome does have its screen emulation developer tools which were super helpful for this kind of work. These kinds of tools are useful not just for mobile development, which is how they’re usually promoted, but for designing for very large displays as well. Even on my small workstation monitor I could get a close enough approximation of what something would look like on the wall. Chrome will shrink the content to fit to the available viewport size. I could develop for the exact dimensions of the wall while seeing all of the content shrunk down to fit my desktop. This meant that I could develop and get close enough before trying it out on the wall itself. Being able to design in the browser has huge advantages for this kind of work.

I work at DH Hill Library while these displays are in Hunt Library. I don’t get over there all that often, so I would schedule some time to see the content on the walls when I happened to be over there for a meeting. This meant that there’d often be a lag of a week or two before I could get over there. This was acceptable as this wasn’t the primary project I was working on.

By the time I saw it on the wall, though, we were really just making tweaks for design purposes. We wanted the panels to the left and right of the Listen to Wikipedia visualization to fall along the bezel. We would adjust font sizes for how they felt once you’re in the space. The initial, rough cut work of modifying the design to work in the space was easy, but getting the details just right required several rounds of tweaks and testing. Sometimes I’d ask someone over at Hunt to take a picture with their phone to ensure I’d fixed an issue.

While it would have been possible for me to bring my laptop and sit in front of the wall to work, I personally didn’t find that to work well for me. I can see how it could work to make development much faster, though, and it is possible to work this way.

Race condition issues between devices?

Some spaces could allow you to control a wall from a kiosk and completely avoid any possibility of a race condition. When you allow users to bring their own device as a remote control to your spaces you have some options. You could allow the first remote to connect and lock everyone else out for a period of time. Because of how subscriptions and presence notifications work this would certainly be possible to do.

For Listen to Wikipedia we allow more than one user to control the wall at the same time. Then we use WebSockets to try to keep multiple clients in sync. Even though we attempt to quickly update all the clients, it is certainly possible that there could be race conditions, though it seems unlikely. Because we’re not dealing with persisting data, I don’t really worry about it too much. If one remote submits just after another but before it is synced, then the wall will reflect the last to submit. That’s perfectly acceptable in this case. If a client were to get out of sync with what is on the wall, then any change by that client would just be sent to the wall as is. There’s no attempt to make sure a client had the most recent, freshest version of the data prior to submitting.

While this could be an issue for other use cases, it does not adversely affect the experience here. We do an alright job keeping the clients in sync, but don’t shoot for perfection.
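The last-write-wins behavior described above can be modeled in a few lines of Ruby (class and method names here are illustrative, not from the actual app):

```ruby
# Each submission simply replaces the wall's state and is rebroadcast to
# every connected client; stale clients are never rejected or merged.
class WallState
  attr_reader :value

  def initialize
    @clients = []
    @value = nil
  end

  def subscribe(client)
    @clients << client
  end

  # Last write wins: no version check against what the client last saw.
  def submit(new_value)
    @value = new_value
    @clients.each { |c| c.call(new_value) }  # stand-in for a WebSocket broadcast
  end
end

wall = WallState.new
seen = []
wall.subscribe(->(v) { seen << v })
wall.submit("client A's pick")
wall.submit("client B's pick")  # B submitted last, so the wall shows B's pick
```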

How did you find the time to work on this?

At the time I worked on these I had at least a couple other projects going. When waiting for someone else to finish something before being able to make more progress or on a Friday afternoon, I’d take a look at one of these projects for a little. It meant the progress was slow, but these also weren’t projects that anyone was asking to be delivered on a deadline. I like to have a couple projects of this nature around. If I’ve got a little time, say before a meeting, but not enough for something else, I can pull one of these projects out.

I wonder, though, if this question isn’t more about the why I did these projects. There were multiple motivations. A big motivation was to learn more about WebSockets and how the technology could be applied in the library context. I always like to have a reason to learn new technologies, especially Web technologies, and see how to apply them to other types of applications. And now that I know more about WebSockets I can see other ways to improve the performance and experience of other applications in ways that might not be as overt in their use of the technology as these project were.

For the real-time digital collections view this is integrated into an application I’ve developed and it did not take much to begin adding in some new functionality. We do a great deal of business analytic tracking for this application. The site has excellent SEO for the kind of content we have. I wanted to explore other types of metrics of our success.

The video wall projects allowed us to explore several different questions. What does it take to develop Web content for them? What kinds of tools can we make available for others to develop content? What should the interaction model be? What messaging is most effective? How should we kick off an interaction? Is it possible to develop bring your own device interactions? All of these kinds of questions will help us to make better use of these kinds of spaces.

Speed of an unladen swallow?

I think you’d be better off asking a scientist or a British comedy troupe.

Questions From Twitter

Mia (@mia_out) tweeted at 11:47 AM on Tue, Mar 25, 2014
@ostephens @ronallo out of curiosity, how many interactions compared to visitor numbers? And in-app or relying on phone reader?

sebchan (@sebchan) tweeted at 0:06 PM on Tue, Mar 25, 2014
@ostephens @ronallo (but) what are the other options for ‘interacting’?

This question was in response to how 80% of the interactions with the Listen to Wikipedia application are via QR code. We placed a URL and QR code on the wall for Listen to Wikipedia not knowing which would get the most use.

Unfortunately there’s no simple way I know of to kick off an interaction in these spaces when the user brings their own device. Once when there was a stable exhibit for a week we used a kiosk iPad to control a wall so that the visitor did not need to bring a device. We are considering how a kiosk tablet could be used more generally for this purpose. In cases where the visitor brings their own device it is more complicated. The visitor either must enter a URL or scan a QR code. We try to make the URLs short, but because we wanted to use some simple token authentication they’re at least 4 characters longer than they might otherwise be. I’ve considered using geolocation services as the authentication method, but they are not as exact as we might want them to be for this purpose, especially if the device uses campus wireless rather than GPS. We also did not want to have a further hurdle of asking for permission of the user and potentially being rejected. For the QR code the visitor must have a QR code reader already on their device. The QR code includes the changing token. Using either the URL or QR code sends the visitor to a page in their browser.

Because the walls I’ve placed content on are in public spaces, there is no good way to compare the number of visitors to the number of interactions. One interesting thing about the Immersion Theater is that I’ll often see folks standing outside the opening to the space looking in, so even if there were some way to track people going in and out of the space, it would not count everyone who has viewed the content.

Other Questions

If you have other questions about anything in my presentation, please feel free to ask. (If you submit them through the slides I won’t ever see them, so better to email or tweet at me.)

Jason Ronallo: HTML Slide Decks With Synchronized and Interactive Audience Notes Using WebSockets

planet code4lib - Fri, 2014-10-03 16:12

One question I got asked after giving my Code4Lib presentation on WebSockets was how I created my slides. I’ve written about how I create HTML slides before, but this time I added some new features like an audience interface that synchronizes automatically with the slides and allows for audience participation.

TL;DR I’ve open sourced starterdeck-node for creating synchronized and interactive HTML slide decks.

Not every time that I give a presentation am I able to use the technologies that I am talking about within the presentation itself, so I like to do it when I can. I write my slide decks as Markdown and convert them with Pandoc to HTML slides which use DZslides for slide sizing and animations. I use a browser to present the slides. Working this way with HTML has allowed me to do things like embed HTML5 video into a presentation on HTML5 video and show examples of the JavaScript API and how videos can be styled with CSS.

For a presentation on WebSockets I gave at Code4Lib 2014, I wanted to provide another example from within the presentation itself of what you can do with WebSockets. If you have the slides and the audience notes handout page open at the same time, you will see how they are synchronized. (Beware slowness as it is a large self-contained HTML download using data URIs.) When you change to certain slides in the presenter view, new content is revealed in the audience view. Because the slides are just an HTML page, it is possible to make the slides more interactive. WebSockets are used to allow the slides to send messages to each audience member’s browser and reveal notes. I am never able to say everything that I would want to in one short 20 minute talk, so this provided me a way to give the audience some supplementary material.

Within the slides I even included a simplistic chat application that allowed the audience to send messages directly to the presenter slides. (Every talk on WebSockets needs a gratuitous chat application.) At the end of the talk I also accepted questions from the audience via an input field. The questions were then delivered to the slides via WebSockets and displayed right within a slide using a little JavaScript. What I like most about this is that even someone who did not feel confident enough to step up to a microphone would have the opportunity to ask an anonymous question. And I even got a few legitimate questions amongst the requests for me to dance.

Another nice side benefit of getting the audience to the notes before the presentation starts is that you can include your contact information and Twitter handle on the page.

I have wrapped up all this functionality for creating interactive slide decks into a project called starterdeck-node. It includes the WebSocket server and a simple starting point for creating your own slides. It strings together a bunch of different tools to make creating and deploying slide decks like this simpler so you’ll need to look at the requirements. This is still definitely just a tool for hackers, but having this scaffolding in place ought to make the next slide deck easier to create.

Here’s a video where I show starterdeck-node at work. Slides on the left; audience notes on the right.

Other Features

While the new exciting feature added in this version of the project is synchronization between presenter slides and audience notes, there are also lots of other great features if you want to create HTML slide decks. Even if you aren’t going to use the synchronization feature, there are still lots of reasons why you might want to create your HTML slides with starterdeck-node.

Self-contained HTML. Pandoc uses data-URIs so that the HTML version of your slides has no external dependencies. Everything, including images, video, JavaScript, CSS, and fonts, is embedded within a single HTML document. That means that even if there’s no internet connection from the podium you’ll still be able to deliver your presentation.

Onstage view. Part of what gets built is a DZSlides onstage view where the presenter can see the current slide, next slide, speaker notes, and current time.

Single page view. This view is a self-contained, single-page layout version of the slides and speaker notes. This is a much nicer way to read a presentation than just flipping through the slides on various slide sharing sites. If you put a lot of work into your talk and are writing speaker notes, this is a great way to reuse them.

PDF backup. A script is included to create a PDF backup of your presentation. Sometimes you have to use the computer at the podium and it has an old version of IE on it. PDF backup to the rescue. While you won’t get all the features of the HTML presentation you’re still in business. The included Node.js app provides a server so that a headless browser can take screenshots of each slide. These screenshots are then compiled into the PDF.


I’d love to hear from anyone who tries to use it. I’ll list any examples I hear about below.

Here are some examples of slide decks that have used starterdeck-node or starterdeck.

Jason Ronallo: HTML and PDF Slideshows Written in Markdown with DZSlides, Pandoc, Guard, Capybara Webkit, and a little Ruby

planet code4lib - Fri, 2014-10-03 16:12

I’ve used different HTML slideshow tools in the past, but was never satisfied with them. I didn’t like to have to run a server just for a slideshow. I don’t like when a slideshow requires external dependencies that make it difficult to share the slides. I don’t want to actually have to write a lot of HTML.

I want to write my slides in a single Markdown file. As a backup I always like to have my slides available as a PDF.

For my latest presentations I came up with a workflow that I’m satisfied with. Once all the little pieces were stitched together it worked really well for me. I’ll show you how I did it.

I had looked at DZSlides before but had always passed it by after seeing what a default slide deck looked like. It wasn’t as flashy as others and doesn’t immediately have all the same features readily available. I looked at it again because I liked the idea that it is a single file template. I also saw that Pandoc will convert Markdown into a DZSlides slideshow.

To convert my Markdown to DZSlides it was as easy as:

pandoc -w dzslides > presentation.html

What is even better is that Pandoc has settings to embed images and any external files as data URIs within the HTML. So this allows me to maintain a single Markdown file and then share my presentation as a single HTML file including images and all–no external dependencies.

pandoc -w dzslides --standalone --self-contained > presentation.html

The DZSlides default template is rather plain, so you’ll likely want to make some stylistic changes to the CSS. You may also want to add some more JavaScript as part of your presentation or to add features to the slides. For instance I wanted to add a simple way to toggle my speaker notes from showing. In previous HTML slides I’ve wanted to control HTML5 video playback by binding JavaScript to a key. The way I do this is to add in any external styles or scripts directly before the closing body tag after Pandoc does its processing. Here’s the simple script I wrote to do this:

#! /usr/bin/env ruby
# markdown_to_slides.rb
# Converts a markdown file into a DZslides presentation. Pandoc must be installed.

# Read in the given CSS and JavaScript files and insert them just before the close of the body tag.
css ='styles.css')
script ='scripts.js')

`pandoc -w dzslides --standalone --self-contained > presentation.html`

presentation ='presentation.html')
style = "<style>#{css}</style>"
scripts = "<script>#{script}</script>"
presentation.sub!('</body>', "#{style}#{scripts}</body>")'presentation.html', 'w') do |fh|
  fh.puts presentation

Just follow these naming conventions:

  • Presentation Markdown should be named
  • Output presentation HTML will be named presentation.html
  • Create a stylesheet in styles.css
  • Create any JavaScript in a file named scripts.js
  • You can put images wherever you want, but I usually place them in an images directory.
Automate the build

Now what I wanted was for this script to run any time the Markdown file changed. I used Guard to watch the files and set off the script to convert the Markdown to slides. While I was at it I could also reload the slides in my browser. One trick with guard-livereload is to allow your browser to watch local files so that you do not have to have the page behind a server. Here’s my Guardfile:

guard 'livereload' do
  watch("presentation.html")

guard :shell do
  # If any of these change run the script to build presentation.html
  watch('') {`./markdown_to_slides.rb`}
  watch('styles.css') {`./markdown_to_slides.rb`}
  watch('scripts.js') {`./markdown_to_slides.rb`}
  watch('markdown_to_slides.rb') {`./markdown_to_slides.rb`}

Add the following to a Gemfile and bundle install:

source ''

gem 'guard-livereload'
gem 'guard-shell'

Now I have a nice automated way to build my slides, continue to work in Markdown, and have a single file as a result. Just run this:

bundle exec guard

Now when any of the files change, your HTML presentation will be rebuilt. Whenever the resulting presentation.html changes, it will trigger livereload and a browser refresh.

Slides to PDF

The last piece I needed was a way to convert the slideshow into a PDF as a backup. I never know what kind of equipment will be set up or whether the browser will be recent enough to work well with the HTML slides. I like being prepared. It makes me feel more comfortable knowing I can fall back to the PDF if needs be. Also some slide deck services will accept a PDF but won’t take an HTML file.

In order to create the PDF I wrote a simple ruby script using capybara-webkit to drive a headless browser. If you aren’t able to install the dependencies for capybara-webkit you might try some of the other capybara drivers. I did not have luck with the resulting images from selenium. I then used the DZSlides JavaScript API to advance the slides. I do a simple count of how many times to advance based on the number of sections. If you have incremental slides this script would need to be adjusted to work for you.

The Webkit driver is used to take a snapshot of each slide, save it to a screenshots directory, and then ImageMagick’s convert is used to turn the PNGs into a PDF. You could just as well use other tools to stitch the PNGs together into a PDF. The quality of the resulting PDF isn’t great, but it is good enough. Also the capybara-webkit browser does not evaluate @font-face so the fonts will be plain. I’d be very interested if anyone gets better quality using a different browser driver for screenshots.

#! /usr/bin/env ruby
# dzslides2pdf.rb
# dzslides2pdf.rb http://localhost/presentation_root presentation.html
require 'capybara/dsl'
require 'capybara-webkit'
# require 'capybara/poltergeist'
require 'fileutils'
include Capybara::DSL

base_url = ARGV[0] || exit
presentation_name = ARGV[1] || 'presentation.html'

# temporary directory for screenshots
FileUtils.mkdir('./screenshots') unless File.exist?('./screenshots')

Capybara.configure do |config|
  config.run_server = false
  config.default_driver = :webkit
  config.current_driver = :webkit # :poltergeist = "fake app name"
  config.app_host = base_url

visit '/presentation.html' # visit the first page

# change the size of the window
if Capybara.current_driver == :webkit
  page.driver.resize_window(1024, 768)

sleep 3 # Allow the page to render correctly
# take screenshot of first page
page.save_screenshot("./screenshots/screenshot_000.png", width: 1024, height: 768)

# calculate the number of slides in the deck
slide_count = page.body.scan(%r{slide level1}).size
puts slide_count

(slide_count - 1).times do |time|
  slide_number = time + 1
  keypress_script = "Dz.forward();" # dzslides script for going to next slide
  page.execute_script(keypress_script) # run the script to transition to next slide
  sleep 3 # wait for the slide to fully transition
  # screenshot_and_save_page # take a screenshot
  page.save_screenshot("./screenshots/screenshot_#{slide_number.to_s.rjust(3, '0')}.png", width: 1024, height: 768)
  print "#{slide_number}. "

`convert screenshots/*png presentation.pdf`

At this point I did have to set this up to be behind a web server. On my local machine I just made a symlink from the root of my Apache htdocs to my working directory for my slideshow. The script can be called with the following.

./dzslides2pdf.rb http://localhost/presentation/root/directory presentation.html

Speaker notes

One addition that I’ve made is to add some JavaScript for speaker notes. I don’t want to have to embed my slides into another HTML document to get the nice speaker view that DZslides provides. I prefer to just have a section at the bottom of the slides that pops up with my notes. I’m alright with the audience seeing my notes if I should ever need them. So far I haven’t had to use the notes.

I start with adding the following markup to the presentation Markdown file.

<div role="note" class="note">
Hi. I'm Jason Ronallo the Associate Head of Digital Library Initiatives at NCSU Libraries.

Add some CSS to hide the notes by default but allow for them to display at the bottom of the slide.

div[role=note] {
  display: none;
  position: absolute;
  bottom: 0;
  color: white;
  background-color: gray;
  opacity: 0.85;
  padding: 20px;
  font-size: 12px;
  width: 100%;

Then a bit of JavaScript to show/hide the notes when pressing the “n” key.

window.onkeypress = presentation_keypress_check;

function presentation_keypress_check(aEvent){
  if (aEvent.keyCode == 110) {
    aEvent.preventDefault();
    var notes = document.getElementsByClassName('note');
    for (var i = 0; i < notes.length; i++){
      notes[i].style.display = (notes[i].style.display == 'none' || !notes[i].style.display) ? 'block' : 'none';



Finally, I like to have an outline I can see of my presentation as I’m writing it. Since the Markdown just uses h1 elements to separate slides, I just use the following simple script to output the outline for my slides.

#!/usr/bin/env ruby
# outline_markdown.rb
file ='')
index = 0
file.each_line do |line|
  if /^#\s/.match line
    index += 1
    title = line.sub('#', index.to_s)
    puts title

Full Example

You can see the repo for my latest HTML slide deck created this way for the 2013 DLF Forum where I talked about Embedded Semantic Markup,, the Common Crawl, and Web Data Commons: What Big Web Data Means for Libraries and Archives.


I like doing slides where I can write very quickly in Markdown and then have the ability to handcraft the deck or particular slides. I’d be interested to hear if you do something similar.

Jason Ronallo: DLF Forum 2013 presentation: Embedded Semantic Markup,, the Common Crawl, and Web Data Commons

planet code4lib - Fri, 2014-10-03 16:12

I spoke at the 2013 DLF Forum about Embedded Semantic Markup,, the Common Crawl, and Web Data Commons: What Big Web Data Means for Libraries and Archives. My slides, code, and data are all open.

Here’s the abstract:

Search engines are reaching the limits of natural language processing while wanting to provide more exact answers, not just results, especially for the mobile context. This shift is part of what has spurred progress in how data can be published and consumed on the Web. Broad and simple vocabularies and simplified embedded semantic markup is leading to wider adoption of publishing data in HTML. Libraries and archives can take advantage of new opportunities to make their services and collections more discoverable on the open Web. This presentation will show some examples of what libraries and archives are currently doing and point to future possibilities.

At the same time as this new data is being made available, only a few organizations have the resources to crawl the Web and extract the data. The Common Crawl is helping to make a large repository of Web crawl data available for public use, and Web Data Commons is extracting the data embedded in the Common Crawl and making the resulting linked data available for download. This presentation will share data from original research on how libraries currently fare in this new environment of big Web data. Are libraries and archives represented in the corpus? With this democratization of Web crawl data and lowered expense for consumption of it, what are the opportunities for new library services and collections?

Jason Ronallo: A Plugin For Mediaelement.js For Preview Thumbnails on Hover Over the Time Rail Using WebVTT

planet code4lib - Fri, 2014-10-03 16:12

The time rail or progress bar on video players gives the viewer some indication of how much of the video they’ve watched, what portion of the video remains to be viewed, and how much of the video is buffered. The time rail can also be clicked on to jump to a particular time within the video. But figuring out where in the video you want to go can feel kind of random. You can usually hover over the time rail and move from side to side and see the time that you’d jump to if you clicked, but who knows what you might see when you get there.

Some video players have begun to use the time rail to show video thumbnails on hover in a tooltip. For most videos these thumbnails give a much better idea of what you’ll see when you click to jump to that time. I’ll show you how you can create your own thumbnail previews using HTML5 video.

TL;DR Use the time rail thumbnails plugin for Mediaelement.js.

Archival Use Case

We usually follow agile practices in our archival processing. This style of processing was popularized by the article More Product, Less Process: Revamping Traditional Archival Processing by Mark A. Greene and Dennis Meissner. For instance, we don’t read every page of every folder in every box of every collection in order to describe it well enough for us to make the collection accessible to researchers. Over time we may decide to make the materials for a particular collection or parts of a collection more discoverable by doing the work to look closer and add more metadata to our description of the contents. But we try not to let the perfect become the enemy of the good enough. Our goal is to make the materials accessible to researchers and not hidden in some box no one knows about.

Some of our collections of videos are highly curated, such as our video oral histories. We’ve created transcripts for the whole video. We extract the most interesting or on-topic clips. For each of these video clips we create a WebVTT caption file and an interface to navigate within the video from the transcript.

At NCSU Libraries we have begun digitizing more archival videos. And for these videos we’re much more likely to treat them like other archival materials. We’re never going to watch every minute of every video about cucumbers or agricultural machinery in order to fully describe the contents. Digitization gives us some opportunities to automate the summarization that would be manually done with physical materials. Many of these videos don’t even have dialogue, so even when automated video transcription is more accurate and cheaper we’ll still be left with only the images. In any case, the visual component is a good place to start.

Video Thumbnail Previews

When you hover over the time rail on some video viewers, you see a thumbnail image from the video at that time. YouTube does this for many of its videos. I first saw that this would be possible with HTML5 video when I saw the JW Player page on Adding Preview Thumbnails. From there I took the idea to use an image sprite and a WebVTT file to structure which media fragments from the sprite to use in the thumbnail preview. I’ve implemented this as a plugin for Mediaelement.js. You can see detailed instructions there on how to use the plugin, but I’ll give the summary here.

1. Create an Image Sprite from the Video

This uses ffmpeg to take a snapshot every 5 seconds in the video and then uses montage (from ImageMagick) to stitch them together into a sprite. This means that only one file needs to be downloaded before you can show the preview thumbnail.

ffmpeg -i "video-name.mp4" -f image2 -vf fps=fps=1/5 video-name-%05d.jpg
montage video-name*jpg -tile 5x -geometry 150x video-name-sprite.jpg

2. Create a WebVTT metadata file

This is just a standard WebVTT file except the cue text is metadata instead of captions. The URL is to an image and uses a spatial Media Fragment for what part of the sprite to display in the tooltip.


00:00:00.000 --> 00:00:05.000,0,150,100

00:00:05.000 --> 00:00:10.000,0,150,100

00:00:10.000 --> 00:00:15.000,0,150,100

00:00:15.000 --> 00:00:20.000,0,150,100

00:00:20.000 --> 00:00:25.000,0,150,100

00:00:25.000 --> 00:00:30.000,100,150,100

3. Add the Video Thumbnail Preview Track

Put the following within the <video> element.

<track kind="metadata" class="time-rail-thumbnails" src=""></track>

4. Initialize the Plugin

The following assumes that you’re already using Mediaelement.js, jQuery, and have included the vtt.js library.

$('video').mediaelementplayer({
  features: ['playpause', 'progress', 'current', 'duration', 'tracks', 'volume', 'timerailthumbnails'],
  timeRailThumbnailsSeconds: 5

The Result


See Bug Sprays and Pets with sound.


The plugin can either be installed using the Rails gem or the Bower package.


One of the DOM API features I hadn’t used before is MutationObserver. One thing the thumbnail preview plugin needs to do is know what time is being hovered over on the time rail. I could have calculated this myself, but I wanted to rely on MediaElement.js to provide the information. Maybe there’s a callback in MediaElement.js for when this is updated, but I couldn’t find it. Instead I use a MutationObserver to watch for when MediaElement.js changes the DOM for the default display of a timestamp on hover. Looking at the time code there then allows the plugin to pick the correct cue text to use for the media fragment. MutationObserver is more performant than the now deprecated MutationEvents. I’ve experienced very little latency using a MutationObserver which allows it to trigger lots of events quickly.

The plugin currently only works in the browsers that support MutationObserver, which is most current browsers. In browsers that do not support MutationObserver the plugin will do nothing at all and just show the default timestamp on hover. I’d be interested in other ideas on how to solve this kind of problem, though it is nice to know that plugins that rely on another library have tools like MutationObserver around.

Other Caveats

This plugin is brand new and works for me, but there are some caveats. All the images in the sprite must have the same dimensions. The durations for each thumbnail must be consistent. The timestamps currently aren’t really used to determine which thumbnail to display, but is instead faked relying on the consistent durations. The plugin just does some simple addition and plucks out the correct thumbnail from the array of cues. Hopefully in future versions I can address some of these issues.
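That simple addition amounts to integer division on a fixed cue duration. A sketch of the idea, in Ruby for illustration (the plugin itself is JavaScript):

```ruby
# Pick a thumbnail cue index from the hovered time, relying on every cue
# having the same fixed duration (here 5 seconds), as the plugin does.
def thumbnail_index(hover_seconds, cue_count, interval: 5)
  index = (hover_seconds / interval).floor
  [index, cue_count - 1].min # clamp to the last cue for the video's tail
end

thumbnail_index(12.3, 6) # => 2, the cue covering 00:00:10–00:00:15
```

This is also why inconsistent durations break the current version: the timestamps in the WebVTT file are never consulted, only the position in the array.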


With this feature available for our digitized video, we’ve already found things in our collection that we wouldn’t have seen before. You can see how a “Profession with a Future” evidently involves shortening your life by smoking (at about 9:05). I found a spinning spherical display of Soy-O and synthetic meat (at about 2:12). Some videos switch between black & white and color which you wouldn’t know just from the poster image. And there are some videos, like talking heads, that appear from the thumbnails to have no surprises at all. But maybe you like watching boiling water for almost 13 minutes.

OK, this isn’t really a discovery in itself, but it is fun to watch a head banging JFK as you go back and forth over the time rail. He really likes milk. And Eisenhower had a different speaking style.

You can see this in action for all of our videos on the NCSU Libraries' Rare & Unique Digital Collections site and make your own discoveries. Let me know if you find anything interesting.

Preview Thumbnail Sprite Reuse

Since we already had the sprite images for the time rail hover preview, I created another interface to allow a user to jump through a video. Under the video player is a control button that shows a modal with the thumbnail sprite. The sprite alone provides a nice overview of the video that allows you to see very quickly what might be of interest. I used an image map so that the rather large sprite images would only have to be in memory once. (Yes, image maps are still valid in HTML5 and have their legitimate uses.) jQuery RWD Image Maps allows the map area coordinates to scale up and down across devices. Hovering over a single thumb will show the timestamp for that frame. Clicking a thumbnail will set the current time for the video to be the start time of that section of the video. One advantage of this feature is that it doesn’t require the kind of fine motor skill necessary to hover over the video player time rail and move back and forth to show each of the thumbnails.
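Because the sprite grid is regular, the image map’s <area> elements can be generated mechanically. A hypothetical sketch (the data-seek attribute, sizes, and interval are illustrative, not the markup the site actually uses):

```ruby
# Emit image-map <area> tags for a sprite grid: clicking thumbnail i
# would seek the video to i * interval seconds.
def sprite_area_tags(count, interval: 5, width: 150, height: 100, columns: 5)
  count.times.map do |i|
    x = (i % columns) * width
    y = (i / columns) * height
    start = i * interval
    %(<area shape="rect" coords="#{x},#{y},#{x + width},#{y + height}" ) +
      %(href="#" data-seek="#{start}" alt="Jump to #{start}s">)
  end
end

puts sprite_area_tags(6).join("\n")
```

A click handler would then read data-seek and set the video's currentTime, while something like jQuery RWD Image Maps rescales the coords for responsive layouts.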

This feature was added and deployed to production just this week, so I’m looking for feedback on whether folks find it useful, how to improve it, and any bugs that are encountered.

Summarization Services

I expect that automated summarization services will become increasingly important for researchers as archives do more large-scale digitization of physical collections and collect more born digital resources in bulk. We’re already seeing projects like fondz which autogenerates archival description by extracting the contents of born digital resources. At NCSU Libraries we’re working on other ways to summarize the metadata we create as we ingest born digital collections. As we learn more what summarization services and interfaces are useful for researchers, I hope to see more work done in this area. And this is just the beginning of what we can do with summarizing archival video.

LITA: Doing Web Accessibility

planet code4lib - Fri, 2014-10-03 14:40

Physical library spaces are designed to comply with the Americans with Disabilities Act (ADA), hence the wide aisles, low checkout stations, and ramps. In contrast, alt tag awareness is low and web accessibility is not a priority for most librarians. Yet for visually or otherwise impaired users, an improperly coded website can be like wandering into a maze and hitting a brick wall of frustration.

With accessibility in mind, I’ve been teaching myself to assess and retrofit webpages, aligning my library’s website with the W3C’s Web Content Accessibility Guidelines (WCAG), the U.S. Rehabilitation Act’s Section 508, and this WebAIM Infographic aimed at accessible design as well as code. For best practices, these are your first stops.

Design for Users

Crucially, designing with accessibility in mind makes for websites that are more usable for everyone, not just for disabled users. Questioning trendy design elements can pay off too. Do image-heavy carousels and page-spanning images really enhance UX enough to justify the space they fill and the accessibility problems they may engender?

Out-of-the-box products may come with their own access problems. WordPress themes often provide low contrast. LibGuides omits the HTML lang attribute on some templates. Developers forget alt tags and form labels. Sometimes it’s easier just to fix stuff yourself.

And I use the word “easier” advisedly.

W3C Markup Validator

First, copy and paste your webpage’s URL into the free W3C Markup Validation Service to check the HTML for conformance to W3C web standards. Optimally, your code would be up to HTML5 (and CSS3) standards. This makes for cleaner aesthetics, no deprecated elements, and fewer errors when you run accessibility evaluation tools in the next stages of this process. The Validator will tell you which lines of code need correcting, and lead you to relevant documentation. Once your code is sound (imperfections are ok), break out the WAVE tool.


Plug in a URL, and the WAVE web accessibility evaluation tool from WebAIM will scan your code, flagging errors, marking structural elements, and alerting you to potential issues. WAVE will flag link texts that say “Click here” or “More,” redundant or empty links, PDFs that may or may not be optimized for accessibility, missing alternative text and form elements, and other problems. WAVE also says what the page does right (for example, WAI-ARIA features, helpful alternative text, and the like).

As a coding newbie, I love WAVE’s unique color-coded icons, which you can click to see thorough explanations of each concern. Better yet, WAVE also comes as a Firefox toolbar that lets you evaluate pages on the fly–and it tests for JavaScript too!

Browser Developer Tools

To dig deeper into your code, I suggest using a browser developer tool (Bryan Brown wrote an excellent LITA Blog post on such tools). Google Chrome’s Accessibility Developer Tools are particularly good at auditing for color contrast and recognizable links. Add these to your browser and you can test any page for accessibility and discover exactly what could be improved. Note that these tools can be really nitpicky, and again, functionality rather than perfection is our goal.

Manual Checks

Can you turn off the CSS and still make sense of the page design? Did nothing disappear? Can you manually resize the font to at least 150% without spectacularly messing up the design? Can you navigate using only the keyboard? Are any videos close captioned and any audio files accompanied by transcripts? Can you run pages or sections of pages through a screen reader and still make sense of the content? Try it, and congratulations! You just became a web accessibility guru.


You’re not a web developer, you say? Neither am I. But even if your job has nothing to do with digital services, librarians need to know about these technical matters so as to make the case for prioritizing web accessibility and to be able to speak the language of colleagues (often the IT department) who do engage in web development. Web accessibility builds equal access and diverse communities. These are enduring values for librarians, and why I joined the profession.

What about you? How do you “do” web accessibility?

Eric Hellman: The Perfect Bookstore Loses to Amazon

planet code4lib - Fri, 2014-10-03 13:59
My book industry friends are always going on and on about "the book discovery problem". Last month, a bunch of us, convened by Chris Kubica, sat in a room in Manhattan's Meatpacking district and plotted out how to make the perfect online bookstore. "The discovery problem" occupied a big part of the discussion. Last year, Perseus Books gathered a smattering of New York's nerdiest at "the first Publishing Hackathon". The theme of the event, the "killer problem": "book discovery". Not to be outdone, HarperCollins sponsored a "BookSmash Challenge" to find "new ways of reading and discovering books".  

Here's the typical framing of "the book discovery problem". "When I go to a bookstore, I frequently leave with all sort of books I never meant to get. I see a book on the table, pick it up and start reading, and I end up buying it. But that sort of serendipitous discovery doesn't happen at Amazon. How do we recreate that experience?" Or "There are so many books published, how do we match readers to the books they'd like best?"

This "problem" has always seemed a bit bogus to me. First of all, when I'm on the internet, I'm constantly running across interesting-sounding books. There are usually links pointing me at Amazon, and occasionally I'll buy the book.

As a consumer, I don't find I have a problem with book discovery. I'm not compulsive about finding new books; I'm compulsive about finishing the book I've read half of. When I finish a book, it's usually two in the morning and I really want to get to sleep. I have big stacks, both real and virtual, of books on my to-read list.

Finally, the "discovery problem" is a tractable one from a tech point of view. Throw a lot of data and some machine learning at the problem and a good-enough solution should emerge. (I should note here that book "discovery" on the website I run is terrible at the moment, but pretty soon it will be much better.)

So why on earth does Amazon, which throws huge amounts of money into capital investment, do such a middling job of book discovery?

Recently the obvious answer hit me in the face, as such answers are wont to do. The answer is that mediocre discovery creates a powerful business advantage for Amazon!

Consider the two most important discovery tools used on the Amazon website:
  1. People who bought X also bought Y.
  2. Top seller lists.
Both of these methods have the property that the way to make them work for your book is for your book to sell a lot on Amazon. That means that any author or publisher who wants to sell a lot of books on Amazon will try to steer as many fans as possible to Amazon. More sales means more recommendations, which means more sales, and so on. Amazon is such a dominant bookseller that a bookstore could have the dreamiest features and pay the publisher a larger share of the retail selling price and still have the publisher try to push people to Amazon.
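Discovery tool (1) is simple enough to sketch in a few lines: count co-purchases. The toy below is purely illustrative (the book titles and purchase baskets are invented, and a real store's system is far more elaborate):

```python
from collections import defaultdict

def also_bought(baskets, target, k=3):
    """Rank books by how often they were bought together with `target`."""
    co_counts = defaultdict(int)
    for basket in baskets:
        if target in basket:
            for book in basket:
                if book != target:
                    co_counts[book] += 1
    # most co-purchased first; ties broken alphabetically
    return sorted(co_counts, key=lambda b: (-co_counts[b], b))[:k]

# invented purchase history
baskets = [
    {"Dune", "Hyperion"},
    {"Dune", "Hyperion", "Foundation"},
    {"Dune", "Emma"},
    {"Emma", "Persuasion"},
]
print(also_bought(baskets, "Dune"))  # ['Hyperion', 'Emma', 'Foundation']
```

Note the circularity: a book only surfaces next to "Dune" after it has already sold next to "Dune", which is exactly the self-reinforcement described above.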

What happens in this sort of positive feedback system is pretty obvious to an electrical engineer like me, but Wikipedia's example of a cattle stampede makes a better picture.

The number of cattle running is proportional to the overall level of panic, which is proportional to... the number of cattle running! ("Stampede loop" by Trevithj. CC BY-SA)

Result: Stampede! Yeah, OK, these are sheep. But you get the point. ("Herdwick Stampede" by Andy Docker. CC BY)

Imagine what would happen if Amazon shifted from sales-based recommendations to some sort of magic that matched a customer with the perfect book. Then instead of focusing effort on steering readers to Amazon, authors and publishers would concentrate on creating the perfect book. The rich would stop getting richer, and instead, reward would find the deserving.
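The feedback loop itself can be modelled in a few lines. The following is a toy Polya-urn-style simulation (every parameter is invented for illustration): in one store each purchase is steered toward books in proportion to their past sales, while in the control store buyers pick at random.

```python
import random

def simulate(n_books=50, n_buyers=20000, feedback=True, seed=1):
    """Return per-book sales after n_buyers purchases."""
    rng = random.Random(seed)
    sales = [1] * n_books  # every book starts with one sale
    for _ in range(n_buyers):
        if feedback:
            # sales-based recommendation: buy in proportion to past sales
            i = rng.choices(range(n_books), weights=sales)[0]
        else:
            i = rng.randrange(n_books)  # no recommendation effect
        sales[i] += 1
    return sales

def top_share(sales, k=5):
    """Fraction of all sales captured by the k best sellers."""
    return sum(sorted(sales, reverse=True)[:k]) / sum(sales)

print(top_share(simulate(feedback=True)))   # a handful of early winners dominate
print(top_share(simulate(feedback=False)))  # sales spread almost evenly
```

With feedback on, whichever books happen to win the first few purchases keep getting recommended, so the top five typically capture several times the share they get in the control store. The winners are decided by early luck, not quality: the stampede in miniature.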

Ain't never gonna happen. Stampedes sell more books.

Islandora: Islandora Camp CO Has a Logo!

planet code4lib - Fri, 2014-10-03 13:47

Islandora Camp is going to Denver, CO in just 10 days. When we get there, we will be very happy to welcome our Camp attendees with their official t-shirt, designed by UPEI's Donald Moses:

Don's iCampCO logo will join the ranks of our historical Camp Logos. Keep an eye out for our next Camp and yours could be next!

Open Knowledge Foundation: Streamlining the Local Groups network structure

planet code4lib - Fri, 2014-10-03 11:21

We are now a little over a year into the Local Groups scheme that was launched in early 2013. Since then we have been receiving hundreds of applications from great community members wanting to start Local Groups in their countries and become Ambassadors and community leaders. From this great body of amazing talent, Local Groups in over 50 countries have been established and frankly we’ve been overwhelmed with the interest that this program has received!

Over the course of this time we have learned a lot. We have seen not only that open knowledge first and foremost develops locally, but also how global peer support is a great driver for making change in local environments. We're humbled and proud to be able to help facilitate the great work that is being done in all these countries.

We have also learned, however, of things in the application process and the general network structure that can be improved. After collecting feedback from the community earlier in the year, we learned that the structure of the network and the different labels (Local Group, Ambassador, Initiative and Chapter) were hard to comprehend, and that the waiting time facing applicants wanting to become Ambassadors and start Local Groups was frustrating. People applying are eager to get started, and having to wait weeks or even longer (because of the number of applications that came in) was obviously discouraging.

Presenting a more streamlined structure and way of getting involved

We have now thoroughly discussed the feedback with our great Local Groups community and as a result we are excited to present a more streamlined structure and a much easier way of getting involved. The updated structure is written up entirely on the Open Knowledge wiki, and includes the following major headlines:

1. Ambassador and Initiative level merge into “Local Groups”

As mentioned, applying to become an Ambassador and applying to set up an Initiative were the two entry-level ways to engage; “Ambassador” implying that the applicant was – to begin with – just one person, and “Initiative” being the way for an existing group to join the network. These were then jointly labelled “Local Groups”, which was – admittedly – a lot of labels to describe pretty much the same thing: people wanting to start a Local Group and collaborate. Therefore we are removing the Initiative label altogether, and from now on everyone will simply apply through one channel to start a Local Group. If you are just one person doing that (even though more people may join later) you are granted the opportunity to take the title of Ambassador. If you are a group applying collectively to start a Local Group, then everyone in that group can choose to take the title of Local Group Lead, which is a more shared way to lead a new group (as compared to an Ambassador). Applying still happens through a webform, which has been revamped to reflect these changes.

2. Local Group applications will be processed twice per year instead of on a rolling basis

All the hundreds of applications that have come in over the last year have been peer-reviewed by a volunteer committee of existing community members (and they have been doing a stellar job!). One of the other major things we've learned is that the work pressure the sheer number of applications put on this hard-working group simply wasn't sustainable in the long term. That is why, as of now, we are replacing rolling-basis processing and review of applications with two annual sprints in October and April. This may make it seem as if the waiting time for applicants becomes even longer, but that is not the case! In fact, we are implementing a measure that ensures no waiting at all! Keep reading.

3. Introducing a new easy “get-started-right-away” entry level: “Local Organiser”

This is the new thing we are most excited to introduce! Seeing how setting up a formal Local Group takes time (regardless of how many applications come in), it was clear that we needed a way for people to get involved in the network right away, without having to wait for weeks and weeks on formalities and practicalities. This has led to the new concept of “Local Organiser”:

Anyone can pick up this title immediately and start to organise Open Knowledge activities locally in their own name, simply by calling themselves Local Organiser. This can include organising meetups, contributing to discussion lists, advocating the use of open knowledge, building community and gathering more people to join – or any other relevant activity aligned with the values of Open Knowledge.

Local Organisers need to register by setting up a profile page on the Open Knowledge wiki as well as filling in this short form. Shortly thereafter, the Local Organiser will be greeted officially into the community with an email from the Open Knowledge Local Group Team containing a link to the Local Organiser Code of Conduct, which the person automatically agrees to adhere to when he/she picks up the title.

Local Organisers use existing, public tools such as Tumblr, Twitter, etc. – but can also request that Open Knowledge set up a public discussion list for their country (if needed – otherwise they can also use other existing public discussion lists). Additionally, they can use the Open Knowledge wiki as a place to put information and organize as needed. Local Organisers are encouraged to publicly document their activities on their Open Knowledge wiki profile in order to become eligible to apply to start an official Open Knowledge Local Group later down the road.

A rapidly growing global network

What about Chapters, you might wonder? Their status remains unchanged: they continue to be the expert-level entity that Local Groups can apply to become when reaching a certain level of prowess.

All in all it’s fantastic to see how Open Knowledge folks are organising locally in all corners of the world. We look forward to continue supporting you all!

If you have any questions, ideas or comments, feel free to get in touch!

Mita Williams: The Knight Foundation News Challenge Entries That I Have Applauded

planet code4lib - Fri, 2014-10-03 02:55
The Knight News Challenge has been issued and it's about libraries:
How might we leverage libraries as a platform to build more knowledgeable communities? 

I'm reviewing these entries because I think some of them might prove useful in a paper I'm currently writing. There are some recurring themes to the entries that I think are quite telling.

Of the 680 entries, there are some wonderful ideas that need to be shared. Here are some of the proposals that I've applauded:

    For the purposes of my paper, I'm interested in the intersections of Open Data and Libraries. Here are the entries that touch on these two topics:

    And I would be remiss if I didn't tell you that I am also collaborating on this entry:

    OVER UNDER AROUND THROUGH: a national library-library game to build civic engagement skills: OVER UNDER AROUND THROUGH is kinda like a dance-off challenge: libraries challenge each other – but instead of “show us your moves” the challenge is “show us how you would take on” actual community challenges such as economic disparity and racial tensions

    In many ways, this Knight News Challenge is just such a dance-off.

    CrossRef: The Public Knowledge Project and CrossRef Collaborate to Improve Services for Publishers using Open Journal Systems

    planet code4lib - Fri, 2014-10-03 02:46

    2 October 2014, Lynnfield, MA, USA and Vancouver, BC, Canada---CrossRef and the Public Knowledge Project (PKP) are collaborating to help publishers and journals using the Open Journal Systems (OJS) platform take better advantage of CrossRef services.

    The collaboration involves an institutional arrangement between CrossRef and PKP, and new software features. Features include an improved CrossRef plugin for OJS that will automate Digital Object Identifier (DOI) deposits, as well as plans to create new tools for extracting references from submissions. To facilitate CrossRef membership, PKP has also become a CrossRef Sponsoring Entity, which will allow OJS-based publishers to join CrossRef through PKP.

    The latest release of OJS version 2.4.5 includes a new CrossRef plugin with improved support for CrossRef deposits, the process by which CrossRef member publishers can assign DOIs (persistent, actionable identifiers) to their content. A CrossRef deposit includes the bibliographic metadata about an article or other scholarly document, the current URL of the item, and the DOI. Publishers need only update the URL at CrossRef if the content's web address changes. The cited DOI will automatically direct readers to the current URL.

    OJS 2.4.5 includes several general improvements that benefit CrossRef members directly and indirectly. First, OJS now allows for automatic deposits to the CrossRef service - manually uploading data via CrossRef's web interface is no longer necessary. Second, users of the plugin will be able to deposit Open Researcher and Contributor Identifiers (ORCIDs), which OJS can now accept during the author registration and article submission processes.

    Additionally, this release allows OJS publishers to more easily participate in the LOCKSS archiving service of their choice (including the forthcoming PKP PLN Service).

    Finally, this new release will serve as the foundation for further integration of other CrossRef features and services, such as the deposit of FundRef funding data, and the CrossMark publication record service.

    "The release of OJS 2.4.5 signals a new strategic direction for PKP in the provision of enhanced publishing services, such as the new CrossRef plugin," said Brian Owen, Managing Director of PKP. "Our collaboration with CrossRef has enabled us to move up the development of features that our publishers have been asking for. The partnership doesn't end here, either. We're looking forward to supporting publishers more directly now that we are a Sponsoring Entity and to jointly develop tools that will make it easier for publishers to comply with CrossRef's outbound reference linking requirements."

    CrossRef Executive Director Ed Pentz noted, "The profile of CrossRef's member publishers has changed significantly over the years. We are growing by hundreds of members each year. Many of these publishers are small institution-based journals from around the world. And many are hosted by the open source OJS software. It has been challenging for some of these organizations to meet our membership obligations like outbound reference linking and arranging for long-term archiving. And many have not been able to participate in newer services, because they require the ability to deposit additional metadata. We want all of our publishers to have a level playing field, regardless of their size. Our cooperation with PKP will help make that happen."

    Journals and publishers that use OJS and that already have established a direct relationship with CrossRef, or those that have an interest in becoming members through PKP, may take advantage of the enhanced features in the new CrossRef plugin by upgrading to OJS 2.4.5. And starting now, eligible journals can apply for a PKP-sponsored CrossRef membership for free DOI support. See PKP's CrossRef page for more information.

    About PKP
    The Public Knowledge Project was established in 1998 at the University of British Columbia. Since that time PKP has expanded and evolved into an international and virtual operation with two institutional anchors, at Stanford University and the Simon Fraser University Library. OJS is open source software made freely available to journals worldwide for the purpose of making open access publishing a viable option for more journals, as open access can increase a journal's readership as well as its contribution to the public good on a global scale. More information about PKP and its software and services is available on the PKP website.

    About CrossRef

    CrossRef serves as a digital hub for the scholarly communications community. A global not-for-profit membership organization of scholarly publishers, CrossRef shapes the future of scholarly communications with innovations that foster collaboration among multiple stakeholders. CrossRef provides a wide spectrum of services for identifying, locating, linking to, and assessing the reliability and provenance of scholarly content.

    James MacGregor, PKP

    Carol Anne Meyer, CrossRef
    Phone +1 781-295-0072 x23

    DuraSpace News: Announcing the Release of the 2015 National Agenda For Digital Stewardship

    planet code4lib - Fri, 2014-10-03 00:00

    Washington, DC  The 2015 National Agenda for Digital Stewardship has been released!

    You can download a copy of the Executive Summary and Full Report here: 

    DuraSpace News: Webinar Recording Available

    planet code4lib - Fri, 2014-10-03 00:00

    Winchester, MA  The 8th DuraSpace Hot Topics Community Webinar Series, “Doing It: How Non-ARL Institutions are Managing Digital Collections” began October 2, 2014.  The first webinar in this series curated by Liz Bishoff, “Research Results on Non-ARL Academic Libraries Managing Digital Collections,” provided an overview of the methodology and key questions and findings of the Managing Digital Collections Survey of non-ARL academic libraries.  Participants also had the opportunity to share how

    PeerLibrary: Weekly PeerLibrary meeting finalizing our Knight News Challenge...

    planet code4lib - Thu, 2014-10-02 23:24

    Weekly PeerLibrary meeting finalizing our Knight News Challenge submission.

    Cynthia Ng: Access 2014: Closing Keynote Productivity and Collaboration in the Age of Digital Distraction

    planet code4lib - Thu, 2014-10-02 18:35
    The closing keynote for Access 2014. He spoke really fast, so apologies if I missed a couple of points. Presented by Jesse Brown, Digital Media Expert, Futurist, Broadcast Journalist. Background: co-founder of Bitstrips, which makes fun cartoons; a CBC show; and the podcast Canadaland, a broader view of media, in a global sense, of what's happening to society and culture. Technology Changing […]

    Evergreen ILS: Evergreen 2.7.0 has been released!

    planet code4lib - Thu, 2014-10-02 17:42

    Evergreen 2.7.0 has been released!

    Small delay in announcing, but here we go…

    Cheers and many thanks to everyone who helped to make Evergreen 2.7.0 a reality, our first official release of the 2.7 series! After six months of hard work with development, bug reports, testing, and documentation efforts, the 2.7.0 files are available on the Evergreen website’s downloads page:

    So what’s new in Evergreen 2.7? You can see the full release notes here: To briefly summarize though, there were contributions made for both code and documentation by numerous individuals. A special welcome and acknowledgement to all our first-time contributors, thanks for your contributions to Evergreen!

    Some caveats now… currently Evergreen 2.7.0 requires the use of the latest OpenSRF 2.4 series, which is still in its alpha release (beta coming soon). As folks help to test the OpenSRF release, this will no doubt help to make Evergreen 2.7 series better. Also, for localization/i18n efforts, there was some last minute bug finding and we plan to release updated translation files in the next maintenance release 2.7.1 for the 2.7 series.

    Evergreen 2.7.0 includes a preview of the new web-based staff client code. The instructions for setting this up are being finalized by the community and should be expected for release during the next maintenance version 2.7.1 later in October.

    See here for some direct links to the various files so far:

    Once again, a huge thanks to everyone in the community who has participated this cycle to contribute new code, test and sign-off on features, and work on new documentation and other ongoing development efforts.


    — Ben

    Cynthia Ng: Access 2014: Day 3 Notes

    planet code4lib - Thu, 2014-10-02 17:09
    Final half day of Access 2014. The last stretch. ## RDF and Discovery in the Real World(cat). Karen Coombs, Senior Product Analyst, WorldShare Platform. The Web of Data: Things, Not Strings. A way for search engines to be the most relevant, e.g. May 2012: Google Knowledge Graph provides more knowledge in search results. Traditionally in bibliographic description […]
