Feed aggregator

DuraSpace News: VIVO Data from Mars: LASP, VIVO, and MAVEN

planet code4lib - Mon, 2014-10-06 00:00

Above are four of the first images taken by the IUVS instrument. IUVS obtained these false-color images about eight hours after the successful completion of MAVEN’s Mars orbital insertion maneuver on September 21, 2014.

From Michael Cox, Laboratory of Atmospheric and Space Physics, University of Colorado at Boulder

Ed Summers: Sign o’ the Times

planet code4lib - Sun, 2014-10-05 17:42



a sign, a metaphor by Casey Bisson.

An old acquaintance took this photo in Coaldale, Nevada. I had to have a copy for myself.

Patrick Hochstenbach: Homework assignment #2 Sketchbookskool

planet code4lib - Sun, 2014-10-05 11:34
Filed under: Comics Tagged: cartoon, cat, comics, copic, manual, sketchskool, watercolor

Riley Childs: Test Post

planet code4lib - Sat, 2014-10-04 21:05

This is a test post to test Dublin Core on the code4lib Planet, It will go away in a few minutes

The post Test Post appeared first on Riley's blog at https://rileychilds.net.

Patrick Hochstenbach: Homework assignment #1 Sketchbookskool

planet code4lib - Sat, 2014-10-04 08:39
I enrolled in Sketchbook Skool. As our first homework assignment we were asked to draw a recipe. Filed under: Comics Tagged: cat, copic, recipe, sketchbookskool, watercolor

LITA: A Tested* Approach to Leveling Up

planet code4lib - Sat, 2014-10-04 00:04

*Unscientifically, by a person from the internet.

If you’re a LITA member, then you’re probably very skilled in a few technical areas, and know just enough to be dangerous in several other areas. The latter can be a liability if you’ve just been volunteered to implement the Great New Tech Thing at your library. Do it right, and you just might be recognized for your ingenuity and hard work (finally!). Do it wrong, and you’ll end up in the pillory (again!).

Maybe the Great New Tech Thing requires you to learn a new programming or markup language. Perhaps you’re looking to expand on your skills–and resume–by adding a language. For many years, the library associations and schools have emphasized tech skills as an essential component of librarianship. The reasons are plentiful, and the means are easier than you might think. With a library card, a few free, open source software tools, and some time, you can level up your tech skills by learning a new language.

I humbly suggest the following approach to leveling up, which has worked for me.

What you’ll need

A computer. A Windows, OS X, or Linux laptop or desktop computer will suffice.

Resources. Online programming “schools”, such as Codecademy and Code School, are a great concept and work for some people, but I’ve personally found them to provide an incomplete education. The UI demands brevity, and therefore many of the explanations and instructions require a certain level of knowledge about coding in general that most beginners lack. I have found good ol’ fashioned books to be a better resource. Find titles that have exercises, and you’ll learn by doing. Actually building something practical makes the process enjoyable. The Visual QuickStart Guide series by Peachpit Press and the Head First series by O’Reilly usually teach through practical examples.

Books are a great source of knowledge, but so are your fellow coders. Most languages have a community with an online presence, and it would be a good idea to find those forums and bookmark them. But if you were to bookmark only one forum, it should be the Stack Overflow forum for the language you’re learning.

Some languages also have official documentation online, for example, php.net and python.org.

Time. Carve out time wherever you can. If you take public transportation to work, use that time (if you can find a seat). Learn during your lunch break. Give up a season of your favorite TV show (you can always catch up later in a weekend binge-watch when the DVDs hit your library shelves).

Where to start

Here and now. Maybe you’re reading this because you’ve just been tapped to implement the Great New Tech Thing at your library. Or maybe you’re considering adding a skill to your resume. Whatever the reason, there’s no time like the present.

Leveling up for professional development affords you greater flexibility. Start with a language your friends know–they will be an invaluable resource if you get stuck along the way. Also, consider starting with a simple language that you can build upon. If you already know HTML, then PHP and JavaScript are natural progressions, and they open the door to object-oriented languages like C++, Java, or Python. Finally, make sure there’s a viable–if not growing–community around the language you want to learn. Not only does this give a sense of the language’s future and staying-power, the community can also provide support through online forums, conferences and meetups, etc.

If you’re new to programming languages, I hope this approach helps. If you’re a veteran coder, please share your learning approach in the comments.

HangingTogether: Jyaa mata Seki-san – farewell to our OCLC Research Fellow from Japan

planet code4lib - Fri, 2014-10-03 23:48

Hideyuki Seki and Jim Michalko on the OCLC headquarters campus

We are about to say goodbye to Hideyuki Seki, our current OCLC Research Fellow from Japan. The Manager of the Media Center (a designation for all the libraries) at Keio University in Tokyo, Seki-san has been with us for the last two weeks spending time in both the San Mateo and Dublin offices. His time with us was structured so that he would learn enough about our work and goals that he could informally but effectively represent OCLC Research and the OCLC Research Library Partnership to his colleagues at Keio and to his peers in the Japanese research library community.

Seki-san arrived with a particular set of interests he hoped to explore during his brief residency with us. He wanted to know more about:

1) Invigoration of cooperation among research libraries in Japan

There is not a strong history of collaborative projects among Japanese research libraries and he wanted to see if the strong commitment to collaboration here had lessons that would be useful in building communities of interest and practice in his country.

2) Advancing Keio University’s research impact and reputation

Connecting the Keio Media Center’s activities to the research being done at the University in ways that enrich it and increase its impact is a particular challenge shared with many US university libraries. He wanted to see the range of responses that are emerging here and consider them in light of the culture of Japanese universities.

3) Future of the digital repository

This interest is connected to research reputation and support issues as well as concerns about digital surrogates for preservation and access. He wondered about the US view of impact and sustainability.

He was also curious about the way OCLC Research operates, how it supports the Partnership as well as the OCLC cooperative.

We structured a program for him and included him in our annual all-staff face-to-face planning meetings at headquarters in Dublin, Ohio. It was certainly a challenge for him to be immersed in our idiosyncratic vocabulary, our bundles of acronyms and the flood of idiomatic English we consistently let flow. We reminded ourselves of the difficulties we might be causing to his understanding but seemed powerless to temporarily amend our ways. Instead we relied on him to rise to the challenge. He did.

We benefited from his presence in a number of ways. Explaining why we were giving attention to certain topics occasionally challenged us to reconsider. The extent to which the Japanese publishing industry has maintained a library service landscape still tied to print was revelatory for most of us. (The publishers have been slow to offer e-journals given that their market is captive by language and they are addicted to their high margins.) The management regime within the administrative echelons of Japanese universities also dictates the pace of change and progress. Managers are routinely re-assigned on a regular schedule to new responsibility areas at just about the time that they know enough to implement new directions.

And it was great fun seeing our Bay Area and mid-Ohio tourist sites through his eyes. Everybody learned a lot. We trust he’ll judge it worth the journey. We’ve already decided it was worth our effort.

About Jim Michalko

Jim coordinates the OCLC Research office in San Mateo, CA, and focuses on relationships with research libraries and work that renovates the library value proposition in the current information environment.


District Dispatch: Librarians won’t stay quiet about surveillance

planet code4lib - Fri, 2014-10-03 21:07

Photo by KPBS

The Washington Post highlighted the library community’s efforts to protect the public from government intrusion or censorship in “Librarians won’t stay quiet about government surveillance,” a feature article published today. It has been a longstanding belief in the library community that the possibility of surveillance—whether directly or through access to records of speech, research and exploration—undermines a democratic society.

Washington Post writer Andrea Peterson states:

In September 2003, Attorney General John Ashcroft called out the librarians. The American Library Association and civil liberties groups, he said, were pushing “baseless hysteria” about the controversial Patriot Act. He suggested that they were worried that spy agencies wanted to know “how far you have gotten on the latest Tom Clancy novel.”

In the case of government surveillance, they are not shushing. They’ve been among the loudest voices urging freedom of information and privacy protections.

Edward Snowden’s campaign against the National Security Agency’s data collection program has energized this group once again. And a new call to action from the ALA’s president means their voices could be louder and more coordinated than ever.

Read more

The post Librarians won’t stay quiet about surveillance appeared first on District Dispatch.

Andromeda Yelton: how the next meeting went

planet code4lib - Fri, 2014-10-03 19:56

In June, I said that I wouldn’t be voting to approve the LITA budget, due to a variety of unaddressed concerns. At Annual, Cindi Blyberg ran an awesome meeting where we put off the vote, to give our Financial Advisory Committee time to update things before the fiscal year close. And then I forgot to update the blog about the subsequent meeting!

Well, the FAC did outstanding work, pulling together a budget that I solidly believe in, on very little notice. (I owe you all beers. A lot of beers. Zoe Stewart-Marshall, Andrew Pace, Susan Sharpless Smith: please be thinking what kind of beverages you like.) We approved it in August with very little fuss. I was pleased to vote for it.

You can have a look yourself if you like: FY2015 budget [.xlsx].

You’ll notice the bottom line here is a deficit. I’m bummed about that, because I’m concerned about LITA’s health, but mostly I’m happy about it, because it’s honest. I think the lines above it are credible estimates of what we’ll end up doing in FY ’15, and I would a million times rather be honest about the challenges of that, so we can plan for them, than sweep them under the rug.

And now the FAC is off and running on the FY ’16 budget, with a timeline that should allow for much more deliberation at leisure than FY ’15 allowed. (Note to self: you owe them even more beers.) And Board is pondering how to fill the holes. Your ideas, as always, are welcome.

CrossRef: CrossRef Indicators

planet code4lib - Fri, 2014-10-03 19:06

Last update September 29, 2014

Total no. participating publishers & societies 5369
Total no. voting members 2626
% of non-profit publishers 57%
Total no. participating libraries 1903
No. journals covered 36,144
No. DOIs registered to date 69,632,826
No. DOIs deposited in previous month 582,561
No. DOIs retrieved (matched references) in previous month 35,125,120
DOI resolutions (end-user clicks) in previous month 79,193,741

CrossRef: New CrossRef Members

planet code4lib - Fri, 2014-10-03 19:05

Updated September 29, 2014

Voting Members
Agora University in Oradea
Classical Association of South Africa
Hind Agri Horticultural Society
International Journal of Integrated Health Sciences (IJIHS)
Jurnal Anestesi Perioperatif (JAP)
Pediatric Neurology Briefs Publishers
Towarzystwo Naukowe W Toruniu

Sponsored Members
IJNC Editorial Committee
Japanese Association of Cardioangioscopy
Lithuanian University of Educational Sciences
The Operations Research Society of Japan

Represented Members
Association for Korea Public Administration History
Educational Research Institute, College of Education, Ewha Womans University
English and American Cultural Studies
Kamchatka Research Institute of Fisheries and Oceanography
Korea Association of Teachers of English
Korea Productivity Association
Korea Society for Philosophy East-West
Korean Association for Educational Information and Media
Korean Association of Social Welfare Policy
Korean Clinical Psychology Association
Korean Generative Grammar Circle
Korean Society of Exercise Physiology
Korean Society of Special Education
Online Journal of Analytic Combinatorics
Publishing Centre Naukovedenie
Society for Historical Studies of Ancient and Medieval China
Yeol-song Society of Classical Studies

Last updated September 23, 2014

Voting Members
Brazilian Journal of Internal Medicine
Brazilian Journal of Irrigation and Drainage - IRRIGA
Djokosoetono Research Center
EDIPUCRS
Education Association of South Africa
Feminist Studies
Laboreal, FPCE, Universidade do Porto
Libronet Bilgi Hizmetleri ve Yazilim San. Tic. Ltd., Sti.
Open Access Text Pvt, Ltd.
Pontifical University of John Paul II in Krakow
Revista Brasileira de Quiropraxia - Brazilian Journal of Chiropractic
Scientific Online Publishing, Co. Ltd.
Symposium Books, Ltd.
Turkiye Yesilay Cemiyeti
Uniwersytet Ekonomiczny w Krakowie - Krakow University of Economics
Volgograd State University

Sponsored Members

IJNC Editorial Committee
Japanese Association of Cardioangioscopy
Lithuanian University of Educational Sciences
The Operations Research Society of Japan

Represented Members

Acta Medica Anatolia
Ankara University Faculty of Agriculture
CNT Nanostroitelstvo
Dnipropetrovsk National University of Railway Transport
English Language and Literature Association of Korea
Institute for Humanities and Social Sciences
Institute of Korean Independence Movement Studies
Journal of Chinese Language and Literature
Journal of Korean Linguistics
Knowledge Management Society of Korea
Korea Association for International Commerce and Information
Korea Research Institute for Human Settlements
Korean Academic Society for Public Relations
Korean Marketing Association
Korean Society for Art History
Korean Society for the Study of Physical Education
Korean Society of Consumer Policy and Education
Law Research Institute, University of Seoul
Research Institute Centerprogamsystem, JSC
Research Institute of Science Education, Pusan National University
Research Institute of Social Science
Silicea - Poligraf, LLC
SPb RAACI
The Altaic Society of Korea
The Hallym Academy of Sciences
The Korean Association of Ethics
The Korean Association of Translation Studies
The Korean Society for Culture and Arts Education Studies
The Korean Society for Feminist Studies in English Literature
The Korean Society for Investigative Cosmetology
The Regional Association of Architectural Institute of Korea
The Society for Korean Language and Literary Research
Ural Federal University
V.I. Shimakov Federal Research Center of Transplantology and Artificial Organs
World Journal of Traditional Chinese Medicine
Yonsei Institute for North Korean Studies

Jonathan Rochkind: Non-digested asset names in Rails 4: Your Options

planet code4lib - Fri, 2014-10-03 16:48

Rails 4 removes the ability to produce non-digest-named assets in addition to digest-named assets (i.e. ‘application.js’ in addition to ‘application-810e09b66b226e9982f63c48d8b7b366.js’).

There are a variety of ways to work around this by extending asset compilation. After researching and considering them all, I chose to use a custom Rake task that uses the sprockets manifest.json file. In this post, I’ll explain the situation and the options.

The Background

The Rails asset pipeline, powered by sprockets, compiles (sass, coffeescript, others), aggregates (combines multiple source files into one file for performance purposes), and post-processes (minimization, gzip’ing) your assets.

It produces assets to be delivered to the client that are fingerprinted with a digest hash based on the contents of the file — such as ‘application-810e09b66b226e9982f63c48d8b7b366.css’.  People (and configuration) often refer to this filename-fingerprinting as “digested assets”.

The benefit of this is that because the asset filenames are guaranteed to change if their content changes, the individual files can be cached indefinitely, which is great. (You still probably need to adjust your web server configuration to take advantage of this, which you may not be doing).

In Rails3, a ‘straight’-named copy of the assets (e.g. `application.js`) was also produced, alongside the fingerprinted digest-named assets.

Rails4 stopped doing this by default, and also took away any ability to do this even as a configurable option. While I can’t find the thread now, I recall seeing discussion that in Rails3, the production of non-digest-named assets was accomplished through actually asking sprockets to compile everything twice, which made asset compilation take roughly twice as long as it should.   Which is indeed a problem.

Rather than looking to fix Sprockets api to make it possible to compile the file once but simply write it twice, Rails devs decided there was no need for the straight-named files at all, and simply removed the feature.

Why would you need straight-named assets?

Extensive and combative discussion on this feature change occurred in sprockets-rails issue #49.

The title of this issue reveals one reason people wanted the non-digest-named assets: “breaks compatibility with bad gems”. This mainly applies to gems that supply javascript, which may need to generate links to assets, and may not have been written to look up the current digest-named URLs. It’s really about javascript, not ‘gems’; it can apply to javascript you’ve included without gemifying it too.

The Rails devs expressing opinions on this issue believed (at least initially) that these ‘bad gems’ should simply be fixed; accommodating them was the wrong thing to do, as it eliminates the ability to cache-forever the assets they refer to.

I think they underestimate the amount of work it can take to fix these ‘bad’ JS dependencies, which are often included through multi-level dependency trees (requiring getting patches accepted by multiple upstreams) — and it also basically requires wrapping all JS assets in rubygems that apply sprockets/rails-specific patches on top, instead of, say, just using bower.

I think there’s a good argument for accommodating JS assets which the community has not yet had the time/resources to make respect the sprockets fingerprinting. Still, it is definitely preferable, and always at least theoretically possible, to make all your JS respect sprockets asset fingerprinting — and in most of my apps, I’ve done that.

But there’s other use cases: like mine!

I have an application that needs to offer a Javascript file at a particular stable URL, as part of its API — think JS “widgets”.

I want it to go through the asset pipeline, for source control, release management, aggregation, SASS, minimization, etc. The suggestion to just “put it in /public as a static asset” is no good at all. But I need the current version available at a persistent  URL.

In Rails 3, this Just Worked, since the asset pipeline created a non-digested name. In Rails 4, we need a workaround. I don’t need every asset to have a non-digest-named version, but I do need a whitelist of a few that are part of my public API.

I think this is a pretty legitimate use case, and not one that can be solved by ‘fixing bad gems’. I have no idea if Rails devs recognize it or not.

(It’s been suggested that HTML emails linking to CSS stylesheets (or JS?) is another use case. I haven’t done that and don’t understand it well enough to comment. Oh, and other people want em for their static 500 error pages.)

Possible Workaround Options

So that giant Github Issue thread? At first it looks like just one of those annoying ones with continual argument by uninformed people that will never die, and eventually @rafaelfranca locked it. But it’s also got a bunch of comments with people offering their solutions, and is the best aggregation of possible workarounds to consider — I’m glad it wasn’t locked sooner. Another example of how GitHub qualitatively improves open source development — finding this stuff on a listserv would have been a lot harder.

The Basic Rake Task

Early in the thread, Rails core team member @guilleiguaran suggested a Rake task, which simply looks in the file system for fingerprinted assets and copies them over to the un-digest-named version. Rails core team member @rafaelfranca later endorsed this approach too. 

The problem is it won’t work. I’ve got nothing against a rake task solution. It’s easy to wire things up so your new rake task automatically gets called every time after `rake assets:precompile’, no problem!

The problem is that a deployed Rails app may have multiple fingerprinted versions of a particular asset file around, representing multiple releases. And really you should set things up this way —  because right after you do a release, there may be cached copies of HTML (in browser caches, or proxying caches including a CDN) still around, still referencing the old version with the old digest fingerprint. You’ve got to keep it around for a while.

(How long? Depends on the cache headers on the HTML that might reference it. The fact that sprockets only supports keeping around a certain number of releases, and not releases made within a certain time window, is a different discussion. But, yeah, you need to keep around some old versions).

So it’s unpredictable which of the several versions you’ve got hanging around the rake task is going to copy to the non-digest-named version; there’s no guarantee it’ll be the latest one. (Maybe it depends on their lexicographic sort?) That’s no good.

Enhance the core-team-suggested rake task?

Before I realized this problem, I had already spent some time trying to implement the basic rake task, add a whitelist parameter, etc. So I tried to keep going with it after realizing this problem.

I figured, okay, there are multiple versions of the asset around, but sprockets and rails have to know which one is the current one (to serve it to the current application), so I must be able to use sprockets ruby API in the rake task to figure it out and copy that one.

  • It was kind of challenging to figure out how to get sprockets to do this, but eventually it was sort of working.
  • Except I started to get worried that I might be triggering the double-compilation that Rails3 did, which I didn’t want to do, and got confused about even figuring out if I was doing it.
  • And I wasn’t really sure if I was using sprockets API meant to be public or internal. It didn’t seem to be clearly documented, and sprockets and sprockets-rails have been pretty churny, I thought I was taking a significant risk of it breaking in future sprockets/rails version(s) and needing continual maintenance.

Verdict: Nope, not so simple, even though it seems to be the rails-core-endorsed solution. 

Monkey-patch sprockets: non-stupid-digest-assets

Okay, so maybe we need to monkey-patch sprockets I figured.

@alexspeller provides a gem to monkey-patch Sprockets to support non-digested-asset creation, the unfortunately combatively named non-stupid-digest-assets.

If someone else has already figured it out and packaged it in a gem, great! Maybe they’ll even take on the maintenance burden of keeping it working with churny sprockets updates!

But non-stupid-digest-assets just takes the same kind of logic from that basic rake task, another pass through all the assets post-compilation, but implements it with a sprockets monkeypatch instead of a rake task. It does add a whitelist. I can’t quite figure out if it’s still subject to the same might-end-up-with-older-version-of-asset problem.

There’s really no benefit to using a monkey patch instead of a rake task that does the same thing, and it has increased risk of breaking with new Rails releases. Some have already reported it not working with the Rails 4.2 betas — I haven’t investigated myself to see what’s up with that, and @alexspeller doesn’t seem to be in any hurry to either.

Verdict: Nope. non-stupid-digest-assets ain’t as smart as it thinks it is. 

Monkey-patch sprockets: The right way?

If you’re going to monkey-patch sprockets and take on forwards-compat risk, why not actually do it right, and make sprockets simply write the compiled file to two different file locations (and/or use symlinks) at the point of compilation?

@ryana  suggested such code. I’m not sure how tested it is, and I’d want to add the whitelist feature.

At this point, I was too scared of the forwards-compatibility-maintenance risks of monkey patching sprockets, and realized there was another solution I liked better…

Verdict: It’s the right way to do it, but carries some forwards-compat maintenance risk as an unsupported monkey patch

Use the Manifest, Luke, erm, Rake!

I had tried and given up on using the sprockets ruby api to determine ‘current digest-named asset’.  But as I was going back and reading through the Monster Issue looking for ideas again, I noticed @drojas suggested using the manifest.json file that sprockets creates, in a rake task.

Yep, this is where sprockets actually stores info on the current digest-named-assets. Forget the sprockets ruby api, we can just get it from there, and make sure we’re making a copy (or symlinking) the current digested version to the non-digested name.

But are we still using private api that may carry maintenance risk with future sprockets versions?  Hey, look, in a source code comment Sprockets tells us “The JSON is part of the public API and should be considered stable.” Sweet!

Now, even if sprockets devs  remember one of them once said this was public API (I hope this blog post helps), and even if sprockets is committed to semantic versioning, that still doesn’t mean it can never change. In fact, the way some of rubydom treats semver, it doesn’t even mean it can’t change soon and frequently; it just means they’ve got to update the sprockets major version number when it changes. Hey, at least that’d be a clue.

But note that changes can happen in between Rails major releases. Rails 4.1 uses sprockets-rails 2.x which uses sprockets 2.x. Rails 4.2 — no Rails major version number change — will use sprockets-rails 3.x which, oh, still uses sprockets 2.x, but clearly there’s no commitment on Rails not to change sprockets-rails/sprockets major versions without a Rails major version change.

Anyway, what can you do, you pays your money and you takes your chances. This solution seems pretty good to me.

Here’s my rake task, just a couple dozen lines of code, no problem.

 Verdict: Pretty decent option, best of our current choices

The Redirect

One more option is using a redirect to take requests for the non-digest-named asset, and redirect it to the current digest-named asset.

@Intrepidd suggests using rack middleware to do that. I think it would also work to just use a Rails route redirect with a lambda. (I’m kind of allergic to middleware.) Same difference either way as far as what your app is doing.
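A minimal sketch of the middleware idea (the class name and mapping are hypothetical, not @Intrepidd’s actual code): each stable URL maps to a lambda that looks up the current digest-named path at request time.

```ruby
# Hypothetical Rack middleware: redirect a stable URL to whatever the
# current digest-named asset is. The lambda is called per request, so
# the redirect target stays fresh across deploys.
class StableAssetRedirect
  def initialize(app, mapping)
    @app = app
    @mapping = mapping  # e.g. "/widget.js" => -> { current digested path }
  end

  def call(env)
    if (target = @mapping[env["PATH_INFO"]])
      # Short cache on the redirect itself; the target is cacheable forever.
      [302, { "Location" => target.call, "Cache-Control" => "max-age=60" }, []]
    else
      @app.call(env)
    end
  end
end
```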

I didn’t really notice this one until I had settled on The Manifest. It requires two HTTP requests every time a client wants the asset at the persistent URL, though. The first one will touch your app and needs a short cache time; it will then redirect to the digest-named asset, which will be served directly by the web server and can be cached forever. I’m not really sure if the performance implications are significant; it probably depends on your use cases and request volume. @will-r suggests it won’t work well with CDNs, though.

Verdict: Meh, maybe, I dunno, but it doesn’t feel right to introduce the extra latency

The Future

@rafaelfranca says Rails core has changed their mind and are going to deal with “this issue” “in some way”. Although I don’t think it made it into Rails 4.2 after all.

But what’s “this issue” exactly? I dunno, they are not sharing what they see as the legitimate use cases to handle, and requirements on legitimate ways to handle em.

I kinda suspect they might just be dealing with the “non-Rails JS that needs to know asset URLs” issue, and considering some complicated way to automatically make it use digest-named assets without having to repackage it for Rails.  Which might be a useful feature, although also a complicated enough one to have some bug risks (ah, the story of the asset pipeline).

And it’s not what I need, anyway, there are other uses cases than the “non-Rails JS” one that need non-digest-named assets.

I just need sprockets to produce parallel non-digested asset filenames for certain whitelisted assets. That really is the right way to handle it for my use case. Yes, it means you need to know the implications and how to use cache headers responsibly. If you don’t give me enough rope to hang myself, I don’t have enough rope to climb the rock face either. I thought Rails target audience was people who know what they’re doing?

It doesn’t seem like this would be a difficult feature for sprockets to implement (without double compilation!). @ryana’s monkeypatch seems like pretty simple code that is most of the way there. It’s the feature that I need.

I considered making a pull request to sprockets (the first step, then probably sprockets-rails, needs to support passing on the config settings).  But you know what, I don’t have the time or psychic energy to get in an argument about it in a PR; the Rails/sprockets devs seem opposed to this feature for some reason.  Heck, I just spent hours figuring out how to make my app work now, and writing it all up for you instead!

But, yeah, just add that feature to sprockets, pretty please.

So, if you’re reading this post in the future, maybe things will have changed, I dunno.


Filed under: General

Jonathan Rochkind: Non-digested asset names in Rails 4: Your Options

planet code4lib - Fri, 2014-10-03 16:48

Rails 4 removes the ability to produce non-digest-named assets in addition to digest-named-assets. (ie ‘application.js’ in addition to ‘application-810e09b66b226e9982f63c48d8b7b366.css’).

There are a variety of ways to work around this by extending asset compilation. After researching and considering them all, I chose to use a custom Rake task that uses the sprockets manifest.json file. In this post, I’ll explain the situation and the options.

The Background

The Rails asset pipeline, powered by sprockets, compiles (sass, coffeescript, others), aggregates (combines multiple source files into one file for performance purposes), and post-processes (minimization, gzip’ing) your assets.

It produces assets to be delivered to the client that are fingerprinted with a digest hash based on the contents of the file — such as ‘application-810e09b66b226e9982f63c48d8b7b366.css’.  People (and configuration) often refer to this filename-fingerprinting as “digested assets”.

The benefit of this is that because the asset filenames are guaranteed to change if their content changes, the individual files can be cached indefinitely, which is great. (You still probably need to adjust your web server configuration to take advantage of this, which you may not be doing).

In Rails3, ‘straight’-named copies of the assets (eg `application.js`) were also produced, alongside the fingerprinted digest-named assets.

Rails4 stopped doing this by default, and also took away any ability to do this even as a configurable option. While I can’t find the thread now, I recall seeing discussion that in Rails3, the production of non-digest-named assets was accomplished through actually asking sprockets to compile everything twice, which made asset compilation take roughly twice as long as it should.   Which is indeed a problem.

Rather than looking to fix the Sprockets API to make it possible to compile the file once but simply write it twice, Rails devs decided there was no need for the straight-named files at all, and simply removed the feature.

Why would you need straight-named assets?

Extensive and combative discussion on this feature change occurred in sprockets-rails issue #49.

The title of this issue reveals one reason people wanted the non-digest-named assets: “breaks compatibility with bad gems”.   This mainly applies to gems that supply javascript, which may need to generate links to assets and were not written to look up the current digest-named URLs.  It’s really about javascript, not ‘gems’; it can apply to javascript you’ve included without gemifying it too.

The Rails devs expressing opinions on this issue believed (at least initially) that these ‘bad gems’ should simply be fixed; accommodating them was the wrong thing to do, as it eliminates the ability to cache-forever the assets they refer to.

I think they under-estimate the amount of work it can take to fix these ‘bad’ JS dependencies, which often are included through multi-level dependency trees (requiring getting patches accepted by multiple upstreams) — and also basically requires wrapping all JS assets in rubygems that apply sprockets/rails-specific patches on top, instead of, say, just using bower.

I think there’s a good argument for accommodating JS assets which the community has not yet had the time/resources to make respect the sprockets fingerprinting. Still, it is definitely preferable, and always at least theoretically possible, to make all your JS respect sprockets asset fingerprinting — and in most of my apps, I’ve done that.

But there’s other use cases: like mine!

I have an application that needs to offer a Javascript file at a particular stable URL, as part of its API — think JS “widgets”.

I want it to go through the asset pipeline, for source control, release management, aggregation, SASS, minimization, etc. The suggestion to just “put it in /public as a static asset” is no good at all. But I need the current version available at a persistent URL.

In Rails 3, this Just Worked, since the asset pipeline created a non-digested name. In Rails 4, we need a workaround.  I don’t need every asset to have a non-digest-named version, but I do need a whitelist of a few that are part of my public API.

I think this is a pretty legitimate use case, and not one that can be solved by ‘fixing bad gems’. I have no idea if Rails devs recognize it or not.

(It’s been suggested that HTML emails linking to CSS stylesheets (or JS?) is another use case. I haven’t done that and don’t understand it well enough to comment. Oh, and other people want em for their static 500 error pages.)

Possible Workaround Options

So that giant Github Issue thread? At first it looks like just one of those annoying ones with continual argument by uninformed people that will never die, and eventually @rafaelfranca locked it. But it’s also got a bunch of comments with people offering their solutions, and is the best aggregation of possible workarounds to consider — I’m glad it wasn’t locked sooner. Another example of how GitHub qualitatively improves open source development — finding this stuff on a listserv would have been a lot harder.

The Basic Rake Task

Early in the thread, Rails core team member @guilleiguaran suggested a Rake task, which simply looks in the file system for fingerprinted assets and copies them over to the un-digest-named version. Rails core team member @rafaelfranca later endorsed this approach too. 

The problem is it won’t work. I’ve got nothing against a rake task solution. It’s easy to wire things up so your new rake task automatically gets called every time after `rake assets:precompile`, no problem!

The problem is that a deployed Rails app may have multiple fingerprinted versions of a particular asset file around, representing multiple releases. And really you should set things up this way —  because right after you do a release, there may be cached copies of HTML (in browser caches, or proxying caches including a CDN) still around, still referencing the old version with the old digest fingerprint. You’ve got to keep it around for a while.

(How long? Depends on the cache headers on the HTML that might reference it. The fact that sprockets only supports keeping around a certain number of releases, and not releases made within a certain time window, is a different discussion. But, yeah, you need to keep around some old versions).

So it’s unpredictable which of the several versions you’ve got hanging around the rake task is going to copy to the non-digest-named version; there’s no guarantee it’ll be the latest one. (Maybe it depends on their lexicographic sort?) That’s no good.

Enhance the core-team-suggested rake task?

Before I realized this problem, I had already spent some time trying to implement the basic rake task, add a whitelist parameter, etc. So I tried to keep going with it after realizing this problem.

I figured, okay, there are multiple versions of the asset around, but sprockets and rails have to know which one is the current one (to serve it to the current application), so I must be able to use sprockets ruby API in the rake task to figure it out and copy that one.

  • It was kind of challenging to figure out how to get sprockets to do this, but eventually it was sort of working.
  • Except I started to get worried that I might be triggering the double-compilation that Rails3 did, which I didn’t want to do, and got confused about even figuring out if I was doing it.
  • And I wasn’t really sure if I was using sprockets API meant to be public or internal. It didn’t seem to be clearly documented, and sprockets and sprockets-rails have been pretty churny, I thought I was taking a significant risk of it breaking in future sprockets/rails version(s) and needing continual maintenance.

Verdict: Nope, not so simple, even though it seems to be the rails-core-endorsed solution. 

Monkey-patch sprockets: non-stupid-digest-assets

Okay, so maybe we need to monkey-patch sprockets I figured.

@alexspeller provides a gem to monkey-patch Sprockets to support non-digested-asset creation, the unfortunately combatively named non-stupid-digest-assets.

If someone else has already figured it out and packaged it in a gem, great! Maybe they’ll even take on the maintenance burden of keeping it working with churny sprockets updates!

But non-stupid-digest-assets just takes the same kind of logic as that basic rake task, another pass through all the assets post-compilation, but implements it with a sprockets monkeypatch instead of a rake task. It does add a whitelist.  I can’t quite figure out if it’s still subject to the same might-end-up-with-older-version-of-asset problem.

There’s really no benefit just to using a monkey patch instead of a rake task doing the same thing, and it has increased risk of breaking with new Rails releases. Some have already reported it not working with the Rails 4.2 betas — I haven’t investigated myself to see what’s up with that, and @alexspeller doesn’t seem to be in any hurry to either.

Verdict: Nope. non-stupid-digest-assets ain’t as smart as it thinks it is. 

Monkey-patch sprockets: The right way?

If you’re going to monkey-patch sprockets and take on forwards-compat risk, why not actually do it right, and make sprockets simply write the compiled file to two different file locations (and/or use symlinks) at the point of compilation?

@ryana  suggested such code. I’m not sure how tested it is, and I’d want to add the whitelist feature.

At this point, I was too scared of the forwards-compatibility-maintenance risks of monkey patching sprockets, and realized there was another solution I liked better…

Verdict: It’s the right way to do it, but carries some forwards-compat maintenance risk as an unsupported monkey patch

Use the Manifest, Luke, erm, Rake!

I had tried and given up on using the sprockets ruby api to determine ‘current digest-named asset’.  But as I was going back and reading through the Monster Issue looking for ideas again, I noticed @drojas suggested using the manifest.json file that sprockets creates, in a rake task.

Yep, this is where sprockets actually stores info on the current digest-named-assets. Forget the sprockets ruby api, we can just get it from there, and make sure we’re making a copy (or symlinking) the current digested version to the non-digested name.

But are we still using private api that may carry maintenance risk with future sprockets versions?  Hey, look, in a source code comment Sprockets tells us “The JSON is part of the public API and should be considered stable.” Sweet!

Now, even if sprockets devs  remember one of them once said this was public API (I hope this blog post helps), and even if sprockets is committed to semantic versioning, that still doesn’t mean it can never change. In fact, the way some of rubydom treats semver, it doesn’t even mean it can’t change soon and frequently; it just means they’ve got to update the sprockets major version number when it changes. Hey, at least that’d be a clue.

But note that changes can happen in between Rails major releases. Rails 4.1 uses sprockets-rails 2.x which uses sprockets 2.x. Rails 4.2 — no Rails major version number change — will use sprockets-rails 3.x which, oh, still uses sprockets 2.x, but clearly there’s no commitment on Rails not to change sprockets-rails/sprockets major versions without a Rails major version change.

Anyway, what can you do, you pays your money and you takes your chances. This solution seems pretty good to me.

Here’s my rake task, just a couple dozen lines of code, no problem.
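I won’t paste the whole thing, but the heart of it can be sketched like this (a simplified sketch, not the exact task: the manifest filename pattern, directory layout, and whitelist contents are assumptions that vary with your sprockets version; in a real app this would be wrapped in a rake task hooked to run after `rake assets:precompile`, with the assets dir being `public/assets`):

```ruby
require 'json'
require 'fileutils'

# Read sprockets' manifest.json to find the *current* digest-named file
# for each whitelisted logical asset, and copy it to the non-digested name.
def copy_nondigested_assets(assets_dir, whitelist)
  # Sprockets writes a manifest file (eg manifest-<hash>.json in Rails 4.1)
  # whose 'assets' key maps logical names to current digested filenames.
  manifest_path = Dir.glob(File.join(assets_dir, 'manifest*.json')).first
  manifest = JSON.parse(File.read(manifest_path))

  manifest.fetch('assets', {}).each do |logical_name, digested_name|
    next unless whitelist.include?(logical_name)
    # Copy (a symlink would also work) the current digested file to the
    # stable, non-digested name.
    FileUtils.cp(File.join(assets_dir, digested_name),
                 File.join(assets_dir, logical_name))
  end
end
```

Because it consults the manifest rather than globbing for fingerprinted files, it always picks the current release’s version, even with several old digested copies lying around. Just remember to serve the non-digested names with short cache times.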

 Verdict: Pretty decent option, best of our current choices

The Redirect

One more option is using a redirect to take requests for the non-digest-named asset, and redirect it to the current digest-named asset.

@Intrepidd suggests using rack middleware to do that. I think it would also work to just use a Rails route redirect with a lambda. (I’m kind of allergic to middleware.) Same difference either way as far as what your app is doing.
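For what it’s worth, the route-redirect variant might look something like this in `config/routes.rb` (hypothetical sketch; the stable path and asset name are made up, and you’d want the redirect response itself to carry a short cache time):

```ruby
# config/routes.rb (fragment) -- hypothetical sketch of the redirect
# approach: the stable URL redirects to whatever the current
# digest-named asset is, looked up per-request via the asset helpers.
get '/widget.js', to: redirect { |_params, _request|
  ActionController::Base.helpers.asset_path('widget.js')
}
```

The digest-named target can then be cached forever by the web server; only the tiny redirect touches the Rails app.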

I didn’t really notice this one until I had settled on The Manifest.  It requires two HTTP requests every time a client wants the asset at the persistent URL though. The first one will touch your app and needs a short cache time; it will then redirect to the digest-named asset, which will be served directly by the web server and can be cached forever. I’m not really sure if the performance implications are significant; it probably depends on your use cases and request volume. @will-r suggests it won’t work well with CDNs though. 

Verdict: Meh, maybe, I dunno, but it doesn’t feel right to introduce the extra latency

The Future

@rafaelfranca says Rails core has changed their mind and are going to deal with “this issue” “in some way”. Although I don’t think it made it into Rails 4.2 after all.

But what’s “this issue” exactly? I dunno, they are not sharing what they see as the legitimate use cases to handle, and requirements on legitimate ways to handle em.

I kinda suspect they might just be dealing with the “non-Rails JS that needs to know asset URLs” issue, and considering some complicated way to automatically make it use digest-named assets without having to repackage it for Rails.  Which might be a useful feature, although also a complicated enough one to have some bug risks (ah, the story of the asset pipeline).

And it’s not what I need, anyway, there are other uses cases than the “non-Rails JS” one that need non-digest-named assets.

I just need sprockets to produce parallel non-digested asset filenames for certain whitelisted assets. That really is the right way to handle it for my use case. Yes, it means you need to know the implications and how to use cache headers responsibly. If you don’t give me enough rope to hang myself, I don’t have enough rope to climb the rock face either. I thought Rails’ target audience was people who know what they’re doing?

It doesn’t seem like this would be a difficult feature for sprockets to implement (without double compilation!).  @ryana’s monkeypatch seems like pretty simple code that is most of the way there.  It’s the feature that I need.

I considered making a pull request to sprockets (the first step, then probably sprockets-rails, needs to support passing on the config settings).  But you know what, I don’t have the time or psychic energy to get in an argument about it in a PR; the Rails/sprockets devs seem opposed to this feature for some reason.  Heck, I just spent hours figuring out how to make my app work now, and writing it all up for you instead!

But, yeah, just add that feature to sprockets, pretty please.

So, if you’re reading this post in the future, maybe things will have changed, I dunno.


Filed under: General

Library of Congress: The Signal: The Library of Congress Wants You (and Your File Format Ideas)

planet code4lib - Fri, 2014-10-03 16:24

“Uncle Sam Needs You” painted by James Montgomery Flagg

In June of this year, the Library of Congress announced a list of formats it would prefer for digital collections. This list of recommended formats is an ongoing work; the Library will be reviewing the list and making revisions for an updated version in June 2015. Though the team behind this work continues to put a great deal of thought and research into listing the formats, there is still one more important component needed for the project: the Library of Congress needs suggestions from you.

This request is not half-hearted. As the Library increasingly relies on the list to identify preferred formats for acquisition of digital collections, no doubt other institutions will adopt the same list. It is important, therefore, that as the Library undertakes this revision of the recommended formats, it conducts a public dialog about them in order to reach an informed consensus.

This public dialog includes librarians, library students, teachers, vendors, publishers, information technologists — anyone and everyone with an opinion on the matter and a stake in preserving digital files. Collaboration is essential for digital preservation. No single institution can know everything and do everything alone. This is a shared challenge.

Librarians, what formats would you prefer to receive your digital collections in? What file formats are easiest for you to process and access? Publishers and vendors, what format do you think you should create your digital publications in if you want your stuff to last and be accessible into the future? The time may come when you want to re-monetize a digital publication, so you want to ensure that it is accessible.

Those are general questions, of course. Let’s look at the specific file formats the Library has selected so far. The preferred formats are categorized by:

  • Textual Works and Musical Compositions
  • Still Image Works
  • Audio Works
  • Moving Image Works
  • Software and Electronic Gaming and Learning
  • Datasets/Databases

Take, for example, digital photographs. Here is the list of formats the Library would most prefer to receive for digital preservation:

  • TIFF (uncompressed)
  • JPEG2000 (lossless) (*.jp2)
  • PNG (*.png)
  • JPEG/JFIF (*.jpg)
  • Digital Negative DNG (*.dng)
  • JPEG2000 (lossy) (*.jp2)
  • TIFF (compressed)
  • BMP (*.bmp)
  • GIF (*.gif)

Is there anything you think should be changed in that list? If so, why? Or anything added to this list? There’s a section on metadata on that page. Does it say enough? Or too little? Is it clear enough? Should the Library add some description about adding photo metadata into the photo files themselves?

Please look over the file categories that interest you and tell us what you think. Help us shape a policy that will affect future digital collections, large and small. Be as specific as you can.

Email your questions and comments to the digital preservation experts below. Your emails will be confidential; they will not be published on this blog post. So don’t be shy. We welcome all questions and comments, great and small.

Send general email about preferred formats to Theron Westervelt (thwe at loc.gov). Send email about specific categories to:

  • Ardie Bausenbach (abau at loc.gov) for Textual Works and Musical Compositions
  • Phil Michel (pmic at loc.gov) for Still Image Works
  • Gene DeAnna (edea at loc.gov) for Audio Works
  • Mike Mashon (mima at loc.gov) for Moving Image Works
  • Trevor Owens (trow at loc.gov) for Software and Electronic Gaming and Learning
  • Donna Scanlon (dscanlon at loc.gov) for Datasets/Databases

They are all very nice people who are up to their eyeballs in digital-preservation work and would appreciate hearing your fresh perspective on the subject.

One last thing. The recommended formats are just that: recommended. It is not a fixed set of standards. And the Library of Congress will not reject any digital collection of value simply because the file formats in the collection might not conform to the recommended formats.

Jason Ronallo: The Lenovo X240 Keyboard and the End/Insert Key With FnLk On as a Software Developer on Linux

planet code4lib - Fri, 2014-10-03 16:12

As a software developer I’m using keys like F5 a lot. When I’m doing any writing, I use F6 a lot to turn off and on spell correction underlining. On the Lenovo X240 the function keys are overlaid on the same keys as volume and brightness control. This causes some problems for me. Luckily there’s a solution that works for me under Linux.

To access the function keys you have to also press the Fn key. If most of what you’re doing is reloading a browser and not using the volume control, then this is a problem, so they’ve created a function lock which is enabled by pressing the Fn and Esc/FnLk key. The Fn key lights up and you can press F5 without using the Fn modifier key.

That’s all well and good until you get to another quirk of this keyboard where the Home, End, and Delete keys are in the same function key row in a way that the End key also functions as the Insert key. When function lock is on the End key becomes an Insert key. I don’t ever use the Insert key on a keyboard, so I understand why they combined the End/Insert key. But in this combination it doesn’t work for me as a software developer. I’m continually going between something that needs to be reloaded with F5 and in an editor where I need to quickly go to the end of a line in a program.

Luckily there’s a pretty simple answer to this if you don’t ever need to use the Insert key. I found the answer on askubuntu.

All I needed to do was run the following:

xmodmap -e "keycode 118 = End"

And now even when the function keys are locked the End/Insert key always behaves as End. To make this permanent so the mapping gets loaded when X11 starts, add xmodmap -e "keycode 118 = End" to your ~/.xinitrc.

Jason Ronallo: Questions Asked During the Presentation Websockets For Real-time And Interactive Interfaces At Code4lib 2014

planet code4lib - Fri, 2014-10-03 16:12

During my presentation on WebSockets, there were a couple points where folks in the audience could enter text in an input field that would then show up on a slide. The data was sent to the slides via WebSockets. It is not often that you get a chance to incorporate the technology that you’re talking about directly into how the presentation is given, so it was a lot of fun. At the end of the presentation, I allowed folks to anonymously submit questions directly to the HTML slides via WebSockets.

I ran out of time before I could answer all of the questions that I saw. I’ll try to answer them now.

Questions From Slides

You can see in the YouTube video at the end of my presentation (at 1h38m26s) the following questions came in. (Full presentation starts here: https://www.youtube.com/watch?v=_8MJATYsqbY&feature=share&t=1h25m37s.) Some lines that came in were not questions at all. For those that are really questions, I’ll answer them now, even if I already answered them.

Are you a trained dancer?

No. Before my presentation I was joking with folks about how little of a presentation I’d have, at least for the interactive bits, if the wireless didn’t work well enough. Tim Shearer suggested I just do an interpretive dance in that eventuality. Luckily it didn’t come to that.

When is the dance?

There was no dance. Initially I thought the dance might happen later, but it didn’t. OK, I’ll admit it, I was never going to dance.

Did you have any efficiency problems with the big images and chrome?

On the big video walls in Hunt Library we often use Web technologies to create the content and Chrome for displaying it on the wall. For the most part we don’t have issues with big images or lots of images on the wall. But there’s a bit of a trick happening here. For instance, when we display images for My #HuntLibrary on the wall, they’re just images from Instagram, so only 600x600px. We initially didn’t know how these would look blown up on the video wall, but they end up looking fantastic. So you don’t necessarily need super high resolution images to make a very nice looking display.

Upstairs on the Visualization Wall, I display some digitized special collections images. While the possible resolution on the display is higher, the current effective resolution is only about 202px wide for each MicroTile. The largest image is then only 404px wide. In this case we are also using a Djatoka image server to deliver the images. Djatoka has an issue with the quality of its scaling between quality levels, where the algorithm chosen can make the images look very poor. How I usually work around this is to pick the quality level that is just above the width required to fit whatever design. Then the browser scales the image down and does a better job making it look OK than the image server would. I don’t know which of these factors affects the look on the Visualization Wall the most, but some images have a stair-stepping look on some lines. This especially affects line drawings with diagonal lines, while photographs can look totally acceptable. We’ll keep looking for how to improve the look of images on these walls, especially in the browser.

Have you got next act after Wikipedia?

This question is referring to the adaptation of Listen to Wikipedia for the Immersion Theater. You can see video of what this looks like on the big Hunt Library Immersion Theater wall.

I don’t currently have solid plans for developing other content for any of the walls. Some of the work that I and others in the Libraries have done early on has been to help see what’s possible in these spaces and begin to form the cow paths for others to produce content more easily. We answered some big questions. Can we deliver content through the browser? What templates can we create to make this work easier? I think the next act is really for the NCSU Libraries to help more students and researchers to publish and promote their work through these spaces.

Is it lunchtime yet?

In some time zone somewhere, yes. Hopefully during the conference lunch came soon enough for you and was delicious and filling.

Could you describe how testing worked more?

I wish I could think of some good way to test applications that are destined for these kinds of large displays. There’s really no automated testing that is going to help here. BrowserStack doesn’t have a big video wall that they can take screenshots on. I’ve also thought that it’d be nice to have a webcam trained on the walls so that I could make tweaks from a distance.

But Chrome does have its screen emulation developer tools which were super helpful for this kind of work. These kinds of tools are useful not just for mobile development, which is how they’re usually promoted, but for designing for very large displays as well. Even on my small workstation monitor I could get a close enough approximation of what something would look like on the wall. Chrome will shrink the content to fit to the available viewport size. I could develop for the exact dimensions of the wall while seeing all of the content shrunk down to fit my desktop. This meant that I could develop and get close enough before trying it out on the wall itself. Being able to design in the browser has huge advantages for this kind of work.

I work at DH Hill Library while these displays are in Hunt Library. I don’t get over there all that often, so I would schedule some time to see the content on the walls when I happened to be over there for a meeting. This meant that there’d often be a lag of a week or two before I could get over there. This was acceptable as this wasn’t the primary project I was working on.

By the time I saw it on the wall, though, we were really just making tweaks for design purposes. We wanted the panels to the left and right of the Listen to Wikipedia visualization to fall along the bezel. We would adjust font sizes for how they felt once you’re in the space. The initial, rough cut work of modifying the design to work in the space was easy, but getting the details just right required several rounds of tweaks and testing. Sometimes I’d ask someone over at Hunt to take a picture with their phone to ensure I’d fixed an issue.

While it would have been possible for me to bring my laptop and sit in front of the wall to work, I personally didn’t find that to work well for me. I can see how it could work to make development much faster, though, and it is possible to work this way.

Race condition issues between devices?

Some spaces could allow you to control a wall from a kiosk and completely avoid any possibility of a race condition. When you allow users to bring their own device as a remote control to your spaces you have some options. You could allow the first remote to connect and lock everyone else out for a period of time. Because of how subscriptions and presence notifications work this would certainly be possible to do.

For Listen to Wikipedia we allow more than one user to control the wall at the same time. Then we use WebSockets to try to keep multiple clients in sync. Even though we attempt to quickly update all the clients, it is certainly possible that there could be race conditions, though it seems unlikely. Because we’re not dealing with persisting data, I don’t really worry about it too much. If one remote submits just after another but before it is synced, then the wall will reflect the last to submit. That’s perfectly acceptable in this case. If a client were to get out of sync with what is on the wall, then any change by that client would just be sent to the wall as is. There’s no attempt to make sure a client had the most recent, freshest version of the data prior to submitting.

While this could be an issue for other use cases, it does not adversely affect the experience here. We do an alright job keeping the clients in sync, but don’t shoot for perfection.

How did you find the time to work on this?

At the time I worked on these I had at least a couple other projects going. When waiting for someone else to finish something before being able to make more progress or on a Friday afternoon, I’d take a look at one of these projects for a little. It meant the progress was slow, but these also weren’t projects that anyone was asking to be delivered on a deadline. I like to have a couple projects of this nature around. If I’ve got a little time, say before a meeting, but not enough for something else, I can pull one of these projects out.

I wonder, though, if this question isn’t more about the why I did these projects. There were multiple motivations. A big motivation was to learn more about WebSockets and how the technology could be applied in the library context. I always like to have a reason to learn new technologies, especially Web technologies, and see how to apply them to other types of applications. And now that I know more about WebSockets I can see other ways to improve the performance and experience of other applications in ways that might not be as overt in their use of the technology as these project were.

For the real-time digital collections view this is integrated into an application I’ve developed and it did not take much to begin adding in some new functionality. We do a great deal of business analytic tracking for this application. The site has excellent SEO for the kind of content we have. I wanted to explore other types of metrics of our success.

The video wall projects allowed us to explore several different questions. What does it take to develop Web content for them? What kinds of tools can we make available for others to develop content? What should the interaction model be? What messaging is most effective? How should we kick off an interaction? Is it possible to develop bring your own device interactions? All of these kinds of questions will help us to make better use of these kinds of spaces.

Speed of an unladen swallow?

I think you’d be better off asking a scientist or a British comedy troupe.

Questions From Twitter

Mia (@mia_out) tweeted at 11:47 AM on Tue, Mar 25, 2014
@ostephens @ronallo out of curiosity, how many interactions compared to visitor numbers? And in-app or relying on phone reader?

sebchan (@sebchan) tweeted at 0:06 PM on Tue, Mar 25, 2014
@ostephens @ronallo (but) what are the other options for ‘interacting’?

This question was in response to how 80% of the interactions with the Listen to Wikipedia application are via QR code. We placed a URL and QR code on the wall for Listen to Wikipedia not knowing which would get the most use.

Unfortunately there’s no simple way I know of to kick off an interaction in these spaces when the user brings their own device. Once when there was a stable exhibit for a week we used a kiosk iPad to control a wall so that the visitor did not need to bring a device. We are considering how a kiosk tablet could be used more generally for this purpose. In cases where the visitor brings their own device it is more complicated. The visitor either must enter a URL or scan a QR code. We try to make the URLs short, but because we wanted to use some simple token authentication they’re at least 4 characters longer than they might otherwise be. I’ve considered using geolocation services as the authentication method, but they are not as exact as we might want them to be for this purpose, especially if the device uses campus wireless rather than GPS. We also did not want to have a further hurdle of asking for permission of the user and potentially being rejected. For the QR code the visitor must have a QR code reader already on their device. The QR code includes the changing token. Using either the URL or QR code sends the visitor to a page in their browser.

Because the walls I’ve placed content on are in public spaces there is no good way to know how many visitors there are compared to the number of interactions. One interesting thing about the Immersion Theater is that I’ll often see folks standing outside of the opening to the space looking in, so even if there were some way to track folks going in and out of the space, that would not include everyone who has viewed the content.

Other Questions

If you have other questions about anything in my presentation, please feel free to ask. (If you submit them through the slides I won’t ever see them, so better to email or tweet at me.)

Jason Ronallo: HTML Slide Decks With Synchronized and Interactive Audience Notes Using WebSockets

planet code4lib - Fri, 2014-10-03 16:12

One question I got asked after giving my Code4Lib presentation on WebSockets was how I created my slides. I’ve written about how I create HTML slides before, but this time I added some new features like an audience interface that synchronizes automatically with the slides and allows for audience participation.

TL;DR I’ve open sourced starterdeck-node for creating synchronized and interactive HTML slide decks.

Not every time that I give a presentation am I able to use the technologies that I am talking about within the presentation itself, so I like to do it when I can. I write my slide decks as Markdown and convert them with Pandoc to HTML slides which use DZslides for slide sizing and animations. I use a browser to present the slides. Working this way with HTML has allowed me to do things like embed HTML5 video into a presentation on HTML5 video and show examples of the JavaScript API and how videos can be styled with CSS.

For a presentation on WebSockets I gave at Code4Lib 2014, I wanted to provide another example from within the presentation itself of what you can do with WebSockets. If you have the slides and the audience notes handout page open at the same time, you will see how they are synchronized. (Beware slowness as it is a large self-contained HTML download using data URIs.) When you change to certain slides in the presenter view, new content is revealed in the audience view. Because the slides are just an HTML page, it is possible to make the slides more interactive. WebSockets are used to allow the slides to send messages to each audience member’s browser and reveal notes. I am never able to say everything that I would want to in one short 20-minute talk, so this provided me a way to give the audience some supplementary material.
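The synchronization idea can be sketched in a few lines. The function names and message shape here are illustrative, not starterdeck-node’s actual protocol: when the presenter deck changes slides, the server relays a message to every open audience connection, and the audience page reveals the matching notes.

```javascript
// Server side: relay a slide change to every open audience connection.
// A real WebSocket has readyState 1 (OPEN); these names are assumptions.
function broadcastSlideChange(clients, slideId) {
  const message = JSON.stringify({ type: 'slide', id: slideId });
  clients
    .filter(client => client.readyState === 1) // skip closed connections
    .forEach(client => client.send(message));
  return message;
}

// Audience side: on each message, reveal the notes for the named slide.
function handleMessage(raw, reveal) {
  const msg = JSON.parse(raw);
  if (msg.type === 'slide') reveal(msg.id);
}
```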

Within the slides I even included a simplistic chat application that allowed the audience to send messages directly to the presenter slides. (Every talk on WebSockets needs a gratuitous chat application.) At the end of the talk I also accepted questions from the audience via an input field. The questions were then delivered to the slides via WebSockets and displayed right within a slide using a little JavaScript. What I like most about this is that even someone who did not feel confident enough to step up to a microphone would have the opportunity to ask an anonymous question. And I even got a few legitimate questions amongst the requests for me to dance.
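The reverse direction, from audience to presenter, is just as small. This is a sketch under my own naming; the actual message format and display code in the slides may differ:

```javascript
// Build a question message from the audience input field. Trimming and
// capping the length keeps a prankster from filling the slide.
function questionMessage(text) {
  return JSON.stringify({ type: 'question', text: text.trim().slice(0, 280) });
}

// On the presenter side, extract the question to display, or null for
// anything that is not a non-empty question. Rendering it with
// textContent (not innerHTML) avoids markup injection from the audience.
function questionText(raw) {
  const msg = JSON.parse(raw);
  return msg.type === 'question' && msg.text ? msg.text : null;
}
```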

Another nice side benefit of getting the audience to the notes before the presentation starts is that you can include your contact information and Twitter handle on the page.

I have wrapped up all this functionality for creating interactive slide decks into a project called starterdeck-node. It includes the WebSocket server and a simple starting point for creating your own slides. It strings together a bunch of different tools to make creating and deploying slide decks like this simpler so you’ll need to look at the requirements. This is still definitely just a tool for hackers, but having this scaffolding in place ought to make the next slide deck easier to create.

Here’s a video where I show starterdeck-node at work. Slides on the left; audience notes on the right.

Other Features

While the new exciting feature added in this version of the project is synchronization between presenter slides and audience notes, there are also lots of other great features if you want to create HTML slide decks. Even if you aren’t going to use the synchronization feature, there are still lots of reasons why you might want to create your HTML slides with starterdeck-node.

Self-contained HTML. Pandoc uses data URIs so that the HTML version of your slides has no external dependencies. Everything including images, video, JavaScript, CSS, and fonts is embedded within a single HTML document. That means that even if there’s no internet connection from the podium you’ll still be able to deliver your presentation.

Onstage view. Part of what gets built is a DZSlides onstage view where the presenter can see the current slide, next slide, speaker notes, and current time.

Single page view. This view is a self-contained, single-page layout version of the slides and speaker notes. This is a much nicer way to read a presentation than just flipping through the slides on various slide sharing sites. If you put a lot of work into your talk and are writing speaker notes, this is a great way to reuse them.

PDF backup. A script is included to create a PDF backup of your presentation. Sometimes you have to use the computer at the podium and it has an old version of IE on it. PDF backup to the rescue. While you won’t get all the features of the HTML presentation you’re still in business. The included Node.js app provides a server so that a headless browser can take screenshots of each slide. These screenshots are then compiled into the PDF.

Examples

I’d love to hear from anyone who tries to use it. I’ll list any examples I hear about below.

Here are some examples of slide decks that have used starterdeck-node or starterdeck.

Jason Ronallo: HTML and PDF Slideshows Written in Markdown with DZSlides, Pandoc, Guard, Capybara Webkit, and a little Ruby

planet code4lib - Fri, 2014-10-03 16:12

I’ve used different HTML slideshow tools in the past, but was never satisfied with them. I didn’t like to have to run a server just for a slideshow. I don’t like when a slideshow requires external dependencies that make it difficult to share the slides. I don’t want to actually have to write a lot of HTML.

I want to write my slides in a single Markdown file. As a backup I always like to have my slides available as a PDF.

For my latest presentations I came up with a workflow that I’m satisfied with. Once all the little pieces were stitched together it worked really well for me. I’ll show you how I did it.

I had looked at DZSlides before but had always passed it by after seeing what a default slide deck looked like. It wasn’t as flashy as others and doesn’t immediately have all the same features readily available. I looked at it again because I liked the idea that it is a single file template. I also saw that Pandoc will convert Markdown into a DZSlides slideshow.

To convert my Markdown to DZSlides it was as easy as:

pandoc -w dzslides presentation.md > presentation.html

What is even better is that Pandoc has settings to embed images and any external files as data URIs within the HTML. This allows me to maintain a single Markdown file and then share my presentation as a single HTML file, images and all, with no external dependencies.

pandoc -w dzslides --standalone --self-contained presentation.md > presentation.html

The DZSlides default template is rather plain, so you’ll likely want to make some stylistic changes to the CSS. You may also want to add some more JavaScript as part of your presentation or to add features to the slides. For instance I wanted to add a simple way to toggle my speaker notes from showing. In previous HTML slides I’ve wanted to control HTML5 video playback by binding JavaScript to a key. The way I do this is to add in any external styles or scripts directly before the closing body tag after Pandoc does its processing. Here’s the simple script I wrote to do this:

#! /usr/bin/env ruby
# markdown_to_slides.rb
# Converts a markdown file into a DZslides presentation. Pandoc must be installed.

# Read in the given CSS and JavaScript files and insert them between style
# and script tags just before the close of the body tag.
css = File.read('styles.css')
script = File.read('scripts.js')

`pandoc -w dzslides --standalone --self-contained presentation.md > presentation.html`

presentation = File.read('presentation.html')
style = "<style>#{css}</style>"
scripts = "<script>#{script}</script>"
presentation.sub!('</body>', "#{style}#{scripts}</body>")

File.open('presentation.html', 'w') do |fh|
  fh.puts presentation
end

Just follow these naming conventions:

  • Presentation Markdown should be named presentation.md
  • Output presentation HTML will be named presentation.html
  • Create a stylesheet in styles.css
  • Create any JavaScript in a file named scripts.js
  • You can put images wherever you want, but I usually place them in an images directory.
Automate the build

Now what I wanted was for this script to run any time the Markdown file changed. I used Guard to watch the files and set off the script to convert the Markdown to slides. While I was at it I could also reload the slides in my browser. One trick with guard-livereload is to allow your browser to watch local files so that you do not have to have the page behind a server. Here’s my Guardfile:

guard 'livereload' do
  watch("presentation.html")
end

guard :shell do
  # If any of these change run the script to build presentation.html
  watch('presentation.md')       {`./markdown_to_slides.rb`}
  watch('styles.css')            {`./markdown_to_slides.rb`}
  watch('scripts.js')            {`./markdown_to_slides.rb`}
  watch('markdown_to_slides.rb') {`./markdown_to_slides.rb`}
end

Add the following to a Gemfile and bundle install:

source 'http://rubygems.org'

gem 'guard-livereload'
gem 'guard-shell'

Now I have a nice automated way to build my slides, continue to work in Markdown, and have a single file as a result. Just run this:

bundle exec guard

Now when any of the files change your HTML presentation will be rebuilt. Whenever the resulting presentation.html is changed, it will trigger livereload and a browser refresh.

Slides to PDF

The last piece I needed was a way to convert the slideshow into a PDF as a backup. I never know what kind of equipment will be set up or whether the browser will be recent enough to work well with the HTML slides. I like being prepared. It makes me feel more comfortable knowing I can fall back to the PDF if need be. Also some slide deck services will accept a PDF but won’t take an HTML file.

In order to create the PDF I wrote a simple ruby script using capybara-webkit to drive a headless browser. If you aren’t able to install the dependencies for capybara-webkit you might try some of the other capybara drivers. I did not have luck with the resulting images from selenium. I then used the DZSlides JavaScript API to advance the slides. I do a simple count of how many times to advance based on the number of sections. If you have incremental slides this script would need to be adjusted to work for you.

The Webkit driver is used to take a snapshot of each slide, save it to a screenshots directory, and then ImageMagick’s convert is used to turn the PNGs into a PDF. You could just as well use other tools to stitch the PNGs together into a PDF. The quality of the resulting PDF isn’t great, but it is good enough. Also the capybara-webkit browser does not evaluate @font-face so the fonts will be plain. I’d be very interested if anyone gets better quality using a different browser driver for screenshots.

#! /usr/bin/env ruby
# dzslides2pdf.rb
# dzslides2pdf.rb http://localhost/presentation_root presentation.html
require 'capybara/dsl'
require 'capybara-webkit'
# require 'capybara/poltergeist'
require 'fileutils'
include Capybara::DSL

base_url = ARGV[0] || exit
presentation_name = ARGV[1] || 'presentation.html'

# temporary directory for screenshots
FileUtils.mkdir('./screenshots') unless File.exist?('./screenshots')

Capybara.configure do |config|
  config.run_server = false
  config.default_driver = :webkit
  config.current_driver = :webkit # :poltergeist
  config.app = "fake app name"
  config.app_host = base_url
end

visit "/#{presentation_name}" # visit the first page

# change the size of the window
if Capybara.current_driver == :webkit
  page.driver.resize_window(1024, 768)
end

sleep 3 # Allow the page to render correctly
# take a screenshot of the first page
page.save_screenshot("./screenshots/screenshot_000.png", width: 1024, height: 768)

# calculate the number of slides in the deck
slide_count = page.body.scan(%r{slide level1}).size
puts slide_count

(slide_count - 1).times do |time|
  slide_number = time + 1
  keypress_script = "Dz.forward();" # dzslides script for going to next slide
  page.execute_script(keypress_script) # run the script to transition to next slide
  sleep 3 # wait for the slide to fully transition
  # take a screenshot of the current slide
  page.save_screenshot("./screenshots/screenshot_#{slide_number.to_s.rjust(3, '0')}.png", width: 1024, height: 768)
  print "#{slide_number}. "
end
puts

`convert screenshots/*png presentation.pdf`
FileUtils.rm_r('screenshots')

At this point I did have to set this up to be behind a web server. On my local machine I just made a symlink from the root of my Apache htdocs to my working directory for my slideshow. The script can be called with the following.

./dzslides2pdf.rb http://localhost/presentation/root/directory presentation.html

Speaker notes

One addition that I’ve made is to add some JavaScript for speaker notes. I don’t want to have to embed my slides into another HTML document to get the nice speaker view that DZslides provides. I prefer to just have a section at the bottom of the slides that pops up with my notes. I’m alright with the audience seeing my notes if I should ever need them. So far I haven’t had to use the notes.

I start with adding the following markup to the presentation Markdown file.

<div role="note" class="note">
Hi. I'm Jason Ronallo the Associate Head of Digital Library Initiatives at NCSU Libraries.
</div>

Add some CSS to hide the notes by default but allow for them to display at the bottom of the slide.

div[role=note] {
  display: none;
  position: absolute;
  bottom: 0;
  color: white;
  background-color: gray;
  opacity: 0.85;
  padding: 20px;
  font-size: 12px;
  width: 100%;
}

Then a bit of JavaScript to show/hide the notes when pressing the “n” key.

window.onkeypress = presentation_keypress_check;

function presentation_keypress_check(aEvent){
  if (aEvent.keyCode == 110) {
    aEvent.preventDefault();
    var notes = document.getElementsByClassName('note');
    for (var i = 0; i < notes.length; i++){
      notes[i].style.display = (notes[i].style.display == 'none' || !notes[i].style.display) ? 'block' : 'none';
    }
  }
}

Outline

Finally, I like to have an outline I can see of my presentation as I’m writing it. Since the Markdown just uses h1 elements to separate slides, I just use the following simple script to output the outline for my slides.

#!/usr/bin/env ruby
# outline_markdown.rb
file = File.read('presentation.md')
index = 0
file.each_line do |line|
  if /^#\s/.match line
    index += 1
    title = line.sub('#', index.to_s)
    puts title
  end
end

Full Example

You can see the repo for my latest HTML slide deck created this way for the 2013 DLF Forum where I talked about Embedded Semantic Markup, schema.org, the Common Crawl, and Web Data Commons: What Big Web Data Means for Libraries and Archives.

Conclusion

I like doing slides where I can write very quickly in Markdown and then have the ability to handcraft the deck or particular slides. I’d be interested to hear if you do something similar.

Jason Ronallo: DLF Forum 2013 presentation: Embedded Semantic Markup, schema.org, the Common Crawl, and Web Data Commons

planet code4lib - Fri, 2014-10-03 16:12

I spoke at the 2013 DLF Forum about Embedded Semantic Markup, schema.org, the Common Crawl, and Web Data Commons: What Big Web Data Means for Libraries and Archives. My slides, code, and data are all open.

Here’s the abstract:

Search engines are reaching the limits of natural language processing while wanting to provide more exact answers, not just results, especially for the mobile context. This shift is part of what has spurred progress in how data can be published and consumed on the Web. Broad and simple vocabularies and simplified embedded semantic markup is leading to wider adoption of publishing data in HTML. Libraries and archives can take advantage of new opportunities to make their services and collections more discoverable on the open Web. This presentation will show some examples of what libraries and archives are currently doing and point to future possibilities.

At the same time as this new data is being made available, only a few organizations have the resources to crawl the Web and extract the data. The Common Crawl is helping to make a large repository of Web crawl data available for public use, and Web Data Commons is extracting the data embedded in the Common Crawl and making the resulting linked data available for download. This presentation will share data from original research on how libraries currently fare in this new environment of big Web data. Are libraries and archives represented in the corpus? With this democratization of Web crawl data and lowered expense for consumption of it, what are the opportunities for new library services and collections?

Jason Ronallo: A Plugin For Mediaelement.js For Preview Thumbnails on Hover Over the Time Rail Using WebVTT

planet code4lib - Fri, 2014-10-03 16:12

The time rail or progress bar on video players gives the viewer some indication of how much of the video they’ve watched, what portion of the video remains to be viewed, and how much of the video is buffered. The time rail can also be clicked on to jump to a particular time within the video. But figuring out where in the video you want to go can feel kind of random. You can usually hover over the time rail and move from side to side and see the time that you’d jump to if you clicked, but who knows what you might see when you get there.

Some video players have begun to use the time rail to show video thumbnails on hover in a tooltip. For most videos these thumbnails give a much better idea of what you’ll see when you click to jump to that time. I’ll show you how you can create your own thumbnail previews using HTML5 video.

TL;DR Use the time rail thumbnails plugin for Mediaelement.js.

Archival Use Case

We usually follow agile practices in our archival processing. This style of processing was popularized by the article More Product, Less Process: Revamping Traditional Archival Processing by Mark A. Greene and Dennis Meissner. For instance, we don’t read every page of every folder in every box of every collection in order to describe it well enough for us to make the collection accessible to researchers. Over time we may decide to make the materials for a particular collection or parts of a collection more discoverable by doing the work to look closer and add more metadata to our description of the contents. But we try not to let the perfect become the enemy of the good enough. Our goal is to make the materials accessible to researchers and not hidden in some box no one knows about.

Some of our collections of videos are highly curated like for video oral histories. We’ve created transcripts for the whole video. We extract out the most interesting or on topic clips. For each of these video clips we create a WebVTT caption file and an interface to navigate within the video from the transcript.

At NCSU Libraries we have begun digitizing more archival videos. And for these videos we’re much more likely to treat them like other archival materials. We’re never going to watch every minute of every video about cucumbers or agricultural machinery in order to fully describe the contents. Digitization gives us some opportunities to automate the summarization that would be manually done with physical materials. Many of these videos don’t even have dialogue, so even when automated video transcription is more accurate and cheaper we’ll still be left with only the images. In any case, the visual component is a good place to start.

Video Thumbnail Previews

When you hover over the time rail on some video viewers, you see a thumbnail image from the video at that time. YouTube does this for many of its videos. I first saw that this would be possible with HTML5 video when I saw the JW Player page on Adding Preview Thumbnails. From there I took the idea to use an image sprite and a WebVTT file to structure which media fragments from the sprite to use in the thumbnail preview. I’ve implemented this as a plugin for Mediaelement.js. You can see detailed instructions there on how to use the plugin, but I’ll give the summary here.

1. Create an Image Sprite from the Video

This uses ffmpeg to take a snapshot every 5 seconds in the video and then uses montage (from ImageMagick) to stitch them together into a sprite. This means that only one file needs to be downloaded before you can show the preview thumbnail.

ffmpeg -i "video-name.mp4" -f image2 -vf fps=fps=1/5 video-name-%05d.jpg

montage video-name*jpg -tile 5x -geometry 150x video-name-sprite.jpg

2. Create a WebVTT metadata file

This is just a standard WebVTT file except the cue text is metadata instead of captions. The URL is to an image and uses a spatial Media Fragment for what part of the sprite to display in the tooltip.

WEBVTT

00:00:00.000 --> 00:00:05.000
http://example.com/video-name-sprite.jpg#xywh=0,0,150,100

00:00:05.000 --> 00:00:10.000
http://example.com/video-name-sprite.jpg#xywh=150,0,150,100

00:00:10.000 --> 00:00:15.000
http://example.com/video-name-sprite.jpg#xywh=300,0,150,100

00:00:15.000 --> 00:00:20.000
http://example.com/video-name-sprite.jpg#xywh=450,0,150,100

00:00:20.000 --> 00:00:25.000
http://example.com/video-name-sprite.jpg#xywh=600,0,150,100

00:00:25.000 --> 00:00:30.000
http://example.com/video-name-sprite.jpg#xywh=0,100,150,100

3. Add the Video Thumbnail Preview Track

Put the following within the <video> element.

<track kind="metadata" class="time-rail-thumbnails" src="http://example.com/video-name-sprite.vtt"></track>

4. Initialize the Plugin

The following assumes that you’re already using Mediaelement.js, jQuery, and have included the vtt.js library.

$('video').mediaelementplayer({
  features: ['playpause', 'progress', 'current', 'duration', 'tracks', 'volume', 'timerailthumbnails'],
  timeRailThumbnailsSeconds: 5
});

The Result


See Bug Sprays and Pets with sound.

Installation

The plugin can either be installed using the Rails gem or the Bower package.

MutationObserver

One of the DOM API features I hadn’t used before is MutationObserver. One thing the thumbnail preview plugin needs to do is know what time is being hovered over on the time rail. I could have calculated this myself, but I wanted to rely on MediaElement.js to provide the information. Maybe there’s a callback in MediaElement.js for when this is updated, but I couldn’t find it. Instead I use a MutationObserver to watch for when MediaElement.js changes the DOM for the default display of a timestamp on hover. Looking at the time code there then allows the plugin to pick the correct cue text to use for the media fragment. MutationObserver is more performant than the now deprecated Mutation Events, and I’ve experienced very little latency with it even when it fires many times in quick succession.
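The shape of that approach looks roughly like this. The selector and callback names are illustrative, not the plugin’s actual code:

```javascript
// Watch the element whose text MediaElement.js updates on hover, and
// react to each change by reading the displayed time code.
function watchTimeDisplay(element, onTimeChange) {
  if (typeof MutationObserver === 'undefined') {
    return null; // old browser: fall back to the default hover timestamp
  }
  const observer = new MutationObserver(function () {
    onTimeChange(element.textContent);
  });
  // Observe text changes anywhere under the time display element
  observer.observe(element, { childList: true, characterData: true, subtree: true });
  return observer;
}
```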

The plugin currently only works in the browsers that support MutationObserver, which is most current browsers. In browsers that do not support MutationObserver the plugin will do nothing at all and just show the default timestamp on hover. I’d be interested in other ideas on how to solve this kind of problem, though it is nice to know that plugins that rely on another library have tools like MutationObserver around.

Other Caveats

This plugin is brand new and works for me, but there are some caveats. All the images in the sprite must have the same dimensions, and the duration for each thumbnail must be consistent. The timestamps currently aren’t really used to determine which thumbnail to display; the selection is instead faked by relying on those consistent durations. The plugin just does some simple addition and plucks out the correct thumbnail from the array of cues. Hopefully in future versions I can address some of these issues.
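The “simple addition” amounts to this: with a consistent per-thumbnail duration, the cue index for a hovered time is just a division, clamped to the last cue. Five seconds matches the ffmpeg example earlier; the function name is mine, not the plugin’s:

```javascript
// Map a hovered time to a thumbnail cue index, assuming every cue
// covers the same fixed duration.
function cueIndexFor(seconds, thumbnailDuration, cueCount) {
  const index = Math.floor(seconds / thumbnailDuration);
  return Math.min(index, cueCount - 1); // clamp times past the last cue
}
```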

Discoveries

Having this feature be available for our digitized video, we’ve already found things in our collection that we wouldn’t have seen before. You can see how a “Profession with a Future” evidently involves shortening your life by smoking (at about 9:05). I found a spinning spherical display of Soy-O and synthetic meat (at about 2:12). Some videos switch between black & white and color which you wouldn’t know just from the poster image. And there are some videos, like talking heads, that appear from the thumbnails to have no surprises at all. But maybe you like watching boiling water for almost 13 minutes.

OK, this isn’t really a discovery in itself, but it is fun to watch a head banging JFK as you go back and forth over the time rail. He really likes milk. And Eisenhower had a different speaking style.

You can see this in action for all of our videos on the NCSU Libraries' Rare & Unique Digital Collections site and make your own discoveries. Let me know if you find anything interesting.

Preview Thumbnail Sprite Reuse

Since we already had the sprite images for the time rail hover preview, I created another interface to allow a user to jump through a video. Under the video player is a control button that shows a modal with the thumbnail sprite. The sprite alone provides a nice overview of the video that allows you to see very quickly what might be of interest. I used an image map so that the rather large sprite images would only have to be in memory once. (Yes, image maps are still valid in HTML5 and have their legitimate uses.) jQuery RWD Image Maps allows the map area coordinates to scale up and down across devices. Hovering over a single thumb will show the timestamp for that frame. Clicking a thumbnail will set the current time for the video to be the start time of that section of the video. One advantage of this feature is that it doesn’t require the kind of fine motor skill necessary to hover over the video player time rail and move back and forth to show each of the thumbnails.
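The click behavior of the thumbnail grid reduces to a small seek. This is a sketch under assumed names, with the 5-second spacing matching the sprite generation above:

```javascript
// Each image-map area corresponds to one thumbnail; clicking it seeks
// the video to that thumbnail's start time.
function startTimeForThumb(thumbIndex, secondsPerThumb = 5) {
  return thumbIndex * secondsPerThumb;
}

function seekToThumb(video, thumbIndex, secondsPerThumb = 5) {
  // Setting currentTime on an HTML5 video element seeks the player
  video.currentTime = startTimeForThumb(thumbIndex, secondsPerThumb);
  return video.currentTime;
}
```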

This feature was added and deployed to production just this week, so I’m looking for feedback on whether folks find it useful, how to improve it, and any bugs that are encountered.

Summarization Services

I expect that automated summarization services will become increasingly important for researchers as archives do more large-scale digitization of physical collections and collect more born digital resources in bulk. We’re already seeing projects like fondz which autogenerates archival description by extracting the contents of born digital resources. At NCSU Libraries we’re working on other ways to summarize the metadata we create as we ingest born digital collections. As we learn more what summarization services and interfaces are useful for researchers, I hope to see more work done in this area. And this is just the beginning of what we can do with summarizing archival video.
