Planet Code4Lib - http://planet.code4lib.org

Peter Murray: Thursday Threads: History of the Future, Kuali change-of-focus, 2018 Mindset List

Thu, 2014-09-04 10:22

This week’s threads are a mixture of the future, the present, and the past. Starting things off is A History of the Future in 100 Objects, a revealing look at what technology and society have in store for us. Parts of this resource are freely available on the website, with the rest available as a $5 e-book. Next, in the present, is the decision by the Kuali Foundation to shift to a for-profit model and what it means for open source in the academic domain. And finally, a look at the past with the mindset list for the class of 2018 from Beloit College.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to Pinboard are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

A History of the Future in 100 Objects

What are the 100 objects that future historians will pick to define our 21st century? A javelin thrown by an ‘enhanced’ Paralympian, far further than any normal human? Virtual reality interrogation equipment used by police forces? The world’s most expensive glass of water, mined from the moons of Mars? Or desire modification drugs that fuel a brand new religion?
A History of the Future in 100 Objects describes a hundred slices of the future of everything, spanning politics, technology, art, religion, and entertainment. Some of the objects are described by future historians; others through found materials, short stories, or dialogues. All come from a very real future.

- About A History of the Future, by Adrian Hon

I was turned on to this book-slash-website-slash-resource by a tweet from Herbert Van de Sompel:

I'm assuming @apple doesn't believe in the future – "A history of the Future in 100 objects" not in iBooks / @cni_org http://t.co/dK5OI4JuIr

— Herbert (@hvdsomp) August 21, 2014


The name is intriguing, right? I mean, A History of the Future in 100 Objects? What does it mean to have a “History of the Future”?

The answer is an intriguing book that places the reader in the year 2082, looking back at the previous 68 years. (Yes, if you are doing the math, the book starts with objects from 2014.) Whether it is high-tech gizmos or the impact of world events, the author makes a projection of what might happen by telling the brief story of an artifact. For those in the library arena, you will want to read about the reading rooms of 2030, but I really suggest starting at the beginning and working your way through the vignettes from the book that the author has published on the website. There is a link in the header of each page that points to e-book purchasing options.

Kuali Reboots Itself into a Commercial Entity

Despite the positioning that this change is about innovating into the next decade, there is much more to this change than might be apparent on the surface. The creation of a for-profit entity to “lead the development and ongoing support” and to enable “an additional path for investment to accelerate existing and create new Kuali products” fundamentally moves Kuali away from the community source model. Member institutions will no longer have voting rights for Kuali projects but will instead be able to “sit on customer councils and will give feedback about design and priority”. Given such a transformative change to the underlying model, there are some big questions to address.

- Kuali For-Profit: Change is an indicator of bigger issues, by Phil Hill, e-Literate

As Phil noted in yesterday’s post, Kuali is moving to a for-profit model, and it looks like it is motivated more by sustainability pressures than by some grand affirmative vision for the organization. There has been a long-term debate in higher education about the value of “community source,” which is a particular governance and funding model for open source projects. This debate is arguably one of the reasons why Indiana University left the Sakai Foundation (as I will get into later in this post). At the moment, Kuali is easily the most high-profile and well-funded project that still identifies itself as Community Source. The fact that this project, led by the single most vocal proponent for the Community Source model, is moving to a different model strongly suggests that Community Source has failed.
It’s worth taking some time to talk about why it has failed, because the story has implications for a wide range of open-licensed educational projects. For example, it is very relevant to my recent post on business models for Open Educational Resources (OER).

- Community Source Is Dead, by Michael Feldstein, e-Literate blog

I touched on the cosmic shift in the direction of Kuali on DLTJ last week, but these two pieces from Phil Hill and Michael Feldstein on the e-Literate blog dig deeper into what the change means. I have certainly been a proponent of the open source method of building software and of the need for sustainable open source software to develop a community around that software. But I can’t help but think there is more to this story than meets the eye: that there is something about a lack of faith by senior university administrators in having their own staff own the needs and issues of their institutions. Or maybe it has something to do with the high levels of fiscal commitment to elaborate “community source” governance structures. In thinking about what happened with Kuali, I can’t help but compare it to the reality of Project Hydra, where libraries participate with in-kind donations of staff time, travel expenses, and good will in a self-governing organization that has only as much structure as it needs.

The 2018 Mindset List

Students heading into their first year of college this year were generally born in 1996.

Among those who have never been alive in their lifetime are Tupac Shakur, JonBenet Ramsey, Carl Sagan, and Tiny Tim.

On Parents’ Weekend, they may want to watch out in case Madonna shows up to see daughter Lourdes Maria Ciccone Leon or Sylvester Stallone comes to see daughter Sophia.

For students entering college this fall in the Class of 2018…

- 2018 List, by Tom McBride and Ron Nief, Beloit College Mindset List

So begins the annual “mindset list” — a tool originally developed to help Beloit College instructors use cultural references that were relevant to the students entering their classrooms. I didn’t see as much buzz about it this year in my social circles, so I wanted to call it out (if for no other reason than to make you feel just a little older…).


Peter Murray: Blocking /xmlrpc.php Scans in the Apache .htaccess File

Thu, 2014-09-04 02:41

Someone out there on the internet is repeatedly hitting this blog’s /xmlrpc.php service, probably looking to enumerate the user accounts on the blog as a precursor to a password scan (as described in Huge increase in WordPress xmlrpc.php POST requests at Sysadmins of the North). My access logs look like this:

176.227.196.86 - - [04/Sep/2014:02:18:19 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
195.154.136.19 - - [04/Sep/2014:02:18:19 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
176.227.196.86 - - [04/Sep/2014:02:18:19 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
176.227.196.86 - - [04/Sep/2014:02:18:21 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
176.227.196.86 - - [04/Sep/2014:02:18:22 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
176.227.196.86 - - [04/Sep/2014:02:18:24 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
195.154.136.19 - - [04/Sep/2014:02:18:24 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
176.227.196.86 - - [04/Sep/2014:02:18:26 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"

By itself, this is just annoying — but the real problem is that the PHP stack is getting invoked each time to deal with the request, and at several requests per second from different hosts this was putting quite a load on the server. I decided to fix the problem with a slight variation from what is suggested in the Sysadmins of the North blog post. This addition to the .htaccess file at the root level of my WordPress instance rejects the connection attempt at the Apache level rather than the PHP level:

RewriteCond %{REQUEST_URI} =/xmlrpc.php [NC]
RewriteCond %{HTTP_USER_AGENT} .*Mozilla\/4.0\ \(compatible:\ MSIE\ 7.0;\ Windows\ NT\ 6.0.*
RewriteRule .* - [F,L]

Which means:

  1. If the requested path is /xmlrpc.php, and
  2. you are sending this particular agent string, then
  3. send back a 403 error message and don’t bother processing any more Apache rewrite rules.

If you need to use this yourself, you might find that the HTTP_USER_AGENT string has changed. You can copy the user string from your Apache access logs, but remember to preface each space or each parenthesis with a backslash.
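If you would rather not escape the string by hand, a short shell one-liner can do it for you. This is a sketch using printf and sed; it escapes only spaces and parentheses, as described above, so check the result against your actual log entry before pasting it into the RewriteCond:

```shell
# Escape each space and parenthesis in a user-agent string with a backslash,
# producing a pattern fragment suitable for a mod_rewrite RewriteCond.
printf '%s' 'Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)' \
  | sed -e 's/[ ()]/\\&/g'
# Output: Mozilla/4.0\ \(compatible:\ MSIE\ 7.0;\ Windows\ NT\ 6.0\)
```

The `&` in the sed replacement stands for whatever the bracket expression matched, so each space, `(`, and `)` is emitted with a leading backslash.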


Peter Murray: 2nd Workshop on Sustainable Software for Science: Practice and Experiences — Accepted Papers and Travel Support

Thu, 2014-09-04 02:08

The conference organizers for WSSSPE2 have posted the list of accepted papers and the application for travel support. I was on the program committee for this year’s conference, and I can point to some papers that I think are particularly useful to libraries and the cultural heritage community in general:


William Denton: Moodie's Tale

Thu, 2014-09-04 01:19

Somebody said we need a Moo for libraries. We still do. But I just read Moodie’s Tale by Eric Wright and I think it’s the Moo of Canadian academia. I don’t know Susanna Moodie or The Canterbury Tales so I think I’m missing a fair bit, but I still enjoyed it very much.

There are a few mentions of libraries, like this:

“Here’s an example,” the president continued. “I propose that henceforth you fellows be called ‘deans.’ Most places have deans nowadays. Sound the others out to see if there’s a problem. Now what else? What else does a college have? A proper college.”

“A library?”

“We’ve got one of sorts, haven’t we? In the corner room of the Drug Mart.”

“Just a few shelves, Gravely. Not many of the faculty know about it. It ought to have some standard reference works. Encyclopedias, that kind of thing.”

“We can afford a couple of thousand from the cleaning budget. Draw up a list. But now you’ve mentioned it, what is the real mark of a library?”

“Other than books?”

“Yes. What else?”

“A copying machine?”

“What else?”

It was important to guess right. Cunningham was getting impatient. “I am not sure of your emphasis, Gravely,” he hedged.

“Emphasis? How do you know it is a library?”

“The sign on the door?”

“Exactly. The label, William, the label. Get a sign made. And what do people find inside the door?”

“The librarian?”

“Now you’re on to it. Apart from the sign, the cheapest thing in the library is the librarian, especially since they aren’t unionized. We could put anyone in and call him the librarian. Now who have we got?”

“Beckett?”

Beckett was a religious maniac, a clerk in the maintenance department who spent his hours walking the streets with a billboard, warning of the end. His fellow workers complained constantly of his proselytizing in the storeroom.

“Perfect. He’s a bit more eccentric than most librarians, I suppose, but he’ll do. Is he conscientious?”

“It’s the other thing his colleagues dislike about him.”

“Done, then.”

Islandora: Varnish, Islandora, and Islandnewspapers.ca

Thu, 2014-09-04 00:24
Varnish and Islandora

Below you will find some information on how UPEI's Robertson Library configured Varnish for use with Islandora. Currently we have Varnish running on our Newspaper site and it is working well with the OpenSeadragon viewer, but we have not tested with the IA Bookviewer yet.

Why use Varnish?

At Robertson Library we have been digitizing the Guardian newspaper for a while now. We expected there would be a good amount of traffic to this site when it went live, so prior to launch we wanted to do some benchmarks. We also noticed that, with the stock Islandora Newspaper solution pack, loading the main Guardian newspaper page was very slow, and we expected we would have to optimize things to handle the load.

The benchmarks we used were pretty simple and were really just a way to help us determine whether or not an optimization was worth keeping. We used The Grinder, a Java-based load testing framework.

We loaded Grinder with a simple scenario: hit the homepage, the main Guardian newspaper page, a newspaper page (in the OpenSeadragon viewer), and the main Guardian page again (the one that lists all the issues of the Guardian; we have almost 20,000 issues so far). Grinder was configured to hit these pages 250 times with 50 threads.
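A scenario like that is driven by a grinder.properties file along these lines (a sketch, not our actual configuration; the script filename is hypothetical, and the exact split of runs across threads depends on how the scenario script is written):

```
# The Grinder 3 worker configuration (sketch)
grinder.processes = 1
grinder.threads = 50
grinder.runs = 250
grinder.script = newspaper_scenario.py
```

The Jython script named by grinder.script is what actually issues the four page requests on each run.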

Our first run at it was with the stock Islandora Newspaper solution pack.

The numbers were not great with the stock Islandora Newspaper solution pack: we could handle about 1 request per second, and we were starting to receive some errors. Total throughput was 1106.59 KB/sec. CPU usage on the server was very high, with all cores pretty steady at or near 100%.

The biggest problem seemed to be hitting the resource index over and over again and manipulating the resulting array. So to try and speed things up a little we modified the code to query Solr instead of the Resource Index.

Test results with Solr query.

By querying Solr we were able to speed things up quite a bit. We were now getting close to 5 requests per second, with no errors and a throughput of 4874.92 KB/sec. Our CPU usage was still very high, with all cores at or near 100%.

We couldn’t see other ways to make the main Guardian page load faster without significantly changing how the Newspaper solution pack worked. Dynamically listing almost 20,000 issues on one page was going to take time no matter how we did it, unless we broke the page up into several requests. Breaking the page up into several requests would not be ideal either, as we would have to make round trips to the server to get the list of years available as well as all issues for a selected year. Instead of breaking this page up into several requests, we discussed caching it.

So our next step was to install and configure Varnish so that this page would be cached. With Varnish installed and configured we ran the same Grinder tests.

Test with Varnish enabled

By using Varnish our numbers improved again. We were now handling 10 requests per second, with no errors and a throughput of 9808.21 KB/sec. Our CPU usage was way down, with all cores between 3% and 20% (most closer to 3%). By using Varnish we got a speed boost, but I think the biggest advantage will be in the number of users we can handle, as our most expensive requests now come from the cache with little server overhead.

Of course, using Grinder to test with Varnish makes Varnish look even better, as we are hitting the same URLs over and over, but the results, especially the low CPU usage, lead us to believe Varnish is worth using on the Islandnewspapers.ca site.

Since we have launched we have had as many as 75 concurrent users and response times are great even under load.

Configuring Drupal and Islandora for Varnish

Configure Drupal Performance

On the Drupal Performance admin page (admin/config/development/performance) we configured Drupal to cache and compress pages. We also aggregate and compress CSS and JavaScript.

Configure Islandora

On the Islandora config page (admin/islandora/configure) we disabled setting the cache headers.

If we enable the “Generate/parse datastream HTTP cache headers” setting, Varnish doesn’t serve the page thumbnail images from its cache; on the plus side, we may get better browser caching of thumbnails.

We seemed to get better performance with “Generate/parse datastream HTTP cache headers” unchecked, so we have left it off for now.

Installing and configuring Varnish

We installed Varnish on Ubuntu with sudo apt-get install varnish. We are currently using Varnish 3.0.2.

Varnish Configuration

We modified the default.vcl in /etc/varnish.

Our vcl file looks like this:

# This is a basic VCL configuration file for varnish. See the vcl(7)
# man page for details on VCL syntax and semantics.
#
# Default backend definition. Set this to point to your content
# server.
#
backend default {
  .host = "127.0.0.1";
  .port = "8090";
  .connect_timeout = 30s;
  .first_byte_timeout = 30s;
  .between_bytes_timeout = 30s;
}

sub vcl_recv {
  // Remove has_js and Google Analytics __* cookies.
  set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
  // Remove a ";" prefix, if present.
  set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
  // Remove empty cookies.
  if (req.http.Cookie ~ "^\s*$") {
    unset req.http.Cookie;
  }

  // In testing, pipe seemed to give us better results than pass.
  if (req.url ~ "^/adore-djatoka") {
    unset req.http.Cookie;
    return (pipe);
  }
  if (req.url ~ "\.(png|gif|jpg|js|css)$") {
    unset req.http.Cookie;
    return (lookup);
  }
  if (req.url ~ "^/search") {
    unset req.http.Cookie;
    return (pass);
  }
  if (req.request == "GET" || req.request == "HEAD") {
    return (lookup);
  }
}

sub vcl_pipe {
  # http://www.varnish-cache.org/ticket/451
  # This forces every pipe request to be the first one.
  set bereq.http.connection = "close";
}

In /etc/default/varnish (Ubuntu/Debian) or /etc/sysconfig/varnish (Centos/Fedora) you will have to change your DAEMON_OPTS. Ours look like this:

DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,5g"

You can see from the two config files that we have Varnish listening on port 80 and looking for the backend on port 8090.

Our Apache server is configured to listen on port 8090; other than that, Apache is using a standard Islandora-type setup.
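On Ubuntu, moving Apache off port 80 is typically a one-line change in the ports configuration file (the path below assumes a stock Ubuntu Apache install; any VirtualHost entries would need to be updated to match):

```
# /etc/apache2/ports.conf -- Apache serves the backend on 8090; Varnish owns port 80
Listen 8090
```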

The timeouts in our VCL are pretty high and could probably be set a lot lower. With an earlier version of Varnish we were having some inconsistencies with loading times when using the OpenSeadragon viewer; the higher timeouts were left over from testing with that older version, and we will adjust them.

We have Varnish configured to use RAM (malloc) for its cache, but this could be set to a file.
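Switching to file-backed storage is just a change to the -s switch in DAEMON_OPTS (a sketch; the path and size here are illustrative, and Varnish manages the file itself):

```
# file-backed cache instead of malloc
-s file,/var/lib/varnish/storage.bin,5g
```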

One thing we decided to do was pipe requests to Djatoka. Since Djatoka is already caching images, we decided not to cache them twice.

We have also made some optimizations to Djatoka’s configs. Basically, we increased the number of tiles and images Djatoka would keep in its cache.

Note: We are not using the Varnish Drupal module.

There are many great resources for Varnish on the web. Pantheon has a great page regarding Varnish and Drupal.
