Planet Code4Lib - http://planet.code4lib.org

Karen Coyle: WP:NOTABILITY (and Women)

Fri, 2014-09-05 01:04
I've been spending quite a bit of time lately following the Wikipedia pages of "Articles for Deletion" or WP:AfD in Wikipedia parlance. This is a fascinating way to learn about the Wikipedia world. The articles for deletion fall mostly into a few categories:
  1. Brief mentions of something that someone once thought interesting (a favorite game character, a dearly loved soap opera star, a heartfelt local organization) but that has not been considered important by anyone else. In Wikipedian, it lacks WP:NOTABILITY.
  2. Highly polished P.R. intended to make someone or something look more important than it is, knowing that Wikipedia shows up high on search engine results, and that any site linked to from Wikipedia also gets its ranking boosted.
Some of #2 is actually created by companies that are paid to get their clients into Wikipedia along with promoting them in other places online. Another good example is that of authors of self-published books, some of whom appear to be more skilled in P.R. than they are in the literary arts.

In working through a few of the fifty or more articles proposed for deletion each day, you get to do some interesting sleuthing. You can see who has edited the article, and what else they have edited; any account that has only edited one article could be seen as a suspected bogus account created just for that purpose. Or you could assume that only one person in the English-speaking world has any interest in this topic at all.

Most of the work, though, is in seeing if you can establish notability. Notability is not a precise measure, and there are many pages of policy and discussion on the topic. The short form is that for something or someone to be notable, it has to be written about in respected, neutral, third-party publications. Thus a New York Times book review is good evidence of notability for a book, while a listing in the Amazon book department is not. The grey area is wide, however. Publishers Weekly may or may not indicate notability, since it publishes only short paragraphs and covers about 7,000 books a year. That's not very discriminating.

Notability can be tricky. I recently came across an article for deletion pointing to Elsie Finnimore Buckley, a person I had never heard of before. I discovered that her dates were 1882-1959, and she was primarily a translator of works from French into English. She did, though, write what appears to have been a popular book of Greek tales for young people.

As a translator, her works were listed under "E. F. Buckley." I can well imagine that if she had used her full name it would not have been welcome on the title page of the books she translated. Some of the works she translated appear to have a certain stature, such as works by Franz Funck-Brentano. She has an LC name authority file under "Buckley, E. F." although her full name is added in parentheses: "(Elsie Finnimore)".

To understand what it was like for women writers, one can turn to Linda Peterson's book "Becoming a Woman of Letters: Myths of Authorship and Facts of the Victorian Market." In it, she quotes a male reviewer of Buckley's Greek tales, which Buckley did publish under her full name. His comments are enough to chill the aspirations of any woman writer. He said that writing on such serious topics is "not women's work" and that "a woman has neither the knowledge nor the literary tact necessary for it." (Peterson, p. 58) Obviously, her work as a translator is proof otherwise, but he probably did not know of that work.

Given this attitude toward women as writers (of anything other than embroidery patterns and luncheon menus), it isn't all that surprising that it's not easy to establish WP:NOTABILITY for women writers of that era. As Dale Spender says in "Mothers of the Novel: 100 Good Women Writers Before Jane Austen":

"If the laws of literary criticism were to be made explicit they would require as their first entry that the sex of the author is the single most important factor in any test of greatness and in any preservation for posterity." (p. 137)

That may be a bit harsh, but it illustrates the problem one faces when trying to rectify the prejudices against women, especially from centuries past, while still wishing to provide valid proof that a woman's accomplishments are worthy of an encyclopedia entry.

We know well that many women writers had to use male names in order to be able to publish at all. Others, like E.F. Buckley, hid behind initials. Had her real identity been revealed to the reading public, she might have lost her work as a translator. Of late, J.K. Rowling has used both techniques, so this is not a problem that we left behind with the Victorian era. As I said in the discussion on Wikipedia:
"It's hard to achieve notability when you have to keep your head down."

Cherry Hill Company: Cherry Hill to present at DrupalCamp LA this weekend

Thu, 2014-09-04 22:03

Cherry Hill is looking forward to DrupalCamp LA this weekend! Come join us for some of our sessions to expand your Drupal knowledge. Whether you are a seasoned Drupal ninja or a green newbie, LA Drupal community members, including the crew at Cherry Hill, will be on hand to show you some ins and outs of the Drupal world.

Check out our sessions below:

Saturday: Morning InstallFest: Get PHP & Drupal running in under 15 minutes with Tommy Keswick

8:30am Pacific Ballroom AB 
InstallFest volunteers will help guide and verify the installation of PHP and/or Drupal on your personal laptop.

Drupal Camp Intro for Newbies with John Romine and Ashok Modi

8:40am Pacific Ballroom C
Pre-camp cup of coffee and a quick introduction to how to get...


HangingTogether: 939,594,891 library users worldwide — Prove me wrong!

Thu, 2014-09-04 20:19

That crunching you hear is the sound of the numbers available from OCLC’s Global Library Statistics page.

Over the past several years, the OCLC Library has been compiling data for the total number of libraries, librarians, volumes, expenditures, and users for every country and territory in the world, broken down into the major library types: academic, public, school, special and national.  The goal was to provide statistics on all libraries—not just OCLC libraries—that could be accessed and used by anyone.

A while back Dr. Frank Seeliger, Director of the Library at the Technical University of Applied Sciences in Wildau, Germany, contacted me about the statistics.  He asked if I could send him the actual data behind the site so that he could total up all the libraries, librarians, books, etc.  (At the time the information was only accessible country-by-country.)  I was happy to oblige, and here’s what he came up with.

Global library statistics summary

His request created the impetus for us to make the data available under an Open Data Commons Attribution License. Two spreadsheets provide information for countries and for U.S. states and Canadian provinces.  A third gives information on the over 80 sources that contribute data.

See the data for yourself!

The staff of the OCLC Library extracted data from respected third-party sources, both electronic and print, that in their judgment are the most current and accurate sources to which they have access. For many countries, data were either unavailable (indicated in the charts as NA) or sporadic. For much of the world, the data were not as current as we would have liked.

We want to make these statistics as accurate as we can.  Once you’ve taken a look at the Global Library Statistics, take a look at the Sources and send me your suggestions or leave a comment below.  While $51 billion in library expenditures is nothing to sneeze at, it is, as Dr. Seeliger put it, a Hausnummer.  A ballpark figure.  And it’s not even adjusted for inflation!
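If you want to run the totals yourself, a few lines of Python will do it. This is a minimal sketch, assuming the spreadsheet has been exported to CSV; the file name and the "Users" column header are hypothetical, so substitute the real ones from the download:

import pandas as pd

# Load the exported spreadsheet; blank/NA cells are skipped by sum().
df = pd.read_csv("global-library-statistics.csv")  # hypothetical file name
print("Total library users:", int(df["Users"].sum()))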

Thanks for your help.

About Tam Dalrymple

Tam Dalrymple is Senior Information Specialist (reference librarian) at the OCLC Library in Dublin Ohio. Prior to joining OCLC as a product manager some years back, Tam managed reference services at Ohio State and at the Columbus Metropolitan Library.


Jodi Schneider: Rating the evidence, citation by citation?

Thu, 2014-09-04 17:21

Publishers from HighWire Press are experimenting with a plugin called SocialCite. This is intended to rate the evidence, citation by citation. Like this:

SocialCite at PNAS, HighWire Press (screenshot from http://www.pnas.org/content/108/14/5488.full#ref-list-1)

So far a few publishers (including PNAS) have implemented it as a pilot. The Journal of Bone and Joint Surgery is apparently leading this effort; I’d be really interested in speaking with them further.

Find out more about SocialCite from their website or the slide deck from their debut at the HighWire Press meeting.

SocialCite makes its debut at the HighWire Press meeting from Kent Anderson

I’m *very* curious to hear what people think of this — it really surprised me.

LITA: LITA Updates

Thu, 2014-09-04 16:45

This is one of our periodic messages sent to all LITA members. This update includes information about:

  • LITA Forum Opportunities
  • New LITA Guides available

LITA Forum in Albuquerque

Two workshops, three keynotes, more than 30 concurrent sessions, poster sessions, and multiple networking opportunities await you.

The two preconference workshops begin on Wednesday, November 5, 1:00-5:00 pm, and run through Thursday, November 6, 8:00 am to noon.

1) Learn Python by Playing with Library Data with Francis Kayiwa. Learn the basics of setting up your Python environment, installing useful packages, and writing programs.

2) Linked Data for Libraries: How libraries can make use of Linked Open Data to share information about library resources and to improve discovery, access, and understanding for library users, with Dean Krafft and Jon Corson-Rikert from Cornell University Library.

The three keynote speakers are:

AnnMarie Thomas, Engineering Professor at the University of St. Thomas, is the director of the UST Design Laboratory. Dr. Thomas co-founded and co-directs the University of St. Thomas Center for Pre-Collegiate Engineering Education. She served as the Founding Executive Director of the Maker Education Initiative. AnnMarie has also worked on robotics design, creation, and propulsion.

Lorcan Dempsey, Vice President, OCLC Research and Chief Strategist, oversees the research division and participates in planning at OCLC. Lorcan has policy, research and service development experience, mostly in the area of networked information and digital libraries.

Kortney Ryan Ziegler, Founder of Trans*h4ck, is an award-winning artist, writer, and the first person to hold a PhD in African American Studies from Northwestern University. Trans*H4CK is the only tech event of its kind, spotlighting trans*-created technology, trans* entrepreneurs, and trans*-led startups.

Networking opportunities

All Forum sessions are in a single hotel, which facilitates networking. Opportunities include a first-night reception, two nights of networking dinners (gather on site, then move off site to various restaurants), all conference meals on site (breakfasts and lunch), and lengthy breaks, not to mention conversations in the hotel hallways and elevators. The first-night reception launches the Sponsor Showcase, where participants will have ample opportunities to meet with representatives from EBSCO, Springshare, and @MIRE both that evening and the next day. Our thanks go to all the Forum sponsors, including Innovative and OCLC. Rachel Vacek, LITA President, and Thomas Dowling, LITA President-elect, plan to lead two networking dinners focused on LITA-specific Kitchen Conversations. LITA and the LITA Forum fully support the Statement of Appropriate Conduct at ALA Conferences.

Hope to see you in Albuquerque!

New LITA Guides

Two LITA Guides were published this summer: The Top Technologies Every Librarian Needs to Know, with Kenneth Varnum as editor and contributor, and Using Massive Digital Libraries by Andrew Weiss with Ryan James.

The Top Technologies guide is focused on the impact a technology could have on staff, services, and patrons. An expert on each emerging technology talks about the technology within the near-term future of three to five years. In the introduction, Ken Varnum says, “Each chapter includes a thorough description of a particular technology: what it is, where it came from, and why it matters. We will look at early adopters or prototypes for the technology to see how it could be used more broadly. And then, having described a trajectory, we will paint a picture of how the library of the not-so-distant future could be changed by adopting and embracing that particular technology.”

Using Massive Digital Libraries examines “what Ryan James and (Andrew Weiss) in previous studies have together defined as massive digital libraries (MDLs). … A massive digital library is a collection of organized information large enough to rival the size of the world’s largest bricks-and-mortar libraries in terms of book collections. The examples examined in this book range from hundreds of thousands of books to tens of millions. This basic definition … is a starting point for discussion. As the book progresses this definition is refined further to make it more usable and relevant. This book will introduce more characteristics of MDLs and examine how they affect the current traditional library.”

I encourage you to connect with LITA by:

  1. Exploring our web site.
  2. Subscribing to the LITA-L email discussion list: e-mail sympa@ala.org with the subject line “subscribe lita-l”.
  3. Visiting the LITA blog and LITA Division page on ALA Connect.
  4. Connecting with us on Facebook and Twitter.
  5. Reaching out to the LITA leadership at any time.

Please note: the Information Technology and Libraries (ITAL) journal is available to you and to the entire profession. ITAL features high-quality articles that undergo rigorous peer review, as well as case studies, commentary, and information about topics and trends of interest to the LITA community and beyond. Be sure to sign up for notifications when new issues are posted (March, June, September, and December).

If you have any questions or wish to discuss any of these items, please do let me know.

All the best,

Mary

Mary Taylor, Executive Director
Library and Information Technology Association (LITA)
50 E. Huron, Chicago, IL 60611
800-545-2433 x4267
312-280-4267 (direct line)
312-280-3257 (fax)
mtaylor (at) ala.org
www.lita.org

Join us in Albuquerque, November 5-8, 2014, for the LITA Forum. The theme is “Transformation: From Node to Network.”

District Dispatch: Free webinar: Understanding Social Security

Thu, 2014-09-04 16:26

Photo by the Knight Foundation

Do you know how to help your patrons locate information on Supplemental Security Income or Social Security? The American Library Association (ALA) is encouraging librarians to participate in “My SSA,” a free webinar that will teach participants how to use My Social Security (MySSA), the online Social Security resource.

Presented by leaders and members of the development team of MySSA, this session will provide attendees with an overview of MySSA. In addition to receiving benefits information in print, the Social Security Administration is encouraging librarians to create an online MySSA account to view and track benefits.

Attendees will learn about viewing earnings records and receiving instant estimates of their future Social Security benefits. Those already receiving benefits can check benefit and payment information and manage their benefits.

Speakers include:

  • Maria Artista-Cuchna, Acting Associate Commissioner, External Affairs
  • Kia Anderson, Supervisory Social Insurance Specialist
  • Arnoldo Moore, Social Insurance Specialist
  • Alfredo Padilia Jr., Social Insurance Specialist
  • Diandra Taylor, Management Analyst

Date: Wednesday, September 17, 2014
Time: 2:00 PM – 3:00 PM EDT
Register for the free event

If you cannot attend this live session, a recorded archive will be available. To view past webinars also hosted collaboratively with iPAC, please visit Lib2Gov.org.


Library of Congress: The Signal: DPOE Working Group Moves Forward on Curriculum

Thu, 2014-09-04 13:03

The working group at their recent meeting. Photo by Julio Diaz.

For many organizations that are just starting to tackle digital preservation, it can be a daunting challenge – and it can be particularly difficult to figure out the first steps to take.  Education and training may be the best starting point, creating and expanding the expertise available to handle this kind of challenge.  The Digital Preservation Outreach and Education (DPOE) program here at the Library aims to do just that, by providing the materials as well as the hands-on instruction to help build the expertise needed for current and future professionals working on digital preservation.

Recently, the Library was host to a meeting of the DPOE Working Group, consisting of a core group of experts and educators in the field of digital preservation.  The Working Group participants were Robin Dale (Institute of Museum and Library Services), Sam Meister (University of Montana-Missoula), Mary Molinaro (University of Kentucky), and Jacob “Jake” Nadal (Princeton University).  The meeting was chaired by George Coulbourne of the Library of Congress, and Library staffers Barrie Howard and Kris Nelson also participated.

The main goal of the meeting was to update the existing DPOE Curriculum, which is used as the basis for the program’s training workshops and, subsequently, by the trainees themselves.  A survey is being conducted to gather even more information and will help inform this curriculum as well (see a related blog post).  The Working Group reviewed and edited all six of the substantive modules, which are based on terms from the OAIS Reference Model framework:

  • Identify   (What digital content do you have?)
  • Select   (What portion of your digital content will be preserved?)
  • Store   (What issues are there for long-term storage?)
  • Protect  (What steps are needed to protect your digital content?)
  • Manage   (What provisions are needed for long-term management?)
  • Provide   (What considerations are there for long-term access?)

The group also discussed adding a seventh module on implementation.  Each of these existing modules contains a description, goals, concepts and resources designed to be used by current and/or aspiring digital preservation practitioners.

Mary Molinaro, Director of the Research Data Center at the University of Kentucky Libraries, noted that “as we worked through the various modules it became apparent how flexible this curriculum is for a wide range of institutions.  It can be adapted for small, one-person cultural heritage institutions and still be relevant for large archives and libraries.”

Mary also spoke to the advantages of having a focused, group effort to work through these changes: “Digital preservation has some core principles, but it’s also a discipline subject to rapid technological change.  Focusing on the curriculum together as an instructor group allowed us to emphasize those things that have not changed while at the same time enhancing the materials to reflect the current technologies and thinking.”

These curriculum modules are currently in the process of further refinement and revision, including an updated list of resources. The updated version of the curriculum will be available later this month. The Working Group also recommended some strategies for extending the curriculum to address executive audiences, and how to manage the process of updating the curriculum going forward.

Peter Murray: Thursday Threads: History of the Future, Kuali change-of-focus, 2018 Mindset List

Thu, 2014-09-04 10:22

This week’s threads are a mixture of the future, the present, and the past. Starting things off is A History of the Future in 100 Objects, a revealing look at what technology and society have in store for us. Parts of this resource are freely available on the website, with the rest available as a $5 e-book. Next, in the present, is the decision by the Kuali Foundation to shift to a for-profit model and what it means for open source in the academic domain. And finally, a look at the past with the mindset list for the class of 2018 from Beloit College.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted there are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

A History of the Future in 100 Objects

What are the 100 objects that future historians will pick to define our 21st century? A javelin thrown by an ‘enhanced’ Paralympian, far further than any normal human? Virtual reality interrogation equipment used by police forces? The world’s most expensive glass of water, mined from the moons of Mars? Or desire modification drugs that fuel a brand new religion?
A History of the Future in 100 Objects describes a hundred slices of the future of everything, spanning politics, technology, art, religion, and entertainment. Some of the objects are described by future historians; others through found materials, short stories, or dialogues. All come from a very real future.

- About A History of the Future, by Adrian Hon

I was turned on to this book-slash-website-slash-resource by a tweet from Herbert Van de Sompel:

I'm assuming @apple doesn't believe in the future – "A history of the Future in 100 objects" not in iBooks / @cni_org http://t.co/dK5OI4JuIr

— Herbert (@hvdsomp) August 21, 2014


The name is intriguing, right? I mean, A History of the Future in 100 Objects? What does it mean to have a “History of the Future”?

The answer is an intriguing book that places the reader in the year 2082, looking back at the previous 68 years. (Yes, if you are doing the math, the book starts with objects from 2014.) Whether it is high-tech gizmos or the impact of world events, the author makes a projection of what might happen by telling the brief story of an artifact. Those in the library arena will want to read about the reading rooms of 2030, but I really suggest starting at the beginning and working your way through the vignettes from the book that the author has published on the website. There is a link in the header of each page that points to e-book purchasing options.

Kuali Reboots Itself into a Commercial Entity

Despite the positioning that this change is about innovating into the next decade, there is much more to this change than might be apparent on the surface. The creation of a for-profit entity to “lead the development and ongoing support” and to enable “an additional path for investment to accelerate existing and create new Kuali products” fundamentally moves Kuali away from the community source model. Member institutions will no longer have voting rights for Kuali projects but will instead be able to “sit on customer councils and will give feedback about design and priority”. Given such a transformative change to the underlying model, there are some big questions to address.

- Kuali For-Profit: Change is an indicator of bigger issues, by Phil Hill, e-Literate

As Phil noted in yesterday’s post, Kuali is moving to a for-profit model, and it looks like it is motivated more by sustainability pressures than by some grand affirmative vision for the organization. There has been a long-term debate in higher education about the value of “community source,” which is a particular governance and funding model for open source projects. This debate is arguably one of the reasons why Indiana University left the Sakai Foundation (as I will get into later in this post). At the moment, Kuali is easily the most high-profile and well-funded project that still identifies itself as Community Source. The fact that this project, led by the single most vocal proponent for the Community Source model, is moving to a different model strongly suggests that Community Source has failed.
It’s worth taking some time to talk about why it has failed, because the story has implications for a wide range of open-licensed educational projects. For example, it is very relevant to my recent post on business models for Open Educational Resources (OER).

- Community Source Is Dead, by Michael Feldstein, e-Literate blog

I touched on the cosmic shift in the direction of Kuali on DLTJ last week, but these two pieces from Phil Hill and Michael Feldstein on the e-Literate blog dig much deeper into what the change means. I have certainly been a proponent of the open source method of building software and of the need for sustainable open source software to develop a community around that software. But I can’t help but think there is more to this story than meets the eye: that there is something about a lack of faith by senior university administrators in having their own staff own the needs and issues of their institutions. Or maybe it has something to do with the high levels of fiscal commitment to elaborate “community source” governance structures. In thinking about what happened with Kuali, I can’t help but compare it to the reality of Project Hydra, where libraries participate with in-kind donations of staff time, travel expenses, and good will to a self-governing organization that has only as much structure as it needs.

The 2018 Mindset List

Students heading into their first year of college this year were generally born in 1996.

Among those who have never been alive in their lifetime are Tupac Shakur, JonBenet Ramsey, Carl Sagan, and Tiny Tim.

On Parents’ Weekend, they may want to watch out in case Madonna shows up to see daughter Lourdes Maria Ciccone Leon or Sylvester Stallone comes to see daughter Sophia.

For students entering college this fall in the Class of 2018…

- 2018 List, by Tom McBride and Ron Nief, Beloit College Mindset List

So begins the annual “mindset list” — a tool originally developed to help Beloit College instructors use cultural references relevant to the students entering their classrooms. I didn’t see as much buzz about it this year in my social circles, so I wanted to call it out (if for no other reason than to make you feel just a little older…).


Peter Murray: Blocking /xmlrpc.php Scans in the Apache .htaccess File

Thu, 2014-09-04 02:41

Someone out there on the internet is repeatedly hitting this blog’s /xmlrpc.php service, probably looking to enumerate the user accounts on the blog as a precursor to a password scan (as described in Huge increase in WordPress xmlrpc.php POST requests at Sysadmins of the North). My access logs look like this:

176.227.196.86 - - [04/Sep/2014:02:18:19 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
195.154.136.19 - - [04/Sep/2014:02:18:19 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
176.227.196.86 - - [04/Sep/2014:02:18:19 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
176.227.196.86 - - [04/Sep/2014:02:18:21 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
176.227.196.86 - - [04/Sep/2014:02:18:22 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
176.227.196.86 - - [04/Sep/2014:02:18:24 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
195.154.136.19 - - [04/Sep/2014:02:18:24 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"
176.227.196.86 - - [04/Sep/2014:02:18:26 +0000] "POST /xmlrpc.php HTTP/1.0" 200 291 "-" "Mozilla/4.0 (compatible: MSIE 7.0; Windows NT 6.0)"

By itself, this is just annoying — but the real problem is that the PHP stack is getting invoked each time to deal with the request, and at several requests per second from different hosts this was putting quite a load on the server. I decided to fix the problem with a slight variation from what is suggested in the Sysadmins of the North blog post. This addition to the .htaccess file at the root level of my WordPress instance rejects the connection attempt at the Apache level rather than the PHP level:

RewriteCond %{REQUEST_URI} =/xmlrpc.php [NC]
RewriteCond %{HTTP_USER_AGENT} .*Mozilla\/4.0\ \(compatible:\ MSIE\ 7.0;\ Windows\ NT\ 6.0.*
RewriteRule .* - [F,L]

Which means:

  1. If the requested path is /xmlrpc.php, and
  2. you are sending this particular agent string, then
  3. send back a 403 error message and don’t bother processing any more Apache rewrite rules.

If you need to use this yourself, you might find that the HTTP_USER_AGENT string has changed. You can copy the user agent string from your Apache access logs, but remember to preface each space, parenthesis, and forward slash with a backslash.
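For example, to match a (made-up) agent string of "Mozilla/5.0 (Windows NT 6.1)", the escaped condition would look like:

RewriteCond %{HTTP_USER_AGENT} .*Mozilla\/5.0\ \(Windows\ NT\ 6.1\).*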


Peter Murray: 2nd Workshop on Sustainable Software for Science: Practice and Experiences — Accepted Papers and Travel Support

Thu, 2014-09-04 02:08

The conference organizers for WSSSPE2 have posted the list of accepted papers and the application for travel support. I was on the program committee for this year’s conference, and I can point to some papers that I think are particularly useful to libraries and the cultural heritage community in general:


William Denton: Moodie's Tale

Thu, 2014-09-04 01:19

Somebody said we need a Moo for libraries. We still do. But I just read Moodie’s Tale by Eric Wright and I think it’s the Moo of Canadian academia. I don’t know Susanna Moodie or The Canterbury Tales so I think I’m missing a fair bit, but I still enjoyed it very much.

There are a few mentions of libraries, like this:

“Here’s an example,” the president continued. “I propose that henceforth you fellows be called ‘deans.’ Most places have deans nowadays. Sound the others out to see if there’s a problem. Now what else? What else does a college have? A proper college.”

“A library?”

“We’ve got one of sorts, haven’t we? In the corner room of the Drug Mart.”

“Just a few shelves, Gravely. Not many of the faculty know about it. It ought to have some standard reference works. Encyclopedias, that kind of thing.”

“We can afford a couple of thousand from the cleaning budget. Draw up a list. But now you’ve mentioned it, what is the real mark of a library?”

“Other than books?”

“Yes. What else?”

“A copying machine?”

“What else?”

It was important to guess right. Cunningham was getting impatient. “I am not sure of your emphasis, Gravely,” he hedged.

“Emphasis? How do you know it is a library?”

“The sign on the door?”

“Exactly. The label, William, the label. Get a sign made. And what do people find inside the door?”

“The librarian?”

“Now you’re on to it. Apart from the sign, the cheapest thing in the library is the librarian, especially since they aren’t unionized. We could put anyone in and call him the librarian. Now who have we got?”

“Beckett?”

Beckett was a religious maniac, a clerk in the maintenance department who spent his hours walking the streets with a billboard, warning of the end. His fellow workers complained constantly of his proselytizing in the storeroom.

“Perfect. He’s a bit more eccentric than most librarians, I suppose, but he’ll do. Is he conscientious?”

“It’s the other thing his colleagues dislike about him.”

“Done, then.”

Islandora: Varnish, Islandora, and Islandnewspapers.ca

Thu, 2014-09-04 00:24

Varnish and Islandora

Below you will find some information on how UPEI's Robertson Library configured Varnish for use with Islandora. Currently we have Varnish running on our newspaper site, and it is working well with the OpenSeadragon viewer, but we have not tested it with the IA Bookviewer yet.

Why use Varnish?

At Robertson Library we have been digitizing the Guardian newspaper for a while now. We expected there would be a good amount of traffic to this site when it went live, so prior to launch we wanted to do some benchmarks. We had also noticed that with the stock Islandora Newspaper solution pack, loading the main Guardian newspaper page was very slow, and we expected we would have to optimize things to handle the load.

The benchmarks we used were pretty simple and were really just a way to help us determine whether or not an optimization was worth keeping. We used The Grinder, a Java-based load-testing framework.

We loaded Grinder with a simple scenario: hit the homepage, the main Guardian newspaper page (the one that lists all the issues of the Guardian; we have almost 20,000 issues so far), a newspaper page in the OpenSeadragon viewer, and the main Guardian page again. Grinder was configured to hit these pages 250 times with 50 threads.
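For anyone who wants to replicate this, a Grinder scenario is a short Jython script. Here is a minimal sketch of the shape of ours; the host, paths, and test numbers are placeholders, not our actual script:

from net.grinder.script import Test
from net.grinder.plugin.http import HTTPRequest

# Wrap one HTTPRequest per page so Grinder reports timings separately.
home = Test(1, "Homepage").wrap(HTTPRequest())
issues = Test(2, "Main Guardian page").wrap(HTTPRequest())
page = Test(3, "Page in OpenSeadragon viewer").wrap(HTTPRequest())

BASE = "http://islandnewspapers.ca"  # placeholder host

class TestRunner:
    # Grinder creates one TestRunner per thread and calls it once per run.
    def __call__(self):
        home.GET(BASE + "/")
        issues.GET(BASE + "/guardian")       # placeholder path
        page.GET(BASE + "/guardian/page1")   # placeholder path
        issues.GET(BASE + "/guardian")

# Thread and run counts are set in grinder.properties
# (grinder.threads, grinder.runs), not in the script.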

Our first run used the stock Islandora Newspaper solution pack.

The numbers were not great with the stock Islandora Newspaper solution pack: we could handle about one request per second, and we were starting to receive some errors. Total throughput was 1106.59 KB/sec. CPU usage on the server was very high, with all cores pretty steady at or near 100%.

The biggest problem seemed to be hitting the resource index over and over again and manipulating the resulting array, so to speed things up a little we modified the code to query Solr instead of the Resource Index.
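The replacement query is just an HTTP request to Solr. As a rough sketch only (the field names below are placeholders; the real ones depend entirely on how your Islandora Solr indexing is configured):

curl 'http://localhost:8080/solr/collection1/select?q=RELS_EXT_isMemberOf_uri_ms:"info:fedora/guardian:guardian"&fl=PID,fgs_label_s&rows=20000&wt=json'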

Test results with Solr query.

By querying Solr we were able to speed things up quite a bit. We were now getting close to 5 requests per second, with no errors and a throughput of 4874.92 KB/sec. Our CPU usage was still very high, with all cores at or near 100%.

We couldn’t see other ways to make the main Guardian page load faster without significantly changing how the Newspaper solution pack worked. Dynamically listing almost 20,000 issues on one page was going to take time no matter how we did it, unless we broke the page up into several requests. Breaking the page up would not be ideal either, as we would have to make round trips to the server to get the list of years available as well as all issues for a selected year. Instead of breaking this page up into several requests, we discussed caching it.

So our next step was to install and configure Varnish so that this page would be cached. With Varnish installed and configured we ran the same Grinder tests.

Test with Varnish enabled

By using Varnish our numbers improved again. We were now handling 10 requests per second, with no errors and a throughput of 9808.21 KB/sec. Our CPU usage was way down, with all cores between 3% and 20% usage (most closer to 3%). Varnish gave us a speed boost, but I think the biggest advantage will be in the number of users we can handle, as our most expensive requests now come from the cache with little server overhead.

Of course, using Grinder to test with Varnish makes Varnish look even better, as we are hitting the same URLs over and over, but the results, especially the low CPU usage, lead us to believe Varnish is worth using on the islandnewspapers.ca site.

Since launch we have had as many as 75 concurrent users, and response times are great even under load.

Configuring Drupal and Islandora for Varnish

Configure Drupal Performance

On the Drupal Performance admin page (admin/config/development/performance) we configured Drupal to cache and compress pages. We also aggregate and compress CSS and JavaScript.
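For those who prefer the command line, these settings are plain Drupal 7 variables, so they can also be set with drush. A quick sketch:

# Enable page caching and compression for anonymous users.
drush vset cache 1
drush vset page_compression 1
# Aggregate and compress CSS, and aggregate JavaScript.
drush vset preprocess_css 1
drush vset preprocess_js 1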

Configure Islandora

On the Islandora config page (admin/islandora/configure) we disabled setting the cache headers.

If we enable the "Generate/parse datastream HTTP cache headers" option, Varnish doesn’t serve the page thumbnail images from its cache; on the plus side, we may get better browser caching of thumbnails.

We seemed to get better performance with the option unchecked, so we have left it off for now.

Installing and configuring Varnish

We installed Varnish on Ubuntu with sudo apt-get install varnish. We are currently using Varnish 3.0.2.

Varnish Configuration

We modified the default.vcl in /etc/varnish.

Our vcl file looks like this:

# This is a basic VCL configuration file for varnish. See the vcl(7)
# man page for details on VCL syntax and semantics.
#
# Default backend definition. Set this to point to your content
# server.
#
backend default {
  .host = "127.0.0.1";
  .port = "8090";
  .connect_timeout = 30s;
  .first_byte_timeout = 30s;
  .between_bytes_timeout = 30s;
}

sub vcl_recv {
  // Remove has_js and Google Analytics __* cookies.
  set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
  // Remove a ";" prefix, if present.
  set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
  // Remove empty cookies.
  if (req.http.Cookie ~ "^\s*$") {
    unset req.http.Cookie;
  }
  // In testing, pipe seemed to give us better results than pass.
  if (req.url ~ "^/adore-djatoka") {
    unset req.http.Cookie;
    return (pipe);
  }
  if (req.url ~ "\.(png|gif|jpg|js|css)$") {
    unset req.http.Cookie;
    return (lookup);
  }
  if (req.url ~ "^/search") {
    unset req.http.Cookie;
    return (pass);
  }
  if (req.request == "GET" || req.request == "HEAD") {
    return (lookup);
  }
}

sub vcl_pipe {
  # http://www.varnish-cache.org/ticket/451
  # This forces every pipe request to be the first one.
  set bereq.http.connection = "close";
}

In /etc/default/varnish (Ubuntu/Debian) or /etc/sysconfig/varnish (Centos/Fedora) you will have to change your DAEMON_OPTS. Ours look like this:

DAEMON_OPTS="-a :80 \ -T localhost:6082 \ -f /etc/varnish/default.vcl \ -S /etc/varnish/secret \ -s malloc,5g"

You can see from the two config files that we have Varnish listening on port 80 and looking for the backend on port 8090.

Our Apache server is configured to listen on port 8090; other than that, Apache is using a standard Islandora-type setup.
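For reference, moving Apache to the backend port only takes a couple of lines. This is a sketch with an assumed docroot, not our exact configuration:

# In /etc/apache2/ports.conf (Ubuntu) or your main httpd.conf:
Listen 8090

<VirtualHost *:8090>
    ServerName islandnewspapers.ca
    DocumentRoot /var/www/drupal   # assumed docroot
</VirtualHost>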

The timeouts in our VCL are pretty high and could probably be set a lot lower. With an earlier version of Varnish we were having some inconsistencies in loading times when using the OpenSeadragon viewer; the higher timeouts are left over from testing with that older version, and we will adjust them.

We have Varnish configured to use RAM (malloc) for its cache, but this could be set to a file.
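Switching the cache from RAM to disk is a one-line change to the -s option in DAEMON_OPTS; for example (the path here is an assumption, adjust to taste):

-s file,/var/lib/varnish/varnish_storage.bin,5g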

One thing we decided to do is pipe requests to Djatoka. Since Djatoka is already caching images we decided not to cache them twice.

We have also made some optimizations to Djatoka’s configuration; basically, we increased the number of tiles and images Djatoka will keep in its cache.

Note: We are not using the Varnish Drupal module.

There are many great resources for Varnish on the web. Pantheon has a great page regarding Varnish and Drupal.
