Feed aggregator

Ed Summers: JavaScript and Archives

planet code4lib - Thu, 2015-03-12 09:33

Tantek Çelik has some strong words about the use of JavaScript in Web publishing, specifically regarding its accessibility and longevity:

… in 10 years nothing you built today that depends on JS for the content will be available, visible, or archived anywhere on the web

It is a dire warning. It sounds and feels true. I am in the middle of writing a webapp that happens to use React, so Tantek’s words are particularly sobering.

And yet, consider for a moment how Twitter makes personal downloadable archives available. When you request your archive you eventually get a zip file. When you unzip it and open the index.html file in your browser, you are presented with a view of all the tweets you’ve ever sent.

If you take a look under the covers you’ll see it is actually a JavaScript application called Grailbird. If you have JavaScript turned on it looks something like this:

If you have JavaScript turned off it looks something like this:

But remember this is a static site. There is no server side piece. Everything is happening in your browser. You can disconnect from the Internet and, as long as your browser has JavaScript turned on, it is fully functional. (Well, the avatar URLs break, but that could be fixed.) You can search across your tweets. You can drill into particular time periods. You can view your account summary. It feels pretty durable. I could stash it away on a hard drive somewhere, come back in 10 years, and (assuming there are still web browsers with a working JavaScript runtime) still look at it, right?

So is Tantek right about JavaScript being at odds with preservation of Web content? I think he is, but I also think JavaScript can be used in the service of archiving, and that there are starting to be some options out there that make archiving JavaScript heavy websites possible.

The real problem that Tantek is talking about is when human readable content isn’t available in the HTML and is getting loaded dynamically from Web APIs using JavaScript. This started to get popular back in 2005 when Jesse James Garrett coined the term AJAX for building app-like web pages using asynchronous requests for XML, which is now mostly JSON. The scene has since exploded with all sorts of client side JavaScript frameworks for building web applications as opposed to web pages.

So if someone (e.g. the Internet Archive) comes along and tries to archive a URL, it will get the HTML and the associated images, stylesheets and JavaScript files that are referenced in that HTML. These will get saved just fine. But when the content is played back later (e.g. in the Wayback Machine), the JavaScript will run and try to talk to those external Web APIs to load content. If those APIs no longer exist, the content won’t load.

One solution to this problem is for the web archiving process to execute the JavaScript and to archive any of the dynamic content that was retrieved. This can be done using headless browsers like PhantomJS, and supposedly Google has started executing JavaScript as it crawls. Like Tantek, I’m dubious about how widely they execute JavaScript; I’ve had trouble getting Google to index a JavaScript-heavy site that I’ve inherited at work. But even if the crawler does execute the JavaScript, user interactions can cause different content to load. So does the bot start clicking around in the application to get content to load? This is yet more work for an archiving bot to do, and could potentially result in write operations, which might not be great.
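To make the problem concrete, here is a minimal sketch (in Python, assuming the requests and selenium packages and a headless Chrome/Chromium are installed; the URL is a hypothetical JavaScript-heavy page) of the difference between what a simple crawler fetches and what a headless browser sees after the scripts have run. It is only an illustration of the technique, not how any particular crawler or the Internet Archive actually does it:

import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = "https://example.org/js-heavy-page"  # hypothetical JavaScript-heavy page

# What a simple crawler gets: the HTML exactly as served, before any JavaScript runs.
# For an API-driven site this is often just an empty shell plus <script> tags.
raw_html = requests.get(url).text

# What a headless browser sees: the DOM after the scripts have fetched and rendered content.
options = Options()
options.add_argument("--headless")
browser = webdriver.Chrome(options=options)
browser.get(url)
rendered_html = browser.page_source
browser.quit()

# An archive that wants the human-readable content needs to capture rendered_html
# (or the API responses behind it), not just raw_html, so that playback does not
# depend on the original Web APIs still being online.
print(len(raw_html), len(rendered_html))

Even this leaves the interaction problem unsolved: content that only loads after a click or a scroll still isn’t captured.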

Another option is to change or at least augment the current web archiving paradigm by adding curator driven web archiving to the mix. The best examples I’ve seen of this are Ilya Kreymer’s work on pywb and pywb-recorder. Ilya is a former Internet Archive engineer, and is well aware of the limitations in the most common forms of web archiving today. pywb is a new player for web archives and pywb-recorder is a new recording environment. Both work in concert to let archivists interactively select web content that needs to be archived, and then for that content to be played back. The best example of this is his demo service webrecorder.io which composes pywb and pywb-recorder so that anyone can create a web archive of a highly dynamic website, download the WARC archive file, and then reupload it for playback.

The nice thing about Ilya’s work is that it is geared at archiving this JavaScript-heavy content. Rhizome and the New Museum in New York City have started working with Ilya to use pywb to archive highly dynamic Web content. I think this represents a possible bright future for archives, where curators or archivists are more a part of the equation, and where Web archives are more distributed, not just at the Internet Archive and some major national libraries. I think the work Genius is doing to annotate the Web, including archived versions of the Web, is in a similar space. It’s exciting times for Web archiving. You know, exciting if you happen to be an archivist and/or archiving things.

At any rate, getting back to Tantek’s point about JavaScript: if you are in the business of building archives on the Web, definitely think twice about using client side JavaScript frameworks. If you do, make sure your site degrades gracefully so that the majority of the content is still available. You want to make it easy for the Internet Archive to archive your content (lots of copies keeps stuff safe) and you want to make it easy for Google et al to index it, so people looking for your content can actually find it. Stanford University’s Web Archiving team has a super set of pages describing the archivability of websites. We can’t control how other people publish on the Web, but I think as archivists we have a responsibility to think about these issues as we create archives on the Web.

Nicole Engard: Bookmarks for March 11, 2015

planet code4lib - Wed, 2015-03-11 20:30

Today I found the following resources and bookmarked them:

  • Avalon: The Avalon Media System is an open source system for managing and providing access to large collections of digital audio and video. The freely available system enables libraries and archives to easily curate, distribute and provide online access to their collections for purposes of teaching, learning and research.

Digest powered by RSS Digest

The post Bookmarks for March 11, 2015 appeared first on What I Learned Today....

Related posts:

  1. Harvard Business School approves open-access policy
  2. Why can’t it all be this easy?
  3. Handheld Librarian Online Conference

LITA: Yes, You Can Video!

planet code4lib - Wed, 2015-03-11 19:24

A how-to guide for creating high-impact instructional videos without tearing your hair out.

Tuesday May 12, 2015
1:00 pm – 2:30 pm Central Time
Register now for this webinar

This brand new LITA Webinar promises a fun time learning how to create instructional videos.

Have you ever wanted to create an engaging and educational instructional video, but felt like you didn’t have the time, ability, or technology? Are you perplexed by all the moving parts that go into creating an effective tutorial? In this session, Anne Burke and Andreas Orphanides will help to demystify the process, breaking it down into easy-to-follow steps, and provide a variety of technical approaches suited to a range of skill sets. They will cover choosing and scoping your topic, scripting and storyboarding, producing the video, and getting it online. They will also address common pitfalls at each stage.

Join

Anne Burke
Undergraduate Instruction & Outreach Librarian
North Carolina State University Libraries

and

Andreas Orphanides
Librarian for Digital Technologies and Learning
North Carolina State University Libraries

Then register for the webinar

Full details
Can’t make the date but still want to join in? Registered participants will have access to the recorded webinar.
Cost:

LITA Member: $45
Non-Member: $105
Group: $196
Registration Information

Register Online page arranged by session date (login required)
OR
Mail or fax form to ALA Registration
OR
Call 1-800-545-2433 and press 5
OR
email registration@ala.org

Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4269 or Mark Beatty, mbeatty@ala.org.

Open Knowledge Foundation: Open Knowledge Russia: Experimenting with data expeditions

planet code4lib - Wed, 2015-03-11 00:43

As part of Open Education Week #openeducationwk activities we are publishing a post on how Open Knowledge Russia have been experimenting with data expeditions. This is a follow-up post to one that appeared on the Open Education Working Group website, which gave an overview of Open Education projects in Russia.

The authors of this post are Anna Sakoyan and Irina Radchenko, who together have founded DataDrivenJournalism.RU.

Anna is currently working as a journalist and translator for the Russian analytical resource Polit.ru and is also involved in the activities of the NGO InfoCulture. You can reach Anna on Twitter at @ansakoy, on Facebook and on LinkedIn. She blogs in English at http://ourchiefweapons.wordpress.com/.

Irina Radchenko is an Associate Professor at ITMO University and Chief Coordinator of Open Knowledge Russia. You can reach Irina on Twitter at @iradche, on Facebook and on LinkedIn. She blogs in Russian at http://iradche.ru//.

1. DataDrivenJournalism.RU project and Russian Data Expeditions

The open educational project DataDrivenJournalism.RU was launched in April 2013 by a group of enthusiasts. Initially it was predominantly a blog, which accumulated translated and originally written manuals on working with data, as well as more general articles about data driven journalism. Its mission was formulated as promoting the use of data (Open Data first of all) in the Russian-language environment, and its main objective was to create an online platform to consolidate Russian-speaking people interested in working with data, so that they could exchange their experiences and learn from each other. As the number of published materials grew, they had to be structured in a searchable way, which made the project look more like a website, with special sections for learning materials, interactive educational projects (data expeditions), helpful links, etc.

On the one hand, it operates as an educational resource with a growing collection of tutorials, a glossary and lists of helpful external links, as well as the central platform for its data expeditions; on the other hand, as a blog, it provides a broader context of open data application to various areas of activity, including data driven journalism itself. After almost two years of its existence, DataDrivenJournalism.RU has a team of 10 regular authors (comprised of enthusiasts from Germany, Kazakhstan, Russia, Sweden and the UK). More than a hundred posts have been published, including 15 tutorials. It has also launched 4 data expeditions, the most recent in December 2014.

The term data expedition was coined by Open Knowledge’s School of Data, which launched such peer-learning projects in both online and offline formats. We took this model as the basic principle and tried to apply it to the Russian environment. It turned out to be rather promising, so we began experimenting with it in order to make this format a more efficient education tool. In particular, we have tried a very loose organisational approach, where the participants only had a general subject in common but were free to choose their own strategy in working with it; a rather rigid approach with a scenario and tasks; and a model which included experts who could guide the participants through the area they had to explore. These have been discussed in our guest post on Brian Kelly’s blog ‘UK Web Focus’.

Our fourth data expedition was part of a hybrid learning model. Namely, it was the practical part of a two-week offline course taught by Irina Radchenko in Kazakhstan. This experience proved rather inspiring and instructive.

2. International Data Expedition in Kazakhstan

The fourth Russian-language data expedition (DE4) was part of a two-week course taught by Irina Radchenko under the auspices of Karaganda State Technical University. After the course was over, the university participants who successfully completed all the tasks within DE4 received a certificate. The most interesting projects were later published at DataDrivenJournalism.RU. One of them, by Asylbek Mubarak, is about industry in Kazakhstan; he also writes (in Russian) about his experience of participating in DE4 and about the key stages of his work with the data. The other, by Roman Ni, is about some aspects of the Kazakhstan budget.

First off, it was a unique experience of launching a data expedition outside Russia. It was also interesting that DE4 was part of a hybrid learning format, which combined traditional offline lectures and seminars with a peer-learning approach. The distinctive feature of the peer-learning part was that it was open, so that any online user could participate. The problem was that the decision to make it open was taken rather late, so there was not much time to properly promote the announcement. However, several people from Russia and Ukraine registered for participation. Unfortunately none of them participated actively, but hopefully they managed to make some use of the course materials and tasks published in the DE4 Google group.

This mixed format was rather time-consuming, because it required not only preparation for regular lectures, but also a lot of online activity, including interaction with the participants, answering their questions in the Google group and checking their online projects. The participants of the offline course seemed enthusiastic about the online part; many found it interesting and intriguing. In the final survey following DE4, most of the respondents emphasised that they liked the online part.

The initial level of the participants was very uneven. Some of them knew how to program and work with databases, while others had hardly ever been exposed to working with data. The main DE4 tasks were built in such a way that they could be done from scratch, based only on the knowledge provided within the course. Meanwhile, there were also more advanced tasks and techniques for those who might find them interesting. Unfortunately, many participants could not complete all the tasks, because they were students and were right in the middle of taking their midterm exams at university.

Compared to our previous DEs, the percentage of completed tasks was much higher. The DE4 participants were clearly better motivated in terms of demonstrating their performance. Most importantly, some of them were interested in receiving a certificate. Another considerable motivation was participation in offline activities, including face-to-face discussions, as well as interaction during Irina’s lectures and seminars.

Technically, like all the previous expeditions, DE4 was centered around a closed Google group, which was used by the organisers to publish materials and tasks, and by participants to discuss tasks, ask questions, exchange helpful links and coordinate their working process (as most of them worked in small teams). The chief tools within DE4 were Google Docs, Google Spreadsheets, Google Refine and Infogr.am. Participants were also encouraged to suggest or use other tools if they found them appropriate.

42 people registered for participation, 36 of whom took the offline course at Karaganda State Technical University. They were the most active, so most of our observations are based on their results and feedback. Also, due to the university base of the course, 50% of the participants were undergraduate students, while the other half included postgraduate students, people with a higher education degree and PhDs. Two thirds of the participants were women. As to age groups, almost half of the participants were between 16 and 21 years old, but there was also a considerable number between 22 and 30 years old, and two were over 50.

13 per cent of the participants completed all the tasks, including the final report. According to their responses to the final survey, most of them did their practical tasks in small pieces, but regularly. As to online interaction, the majority of respondents said they were quite satisfied with their communication experience. About half of them, though, admitted that they did not contribute to online discussions, although they found others’ contributions helpful. General feedback was very positive. Many pointed out that they were inspired by the friendly atmosphere and mutual helpfulness. Most said they were going to keep learning how to work with open data on their own. Almost all claimed they would like to participate in other data expeditions.

3. Conclusions

DE4 was an interesting step in the development of the format. In particular, it showed that an open peer-learning format can be an important integral part of a traditional course. It had a ready-made scenario and an instructor, but at the same time it relied heavily on the participants’ mutual help and exchange of experience, and it also provided a great degree of freedom and flexibility regarding the choice of subjects and tools. It is also yet another contribution to the collection of materials which might be helpful in future expeditions, alongside the materials from all the previous DEs. It is part of a process of gradual formation of an educational resource base, as well as a supportive social base. As new methods are applied and tested in DEs, the practices that prove best are retained and reused, which helps make this format more flexible and helpful. What is most important is that this model can be applied to almost any educational initiative, because it is easily replicated and based on free online services.

DuraSpace News: OR2015 NEWS: Registration Opens; Speakers from Mozilla and Google Announced

planet code4lib - Wed, 2015-03-11 00:00

From Jon Dunn, Julie Speer, and Sarah Shreeves, OR2015 Conference Organizing Committee; Holly Mercer, William Nixon, and Imma Subirats, OR2015 Program Co-Chairs

Indianapolis, IN  We are pleased to announce that registration is now open for the 10th International Conference on Open Repositories, to be held on June 8-11, 2015 in Indianapolis, Indiana, United States of America. Full registration details and a link to the registration form may be found at: http://www.or2015.net/registration

Ed Summers: Facts are Mobile

planet code4lib - Tue, 2015-03-10 19:42

To classify is, indeed, as useful as it is natural. The indefinite multitude of particular and changing events is met by the mind with acts of defining, inventorying and listing, reducing the common heads and tying up in bunches. But these acts like other intelligent acts are performed for a purpose, and the accomplishment of purpose is their only justification. Speaking generally, the purpose is to facilitate our dealing with unique individuals and changing events. When we assume that our clefts and bunches represent fixed separations and collections in rerum natura, we obstruct rather than aid our transactions with things. We are guilty of a presumption which nature promptly punishes. We are rendered incompetent to deal effectively with the delicacies and novelties of nature and life. Our thought is hard where facts are mobile; bunched and chunky where events are fluid, dissolving.

John Dewey in Human Nature and Conduct (p. 131)

FOSS4Lib Recent Releases: Koha - Maintenance and security releases v 3.16.8 and 3.18.4

planet code4lib - Tue, 2015-03-10 19:22
Package: Koha
Release Date: Tuesday, March 3, 2015

Last updated March 10, 2015. Created by David Nind on March 10, 2015.

Monthly maintenance and security releases for Koha. See the release announcements for the details:

Koha 3.18 is the latest stable release of Koha and is recommended for new installations.

OCLC Dev Network: Developer House Project: Advanced Typeahead

planet code4lib - Tue, 2015-03-10 16:00

We are Jason Thomale from the University of North Texas and George Campbell from OCLC, and we created an advanced “Advanced Typeahead” application during the December 1-5, 2014 Developer House event at OCLC headquarters in Columbus, Ohio. The Developer House events provide OCLC Platform Engineers and library developers an opportunity to brainstorm and develop applications against OCLC Web Services. We would like to share our development experience and the application we designed in this blog post.

HangingTogether: OCLC Research Library Partnership, making a difference: part 2

planet code4lib - Tue, 2015-03-10 15:37

I previously shared the story of Keio University, which benefited from attending our 2013 partner meeting — I wanted to share two more “member stories” which have roots in the OCLC Research Library Partnership.

OCLC member stories are being highlighted on the OCLC web page — there are many other interesting and dare I say inspiring stories shared there, so go check them out.


Islandora: Site built on Islandora wins the ABC-CLIO Online History Award

planet code4lib - Tue, 2015-03-10 15:24

Congratulations are in order for Drexel University and its Doctor or Doctress digital collection, which has been selected as this year's winner of the ABC-CLIO Online History Award.

This amazing site, which is both a historical collection and an online learning tool, more than fulfills its mission of helping people to "explore American history through the stories of women physicians." It is also hands-down one of the most stunningly executed Islandora sites out there in production right now, and we could not be more thrilled to see it recognized as the accomplishment it truly is. For more information about the award, please visit the ALA's announcement.

DPLA: Digitization partnerships with Minnesota Public Libraries

planet code4lib - Tue, 2015-03-10 15:17

Carla Urban leads a discussion with PLPP participants at the Detroit Lakes training in September 2014.

The Minnesota Digital Library (MDL) is one of four DPLA Service Hubs to be sub-awarded a grant from the Bill and Melinda Gates Foundation, through the DPLA, for the Public Library Partnership Project (PLPP).  The purpose of PLPP is to develop a curriculum for teaching basic digitization concepts and skills and pilot it through workshops for public library staff, encourage and facilitate their participation in their local digital libraries and DPLA, and create collaborative online exhibitions based on materials digitized through this project. At the end of PLPP, we will also be sharing a self-guided version of the curriculum we built.

MDL was very pleased with the success of our implementation of the first stage of PLPP—we offered four digital skills training sessions to thirty-one individuals from twenty-two different public libraries and collaborating historical societies around Minnesota.  The training was so well received that we hope to incorporate similar basic group training sessions into our ongoing recruitment and preparation of potential participants.

We are now deep into the second phase of the PLPP in which the organizations propose projects, select appropriate materials from their collections and send them to us for digitization and metadata preparation. An early success was the contribution of a 1930 plat book of Polk County by the Fosston Public Library, the first organization to contribute to MDL from this county.

“Outline Map of Polk County, Minnesota.” Standard Map of Polk County, Minnesota, page 8-9. Courtesy of Fosston Public Library via Minnesota Digital Library.

One of the challenges we face is that, because of a very strong network of local historical societies throughout Minnesota, our public libraries don’t often have significant collections of archival or historic materials (Hennepin County Library being one important exception). However, we have been able to leverage our PLPP resources to encourage and support collaboration between public libraries and other organizations in their communities. In some cases, public libraries made new connections with city or county offices when collaborators realized they had materials that were worth preserving and making accessible, but didn’t know how to go about it and were not aware of MDL.  Public library participants in PLPP were able to identify these materials, make the case for online access, facilitate an avenue for digitization, and share description and rights assessment work. Because of the connections made via our PLPP library participants we’ll be digitizing the portraits of Duluth mayors, the master plans for county parks from the Washington County Park Board, and historically significant and previously inaccessible materials from the Minneapolis Parks and Recreation Board, among other projects.

The Gates-funded project will wrap up at the end of September 2015.  Between now and then we will be completing additional projects and developing two online exhibitions built in part on materials digitized through this grant.

PLPP has strengthened our relationship with public libraries around the state, improved the digitization knowledge of public library staff, increased our capacity, and brought in materials to which we would otherwise not have had access.  MDL has been more than pleased by the outcomes of our participation in the PLPP!

Carla Urban will be co-leading a digitization training session, with Sheila McAllister of the Digital Library of Georgia, at DPLAfest 2015. To learn more about PLPP and lessons learned, come participate in the discussion!

Header image: Detroit Public Library, Detroit, Minnesota, 1913. Courtesy of Becker County Historical Society via Minnesota Digital Library.

 

District Dispatch: Interested in Natl. Library Legislative Day? Here’s what you need to know

planet code4lib - Tue, 2015-03-10 14:54

North Suburban Library System trustee and staff view the Capitol.

If you appreciate the critical roles that libraries play in creating an informed and engaged citizenry, register now for this year’s National Library Legislative Day (NLLD), a two-day advocacy event where hundreds of library supporters, leaders and patrons will meet with their legislators to advocate for library funding.

National Library Legislative Day, which is hosted by the American Library Association (ALA), will be held May 4-5, 2015, in Washington, D.C. Now in its 41st year, National Library Legislative Day focuses on the most pressing issues, including the need to fund the Library Services and Technology Act, to support legislation that gives people who use libraries access to federally-funded scholarly journal articles, and to continue funding that provides school libraries with vital materials.

National Library Legislative Day Coordinators from each U.S. state arrange all advocacy meetings with legislators, communicate with the ALA Washington Office and serve as the contact person for state delegations. The ALA Washington Office will host a special training session on Sunday (May 3rd) afternoon for first-timers. On the first day of the event, participants will receive training and issue briefings to prepare them for meetings with their members of Congress.

Advocate from Home

Advocates who cannot travel to Washington for National Library Legislative Day can still make a difference and speak up for libraries. As an alternative, the American Library Association sponsors Virtual Library Legislative Day, which takes place on May 5, 2015. To participate in Virtual Library Legislative Day, register now for American Library Association policy action alerts.

For the next month, the ALA Washington Office will share National Library Legislative Day resources on the District Dispatch. Keep up with the conversation by using the hashtag #nlld15.

The post Interested in Natl. Library Legislative Day? Here’s what you need to know appeared first on District Dispatch.

Access Conference: Call for Peer-Reviewers

planet code4lib - Tue, 2015-03-10 13:41

Wanna be a peer reviewer for AccessYYZ? Excellent, because we need some of those.

If you’re interested, please shoot an email to accesslibcon@gmail.com by March 27th, 2015 with the following information:

Name
Current Position (including whether you are a student)
Institution
Have you been to Access before?
Have you presented at Access before?
Have you done peer review for Access before?

Come to think of it, a CV/resume would be nice. Yes, make sure you include that too.

Islandora: From the Listserv: A Dev Ops Group and a Challenge

planet code4lib - Tue, 2015-03-10 12:31

Every once in a while something really interesting comes up on the listserv, and I try to bring the highlights here to the blog so that it will get exposure with a wider audience. Right now, that interesting thing is the nascent Dev Ops Interest Group.

Interest Groups have been a thing in Islandora for just about a year now. They are a way for members of the Islandora community with similar interests, challenges, and projects to come together to share resources and discuss the direction the project should take in the future. We have one for Preservation, Archives, Documentation, GIS, and Fedora 4 (which has become the guiding group for the Islandora/Fedora 4 upgration project). Following up on some conversations from iCampBC, Mark Jordan has proposed that there might be a need for a Dev Ops Interest Group as well, for the folks who spend their time actually deploying Islandora to come together and talk strategies. 

As you can see from the thread, the interest is certainly there, and I expect to be announcing the provision of a new Interest Group in days to come. But what brings this subject out from the list is the challenge issued by our Release Manager and Upgration Guru, Nick Ruest:

I'm sitting here waiting for the 7.x-1.5RC1 VM to upload to the release server, and I'm thinking... 

I propose or challenge the following: 

All of those who have expressed interest in the group, would you be willing to collaborate on creating a canonical development and release VM Vagrant file? I think this is probably the most pressing need to grow our developer community. 

I can create a shared repo in the Islandora-Labs organization, and add anyone willing to contribute to it. 

I can get us started. I'll cannibalize what we have for the Islandora & Fedora integration project. 

We could cannibalize Islandora Chef. 

We could cannibalize anything y'all are willing to bring to the table. 

Things to think about and sort out - CLAs, LSAP, 7.x-1.6 release. Those who are willing to contribute, should be aware that if this is given to the foundation via LSAP, we'll all have to be covered by a CLA, and do you think we could get this finished by the 7.x-1.6 release? I think we could. 

Other benefits, by sticking with bash scripts, and Vagrant, we can take advantage of other DevOps platforms. I'm thinking specifically of Packer.io and virtually looking toward Kevin Clarke. Wouldn't it be great if we finally had that Docker container whose tires have been kicked a couple of time? 

Any, let me know what you think. If you think I'm crazy, that's ok too :-) 

A crowd-sourced development environment for Islandora. And in fact, the first draft is already out there, just waiting for you to try it out and contribute. And prove to Nick that he's not crazy. 

Open Library Data Additions: Amazon Crawl: part 2-ba

planet code4lib - Tue, 2015-03-10 04:11

Part 2-ba of Amazon crawl..

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

Galen Charlton: Notes on making my WordPress blog HTTPS-only

planet code4lib - Tue, 2015-03-10 02:25

The other day I made this blog, galencharlton.com/blog/, HTTPS-only.  In other words, if Eve wants to sniff what Bob is reading on my blog, she’ll need to do more than just capture packets between my blog and Bob’s computer to do so.

This is not bulletproof: perhaps Eve is in possession of truly spectacular computing capabilities or a breakthrough in cryptography and can break the ciphers. Perhaps she works for one of the sites that host external images, fonts, or analytics for my blog and has access to their server logs containing referrer header information.  Currently these sites are Flickr (images), Gravatar (more images), Google (fonts) or WordPress (site stats – I will be changing this soon, however). Or perhaps she’s installed a keylogger on Bob’s computer, in which case anything I do to protect Bob is moot.

Or perhaps I am Eve and I’ve set up a dastardly plan to entrap people by recording when they read about MARC records, then showing up at Linked Data conferences and disclosing that activity.  Or vice versa. (Note: I will not actually do this.)

So, yes – protecting the privacy of one’s website visitors is hard; often the best we can do is be better at it than we were yesterday.

To that end, here are some notes on how I made my blog require HTTPS.

Certificates

I got my SSL certificate from Gandi.net. Why them?  Their price was OK, I already register my domains through them, and I like their corporate philosophy: they support a number of free and open source software projects; they’re not annoying about up-selling; and they have never (to my knowledge) run sexist advertising, unlike some of their larger and more well-known competitors. But there are, of course, plenty of options for getting SSL certificates, and once Let’s Encrypt is in production, it should be both cheaper and easier for me to replace the certs next year.

I have three subdomains of galencharlton.com that I wanted a certificate for, so I decided to get a multi-domain certificate.  I consulted this tutorial by rtCamp to generate the CSR.

After following the tutorial to create a modified version of openssl.conf specifying the subjectAltName values I needed, I generated a new private key and a certificate-signing request as follows:

openssl req -new -key galencharlton.com.key \
    -out galencharlton.com.csr \
    -config galencharlton.com.cnf \
    -sha256

The openssl command asked me a few questions, the most important of which was the value for the common name (CN) field; I used “galencharlton.com” for that, as that’s the primary domain that the certificate protects.

I then entered the text of the CSR into a form and paid the cost of the certificate.  Since I am a library techie, not a bank, I purchased a domain-validated certificate.  That means that all I had to do was prove to the certificate’s issuer that I had control of the three domains that the cert should cover.  That validation could have been done via email to an address at galencharlton.com or by inserting a special TXT field into the DNS zone file for galencharlton.com. I ended up choosing to go the route of placing a file on the web server whose contents and location were specified by the issuer; once they (or rather, their software) downloaded the test files, they had some assurance that I had control of the domain.

In due course, I got the certificate.  I put it and the intermediate cert specified by Gandi in the /etc/ssl/certs directory on my server and the private key in /etc/private/.

Operating System and Apache configuration

Various vulnerabilities in the OpenSSL library or in HTTPS itself have been identified and mitigated over the years: suffice it to say that it is a BEASTly CRIME to make a POODLE suffer a HeartBleed — or something like that.

To avoid the known problems, I wanted to ensure that I had a recent enough version of OpenSSL on the web server and had configured Apache to disable insecure protocols (e.g., SSLv3) and eschew bad ciphers.

The server in question is running Debian Squeeze LTS, but since OpenSSL 1.0.x is not currently packaged for that release, I ended up adding Wheezy to the APT repositories list and upgrading the openssl and apache2 packages.

For the latter, after some Googling I ended up adapting the recommended Apache SSL virtualhost configuration from this blog post by Tim Janik.  Here’s what I ended up with:

<VirtualHost _default_:443>
    ServerAdmin gmc@galencharlton.com
    DocumentRoot /var/www/galencharlton.com
    ServerName galencharlton.com
    ServerAlias www.galencharlton.com

    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/galencharlton.com.crt
    SSLCertificateChainFile /etc/ssl/certs/GandiStandardSSLCA2.pem
    SSLCertificateKeyFile /etc/ssl/private/galencharlton.com.key

    Header add Strict-Transport-Security "max-age=15552000"

    # No POODLE
    SSLProtocol all -SSLv2 -SSLv3 +TLSv1.1 +TLSv1.2
    SSLHonorCipherOrder on
    SSLCipherSuite "EECDH+ECDSA+AESGCM EECDH+aRSA+AESGCM EECDH+ECDSA+SHA384 EECDH+ECDSA+SHA256 EECDH+aRSA+SHA384 EECDH+aRSA+SHA256 EECDH+AESGCM EECDH EDH+AESGCM EDH+aRSA HIGH !MEDIUM !LOW !aNULL !eNULL !LOW !RC4 !MD5 !EXP !PSK !SRP !DSS"
</VirtualHost>

I also wanted to make sure that folks coming in via old HTTP links would get permanently redirected to the HTTPS site:

<VirtualHost *:80>
    ServerName galencharlton.com
    Redirect 301 / https://galencharlton.com/
</VirtualHost>

<VirtualHost *:80>
    ServerName www.galencharlton.com
    Redirect 301 / https://www.galencharlton.com/
</VirtualHost>

Checking my work

I’m a big fan of the Qualsys SSL Labs server test tool, which does a number of things to test how well a given website implements HTTPS:

  • Identifying issues with the certificate chain
  • Whether it supports vulnerable protocol versions such as SSLv3
  • Whether it supports – and requests – use of sufficiently strong ciphers.
  • Whether it is vulnerable to common attacks.

Suffice it to say that I required a couple iterations to get the Apache configuration just right.
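Between SSL Labs runs, a quick local spot check can also be done with Python’s standard ssl module (a sketch, not a replacement for the full test; swap in whatever hostname you want to check). It verifies the certificate chain and prints the protocol and cipher the server actually negotiates:

import socket
import ssl

host = "galencharlton.com"  # any HTTPS host you want to spot-check

context = ssl.create_default_context()  # validates the certificate chain by default
with context.wrap_socket(socket.create_connection((host, 443)),
                         server_hostname=host) as conn:
    print("protocol:", conn.version())   # e.g. TLSv1.2 -- should never be SSLv3
    print("cipher:  ", conn.cipher())    # the negotiated cipher suite
    cert = conn.getpeercert()
    print("subject: ", cert.get("subject"))
    print("expires: ", cert.get("notAfter"))

If the connection fails outright, that usually points at the sort of certificate chain problem SSL Labs flags in its first check.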

WordPress

To be fully protected, all of the content embedded on a web page served via HTTPS must also be served via HTTPS.  In other words, this means that image URLs should require HTTPS – and the redirects in the Apache config are not enough.  Here is the sledgehammer I used to update image links in the blog posts:

create table bkp_posts as select * from wp_posts;

begin;
update wp_posts
   set post_content = replace(post_content, 'http://galen', 'https://galen')
 where post_content like '%http://galen%';
commit;

Whee!

I also needed to tweak a couple plugins to use HTTPS rather than HTTP to embed their icons or fetch JavaScript.

Finishing touches

In the course of testing, I discovered a couple more things to tweak:

  • The web server had been using Apache’s mod_php5filter – I no longer remember why – and that was causing some issues when attempting to load the WordPress dashboard.  Switching to mod_php5 resolved that.
  • My domain ownership proof on keybase.io failed after the switch to HTTPS.  I eventually tracked that down to the fact that keybase.io doesn’t have a bunch of intermediate certificates in its certificate store that many browsers do. I resolved this by adding a cross-signed intermediate certificate to the file referenced by SSLCertificateChainFile in the Apache config above.

My blog now has an A+ score from SSL Labs. Yay!  Of course, it’s important to remember that this is not a static state of affairs – another big OpenSSL or HTTPS protocol vulnerability could turn that grade to an F.  In other words, it’s a good idea to test one’s website periodically.

FOSS4Lib Upcoming Events: NE regional Hydra meeting

planet code4lib - Mon, 2015-03-09 21:35
Date: Thursday, May 7, 2015 - 09:00 to 16:00
Supports: Hydra

Last updated March 9, 2015. Created by Peter Murray on March 9, 2015.

From the announcement:

A NE Hydra Meeting is being planned for Thursday May 7, 2015 at Brown University and we’d like your input.

LITA: 2015 Kilgour Award Goes to Ed Summers

planet code4lib - Mon, 2015-03-09 21:27

The Library & Information Technology Association (LITA), a division of the American Library Association (ALA), announces Ed Summers as the 2015 winner of the Frederick G. Kilgour Award for Research in Library and Information Technology. The award, which is jointly sponsored by OCLC, is given for research relevant to the development of information technologies, especially work which shows promise of having a positive and substantive impact on any aspect(s) of the publication, storage, retrieval and dissemination of information, or the processes by which information and data is manipulated and managed. The awardee receives $2,000, a citation, and travel expenses to attend the award ceremony at the ALA Annual Conference in San Francisco, where the award will be presented on June 28, 2015.

Ed Summers is Lead Developer at the Maryland Institute for Technology in the Humanities (MITH), University of Maryland. Ed has been working for two decades helping to build connections between libraries and archives and the larger communities of the World Wide Web. During that time Ed has worked in academia, start-ups, corporations and the government. He is interested in the role of open source software, community development, and open access to enable digital curation. Ed has a MS in Library and Information Science and a BA in English and American Literature from Rutgers University.

Prior to joining MITH Ed helped build the Repository Development Center (RDC) at the Library of Congress. In that role he led the design and implementation of the NEH funded National Digital Newspaper Program’s Web application, which provides access to 8 million newspapers from across the United States. He also helped create the Twitter archiving application that has archived close to 500 billion tweets (as of September 2014). Ed created LC’s image quality assurance service that has allowed curators to sample and review over 50 million images. He served as a member of the Semantic Web Deployment Group at the W3C where he helped standardize SKOS, which he put to use in implementing the initial version of LC’s Linked Data service.

Before joining the Library of Congress Ed was a software developer at Follett Corporation where he designed and implemented knowledge management applications to support their early e-book efforts. He was the fourth employee at CheetahMail in New York City, where he led the design of their data management applications. And prior to that Ed worked in academic libraries at Old Dominion University, the University of Illinois and Columbia University where he was mostly focused on metadata management applications.

Ed likes to use experiments to learn about the Web and digital curation. Examples of this include his work with Wikipedia on Wikistream, which helps visualize the rate of change on Wikipedia, and CongressEdits, which allows Twitter users to follow edits being made to Wikipedia from the Congress. Some of these experiments are social, such as his role in creating the code4lib community, which is an international, cross-disciplinary group of hackers, designers and thinkers in the digital library space.

Notified of the award, Ed said: “It is a great honor to have been selected to receive the Kilgour Award this year. I was extremely surprised since I have spent most of my professional career (so far) as a developer, building communities of practice around software for libraries and archives, rather than traditional digital library research. During this time I have had the good fortune to work with some incredibly inspiring and talented individuals, teams and open source collaborators. I’ve only been as good as these partnerships have allowed me to be, and I’m looking forward to more. I am especially grateful to all those individuals that worked on a free and open Internet and World Wide Web. I remain convinced that this is a great time for library and archives professionals, as the information space of the Web is in need of our care, attention and perspective.”

Members of the 2014-15 Frederick G. Kilgour Award committee are:

  • Tao Zhang, Purdue University (chair)
  • Erik Mitchell, University of California, Berkeley (past chair)
  • Danielle Cunniff Plumer, DCPlumer Associates, LLC
  • Holly Tomren, Drexel University Libraries
  • Jason Simon, Fitchburg State University
  • Kebede Wordofa, Austin Peay State University, and
  • Roy Tennant, OCLC liaison

About LITA

Established in 1966, LITA is the leading organization reaching out across types of libraries to provide education and services for a broad membership of over 3,000 systems librarians, library technologists, library administrators, library schools, vendors and many others interested in leading edge technology and applications for librarians and information providers. For more information, visit www.lita.org.

About OCLC

Founded in 1967, OCLC is a nonprofit, membership, computer library service and research organization dedicated to the public purposes of furthering access to the world’s information and reducing library costs. OCLC Research is one of the world’s leading centers devoted exclusively to the challenges facing libraries in a rapidly changing information environment. It works with the community to collaboratively identify problems and opportunities, prototype and test solutions, and share findings through publications, presentations and professional interactions. For more information, visit www.oclc.org/research.

Questions and Comments

Mary Taylor
Executive Director
Library & Information Technology Association (LITA)
(800) 545-2433 ext 4267
mtaylor@ala.org

 

Nicole Engard: Bookmarks for March 9, 2015

planet code4lib - Mon, 2015-03-09 20:30

Today I found the following resources and bookmarked them:

  • CardKit: A simple, configurable, web-based image creation tool

Digest powered by RSS Digest

The post Bookmarks for March 9, 2015 appeared first on What I Learned Today....

Related posts:

  1. Can you say Kebberfegg 3 times fast
  2. Planning a party or event?
  3. Decipher that Font
