Feed aggregator

Andromeda Yelton: What I learned teaching jQuery (part 1)

planet code4lib - Mon, 2014-09-15 13:30

On August 11-12, I taught an Introduction to Programming Concepts via jQuery course at the DLF/Code4Lib unconference at the George Washington University. I was playing with several theories in developing this course:

  • Porting to jQuery so that it could be 100% browser-based: familiar environment, no installfest, maximizes time available for actual programming concepts.
  • Porting to jQuery so that it could be 100% visual (on which more below).
  • Simply giving up on the idea of getting true novices to the point of being able to write real-world-applicable code in a day-and-a-half workshop, and focusing instead on building a foundation that makes existing code-learning resources more intelligible, and leaves students with enough good feelings about code that they’ll be inclined to learn more.

Bottom line: I think it worked really well!

Today I’m going to talk about my theoretical inspiration for the course; upcoming posts will cover teaching techniques I used to operationalize that, and then future plans. (Look, there’s a jquery workshop tag so you can find them all!)

yo dawg i heard you like tests…

The whole workshop was, in a sense, a way to play with this paper: “A fresh look at novice programmers’ performance and their teachers’ expectations”. Its jaw-dropping result was that providing novice programming students with a test apparatus for a programming task quadrupled the number of subtasks they could successfully complete (students without the tests completed an average of 0.83 out of 4 tasks, compared to 3.26 for students who could check their work against the tests — in other words, students without tests didn’t even get one subtask working, on average).

Well gosh. If tests are that effective, I’m obligated to provide them. This is consistent with my intuitive observations of the CodingBat section of Boston Python Workshop — being asked to write provably correct code is the point where students discover whether their existing mental models are right, and start to iterate them. But the CodingBat interface is confusing, and you need to sink some instructional time just into making sense of it. And, honestly, with a lot of conventional intro programming tasks, it’s hard to tell if you’ve succeeded; you’ve got a command-line-ish interface (unfamiliar to many of my students) and a conceptual problem with abstract success criteria. I wanted something that would give immediate, obvious feedback.

Hence, jQuery. Manipulating the DOM produces instant visual effects. If you were asked to make a button disappear, it’s super obvious if you succeeded. (Well. Possibly assuming sightedness, and (with some of my tasks) possibly also non-colorblindness — I stayed away from red/green semantic pairs, but I didn’t audit for all the forms of colorblindness. I need to mull this one over.) And as it turns out, when you ask your students to add a class that changes a bunch of things to have a kitten pic background, it’s also super obvious to you as the instructor when they’ve succeeded (wait for it…wait…“awwww!”).
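
To make the idea concrete, here is a minimal sketch of what a task paired with a check might look like. It is not the workshop's actual material: the selectors `#demo-button` and `.panel`, the class name `kittens`, and every function name are hypothetical. Each function takes the jQuery function as its `$` argument so the sketch loads with or without a page. The methods `.hide()`, `.addClass()`, `.is(":visible")`, and `.hasClass()` are standard jQuery.

```javascript
// Sketch only: selectors, class names, and function names are hypothetical,
// not taken from the workshop materials.

// Task 1: make the button disappear.
function hideButton($) {
  $("#demo-button").hide();
}

// Task 2: add a class whose CSS (assumed to exist) sets a kitten background.
function addKittens($) {
  $(".panel").addClass("kittens");
}

// Checks a student can run to see whether a task succeeded.
const checks = {
  hideButton: ($) => !$("#demo-button").is(":visible"),
  addKittens: ($) => $(".panel").hasClass("kittens"),
};

// Run every check and report pass/fail by task name.
function runChecks($) {
  const results = {};
  for (const [name, check] of Object.entries(checks)) {
    results[name] = check($);
  }
  return results;
}
```

In a page that loads jQuery, calling `hideButton(jQuery)` and then `runChecks(jQuery)` would report which tasks pass; the visual change on the page gives the same answer at a glance.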

My hope for this class was that it would provide students who were genuinely novices at coding with the conceptual background they needed to get mileage out of the many intro-programming learning options out there. As Abigail Goben notes, these courses tend to implicitly assume that you already know how to code and just need to be told how to do it in this language, even when they brand themselves as intro courses. People will need much more practice than a day-and-a-half bootcamp to get from novice to proficient enough to write things they can use in everyday work, so I want to get them to a place where that practice will feel manageable. And for the students who do have some experience, hopefully I can introduce them to a language they don’t know yet in a way that has enough meat not to bore them.

Tomorrow, teaching techniques I used to get there, part 1: pacing.

LITA: Technology Skills and Your Resume/CV

planet code4lib - Mon, 2014-09-15 13:00

As I thought about what I wanted to write for my first LITA post, I really wasn’t sure until inspiration struck as I procrastinated by scrolling down my Facebook feed. I had been tagged in a status written by a library student who felt unsure of how she was displaying her tech skills on her CV. She asked for opinions. Was it even relevant to put a tech section on her CV if she wasn’t applying for a digital library job? If she already mentioned tech skills in a cover letter, did they need to be put on a CV, too?

The thread got a lot of different responses, some aligning with my thoughts on the subject and others that befuddled me. Why, for instance, was someone suggesting that you should only list tech skills you got in the classroom and not those you picked up on the job? Why did people seem to think that if you were writing a cover letter you should list your tech skills there and not on a CV?

Today, I thought I would share a few brief thoughts on how I list tech skills on my professional documents and how that connects to how I talk about them in a cover letter. Keep in mind that I am an academic librarian with a job in digital libraries, so the usefulness of my perspective beyond this specific area may be limited. And just to clarify, I recognize that everyone has different opinions on content, formatting, and length of professional documents. Just check out one of the myriad library resources for job hunters. It’s a good thing to have varying perspectives, actually, and I welcome all the opinions out there, whether they agree or disagree with my take on the subject.

What I Do

Why would I write a paragraph about it when I can just show you? This is how the tech section of my resume and CV looks now (very similar to when I applied for jobs in late 2013/early 2014).

  • Coding – HTML5, CSS
  • Digital Collection/Content Management – Drupal, Omeka
  • Digitization Software – Epson Scan, Silverfast
  • Document Design – Adobe Creative Suite 5, Microsoft Office 2010 suite
  • Markup Languages & Standards – EAD, MODS, RDF, TEI, XML
  • Operating Systems – Mac OS X, Windows, UNIX
  • Social Media – Facebook, Twitter, WordPress, podcasting, wikis
  • Repository Software – DSpace, Fedora
  • Other – ArcGIS, Neatline

This section is listed under the header “Technology” and does not include bullet points (used in this post for formatting reasons). Check out my entire CV to see how this section fits in with the rest of my content.

Conveying my tech skills in this way gives a potential employer a quick way to understand the different software I know. It doesn’t provide a lot of usable information, since there’s no indication of my skill level or familiarity with these tools. I consider this section of my CV a catch-all for my tech knowledge, but it’s up to my cover letter to educate the reader about my depth of understanding of specific tools relevant to the job description. I don’t include any tool here unless I could easily answer the question, “So tell me how you have used ___ in the past?”

I have tinkered around with this section more times than I can count over the past few years.  Even now, writing this blog post, I’m looking at it and thinking, “Is that really relevant to me anymore?” I haven’t looked at other people’s CVs in a long time, and though those might be good to reference in this post, let’s be real: it’s a gloomy Friday afternoon as I type this and I just can’t bring myself to do a quick search.

My laziness aside, I’m particularly interested in how different types of info professionals, from archivists to public, academic, and special librarians, convey their tech skills in professional documents. So many jobs in libraries involve working with technology. I would think you’d be hard-pressed to find a new job that doesn’t involve tech in some way. So is there a way to standardize how we convey this type of information, or are our jobs so diverse that there’s really no way to do so?

I’m curious: How do you highlight your technology skills on professional documents like a resume or CV? Tell me in the comments!

District Dispatch: Reminder: Social Security webinar this week

planet code4lib - Mon, 2014-09-15 06:16

Photo by Jessamyn West via flickr

Reminder: The American Library Association (ALA) is encouraging librarians to participate in “My SSA,” a free webinar that will teach participants how to use My Social Security (MySSA), the online Social Security resource.

Do you know how to help your patrons locate information on Supplemental Security Income or Social Security? Presented by leaders and members of the development team of MySSA, this session will provide attendees with an overview of MySSA. In addition to offering benefits information in print, the Social Security Administration is encouraging librarians to create an online MySSA account to view and track benefits.

Attendees will learn about viewing earnings records and receiving instant estimates of their future Social Security benefits. Those already receiving benefits can check benefit and payment information and manage their benefits.

Speakers include:

  • Maria Artista-Cuchna, Acting Associate Commissioner, External Affairs
  • Kia Anderson, Supervisory Social Insurance Specialist
  • Arnoldo Moore, Social Insurance Specialist
  • Alfredo Padilia Jr., Social Insurance Specialist
  • Diandra Taylor, Management Analyst

Date: Wednesday, September 17, 2014
Time: 2:00 PM – 3:00 PM EDT
Register for the free event

If you cannot attend this live session, a recorded archive will be available. To view past webinars also hosted collaboratively with iPAC, please visit

The post Reminder: Social Security webinar this week appeared first on District Dispatch.

Mita Williams: The story of our future : This changes everything

planet code4lib - Mon, 2014-09-15 01:54
In the middle of her column that is ostensibly about the television series Red Band Society, New Yorker critic Emily Nussbaum summarized John Green's YA bestseller The Fault in Our Stars with insight:

Among the many appealing qualities of Green's novel is how much it's about storytelling itself, and the way in which books function as a badge of identity, a marker of taste and values... For all its romantic contours, "The Fault in Our Stars" is centrally a dialectic about why people seek out stories, one that never quite takes a stand on the question of whether we're right to wish for greater clarity in our art, characters we can "relate" to, or, for that matter, a happy ending.
If you had to encapsulate the future of libraries as a story, what story would that be?

Stewart Brand's How Buildings Learn?

In this world, technology creates a fast, globalised world where digital services and virtual presence are commonplace. Overall, the mood is fairly optimistic, but digitalisation and connectivity soon create too much information and format instability, so there is a slight feeling of unease amongst the general population. Physical books are in slight decline in this world although library services are expanding. The reason for this is that public libraries now take on a wide range of e-government services and are important as drop-in centres for information and advice relating to everything from education and childcare to immigration. In this scenario, libraries have also mutated into urban hubs and hangouts; vibrant meeting places for people and information that house cafés, shops, gyms, crèches, theatres, galleries and various cultural activities and events.
William Gibson's Neuromancer?

This is a world gone mad. Everything is accelerating and everything is in short supply and is priced accordingly. Electricity prices are sky-high and the internet is plagued by a series of serious issues due to overwhelming global demand. In this scenario, public libraries are initially written-off as digital dinosaurs, but eventually there is a swing in their favour as people either seek out reliable internet connections or because there is a real need for places that allow people to unplug, slow down and reflect. In this world, information also tends to be created and owned by large corporations and many small and medium sized firms cannot afford access. Therefore, public libraries also become providers of business information and intelligence. This creates a series of new revenue streams but funding is still tight and libraries are continually expected to do more with less and less funding and full-time staff.
Ray Bradbury's Fahrenheit 451?

This world is a screenager’s paradise. It is fast-paced, global and screen-based. Digitalisation has fundamentally changed the way that people consume information and entertainment, but it has also changed the way that people think. This is a post-literate world where physical books are almost dead and public libraries focus on digital collections and virtual services. In this scenario, books take up very little physical space so more space is given over to internet access, digital books and various other forms of digital entertainment. Public libraries blur the boundaries with other retailers of information and entertainment and also house mental health gyms, technology advice desks, download centres and screening rooms. Despite all this, public libraries struggle to survive due to a combination of ongoing funding cuts, low public usage and global competition.
Or Rachel Carson's Silent Spring?

In this scenario, climate change turns out to be much worse than expected. Resource shortages and the high cost of energy in particular mean that the physical movement of products and people is greatly reduced and individuals are therefore drawn back to their local communities. It is a world where globalisation slows down, digital technology is restrained and where all activities are related to community impact. Public libraries do well in this world. People become voracious consumers of physical books (especially old books) and libraries are rediscovered and revered by the majority of the population due to their safety and neutrality. They are also highly valued because they are free public spaces that promote a wide variety of community-related events. Nevertheless, there are still pressures caused by the high cost of energy and the need to maintain facilities. The phrase ‘dark euphoria’ (Bruce Sterling) sums up the mood in this scenario, because on one level the world is falling apart but on another level people are quite content.
These scenarios come from a remarkable document produced five years ago in 2009 for The Library Council of New South Wales called The Bookends Scenarios [pdf].

It's the only document in the library literature that I've seen that seriously addresses our global warming future. It's the only one that I've come across that confronts us and forces us to consider how we may shape our institution and our services now so we can be there for our community when it's in greatest need.

If you had to encapsulate the future as a story, what story would that be?

I suffer from dark euphoria.  I worry about global warming.

That's why I'm going to take part in the People's Climate March in New York City on September 21st, 2014.

I'm going because our leaders are not even talking about taking the necessary action to reduce atmospheric carbon and to mitigate the effects of climate change.  This is a movement that requires all of us to become the leaders that we so desperately need.

There's a book that goes with this march: This Changes Everything.

I'm not normally one for marches. I share the suspicion that gatherings and marches themselves don't change anything.

But events change people. There are events that define movements.

You couldn't have an Occupy Movement without Occupy Wall Street.  And without Occupy Wall Street, we wouldn't have had Occupy Sandy.

Fight to #EndRacism...for #ClimateJustice. #peoplesclimate BOOM
— REEP (@reep_ace) September 14, 2014
I understand the feelings of helplessness and darkness when reading or hearing about another terrifying warning about the threat of global warming. I struggle with these feelings more than I care to admit.

I find solace from these feelings in a variety of sources beyond my family, friends and community. Of these, the study of history, oddly enough, gives me great comfort. It has helped me find stories to help me understand the present.

There are those who call the Climate Change Movement the second Abolition Movement, and I think this description is fitting for several reasons. For one, it gets across that we need to draw upon our shared moral fortitude to make it politically necessary to force those in power to forfeit profit from oil and coal, which, unchecked, will continue to cost us grievous human suffering.

It also describes the sheer enormity of the work that must be done. The analogy makes clear how it will be necessary to change every aspect of society to mitigate climate change at this point.

And yet, it has happened before.  Ordinary people came together to stop slavery.

On that note, and I hope I'm not spoiling it for you, I took great comfort in the last passage of David Mitchell's Cloud Atlas, a book of several pasts and a future.

Upon my return to San Francisco, I shall pledge myself to the abolitionist cause, because I owe my life to a self-freed slave & because I must begin somewhere.
I hear my father-in-law’s response:  “Oho, fine, Whiggish sentiments, Adam.  But don’t tell me about justice!  Ride to Tennessee on an ass and convince the rednecks they are merely white-washed negroes and their negroes are black-washed whites!  Sail to the Old World, tell ‘em their imperial slaves’ rights are as inalienable as the Queen of Belgium’s!  Oh, you’ll grow hoarse, poor and gray in caucuses!  You’ll be spat upon, shot at, lynched, pacified with medals, spurned by backwoodsmen! Crucified!  Naïve, dreaming Adam.  He who would do battle with the many headed hydra of human nature must pay a world of pain and his family must pay it along with him! And only as you gasp your dying breath shall you understand your life amounted to no more than one drop in a limitless ocean!”

Yet what is any ocean but a multitude of drops?

Casey Bisson: Ruins of Roebling’s Works

planet code4lib - Sun, 2014-09-14 16:55

From Flux Machine: a Tumblr of Kevin Weir’s creepy gifs.

The original is from the Library of Congress. If the name “Roebling” sounds familiar, it’s because this is the company, founded by John A. Roebling, that built the Brooklyn Bridge and set up a good business making cables, or wire rope.

The Roebling brothers suspected the fire was German sabotage. Given the activities of the German ambassador at the time, the claim has a whiff of plausibility. Of course, it could also have been aliens.

Patrick Hochstenbach: Trying out caricature styles

planet code4lib - Sun, 2014-09-14 11:08
Filed under: Doodles Tagged: art, caricature, cartoon, copic, depardieu, doodle, marker

Patrick Hochstenbach: Doodling at JCDL2014

planet code4lib - Sat, 2014-09-13 17:22
Visited JCDL2014 in London last week. Some talks ended up in cartoons.
Filed under: Doodles Tagged: adobe, copic, dhcl, digital library, jcdl, london, marker

Dan Scott: My small contribution to schema.org this week

planet code4lib - Sat, 2014-09-13 07:27

Version 1.91 of the schema.org vocabulary was released a few days ago, and I once again had a small part to play in it.

With the addition of the workExample and exampleOfWork properties, we (Richard Wallis, Dan Brickley, and I) realized that examples of these CreativeWork example properties were desperately needed to help clarify their appropriate usage. I had developed one for the blog post that accompanied the launch of those properties, but the question was, where should those examples live in the official docs? CreativeWork has so many children, and the properties are so broadly applicable, that it could have been added to dozens of type pages.

It turns out that an until-now unused feature of the schema.org infrastructure is that examples can live on property pages; even Dan Brickley didn't think this was working. However, a quick test in my sandbox showed that it _was_ in perfect working order, so we could locate the examples on their most relevant documentation pages... Huzzah!

I was then able to put together a nice, juicy example showing relationships between a Tolkien novel (The Fellowship of the Ring), subsequent editions of that novel published by different companies in different locations at different times, and movies based on that novel. From this librarian's perspective, it's pretty cool to be able to do this; it's a realization of a desire to express relationships that, in most library systems, are hard or impossible to accurately specify. (Should be interesting to try and get this expressed in Evergreen and Koha...)
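
As a rough illustration of the kind of relationships described above, the sketch below builds a JSON-LD description in JavaScript. `Book`, `Movie`, `Organization`, `workExample`, `exampleOfWork`, `bookEdition`, `publisher`, and `datePublished` are schema.org terms. The `@id` URLs, the edition details, and the helper name `linkWorkExamples` are placeholders invented here; this is not the example that was published in the official docs.

```javascript
// Sketch only: the @id values and the edition details are placeholders.
const novel = {
  "@context": "http://schema.org",
  "@type": "Book",
  "@id": "http://example.org/work/fellowship",
  name: "The Fellowship of the Ring",
  author: "J.R.R. Tolkien",
};

const examples = [
  {
    "@type": "Book",
    "@id": "http://example.org/edition/1",
    name: "The Fellowship of the Ring",
    bookEdition: "Hypothetical paperback edition",
    publisher: { "@type": "Organization", name: "Example Press" },
    datePublished: "2000",
  },
  {
    "@type": "Movie",
    "@id": "http://example.org/movie/1",
    name: "The Lord of the Rings: The Fellowship of the Ring",
  },
];

// Link the work to each example in both directions: the work lists its
// examples via workExample, and each example points back to the work
// via exampleOfWork. The input objects are left unchanged.
function linkWorkExamples(work, items) {
  return {
    ...work,
    workExample: items.map((item) => ({
      ...item,
      exampleOfWork: { "@id": work["@id"] },
    })),
  };
}

const linked = linkWorkExamples(novel, examples);
console.log(JSON.stringify(linked, null, 2));
```

The same pair of properties can be applied again one level down (for instance, from an edition to a particular printing), which is the "as many times as necessary" usage that comes up in the mailing-list discussion.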

In an ensuing conversation on public-vocabs about the appropriateness of this approach to work relationships, I was pleased to hear Jeff Young say "+1 for using exampleOfWork / workExample as many times as necessary to move vaguely up or down the bibliographic abstraction layers."... To me, that's a solid endorsement of this pragmatic approach to what is inherently messy bibliographic stuff.

Kudos to Richard for having championed these properties in the first place; sometimes we're a little slow to catch on!

FOSS4Lib Recent Releases: OpenWayback - 2.0.0

planet code4lib - Fri, 2014-09-12 16:03

Last updated September 12, 2014. Created by Peter Murray on September 12, 2014.

Package: OpenWayback
Release Date: Friday, September 12, 2014

FOSS4Lib Updated Packages: OpenWayback

planet code4lib - Fri, 2014-09-12 15:08

Last updated September 12, 2014. Created by Peter Murray on September 12, 2014.

OpenWayback is an open source Java application designed to query and access archived web material. It was first released by the Internet Archive in September 2005, based on the (then) Perl-based Internet Archive Wayback Machine, to enable public distribution of the application and increase its maintainability and extensibility. Since then, the Open Source Wayback Machine (OSWM) has been widely used by members of the International Internet Preservation Consortium (IIPC) and has become the de facto rendering software for web archives.

Package Links

  • Releases for OpenWayback
  • Upcoming Events for the OpenWayback Package

Technology

  • Package Type: Data Preservation and Management
  • License: Apache 2.0
  • Development Status: Production/Stable
  • Operating System: Browser/Cross-Platform
  • Technologies Used: SOLR, Tomcat
  • Programming Language: Java

Library of Congress: The Signal: Teaching Integrity in Empirical Research: An Interview with Richard Ball and Norm Medeiros

planet code4lib - Fri, 2014-09-12 14:26

Richard Ball (Associate Professor of Economics, Haverford College) and Norm Medeiros (Associate Librarian, Haverford Libraries)

This post is the latest in our NDSA Innovation Working Group’s ongoing Insights Interview series. Chelcie Rowell (Digital Initiatives Librarian, Wake Forest University) interviews Richard Ball (Associate Professor of Economics, Haverford College) and Norm Medeiros (Associate Librarian, Haverford Libraries) about Teaching Integrity in Empirical Research, or Project TIER.

Chelcie: Can you briefly describe Teaching Integrity in Empirical Research, or Project TIER, and its purpose?

Richard: For close to a decade, we have been teaching our students how to assemble comprehensive documentation of the data management and analysis they do in the course of writing an original empirical research paper. Project TIER is an effort to reach out to instructors of undergraduate and graduate statistical methods classes in all the social sciences to share with them lessons we have learned from this experience.

When Norm and I started this work, our goal was simply to help our students learn to do good empirical research; we had no idea it would turn into a “project.” Over a number of years of teaching an introductory statistics class in which students collaborated in small groups to write original research papers, we discovered that it was very useful to have students not only turn in a final printed paper reporting their analysis and results, but also submit documentation of exactly what they did with their data to obtain those results.

We gradually developed detailed instructions describing all the components that should be included in the documentation and how they should be formatted and organized. We now refer to these instructions as the TIER documentation protocol. The protocol specifies a set of electronic files (including data, computer code and supporting information) that would be sufficient to allow an independent researcher to reproduce–easily and exactly–all the statistical results reported in the paper. The protocol is and will probably always be an evolving work in progress, but after several years of trial and error, we have developed a set of instructions that our students are able to follow with a high rate of success.

Even for students who do not go on to professional research careers, the exercise of carefully documenting the work they do with their data has important pedagogical benefits. When students know from the outset that they will be required to turn in documentation showing how they arrive at the results they report in their papers, they approach their projects in a much more organized way and keep much better track of their work at every phase of the research. Their understanding of what they are doing is therefore substantially enhanced, and I in turn am able to offer much more effective guidance when they come to me for help.

Despite these benefits, methods of responsible research documentation are virtually, if not entirely, absent from the curricula of all the social sciences. Through Project TIER, we are engaging in a variety of activities that we hope will help change that situation. The major events of the last year were two faculty development workshops that we conducted on the Haverford campus. A total of 20 social science faculty and research librarians from institutions around the US attended these workshops, at which we described our experiences teaching our students good research documentation practices, explained the nuts and bolts of the TIER documentation protocol, and discussed with workshop participants the ways in which they might integrate the protocol into their teaching and research supervision. We have also been spreading the word about Project TIER by speaking at conferences and workshops around the country, and by writing articles for publications that we hope will attract the attention of social science faculty who might be interested in joining this effort.

We are encouraged that faculty at a number of institutions are already drawing on Project TIER and teaching their students and research advisees responsible methods of documenting their empirical research. Our ultimate goal is to see a day when the idea of a student turning in an empirical research paper without documentation of the underlying data management and analysis is considered as aberrant as the idea of a student turning in a research paper for a history class without footnotes or a reference list.

Chelcie: How did TIER and your 10-year collaboration (so far!) get started?

Norm: When I came to the Haverford Libraries in 2000, I was assigned responsibility for the Economics Department. Soon thereafter I began providing assistance to Richard’s introductory statistics students, both in locating relevant literature and in acquiring data for statistical analysis. I provided similar, albeit more specialized, assistance to seniors in the context of their theses. Richard invited me to his classes and advised students to make appointments with me. Through regular communication, I came to understand the outcomes he sought from his students’ research assignments, and tailored my approach to meet these expectations. A strong working relationship ensued.

Meanwhile, in 2006 the Haverford Libraries in conjunction with Bryn Mawr and Swarthmore Colleges implemented DSpace, the widely-deployed open source repository system. The primary collection Haverford migrated into DSpace was its senior thesis archive, which had existed for the previous five years in a less-robust system. Based on the experience I had accrued to that point working with Richard and his students, I thought it would be helpful to future generations of students if empirical theses coexisted with the data from which the results were generated.

The DSpace platform provided a means of storing such digital objects and making them available to the public. I mentioned this idea to Richard, who suggested that not only should we post the data, but also all the documentation (the computer command files, data files and supporting information) specified by our documentation protocol. We didn’t know it at the time, but the seeds of Project TIER were planted then. The first thesis with complete documentation was archived on DSpace in 2007, and several more have been added every year since then.

Chelcie: You call TIER a “soup-to-nuts protocol for documenting data management and analysis.” Can you walk us through the main steps of that protocol?

Richard: The term “soup-to-nuts” refers to the fact that the TIER protocol entails documenting every step of data management and analysis, from the very beginning to the very end of a research project. In economics, the very beginning of the empirical work is typically the point at which the author first obtains the data to be used in the study, either from an existing source such as a data archive, or by conducting a survey or experiment; the very end is the point at which the final paper reporting the results of the study is made public.

The TIER protocol specifies that the documentation should contain the original data files the author obtained at the very beginning of the study, as well as computer code that executes all the processing of the data necessary to prepare them for analysis–including, for example, combining files, creating new variables, and dropping cases or observations–and finally generating the results reported in the paper. The protocol also specifies several kinds of additional information that should be included in the documentation, such as metadata for the original data files, a data appendix that serves as a codebook for the processed data used in the analysis and a read-me file that serves as a users’ guide to everything included in the documentation.

This “soup-to-nuts” standard contrasts sharply with the policies of academic journals in economics and other social sciences. Some of these journals require authors of empirical papers to submit documentation along with their manuscripts, but the typical policy requires only the processed data file used in the analysis and the computer code that uses this processed data to generate the results. These policies do not require authors to include copies of the original data files or the computer code that processes the original data to prepare them for analysis. In our view, this standard, sometimes called “partial replicability,” is insufficient. Even in the simplest cases, construction of the processed dataset used in the analysis involves many decisions, and documentation that allows only partial replication provides no record of the decisions that were made.
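
The contrast between the full protocol and partial replication can be pictured as a checklist that a script verifies. The sketch below is only an illustration: the folder and file names (`original-data/`, `command-files/`, and so on) are hypothetical stand-ins, not the names the TIER protocol prescribes, and `missingComponents` is a helper invented here.

```javascript
// Sketch only: these path prefixes are hypothetical stand-ins for the
// components the protocol describes, not its actual naming rules.
const requiredComponents = {
  "original data files": "original-data/",
  "metadata for the original data": "metadata/",
  "command files (processing and analysis code)": "command-files/",
  "data appendix (codebook)": "documents/data-appendix",
  "read-me file": "documents/readme",
};

// Given the relative paths in a documentation package, return the
// names of the components for which no matching file was found.
function missingComponents(paths) {
  const lower = paths.map((p) => p.toLowerCase());
  return Object.entries(requiredComponents)
    .filter(([, prefix]) => !lower.some((p) => p.startsWith(prefix)))
    .map(([name]) => name);
}

// A package holding only the processed data and the analysis code,
// roughly what a partial-replication policy asks for.
const partialPackage = [
  "analysis-data/processed.csv",
  "command-files/analysis.txt",
];
console.log(missingComponents(partialPackage));
```

For the partial package, the helper reports the original data, its metadata, the data appendix, and the read-me file as missing. A file-presence check like this says nothing about whether the code actually reproduces the results; it only shows which parts of the record are absent.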

Complete instructions for the TIER protocol are available online. The instructions are presented in a series of web pages, and they are also available for download in a single .pdf document.

Chelcie: You’ve taught the TIER protocol in two main curricular contexts: introductory statistics courses and empirical senior thesis projects. What is similar or different about teaching TIER in these two contexts?

Richard: The main difference is that in the statistics courses students do their research projects in groups made up of 3-5 members. It is always a challenge for students to coordinate work they do in groups, and the challenge is especially great when the work involves managing several datasets and composing several computer command files. Fortunately, there are some web-based platforms that can facilitate cooperation among students working on this kind of project. We have found two platforms to be particularly useful: Dataverse, hosted by the Harvard Institute for Quantitative Social Science, and the Open Science Framework, hosted by the Center for Open Science.

Another difference is that when seniors write their theses, they have already had the experience of using the protocol to document the group project they worked on in their introductory statistics class. Thanks to that experience, senior theses tend to go very smoothly.

Chelcie: Can you elaborate a little bit about the Haverford Dataverse you’ve implemented for depositing the data underlying senior theses?

Norm: In 2013 Richard and I were awarded a Sloan/ICPSR challenge grant with which to promote Project TIER and solicit participants. As we considered this initiative, it was clear to us that a platform for hosting files would be needed both locally, for instructors who perhaps didn’t have a repository system in place, and across institutions, to foster collaboration whereby students learning the protocol at one participating institution could run replications against finished projects at another institution.

We imagined such a platform would need an interactive component, such that one could comment on the exactness of the replication. DSpace is a strong platform in many ways, but it is not designed for these purposes, so Richard and I began investigating available options. We came across Dataverse, which has many of the features we desired. Although we have uploaded some senior theses as examples of the protocol’s application, it was really the introductory classes for which we sought to leverage Dataverse. Our Project TIER Dataverse is available online.

In fall 2013, we experimented with using Dataverse directly with students. We sought to leverage the platform as a means of facilitating file management and communication among the various groups. We built Dataverses for each of the six groups in Richard’s introductory statistics course. We configured templates that helped students understand where to load their data and associated files. The process of building these Dataverses was time-consuming, and at points we needed to jury-rig the system to meet our needs. Although Dataverse is a robust system, we found its interface too complex for our needs. This fall we plan to use the Open Science Framework system to see if it can serve our students slightly better. Down the road, we can envision complementary roles for Dataverse and OSF as they relate to Project TIER.

Chelcie: After learning the TIER protocol, do students’ perceptions of the value of data management change?

Richard: Students’ perceptions change dramatically. I see this every semester. For the first few weeks, students have to do a few things to prepare to do what is required by the protocol, like setting up a template of folders in which to store the documentation as they work on the project throughout the semester, and establishing a system that allows all the students in the group to access and work on the files in those folders. There are always a few wrinkles to work out, and sometimes there is a bit of grumbling, but as soon as students start working seriously with their data they see how useful it was to do that up-front preparation. They realize quickly that organizing their work as prescribed by the protocol increases their efficiency dramatically, and by the end of the semester they are totally sold–they can’t imagine doing it any other way.

Chelcie: Have you experienced any tensions between developing step-by-step documentation for a particular workflow and technology stack versus developing more generic documentation?

Richard: The issue of whether the TIER protocol should be written in generic terms or tailored to a particular platform and/or a particular kind of software is an important one, but for the most part has not been the source of any tensions. All of the students in our introductory statistics class and most of our senior thesis advisees use Stata, on either a Windows or Mac operating system. The earliest versions of the protocol were therefore written particularly for Stata users, which meant, for example, we used the term “do-file” instead of “command file,” and instead of saying something like “a data file saved in the proprietary format of the software you are using” we would say “a data file saved in Stata’s .dta format.”

But fundamentally there is nothing Stata-specific about the protocol. Everything that we teach students to do using Stata works just fine with any of the other major statistical packages, like SPSS, R and SAS. So we are working on two ways of making it as easy as possible for users of different software to learn and teach the protocol. First, we have written a completely software-neutral version. And second, with the help of colleagues with expertise in other kinds of software, we are developing versions for R and SPSS, and we hope to create a SAS version soon. We will make all these versions available on the Project TIER website as they become available.

The one program we have come across for which the TIER protocol is not well suited is Microsoft Excel. The problem is that Excel is an exclusively interactive program; it is difficult or impossible to write an editable program that executes a sequence of commands. Executable command files are the heart and soul of the TIER protocol; they are the tool that makes it possible literally to replicate statistical results. So Excel cannot be the principal program used for a project for which the TIER documentation protocol is being followed.

Chelcie: What have you found to be the biggest takeaways from your experience introducing a data management protocol to undergraduates?

Richard: In the response to the first question in this interview, I described some of the tangible pedagogical benefits of teaching students to document their empirical research carefully. But there is a broader benefit that I believe is more fundamental. Requiring students to document the statistical results they present in their papers reinforces the idea that whenever they want to claim something is true or advocate a position, they have an intellectual responsibility to be able to substantiate and justify all the steps of the argument that led them to their conclusion. I believe this idea should underlie almost every aspect of an undergraduate education, and Project TIER helps students internalize it.

Chelcie: Thanks to funding from the Sloan Foundation and ICPSR at the University of Michigan, you’ve hosted a series of workshops focused on teaching good practices in documenting data management and analysis. What have you learned from “training the trainers”?

Richard: Our experience with faculty from other institutions has reinforced our belief that the time is right for initiatives that, like Project TIER, aim to increase the quality and credibility of empirical research in the social sciences. Instructors frequently tell us that they have thought for a long time that they really ought to include something about documentation and replicability in their statistics classes, but never got around to figuring out just how to do that. We hope that our efforts on Project TIER, by providing a protocol that can be adopted as-is or modified for use in particular circumstances, will make it easier for others to begin teaching these skills to their students.

We have also been reminded of the fact that faculty everywhere face many competing demands on their time and attention, and that promoting the TIER protocol will be hard if it is perceived to be difficult or time-consuming for either faculty or students. In our experience, the net costs of adopting the protocol, in terms of time and attention, are small: the protocol complements and facilitates many aspects of a statistics class, and the resulting efficiencies largely offset the start-up costs. But it is not enough for us to believe this: we need to formulate and present the protocol in such a way that potential adopters can see this for themselves. So as we continue to tinker with and revise the protocol on an ongoing basis, we try to be vigilant about keeping it simple and easy.

Chelcie: What do you think performing data management outreach to undergraduates, or more specifically TIER as a project, will contribute to the broader context of data management outreach?

Richard: Project TIER is one of a growing number of efforts that are bubbling up in several fields that share the broad goal of enhancing the transparency and credibility of research in the social sciences. In sociology, Scott Long of Indiana University is a leader in the development of best practices in responsible data management and documentation. The Center for Open Science, led by psychologists Brian Nosek and Jeffrey Spies of the University of Virginia, is developing a web-based platform to facilitate pre-registration of experiments as well as replication studies. And economist Ted Miguel at UC Berkeley has launched the Berkeley Initiative for Transparency in the Social Sciences (BITSS), which is focusing its efforts to strengthen professional norms of research transparency by reaching out to early career social scientists. The Inter-university Consortium for Political and Social Research (ICPSR), which for over 50 years has served as a preeminent archive for social science research data, is also making important contributions to responsible data stewardship and research credibility. The efforts of all these groups and individuals are highly complementary, and many fruitful collaborations and interactions are underway among them. Each has a unique focus, but all are committed to the common goal of improving norms and practices with respect to transparency and credibility in social science research.

These bottom-up efforts also align well with several federal initiatives. Since 2011, the NSF has required all proposals to include a “data management plan” outlining procedures that will be followed to support the dissemination and sharing of research results. Similarly, the NIH requires all investigator-initiated applications with direct costs greater than $500,000 in any single year to address data sharing in the application. More recently, in 2013 the White House Office of Science and Technology Policy issued a policy memorandum titled “Increasing Access to the Results of Federally Funded Scientific Research,” directing all federal agencies with more than $100 million in research and development expenditures to establish guidelines for the sharing of data from federally funded research.

Like Project TIER, many of these initiatives have been launched just within the past year or two. It is not clear why so many related efforts have popped up independently at about the same time, but it appears that momentum is building that could lead to substantial changes in the conduct of social science research.

Chelcie: Do you think the challenges and problems of data management outreach to students will be different in 5 years or 10 years?

Richard: As technology changes, best practices in all aspects of data stewardship, including the procedures specified by the TIER protocol, will necessarily change as well. But the principles underlying the protocol–replicability, transparency, integrity–will remain the same. So we expect the methods of implementing Project TIER will continually be evolving, but the aim will always be to serve those principles.

Chelcie: Based on your work with TIER, what kinds of challenges would you like for the digital preservation and stewardship community to grapple with?

Norm: We’re glad to know that research data are specifically identified in the National Agenda for Digital Stewardship. There is an ever-growing array of non-profit and commercial data repositories for the storage and provision of research data; ensuring the long-term availability of these is critical. Although our protocol relies on a platform for file storage, Project TIER is focused on teaching techniques that promote transparency of empirical work, rather than on digital object management per se. This said, we’d ask that the NDSA partners consider the importance of accommodating supplemental files, such as statistical code, within their repositories, as these are necessary for the computational reproducibility advocated by the TIER protocol. We are encouraged by and grateful to the Library of Congress and other forward-looking institutions for advancing this ambitious Agenda.

FOSS4Lib Upcoming Events: Sharing Images of Global Cultural Heritage

planet code4lib - Fri, 2014-09-12 13:27
Date: Monday, October 20, 2014 - 09:00 to 17:00
Supports: IIPImage, Loris, OpenSeadragon

Last updated September 12, 2014. Created by Peter Murray on September 12, 2014.

The International Image Interoperability Framework community is hosting a one-day information sharing event about the use of images in and across Cultural Heritage institutions. The day will focus on how museums, galleries, libraries and archives, or any online image service, can take advantage of a powerful technical framework for interoperability between image repositories.

FOSS4Lib Recent Releases: Loris - 2.0.0-alpha2

planet code4lib - Fri, 2014-09-12 13:23

Last updated September 12, 2014. Created by Peter Murray on September 12, 2014.

Package: Loris
Release Date: Tuesday, September 9, 2014

LITA: 2014 LITA Forum Student Registration Rate Available

planet code4lib - Thu, 2014-09-11 21:23

LITA is offering a special student registration rate to the 2014 LITA National Forum for a limited number of graduate students enrolled in ALA accredited programs.   The Forum will be held November 5-8, 2014 at the Hotel Albuquerque in Albuquerque, NM.  Learn more about the Forum here.

In exchange for a discounted registration, students will assist the LITA organizers and the Forum presenters with on-site operations.  This year’s theme is “Transformation: From Node to Network.”  We are anticipating an attendance of 300 decision makers and implementers of new information technologies in libraries.

The selected students will be expected to attend the full LITA National Forum, Thursday noon through Saturday noon.  This does not include the pre-conferences on Thursday and Friday.  You will be assigned a variety of duties, but you will be able to attend the Forum programs, which include 3 keynote sessions, 30 concurrent sessions, and a dozen poster presentations.

The special student rate is $180 – half the regular registration rate for LITA members.  This rate includes a Friday night reception at the hotel, continental breakfasts, and Saturday lunch.  To get this rate you must apply and be accepted per below.

To apply for the student registration rate, please provide the following information:

  1. Complete contact information including email address,
  2. The name of the school you are attending, and
  3. A statement of 150 words or fewer on why you want to attend the 2014 LITA Forum

Please send this information no later than September 30, 2014 to, with 2014 LITA Forum Student Registration Request in the subject line.

Those selected for the student rate will be notified no later than October 3, 2014.

Open Knowledge Foundation: Matchmakers in Action – Help Wanted

planet code4lib - Thu, 2014-09-11 19:30

Do you have a skill to share? Want to host an online discussion/debate about an Open Knowledge-like topic? Have an idea for a skillshare or discussion, but need help making it happen? Some of you hosted or attended sessions at OKFest. Why not host one online? At OKFestival, we had an Open Matchmaker wall to connect learning and sharing. This is a little experiment to see if we can replicate that spirit online. We’d love to collaborate with you to make this possible.

How to help with Online Community Sessions:

We’ve set up a Community Trello board where you can add ideas, sign up to host or vote for existing ideas. Trello, a task management tool, has fairly simple instructions.

The Community Sessions Trello Board is live. Start with the Read me First card.

Hosting or leading a Community Session is fairly easy. You can host it via video or even as an editathon or an IRC chat.

  • For video, we have been using G+. We can help you get started on this.
  • For editathons, you could schedule it, share it on your favourite communications channel and then use a shared document like a Google Doc or an Etherpad.
  • For an IRC chat, simply set up a topic, time and Trello card to start planning.

We highly encourage you to do the sessions in your own language.

Upcoming Community Sessions

We have a number of timeslots open for September – October 2014. We will help you get started and even co-host a session online. As a global community, we are somewhat timezone agnostic. Please suggest a time that works for you and that might work with others in the community.

In early October, we will be joined by Nika Aleksejeva of to do a Data Viz 101 skillshare. She makes it super easy for beginners to use data to tell stories.

The Data Viz 101 session is October 8, 2014. Register here.

Community Session Conversation – September 10, 2014

In this 40 minute community conversation, we brainstormed some ideas and talked about some upcoming community activities:

Some of the ideas shared included global inclusiveness and how to fundraise. Remember to vote or share your ideas. Or, if you are super keen, we would love it if you would lead an online session.

(photo by Gregor Fischer)

LITA: Voice your ideas on LITA’s strategic goals

planet code4lib - Thu, 2014-09-11 17:45

As mentioned in a previous post, LITA is beginning a series of informal discussions to let members voice their thoughts about the current strategic goals of LITA. These “kitchen table talks” will be led by President Rachel Vacek and Vice-President Thomas Dowling.

The kitchen table talks will discuss LITA’s strategic goals – collaboration and networking; education and sharing of expertise; advocacy; and infrastructure – and how meeting those goals will help LITA better serve you. The talks also align with ALA’s strategic planning process and efforts to communicate the association’s overarching goals of professional development, information policy, and advocacy.

  • ONLINE: Friday, September 19, 2014, 1:30-2:30 pm EDT
  • ONLINE: Tuesday, October 14, 2014, 12:00-1:00 pm EDT
  • IN-PERSON: Friday, November 7, 2014, 6:45-9:00 pm MDT at the LITA Forum in Albuquerque, NM

How to join the online conversations

On the day and time of the online events, join in on the conversation in this Google Hangout.

We look forward to the conversations!

State Library of Denmark: Even sparse faceting is limited

planet code4lib - Thu, 2014-09-11 15:08

Recently, Andy Jackson from UK Web Archive discovered a ginormous Pit Of Pain with Solr distributed faceting, where some response times reached 10 minutes. The culprit is facet.limit=100 (the number of returned values for each facet is 100), as the secondary fine-counting of facet terms triggers a mini-search for each term that has to be checked. With the 9 facets UK Web Archive uses, that’s 9*100 searches in the worst-case. Andy has done a great write-up on their setup and his experiments: Historical UKWA Solr Query Performance Benchmarking.

Pit Of Pain by Andy Jackson, UK Web Archive

The shape of the pit can be explained by the probability of the need for fine-counts: When there are fewer than 1K hits, chances are that all shards have delivered all matching terms with count > 0 and thus need not be queried again (clever merging). When there are more than 1M hits, chances are that the top-100 terms in each facet are nearly the same for all shards, so that only a few of the terms need fine-counting. Between those two numbers, chances are that a lot of the terms are not present in all initial shard results and thus require fine-counting.
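The merging logic described above can be illustrated with a toy model. This is a simplified sketch and not Solr's actual implementation: the function names and the two-shard data are hypothetical, and every merged candidate term is treated as needing a fine-count, whereas a real merger can skip terms that cannot reach the final top list.

```python
from collections import Counter


def top_terms(shard_counts, limit):
    """First pass: the shard reports its `limit` most frequent terms."""
    return dict(Counter(shard_counts).most_common(limit))


def terms_needing_fine_count(shards, limit):
    """For each shard, list the merged candidate terms it did not report."""
    first_pass = [top_terms(counts, limit) for counts in shards]
    candidates = set().union(*first_pass)
    refinements = []
    for reported in first_pass:
        if len(reported) < limit:
            # The shard ran out of terms, so it already reported every term
            # with count > 0 and never needs to be asked again.
            refinements.append(set())
        else:
            refinements.append(candidates - set(reported))
    return refinements


# Hypothetical term counts for two shards.
shards = [
    {"a": 5, "b": 3, "c": 1},
    {"a": 4, "d": 2, "e": 1},
]
few = terms_needing_fine_count(shards, limit=10)   # limit exceeds the term count
many = terms_needing_fine_count(shards, limit=2)   # limit cuts off each shard
```

With the large limit no shard needs a second round, while the small limit leaves each shard with one term it has to fine-count. This is the same mechanism that produces the pit between the two extremes.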

While the indexes at Statsbiblioteket and UK Web Archive are quite comparable (12TB vs. 16TB, built with nearly the same analysis chain), the setups differ with regard to hardware as well as facet setup. Still, it would be interesting to see if we can reproduce the Pit Of Pain™ with standard Solr faceting on our 6 facet fields and facet.limit=100.

12TB index / 4.2B docs / 2565GB RAM, Solr fc faceting on 6 fields, facet limit 100

Sorta, kinda? We do not have the low response times below 100 hits, and 10 minutes of testing only gave 63 searches, but with the right squinting of eyes, the Pit Of Pain (visualized as a hill to trick the enemy) is visible from ~1K to 1M hits. As for the high response times below 100 hits, they are due to a bad programming decision on my part – expect yet another blog post. As for the pit itself, let’s see how it changes when the limit goes down.

12TB index / 4.2B docs / 2565GB RAM, Solr fc faceting on 6 fields, facet limit 100, 50 & 5

Getting a little crowded with all those dots, so here’s a quartile plot instead.

12TB index / 4.2B docs / 2565GB RAM, Solr fc faceting on 6 fields, facet limit 100, 50 & 5

Again, please ignore results below 100 hits. I will fix it! Promise! But other than that, it seems pretty straightforward: high limits have a severe performance penalty, which seems to be more or less linear in the limit requested (hand waving here).

The burning question is of course how it looks with sparse faceting. Technically, distributed sparse faceting avoids the mini-searches in the fine-counting phase, but still requires each term to be looked up in order to resolve its ordinal (it is used as an index in the internal sparse faceting counter structure). Such a lookup does take time, something like 0.5ms on average on our current setup, so sparse faceting is not immune to large facet limits. Let’s keep the y-axis-max of 20 seconds for comparison with standard Solr.

12TB index / 4.2B docs / 2565GB RAM, sparse faceting on 6 fields, facet limit 100, 50 & 5

There does appear to be a pit too! Switching to quartiles and zooming in:

12TB index / 4.2B docs / 2565GB RAM, sparse faceting on 6 fields, facet limit 100, 50 & 5


This could use another round of tests, but it seems that the pit is present from 10K to 1M hits, fairly analogous to Solr fc faceting. The performance penalty of high limits also matches, just an order of magnitude lower. With a worst case of 6*100 fine-counts (with ~10^5 hits) on each shard and an average lookup time of ½ms, having a mean for the total response time around 1000ms seems reasonable. Everything checks out and we are happy.

Update 20140912

The limit for each test was increased to 1 hour or 1000 searches, whichever came first, and the tests were repeated with facet.limits of 1K, 10K and 100K. The party stopped early with OutOfMemoryError for 10K and since raising the JVM heap size skews all previous results, what we got is what we have.

12TB index / 4.2B docs / 2565GB RAM, Solr fc faceting on 6 fields, facet limit 1000

Quite similar to the Solr fc faceting test with facet.limit=100 at the beginning of this post, but with the Pit Of Pain moved a bit to the right and a worst-case of 3 minutes. Together with the other tested limits and quartiled, we have

12TB index / 4.2B docs / 2565GB RAM, Solr fc faceting on 6 fields, facet limit 1000, 100, 50 & 5

Looking isolated at the Pit Of Pain, we have the median numbers

Median response time (ms), Solr fc faceting

facet.limit   10^4 hits   10^5 hits   10^6 hits   10^7 hits
1000              24559       70061      141660       95792
100                9498       16615       12876       11582
50                 9569        9057        7668        6892
5                  2469        2337        2249        2168

Without cooking the numbers too much, we can see that the worst increase switching from limit 50 to 100 is for 10^5 hits: 9057ms -> 16615ms or 1.83 times, with the expected increase being 2 (50 -> 100). Likewise the worst increase from limit 100 to 1000 is for 10^6 hits: 12876ms -> 141660ms or 11.0 times, with the expected increase being 10 (100 -> 1000). In other words: Worst-case median response times (if such a thing makes sense) for distributed fc faceting with Solr scale linearly with the facet.limit.
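The ratios quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the medians are copied from the Solr fc table, while the names `FC_MEDIANS_MS`, `HITS` and `worst_increase` are made up for the illustration.

```python
# Median response times in ms for Solr fc faceting, keyed by facet.limit.
# Columns correspond to 10^4, 10^5, 10^6 and 10^7 hits.
FC_MEDIANS_MS = {
    1000: [24559, 70061, 141660, 95792],
    100: [9498, 16615, 12876, 11582],
    50: [9569, 9057, 7668, 6892],
    5: [2469, 2337, 2249, 2168],
}
HITS = ["10^4", "10^5", "10^6", "10^7"]


def worst_increase(medians, low_limit, high_limit):
    """Return (hit bucket, ratio) for the largest slowdown between two limits."""
    ratios = [high / low
              for low, high in zip(medians[low_limit], medians[high_limit])]
    worst = max(range(len(ratios)), key=ratios.__getitem__)
    return HITS[worst], ratios[worst]


bucket_a, ratio_a = worst_increase(FC_MEDIANS_MS, 50, 100)
bucket_b, ratio_b = worst_increase(FC_MEDIANS_MS, 100, 1000)
print(f"50 -> 100:   worst at {bucket_a} hits, {ratio_a:.2f}x")
print(f"100 -> 1000: worst at {bucket_b} hits, {ratio_b:.1f}x")
```

Comparing each ratio with the ratio of the limits themselves (2 and 10) is what supports the roughly linear scaling claimed above.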

Repeating with sparse faceting and skipping right to the quartile plot (note that the y-axis dropped by a factor 10):

12TB index / 4.2B docs / 2565GB RAM, sparse faceting on 6 fields, facet limit 1000, 100, 50 & 5

Looking isolated at the Pit Of Pain, we have the median numbers

Median response time (ms), sparse faceting

facet.limit   10^4 hits   10^5 hits   10^6 hits   10^7 hits
1000                512        2397        3311        2189
100                 609         960         698         939
50                  571         635         395         654
5                   447         215         248         588

The worst increase switching from limit 50 to 100 is for 10^6 hits: 395ms -> 698ms or 1.77 times, with the expected increase being 2. Likewise the worst increase from limit 100 to 1000 is also for 10^6 hits: 698ms -> 3311ms or 4.7 times, with the expected increase being 10. In other words: Worst-case median response times for distributed sparse faceting appear to scale better than linearly with the facet.limit.

Re-thinking this, it becomes apparent that there are multiple parts to facet fine-counting: A base overhead and an overhead for each term. Assuming the base overhead is the same for both limits, since the number of hits is the same, we calculate it to be 408ms and the overhead per term to be 0.48ms for sparse (remember we have 6 facets so facet.limit=1000 means a worst-case of fine-counting 6000 terms). If that holds, setting facet.limit=10K would have a worst-case median response time of around 30 seconds.
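The two-part model can be reproduced from the sparse medians at 10^6 hits (698ms for facet.limit=100, 3311ms for facet.limit=1000). The sketch below assumes a straight line through those two points, with worst-case terms = 6 facets * facet.limit; the function names are invented for the illustration.

```python
FACETS = 6  # number of facet fields in the test setup


def fit_overhead(limit_a, ms_a, limit_b, ms_b, facets=FACETS):
    """Solve ms = base + per_term * (facets * limit) from two measurements."""
    terms_a, terms_b = limit_a * facets, limit_b * facets
    per_term = (ms_b - ms_a) / (terms_b - terms_a)
    base = ms_a - per_term * terms_a
    return base, per_term


def predict_ms(limit, base, per_term, facets=FACETS):
    """Worst-case median response time predicted by the linear model."""
    return base + per_term * facets * limit


# Sparse faceting medians at 10^6 hits for facet.limit 100 and 1000.
base, per_term = fit_overhead(100, 698, 1000, 3311)
predicted_10k = predict_ms(10_000, base, per_term)
print(f"base overhead: {base:.0f} ms, per term: {per_term:.2f} ms")
print(f"predicted for facet.limit=10K: {predicted_10k / 1000:.1f} s")
```

Two points determine the line exactly, so this says nothing about how well the model fits. The measurement at facet.limit=50 (395ms) falls below the fitted base overhead, which suggests the model is only a rough approximation.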

OCLC Dev Network: Software Development Practices: Getting Specific with Acceptance Criteria

planet code4lib - Thu, 2014-09-11 13:30

If you’ve been following our product development practices series, you know how to think about identifying problems and articulating those problems as user stories. But even the best user story can’t encompass all of the details of the user experience that need to be considered in the development process.  This week’s post explains the important role of acceptance criteria.

Karen Coyle: Philosophical Musings: The Work

planet code4lib - Thu, 2014-09-11 11:30
We can't deny the idea of work - opera, oeuvre - as a cultural product, a meaningful bit of human-created stuff. The concept exists, the word exists. I question, however, whether we will ever have, or should ever have, precision in how works are bounded; whether we'll ever be able to say clearly that the film version of Pride and Prejudice is or is not the same work as the book. I'm not even sure that we can say that the text of Pride and Prejudice is a single work. Is it the same work when read today that it was when first published? Is it the same work each time that one re-reads it? The reading experience varies based on so many different factors - the cultural context of the reader; the person's understanding of the author's language; the age and life experience of the reader.

The notion of work encompasses all of the complications of human communication and its consequent meaning. The work is a mystery, a range of possibilities and of possible disappointments. It has emotional and, at its best, transformational value. It exists in time and in space. Time is the more canny element here because it means that works intersect our lives and live on in our memories, yet as such they are but mere ghosts of themselves.

Take a book, say, Moby Dick; hundreds of pages, hundreds of thousands of words. We read each word, but we do not remember the words -- we remember the book as inner thoughts that we had while reading. Those could be sights and smells, feelings of fear, love, excitement, disgust. The words, external, and the thoughts, internal, are transformations of each other; from the author's ideas to words, and from the words to the reader's thoughts. How much is lost or gained during this process is unknown. All that we do know is that, for some people at least, the experience is a vivid one. The story takes on some meaning in the mind of the reader, if one can even invoke the vague concept of mind without torpedoing the argument altogether.

Brain scientists work to find the place in the maze of neuronic connections that can register the idea of "red" or "cold" while outside of the laboratory we subject that same organ to the White Whale, or the Prince of Denmark, or the ever elusive Molly Bloom. We task that organ to taste Proust's madeleine; to feel the rage of Ahab's loss; to become a neighbor in one of Borges' villages. If what scientists know about thought is likened to a simple plastic ping-pong ball, plain, round, regular, white, then a work is akin to a rainforest of diversity and discovery, never fully mastered, almost unrecognizable from one moment to the next.

As we move from textual works to musical ones, or on to the visual arts, the transformation from the work to the experience of the work becomes even more mysterious. Who hasn't passed quickly by an unappealing painting hanging on the wall of a museum before which stands another person rapt with attention? If the painting doesn't speak to us, then we have no possible way of understanding what it is saying to someone else.

Libraries are struggling to define the work as an abstract but well-bounded, nameable thing within the mass of the resources of the library. But a definition of work would have to be as rich and complex as the work itself. It would have to include the unknown and unknowable effect that the work will have on those who encounter it; who transform it into their own thoughts and experiences. This is obviously impractical. It would also be unbelievably arrogant (as well as impossible) for libraries to claim to have some concrete measure of "workness" for now and for all time. One has to be reductionist to the point of absurdity to claim to define the boundaries between one work and another, unless they are so far apart in their meaning that there could be no shared messages or ideas or cultural markers between them. You would have to have a way to quantify all of the thoughts and impressions and meanings therein and show that they are not the same, when "same" is a target that moves with every second that passes, every synapse that is fired.

Does this mean that we should not try to surface workness for our users? Hardly. It means that it is too complex and too rich to be given a one-dimensional existence within the current library system. This is, indeed, one of the great challenges that libraries present to their users: a universe of knowledge organized by a single principle as if that is the beginning and end of the story. If the library universe and the library user's universe find few or no points of connection, then communication between them fails. At best, as with the user of a badly designed computer interface, if any communication is to take place it is the user who must adapt. This in itself should be taken as evidence of superior intelligence on the part of the user as compared to the inflexibility of the mechanistic library system.

Those of us in knowledge organization are obsessed with neatness, although few as much as the man who nearly single-handedly defined our profession in the late 19th century; the man who kept diaries in which he entered the menu of every meal he ate; whose wedding vows included a mutual promise never to waste a minute; the man enthralled with the idea that every library be ordered by the simple mathematical concept of the decimal.

To give Dewey due credit, he did realize that his Decimal Classification had to bend reality to practicality. As the editions grew, choices had to be made on where to locate particular concepts in relation to others, and in early editions, as the Decimal Classification was used in more libraries and as subject experts weighed in, topics were relocated after sometimes heated debate. He was not seeking a platonic ideal or even a bibliographic ideal; his goal was closer to the late 19th century concept of efficiency. It was a place for everything, and everything in its place, for the least time and money.

Dewey's constraints of an analog catalog, physical books on physical shelves, and a classification and index printed in book form forced the limited solution of just one place in the universe of knowledge for each book. Such a solution can hardly be expected to do justice to the complexity of the Works on those shelves. Today we have available to us technology that can analyze complex patterns, can find connections in datasets that are of a size way beyond human scale for analysis, and can provide visualizations of the findings.

Now that we have the technological means, we should give up the idea that there is an immutable thing that is the work for every creative expression. The solution then is to see work as a piece of information about a resource, a quality, and to allow a resource to be described with as many qualities of work as might be useful. Any resource can have the quality of the work as basic content, a story, a theme. It can be a work of fiction, a triumphal work, a romantic work. It can be always or sometimes part of a larger work, it can complement a work, or refute it. It can represent the philosophical thoughts of someone, or a scientific discovery. In FRBR, the work has authorship and intellectual content. That is precisely what I have described here. But what I have described is not based on a single set of rules, but is an open-ended description that can grow and change as time changes the emotional and informational context as the work is experienced.

I write this because we risk the petrification of the library if we embrace what I have heard called the "FRBR fundamentalist" view. In that view, there is only one definition of work (and of each other FRBR entity). Such a choice might have been necessary 50 or even 30 years ago. It definitely would have been necessary in Dewey's time. Today we can allow ourselves greater flexibility because the technology exists that can give us different views of the same data. Using the same data elements we can present as many interpretations of Work as we find useful. As we have seen recently with analyses of audio-visual materials, we cannot define work for non-book materials in the same way as we do for books or other texts. [1] [2] Some types of materials, such as works of art, defy any separation between the abstraction and the item. Just where the line will fall between Work and everything else, as well as between Works themselves, is not something that we can pre-determine. Actually, we can, I suppose, and some would like to "make that so", but I defy such thinkers to explain just how such an uncreative approach will further new knowledge.

[1] Kara Van Malssen. BIBFRAME A-V modeling study
[2] Kelley McGrath. FRBR and Moving Images

Peter Murray: Thursday Threads: Sakai Reverberations, Ada Initiative Fundraising, Cost of Bandwidth

planet code4lib - Thu, 2014-09-11 10:43

Welcome to the latest edition of Thursday Threads. This week’s post has a continuation of the commentary about the Kuali Board’s decisions from last month. Next, news of a fundraising campaign by the Ada Initiative in support of women in technology fields. Lastly, an article that looks at the relative bulk bandwidth costs around the world.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to Pinboard are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

Discussion about Sakai’s Shift Continues

The Kuali mission continues into its second decade. Technology is evolving to favor cloud-scale software platforms in an era of greater network bandwidth via fast Internet2 connections and shifting economics for higher education. The addition of a Professional Open Source organization that is funded with patient capital from university interests is again an innovation that blends elements to help create options for the success of colleges and universities.

- The more things change, the more they look the same… with additions, by Brad Wheeler, Kuali Blog, 27-Aug-2014

Yet many of the true believers in higher education’s Open Source Community, which seeks to reduce software costs and provide better e-Learning and administrative IT applications for colleges and universities, may feel that they have little reason to celebrate the tenth anniversaries of Sakai, an Open Source Learning Management System and Kuali, a suite of mission critical, Open Source, administrative applications, both of which launched in 2004.  Indeed, for some Open Source evangelists and purists, this was probably a summer marked by major “disturbances in the force” of Open Source

- Kuali Goes For Profits by Kenneth C. Green, 9-Sep-2014, Digital Tweed blog at Inside Higher Ed

The decision by the Kuali Foundation Board to fork the Kuali code to a different open source license and to use Kuali capital reserves to form a for-profit corporation continues to reverberate. (This was covered in last week’s DLTJ Thursday Threads and earlier in a separate DLTJ post.) In addition to the two articles above, I would encourage readers to look at Charles Severance’s “How to Achieve Vendor Lock-in with a Legit Open Source License – Affero GPL”. Kuali is forking its code from the Educational Community License to the Affero GPL license, which it has the right to do. The move also comes with some significant changes, as Kenneth Green points out. There is still more to this story, so expect it to be covered in additional Thursday Threads posts.

Ada Initiative, Supporting Women in Open Technology and Culture, Focuses Library Attention with a Fundraising Campaign

The Ada Initiative has my back. In the past several years they have been a transformative force in the open source software community and in the lives of women I know and care about. To show our support, Andromeda Yelton, Chris Bourg, Mark Matienzo and I have pledged to match up to $5120 of donations to the Ada Initiative made through this link before Tuesday September 16. That seems like a lot of money, right? Well, here’s my story about how the Ada Initiative helped me when I needed it most.

- The Ada Initiative Has My Back, by Bess Sadler, Solvitur Ambulando blog, 9-Sep-2014

The Ada Initiative does a lot to support women in open technology and culture communities; in the library technology community alone, many women have been affected by physical and emotional violence. (See the bottom of the campaign update blog post from Ada Initiative for links to the stories.) I believe it is only decent to enable anyone to participate in our communities without fear for their physical and psychic space, and that our communities are only as strong as they can be when the barriers to participation are low. The Ada Initiative is making a difference, and I’m proud to have supported them with a financial contribution as well as by being an ally and an amplifier for the voices of women in technology.

The Relative Cost of Bandwidth Around the World

The chart above shows the relative cost of bandwidth assuming a benchmark transit cost of $10/Megabits per second (Mbps) per month (which we know is higher than actual pricing, it’s just a benchmark) in North America and Europe. From CloudFlare

Over the last few months, there’s been increased attention on networks and how they interconnect. CloudFlare runs a large network that interconnects with many others around the world. From our vantage point, we have incredible visibility into global network operations. Given our unique situation, we thought it might be useful to explain how networks operate, and the relative costs of Internet connectivity in different parts of the world.

- The Relative Cost of Bandwidth Around the World, by Matthew Prince, CloudFlare Blog, 26-Aug-2014

Bandwidth is cheapest in Europe and most expensive in Australia? Who knew? CloudFlare published this piece showing their costs on most of the world’s continents, with some interesting thoughts about the role competition plays in the cost of bandwidth.
