From The Fedora Steering Group
The Quarterly Report from Fedora
Fedora Development - In the past quarter, the development team released one Alpha and three Beta releases of Fedora 4; detailed release notes accompanied each release.
Two different but very related things happened last week which brought my own fallibility into painful focus for me.
One is that I blogged in support of the work of the Ada Initiative. They do great work to advance women in open technology and culture. If you are not familiar with their work, then by all means go and find out.
The other is that I discovered I had acted badly in exactly the kind of situation where I should have known better. The wake-up call came in the form of a blog post where the writer was kind enough not to call me out by name. But I will. It was me. Go ahead, read it. I’ll wait.
This, from someone who had fancied himself a feminist. I mean, srlsy. To me this shows just how deeply these issues run.
I was wrong, for which I am now apologizing. But allow me to be more specific. What am I sorry about?
- I’m sorry that I shoved my way into a conversation where I didn’t belong.
- I’m sorry that I was wrong in what I advocated.
- I’m sorry that my privilege and reputation can be unwittingly used to silence someone else.
- I’m sorry that ignorance of my innate privilege has tended to support ignorance of my bad behavior.
I can’t change the past, but I can change the future. My slowly growing awareness of the effects of my words and actions can only help reduce my harmful impacts, while hopefully reinforcing my positive ones.
Among the things that the Ada Initiative lists as ways that they are making a difference is this:
Asking men and influential community members to take responsibility for culture change.
I hear you, and I’m trying, as best as I can, to do this. It isn’t always quick, it isn’t always pretty, but it’s something. Until men stand up and own their own behavior and change it, things aren’t going to get better. I know this. I’m sorry for what I’ve done to perpetuate the problem, and I’m taking responsibility for my own actions, both in the past and in the future. Here’s hoping that the future is much brighter than the past.
Photo by butupa, Creative Commons License Attribution 2.0 Generic.
Last Friday, the American Library Association (ALA) made its first appearance (and through a whole panel, no less) at the Telecommunications Policy Research Conference (TPRC), the most prestigious conference in information policy. The topic, not surprisingly, was the one that has dominated our telecommunications policy work for more than a year now: E-rate.
The panel “900 Questions: A Case Study of Multistakeholder Policy Advocacy through the E-rate Lens” was moderated by Larra Clark, director of the Program on Networks for ALA’s Office for Information Technology Policy (OITP). The panel featured Jon Peha, professor of Engineering and Public Policy at Carnegie Mellon University and former chief technologist of the Federal Communications Commission (FCC); and Tom Koutsky, chief policy counsel for Connected Nation and a former Attorney-Advisor at the FCC. Rounding out the panel were Marijke Visser, ALA’s own Empress of E-rate, and OITP Director Alan S. Inouye.
The panel served as a great opportunity for ALA to cohesively consider the extensive effort we’ve expended on the current proceeding since June 2013. Of course, it was rather a challenge to pack it all into 90 minutes!
Marijke Visser, Larra Clark, and Alan S. Inouye focused on the multiple key tradeoffs that arose in the past year. One such tradeoff was supporting the FCC proposal that led to the first order, even though it focused on Wi-Fi—important, but not ALA’s top priority, which is broadband to libraries (and schools)—based on the promise of a second order focusing on broadband to the building. We worked hard to stand with our long-standing coalitions, while not in full agreement with some coalition positions. The panel explored several tensions: school versus library interests and the importance of both differentiation and collaboration; rural versus urban concerns; near-term versus long-term considerations; and the risks and rewards of creative disruption.
Tom Koutsky and Jon Peha provided context and analysis beyond the library lens. The E-rate proceeding emanated from a multi-year process that began with the National Broadband Plan and investments in the Broadband Technology Opportunities Program (BTOP). Koutsky and Peha illuminated the oft-hidden complexity behind advocate groups, who on the surface may seem to represent similar interests or organizations, but in fact engage in considerable conflict and compromise among themselves. They also discussed the challenges with new stakeholder entrants and their competing interests, both in the short run and long run.
This TPRC session is an important milestone for OITP. The Policy Revolution! Initiative is predicated upon reaching decision makers and influencers outside of the library community who affect critical public policies of interest to our community. Thus, increasing the ALA and library presence at key venues such as TPRC represents important progress for us as we continue to work through re-imagining and re-engineering national public policy advocacy. Also in the September-October timeframe, OITP representatives will present at the conferences of the International City/County Management Association (ICMA), NTCA—the Rural Broadband Association, and the National Association of Telecommunications Officers and Advisors (NATOA).
The E-rate saga continues: ALA will submit comments in the most recent round—due tonight (September 15th)—and will submit further comments in the weeks ahead, as well as continue our discussions with the commissioners and staff of the FCC and our key contacts on Capitol Hill.
Some of you have probably noticed that we’ve been somewhat quiet recently. As usual, that doesn’t mean nothing is going on; it’s more that we’ve been too busy to come up for air and talk about it.
A few of you might have noticed a tweet from the PBCore folks on a conversation we had with them recently. There’s a fuller note on their blog, with links to other posts describing what they’ve been thinking about as they move forward on upgrading the vocabularies they already have in the OMR.
Shortly after that, a post from Bernard Vatant of the Linked Open Vocabularies project (LOV) came over the W3C discussion list for Linked Open Data. Bernard is a hero to those of us toiling in this vineyard, and LOV (lov.okfn.org/dataset/lov/) one of the go-to places for those interested in what’s available in the vocabulary world and the relationships between those vocabularies. Bernard was criticizing the recent release of the DBpedia Ontology, having seen the announcement and, as is his habit, going in to try and add the new ontology to LOV. His gripes fell into a couple of important categories:
* the ontology namespace was dereferenceable, but what he found there was basically useless (his word)
* finding the ontology content itself required making a path via the documentation at another site to get to the goods
* the content was available as an archive that needed to be opened to get to the RDF
* there was no versioning available, thus no way to determine when and where changes were made
I was pretty stunned to see that a big, important ontology was released in that way; apparently, so was Bernard, although since that release there has been a meeting of the minds, and the DBpedia Ontology is now resident in LOV. But as I read the post and its critique, my mind harkened back to the conversation with PBCore. The issues Bernard brought up were exactly the ones we had been discussing with them: how to manage a vocabulary, what tools are available to distribute it so as to ensure easy re-use and understanding, the importance of versioning, providing documentation, etc.
These were all issues we’d been working hard on for RDA, and are still working on behind the RDA Registry. Clearly, there are a lot of folks out there looking for help figuring out how to provide useful access to their vocabularies and to maintain them properly. We’re exploring how we might do similar work for others (so ask us!).
Oh, and if you’re interested in our take on vocabulary versioning, take a look at our recent paper on the subject, presented at the IFLA satellite meeting on LOD in Paris last month.
I plan on posting more about that paper and its ideas later this week.
On August 11-12, I taught an Introduction to Programming Concepts via jQuery course at the DLF/Code4Lib unconference at the George Washington University. I was playing with several theories in developing this course:
- Porting to jQuery so that it could be 100% browser-based: familiar environment, no installfest, maximizes time available for actual programming concepts.
- Porting to jQuery so that it could be 100% visual (on which more below).
- Simply giving up on the idea of getting true novices to the point of being able to write real-world-applicable code in a day-and-a-half workshop, and focusing instead on building a foundation that makes existing code-learning resources more intelligible, and leaves students with enough good feelings about code that they’ll be inclined to learn more.
Bottom line: I think it worked really well!
Today I’m going to talk about my theoretical inspiration for the course; upcoming posts will cover teaching techniques I used to operationalize that, and then future plans. (Look, there’s a jquery workshop tag so you can find them all!)

yo dawg i heard you like tests…
The whole workshop was, in a sense, a way to play with this paper: “A fresh look at novice programmers’ performance and their teachers’ expectations”. Its jaw-dropping result was that providing novice programming students with a test apparatus for a programming task quadrupled the number of subtasks they could successfully complete (students without the tests completed an average of 0.83 out of 4 tasks, compared to 3.26 for students who could check their work against the tests — in other words, students without tests didn’t even get one subtask working, on average).
Well gosh. If tests are that effective, I’m obligated to provide them. This is consistent with my intuitive observations of the CodingBat section of Boston Python Workshop — being asked to write provably correct code is the point where students discover whether their existing mental models are right, and start to iterate them. But the CodingBat interface is confusing, and you need to sink some instructional time just into making sense of it. And, honestly, with a lot of conventional intro programming tasks, it’s hard to tell if you’ve succeeded; you’ve got a command-line-ish interface (unfamiliar to many of my students) and a conceptual problem with abstract success criteria. I wanted something that would give immediate, obvious feedback.
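To make the pedagogy concrete, here is a minimal sketch of the kind of test apparatus the cited study describes: the student gets the checks up front and can verify each subtask as they go. (This is an illustrative sketch only; `double` stands in for a hypothetical student exercise and is not from the paper or the workshop.)

```javascript
// Hypothetical student solution to the exercise "write double(n)".
function double(n) {
  return n * 2;
}

// Minimal test apparatus: each check gives immediate pass/fail feedback,
// so students discover whether their mental model is right and iterate.
function check(label, actual, expected) {
  const ok = actual === expected;
  console.log(`${ok ? "PASS" : "FAIL"}: ${label}`);
  return ok;
}

const results = [
  check("double(2) is 4", double(2), 4),
  check("double(0) is 0", double(0), 0),
  check("double(-3) is -6", double(-3), -6),
];

console.log(`${results.filter(Boolean).length} of ${results.length} checks passed`);
```

The point is the feedback loop, not the task: a student who sees `FAIL` knows exactly which subtask to revisit, which is the effect the study measured.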
Hence, jQuery. Manipulating the DOM produces instant visual effects. If you were asked to make a button disappear, it’s super obvious if you succeeded. (Well. Possibly assuming sightedness, and (with some of my tasks) possibly also non-colorblindness — I stayed away from red/green semantic pairs, but I didn’t audit for all the forms of colorblindness. I need to mull this one over.) And as it turns out, when you ask your students to add a class that changes a bunch of things to have a kitten pic background, it’s also super obvious to you as the instructor when they’ve succeeded (wait for it…wait…“awwww!”).
My hope for this class was that it would provide students who were genuinely novices at coding with the conceptual background they needed to get mileage out of the many intro-programming learning options out there. As Abigail Goben notes, these courses tend to implicitly assume that you already know how to code and just need to be told how to do it in this language, even when they brand themselves as intro courses. People will need much more practice than a day-and-a-half bootcamp to get from novice to proficient enough to write things they can use in everyday work, so I want to get them to a place where that practice will feel manageable. And for the students who do have some experience, hopefully I can introduce them to a language they don’t know yet in a way that has enough meat not to bore them.
Tomorrow, teaching techniques I used to get there, part 1: pacing.
As I thought about what I wanted to write for my first LITA post, I really wasn’t sure until inspiration struck as I procrastinated by scrolling down my Facebook feed. I had been tagged in a status written by a library student who felt unsure of how she was displaying her tech skills on her CV. She asked for opinions. Was it even relevant to put a tech section on her CV if she wasn’t applying for a digital library job? If she already mentioned tech skills in a cover letter, did they need to be put on a CV, too?
The thread got a lot of different responses, some aligning with my thoughts on the subject and others that befuddled me. Why, for instance, was someone suggesting that you should only list tech skills you got in the classroom and not those you picked up on the job? Why did people seem to think that if you were writing a cover letter you should list your tech skills there and not on a CV?
Today, I thought I would share a few brief thoughts on how I list tech skills on my professional documents and how that connects to how I talk about them in a cover letter. Keep in mind that I am an academic librarian with a job in digital libraries, so the usefulness of my perspective beyond this specific area may be limited. And just to clarify, I recognize that everyone has different opinions on content, formatting, and length of professional documents. Just check out one of the myriad library resources for job hunters. It’s a good thing to have varying perspectives, actually, and I welcome all the opinions out there, whether they agree or disagree with my take on the subject.
What I Do
Why would I write a paragraph about it when I can just show you? This is how the tech section of my resume and CV looks now (very similar to when I applied for jobs in late 2013/early 2014).
- Coding – HTML5, CSS
- Digital Collection/Content Management – Drupal, Omeka
- Digitization Software – Epson Scan, Silverfast
- Document Design – Adobe Creative Suite 5, Microsoft Office 2010 suite
- Markup Languages & Standards – EAD, MODS, RDF, TEI, XML
- Operating Systems – Mac OS X, Windows, UNIX
- Social Media – Facebook, Twitter, WordPress, podcasting, wikis
- Repository Software – DSpace, Fedora
- Other – ArcGIS, Neatline
This section is listed under the header “Technology” and does not include bullet points (used in this post for formatting reasons). Check out my entire CV to see how this section fits in with the rest of my content.
Conveying my tech skills in this way provides a quick way for a potential employer to see the different software I know. It doesn’t provide a lot of usable information, since there’s no indication of my skill level or familiarity with these tools. I consider this section of my CV a catch-all for my tech knowledge; it’s up to my cover letter to educate the reader about my depth of understanding of specific tools relevant to the job description. I don’t include any tools here for which I couldn’t readily answer the question, “So tell me how you have used ___ in the past?”
I have tinkered around with this section more times than I can count over the past few years. Even now, writing this blog post, I’m looking at it and thinking, “Is that really relevant to me anymore?” I haven’t looked at other people’s CVs in a long time, and though those might be good to reference in this post, let’s be real: it’s a gloomy Friday afternoon as I type this and I just can’t bring myself to do a quick search.
My laziness aside, I’m particularly interested in how different types of info professionals, from archivists to public, academic, and special librarians, convey their tech skills in professional documents. So many jobs in libraries involve working with technology. I would think you’d be hard-pressed to find a new job that doesn’t involve tech in some way. So is there a way to standardize how we convey this type of information, or are our jobs so diverse that there’s really no way to do so?
I’m curious: How do you highlight your technology skills on professional documents like a resume or CV? Tell me in the comments!
Reminder: The American Library Association (ALA) is encouraging librarians to participate in “My SSA,” a free webinar that will teach participants how to use My Social Security (MySSA), the online Social Security resource.
Do you know how to help your patrons locate information on Supplemental Security Income or Social Security? Presented by leaders and members of the development team of MySSA, this session will provide attendees with an overview of MySSA. In addition to receiving benefits information in print, the Social Security Administration is encouraging librarians to create an online MySSA account to view and track benefits.
Attendees will learn about viewing earnings records and receiving instant estimates of their future Social Security benefits. Those already receiving benefits can check benefit and payment information and manage their benefits.
- Maria Artista-Cuchna, Acting Associate Commissioner, External Affairs
- Kia Anderson, Supervisory Social Insurance Specialist
- Arnoldo Moore, Social Insurance Specialist
- Alfredo Padilia Jr., Social Insurance Specialist
- Diandra Taylor, Management Analyst
Date: Wednesday, September 17, 2014
Time: 2:00 PM – 3:00 PM EDT
Register for the free event
Among the many appealing qualities of Green's novel is how much it's about storytelling itself, and the way in which books function as a badge of identity, a marker of taste and values... For all its romantic contours, "The Fault in Our Stars" is centrally a dialectic about why people seek out stories, one that never quite takes a stand on the question of whether we're right to wish for greater clarity in our art, characters we can "relate" to, or, for that matter, a happy ending.
If you had to encapsulate the future of libraries as a story, what story would that be?
Stewart Brand's How Buildings Learn?
In this world, technology creates a fast, globalised world where digital services and virtual presence are commonplace. Overall, the mood is fairly optimistic, but digitalisation and connectivity soon create too much information and format instability, so there is a slight feeling of unease amongst the general population. Physical books are in slight decline in this world although library services are expanding. The reason for this is that public libraries now take on a wide range of e-government services and are important as drop-in centres for information and advice relating to everything from education and childcare to immigration. In this scenario, libraries have also mutated into urban hubs and hangouts; vibrant meeting places for people and information that house cafés, shops, gyms, crèches, theatres, galleries and various cultural activities and events.
William Gibson's Neuromancer?
This is a world gone mad. Everything is accelerating and everything is in short supply and is priced accordingly. Electricity prices are sky-high and the internet is plagued by a series of serious issues due to overwhelming global demand. In this scenario, public libraries are initially written-off as digital dinosaurs, but eventually there is a swing in their favour as people either seek out reliable internet connections or because there is a real need for places that allow people to unplug, slow down and reflect. In this world, information also tends to be created and owned by large corporations and many small and medium sized firms cannot afford access. Therefore, public libraries also become providers of business information and intelligence. This creates a series of new revenue streams but funding is still tight and libraries are continually expected to do more with less and less funding and full-time staff.
Ray Bradbury's Fahrenheit 451?
This world is a screenager’s paradise. It is fast-paced, global and screen-based. Digitalisation has fundamentally changed the way that people consume information and entertainment, but it has also changed the way that people think. This is a post-literate world where physical books are almost dead and public libraries focus on digital collections and virtual services. In this scenario, books take up very little physical space so more space is given over to internet access, digital books and various other forms of digital entertainment. Public libraries blur the boundaries with other retailers of information and entertainment and also house mental health gyms, technology advice desks, download centres and screening rooms. Despite all this, public libraries struggle to survive due to a combination of ongoing funding cuts, low public usage and global competition.
Or Rachel Carson's Silent Spring?
In this scenario, climate change turns out to be much worse than expected. Resource shortages and the high cost of energy in particular mean that the physical movement of products and people is greatly reduced and individuals are therefore drawn back to their local communities. It is a world where globalisation slows down, digital technology is restrained and where all activities are related to community impact. Public libraries do well in this world. People become voracious consumers of physical books (especially old books) and libraries are rediscovered and revered by the majority of the population due to their safety and neutrality. They are also highly valued because they are free public spaces that promote a wide variety of community-related events. Nevertheless, there are still pressures caused by the high cost of energy and the need to maintain facilities. The phrase ‘dark euphoria’ (Bruce Sterling) sums up the mood in this scenario, because on one level the world is falling apart but on another level people are quite content.
These scenarios come from a remarkable document produced five years ago in 2009 for The Library Council of New South Wales called The Bookends Scenarios [pdf].
It's the only document in the library literature that I've seen that seriously addresses our global warming future. It's the only one that I've come across that confronts us and forces us to consider how we may shape our institution and our services now so we can be there for our community when it's in greatest need.
If you had to encapsulate the future as a story, what story would that be?
I suffer from dark euphoria. I worry about global warming.
That's why I'm going to take part in the People's Climate March in New York City on September 21st, 2014.
I'm going because our leaders are not even talking about taking the necessary action to reduce atmospheric carbon and to mitigate the effects of climate change. This is a movement that requires all of us to become the leaders that we so desperately need.
There's a book that goes with this march: This Changes Everything.
I'm not normally one for marches. I share the suspicion that gatherings and marches themselves don't change anything.
But events change people. There are events that define movements.
You couldn't have an Occupy Movement without Occupy Wall Street. And without Occupy Wall Street, we wouldn't have had Occupy Sandy.
Fight to #EndRacism...for #ClimateJustice. #peoplesclimate BOOM pic.twitter.com/nOJSoLMUJd
— REEP (@reep_ace) September 14, 2014
I understand the feelings of helplessness and darkness when reading or hearing about another terrifying warning about the threat of global warming. I struggle with these feelings more than I care to admit.
I find solace from these feelings from a variety of different sources beyond my family, friends and community. Of these, the study of history oddly enough, gives me great comfort. It has helped me find stories to help me understand the present.
There are those who call the Climate Change Movement the second Abolition Movement, and I think this description is fitting for several reasons. For one, it gets across that we need to draw upon our shared moral fortitude to make it politically necessary to force those in power to forfeit profit from oil and coal, which, unchecked, will continue to cost us grievous human suffering.
It also describes the sheer enormity of the work that must be done. The analogy makes clear how it will be necessary to change every aspect of society to mitigate climate change at this point.
And yet, it has happened before. Ordinary people came together to stop slavery.
On that note, and I hope I'm not spoiling it for you, I took great comfort in the last passage of David Mitchell's Cloud Atlas, a book of several pasts and a future.
Upon my return to San Francisco, I shall pledge myself to the abolitionist cause, because I owe my life to a self-freed slave & because I must begin somewhere.
I hear my father-in-law’s response: “Oho, fine, Whiggish sentiments, Adam. But don’t tell me about justice! Ride to Tennessee on an ass and convince the rednecks they are merely white-washed negroes and their negroes are black-washed whites! Sail to the Old World, tell ‘em their imperial slaves’ rights are as inalienable as the Queen of Belgium’s! Oh, you’ll grow hoarse, poor and gray in caucuses! You’ll be spat upon, shot at, lynched, pacified with medals, spurned by backwoodsmen! Crucified! Naïve, dreaming Adam. He who would do battle with the many headed hydra of human nature must pay a world of pain and his family must pay it along with him! And only as you gasp your dying breath shall you understand your life amounted to no more than one drop in a limitless ocean!”
Yet what is any ocean but a multitude of drops?
The original is from the Library of Congress. If the name “Roebling” sounds familiar, it’s because this is the company, founded by John A. Roebling, that built the Brooklyn Bridge and set up a good business making cables, or wire rope.
The Roebling brothers suspected the fire was German sabotage. Given the activities of the German ambassador at the time, the claim has a whiff of plausibility. Of course, it could also have been aliens.
Version 1.91 of the http://schema.org vocabulary was released a few days ago, and I once again had a small part to play in it.
With the addition of the workExample and exampleOfWork properties, we (Richard Wallis, Dan Brickley, and I) realized that usage examples for these CreativeWork properties were desperately needed to help clarify their appropriate use. I had developed one for the blog post that accompanied the launch of those properties, but the question was, where should those examples live in the official schema.org docs? CreativeWork has so many children, and the properties are so broadly applicable, that the example could have been added to dozens of type pages.
It turns out that an until-now unused feature of the schema.org infrastructure is that examples can live on property pages; even Dan Brickley didn't think this was working. However, a quick test in my sandbox showed that it _was_ in perfect working order, so we could locate the examples on their most relevant documentation pages... Huzzah!
I was then able to put together a nice, juicy example showing relationships between a Tolkien novel (The Fellowship of the Ring), subsequent editions of that novel published by different companies in different locations at different times, and movies based on that novel. From this librarian's perspective, it's pretty cool to be able to do this; it's a realization of a desire to express relationships that, in most library systems, are hard or impossible to accurately specify. (Should be interesting to try and get this expressed in Evergreen and Koha...)
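The shape of those relationships can be sketched in JSON-LD along these lines. (This is a condensed, hypothetical illustration of how exampleOfWork and workExample point between a work and its editions and adaptations, not the actual example added to the schema.org docs; the publisher name is invented.)

```json
{
  "@context": "http://schema.org",
  "@id": "#fellowship-work",
  "@type": "Book",
  "name": "The Fellowship of the Ring",
  "author": { "@type": "Person", "name": "J. R. R. Tolkien" },
  "workExample": [
    {
      "@type": "Book",
      "name": "The Fellowship of the Ring",
      "bookEdition": "Second Edition",
      "publisher": { "@type": "Organization", "name": "Example Publisher" },
      "exampleOfWork": { "@id": "#fellowship-work" }
    },
    {
      "@type": "Movie",
      "name": "The Lord of the Rings: The Fellowship of the Ring",
      "exampleOfWork": { "@id": "#fellowship-work" }
    }
  ]
}
```

Each edition or adaptation points back at the abstract work with exampleOfWork, and the work points down at them with workExample, which is exactly the up-and-down movement between bibliographic abstraction layers discussed below.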
In an ensuing conversation on public-vocabs about the appropriateness of this approach to work relationships, I was pleased to hear Jeff Young say "+1 for using exampleOfWork / workExample as many times as necessary to move vaguely up or down the bibliographic abstraction layers."... To me, that's a solid endorsement of this pragmatic approach to what is inherently messy bibliographic stuff.
Kudos to Richard for having championed these properties in the first place; sometimes we're a little slow to catch on!
Last updated September 12, 2014. Created by Peter Murray on September 12, 2014.
OpenWayback is an open source Java application designed to query and access archived web material. It was first released by the Internet Archive in September 2005, based on the (then) Perl-based Internet Archive Wayback Machine, to enable public distribution of the application and increase its maintainability and extensibility. The Open Source Wayback Machine (OSWM) has since been widely used by members of the International Internet Preservation Consortium (IIPC) and has become the de facto rendering software for web archives.

Package Links
- OpenWayback - 2.0.0 12-Sep-2014
2015 IIPC General Assembly
27-Apr-2015 to 1-May-2015
Library of Congress: The Signal: Teaching Integrity in Empirical Research: An Interview with Richard Ball and Norm Medeiros
This post is the latest in our NDSA Innovation Working Group’s ongoing Insights Interview series. Chelcie Rowell (Digital Initiatives Librarian, Wake Forest University) interviews Richard Ball (Associate Professor of Economics, Haverford College) and Norm Medeiros (Associate Librarian, Haverford Libraries) about Teaching Integrity in Empirical Research, or Project Tier.
Chelcie: Can you briefly describe Teaching Integrity in Empirical Research, or Project TIER, and its purpose?
Richard: For close to a decade, we have been teaching our students how to assemble comprehensive documentation of the data management and analysis they do in the course of writing an original empirical research paper. Project TIER is an effort to reach out to instructors of undergraduate and graduate statistical methods classes in all the social sciences to share with them lessons we have learned from this experience.
When Norm and I started this work, our goal was simply to help our students learn to do good empirical research; we had no idea it would turn into a “project.” Over a number of years of teaching an introductory statistics class in which students collaborated in small groups to write original research papers, we discovered that it was very useful to have students not only turn in a final printed paper reporting their analysis and results, but also submit documentation of exactly what they did with their data to obtain those results.
We gradually developed detailed instructions describing all the components that should be included in the documentation and how they should be formatted and organized. We now refer to these instructions as the TIER documentation protocol. The protocol specifies a set of electronic files (including data, computer code and supporting information) that would be sufficient to allow an independent researcher to reproduce–easily and exactly–all the statistical results reported in the paper. The protocol is and will probably always be an evolving work in progress, but after several years of trial and error, we have developed a set of instructions that our students are able to follow with a high rate of success.
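As a rough illustration of what such a documentation package might contain (the directory names here are invented for this sketch, not the protocol's own), a student submission in this spirit could be organized like:

```
paper.pdf             the final paper reporting analysis and results
original-data/        raw data files exactly as obtained, never edited
metadata/             sources, access dates, and codebooks for each raw file
command-files/        scripts that import, clean, and analyze the data
analysis-data/        processed datasets produced by running the scripts
```

The test of such a package is the one Richard states: an independent researcher, given only these files, should be able to reproduce every statistical result in the paper exactly.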
Even for students who do not go on to professional research careers, the exercise of carefully documenting the work they do with their data has important pedagogical benefits. When students know from the outset that they will be required to turn in documentation showing how they arrive at the results they report in their papers, they approach their projects in a much more organized way and keep much better track of their work at every phase of the research. Their understanding of what they are doing is therefore substantially enhanced, and I in turn am able to offer much more effective guidance when they come to me for help.
Despite these benefits, methods of responsible research documentation are virtually, if not entirely, absent from the curricula of all the social sciences. Through Project TIER, we are engaging in a variety of activities that we hope will help change that situation. The major events of the last year were two faculty development workshops that we conducted on the Haverford campus. A total of 20 social science faculty and research librarians from institutions around the US attended these workshops, at which we described our experiences teaching our students good research documentation practices, explained the nuts and bolts of the TIER documentation protocol, and discussed with workshop participants the ways in which they might integrate the protocol into their teaching and research supervision. We have also been spreading the word about Project TIER by speaking at conferences and workshops around the country, and by writing articles for publications that we hope will attract the attention of social science faculty who might be interested in joining this effort.
We are encouraged that faculty at a number of institutions are already drawing on Project TIER and teaching their students and research advisees responsible methods of documenting their empirical research. Our ultimate goal is to see a day when the idea of a student turning in an empirical research paper without documentation of the underlying data management and analysis is considered as aberrant as the idea of turning in a research paper for a history class without footnotes or a reference list.
Chelcie: How did TIER and your 10-year collaboration (so far!) get started?
Norm: When I came to the Haverford Libraries in 2000, I was assigned responsibility for the Economics Department. Soon thereafter I began providing assistance to Richard’s introductory statistics students, both in locating relevant literature and in acquiring data for statistical analysis. I provided similar, albeit more specialized, assistance to seniors in the context of their theses. Richard invited me to his classes and advised students to make appointments with me. Through regular communication, I came to understand the outcomes he sought from his students’ research assignments, and tailored my approach to meet these expectations. A strong working relationship ensued.
Meanwhile, in 2006 the Haverford Libraries in conjunction with Bryn Mawr and Swarthmore Colleges implemented DSpace, the widely-deployed open source repository system. The primary collection Haverford migrated into DSpace was its senior thesis archive, which had existed for the previous five years in a less-robust system. Based on the experience I had accrued to that point working with Richard and his students, I thought it would be helpful to future generations of students if empirical theses coexisted with the data from which the results were generated.
The DSpace platform provided a means of storing such digital objects and making them available to the public. I mentioned this idea to Richard, who suggested that not only should we post the data, but also all the documentation (the computer command files, data files and supporting information) specified by our documentation protocol. We didn’t know it at the time, but the seeds of Project TIER were planted then. The first thesis with complete documentation was archived on DSpace in 2007, and several more have been added every year since then.
Chelcie: You call TIER a “soup-to-nuts protocol for documenting data management and analysis.” Can you walk us through the main steps of that protocol?
Richard: The term “soup-to-nuts” refers to the fact that the TIER protocol entails documenting every step of data management and analysis, from the very beginning to the very end of a research project. In economics, the very beginning of the empirical work is typically the point at which the author first obtains the data to be used in the study, either from an existing source such as a data archive, or by conducting a survey or experiment; the very end is the point at which the final paper reporting the results of the study is made public.
The TIER protocol specifies that the documentation should contain the original data files the author obtained at the very beginning of the study, as well as computer code that executes all the processing of the data necessary to prepare them for analysis (including, for example, combining files, creating new variables, and dropping cases or observations) and finally generating the results reported in the paper. The protocol also specifies several kinds of additional information that should be included in the documentation, such as metadata for the original data files, a data appendix that serves as a codebook for the processed data used in the analysis, and a read-me file that serves as a users’ guide to everything included in the documentation.
This “soup-to-nuts” standard contrasts sharply with the policies of academic journals in economics and other social sciences. Some of these journals require authors of empirical papers to submit documentation along with their manuscripts, but the typical policy requires only the processed data file used in the analysis and the computer code that uses this processed data to generate the results. These policies do not require authors to include copies of the original data files or the computer code that processes the original data to prepare them for analysis. In our view, this standard, sometimes called “partial replicability,” is insufficient. Even in the simplest cases, construction of the processed dataset used in the analysis involves many decisions, and documentation that allows only partial replication provides no record of the decisions that were made.
Complete instructions for the TIER protocol are available online. The instructions are presented in a series of web pages, and they are also available for download in a single .pdf document.
Chelcie: You’ve taught the TIER protocol in two main curricular contexts: introductory statistics courses and empirical senior thesis projects. What is similar or different about teaching TIER in these two contexts?
Richard: The main difference is that in the statistics courses students do their research projects in groups made up of 3-5 members. It is always a challenge for students to coordinate work they do in groups, and the challenge is especially great when the work involves managing several datasets and composing several computer command files. Fortunately, there are some web-based platforms that can facilitate cooperation among students working on this kind of project. We have found two platforms to be particularly useful: Dataverse, hosted by the Harvard Institute for Quantitative Social Science, and the Open Science Framework, hosted by the Center for Open Science.
Another difference is that when seniors write their theses, they have already had the experience of using the protocol to document the group project they worked on in their introductory statistics class. Thanks to that experience, senior theses tend to go very smoothly.
Chelcie: Can you elaborate a little bit about the Haverford Dataverse you’ve implemented for depositing the data underlying senior theses?
Norm: In 2013 Richard and I were awarded a Sloan/ICPSR challenge grant with which to promote Project TIER and solicit participants. As we considered this initiative, it was clear to us that a platform for hosting files would be needed both locally, for instructors who perhaps didn’t have a repository system in place, and for fostering cross-institutional collaboration, whereby students learning the protocol at one participating institution could run replications against finished projects at another institution.
We imagined such a platform would need an interactive component, such that one could comment on the exactness of the replication. DSpace is a strong platform in many ways, but it is not designed for these purposes, so Richard and I began investigating available options. We came across Dataverse, which has many of the features we desired. Although we have uploaded some senior theses as examples of the protocol’s application, it was really the introductory classes for which we sought to leverage Dataverse. Our Project TIER Dataverse is available online.
In fall 2013, we experimented with using Dataverse directly with students. We sought to leverage the platform as a means of facilitating file management and communication among the various groups. We built Dataverses for each of the six groups in Richard’s introductory statistics course. We configured templates that helped students understand where to load their data and associated files. The process of building these Dataverses was time consuming, and at points we needed to jury-rig the system to meet our needs. Although Dataverse is a robust system, we found its interface too complex for our needs. This fall we plan to use the Open Science Framework system to see if it can serve our students slightly better. Down the road, we can envision complementary roles for Dataverse and the OSF as they relate to Project TIER.
Chelcie: After learning the TIER protocol, do students’ perceptions of the value of data management change?
Richard: Students’ perceptions change dramatically. I see this every semester. For the first few weeks, students have to do some up-front preparation required by the protocol, like setting up a template of folders in which to store the documentation as they work on the project throughout the semester, and establishing a system that allows all the students in the group to access and work on the files in those folders. There are always a few wrinkles to work out, and sometimes there is a bit of grumbling, but as soon as students start working seriously with their data they see how useful that up-front preparation was. They realize quickly that organizing their work as prescribed by the protocol increases their efficiency dramatically, and by the end of the semester they are totally sold: they can’t imagine doing it any other way.
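The up-front folder setup Richard describes can even be scripted, so every group in a class starts from an identical layout. The sketch below is a hypothetical illustration in Python; the folder names are ours, not the official TIER layout, which is spelled out in the protocol’s online instructions.

```python
from pathlib import Path

# Hypothetical folder template for a student project. The actual TIER
# protocol prescribes its own names and layout; these are illustrative.
FOLDERS = [
    "original-data",
    "original-data/metadata",
    "processing-and-analysis/command-files",
    "processing-and-analysis/analysis-data",
    "documents",
]

def make_template(root="my-project"):
    """Create the folder template and return the directories created."""
    for folder in FOLDERS:
        Path(root, folder).mkdir(parents=True, exist_ok=True)
    return sorted(str(p.relative_to(root)) for p in Path(root).rglob("*"))
```

A script like this takes a minute to run at the start of the semester and removes one early source of inconsistency between groups.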
Chelcie: Have you experienced any tensions between developing step-by-step documentation for a particular workflow and technology stack versus developing more generic documentation?
Richard: The issue of whether the TIER protocol should be written in generic terms or tailored to a particular platform and/or a particular kind of software is an important one, but for the most part has not been the source of any tensions. All of the students in our introductory statistics class and most of our senior thesis advisees use Stata, on either a Windows or Mac operating system. The earliest versions of the protocol were therefore written particularly for Stata users, which meant, for example, we used the term “do-file” instead of “command file,” and instead of saying something like “a data file saved in the proprietary format of the software you are using” we would say “a data file saved in Stata’s .dta format.”
But fundamentally there is nothing Stata-specific about the protocol. Everything that we teach students to do using Stata works just fine with any of the other major statistical packages, like SPSS, R and SAS. So we are working on two ways of making it as easy as possible for users of different software to learn and teach the protocol. First, we have written a completely software-neutral version. And second, with the help of colleagues with expertise in other kinds of software, we are developing versions for R and SPSS, and we hope to create a SAS version soon. We will post all these versions on the Project TIER website as they are completed.
The one program we have come across for which the TIER protocol is not well suited is Microsoft Excel. The problem is that Excel is an exclusively interactive program; it is difficult or impossible to write an editable program that executes a sequence of commands. Executable command files are the heart and soul of the TIER protocol; they are the tool that makes it possible literally to replicate statistical results. So Excel cannot be the principal program used for a project for which the TIER documentation protocol is being followed.
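To make the contrast concrete, here is a minimal sketch of what an executable command file looks like, written in Python purely for illustration (TIER students would typically write the equivalent as a Stata do-file). The file name, variable, and statistic are hypothetical; the point is that every step from original data to reported result can be re-run verbatim, which is exactly what an interactive spreadsheet session cannot offer.

```python
import csv
import math

def run_analysis(original_data="original-data/survey.csv"):
    """Re-running this one function reproduces the reported statistic."""
    # Processing: read the original data file and drop cases with missing income.
    with open(original_data, newline="") as f:
        rows = [r for r in csv.DictReader(f) if r["income"]]
    # Variable construction: log income for each remaining case.
    log_income = [math.log(float(r["income"])) for r in rows]
    # Analysis: the single statistic "reported in the paper".
    return sum(log_income) / len(log_income)
```

Because every decision (which cases to drop, how variables are constructed) is recorded as a command, an independent researcher can replicate the result exactly rather than reconstructing it from prose.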
Chelcie: What have you found to be the biggest takeaways from your experience introducing a data management protocol to undergraduates?
Richard: In the response to the first question in this interview, I described some of the tangible pedagogical benefits of teaching students to document their empirical research carefully. But there is a broader benefit that I believe is more fundamental. Requiring students to document the statistical results they present in their papers reinforces the idea that whenever they want to claim something is true or advocate a position, they have an intellectual responsibility to be able to substantiate and justify all the steps of the argument that led them to their conclusion. I believe this idea should underlie almost every aspect of an undergraduate education, and Project TIER helps students internalize it.
Chelcie: Thanks to funding from the Sloan Foundation and ICPSR at the University of Michigan, you’ve hosted a series of workshops focused on teaching good practices in documenting data management and analysis. What have you learned from “training the trainers”?
Richard: Our experience with faculty from other institutions has reinforced our belief that the time is right for initiatives that, like Project TIER, aim to increase the quality and credibility of empirical research in the social sciences. Instructors frequently tell us that they have thought for a long time that they really ought to include something about documentation and replicability in their statistics classes, but never got around to figuring out just how to do that. We hope that our efforts on Project TIER, by providing a protocol that can be adopted as-is or modified for use in particular circumstances, will make it easier for others to begin teaching these skills to their students.
We have also been reminded of the fact that faculty everywhere face many competing demands on their time and attention, and that promoting the TIER protocol will be hard if it is perceived to be difficult or time-consuming for either faculty or students. In our experience, the net costs of adopting the protocol, in terms of time and attention, are small: the protocol complements and facilitates many aspects of a statistics class, and the resulting efficiencies largely offset the start-up costs. But it is not enough for us to believe this: we need to formulate and present the protocol in such a way that potential adopters can see this for themselves. So as we continue to tinker with and revise the protocol, we try to be vigilant about keeping it simple and easy.
Chelcie: What do you think data management outreach to undergraduates, or more specifically TIER as a project, will contribute to the broader context of data management outreach?
Richard: Project TIER is one of a growing number of efforts that are bubbling up in several fields that share the broad goal of enhancing the transparency and credibility of research in the social sciences. In Sociology, Scott Long of Indiana University is a leader in the development of best practices in responsible data management and documentation. The Center for Open Science, led by psychologists Brian Nosek and Jeffrey Spies of the University of Virginia, is developing a web-based platform to facilitate pre-registration of experiments as well as replication studies. And economist Ted Miguel at UC Berkeley has launched the Berkeley Initiative for Transparency in the Social Sciences (BITSS), which is focusing its efforts on strengthening professional norms of research transparency by reaching out to early career social scientists. The Inter-university Consortium for Political and Social Research (ICPSR), which for over 50 years has served as a preeminent archive for social science research data, is also making important contributions to responsible data stewardship and research credibility. The efforts of all these groups and individuals are highly complementary, and many fruitful collaborations and interactions are underway among them. Each has a unique focus, but all are committed to the common goal of improving norms and practices with respect to transparency and credibility in social science research.
These bottom-up efforts also align well with several federal initiatives. Since 2011, the NSF has required all proposals to include a “data management plan” outlining procedures that will be followed to support the dissemination and sharing of research results. Similarly, the NIH requires all investigator-initiated applications with direct costs greater than $500,000 in any single year to address data sharing in the application. More recently, in 2013 the White House Office of Science and Technology Policy issued a policy memorandum titled “Increasing Access to the Results of Federally Funded Scientific Research,” directing all federal agencies with more than $100 million in research and development expenditures to establish guidelines for the sharing of data from federally funded research.
Like Project TIER, many of these initiatives have been launched just within the past year or two. It is not clear why so many related efforts have popped up independently at about the same time, but it appears that momentum is building that could lead to substantial changes in the conduct of social science research.
Chelcie: Do you think the challenges and problems of data management outreach to students will be different in 5 years or 10 years?
Richard: As technology changes, best practices in all aspects of data stewardship, including the procedures specified by the TIER protocol, will necessarily change as well. But the principles underlying the protocol (replicability, transparency, integrity) will remain the same. So we expect the methods of implementing Project TIER will continually be evolving, but the aim will always be to serve those principles.
Chelcie: Based on your work with TIER, what kinds of challenges would you like for the digital preservation and stewardship community to grapple with?
Norm: We’re glad to know that research data are specifically identified in the National Agenda for Digital Stewardship. There is an ever-growing array of non-profit and commercial data repositories for the storage and provision of research data; ensuring the long-term availability of these is critical. Although our protocol relies on a platform for file storage, Project TIER is focused on teaching techniques that promote transparency of empirical work, rather than on digital object management per se. This said, we’d ask that the NDSA partners consider the importance of accommodating supplemental files, such as statistical code, within their repositories, as these are necessary for the computational reproducibility advocated by the TIER protocol. We are encouraged by and grateful to the Library of Congress and other forward-looking institutions for advancing this ambitious Agenda.
Last updated September 12, 2014. Created by Peter Murray on September 12, 2014.
The International Image Interoperability Framework community (http://iiif.io/) is hosting a one day information sharing event about the use of images in and across Cultural Heritage institutions. The day will focus on how museums, galleries, libraries and archives, or any online image service, can take advantage of a powerful technical framework for interoperability between image repositories.
LITA is offering a special student registration rate to the 2014 LITA National Forum for a limited number of graduate students enrolled in ALA accredited programs. The Forum will be held November 5-8, 2014 at the Hotel Albuquerque in Albuquerque, NM. Learn more about the Forum here.
In exchange for a discounted registration, students will assist the LITA organizers and the Forum presenters with on-site operations. This year’s theme is “Transformation: From Node to Network.” We are anticipating an attendance of 300 decision makers and implementers of new information technologies in libraries.
The selected students will be expected to attend the full LITA National Forum, Thursday noon through Saturday noon. This does not include the pre-conferences on Thursday and Friday. You will be assigned a variety of duties, but you will be able to attend the Forum programs, which include 3 keynote sessions, 30 concurrent sessions, and a dozen poster presentations.
The special student rate is $180, half the regular registration rate for LITA members. This rate includes a Friday night reception at the hotel, continental breakfasts, and Saturday lunch. To receive this rate, you must apply and be accepted as described below.
To apply for the student registration rate, please provide the following information:
- Complete contact information including email address,
- The name of the school you are attending, and
- 150 word (or less) statement on why you want to attend the 2014 LITA Forum
Please send this information no later than September 30, 2014 to firstname.lastname@example.org, with 2014 LITA Forum Student Registration Request in the subject line.
Those selected for the student rate will be notified no later than October 3, 2014.
Do you have a skill to share? Want to host an online discussion/debate about an Open Knowledge-like topic? Have an idea for a skillshare or discussion, but need help making it happen? Some of you hosted or attended sessions at OKFest. Why not host one online? At OKFestival, we had an Open Matchmaker wall to connect learning and sharing. This is a little experiment to see if we can replicate that spirit online. We’d love to collaborate with you to make this possible.
We’ve set up a Community Trello board where you can add ideas, sign up to host or vote for existing ideas. Trello, a task management tool, has fairly simple instructions.
Hosting or leading a Community Session is fairly easy. You can host it via video, as an editathon, or as an IRC chat.
- For video, we have been using G+. We can help you get started on this.
- For editathons, you could schedule the session, share it on your favourite communications channel, and then use a shared document such as a Google Doc or an Etherpad.
- For an IRC chat, simply set up a topic, a time, and a Trello card to start planning.
We highly encourage you to do the sessions in your own language.
Upcoming Community Sessions
We have a number of timeslots open for September – October 2014. We will help you get started and even co-host a session online. As a global community, we are somewhat timezone agnostic. Please suggest a time that works for you and that might work with others in the community.
In early October, we will be joined by Nika Aleksejeva of Infogr.am to do a Data Viz 101 skillshare. She makes it super easy for beginners to use data to tell stories.
The Data Viz 101 session is October 8, 2014. Register here.
Community Session Conversation – September 10, 2014
In this 40 minute community conversation, we brainstormed some ideas and talked about some upcoming community activities:
As mentioned in a previous post, LITA is beginning a series of informal discussions to let members voice their thoughts about the current strategic goals of LITA. These “kitchen table talks” will be led by President Rachel Vacek and Vice-President Thomas Dowling.
The kitchen table talks will discuss LITA’s strategic goals – collaboration and networking; education and sharing of expertise; advocacy; and infrastructure – and how meeting those goals will help LITA better serve you. The talks also align with ALA’s strategic planning process and efforts to communicate the association’s overarching goals of professional development, information policy, and advocacy.
When
- ONLINE: Friday, September 19, 2014, 1:30-2:30 pm EDT
- ONLINE: Tuesday, October 14, 2014, 12:00-1:00 pm EDT
- IN-PERSON: Friday, November 7, 2014, 6:45-9:00 pm MDT at the LITA Forum in Albuquerque, NM
On the day and time of the online events, join in on the conversation in this Google Hangout.
We look forward to the conversations!