It has been exactly 10 years today since I started Information Wants to be Free. My life has changed in so many ways since then. I’m not sure I really had a vision of where I’d be at 37, but I don’t think it looked quite like this (I certainly never guessed I’d be living on the West Coast!). Back then, I thought climbing the professional ladder was important. I wanted to be in charge. I was impatient to change everything. Now, I just want a job I enjoy that challenges me and to work with people I like. I have that now and I’ve achieved more professionally than I could ever have thought possible back when I was an unemployed new librarian. I feel very lucky.
I started this blog as a newlywed in my mid-20s, about to graduate from library school. I initially wrote about my frustrations with the job market, my experiences job-hunting, emerging social technologies I found interesting, and other professional trends. I wrote 300 posts in that first year, a number I now find staggering (then again, I was unemployed and didn’t have a child, so I did have more time on my hands). From all that writing and commenting on other blogs, I became part of this incredible community of other bloggers and commenters. I found kindred spirits at a time when I needed them most. And while many of those people have left blogging for things like Twitter, Facebook, and FriendFeed, I still value this medium far more than any other and am glad for the bloggers who still challenge me and make me think (I’m also still glad to call many of those lapsed bloggers friends).
My views on many things have changed over the years. Reading some of my old posts makes me cringe. I’ve made mistakes. I’ve written dumb things. But I’m kind of glad all of my mistakes are up there in black and white; it reminds me of how far I’ve come. What has been constant is that I’ve been a voice against groupthink, against labels, and in favor of charitable reading, even when my opinions have set me in opposition to people I respect and admire. And that will never change.
Why do I still blog? First of all, I’m a very slow thinker. I’m not good at the witty comeback, especially not in 140 characters. I use blogging to work out ideas and make sense of things, and I frequently find that I understand my own feelings on a topic better after I’ve written about them. I also love writing (couldn’t you guess?). I have been writing songs, poetry, short stories, and non-fiction since I could hold a pen and have found this medium to be a perfect fit for me. Finally, people still tell me that the things I write are useful; that my blog posts have helped them work out their own thoughts on things or that they felt good to find that someone else shared their opinion. When you can find something you enjoy doing that other people value… well, it’s a match made in heaven.
So, while you’re never going to find me writing 300 posts a year, I plan to keep this blog going as long as I have readers. And heck, maybe longer than that, since I find value in it for myself.
Thank you for reading, especially those of you who really made this feel like a community over the years. I know some of you have been reading this blog for pretty close to a decade and that stuns me. I’m not a particularly interesting or charismatic person. I’m an introvert who leads a pretty unexceptional life and just happens to share her opinions online. I’m always startled when someone tells me how excited they are to meet me at a conference or when someone I just met acts like they know me simply because they read my blog. I’m so much more and so much less than what you might think I am based on what you read here.
I’m not a rock star, though at one point I mostly (or completely; I was probably a jerk at some point) fit the description of one. I had some lucky breaks and also worked very hard for what I’ve achieved. I think the bulk of my success can be attributed to one thing: chutzpah. I suffer from horrible, almost crippling impostor syndrome, but I have always been the sort of person who’d rather try and fail than not try at all. I was always the girl who would tell the guy I liked him, even if I thought I didn’t have a snowball’s chance in hell. The worst-case scenario in that situation never seemed all that bad. And that’s how I’ve approached my career. At a lunch at ALA when I was teaching the staff about wikis in 2006, I told the head of ALA Publishing “you know… y’all should really give me a column in American Libraries. I could help spice up the magazine.” Next thing you know, I had a column. I thought the idea I had for Library DIY was probably stupid, but I figured it was better to put it out there and get rejected than to let a potentially awesome idea sit. The team that helped make it happen and I ended up winning an award for it, and it has been replicated by some major academic libraries and other institutions (like NPR!).
I guess what I’m saying is that, frequently, you have to make your own opportunities. Things rarely fall into anyone’s lap, so if you’re frustrated that you’re not getting x and other people are, you may have to go out there and get it. But also, many of those things really aren’t as wonderful and shiny as you think. They don’t guarantee a life of happiness, nor are they a worthwhile yardstick by which to measure your worth. Life sans banana slicer (and for the record, I wasn’t cool or shiny enough to receive one) is just fine. As Karen said, “The odds are you’re amazing anyway.”
So here’s to another decade of blogging, if I still have anything useful to say by then, and thanks for sticking around. It’s been a true pleasure to share myself with you (and get to know many of you) these 10 years.
I know it’s been a while since I last posted. I’ve almost written a few posts on the vitriol I’ve been seeing from librarians on social media over the past couple of months, but in the end, I decided it was better not to. All I’ll say is that I expect a lot more tolerance, charitable reading, and critical thinking from librarians. I know most librarians are exemplars of all those things, but it seems like Twitter and Facebook bring out the rush-to-judgment-and-grab-a-pitchfork mentality in many normally level-headed people.
Besides, I’d much rather write about happy things. Like my job. It’s great! I spent the first month and a half feeling like a huge weight had been lifted off my shoulders, like I could breathe again. It’s not like my job here is all lollipops and rainbows. I do a lot of teaching and I spend a lot of time at the reference desk. I’m a liaison and do collection development for my departments. In between, I’m involved in projects like implementing LibGuides (again!) and making a cool evaluating-sources video with a colleague. It’s a very meat-and-potatoes public services librarian job. But I like meat and potatoes. And I feel like I’ve won the lottery being here.
Part of it is the commitment to supporting student success that I see from everyone who works at PCC. Everyone knows why they are here. Their priorities are in the right place. That’s not to say that there aren’t wacky things here (like every other academic institution on earth), but the overarching goal of every unit seems to be to help students be successful. It’s nice to work somewhere where the values of the institution are so consistent with my own.
Part of it is the students. They are a pleasure to work with and I feel very fulfilled by my interactions with them at the reference desk and in the classroom. PCC is very diverse, but I’m seeing a lot more students who clearly have been marginalized or underestimated or beaten up by life. And I see how strong and bright so many of these students are and how successful they will be if they just allow themselves to believe it. So much of what I see in terms of student weaknesses is a lack of experience and a lack of self-efficacy, not at all a lack of ability. I’m still getting a handle on how to tailor my teaching to community college students — I didn’t think it would be that different and I was wrong — but I’m enjoying the learning curve. It’s kind of refreshing to feel like a beginner again.
A big part of it is my colleagues. They are just an amazingly nice, engaged, committed, positive, and thoughtful bunch of professionals. I’m sure we’ve all worked with at least one other person in the past who sees their job as a stepping stone to something better. And it’s clear that their commitment is not to the institution but to their own ambitions and whatever will further those. I do not have the sense that any of my colleagues are trying to climb a ladder. They are driven more by their commitment to students and faculty than by a desire to get ahead or get promoted. And because of that, being collegial and team-oriented is a no-brainer. Tearing someone else down will provide no benefit in an environment like this.
Now that I’m past the wide-eyed “am I really here?” stage, it’s starting to just feel like “my job.” And that’s a good thing. I look forward to building the sorts of relationships with students and faculty that I had at my last two library jobs. It takes time to be able to do really meaningful stuff at any institution (I used to say progress in improving library instruction can be measured in geologic time) and, at this point in my career, I don’t feel that same impatience to do the big exciting innovative thing right this minute. But I am very much looking forward to a time when I know all the acronyms and what my departments are up to and how I can best support them.
And I’m just happy to be able to write about my work again. Jenica wrote about how isolating it is to be a director sometimes because there are so many things she wants to blog about but can’t. I’d say the same can be said about being unhappy in your job. Writing about the details on social media is just about the most impolitic thing you can do (and I can say this because I made this mistake early in my career and fortunately had a wonderful director who supported me through that teachable moment). At a time when work was making me feel despondent, I didn’t feel I could reach out much to the network of wonderful friends I’ve made over the past decade on social media. So it’s nice to feel like I can be myself again and write freely. Talk about a weight lifted.
Photo credit: Weight lifter
Winchester, MA – In the coming weeks DuraSpace will take a closer look at some of the key features that will be available to the community in DSpace 5.0.
DuraSpace News: SLIDES AVAILABLE: New England National Digital Stewardship Alliance Regional Meeting
Winchester, MA – The New England National Digital Stewardship Alliance regional meeting was held on October 30 and hosted by the Five College Digital Preservation Task Force at the University of Massachusetts, Amherst. Michele Kimpton, CEO of DuraSpace, gave a presentation that reviewed the DuraCloud/Archivematica Pilot and offered details on how this end-to-end preservation solution meets all the needs identified by POWRR (Preserving Digital Objects with Restricted Resources).
Lorcan’s recent blog post on “Research information management systems – a new service category?” has drawn the attention of some of the euroCRIS board members and so I was invited to attend their strategic management meeting in Amsterdam, 11-12 November 2014.
The meeting brought together 1) Research Managers and Administrators (they form a vibrant profession of their own: see also EARMA), 2) University Librarians (repository managers and data librarians working for Research Support) and 3) vendors of CRIS systems (Elsevier, Thomson Reuters, CINECA, etc.). There were ca. 140 attendees from across Europe (including UK, NL, France, Italy, Spain, Greece, Germany, Scandinavia, Belgium, Serbia, Czech Republic).
euroCRIS gets a rich variety of stakeholders to the podium
euroCRIS is an organization that is “dedicated to the development of Research Information Systems and their interoperability”. It maintains the CERIF metadata standard for CRIS systems and acts as a forum for stakeholders of RIM (see membership). It has strong support from the EC, which recommends/mandates the use of CERIF. Not surprisingly, the theme of the meeting was interoperability and standards in Research Information (RI).
Much-used adaptation of a Swedish cartoon to visualize infrastructure incompatibility – presented to the euroCRIS meeting by Ed Simons, November 2014.
The introduction to the theme was a co-presentation by euroCRIS President (Ed Simons, Nijmegen University) together with David Baker (CASRAI) and Josh Brown (ORCID) – demonstrating the will of euroCRIS to advance interoperability through strategic partnerships with stakeholders in the field.
What impressed me most was the breadth of the RI-domain and its stakeholders’ ecosystem: in his presentation Ed listed funders, researchers, research managers and administrators, peer-reviewers and research evaluators, libraries, etc.; and all the presentations during the 2-day meeting reflected that same broad perspective. Even though not all stakeholders were represented at the meeting, they were clearly regarded as interlocutors and invited to the podium.
A maturity issue?
Friedrich Summann, from the University Library of Bielefeld and representing COAR, highlighted the interoperability issues between CRIS and IR-systems. Concerning publication metadata, which is the common denominator of the data held in these systems, he noted there is very little exchange taking place. CRIS-systems do not expose CERIF-data and they generally do not support harvesting protocols (except for Pure, which supports OAI-PMH). He observed 3 trends around the perceived “dichotomy” between the CRIS and the IR-system: 1) using the CRIS as an IR, 2) using the IR as a CRIS and 3) combining both the CRIS and the IR in a symbiotic relationship. He touched lightly on the different purposes of each system (for the IR, visibility and OA; for the CRIS, research information management), which seemed to justify an evolution to a symbiotic ecosystem instead of a standards-driven integrated system. During the meeting, the increasing complexity of the emerging RI-infrastructure, with many more different systems than just the CRIS and IR being tied in, became evident as each presentation had a slide similar to this one (this one was not presented … but I like it, because it is prototypical).
There were a lot of slides with bullet lists of needs. It was clear that in all use cases, researchers are important users of CRIS and IR systems because they have to supply the research information. There is a strong awareness that the systems should be simple and easy to use for them. Another mantra was: “Researchers should not be required to supply the same information more than once”. Nevertheless, it was equally clear from the presentations that the researchers are not the end-users for whom the systems are designed and whose needs were listed on the slides. The needs come from the research managers, the funders, the government policies and mandates, the research assessment exercises, etc., and those needs have not been sorted out. The vendors at the meeting politely, but repeatedly, asked for robust use cases. It reminded me of what Ed Simons said at the beginning of the meeting: “There are no standard use cases in the RI-domain: we are still growing our own vegetables”. This was exactly the feeling I had after these 2 days: the RI-domain is not mature yet and it has no chance to mature because it keeps expanding at the rate of the universe’s expansion.
Anna Clements’ presentation gave a nice example of how overwhelming, and at the same time exciting, RI-developments are becoming for libraries in the UK. Anna is from the University Library of St Andrews and wears many different hats: board member of euroCRIS, chair of the CASRAI working group on data management planning and chair of the Pure UK strategy group. She explained that since the last Research Excellence Framework (REF)-assessment in 2008 in the UK, huge investments have been made in CRIS-systems – for example, in linking publications to project information. She anticipates that the next assessment-driven CRIS-development stage will require investments in linking datasets to articles and funding. At St Andrews, they are re-designing the use of their CRIS system to support new REF-requirements and are currently considering integrating the deposit of the long tail of small datasets into the CRIS. For this new workflow they will also need a data repository with access storage and archive storage (she mentioned Arkivum) and a “data librarian” to assist researchers with the deposit process and the provision of good metadata.
John Donovan (EARMA Chair and Head of Research at Dublin Institute of Technology in Ireland) gave an intriguing short talk. He showed an endless list of sources from which research information was collected (Research support pre- and post-award; Research Finance; Graduate school; Ethics and Integrity; Structured Postdoc training; Research HR; Research awareness raising; etc.) and then he said: “we collect information from so many different sources, that it is completely unsustainable”. John is currently interested in what makes research sustainable in new, small universities. His perspective may be somewhat biased, but he raises a legitimate issue: it seems the fever of registering data has overtaken the need to be informed. However, the next day, Julia Lane was going to give us the big data perspective of RI and remind us that the scientific approach will push us further down the road of RI.
Is RI being taken over by Science?
The keynote by Julia Lane (Senior Managing Economist, American Institutes for Research) stretched the policy perspective to its logical extreme, introducing the need for a “science of science policy” to answer the big questions: “How much does a nation spend on science?” and “What is the return to investment?” It is about making science metrics more scientific and gathering scientific evidence to better understand the effects of research funding. To this end Julia and her team developed the STAR METRICS program.
They are looking at the process of how funding creates output: Funding goes to institutions that employ and provide infrastructure to people who, with their knowledge and skills, produce outputs. Her team collects and analyzes data around this process (grant funds, HR records, financial transactions records, awards data, email, publications, blogs, etc.) – the data are not standardized but can be combined and mined – giving interesting results that help unpack how research is being done. They use external sources as well (Census Bureau data, LinkedIn data, etc.). In this way they can link the data to where people get jobs, start up businesses and to workforce growth in the proximity of scientific hubs. Their findings confirm that the majority of the impact of funded research is regional. They also observe that the vast majority of knowledge transmission is through human interactions and clearly not through paper and publications. If social networks are a major vehicle for knowledge transfer, we should start understanding (and measuring!) how people interact. That starts sounding creepy to me.
The presentations by the university and government representatives giving a policy perspective, René Hageman (VSNU-Dutch Association of Universities) and Geert van Grootel (Flemish Government, Department of Economy, Science and Innovation), hinted at what policy makers dream about in terms of getting a 360-degree view of RI. But their thinking was confined within the safe boundaries of the CRIS. Or the FRIS – in Flemish speak. “When a research project goes into execution, then the data automatically goes into FRIS. FRIS will continually monitor KPIs.”
A word of caution from the evaluation and benchmarking perspective
Paul Wouters (CWTS) was the perfect speaker to question the KPI-rush and to give us a scientific critique of research evaluation methods. He quoted Peter Dahler-Larsen (The Evaluation Society): “Evaluation has become a profession in itself”. Data has become input for “evaluation machines” – to make stuff auditable. The trend towards mechanisation of control and standardization leads to less variety and diversity of scientific discovery practices. He argued that academics need to be in the driver’s seat and ask themselves: how can we monitor our research? How can we profile ourselves to attract the right students and staff? How should we divide funds? What is our scientific/societal impact? Instead of being “just” data-suppliers and subject to evaluation, they need to become full partners in the emerging RIM landscape.
The RDM perspective
There were more presentations, giving the perspectives of several other stakeholder communities: the funders, the libraries, the data archives. Surprisingly, there were few attendees representing the RDM community. DANS (a national research data archiving institution in the Netherlands), who hosted the euroCRIS meeting in Amsterdam, was the notable exception. Peter Doorn’s presentation was interesting because it showed how DANS is adapting its mission and ambitions to the changing landscape of stewardship opportunities. Peter described the mission of DANS as “to provide permanent access to Research Information”. A major part of their focus is still RDM, but they are moving into the broader RIM space. Concerning RDM, which he defined as “how you organize/curate the data during the research project and afterwards”, he mentioned explicitly that for DANS the focus of stewardship during research is new. A noteworthy shift.
Re-reading Lorcan’s blog on RIM
After attending the euroCRIS meeting, I re-read Lorcan’s blog and its title makes much more sense to me now. Indeed there are many signals that this is an emerging new service category. There are many vendors out there, signaling that there is a market for RIM-systems. They are looking for robust use cases to develop their products and services, but the RI-space seems to be elusive, as it continues to expand and new demands and needs keep piling up. The sources for collecting data keep diversifying and growing in number. RIM is moving into the data science domain and this opens up new perspectives. It also raises the question of whether it is necessary to register data anew when it is already sitting somewhere in other systems. Data aggregation and data mining seem to be able to provide the business intelligence policymakers and funders are seeking.
Libraries are engaged in RIM, in Europe more so than in the U.S., because of national government and EC policies towards Open Access and Open Data and the drive to register data that documents the impact of such policies. What struck me, though, was that the euroCRIS meeting presentations touched on standardization and interoperability issues in a way reminiscent of the library automation meetings (ELAG-like) conducted 20 years ago: promoting layered architecture models, the full implementation of standards, the need for evangelists to persuade governments to impose standards, etc. Libraries can help jump-start the RIM-discussion and OCLC could certainly contribute (there are many potential areas: aggregation, extracting knowledge from data, name authorities and name disambiguation, etc.).
About Titia van der Werf
Titia van der Werf is a Senior Program Officer in OCLC Research based in OCLC's Leiden office. Titia coordinates and extends OCLC Research work throughout Europe and has special responsibilities for interactions with OCLC Research Library Partners in Europe. She represents OCLC in European and international library and cultural heritage venues.
Notes on converting this GitHub user-page-based site to Pelican, a Python-based static site generator.
Today I found the following resources and bookmarked them on <a href=
- PyKota Open Source print management
- RDA Print Survey
- E-book reading on the rise
- ATO2014: Building a premier storytelling platform on open source
To what extent is it important to get familiar with our environment?
If we think about how the world around us has changed over the years, it is not unreasonable that, while walking to work, we might come across little shops, restaurants, or gas stations we had never noticed before. Likewise, how many times have we wandered around for hours just to find a green space for a run, only to discover that the one we found was even more polluted than other urban areas?
Citizens are not always properly informed about how the places they live in are changing. That is why it is crucial for people to have constantly updated, accurate information about the neighborhood they have chosen or are about to choose.
(Image source: London Evening Standard)
London is a clear example of how transparency in providing data is fundamental to succeeding as a Smart City. The GLA’s London Datastore, for instance, is a public platform of datasets with up-to-date figures on the city’s main services, as well as on the population’s lifestyle and environmental risks. These data are then made more easily accessible to the community through the London Dashboard.
The value of providing free information is also shown by the integration of maps, which are an efficient means of geolocation. A map that makes it easy to find all the services you need, as close by as possible, can make a significant difference when searching for a place to live.
(Image source: Smart London Plan)
The Global Open Data Index, published by Open Knowledge in 2013, is another useful tool for data retrieval: it ranks countries around the world with scores based on the openness and availability of data such as transport timetables and national statistics.
As has been noted, making open data available and easily findable online has not only been a success for US cities but has also benefited app makers and civic hackers. According to Government Technology, Lauren Reid, a spokesperson at Code for America, said: “The more data we have, the better picture we have of the open data landscape.”
That is, on the whole, what Place I Live puts the most effort into: fostering a new awareness of the environment by providing free information, to help citizens choose the best place to live.
How it works is simple. On the website’s homepage, visitors type in an address of interest and see an overview of how the neighborhood rates on various parameters, together with a Life Quality Index calculated for every point on the map.
Searching for the nearest medical facilities, schools or ATMs thus becomes quick and clear, as does looking up general information about the community. Moreover, the data’s reliability and accessibility are constantly checked by a strong team of professionals with expertise in data analysis, mapping, IT architecture and global markets.
For the moment the company’s work is focused on London, Berlin, Chicago, San Francisco and New York, with the longer-term aim of covering more than 200 cities.
Finally, the US Open Data Census saw San Francisco achieve the highest score, evidence of the city’s efforts to put technological expertise at everyone’s disposal and to meet users’ needs through a meticulous selection of datasets. San Francisco seems to be meeting this challenge successfully with a new investment, in partnership with the University of Chicago, in a data analytics dashboard for sustainability performance statistics named Sustainable Systems Framework, which is expected to be released in beta by the end of the first quarter of 2015.
(Image source: Code for America)
Another remarkable contribution to the spread of open data comes from the Bartlett Centre for Advanced Spatial Analysis (CASA) at University College London (UCL); Oliver O’Brien, a researcher in the UCL Department of Geography and software developer at CASA, is one of the contributors to this cause. Among his products, an interesting accomplishment is London’s CityDashboard, a real-time control panel reporting spatial data. The web page also lets users view all the data on a simplified map and look at other UK cities’ dashboards.
His Bike Share Map, meanwhile, is a live global view of bicycle-sharing systems in over a hundred cities around the world; bike sharing has recently drawn greater public attention as an original form of transportation, above all in Europe and China.
O’Brien’s collaboration with James Cheshire, Lecturer at UCL CASA, also gave rise to a groundbreaking project called DataShine, which aims to develop the use of large, open datasets within the social science community through new forms of data visualisation. It started with a mapping platform for 2011 Census data, followed by maps of individual census tables and the new Travel to Work Flows table.
(Image source: Suprageography)
The holidays are upon us, LITA Blog readers. As we all wind down end-of-year tasks and prepare for our own celebrations, this final installment of Tech Yourself Before You Wreck Yourself for 2014 is my way of saying thanks. Thanksgiving is maybe my favorite holiday: I love the way in which it is casual, hangout-focused, and food-intensive, but I also love the tone of gratitude that colors it. So, let me express how grateful I am for all of you, reading this blog and supporting our efforts. Thank you for being there.
For the uninitiated, Tech Yourself Before You Wreck Yourself (TYBYWY) is a monthly selection of free webinars, classes, and other education opportunities for the aspiring technologist and the total newbie alike.
The Monthly MOOC
If, like so many of us, you’re intrigued by use of gamification in content design and delivery, Coursera’s perennially popular MOOC on the subject is open starting January 26th. Make your New Year’s resolution to educate yourself on this powerful outreach method. It’s particularly interesting from a training/instructional design perspective.
OpenCon has posted its 2014 Webcast Round-Up, and the resources there are excellent if you are trying to learn more about Open Access.
I know that I’ve mentioned them in past posts, but Library Journal’s Webcast series has been stepping up its game recently. These programs are on my docket, and you should consider attending too:
Two Cool Gigs:
Interested in pursuing a career in media archives and social justice? Consider this paid internship in Democracy Now!’s Archives. Application deadline 11/15.
Another option: NPR’s Library Archives has a paid internship. Get on it and apply by 11/21!
Tech On, TYBYWYers!
Happy Thanksgiving! TYBYWY will return 12/12. As always, let me know if you have any questions or suggestions. Leave a message here or catch me on Twitter, @linds_bot.
I'm David Rosenthal and I'm a customer. This will be a very short talk making one simple point, which is the title of the talk:
Storage Will Be Much Less Free Than It Used To Be
My five minutes of fame happened last Monday when Chris Mellor at The Register published this piece, with a somewhat misleading title. It is based on work I had been blogging about since at least 2011, ever since a conversation at the Library of Congress with Dave Anderson of Seagate. For the last 16 years I've been working at Stanford Library's LOCKSS Program on the problem of keeping data safe for the long term. There are technical problems, but the more important problems are economic. How do you fund long-term preservation?
Working with students at UC Santa Cruz's Storage Systems Research Center, I built an economic model of long-term storage. Here is an early version, computing how the net present value of the expenditures needed through time to keep an example dataset for 100 years (the endowment, for short) varies with the rate at which storage gets cheaper (the Kryder rate, for short). The different lines reflect media service lives of 1 to 5 years.
At the historic 30-40%/year we are in the flat part of the graph, where the endowment is low and it doesn't vary much with the Kryder rate. This meant that long-term storage was effectively free; if you could afford to store the data for a few years, you could afford to store it "for ever" because the cost of storing it for the rest of time would have become negligible.
But suppose the Kryder rate drops below about 20%/year. We are in the steep part of the graph where the endowment needed is much higher and depends strongly on the precise Kryder rate. Which, of course, we are not going to know, so the cost of long-term storage becomes much harder to predict.
We don't have to suppose. This graph, from Preeti Gupta at UCSC, shows that in 2010, before the floods in Thailand, the Kryder rate had dropped. Right now, disk is about 7 times as expensive as would have been predicted in 2010. The red lines show the range of industry projections going forward, 10-20%/year. In 2020 disk is projected to be between 100 and 300 times as expensive as would have been projected in 2010. As my first graph showed, this is a big deal for anyone who needs to keep data for the long term.
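To make the shape of that first graph concrete, here is a deliberately simplified toy model in Python. It is not the UCSC model: it assumes the only cost is buying the media needed for the dataset, that the media are replaced at the end of each service life, that their price falls by the Kryder rate every year, and that future spending is discounted at a fixed 2%/year. The discount rate and all parameter names are illustrative assumptions. Even so, it reproduces the qualitative point: the endowment changes relatively little between 30% and 40%/year but climbs steeply as the Kryder rate falls below about 20%/year.

```python
def endowment(kryder_rate, service_life_years, years=100,
              initial_media_cost=1.0, discount_rate=0.02):
    """Toy endowment: present value of all media purchases over `years`.

    Illustrative assumptions (not taken from the talk): media are replaced
    every `service_life_years`, the price of the media needed falls by
    `kryder_rate` per year, running costs are ignored, and future spending
    is discounted at a constant `discount_rate`.
    """
    total = 0.0
    for year in range(0, years, service_life_years):
        # Price of replacement media in this year, after `year` years of decline.
        price = initial_media_cost * (1 - kryder_rate) ** year
        # Discount the purchase back to today's money.
        total += price / (1 + discount_rate) ** year
    return total


if __name__ == "__main__":
    for life in (1, 5):
        for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
            print(f"service life {life}y, Kryder rate {rate:.0%}: "
                  f"endowment {endowment(rate, life):.2f}")
```

The endowment is expressed in multiples of the initial media cost. Shorter service lives raise it simply because the media are bought more often.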
No-one should be surprised that in the real world exponential curves can't go on for ever. Here is Randall Munroe's explanation. In the real world exponential growth is always the first part of an S-curve.
Why has the Kryder rate slowed? This 2009 graph from Seagate shows that what looks like a smooth Kryder graph is actually the superimposition of a series of S-curves, one for each technology. One big reason for the slowing is technical: each successive technology transition gets harder, and the long delay in getting HAMR into production is the current example. But this has economic implications. Each technology transition is more expensive, so the technology needs to remain in the market longer to earn a return on the investment. And the cost of the transition drives industry consolidation, so we now have only a little over 2 disk manufacturers. This has transformed disks from a very competitive, low-margin business into a stable 2-vendor one with reasonably good margins. Increasing margins slows the Kryder rate.
This isn't about technology "hitting a wall" and the increase in bit density stopping. It is about the interplay of technological and business factors slowing the rate of decrease in $/GB. For people who look only at the current cost of storage, this is irritating. For those of us who are concerned with the long-term cost of storage, it is a very big deal.
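The superimposed S-curves in the Seagate graph can be illustrated with a small Python sketch. It models the logarithm of areal density (used here only as a rough stand-in for the $/GB trend; the business factors are not modeled at all) as a sum of logistic curves, one per technology. The technology parameters below are hypothetical, chosen only to show that when later transitions arrive further apart, the combined curve still looks like one steady trend while its average growth rate falls.

```python
import math


def log10_density(year, technologies):
    """log10 of areal density modeled as a sum of logistic S-curves.

    Each technology is a hypothetical (midpoint_year, width_years, gain)
    tuple; `gain` is how many orders of magnitude that technology adds
    once it is fully mature.
    """
    return sum(gain / (1 + math.exp(-(year - midpoint) / width))
               for midpoint, width, gain in technologies)


def annual_growth(year, technologies):
    """Fractional density growth from `year` to `year + 1` under the model."""
    return 10 ** (log10_density(year + 1, technologies)
                  - log10_density(year, technologies)) - 1


# Hypothetical parameters: every technology adds 10x density, but in the
# second scenario each transition takes longer to arrive than the last.
EVEN = [(1990 + 5 * i, 1.0, 1.0) for i in range(5)]
WIDENING = [(1990, 1.0, 1.0), (1995, 1.0, 1.0), (2001, 1.0, 1.0),
            (2008, 1.0, 1.0), (2016, 1.0, 1.0)]

if __name__ == "__main__":
    for year in range(1990, 2016, 2):
        print(year, f"{annual_growth(year, EVEN):8.1%}",
              f"{annual_growth(year, WIDENING):8.1%}")
```

Year by year the growth rate pulses up near each midpoint and sags in between; averaged over two decades, the widening scenario delivers noticeably less density growth than the evenly spaced one.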
The following is a guest post by the entire cohort of the NDSR Boston class of 2014-15.
The first ever Boston cohort of the National Digital Stewardship Residency kicked off in September, and the five residents have been busy drinking from the digital preservation firehose at our respective institutions. You can look forward to individual blog posts from each resident as this 9-month residency goes on, but we decided to start with a group post to outline each of our projects as they’ve developed so far. (To keep up with us on a more regular basis, keep an eye on our digital preservation test kitchen blog.)
Sam DeWitt – Tufts University
I will be at Tufts’ Tisch Library during my residency, looking at ways that the university might better understand the research data it produces. The National Science Foundation has required data management plans from grant-seekers for several years now, and some scholarly journals have followed suit by mandating that researchers submit their data sets along with accepted work. These dictates play a significant role in the widespread movement toward data sharing.
Data sharing, as a concept, is particularly trendy right now (try adding ‘big data’ to the term ‘data sharing’ in a Google search), but the practice is open to debate. Its advantages and disadvantages are articulated quite nicely here. As someone who works in the realm of information science, I generally believe research is meant to be shared and that concerns can be mitigated by policy. But that is easier said than done, as Christine Borgman so succinctly argues in “The Conundrum of Sharing Research Data”: “The challenges are to understand which data might be shared with whom, under what conditions, why, and to what effects. Answers to these questions will inform data policy and practice.”
I hope that in these few months I can gain a broader understanding of the data Tufts produces while I continue to examine the policies, practices and procedures that aid in their curation and dissemination.
Rebecca Fraimow – WGBH
My project is designed a little differently from the ones that my NDSR peers are undertaking; instead of tackling a workflow from the top down, I’m starting with the individual building blocks and working up. Over the course of my residency, my job is to embed myself into the different aspects of daily operations within the WGBH Media, Library and Archives department. Everything that I find myself banging my head into as I go along, I document and make part of the process for redesigning the overall workflow.
Since WGBH MLA is currently in the process of shifting over to a Fedora-based Hydra repository — a major shift from the previous combination of FileMaker databases and proprietary Artesia DAM — it’s the perfect time for the archives to take a serious look at reworking some legacy practices, as well as designing new processes and procedures for securing the longevity of a growing ingest stream that is still shifting from primarily object-based to almost entirely file-based.
At the end of the residency, I’ll be creating a webinar in order to share some best practices (or, at least, working practices) with the rest of the public broadcasting world. Many broadcasting organizations are struggling through archival workflow problems without having the benefit of WGBH’s strong archiving department. It’s exciting to know that the work I’m doing is going to have a wider outward-facing impact — after all, sharing knowledge is kind of what public broadcasting is all about.
Joey Heinen – Harvard University
As the Library of Congress has famously outlined, digital formats are just as susceptible to obsolescence as analog formats, due to any number of factors. At Harvard Library, my host for the NDSR, we are grappling with format migration frameworks at a broad level while looking to implement a plan for three specific, now-obsolete formats: Kodak PhotoCD, RealAudio and SMIL Playlists. So far my work has involved an examination of the biggest challenges for each format.
For example, Kodak PhotoCD incorporates a form of chroma subsampling (Photo YCC) based on the Rec. 709 standard for digital video rather than the various RGB or CIE profiles more typical for still images. Photo YCC captures color information that is beyond what is perceptible to the human eye and well beyond the confines of color profiles such as RGB. This is an example of the format attributes that drive the migration process, so that fundamental content and information from the original are not lost.
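A small Python sketch can show why a luma/chroma encoding can hold colors that an RGB profile cannot. It uses the published Rec. 709 luma coefficients (0.2126, 0.7152, 0.0722) plus a generic 4:2:0-style 2x2 chroma-averaging step, which is an assumption and not necessarily PhotoCD's exact scheme. The actual PhotoYCC encoding applies its own scaling and quantization, which are not reproduced here, so treat this as an illustration of the principle rather than a PhotoCD decoder.

```python
# Rec. 709 luma coefficients (ITU-R BT.709); KG follows from KR + KG + KB = 1.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def rgb_to_ycc(r, g, b):
    """Normalized R'G'B' (0..1) -> luma Y' and colour differences Cb, Cr."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2 * (1 - KB))
    cr = (r - y) / (2 * (1 - KR))
    return y, cb, cr


def ycc_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycc; results may fall outside 0..1 (out of gamut)."""
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def in_rgb_gamut(rgb, tol=1e-9):
    """True if every channel lies within the displayable 0..1 range."""
    return all(-tol <= c <= 1 + tol for c in rgb)


def subsample_chroma_2x2(plane):
    """Average each 2x2 block of a chroma plane (a list of equal-length rows)."""
    h, w = len(plane), len(plane[0])
    return [[(plane[i][j] + plane[i][j + 1]
              + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]


if __name__ == "__main__":
    # A bright, strongly red-shifted luma/chroma value that RGB cannot show:
    print(ycc_to_rgb(1.0, 0.0, 0.3), in_rgb_gamut(ycc_to_rgb(1.0, 0.0, 0.3)))
```

Any RGB color survives a round trip through this encoding, but the reverse does not hold: some luma/chroma triples decode to channel values above 1.0. A migration that forces everything into an RGB profile has to decide what happens to that extra range.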
Other challenges that impact a project such as this are managing the human components (stakeholder roles and arriving at shared conclusions about the format’s most noteworthy characteristics) and ensuring that existing tools for converting, validating and characterizing are correctly managing and reporting on the format (I explored some of these issues here). A bibliography (PDF) that I compiled is guiding this process; its contents have allowed me to approach the systems at Harvard to find the right partners and technological avenues for developing a framework. Look for more updates on the NDSR-Boston website (as well as my more substantive project update on “The Signal” in April 2015).
Jen LaBarbera – Northeastern University
My residency is at Northeastern University’s Archives and Special Collections, though as with a lot of digital preservation projects and/or programs, my work spans a number of other departments — library technology services, IT, Digital Scholarship Group and metadata management.
My project at Northeastern relies heavily on the new iteration of Northeastern’s Fedora-based digital repository (DRS), which is currently in its soft-launch phase and is set to roll out in a more public way in early 2015. My projects at Northeastern are best summed up by the following three goals: 1) create a workflow for ingesting recently born-digital content to the new DRS, 2) create a workflow for ingesting legacy born-digital (obsolete format) content to the new DRS, and 3) help Northeastern Libraries develop a digital preservation plan.
I’m starting with the first goal, ingesting recently born-digital content. As a test case to help us create a more general workflow, we’re working on ingesting the content of the Our Marathon archive. Our Marathon is a digital archive created as a digital humanities project following the bombing at the 2013 Boston Marathon. The goal is to transfer all materials (in a wide variety of formats) from their current states/platforms (Omeka, external hard drives, Google Drive, local server) to the new DRS. I’ve spent the first part of this residency drinking in all the information I can about the DRS, digital humanities projects (in general and at Northeastern), and wrapping my brain around these projects; now, the real fun begins!
Tricia Patterson – MIT Libraries
My residency is within MIT’s Lewis Music Library, a subject-specific library at MIT that is much-loved by students, faculty, and alumni. They are currently looking at digitizing and facilitating access to some of their analog audio special collections of MIT music performances, which has also catalyzed a need to think about their digital preservation. The “Music at MIT” digital audio project was developed in order to inventory, digitize, preserve, and facilitate access to audio content in their collections. And since audio content is prevalent throughout MIT collections, the “Making Music Last” initiative was designed to extend the work of the “Music at MIT” digital audio project and develop an optimal, detailed digital preservation workflow – which is where I came in!
Through a gap analysis of the existing workflow, a broad review of other fields’ workflow methodologies, and collaboration with stakeholders across the board, our team is creating a high- and low-level life cycle workflow, calling out a digital audio use case, and evaluating suitable options for an access platform. This comprehensive workflow will add to overall institutional knowledge instead of confining important information to one stakeholder, and it will clarify roles between individuals throughout the process, improving engagement and communication. Finally, mapping out the work process enhances our understanding of requirements for tools, such as Archivematica or BitCurator, that should be adopted and incorporated with a high degree of confidence for success. As the process moves from design to implementation and testing, the detailed workflow also ensures reliability and repeatable quality in our processes. It’s been a highly collaborative and educational process so far – stay tuned for how it pans out!
Another year, another lineup of Islandora Camps to bring the community together. We have a great roster of camps for 2015, hopefully providing all of you out there with at least one that's close and convenient so you can partake in Islandora's secret sauce.
Dates are not quite set yet for the latter events, but here's the general schedule so you can plan ahead:
- Islandora Camp BC: Vancouver, BC, February 16-18
- Islandora Camp EU2: Madrid, Spain, May 27-29
- Islandora Conference (way more info on this in days to come): August
- Islandora Camp CT: Hartford, CT, late October or early November
See you at Islandora Camp!
Open Knowledge Foundation: Global Witness and Open Knowledge – Working together to investigate and campaign against corruption related to the extractives industries
Sam Leon, one of Open Knowledge’s data experts, talks about his experiences working as a School of Data Embedded Fellow at Global Witness.
Global Witness are a Nobel Peace Prize-nominated not-for-profit organisation devoted to investigating and campaigning against corruption related to the extractives industries. Earlier this year they received the TED Prize and were awarded $1 million to help fight corporate secrecy, on the back of which they launched their End Anonymous Companies campaign.
In February 2014 I began a six-month ‘Embedded Fellowship’ at Global Witness, one of the world’s leading anti-corruption NGOs. Global Witness are no strangers to data. They’ve been publishing pioneering investigative research for over two decades now, piecing together the complex webs of financial transactions, shell companies and middlemen that so often lie at the heart of corruption in the extractives industries.
Like many campaigning organisations, Global Witness are seeking new and compelling ways to visualise their research, as well as use more effectively the large amounts of public data that have become available in the last few years.
“Sam Leon has unleashed a wave of innovation at Global Witness”
-Gavin Hayman, Executive Director of Global Witness
As part of my work, I’ve delivered data trainings at all levels of the organisation – from senior management to the front line staff. I’ve also been working with a variety of staff to use data collected by Global Witness to create compelling infographics. It’s amazing how powerful these can be to draw attention to stories and thus support Global Witness’s advocacy work.
The first interactive we published on the sharp rise of deaths of environmental defenders demonstrated this. The way we were able to pack some of the core insights of a much more detailed report into a series of images that people could dig into proved a hit on social media and let the story travel further.
See here for the full infographic on Global Witness’s website.
But powerful visualisation isn’t just about shareability. It’s also about making a point that would otherwise be hard to grasp without visual aids. Global Witness regularly publish mind-boggling statistics on the scale of corruption in the oil and gas sector.
“The interactive infographics we worked on with Open Knowledge made a big difference to the report’s online impact. The product allowed us to bring out the key themes of the report in a simple, compelling way. This allowed more people to absorb and share the key messages without having to read the full report, but also drew more people into reading it.”
-Oliver Courtney, Senior Campaigner at Global Witness
Take, for instance, the $1.1 billion that the Nigerian people were deprived of due to the corruption around the sale of Africa’s largest oil block, OPL 245.
$1.1 billion doesn’t mean much to me; it’s too big a number. What we sought to do visually was represent the loss to Nigerian citizens in terms of things we could understand, like basic health care provision and education.
See here for the full infographic on Shell, ENI and Nigeria’s Missing Millions.
The aim was to bring together and visualise the vast number of corruption case studies involving shell companies that Global Witness and its partners have unearthed in recent years.
It was a challenging project that required input from designers, campaigners, developers, journalists and researchers, but we’re proud of what we produced.
Open data principles were followed throughout, as Global Witness were committed to creating a resource that its partners could draw on in their advocacy efforts. The underlying data was made available in bulk under a Creative Commons Attribution-ShareAlike license, and open source libraries like Leaflet.js were used. Other parties were also invited to submit case studies to the database.
“It’s transformed the way we work, it’s made us think differently how we communicate information: how we make it more accessible, visual and exciting. It’s really changed the way we do things.”
-Brendan O’Donnell, Campaign Leader at Global Witness
For more information on the School of Data Embedded Fellowship Scheme, and to see further details on the work we produced with Global Witness, including interactive infographics, please see the full report here.
SPARQL queries are a great way to explore Linked Data sets, be it our STW thesaurus with its links to other vocabularies, the papers in our repository EconStor, or persons and institutions in economics as authority data. ZBW has therefore offered public endpoints for a long time. Yet it is often not easy to figure out the right queries: the classes and properties used in the datasets are unknown, and the overall structure requires some exploration. We have therefore started collecting the queries in use at ZBW in our new SPARQL Lab, where they can serve others as examples of how to work with our datasets.
A major challenge was to publish queries in a way that allows not only their execution, but also their modification by users. The first approach to this was pre-filled HTML forms (e.g. http://zbw.eu/beta/sparql/stw.html). Yet that couples the query code with that of the HTML page, and with a hard-coded endpoint address. It does not scale to multiple queries on a diversity of endpoints, and it is difficult to test and to keep in sync with changes in the data sets. Besides, offering a simple text area without any editing support makes it quite hard for users to adapt a query to their needs.
The YASGUI suite offers a much richer alternative, including:
- SPARQL syntax highlighting and error checking
- Extremely customizable: All functions and handlers from the CodeMirror library are accessible
- Persistent values (optional): your query is stored for easier reuse between browser sessions
- Prefix autocompletion (using prefix.cc)
- Property and class autocompletion (using the Linked Open Vocabularies API)
- Can handle any valid SPARQL resultset format
- Integration of preflabel.org for fetching URI labels
With a few lines of custom glue code, and with the friendly support of Laurens Rietveld, author of the YASGUI suite, it is now possible to load any query stored on GitHub into an instance on our beta site and execute it. Check it out - the URI
loads, views and executes the query stored at https://github.com/jneubert/sparql-queries/blob/master/class_overview.rq on the endpoint http://data.nobelprize.org/sparql (which is CORS enabled - a requirement for queryRef to work).
Links like this, with descriptions of each query's purpose, grouped by task and dataset and ordered in a sensible way, may provide a much more accessible repository and starting point for exploration than a plain directory listing of query files. For ongoing or finished research projects, such a repository, together with versioned datasets deployed on SPARQL endpoints, may offer an easy-to-follow and traceable way to verify presented results. GitHub provides an infrastructure for publicly sharing the version history and makes contributions easy: changes and improvements to the published queries can be proposed and integrated via pull requests, and an issue queue can handle bugs and suggestions. Links to queries authored by contributors, which may be saved in different repositories and project contexts, can be added straightaway. We would be very happy to include such contributions; please let us know.
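For readers who want to run the same stored queries outside the browser, here is a hedged Python sketch of the underlying idea: fetch a query file from GitHub and send it to a SPARQL endpoint with an HTTP GET, as the SPARQL 1.1 Protocol allows. This is not how YASGUI implements queryRef. The helper names are hypothetical, the blob-to-raw URL mapping assumes branch names without slashes, and the JSON result handling assumes a SELECT or ASK query. Because CORS is a browser restriction, a script like this does not need the endpoint to be CORS enabled, unlike queryRef.

```python
import json
from urllib.parse import urlencode, urlparse
from urllib.request import Request, urlopen


def github_raw_url(blob_url):
    """Map a github.com .../blob/<ref>/<path> URL to its raw.githubusercontent.com form.

    Assumes the ref (branch or tag name) contains no slashes.
    """
    parts = urlparse(blob_url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, ref, *path = segments
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(path)}"


def sparql_get_url(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' URL with the query URL-encoded."""
    return f"{endpoint}?{urlencode({'query': query})}"


def run_stored_query(blob_url, endpoint, timeout=30):
    """Fetch a SELECT/ASK query from GitHub and execute it (needs network access)."""
    with urlopen(github_raw_url(blob_url), timeout=timeout) as resp:
        query = resp.read().decode("utf-8")
    request = Request(sparql_get_url(endpoint, query),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (performs network requests, so it is left commented out):
# results = run_stored_query(
#     "https://github.com/jneubert/sparql-queries/blob/master/class_overview.rq",
#     "http://data.nobelprize.org/sparql")
# for row in results["results"]["bindings"]:
#     print(row)
```

Keeping the query text in the repository and only the endpoint in the call mirrors the separation the post argues for: the same versioned query file can be pointed at different endpoints without editing any HTML.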
In early 2012, I started on the report that became Reordering Ranganathan: Shifting User Behaviors, Shifting Priorities with Lynn Silipigni Connaway. Back then we called it the User Behavior Report. Not a catchy title, but it broadly reflected what we both studied. Our intention was to learn about each other’s research and bring our experiences, perspectives, and research together under one umbrella.
You may be wondering why we had to learn about each other’s research, given that we worked for the same organization. I had actually started at OCLC Research just 6 months earlier, in 2011. Lynn and I had very disparate experiences, perspectives, and paths to OCLC. I earned a Ph.D. in Business Administration – Information Systems; Lynn earned her Ph.D. in Library and Information Science with a minor in Public Policy. Before beginning a research career, I worked in tech companies and Lynn worked in school, academic, and public libraries.
As colleagues, we wanted to explore how our research interests overlapped and begin to think about collaborative user behavior projects. We wanted to develop a common set of ideas we could collectively contribute to through our research. We also wanted to describe the ideas in ways that would be relevant to our intended audiences: librarians, library researchers, and information scientists.
In studying user behavior, we both are interested in how people discover, access, and use/reuse content. In an early outline for our report we wrote “We want to know how people are getting their information, why they are making these choices, and what information or sources are meeting their needs.”
At one of our meetings, Lynn suggested using Ranganathan’s five laws as a framework for our report. I was intrigued. Given my background, I had never heard of them. But as we began reviewing the laws and the literature about them, it was interesting for me to think about them in the context of my research interests.
Over the course of several meetings we discussed our understanding of each law and thought about how our research areas applied. In doing so we began to stretch, adapt, and change each law’s wording to help us more clearly articulate to each other why we thought our research fit.
Take the first law, “books are for use.” Like many researchers, our interests extend beyond books to other physical and digital materials in the library and more generally on the Web. Moreover, we are interested in “how people are getting their information.” Our interpretation of the law reflects these overlapping interests – develop the physical and technical infrastructure needed to deliver physical and digital materials. Our interpretations of the other laws developed in similar ways.
Discussions with a colleague, Andy Havens, prompted us to reorder the laws as well. When we thought about it, we agreed that scarcity of time, not content, is the challenge for people these days. Inundated with information, we want not only quick, but also convenient ways to find, get, and use what we need. And with that the reordering began.
We organized the report so that each chapter could stand on its own. In each chapter, we examine the law in today’s environment given scholars’ interpretations and research in our areas of interest. We also discuss some ideas about how to apply our interpretations of the law given findings from the research.
Although the project began as a means to help us think about the purpose and scope of our research and how our interests overlapped, we also were interested to see what libraries were doing in practice when it came to our interpretation of Ranganathan’s five laws. Could we find examples of what we described?
We found a number of exciting, interesting ways the laws are currently unfolding in practice. We could include only a small fraction, but our hope is that reading the report or listening to the webinar will not only spark new initiatives, but also encourage you to share your current ones.
About Ixchel Faniel
Ixchel Faniel is Associate Research Scientist for OCLC Research. She is currently working on projects examining data reuse within academic communities to identify how contextual information about the data that supports reuse can best be created and preserved. She also examines librarians' early experiences designing and delivering research data services with the objective of informing practical, effective approaches for the larger academic community.