Earlier this April I traveled to the Mile-High City for the Public Library Association’s biennial conference in search of all things e-content. Held at the Colorado Convention Center, PLA 2016 empowered attendees with the tools to return to their library and “Make It Extraordinary.”
As the Digital Public Library of America, DPLA is utilizing its national network of libraries and cultural heritage institutions to explore how it can help improve the state of library ebooks. We are convening community conversations with stakeholders around ebooks to move towards a national digital strategy. In Denver I was part of conversations with these library e-content leaders as they formulated a vision for an umbrella group that would organize communications, streamline efforts, and advocate for our work amongst each other and to the larger ecosystem. DPLA is proud to have a leading role in coordinating these conversations. Click here for more on getting involved.
Three of these e-content stakeholders provided an update on their work at ‘Making Progress in Digital Content’ on Friday morning to a packed audience. Carolyn Anthony, Director of the Skokie Public Library (IL) and Co-Chair of ALA’s Digital Content Working Group, reported on progress in pushing publishers for better licensing models (slowly improving) and overall trends (ebook market down; self-publishing up). Veronda Pitchford of the Reaching Across Illinois Library System (RAILS) laid out the problems (budgets, platform fatigue) and called for librarians to unite in telling publishers and vendors what they want. Micah May of The New York Public Library described the IMLS-funded Library E-content Access Project (LEAP), which will create a library-owned marketplace, and demoed Open eBooks, NYPL’s first iteration of their SimplyE platform. DPLA is a partner on LEAP and Open eBooks and is working with the community to address the opportunities and challenges Carolyn and Veronda identified.
Through Open eBooks DPLA hopes to further highlight the issues with diversity in children’s books and use the initiative as an opportunity to bring diverse authors and content to kids. A Book Buzz session featuring the youth divisions of Little, Brown, Macmillan, Random House and Disney previewed upcoming children’s books which strongly featured diversity in characters, authors, and genres. A rousing discussion of diversity in children’s lit ensued, with publishers crediting librarians for helping to raise awareness of the issue. Issues discussed included lobbying the Book Industry Study Group to improve BISAC subject headings to better reflect diversity in metadata, and a need to push for characters that are diverse but also don’t play into stereotypes.
Stay tuned for more updates on DPLA + Ebooks!
IAL Grant Applications due by May 9
The Department of Education today issued a notice in the Federal Register clarifying that 50 percent of all Innovative Approaches to Literacy grant funds, more than $13 million, are reserved for use by school libraries. In it, the Department stated categorically that last year’s Consolidated Appropriations Act committee report directed DOE to “ensure that no less than 50 percent of IAL funds go to applications from LEAs (on behalf of school libraries)…”
Today’s notice follows up on the Department’s earlier release of April 7. As noted previously in District Dispatch, all eligible applicants seeking a grant have until May 9, 2016 to submit their proposals. DOE is expected to announce its grant awards in July.
To be eligible, a school library must be considered a “high-need” Local Education Agency (LEA), meaning that at least 25 percent of its students aged 5 – 17 are from families with incomes below the poverty line (or are similarly defined by a State educational agency). A grant application must include: a program description of proposed literacy and book distribution activities; grade levels to be served or the ages of the target audience; and a description of how the program is supported by strong theory. Additional information, like timelines and results measurement methods, is also required. DOE also will consider programs that seek to integrate the use of technology tools, such as e-readers, into addressing literacy needs.
According to DOE, priority consideration for IAL funding is given to programs that include book distribution and childhood literacy development activities, and whose success can be demonstrated. Additional “points” in assessing competing grant proposals may be awarded to an application that meets additional program objectives. As detailed in the DOE’s Notice, there are many such additional goals, including distributing books to children who may lack age-appropriate books at home to read with their families.
The post DOE confirms half of IAL funds reserved for school libraries appeared first on District Dispatch.
DuraSpace News: AVAILABLE: Recording and Slides from April 21 LYRASIS and DuraSpace CEO Town Hall Meeting
Austin, TX On April 21, 2016, Robert Miller, CEO of LYRASIS and Debra Hanken Kurtz, CEO of DuraSpace presented the second in a series of online Town Hall Meetings. They reviewed how their organizations came together to investigate a merger in order to build a more robust, inclusive, and truly global community with multiple benefits for members and users. They also unveiled a draft mission statement for the merged organization.
Austin, TX The Fedora community is currently in the initial phases of drafting a standards-based application programming interface (API) specification that will result in a stable, independently-versioned Fedora RESTful API. A Fedora API specification will be a significant milestone for the project and the community enabling a concrete and common understanding of Fedora's role in an institution's infrastructure ecosystem.
It was a standing room only crowd at today’s confirmation hearing of Dr. Carla Hayden, President Obama’s nominee to serve as Librarian of Congress, with the Senate Committee on Rules and Administration. The hearing marks the first step in the Senate review process.
Three Maryland senators (one former) presented Dr. Carla Hayden to the committee. Former Senator Paul Sarbanes (and current Enoch Pratt Free Library board member) joined Dr. Hayden and Senators Barbara Mikulski and Ben Cardin at the microphone to open the hearing.
“It would be a great, great day for the nation, but a loss for Baltimore,” if Dr. Hayden were confirmed, said Senator Mikulski in her introduction to Senate colleagues. She highlighted Dr. Hayden’s ability to work with everyone from “electeds” to people in both wealthy and “hard scrabble” neighborhoods, and referenced the fact that the library stayed open in the wake of massive protests that followed Freddie Gray’s death in police custody. Senator Cardin recognized her leadership not only of the Enoch Pratt Free Library but also the Maryland State Library Resource Center, including managing technology transitions and capital improvements. “Dr. Hayden is the best qualified and will bring the respect that is needed,” Senator Cardin said.
“(Dr. Hayden) is an extraordinarily able, committed person,” said Senator Sarbanes. “The nation will be extremely well-served (by her) and I strongly urge her confirmation.”
Dr. Hayden’s testimony shared the evolution of her career, as well as changes across the profession and the Library of Congress. “As I envision the future of this venerable institution, I see it growing its stature as a leader not only in librarianship but in how people view libraries in general,” she said. “As more of its resources are readily available for everyone to view online, users will not need to be in Washington, D.C.; everyone can have a sense of ownership and pride in this national treasure.”
Committee Chair Senator Roy Blunt (R-MO) followed the Maryland senators with a recollection of his own visit to the Ferguson (MO) Public Library and an acknowledgement of the “big job” ahead for the future Librarian of Congress. The big job was made plain in questions from committee members that largely focused on the modernization of the Library’s technology infrastructure, the future of the Copyright Office, and public access to reports from the Congressional Research Service.
Throughout, though, the questions were open and respectful—even warm and encouraging—on both sides of the political aisle. During a hotly contested presidential election year, it was a welcome respite and encouraging sign for Dr. Hayden’s ultimate confirmation. You can watch a webcast of the hearing here.
The ALA also submitted to committee members yesterday a letter of support for Dr. Hayden’s nomination signed by more than 20 leading national nonprofit organizations; two dozen educational institutions (ranging from community colleges to the Big Ten and Ivy League); two dozen academic libraries from every corner of the country; more than a score of national library groups; and virtually all of the nation’s state library associations.
Stay tuned to the District Dispatch for the most current news related to the confirmation process.
The post ALA Past President Carla Hayden receives warm Senate welcome appeared first on District Dispatch.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
One of the minor annoyances about using Emacs on Mac OS is that the PATH environment variable isn't set properly when you launch Emacs from the GUI (that is, the way we always do it). This is because the Mac OS GUI doesn't really care about the shell as a way to launch things, but if you are using brew, or other packages that install command line tools, you do.
Apple has changed the way that the PATH is set over the years, and the old environment.plist method doesn't actually work anymore, for security reasons. For the past few releases, the official way to properly set up the PATH is to use the path_helper utility program. But again, that only really works if your shell profile or rc file is run before you launch Emacs.
So, we need to put a bit of code into Emacs' site-start.el file to get things set up for us:
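(when (file-executable-p "/usr/libexec/path_helper")
  ;; Ask path_helper for the system PATH and capture it as a string.
  (let ((path (shell-command-to-string
               "eval `/usr/libexec/path_helper -s`; echo -n \"$PATH\"")))
    ;; Export PATH to subprocesses that are started through a shell.
    (setenv "PATH" path)
    ;; exec-path is what Emacs searches when it runs programs directly.
    ;; Assumption: keep exec-directory (Emacs' own helper programs) last,
    ;; as in Emacs' default exec-path.
    (setq exec-path (append (parse-colon-path path)
                            (list exec-directory)))))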
This code runs the path_helper utility, saves the output into a string, and then uses the string to set both the PATH environment variable and the Emacs exec-path lisp variable, which Emacs uses to run subprocesses when it doesn't need to launch a shell.
If you are using the brew version of Emacs, put this code in /usr/local/share/emacs/site-lisp/site-start.el and restart Emacs.
This blog post was written by Riyadh Al Balushi from the Sultanate of Oman.
I recently co-authored with Sadeek Hasna a report that looks at the status of open data in the Arab World and the extent to which governments succeed or fail in making their data available to the public in a useful manner. We decided to use the results of the Global Open Data Index as the starting point of our research because the Index covered all the datasets that we chose to examine for almost all Arab countries. Choosing to use the Global Open Data Index as a basis for our paper saved us time and provided us with a systematic framework for evaluating how Arab countries are doing in the field of open data.
We chose to examine only four datasets, namely: the annual budget, legislation, election results, and company registration data. Our selection was driven by the fact that most Arab countries already have published data in this area and therefore there is content to look at and evaluate. Furthermore, most of the laws of the countries we examined make it a legal obligation on the government to release these datasets and therefore it was more likely for the government to make an effort to make this data public.
Our analysis uncovered that there are many good examples of government attempts at releasing data in an open manner in the Arab World. Examples include the website of the Ministry of Finance of the UAE, which releases the annual budget in Excel format; the legislation website of Qatar, which publishes the laws in text format and explicitly applies a Creative Commons license to the website; the Elections Committee website of Egypt, which releases the elections data in Excel format; and the website of the Company Register of Bahrain, which does not make the data directly available for download, but provides a very useful search engine to find all sorts of information about companies in Bahrain. We also found several civil society projects and business initiatives that take advantage of government data, such as Mwazna – a civil society project that uses the data of the annual budget in Egypt to communicate to the public the financial standing of the government in a visual way, and Al Mohammed Network – a business based on the legislation data in the Arab World.
What was interesting is that even though many Arab countries now have national open data initiatives and dedicated open data portals, none of the successful open data examples in the Arab World are part of the national data portals; they are operated independently by the departments responsible for creating the data in question. While the establishment of these open data portals is a great sign of the growing interest in open data by Arab governments, in many circumstances these portals appear to be of very limited benefit, primarily because the data is usually out of date and incomplete. For example, the Omani open data portal provides population data up to the year 2007, while Saudi Arabia’s open data portal provides demographic data up to the year 2012. In some cases, the data is not properly labeled, and it is impossible for the user to figure out when the data was collected or published. An example of this would be the dataset for statistics of disabilities in the population on the Egyptian government open data page. The majority of the websites seem to have been created through a one-off initiative that was never later updated, probably in response to the global trend of improving e-government services. The websites are also very hard to navigate and are not user-friendly.
Another problem we noticed, which applies to the majority of government websites in the Arab World, is that very few of these websites license their data using an open license and instead they almost always explicitly declare that they retain the copyright over their data. In many circumstances, this might not be in line with the position of domestic copyright laws that exempt official documents, such as the annual budget and legislation, from copyright protection. Such practices confuse members of the public and give the impression to many that they are not allowed to copy the data or use it without the permission of the government, even when that is not true. Another big challenge for utilising government data is that many Arab government websites upload their documents as scanned PDF files that cannot be read or processed by computer software. For example, it is very common for the annual budget to be uploaded as a scanned PDF file when instead it would be more useful to the end user if it was uploaded in a machine-readable format such as Excel or CSV. Such formats can easily be used by journalists and researchers to analyse the data in more sophisticated ways, and enable them to create charts that help present the data in a more meaningful manner. Finally, none of the datasets examined above were available for download in bulk, and each document had to be downloaded individually. While this may be acceptable for typical users, those who need to do a comprehensive analysis of the data over an extensive period of time will not be able to do so efficiently. For example, if a user wants to analyse the change in the annual budget over a period of 20 years, he or she would have to download 20 individual files. A real open data portal should enable the user to download the whole data in bulk.
In conclusion, even though many governments in the Arab World have launched initiatives to release and open their data to the public, for these initiatives to have a meaningful impact on government efficiency, business opportunities, and civil society participation, the core principles of open data must be followed. There is an improvement in the amount of data that governments in the Arab World release to the public, but more work needs to be done. For a detailed overview of the status of open data in the Arab World, you can read our report in full here.
Austin, TX David Wilcox, Fedora product manager and Andrew Woods, Fedora tech lead, will offer a workshop entitled, "Publishing Assets as Linked Data with Fedora 4" at the Library Publishing Forum (LPForum 2016) to be held at the University of North Texas Libraries, Denton, Texas on May 18 from 1:00 PM-3:30 PM. All LPForum 2016 attendees are welcome—there is no need to pre-register for this introductory-level workshop.
The MPG/SFX server will be updated to a new database (MariaDB) on Wednesday morning. The downtime will begin at 8 am and is scheduled to last until 9 am.
We apologize for any inconvenience.
The Confirmation Hearing for Librarian of Congress Nominee, Carla Hayden, by the U.S. Senate Committee on Rules and Administration, will air LIVE on C-SPAN3, C-SPAN Radio and C-SPAN.org on Wednesday, April 20, 2016 at 2:15pm ET.
The hearing will also be webcast from the Senate Committee on Rules and Administration hearing page. The webcast will be available approximately 15 minutes prior to the start of the hearing, and the archive will be available approximately 1 hour after the completion of the hearing.
The post LIVE: Watch the confirmation hearing for Dr. Carla Hayden appeared first on District Dispatch.
On April 19, 2016, the Institute of Museum and Library Services (IMLS) announced the 10 recipients of the 2016 National Medal for Museum and Library Service, the nation’s highest honor given to museums and libraries for service to the community. Now in its 22nd year, the National Medal celebrates libraries and museums that “respond to societal needs in innovative ways, making a difference for individuals, families, and their communities.”
The award will be presented in Washington, D.C. on June 1st. To learn more about the 2016 National Medal winners and 30 finalists, click here.
The 2016 National Medal recipients are:
- Brooklyn Public Library (Brooklyn, N.Y.)
- The Chicago History Museum (Chicago, Ill.)
- Columbia Museum of Art (Columbia, S.C.)
- Lynn Meadows Discovery Center for Children (Gulfport, Miss.)
- Madison Public Library (Madison, Wis.)
- Mid-America Science Museum (Hot Springs, Ark.)
- North Carolina State University Libraries (Raleigh, N.C.)
- Otis Library (Norwich, Conn.)
- Santa Ana Public Library (Santa Ana, Calif.)
- Tomaquag Museum (Exeter, R.I.)
“This year’s National Medal recipients show the transforming role of museums and libraries from educational destinations to full-fledged community partners and anchors,” said Dr. Kathryn K. Matthew, director of the Institute of Museum and Library Services. “We are proud to recognize the extraordinary institutions that play an essential role in reaching underserved populations and catalyzing new opportunities for active local involvement.”
The Institute of Museum and Library Services (IMLS) is the primary source of federal support for the nation’s 123,000 libraries and 35,000 museums.
The post 2016 winners of the National Medal for Museum and Library Service announced appeared first on District Dispatch.
Last week, I participated in 3D/DC, an annual Capitol Hill event exploring 3D printing and public policy. Programming focused on 3D printing’s implications for education, the arts, the environment, the workforce and the public good. In my reflections on last year’s 3D/DC, I averred that the event was “a good day for libraries.” This year, “good” graduated to “great.” Libraries were mentioned as democratizers of technology too many times to count over the course of the day, and the library community had not one, but two representatives on the speaker slate.
It was my privilege to be a panelist for a program exploring the role of 3D printing in closing the workforce skills gap. Thankfully, my national-level outlook on how libraries harness 3D printing to build critical workforce skills was buttressed by the on-the-ground perspective of Library Associate and Maker Extraordinaire Adam Schaeffer of the Washington, D.C. Public Library (DCPL). The other participants on the panel were Robin Juliano of the White House National Economic Council, Gad Merrill of TechShop and Diego Tamburini of Autodesk.
I described libraries as informal learning labs; places where people are free to use digital technologies like 3D printers, laser cutters and computer numerical control (CNC) routers to build advanced engineering skills through the pursuit of their personal creative interests. I argued that in combination with the suite of other job search and skill-building services libraries provide, library 3D printers are powerful tools for fostering workforce preparedness. Adam Schaeffer offered powerful anecdotes to illustrate this point. His kinetic overview of the wide array of products he’d helped patrons launch with a 3D printer was a tour de force of the 21st century library’s power as an innovative space.
It was a pretty light lift to convince those in attendance of the value of library 3D printing services to the task of workforce development. Nearly every word I, and my Library Land compatriot, Adam, uttered in furtherance of this effort was met with a sturdy nod or a knowing grin. I found this surprising at first – but after a minute, I realized it was in keeping with an ongoing trend. In my just-over two years at ALA, I’ve seen a steady proliferation of stories in popular news and blog outlets about 3D printers being used in libraries to build prototypes of new products and foster engineering and design skills. As a result, library “making” has reached an inflection point. It’s no longer seen as quaint, cute or trivial; it’s acknowledged as a means of advancing personal and societal goals.
That this is the case is a testament to the ingenuity of library professionals. From New York to California and everywhere in between, the men and women of the library community have built communities around their 3D printers; library makerspaces have become cathedrals of creativity – and their congregations are growing by the day. I know…Because last week, I was preaching to the converted. To all the library makers out there: keep up the good work.
ALA would like to thank Public Knowledge for including libraries in 3D/DC this year. I’d personally like to thank Public Knowledge for the opportunity to speak during the event.
User rblandau at MIT-Informatics has a high-level simulation of distributed preservation that looks like an interesting way of exploring these questions. Below the fold, my commentary.
rblandau's conclusions from the first study using the simulation are:
- Across a wide range of error rates, maintaining multiple copies of documents improves the survival rate of documents, much as expected.
- For moderate storage error rates, in the range that one would expect from commercial products, small numbers of copies suffice to minimize or eliminate document losses.
- Auditing document collections dramatically improves the survival rate of documents using substantially fewer copies (than required without auditing).
- Auditing is expensive in bandwidth. We should work on (cryptographic) methods of auditing that do not require retrieving the entire document.
- Auditing does not need to be performed very frequently.
- Glitches increase document loss more or less in proportion to their frequency and impact. They cannot be distinguished from overall increases in error rate.
- Institutional failures are dangerous in that they remove entire collections and expose client collections to higher risks of permanent document loss.
- Correlated failures of institutions could be particularly dangerous in this regard by removing more than one copy from the set of copies for long periods.
- We need more information on plausible ranges of document error rates and on institutional failure rates.
My comments on some of these conclusions:
- Auditing document collections dramatically improves the survival rate - no kidding! If you never find out that something has gone wrong you will never fix it, so you will need a lot more copies.
- Auditing is expensive in bandwidth - not if you do it right. There are several auditing systems that do not require retrieving the entire document, including LOCKSS, ACE and a system from Mehul Shah et al at HP Labs. None of these systems is ideal in all possible cases, but their bandwidth use isn't significant in their appropriate cases. And note the beneficial effects of combining local and networked detection of damage.
- Auditing does not need to be performed very frequently - it depends. Oversimplifying, the critical parameters are MeanTimeToFailure (MTTF), MeanTimeToDetection (MTTD) and MeanTimeToRepair (MTTR), and the probability that the system is in a state with an un-repaired failure is (MTTD+MTTR)/MTTF. MTTD is the inverse of the rate at which auditing occurs. A system with an un-repaired failure is at higher risk because its replication level is reduced by one.
- Institutional failures are dangerous - yes, because repairs are not instantaneous. At scale, MTTR is proportional to the amount of damage that needs to be repaired. The more data a replica loses, the longer it will take to repair, and thus the longer the system will be at increased risk. And the bandwidth that it uses will compete with whatever bandwidth the audit process uses.
- Correlated failures of institutions could be particularly dangerous - yes! Correlated failures are the elephant in the room when it comes to simulations of systems reliability, because instead of decrementing the replication factor of the entire collection by one, they can reduce it by an arbitrary number, perhaps even to zero. If it gets to zero, it's game over.
- We need more information - yes, but we probably won't get much. There are three kinds of information that would improve our ability to simulate the reliability of digital preservation:
  - Failure rates of storage media. The problem here is that storage media are (a) very reliable, but (b) less reliable in the field than their specification. So we need experiments, but to gather meaningful data they need to be at an enormous scale. Google, NetApp and even Backblaze can do these experiments; preservation systems can't, simply because they aren't big enough. It isn't clear how representative of preservation systems these experiments are, and in any case it is known that media cause only about half the failures in the field.
  - Failure rates of storage systems from all causes including operator error and organizational failure. Research shows that the root cause for only about half of storage system failures is media failure. But this means that storage systems are also so reliable that collecting failure data requires operating at large scale.
  - Correlation probabilities between these failures. Getting meaningful data on the full range of possible correlations requires collecting vastly more data than for individual media reliability.
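To make the (MTTD+MTTR)/MTTF relationship from the audit-frequency comment above concrete, here is a minimal Python sketch. The function name and all the numbers (in days) are made up for illustration; they do not come from rblandau's simulation or from any real system. The sketch also assumes, following the oversimplification above, that MTTD is simply the audit interval.

```python
def unrepaired_fraction(mttf: float, mttd: float, mttr: float) -> float:
    """Approximate probability that a replica has an un-repaired failure.

    Uses the oversimplified estimate (MTTD + MTTR) / MTTF. All three
    arguments must be in the same time unit. The result is capped at 1.0.
    """
    if mttf <= 0:
        raise ValueError("MTTF must be positive")
    return min(1.0, (mttd + mttr) / mttf)


# Hypothetical numbers, in days: a replica fails about once every 1000 days
# and takes 5 days to repair once the damage has been noticed.
MTTF_DAYS = 1000.0
MTTR_DAYS = 5.0

# Treat the audit interval as MTTD and see how the estimate changes.
for audit_interval_days in (365.0, 30.0, 1.0):
    p = unrepaired_fraction(MTTF_DAYS, audit_interval_days, MTTR_DAYS)
    print(f"audit every {audit_interval_days:>5.0f} days -> {p:.3f}")
```

With these invented numbers the estimate falls from 0.370 for yearly audits to 0.035 for monthly ones, and it can never drop below MTTR/MTTF (0.005 here) however often the audit runs. That is one way to see why "how often should we audit" depends on the other two parameters.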
Austin, TX The LYRASIS and DuraSpace Boards announced an "Intent to Merge" the two organizations in January. As part of ongoing merger investigations LYRASIS CEO Robert Miller and DuraSpace CEO Debra Hanken Kurtz have been working with our communities to share information widely about the proposed merger and to gather input.
IAL Grant Applications Due by May 9
The American Library Association filed comments last week with House and Senate Appropriations Committees in support of funding for the Library Services and Technology Act (LSTA) and Innovative Approaches to Literacy (IAL).
As the Appropriations Committees begin their consideration of 12 appropriations bills, ALA is urging the Committees to fund LSTA at $186.6 million and IAL at $27 million for FY 2017. Both programs received increases in last year’s FY 2016 funding bills and were included in the President’s February budget request to Congress.
“Without LSTA funding, these and many other specialized programs targeted to the needs of their communities across the country likely will be entirely eliminated, not merely scaled back. In most instances, LSTA funding (and its required but smaller state match) allows libraries to create new programs for their patrons,” noted Emily Sheketoff in comments to both Committees.
The $186.6 million funding level for LSTA mirrors last year’s request to Congress from the President and is also supported by “Dear Appropriator” letters recently circulated in the Senate and House for Members’ signatures. LSTA was funded at $155.8 million for FY 2016 and ALA expressed concern that the President is requesting only $154.8 million for FY 2017.
In supporting $27 million in IAL funding for school libraries, ALA commented that “studies show that strong literacy skills and year-round access to books is a critical first-step towards literacy and life-long learning. For American families living in poverty, access to reading materials is severely limited. These children have fewer books in their homes than their peers, which hinders their ability to prepare for school and to stay on track.”
Congress provided $27 million in FY 2016 IAL funding and the President requested the same level for FY 2017. IAL, which dedicates half of its resources for school libraries, was authorized in last year’s Every Student Succeeds Act. “Dear Appropriator” letters circulated in the Senate and House on its behalf called for $27 million in FY 2017 funding.
ALA reminds its members that the Department of Education recently announced that it has opened its FY 2016 window for new IAL grant applications. The DOE’s announcement with full application filing details is available online. Grant applications must be submitted by May 9, 2016.
In additional support for library funding, LSTA and IAL were highlighted in the annual Committee for Education Funding (CEF) Budget Response to Congress: Education Matters: Investing in America’s Future. The CEF budget response, which reserves two chapters for the LSTA and IAL programs, provides an explanation of the programs, examples of how funds have been used, and a justification for the funding levels sought.
The post ALA urges House and Senate approps subcommittees to support LSTA, IAL appeared first on District Dispatch.
That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by John Riemer of UCLA. With increasing expectations that research data creation made possible through grant funding will be archived and made available to others, many institutions are becoming aware of the need to collect and curate this new scholarly resource. To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering re-using the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept 2015) provided useful background to this discussion.
The discussions revealed a wide range of experiences, from those just encountering researchers who come to them with requests to archive and preserve their research data to those who have been handling research data for some years. National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences. Canada is developing a national network called Portage for the “shared stewardship of research data”.
The US-based metadata managers were split about whether to have a single repository for all data or a separate repository for research data, although there seems to be a movement to separate data that is to be re-used (providing some capacity for computing on it) from data that is only to be stored. A number of fields have a discipline-based repository, or researchers take advantage of a third-party service such as DataCite, also used for discovery. The library can fill the gap for research data without a better home.
Recently-published Building Blocks: Laying the Foundation for a Research Data Management Program includes a section on metadata:
Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data and language).
A number of institutions have developed templates to capture metadata in a structured form. Some metadata managers noted the need to keep such forms as simple as possible as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But how to inspire data creators to produce quality metadata? New ways of training and outreach are needed.
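As a rough illustration of what a simple, structured template could look like, here is a minimal Python sketch that renders a readme.txt from a handful of the fields named in the Building Blocks excerpt. The class name `DatasetReadme`, the function `render_readme`, the choice of which fields are mandatory, and the `[not provided]` placeholder are all assumptions made for this example; they are not taken from any institution's actual template.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DatasetReadme:
    """Hypothetical readme template; field names follow the quoted list."""
    title: str
    author: str
    date: str
    funder: str = ""
    copyright_holder: str = ""
    description: str = ""
    keywords: List[str] = field(default_factory=list)
    file_formats: List[str] = field(default_factory=list)
    software_used: List[str] = field(default_factory=list)
    language: str = ""


def render_readme(meta: DatasetReadme) -> str:
    """Render one 'Label: value' line per field, flagging empty fields."""
    lines = []
    for f in fields(meta):
        value = getattr(meta, f.name)
        if isinstance(value, list):
            value = ", ".join(value)
        label = f.name.replace("_", " ").capitalize()
        lines.append(f"{label}: {value if value else '[not provided]'}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    example = DatasetReadme(
        title="Sample survey data",
        author="A. Researcher",
        date="2016-04-01",
        keywords=["survey", "sample"],
    )
    print(render_readme(example))
```

Keeping only three fields mandatory reflects the point above about keeping forms short; flagging the empty fields makes the gaps visible to whoever follows up with the researcher.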
We also had general agreement on the data elements required to support re-use by others: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers and geospatial and temporal data (where relevant). Metadata schema used include Dublin Core, MODS (Metadata Object Description Schema) and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards. The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.
The importance of identifiers for both the research data and the creator has become more widely acknowledged. DOIs, Handles and ARKs (Archival Resource Key) have been used to provide persistent access. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID (Open Researcher and Contributor ID) and ISNI (International Standard Name Identifier) are in use to identify data creators uniquely.
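Creator identifiers are easy to mistype when researchers enter them by hand, so a deposit form might check them before accepting a record. The sketch below validates the final check character of an ORCID iD, which is computed with the ISO 7064 MOD 11-2 algorithm. The function names are assumptions for this example, and the check only confirms that an iD is well formed, not that it is registered or belongs to the right person.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    if not all(ch in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_well_formed_orcid("0000-0002-1825-0097"))  # ORCID's documented sample iD
    print(is_well_formed_orcid("0000-0002-1825-0098"))  # last character altered
```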
Some have started to analyze the metadata requirements for the research data life cycle, not just the final product. Who are the collaborators? How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? The University of Michigan’s Research Data Services is designed to assist researchers during all phases of the research data life cycle.
Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge, data modeling, and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and offering services that help ensure that their research data will be accessible if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects—and the data creators will have the satisfaction of knowing that their data has benefitted others.
About Karen Smith-Yoshimura
Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.
This is a guest post by Carmel Curtis.
Over the past eight months I have been working as the National Digital Stewardship Resident at the Brooklyn Academy of Music. BAM is the oldest continually running performing arts center in the country and is home to a range of artistic expressions in dance, theater, film, and beyond. Over 150 years old, BAM has a rich history.
I have been working on a records management project at BAM. My mentor, processing archivist Evelyn Shunaman, and I have conducted 41 hour-long interviews with all divisions, departments and sub-departments to get a sense of what and how many electronic records are being created, saved and accessed. Then we created or revised departmental Record Retention Schedules to ensure they reflect BAM’s current workflows and practices.
Here are some of the basics of records retention and tips on creating a Records Retention Schedule.
A Records Retention Schedule is an institutional or organizational policy that defines the information that is being created and identifies retention requirements based on legal, business and preservation requirements. An RRS can take many forms. Example 1 shows our RRS spreadsheet.
| Record Series Title | Description | Retention Period | Transfer to Archives |
| --- | --- | --- | --- |
| Category of record | Explanation of record category | Time period records are retained | Whether or not records are sent to the Archives |
| … | … and results from survey conducted every 3 years on BAM audience demographic. | … | yes |

Example 1. BAM RRS spreadsheet.
An RRS is a way for an institution to:
- Be accountable to any legal requirements – An RRS is a policy that ensures records are retained in accordance with state or federal legal requirements. It provides an outline for the minimum legal requirements related to the retention and destruction of records.
- Identify archivally significant materials – Appraisal and selection are not dead. While storage may be increasing in capacity and decreasing in cost, there is still considerable need for decisions to be made around what comes into the Archive and what does not. An RRS can help provide a framework for this decision making process.
- Identify when things can be deleted – People want permission to be able to delete their digital content. Similar to paper and other physical based records, there is little incentive to get rid of things until one runs out of space. With electronic records, it is not uncommon to purchase more storage instead of deleting unnecessary files. However, digital clutter is a real thing that can induce stress and anxiety as well as make retrievability challenging. Having an RRS can help reduce digital clutter by identifying what records can be deleted and when.
- Assist the Archive in preservation planning – Once an RRS has been created, it can be a helpful tool in planning for the specific preservation needs of the categories of records coming into the Archive. With the assistance of an RRS, you can think through file-format identification and decisions around normalization, requirements around minimum associated metadata, and estimates of how much information will need to be transferred into the Archive and thus how much space will be required.
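To make the "when can things be deleted" and "what comes into the Archive" functions concrete, here is a minimal Python sketch of an RRS row and a disposition check. It models the four pieces of information described in Example 1 (record category, explanation, retention time period, and whether records go to the Archives). The class and function names, the sample series, and their retention periods are hypothetical and are not BAM's actual schedule; real retention periods should come from legal and business requirements.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RecordSeries:
    """One row of a hypothetical RRS."""
    title: str
    description: str
    retention_years: Optional[int]  # None means retain permanently
    transfer_to_archives: bool


def disposition(series: RecordSeries, created: date, today: date) -> str:
    """Say what should happen to a record created on `created` as of `today`."""
    if series.retention_years is None:
        return "retain"
    target_year = created.year + series.retention_years
    try:
        cutoff = created.replace(year=target_year)
    except ValueError:
        # Record created on Feb 29 and the target year is not a leap year.
        cutoff = created.replace(year=target_year, day=28)
    if today < cutoff:
        return "retain"
    if series.transfer_to_archives:
        return "transfer to Archives"
    return "eligible for deletion"


if __name__ == "__main__":
    # Invented series and periods, for illustration only.
    surveys = RecordSeries("Audience surveys", "Survey instruments and results", 3, True)
    drafts = RecordSeries("Working drafts", "Superseded draft documents", 1, False)
    print(disposition(surveys, date(2012, 1, 15), date(2016, 4, 1)))
    print(disposition(drafts, date(2015, 1, 1), date(2016, 4, 1)))
```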
Records management may be different than archives management but when there is no Records Manager, the responsibility often falls on the Archivist. While records management is concerned with all information created, not exclusively information that has archival significance, it can be useful for the Archive to have a comprehensive picture of work that is being done across the institution. Having a wide-ranging understanding of workflows will only strengthen decisions around selection of what needs to come into the Archive.
So how do you begin? Here are some tips on developing an RRS based off of my experience at BAM.
- Work with IT. While creating an RRS does not necessarily require technical expertise or someone with an information technology background, the eventual transfer of materials into the Archive and the management of an electronic repository will take some technical know-how. Collaborating with IT at an early stage will only improve relations down the road. If you don’t have an IT Department, it is okay! The Archivist often wears many hats.
- Talk to as many staff members as possible! Those who create records are the experts in the records they are creating. Trust their words and do not aim to alter their workflows. Work with them! Conduct an interview with a general framework, not a strict roadmap. Give people space to speak and guide them when necessary. Consider this interview outline:
  - Walk through the general responsibilities of your department with an emphasis on what kinds of records or information are being created.
  - Who creates the record(s)?
  - How is it created? Specific software?
  - What format is it?
  - How is it identified (filename/folder)? Standard naming conventions?
  - Are there multiple copies? Multiple versions? How are finals identified?
  - Where is it stored?
  - How long is it used/accessed/relevant to your department?
  - What is the historical significance/long-term research value of information created by your department?
- Make people feel comfortable and not embarrassed. The Archive asking about records can have an intimidating feel. Few people are as organized as they would like to be. These interviews should not be about shaming people but are an opportunity to listen and identify issues across your institution.
- To record or not to record? To transcribe or not to transcribe? Think carefully about the decision to audio or video record these interviews. You want your interviewee to feel comfortable and you also want to be able to refer back to things you may have missed. Transcribing interviews can be helpful but it takes a considerable effort. Consider the amount of time and resources that are available to you.
- Determine a format for your RRS. Consider making a spreadsheet with the column headings from Example 1.
- Develop Record Series Titles based off of workflows present within the department. To encourage compliance with an RRS, make the categories reflect workflows within your institution as closely as possible. If you think of it as a map or a crosswalk, developing an RRS to mirror record types and folder structures currently being used will only make things easier. Directly referencing language used by departments within the Records Series Title or Description will facilitate the process of compliance.
- Determine retention periods and whether or not records should be transferred to the Archive. Use this decision tree to help establish appropriate time periods.
- Get legal advice. For record series with legal considerations, consult your legal department. If there is no legal department, look at existing records retention schedules and at your local legal requirements. Here are some useful resources:
  - New York State Archives Retention and Disposition Schedule for Government Based Records – Includes useful justifications of all retention categories.
  - IRS – How Long Should I Keep Records? – Guidance on financial based records.
  - Society for Human Resource Management’s Federal Records Retention Requirements – Legal guidance on retention periods for HR based records.
It is always best to look up the underlying laws cited in example RRSs to confirm applicable interpretation.
- To help mitigate duplication, consider limiting records transferred to the Archive exclusively to the creating department. In other words, for information shared across departments or created collaboratively across departments, consider getting the department that holds the final version to transfer the record to the Archive, as opposed to all departments that have a copy.
- Make a note of information that is required to be transferred to the Archive but is stored in databases or other systems used by your institution. If any information that is required to be transferred into the Archive is stored on removable media or third party proprietary systems, make sure these are flagged and a specific archival ingest process is developed for these records.
- Appoint a departmental records coordinator and require yearly approval. Designating responsibility to a specific person will dissuade finger pointing. If every department has a specific records retention coordinator, there will be a person with whom the Archives can communicate, thus improving the likelihood of compliance. It is important to make sure that the RRS is reviewed annually to ensure that it continues to reflect current workflows and practices.
Writing an RRS is a big step; however, it is only the beginning. At BAM, now that we have completed revisions on our RRS, we are working on developing workflows for transferring materials into the Archive.
Using TreeSize Pro, we have scanned the network storage systems of all departments and have estimated the amount of data that will need to be brought into the Archives based off of the RRS.
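If you do not have a dedicated disk-usage tool, a rough version of this estimate can be scripted. The Python sketch below is not how TreeSize Pro works and is not the process we used; it is a simple stand-in that totals file sizes per top-level folder of a share and then sums the folders that the RRS marks for transfer. The function names and the idea that each top-level folder maps to one record series are assumptions made for this example.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable


def folder_sizes(root: str) -> Dict[str, int]:
    """Return total bytes per immediate subfolder of `root` ('.' for loose files)."""
    totals: Dict[str, int] = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            try:
                totals[top] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
    return dict(totals)


def estimate_transfer_bytes(sizes: Dict[str, int], folders: Iterable[str]) -> int:
    """Sum the sizes of the folders flagged for transfer to the Archive."""
    return sum(sizes.get(folder, 0) for folder in folders)


if __name__ == "__main__":
    sizes = folder_sizes(".")
    for folder, size in sorted(sizes.items()):
        print(f"{folder}: {size} bytes")
```

An estimate like this counts everything currently in a folder, including duplicates and drafts, so it is best read as an upper bound on what will actually be transferred.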
We are now working to establish timelines and requirements for when and how departments should transfer materials to the Archive. Presently, we are testing AVPS’s Exactly file delivery tool as a way to receive files and require minimum metadata associated with deposits. Follow the NDSR-NY blog for updates on this phase of the project as it continues to unfold.