Feed aggregator

LITA: Another upcoming LITA web course and webinar, register now!

planet code4lib - Tue, 2016-03-15 14:00

Register now for the next great LITA continuing education web course and webinar offerings.

Don’t miss out on this repeat of last spring’s sold-out LITA webinar:

Yes, You Can Video: A how-to guide for creating high-impact instructional videos without tearing your hair out
Presenters: Anne Burke, Undergraduate Instruction & Outreach Librarian, North Carolina State University Libraries; and Andreas Orphanides, Librarian for Digital Technologies and Learning, North Carolina State University Libraries
Tuesday, April 12, 2016
1:00 pm – 2:30 pm Central Time
Register Online, page arranged by session date (login required)

Have you ever wanted to create an engaging and educational instructional video, but felt like you didn’t have the time, ability, or technology? Are you perplexed by all the moving parts that go into creating an effective tutorial? In this 90-minute session, Anne Burke and Andreas Orphanides will help to demystify the process, breaking it down into easy-to-follow steps, and provide a variety of technical approaches suited to a range of skill sets. They will cover choosing and scoping your topic, scripting and storyboarding, producing the video, and getting it online. They will also address common pitfalls at each stage. This webinar is for anyone wanting to learn more about making effective videos.

Details here and Registration here.

Make the investment in deeper learning with this web course:

Universal Design for Libraries and Librarians
Instructors: Jessica Olin, Director of the Library, Robert H. Parker Library, Wesley College; and Holly Mabry, Digital Services Librarian, Gardner-Webb University.
Starting Monday, April 11, 2016, running for 6 weeks
Register Online, page arranged by session date (login required)

Universal Design is the idea of designing products, places, and experiences to make them accessible to as broad a spectrum of people as possible, without requiring special modifications or adaptations. This course will present an overview of universal design as a historical movement, as a philosophy, and as an applicable set of tools. Students will learn about the diversity of experiences and capabilities that people have, including disabilities (e.g. physical, learning, cognitive, resulting from age and/or accident), cultural backgrounds, and other abilities. The class will also give students the opportunity to redesign specific products or environments to make them more universally accessible and usable. By the end of this class, students will be able to…

  • Articulate the ethical, philosophical, and practical aspects of Universal Design as a method and movement – both in general and as it relates to their specific work and life circumstances
  • Demonstrate the specific pedagogical, ethical, and customer service benefits of using Universal Design principles to develop and recreate library spaces and services in order to make them more broadly accessible
  • Integrate the ideals and practicalities of Universal Design into library spaces and services via a continuous critique and evaluation cycle

Details here and Registration here.

And don’t miss the other upcoming LITA continuing education offerings by checking the Online Learning web page.

Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4268 or Mark Beatty,

HangingTogether: Metadata for archived websites

planet code4lib - Tue, 2016-03-15 02:39

That was the topic discussed recently by OCLC Research Library Partners metadata managers, initiated by Dawn Hale of Johns Hopkins University. For some years now, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.

Examples of archived websites among OCLC Research Library Partnership institutions include:

  • Ivy-Plus collaborative collections: Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA);
  • The New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery and museum websites;
  • Thematic collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions;
  • Teaching materials, such as MIT’s OpenCourseWare (OCW), which aspires to make the content available to scholars and instructors for reuse for the foreseeable future;
  • Government archives, such as the Australian Government Web Archive.

Approaches to web archiving are evolving. Libraries are developing policies regarding content selection, exploring potential uses of archived content and considering the requirements for long-term preservation. Our discussion focused on the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites.

Some of the challenges raised in the discussions:

  • Descriptive metadata requirements may depend on the type of website archived, e.g., transient sites, research data, social media, or organizational sites. Sometimes only the content of the sites is archived when the look-and-feel of the site is not considered significant.
  • Practices vary. Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice depending on who creates the metadata.
  • Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages.
  • Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.
  • The level of metadata granularity (collection, seed/URL, document) may vary based on anticipated user needs, scale of material being crawled, or available staffing.
  • Some websites are archived by more than one institution. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

Some of the issues raised, such as deciding on the correct level of granularity, determining relevance to one’s existing collection, and handling concerns about copyright, are routinely addressed by archivists. Jackie Dooley’s The Archival Advantage: Integrating Archival Experience into Management of Born-Digital Library Materials is applicable to archiving websites as well.

Focus group members agreed we had a common need for community-level metadata best practices applicable to archived websites, perhaps a “core metadata set”. Since the focus group discussions started in January, my colleagues Jackie Dooley and Dennis Massie have convened a 26-member OCLC Research Library Partnership Web Archiving Metadata Working Group with a charge to “evaluate existing and emerging approaches to descriptive metadata for archived websites” and “recommend best practices to meet user needs and to ensure discoverability and consistency”. Stay tuned!


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, senior program officer, works on topics related to creating and managing metadata with a focus on large research libraries and multilingual requirements.


William Denton: Passing blocks to a thread in Sonic Pi

planet code4lib - Tue, 2016-03-15 00:47

A short example of how to pass a block to a thread in Sonic Pi, where it will be run and control will return to you immediately. The block can contain whatever you want. (Thanks to Sam Aaron, creator of Sonic Pi, for this; he sent it to the mailing list but I can’t find the original now.)

    define :play_in_a_thread do |&block|
      in_thread do
        block.call
      end
    end

    play_in_a_thread do
      play :c4
      sleep 180
      play :e4
      sleep 180
      play :g4
    end

    play :c3

The play_in_a_thread function (you can call it whatever you want) will take whatever block you give it, and combined with Sonic Pi’s in_thread this parcels the block off, runs it, and returns control. In this example the notes of the C major chord will take 6 minutes to play (at Sonic Pi’s default tempo of 60 bpm, each sleep 180 waits three minutes), but the two Cs (:c4 and :c3) will play at the same time. You can call play_in_a_thread as many times as you want, from anywhere. You can pass in whatever you want, simple or complex; Sonic Pi will handle it and let the program flow continue uninterrupted.

The &block parameter is part of Ruby (in and on which Sonic Pi is built), and tells the function to expect a block to be passed in. The block operator (it’s also called an ampersand operator) section of the Ruby documentation on methods explains more.
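For readers without Sonic Pi to hand, the same idea can be sketched in plain Ruby. This is only an analogue under stated assumptions: Thread.new stands in for Sonic Pi’s in_thread, and run_in_a_thread and the events array are names made up for the illustration.

```ruby
# Plain-Ruby analogue of play_in_a_thread (hypothetical helper name).
# Thread.new stands in for Sonic Pi's in_thread.
def run_in_a_thread(&block)
  # &block captures the caller's block as a Proc so it can be called later.
  Thread.new { block.call }
end

events = []

worker = run_in_a_thread do
  sleep 0.2                  # the slow work happens off the main thread
  events << :block_finished
end

# Control came back immediately, so this line runs while the block sleeps.
events << :caller_continued

worker.join                  # wait for the block before inspecting the log
puts events.inspect
```

The caller’s entry lands in the log before the block’s, which mirrors how :c3 sounds alongside :c4 instead of waiting six minutes.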

Thanks to Sonic Pi, I’ve learned something new about Ruby.

District Dispatch: Reflections on a South-by experience

planet code4lib - Mon, 2016-03-14 21:15

South by Southwest 2016

Highbrow meets hijinks

If you haven’t been yet, South by Southwest (SXSW) is the all-the-rage conference in Austin that can only be described as highbrow meets hijinks. For every insightful point made at the convention center, there’s at least one beer swilled on Sixth Street. For every transformative technology shown on the expo floor, there’s at least one brisket sandwich consumed in the presence of a fledgling rock troupe. If you’re going, you gotta pace yourself. There’s a dab of regret in that exhortation – I just got back…And I didn’t.

But, anyone who has ever gone would find it hard to blame me. There’s a universe of things to see and do. If SXSW is a marathon, the “EDU” portion – held the week before the rest of the festivities – is the first mile. That’s the portion I attended, and I can personally attest, it kicked things off the starting block with an Olympic twitch. I fought through the South-by foot traffic to catch programs on a wide breadth of topics: from the role hip hop can play in advancing classroom learning, to the ways in which “Watson,” IBM’s oft-advertised new cognitive system, can personalize education and improve student achievement.

ALA and Benetech collaborated on a session about leveraging 3D printers

Photo by Lisa Wadors Verne

As much as I enjoyed drinking in the organized madness, I didn’t attend just to spectate. On the second day of programming, I took the stage with Dr. Lisa Wadors Verne of Benetech to speak about how 3D printers can be leveraged in libraries, museums and schools to create new learning opportunities for students with disabilities. Lisa outlined how 3D printed learning tools can animate the learning process for students with print, learning and physical impairments (If you’re not quite sure what that means, think about how a 3D printed double helix or H2O molecule might bring science to life for a visually impaired student).  I described why libraries, as creative, non-judgmental spaces, are the perfect institutions to support the development of assistive technologies through the use of 3D printing technology.

Courtesy of Lisa Wadors Verne

After our presentation was over, I spoke with several individuals who wanted to learn more – both concerning how to find out more about Benetech’s 3D printing initiative to create educational equity, and concerning the current role libraries play in their communities. In response to the inquiries about Benetech’s initiative, I pointed to Lisa’s blog on a convening last summer that brought together practitioners from anchor institutions and industry to develop ideas for using 3D printers to put all learners on an even keel. I also plugged this District Dispatch post on the convening.

Additionally, I recommended checking out Benetech’s comprehensive “starter guide” for using 3D printed objects in education. I would urge library professionals of all kinds who are interested in educational equity to explore these resources as well. I’ve said it before and I’ll say it again: no one else is doing what Benetech is doing in the 3D printing space. In response to the inquiries about the role today’s libraries play in their communities, I reiterated and expanded upon points I emphasized in the presentation: libraries are one-stop community hubs, replete with informational and digital resources that people of all ages and backgrounds can use to engage in creative learning, seek government services, pursue entrepreneurial opportunities, and a great deal more. I threw out everything I’d learned and experienced over the course of two years in library land. Nonetheless, I must admit, I felt like a tiny voice lost in the whir.

Libraries need greater presence at SXSW

Living and working within the beltway’s competitive policy ecosystem has made me all too familiar with this feeling. It has taught me that convincing anyone of anything requires organized, concerted action. My colleague, Larra Clark, recently wrote an article in the American Libraries blog, “The Scoop,” previewing all of the library programs and activities at SXSW. It’s an impressive list, to be sure. Even so, I am now convinced the library community needs an even greater presence. SXSW has a bounty of content, all of it pithy, packaged and powerful. To truly convince attendees that libraries transform people and communities in the digital age, we need to deploy a mass of library folks to the conference – a veritable army of speakers, makers, movers and shakers who can reach both minds and hearts.

So, if you haven’t thought about proposing a program, or at least attending the conference next year, consider this post a big, loud, formal call to action. I promise: if you go, you won’t regret it.

The post Reflections on a South-by experience appeared first on District Dispatch.

District Dispatch: What does it mean to #ConnectALL?

planet code4lib - Mon, 2016-03-14 20:14

#ConnectALL is a new initiative with the goal to connect 20 million more Americans to broadband by 2020.

Last Wednesday, the White House announced the #ConnectALL initiative and a goal to connect 20 million more people to broadband by 2020. That’s right — 20 million! According to the Pew Research Center, only about 67 percent of Americans have home broadband access, or roughly 80 percent when factoring in access via smartphones. Cost is the major reason most people do not have broadband connections, and, in fact, less than half of families with incomes below $25,000 have broadband service.


So, national policymakers MUST address the cost barrier. Fortunately, the Federal Communications Commission (FCC) is in the thick of considering new rules for the Lifeline program, which was created during the Reagan Administration to subsidize telephone service for low-income Americans. Approximately 10 percent of American households participate in the program, and voice telephone service is now nearly universal. As FCC Commissioners Tom Wheeler and Mignon Clyburn called out in their blog this week, broadband is a lifeline for the 21st century.

ALA is one of many digital inclusion advocacy organizations that have called for Lifeline to be expanded to include broadband. The folks at the Media Action Grassroots Network, for instance, have collected more than 100 stories that illuminate the need for low-cost, high-speed broadband. Based on the FCC fact sheet outlining proposed reforms, our collective voices have been heard in this regard. And the FCC has done an impressive job considering how to increase provider participation and consumer choice, ensure minimum standards, and increase transparency.

Key provisions include:

  • Ability to apply the $9.25 per month support to stand-alone broadband (as well as bundled voice and data, and mobile voice through 2019);
  • Minimum standards starting with 10/1 Mbps fixed broadband speeds, a minimum monthly fixed broadband data allowance of 150 GB, 500 megabytes of 3G data for mobile broadband to start, and unlimited minutes for mobile voice;
  • New category of service providers called Lifeline Broadband Providers;
  • National Eligibility Verifier to determine eligibility

There also is a special focus within the Lifeline item that recognizes the particular impact a home broadband connection can have for families with school-aged children. According to a Pew Research Center report, there are 5 million households with a school-aged child that lack home internet service. These are the kids that struggle with the “homework gap”—or being unable to complete many class assignments at home. While libraries play a critical role providing free public Internet access and online resources for school projects, having a home internet connection increases the opportunity to excel with school work and explore the abundance of opportunity online.

Community Connectivity Initiative

The Community Connectivity Initiative also was announced last week. The effort is led by the National Telecommunications and Information Administration’s (NTIA) BroadbandUSA Program in partnership with civic organizations and cities and towns. The initiative will create an online assessment tool to help community leaders identify critical broadband needs and connect them with expertise, tools and resources to overcome the challenges to expanded broadband deployment and adoption. The ALA is one of the national collaborators working to design and develop the tool, along with groups like the International City/County Management Association, the National Association of Counties and the National League of Cities.

Libraries can also engage directly in the effort in a variety of ways, including:

A number of communities already have agreed to support the tool development, as well, ranging from Ammon, Idaho, to Charlotte, North Carolina, to Seattle. Libraries may engage at the local level, directly by emailing or expressing interest to the ALA Washington Office by emailing

More about #ConnectALL

A couple more notable items for libraries and the broader digital inclusion community include:

  • New collaboration between the Institute of Museum and Library Services and the Corporation for National and Community Service (CNCS) to initiate a national service effort to deliver digital literacy skills training;
  • Plan to expand access to devices for more organizations that provide digital literacy training for low-income Americans through the Computers for Learning program.

Ensuring equitable access to affordable, robust broadband is an essential national policy objective. ALA is pleased that the Administration, the FCC, and NTIA are all focusing efforts to address the barriers that remain for so many individuals and families across the country. We will be following the next phases of these efforts closely with the next action coming on March 31 when the Commission votes on the Lifeline item at the open Commission meeting.

The post What does it mean to #ConnectALL? appeared first on District Dispatch.

pinboard: 2016 Lightning Talks - Code4Lib

planet code4lib - Mon, 2016-03-14 18:24
RT @todrobbins: #c4l16 participants: please add your lightning talk links to this page: #code4lib

LITA: Article Discussion on

planet code4lib - Mon, 2016-03-14 17:17
Image courtesy of Tony Hall under a CC BY-ND 2.0 license

Article Discussion

A month ago I came across an interesting article titled “Schema.org: Evolution of Structured Data on the Web”. In the article, R. V. Guha (Google), Dan Brickley (Google), and Steve MacBeth (Microsoft) talked about the history of Schema.org and other structured data on the Web, design decisions, extending the core vocabulary, and related efforts to Schema.org. Much of the article revolved around the design decisions and implementation of Schema.org by “The Big Search Engines” (Google, Yahoo, Bing, etc.). Schema.org, first and foremost, is a set of vocabularies, or a data model, just like Dublin Core and Bibframe. So, in that regard, we as information publishers can use the vocabularies in whatever way we like. However, from what I can gather from the article, The Big Search Engines’ implementation of Schema.org has implications on how we publish our data on the Web. For instance, given this quote:

Many models such as Linked Data have globally unique URIs for every entity as a core architectural principle. Unfortunately, coordinating entity references with other sites for the tens of thousands of entities about which a site may have information is much too difficult for most sites. Instead, Schema.org insists on unique URIs for only the very small number of terms provided by Schema.org. Publishers are encouraged to add as much extra description to each entity as possible so that consumers of the data can use this description to do entity reconciliation. While this puts a substantial additional burden on applications consuming data from multiple websites, it eases the burden on webmasters significantly.

I can only assume that the Big Engines do not index all URI entities. It is expected that the publisher uses class and property URIs, and optionally points to URLs as seen in this example, but does not use URIs for entities. Here’s a visual graph based on the previous example:

The area of concern to us Linked Data practitioners is the big question mark in the middle. Without a URI we cannot link to this entity. The lack of URIs in this example is not significant. However, when we begin to think about complex description (e.g. describing books, authors, and relationships among them), the lack of URIs makes it really hard to make connections and to produce meaningful, linked data.
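To make the concern concrete, here is a rough sketch in Ruby that builds the same description twice, once without and once with URIs for its entities. Everything in it is an assumption made for illustration: JSON-LD is just one possible syntax for this kind of markup and is not taken from the article, the book and author are invented, the example.org URIs are placeholders, and linkable? is a made-up helper.

```ruby
require 'json'

# Hypothetical description using only vocabulary terms: the book and its
# author are described, but neither entity has a URI of its own.
without_uri = {
  '@context' => 'http://schema.org',
  '@type'    => 'Book',
  'name'     => 'An Example Book',
  'author'   => { '@type' => 'Person', 'name' => 'Jane Example' }
}

# The same description with placeholder example.org URIs on each entity.
with_uri = without_uri.merge(
  '@id'    => 'http://example.org/book/1',
  'author' => without_uri['author'].merge(
    '@id' => 'http://example.org/person/jane-example'
  )
)

# Made-up helper: another dataset can point at an entity only if it has an @id.
def linkable?(entity)
  entity.key?('@id')
end

puts JSON.pretty_generate(without_uri)
puts linkable?(without_uri['author'])  # false: the "question mark" node
puts linkable?(with_uri['author'])     # true: other data can link to it
```

In the first form a consumer has to reconcile “Jane Example” from the description alone; in the second, any other dataset can refer to the author by URI.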

Given the structured data examples by Google and the context of this article, I will also assume that the Big Engines only index Schema.org vocabularies (if anybody knows otherwise please correct me). This means that if you embed Dublin Core or Bibframe vocabularies in Web pages, they won’t be indexed.


After reading and interpreting the article, I have come up with the following thoughts that I feel will affect how we employ Linked Data:

  • We will need to publish our data in two ways: as Linked Data and as Big Engine-compliant data
  • No matter which vocabulary/vocabularies we use in publishing Linked Data, we will need to convert the metadata to Schema.org vocabularies when embedding data into Web pages for the Big Engines to crawl
  • Before embedding our data into Web pages for the Big Engines to crawl, we will need to simplify our data by removing URIs from entities

I don’t know if that last bullet point would be necessary. That might be something that the Big Engines do as part of their crawling.


I want to say that none of the thoughts I mentioned above were explicitly stated in the article, they are simply my interpretations. As such, I want your input. How do you interpret the article? How do you see this affecting the future of Linked Data? As always, please feel free to add comments and questions below.

Until next time!


Andromeda Yelton: Let’s measure the build of Measure the Future!

planet code4lib - Mon, 2016-03-14 16:30

Bret Davidson and Jason Casden wrote this Code4Lib journal article I adore, “Beyond Open Source: Evaluating the Community Availability of Software”.

From their abstract:

The Code4Lib community has produced an increasingly impressive collection of open source software over the last decade, but much of this creative work remains out of reach for large portions of the library community. Do the relatively privileged institutions represented by a majority of Code4Lib participants have a professional responsibility to support the adoption of their innovations?

(Protip: yes.)

Davidson and Casden then go on to propose some metrics for software availability — that is, how can the developers producing this software estimate how installable and usable it might be for institutions which may not themselves have developers? The first of these is:

Time to pilot on a laptop. Defined as the time needed to install and configure, at minimum, a demonstration instance of the application, particularly for use in organizational evaluation of software.

Well! I now have an alpha version of the Measure the Future mothership. And I want it to be available to the community, and installable by people who aren’t too scared of a command line, but aren’t necessarily developers. So I’m going to measure the present, too: how long does it take me to install a mothership from scratch — in my case, on its deployment hardware, an Intel Edison?

tl;dr 87 minutes.

Is this good enough? No. But I knew it wouldn’t be; the priority for alpha is making it installable at all. And this took about two days; my sysadmin-fu is weak, and the Edison is a weird little platform that doesn’t have as big a software ecosystem as I’m used to, or lots of handy internet documentation (as far as I can tell, I’m the first person to have ever put Django on an Edison).

It’s going to be an additional chunk of work to get that 87 minutes down – it’s hard to make things easy! I need to bundle what I have into a shell script and package up the dependencies for faster installation (luckily fellow MtF developer Clinton Freeman has already done some work on packaging dependencies for the scouts, which will speed things up for me). The goal here is that, after end users have installed the operating system (which will unavoidably take several steps), they’ll be able to run one command, which will then take care of everything else (possibly prompting users for a little information along the way). After that, download and installation will probably still take some time (a lot of bits need to move across the network), but that time should be unattended.

Anyway. That’s the plan! I’m making a public record of my 87 minutes here so you can see how much progress I make in future, and also to provide some insight into the very real work that goes into streamlining build systems.

Roy Tennant: I, For One, Welcome Our Internet Overlords

planet code4lib - Mon, 2016-03-14 15:12

As reported in the Economist, the Internet Corporation for Assigned Names and Numbers (ICANN) is trying to go global. That is, it is attempting to shed any remaining ties to an individual country (*cough* the US) and become truly independent.

I heartily welcome this, as no country should be able to control something that has a worldwide impact like the Internet has achieved. Certainly not the United States, which has a very poor track record of keeping our hands off.

To back up a bit, ICANN is the agency that handles the essential plumbing that keeps the Internet functioning — the registry of Internet addresses and domain names. Simply put, if ICANN chooses to no longer resolve a request for any given Internet address (say, to its numeric location on the Internet), then for all intents and purposes it no longer exists. So independence from governments is a good thing.

But as the Economist also points out, there are forces here in the US that would block such independence — largely, or exclusively, Republican. This should not be allowed to happen. The ICANN needs its independence, but no more than we do. Our very existence as a free people may depend upon it.

Islandora: Guest Blog: How the University of Connecticut works with Islandora VMs

planet code4lib - Mon, 2016-03-14 13:16

Every institution uses Islandora a little bit differently. One of the strengths of our community is how we share and re-use the solutions that have been developed by other teams so we're not all re-inventing the wheel. The University of Connecticut recently shared a write-up describing how they use a customized Islandora VM to ease deployment for the repositories they manage:

Virtual images have long been used to set up portable testing environments that can be created and destroyed at will, without requiring a lengthy OS and software install process. The Islandora Virtual Machine Image provided by the Islandora Foundation is no different in that respect. Where our image differs is that we wanted to further minimize the software install process required to take the stock Islandora Virtual Machine Image to something more closely resembling our current production environment. This included not only module, theme, and Islandora configuration changes, but also Solr and Tomcat changes that differ from the stock Islandora Virtual Machine Image. This process was somewhat complicated by the fact that UConn runs Islandora in a distributed configuration, with a more end-user focused Islandora instance placed on a separate server from Fedora and Solr, as well as an additional ingest-focused and derivative-generation Islandora instance residing on yet another server.

By itself, having a customized Islandora VM is not particularly noteworthy, but what makes deploying this customized Islandora VM to Amazon interesting is that it will allow the CTDA staff to rapidly create completely functional Islandora instances at will that somewhat resemble our production environment – and this is the important part – without IT involvement. Additionally, by deploying to Amazon, we also gain the ability for the CTDA staff to create and manage an Islandora instance that is available outside of the UConn network. This is important as we have noticed in the past that because our development and staging Islandora instances have limited access outside the UConn network, it was difficult to demonstrate work “in progress” to outside partners without going through the process of granting specific exceptions and access to our servers. This process also created administrative overhead for our server administrators to keep track of these security exceptions and “undo” them when they were no longer needed. By deploying to Amazon we hope to sidestep this issue for all but a few special cases and speed up the collaborative process.

- Rick Sarvas

UConn Libraries ITS

Library of Congress: The Signal: Data Migration, Digital Asset Management and Microservices at CUNY TV

planet code4lib - Mon, 2016-03-14 12:46

This is a guest post from Dinah Handel.

Photo by Jonathan Farbowitz

For the past six months, I’ve been immersed in the day to day operations of the archive and library at CUNY Television, a public television station located at the Graduate Center of the City University of New York.

My National Digital Stewardship residency has consisted of a variety of tasks, including writing and debugging the code that performs many of our archival operations, interviewing staff in and outside of the library and archives, migrating data, and performing fixity checks.

This work all falls under three separate but related goals of the residency, stated in the initial project proposal in slightly different language, which are: document, make recommendations for, and implement changes to the media microservices based on local needs; work with project mentors to implement an open source digital asset management system; and verify the data integrity of digitized and born digital materials and create a workflow for the migration of 1 petabyte of data from LTO (linear tape open) 5 to LTO 7 tape.

Workflow Automation through Microservices
Media and metadata arrive at the library and archives from production teams and editors, and in the context of the Reference Model for an Open Archival Information System, this is considered the Submission Information Package. Our media microservice scripts take SIPs, and transcode and deliver access derivatives (Dissemination Information Packages or DIPs), create metadata and package all of these materials into our Archival Information Packages, which we write to long-term storage on LTO tapes.

Media microservices are a set of bash scripts that use open source software such as ffmpeg, mediainfo and others, to automate our archival and preservation workflow. A microservices framework means that each script accomplishes one task at a time, which allows for a sense of modularity in our archival workflow. We can change out and upgrade scripts as needed without overhauling our entire system and we aren’t reliant upon proprietary software to accomplish archiving and preservation. I wrote more about microservices on our NDSR NY blog.

One of my first tasks when I began my residency at CUNY Television was to make enhancements to our media microservices scripts, based on the needs of the library and archives staff. I had never worked with bash or ffmpeg, and while it has been quite a learning curve (with a dash of impostor syndrome), I’ve made a significant number of changes and enhancements to the media microservices scripts, and even written my own bash scripts (and also totally broke everything 1 million times).

The enhancements range from small stylistic changes to the creation of new microservices with the end goal of increasing automation in the processing and preservation of our AIPs. It’s been heartening to see the work that I do integrated into the archive’s daily operations and I feel lucky that I’m trusted to modify code and implement changes to the workflow of the archive and library.

In addition to making changes to the media microservices code, I also am tasked with creating documentation that outlines their functionality. I’ve been working on making this documentation general, despite the fact that the media microservices are tailored to our institution, because microservices can be used individually by anyone. My intention is to create materials that are clear and accessible explanations of the code and processes we use in our workflows.

Finally, I’ve also been working on creating internal documentation about how the library and archives functions. In some ways, writing documentation has been as challenging as writing code because explaining complex computer-based systems and how they interact with each other, in a narrative format, is tricky. I often wonder whether the language I am using will make sense to an outside or first-time reader or if the way I explain concepts is as clear to others as it is to me.

Digital Asset Management in a Production and Preservation Environment
CUNY Television’s library and archives are situated in both preservation and production environments, which means that in addition to migrating media from tape to digital files and archiving and preserving completed television programs, we are also concerned with the footage (we use the terms raw, remote, and B-roll to denote footage that is not made publicly available or digitized/migrated) that producers and editors use to create content.

Presently, much of that footage resides on a shared server and when producers are finished using a segment, they notify the library and archives and we move it from the server to long term storage on LTO. However, this process is not always streamlined and we would prefer to have all of this material stored in a way that makes it discoverable to producers.

Before I arrived at CUNY Television, the library and archive had chosen an open source digital asset management system and one of my tasks included assisting in its implementation. Our intention is that the DAM will house access copies of all of CUNY Television’s material: broadcast footage and born-digital completed television shows, migrated or digitized content and non-broadcast B-roll footage.

To get broadcast shows uploaded to the DAM, I wrote a short bash script that queries the DAM and the server that we broadcast from, to determine which new shows have not yet been uploaded to the DAM. Then, I wrote another script that transcodes the access copies according to the correct specification for upload. The two scripts are combined so that if a video file is not yet on the DAM, it gets transcoded and delivered to a directory that is synced with the DAM and uploaded automatically.
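The "which shows are not yet on the DAM" step comes down to a set difference between two file listings. The following is a minimal sketch in Python rather than the bash used at CUNY Television. The function name, the example filenames, and the choice to match on the name without its extension are all assumptions made for illustration. How the two listings are obtained (a server listing, a DAM query) is left out.

```python
from pathlib import PurePath


def shows_missing_from_dam(broadcast_files, dam_files):
    """Return broadcast filenames whose base name is not yet present on the DAM.

    Both arguments are plain lists of filenames. Matching on the name without
    its extension is an assumption: the access copy on the DAM may use a
    different container than the broadcast file.
    """
    on_dam = {PurePath(name).stem for name in dam_files}
    return sorted(
        name for name in broadcast_files if PurePath(name).stem not in on_dam
    )


if __name__ == "__main__":
    # Hypothetical listings, for illustration only.
    broadcast = ["show_0101.mov", "show_0102.mov", "show_0103.mov"]
    dam = ["show_0101.mp4"]
    for name in shows_missing_from_dam(broadcast, dam):
        print("needs transcode and upload:", name)
```

Each name returned here would then be handed to the transcoding step and written to the synced directory.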

The process of getting production materials into the DAM is much more difficult. Producers don’t necessarily follow file-naming conventions and they often store their materials in ways that make sense to them but don’t necessarily follow a structure that translates well to a DAM system.

After interviewing our production manager, and visiting Nicole Martin, the multimedia archivist and systems manager at Human Rights Watch (which uses the same DAM), we came up with a pilot plan to implement the DAM for production footage.

As I mentioned, it is possible to sync a directory with the DAM for automatic uploads. Our intention is to have producers deposit materials into a synced hierarchical directory structure, which will then get uploaded to the DAM. Using a field-mapping functionality, we’ll be able to organize materials based on the directory names. Depending on how the pilot goes with one producer’s materials, we could expand this method to include all production materials.
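To make the field-mapping idea concrete, here is a minimal sketch in Python of deriving metadata fields from directory names. The three-level layout (producer, series, shoot date) and the field names are hypothetical. The post does not describe the actual directory structure or the DAM's mapping configuration.

```python
from pathlib import PurePosixPath

# Hypothetical layout: <producer>/<series>/<shoot_date>/<file>
FIELD_NAMES = ("producer", "series", "shoot_date")


def fields_from_path(relative_path):
    """Map the directory names in a deposited file's path to metadata fields."""
    parts = PurePosixPath(relative_path).parts
    if not parts:
        raise ValueError("empty path")
    directories, filename = parts[:-1], parts[-1]
    if len(directories) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} directory levels: {relative_path}"
        )
    record = dict(zip(FIELD_NAMES, directories))
    record["filename"] = filename
    return record


if __name__ == "__main__":
    print(fields_from_path("jsmith/city_talk/2016-03-01/clip01.mov"))
```

Rejecting paths with the wrong number of levels is one way to catch deposits that do not follow the agreed structure before they reach the DAM.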

Photo by Dinah Handel

Data Integrity and Migration
Currently, much of my energy is focused on our data migration. At CUNY Television, we use LTO tape as our long-term storage solution. We have approximately 1 petabyte of data stored on LTO 5 tape that we plan to migrate to LTO 7. LTO 7 tapes are able to hold approximately 6 terabytes of uncompressed files, compared to the 1.5 terabytes that an LTO 5 tape can hold, so they will cut down on the number of tapes we use.
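As a rough back-of-the-envelope check on the tape savings, the sketch below computes how many completely filled tapes 1 petabyte would occupy at each capacity. It assumes decimal units (1 PB = 1000 TB) and ignores the A/B duplicate copies and any partially filled tapes, so real counts will differ.

```python
import math

TB_PER_PB = 1000  # decimal units; an assumption for this estimate


def tapes_needed(total_tb, capacity_tb):
    """Smallest number of tapes of the given capacity that can hold total_tb."""
    return math.ceil(total_tb / capacity_tb)


if __name__ == "__main__":
    total_tb = 1 * TB_PER_PB
    print("LTO 5 tapes:", tapes_needed(total_tb, 1.5))  # 667
    print("LTO 7 tapes:", tapes_needed(total_tb, 6))    # 167
```

Under these assumptions the move from LTO 5 to LTO 7 cuts the tape count to about a quarter.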

Migrating will also allow us to send the LTO 5 tapes off-site, which gives us geographic separation of all of our preserved materials. The data migration is complex and has many moving parts, and I am researching and testing a workflow that will account for the variety of data that we’re migrating.

We’re pulling data from tapes that were written four years ago, using a different workflow than we currently use, so there are plenty of edge cases where the process of migration can get complicated. The data migration begins when we read data back from the LTO 5 tapes using rsync. There is an A Tape and a B Tape (LOCKSS), so we read both versions back to two separate “staging” hard drives.

Once we read data back from the tapes, we need to verify that it hasn’t degraded in any way. This verification is done in different ways depending on the contents of the tape. Some files were written to tape without checksums, so after reading the files back to the hard drive, we’ve been creating checksums for both the A Tape and B Tape, and comparing the checksums against one another. We’re also using ffmpeg to perform error testing on the files whose checksums do not verify.
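The A Tape and B Tape comparison can be sketched as follows, in Python rather than the bash used in the actual workflow. The function names are hypothetical and SHA-256 is an assumption, since the post does not say which checksum algorithm is used. The sketch walks both staging directories, hashes each file in chunks, and reports any relative path that is missing on one side or whose checksums differ.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so large video files are never read into memory whole."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(a_root, b_root, algorithm="sha256"):
    """Return relative paths that differ between, or are missing from, the two copies."""
    a_root, b_root = Path(a_root), Path(b_root)
    a_files = {p.relative_to(a_root) for p in a_root.rglob("*") if p.is_file()}
    b_files = {p.relative_to(b_root) for p in b_root.rglob("*") if p.is_file()}
    # Files present on only one staging drive are flagged immediately.
    problems = set(a_files ^ b_files)
    for rel in a_files & b_files:
        if file_checksum(a_root / rel, algorithm) != file_checksum(b_root / rel, algorithm):
            problems.add(rel)
    return sorted(problems)
```

Files returned by `compare_copies` are the candidates for further error testing. A clean result only shows that the two copies agree with each other, not that either matches what was originally written.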

This process repeats until there is enough to write to LTO 7. For some files we will just verify checksums and write them to LTO 7 and call it a day. For other files though, we will need to do some minimal re-processing and additional verification testing to ensure their data integrity. To do this, I’ve been working on a set of microservice scripts that update our archival information packages to current specifications, update metadata and create a METS file to go with the archival information package. As of this week, we’ve written one set of LTO 7 tapes successfully and, while we still have a long way to go towards a petabyte, it is exciting to be beginning this process.

Even though each of these project goals is separate, they influence and draw on one another. Microservices are always inherent within our workflows, and making materials accessible via a DAM is also reliant upon creating access copies from materials stored on LTOs for the past 4 years. Adopting this holistic perspective has been enormously helpful in seeing digital preservation as an interconnected system, where there are system-wide implications in developing workflows.

Hydra Project: The University of York joins the Hydra Partners

planet code4lib - Mon, 2016-03-14 09:33

The University of York, in the UK, has become Hydra’s 31st formal Partner.

In York’s Letter of Intent, Sarah Thompson (the Head of Collections, Library and Archives) writes:

“To date we have contributed to a Hydra Interest Group on page turning, submitted a code improvement to one of the Hydra Labs projects and have launched our first Fedora 4 and Hydra application: York Archbishops’ Registers Revealed.  This project has allowed us to explore allied initiatives such as IIIF and work around PCDM, and it is encouraging to see so much joined up activity happening across the community.

“We hope that as our expertise with Hydra develops, so will our contributions to the community.”

Welcome York!  We look forward to working with you further.

Terry Reese: MarcEdit Update

planet code4lib - Mon, 2016-03-14 09:03

I posted a small MarcEdit Update for Linux/Windows users that corrects some file path issues on Linux and corrects a problem introduced in doing Unicode character replacement in the 260/264 process of the RDA Helper.  You can get the update from the downloads page or via the automated updates.

I also wanted to take this opportunity to give a quick reminder about something that came up during my testing.  I test on a wide range of VMs when I push an update.  This doesn’t mean that I catch everything, but it means that I do endeavor to minimize the problems that can occur due to the Windows Installer (and there are many).  On one of my Windows 10 VMs, an update that touched the .NET framework somehow invalidated the MarcEdit install.  When this happens, you have a couple of options.  The one I recommend:

1) Uninstall MarcEdit completely.  This includes going to the Program Directory and Deleting the MarcEdit program directory.  The Windows Installer does various types of black magic, and the only way to make sure that this goes away is to get rid of the directory.

2) If you cannot uninstall the program (said Windows Installer black magic has gone haywire), there is a program called the msicleaner on the MarcEdit downloads page.  Download that, run it as an administrator, and then go to the Program Directory and delete the MarcEdit program directory.  Then reinstall.  Again, the msicleaner will unstick the Windows Installer, but removing the contents of the directory will prevent future black magic chicanery.

Again, this showed up on 1 of the 12 or 15 VMs I test on, but since it showed up after an update, it’s hard to know if this is something that will affect others.  Given that, I thought this would be a good time to remind users of how to overcome issues with the Windows Installer when/if they occur.



DuraSpace News: Fedora International Special Interest Group to Meet on Mar. 30

planet code4lib - Mon, 2016-03-14 00:00

From Susan Lafferty, Associate Library Director (Resources and Access), Australian Catholic University

Sydney, AU  There will be a teleconference on Wednesday March 30 at 5:00 PM Australian Eastern Daylight Saving Time for anyone interested in the Fedora International Special Interest Group recently discussed by David Wilcox.

Call in details:

Conference Dial-In Number (North America) : (712) 775-7035

Participant Access Code: 383934#

International dial in numbers:

Access Conference: About Access 2016

planet code4lib - Sun, 2016-03-13 23:47

Join us October 4-7th in beautiful Fredericton, New Brunswick on the University of New Brunswick campus, hosted by UNB Libraries.

Hackfest will kick things off on October 4th, followed by two and a half days of library technology madness and possibly further workshops. All for one low price.

Watch this space for emerging details and our call for proposals.

Patrick Hochstenbach: Comics Art in Relationship

planet code4lib - Sun, 2016-03-13 10:52
Homework for the California College of the Arts online course: create a comic timing one second, one hour and one day.  Filed under: Comics, portaits Tagged: art, cca, comic, comics, duration, illustration, inking, time

pinboard: Twitter

planet code4lib - Sat, 2016-03-12 14:39
This is pretty much why I don't bother going to #code4lib. Too hiveminded for/against specific tech

District Dispatch: Five minutes can net libraries $200 million next year

planet code4lib - Fri, 2016-03-11 14:06

Earlier this week, we asked for your help in defending the more than $200 million in LSTA and other federal library funding from Congressional and Administration cost-cutters.  Time was short then and it’s even shorter now.  Your help is needed to get your Representative and both US Senators to sign “Dear Appropriator” letters supporting LSTA and Innovative Approaches to Literacy grants, among others.  With just a few days left to get as many members of Congress behind those programs as humanly possible, now is the time for you to go to ALA’s Legislative Action Center and help save more than $200 million for communities across the country . . . very likely including yours!

A strong showing on these letters sends a signal to the Appropriations Committees to protect LSTA and IAL funding. So far, your work has generated thousands of emails, but frankly, we need many, many more.

Whether you call, email, tweet or all of the above (which would be great), the message to the office staff of your Senators and Representative is simple:

“Hello, I’m a constituent. Please ask Representative/Senator ________ to sign the LSTA and IAL ‘Dear Appropriator’ letters circulating for signature!”

Please take five minutes to call, email, or Tweet at your Members of Congress and support library funding for 2017. For more detailed information, read our earlier post on District Dispatch.

The post Five minutes can net libraries $200 million next year appeared first on District Dispatch.

