Feed aggregator

Patrick Hochstenbach: Brush inking exercise

planet code4lib - Sun, 2015-05-17 10:22
Filed under: Comics Tagged: art, cartoon, cat, ink, inking, mouse

Patrick Hochstenbach: Brush inking exercise II

planet code4lib - Sun, 2015-05-17 10:22
Filed under: Doodles Tagged: brush, cartoon, cat, comic, inking, mouse, sketchbook

David Rosenthal: A good op-ed on digital preservation

planet code4lib - Sun, 2015-05-17 03:12
Bina Venkataraman was White House adviser on climate change innovation and is now at the Broad Foundation Institute working on long-term vs. short-term issues. She has a good op-ed piece in Sunday's Boston Globe entitled The race to preserve disappearing data. She and I e-mailed to and fro as she worked on the op-ed, and I'm quoted in it.

Update: Bina's affiliation corrected - my bad.

District Dispatch: Supporting the USA FREEDOM Act of 2015: ALA’s Perspective

planet code4lib - Fri, 2015-05-15 20:45

by Ronald Repolona

Anyone who’s followed legislative efforts over the past ten plus years to restore a fraction of the civil liberties lost by Americans to the USA PATRIOT Act and other surveillance laws will understand the photo accompanying this post. With the revelations of the last several years in particular, first by the New York Times and then Edward Snowden, many believed that real reform might be achieved in the last Congress by passing the USA FREEDOM Act of 2014. They were wrong.

In May 2014, the House passed a version of the USA FREEDOM Act (H.R. 3361) that was dramatically weakened from a civil liberties point of view in the House Judiciary Committee and then stripped of virtually all meaningful privacy-restoring reforms by the full House of Representatives. While strenuous efforts were made to bring a robust version of the bill (S. 2685) to the floor of the Senate, Republican members filibustered that bill and the 113th Congress ended without further action on any form of the USA FREEDOM Act of 2014.

Undeterred, the bill’s bipartisan sponsors in both chambers recently reintroduced the USA FREEDOM Act of 2015, H.R. 2048 and S. 1123, a tenuously calibrated agreement that garnered the support of many civil liberties organizations, including the American Library Association (ALA), as well as congressional “surveillance hawks,” the nation’s intelligence agencies, and the Administration. On May 14, just one week after a federal appeals court ruled the NSA’s use of Section 215 to collect Americans’ telephone call records in bulk illegal, H.R. 2048 passed the House with a strongly bipartisan vote (338 yeas – 88 nays). At this writing, with effectively just one week remaining for Congress to consider expiring PATRIOT Act provisions before recessing for the Memorial Day holiday and the June 1 “sunset” of those provisions, the bill’s fate rests with the Senate and is highly uncertain.

Not all civil liberties advocates, however, are pushing for passage of this year’s version of the USA FREEDOM Act. The ACLU, for example, is calling on Congress to simply permit Section 215 and other expiring provisions of the PATRIOT Act to “sunset” as scheduled on June 1. The Electronic Frontier Foundation (EFF) also is urging Members of Congress to strengthen H.R. 2048 (rather than pass it in its current form) because, in EFF’s view, the reforms it makes will not sweep as broadly as the appeals court’s recent ruling could if upheld and broadened in its precedential effect by adoption in other courts (including eventually perhaps the U.S. Supreme Court). Neither group, however, is urging Members of Congress to vote against H.R. 2048.

These views by respected long-time ALA allies have, not unreasonably, caused some to ask (and no doubt many more to wonder) why ALA is actively urging its members and the public to work for passage of H.R. 2048. The answer is distillable to four words: policy, politics, permanence, and perseverance.

Policy

Since January of 2003, the Council of the American Library Association (the Association’s policy-setting body) has adopted at least eight Resolutions addressing the USA PATRIOT Act and the access to library patron reading, researching and internet usage records that it affords the government under Section 215 and through the use of National Security Letters (NSLs) and their associated “gag orders.” While somewhat different in individual focus based upon the legislative environments in which they were written, all make ALA’s position on Section 215 of the PATRIOT Act and related authorities consistently clear. Stated most recently in January of 2014, that position is that ALA “calls upon Congress to pass legislation supporting the reforms embodied in [the USA FREEDOM Act of 2014] (see ALA CD#20-1(A)).”

As detailed in this Open Technology Institute (OTI) section-by-section, side-by-side comparison of the current USA FREEDOM Act (H.R. 2048) with two versions introduced in the last Congress, the current bill is a long way from perfect (just as the “old” ones were). It does, however, achieve the principal objectives of last year’s legislation endorsed by ALA’s Council. Specifically, H.R. 2048:

  • categorically ends the bulk collection not only of telephone call records but also of any “tangible things” (in the language of Section 215), library records included. Henceforth, any request for records must relate to a specific pending investigation and be based upon a narrowly defined “specific selection term” as defined in the law. Accordingly, no longer will the NSA or FBI be able to assert that the search histories of all public access computers are “tangible things” whose production they can lawfully and indefinitely compel as part of an essentially boundless fishing expedition. Nor will agencies be able to continue “bulk collection” under other legal authorities, including National Security Letters, or “PEN register” and “trap and trace” statutes;
  • significantly strengthens judicial review of the non-disclosure (“gag”) orders that generally accompany NSLs by eliminating the current requirement in law that a court effectively accept without challenge mere certification by a high-level government official that disclosure of the order would endanger national security. H.R. 2048 also requires the government to initiate judicial review of nondisclosure orders and to bear the burden of proof in those proceedings that they are statutorily justified;
  • permits more robust public reporting by companies and others who have received Section 215 orders or NSLs from the government of the number of such requests they’ve processed; and
  • requires the secret “FISA Court” that issues surveillance authorities to designate a panel of fully “cleared” expert civil liberties counsel whom the court may appoint to advise it in cases involving significant or precedential legal issues, and to declassify its opinions or summarize them for public access when declassification is not possible. The bill also expands the opportunity for review of FISA Court opinions by federal appellate courts.

As OTI’s “side-by-side” also indicates, H.R. 2048 falls short of last year’s USA FREEDOM Act iteration in several important respects. Most significantly, records collected by the government on persons who ultimately are not relevant to an investigation may still be retained, and reforms effected in last year’s bill to Section 702 of the Foreign Intelligence Surveillance Act Amendments Act are decidedly weaker. The bill also extends expiring portions of the PATRIOT Act, as modified, for five years.

Politics

Determining whether ALA should support a particular piece of almost inevitably imperfect legislation turns not only on the content of the legislation (though that naturally receives disproportionate weight in an assessment), but also on the probability of achieving a better result and when such a result might conceivably be obtained. With the change in control of the Senate in 2014 and the very high probability that control of the House will not shift for many elections to come, many groups including ALA believe that H.R. 2048 represents the “high water mark” in reform of Section 215 and related legal authorities achievable in the foreseeable future.

Permanence

The recent landmark ruling by the U.S. Court of Appeals for the Second Circuit noted above was sweeping and clear in some respects, but limited and uncertainty-producing in others. Specifically, the Court firmly ruled that the bulk collection of telephone records under Section 215 is illegal. That ruling, however, addressed only the NSA’s bulk collection of “telephony metadata.” It did not directly speak to the bulk collection of any other information, including library records of any kind.

Further, while binding in the states that make up the Second Judicial Circuit (Connecticut, New York, and Vermont), the court’s decision has no precedential effect in any other part of the country. It is also unclear whether the Second Circuit’s decision will be appealed by the government and, if so, what the outcome will be.

Finally, similar decisions are pending in two other federal Courts of Appeal. Should one or both rulings differ materially from the Second Circuit’s, further uncertainty as to what the law is and should be nationally will result. Resolution of such a “split in the Circuits” can only be accomplished through a multi-year appeal process to the U.S. Supreme Court, which is not required to hear the case.

Enactment of the current version of the USA FREEDOM Act would “lock in” the reforms noted above immediately, permanently and nationwide. Accordingly, on balance, ALA and its many coalition allies are supporting the bill and affirmatively urging Members of Congress to do the same.

Perseverance

Finally, and crucially, ALA and its allies have long been and remain fully committed to working for the most profound reform of all of the nation’s privacy and surveillance laws possible. ALA thus regards the USA FREEDOM Act of 2015 as a critical step — the first possible in 14 years — to make real progress toward that much broader permanent goal, but as only a step.

Work in this Congress (and beyond) will continue aggressively to pass comprehensive reform of the badly outdated Electronic Communications Privacy Act and to restore Americans’ civil liberties still compromised by, for example, other portions of the USA PATRIOT Act, Section 702 of the Foreign Intelligence Surveillance Act, Executive Order 12333 and many other privacy-hostile legal authorities.

With our allies at our side, and librarians and their millions of patrons behind us, the fight goes on.

The post Supporting the USA FREEDOM Act of 2015: ALA’s Perspective appeared first on District Dispatch.

Nicole Engard: Bookmarks for May 15, 2015

planet code4lib - Fri, 2015-05-15 20:30

Today I found the following resources and bookmarked them on Delicious.


The post Bookmarks for May 15, 2015 appeared first on What I Learned Today....


Library of Congress: The Signal: Digital Forensics and Digital Preservation: An Interview with Kam Woods of BitCurator.

planet code4lib - Fri, 2015-05-15 15:44

We’ve written about the BitCurator project a number of times, but the project has recently entered a new phase and it’s a great time to check in again. The BitCurator Access project began in October 2014 with funding through the Mellon Foundation. BitCurator Access is building on the original BitCurator project to develop open-source software that makes it easier to access disk images created as part of a forensic preservation process.

Kam Woods, speaking during a workshop at UNC.

Kam Woods has been a part of BitCurator from the beginning as its Technical Lead, and he’s currently a Research Scientist in the School of Information and Library Science at the University of North Carolina at Chapel Hill. As part of our Insights Interview series we talked with Woods about the latest efforts to apply digital forensics to digital preservation.

Butch: How did you end up working on the BitCurator project?

Kam: In late 2010, I took a postdoc position in the School of Information and Library Science at UNC, sponsored by Cal Lee and funded by a subcontract from an NSF grant awarded to Simson Garfinkel (then at the Naval Postgraduate School). Over the following months I worked extensively with many of the open source digital forensics tools written by Simson and others, and it was immediately clear that there were natural applications to the issues faced by collecting organizations preserving born-digital materials. The postdoc position was only funded for one year, so – in early 2011 – Cal and I (along with eventual Co-PI Matthew Kirschenbaum) began putting together a grant proposal to the Andrew W. Mellon Foundation describing the work that would become the first BitCurator project.

Butch: If people have any understanding at all of digital forensics it’s probably from television or movies, but I suspect the actions you see there are pretty unrealistic. How would you describe digital forensics for the layperson? (And as an aside, what do people on television get “most right” about digital forensics?)

Kam: Digital forensics commonly refers to the process of recovering, analyzing, and reporting on data found on digital devices. The term is rooted in law enforcement and corporate security practices: tools and practices designed to identify items of interest (e.g. deleted files, web search histories, or emails) in a collection of data in order to support a specific position in a civil or criminal court case, to pinpoint a security breach, or to identify other kinds of suspected misconduct.

The goals differ when applying these tools and techniques within archives and data preservation institutions, but there are a lot of parallels in the process: providing an accurate record of chain of custody, documenting provenance, and storing the data in a manner that resists tampering, destruction, or loss. I would direct the interested reader to the excellent and freely available 2010 Council on Library and Information Resources report Digital Forensics and Born-Digital Content in Cultural Heritage Institutions (pdf) for additional detail.

You’ll occasionally see some semblance of a real-world tool or method in TV shows, but the presentation is often pretty bizarre. As far as day-to-day practices go, discussions I’ve had with law enforcement professionals often include phrases like “huge backlogs” and “overextended resources.” Sound familiar to any librarians and archivists?

Butch: Digital forensics has become a hot topic in the digital preservation community, but I suspect that it’s still outside the daily activity of most librarians and archivists. What should librarians and archivists know about digital forensics and how it can support digital preservation?

Forensic write-blockers used to capture disk images from physical media.

Kam: One of the things Cal Lee and I emphasize in workshops is the importance of avoiding unintentional or irreversible changes to source media. If someone brings you a device such as a hard disk or USB drive, a hardware write-blocker will ensure that if you plug that device into a modern machine, nothing can be written to it, either by you or some automatic process running on your operating system. Using a write-blocker is a baseline risk-reducing practice for anyone examining data that arrives on writeable media.

Creating a disk image – a sector-by-sector copy of a disk – can support high-quality preservation outcomes in several ways. A disk image retains the entirety of any file system contained within the media, including directory structures and timestamps associated with things like when particular files were created and modified. Retaining a disk image ensures that as your external tools (for example, those used to export files and file system metadata) improve over time, you can revisit a “gold standard” version of the source material to ensure you’re not losing something of value that might be of interest to future historians or researchers.

Disk imaging also mitigates the risk of hardware failure during an assessment. There’s no simple, universal way to know how many additional access events an older disk may withstand until you try to access it. If a hard disk begins to fail while you’re reading it, chances of preserving the data are often higher if you’re in the process of making a sector-by-sector copy in a forensic format with a forensic imaging utility. Forensic disk image formats embed capture metadata and redundancy checks to ensure a robust technical record of how and when that image was captured, and improve survivability over raw images if there is ever damage to your storage system. This can be especially useful if you’re placing material in long-term offline storage.
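To make the idea of embedded redundancy checks concrete, here is a minimal Python sketch. It is not a real forensic image format and not how any BitCurator tool works; the function names, the 64 KiB chunk size, and the shape of the capture record are all assumptions made for illustration. It copies a source chunk by chunk, records a whole-image SHA-256 plus a CRC32 per chunk, and later uses those per-chunk values to locate damage in the stored copy.

```python
import hashlib
import zlib


def capture_image(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path chunk by chunk and return a capture record.

    The record layout is hypothetical: a whole-image SHA-256, the chunk size,
    and one CRC32 per chunk so later damage can be localized.
    """
    whole = hashlib.sha256()
    chunk_crcs = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            whole.update(chunk)
            chunk_crcs.append(zlib.crc32(chunk))
    return {
        "sha256": whole.hexdigest(),
        "chunk_size": chunk_size,
        "chunk_crcs": chunk_crcs,
    }


def find_damaged_chunks(image_path, record):
    """Return the indexes of chunks whose CRC32 no longer matches the record."""
    damaged = []
    with open(image_path, "rb") as img:
        for index, expected in enumerate(record["chunk_crcs"]):
            chunk = img.read(record["chunk_size"])
            if zlib.crc32(chunk) != expected:
                damaged.append(index)
    return damaged
```

With only a single whole-image hash, a mismatch says that something changed but not where. The per-chunk values narrow the damage to specific regions, which is the kind of survivability benefit described above.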

There are many situations where it’s not practical, necessary, or appropriate to create a disk image, particularly if you receive a disk that is simply being used as an intermediary for data transfer, or if you’re working with files stored on a remote server or shared drive. Most digital forensics tools that actually analyze the data you’re acquiring (for example, Simson Garfinkel’s bulk extractor, which searches for potentially private and sensitive information and other items of interest) will just as easily process a directory of files as they would a disk image. Being aware of these options can help guide informed processing decisions.

Finally, collecting institutions spend a great deal of time and money assessing, hiring and training professionals to make complex decisions about what to preserve, how to preserve it and how to effectively provide and moderate access in ways that serve the public good. Digital forensics software can reduce the amount of manual triage required when assessing new or unprocessed materials, prioritizing items that are likely to be preservation targets or require additional attention.

Butch: How does BitCurator Access extend the work of the original phases of the BitCurator project?

Kam: One of the development goals for BitCurator Access is to provide archives and libraries with better mechanisms to interact with the contents of complex digital objects such as disk images. We’re developing software that runs as a web service and allows any user with a web browser to easily navigate collections of disk images in many different formats. This includes providing facilities to examine the contents of the file systems contained within those images, interact with visualizations of file system metadata and organization (including timelines indicating changes to files and folders), and download items of interest. There’s an early version and installation guide in the “Tools” section of

We’re also working on software to automate the process of redacting potentially private and sensitive information – things like Social Security Numbers, dates of birth, bank account numbers and geolocation data – from these materials based on reports produced by digital forensics tools. Automatic redaction is a complex problem that often requires knowledge of specific file format structures to do correctly. We’re using some existing software libraries to automatically redact where we can, flag items that may require human attention and prepare clear reports describing those actions.
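As a rough illustration of report-driven redaction, the Python sketch below takes a byte string and a list of (offset, length) pairs and overwrites each reported span with a fill byte of the same length. The report format, the function name `redact_by_report`, and the fill byte are assumptions for illustration only; this is not the output format of any specific forensics tool, nor the BitCurator Access implementation.

```python
def redact_by_report(data, features, fill=b"X"):
    """Overwrite reported spans in data and return (redacted_bytes, actions).

    features is a hypothetical report: an iterable of (offset, length) pairs.
    Spans that fall outside the data are not touched; they are flagged in the
    returned action log so a person can review them.
    """
    redacted = bytearray(data)
    actions = []
    for offset, length in features:
        end = offset + length
        if offset < 0 or length <= 0 or end > len(redacted):
            actions.append((offset, length, "flagged: span outside data"))
            continue
        # Same-length overwrite keeps every other offset in the report valid.
        redacted[offset:end] = fill * length
        actions.append((offset, length, "redacted"))
    return bytes(redacted), actions
```

A same-length overwrite like this is only safe on plain, uncompressed data. Inside compressed or structured file formats it can corrupt the file, which is why correct redaction often requires format-specific knowledge, as noted above.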

Finally, we’re exploring ways in which we can incorporate emulation tools such as those developed at the University of Freiburg using the Emulation-as-a-Service model.

Butch: I’ve heard archivists and curators express ethical concerns about using digital forensics tools to uncover material that an author may not have wished be made available (such as earlier drafts of written works). Do you have any thoughts on the ethical considerations of using digital forensics tools for digital preservation and/or archival purposes?

The Digital Forensics Laboratory at UNC SILS.

Kam: There’s a great DPC Technology Watch report from 2012, Digital Forensics and Preservation (pdf), in which Jeremy Leighton John frames the issue directly: “Curators have always been in a privileged position due to the necessity for institutions to appraise material that is potentially being accepted for long-term preservation and access; and this continues with the essential and judicious use of forensic technologies.”

What constitutes “essential and judicious” is an area of active discussion. It has been noted elsewhere (see the CLIR report I mentioned earlier) that the increased use of tools with these capabilities may necessitate revisiting and refining the language in donor agreements and ethics guidelines.

As a practical aside, the Society of American Archivists Guide to Deeds of Gift includes language alerting donors to concerns regarding deleted content and sensitive information on digital media. Using the Wayback Machine, you can see that this language was added mid-2013, so that provides some context for the impact these discussions are having.

Butch: An area that the National Digital Stewardship Alliance has identified as important for digital preservation is the establishment of testbeds for digital preservation tools and processes. Do you have some insight into how got established, and how valuable it is for the digital forensics and preservation communities?

Kam: was originally created by Simson Garfinkel to serve as a home for corpora he and others developed for use in digital forensics education and research. The set of materials on the site has evolved over time, but several of the currently available corpora were captured as part of scripted, simulated real-world scenarios in which researchers and students played out roles involving mock criminal activities using computers, USB drives, cell phones and network devices.

These corpora strike a balance between realism and complexity, allowing students in digital forensics courses to engage with problems similar to those they might encounter in their professional careers while limiting the volume of distractors and irrelevant content. They’re freely distributed, contain no actual confidential or sensitive information, and in certain cases have exercises and solution guides that can be distributed to instructors. There’s a great paper linked in the Bibliography section of that site entitled “Bringing science to digital forensics with standardized forensic corpora” (pdf) that goes into the need for such corpora in much greater detail.

Various media sitting in the UNC SILS Digital Forensics Laboratory.

We’ve used disk images from one corpus in particular – the “M57-Patents Scenario” – in courses taken by LIS students at UNC and in workshops run by the Society of American Archivists. They’re useful in illustrating various issues you might run into when working with a hard drive obtained from a donor, and in learning to work with various digital forensics tools. I’ve had talks with several people about the possibility of building a realistic corpus that simulated, say, a set of hard drives obtained from an artist or author. This would be expensive and require significant planning, for reasons that are most clearly described in the paper linked in the previous paragraph.

Butch: What are the next steps the digital preservation community should address when it comes to digital forensics?

Kam: Better workflow modeling, information sharing and standard vocabularies to describe actions taken using digital forensics tools are high up on the list. A number of institutions do currently document and publish workflows that involve digital forensics, but differences in factors like language and resolution make it difficult to compare them meaningfully. It’s important to be able to distinguish those ways in which workflows differ that are inherent to the process, rather than the way in which that process is described.

Improving community-driven resources that document and describe the functions of various digital forensics tools as they relate to preservation practices is another big one. Authors of these tools often provide comprehensive documentation, but it doesn’t necessarily emphasize those uses or features of the tools that are most relevant to collecting institutions. Of course, a really great tool tutorial doesn’t really help someone who doesn’t know about that tool, or isn’t familiar with the language being used to describe what it does, so you can flip this: describing a desired data processing outcome in a way that feels natural to an archivist or librarian, and linking to a tool that solves part or all of the related problem. We have some of this already, scattered around the web; we just need more of it, and better organization.

Finally, a shared resource for robust educational materials that reflect the kinds of activities students graduating from LIS programs may undertake using these tools. This one more or less speaks for itself.

LITA: The ‘I’ Word: Internships

planet code4lib - Fri, 2015-05-15 14:00
Image courtesy of DeKalb CSB.

Two weeks ago, I completed a semester-long, advanced TEI internship where I learned XSLT and utilized it to migrate two digital collections (explained more here, and check out the blog here) in the Digital Library Program. During these two weeks, I’ve had time to reflect on the impact that internships, especially tech-based, have on students.

At Indiana University, a student must complete an internship to graduate with a dual degree or specialization. However, this is my number one piece of advice for any student, but especially library students: do as many internships as you possibly can. The hands-on experience obtained during an internship is invaluable when moving to a real-life position, and is something we can’t always experience in courses. This is especially true for internships introducing and refining tech skills.

I’m going to shock you: learning new technology is difficult. It takes time. It takes patience. It takes a project application. Every new computer program or tech skill I’ve learned has come with a “drowning period,” also known as the learning period. Since technology exists in a different space, it is difficult to conceptualize how it works, and therefore how to learn and understand it.

An internship, usually unpaid or for student-paid credit, is the perfect safe zone for this drowning period. The student has time to fail and make mistakes, but also learn from them in a fairly low-pressure situation. They work with the people actually doing the job in the real world who can serve as a guide for learning the skills, as well as a career mentor.

The supervisors and departments also benefit from free labor, even if it does take time. Internships are also a chance for supervisors to revisit their own knowledge and solidify it by teaching others. They can look at their standards and see if anything needs to be updated or changed. Supervisors can directly influence the next generation of librarians, teaching them skills and hacks it took them years to figure out.

My two defining internships were: the Walt Whitman Archive at the University of Nebraska-Lincoln, introducing me to digital humanities work, and the Digital Library Program, solidifying my future career. What was your defining library internship? What kinds of internships does your institution offer to students and recent graduates? How does your institution support continuing education and learning new tech skills?

D-Lib: Semantic Description of Cultural Digital Images: Using a Hierarchical Model and Controlled Vocabulary

planet code4lib - Fri, 2015-05-15 11:44
Article by Lei Xu and Xiaoguang Wang, Wuhan University, Hubei, China

D-Lib: Metamorph: A Transformation Language for Semi-structured Data

planet code4lib - Fri, 2015-05-15 11:44
Article by Markus Michael Geipel, Christophe Boehme and Jan Hannemann, German National Library

D-Lib: Linked Data URIs and Libraries: The Story So Far

planet code4lib - Fri, 2015-05-15 11:44
Article by Ioannis Papadakis, Konstantinos Kyprianos and Michalis Stefanidakis, Ionian University

D-Lib: Facing the Challenge of Web Archives Preservation Collaboratively: The Role and Work of the IIPC Preservation Working Group

planet code4lib - Fri, 2015-05-15 11:44
Article by Andrea Goethals, Harvard Library, Clément Oury, International ISSN Centre, David Pearson, National Library of Australia, Barbara Sierman, KB National Library of the Netherlands and Tobias Steinke, Deutsche Nationalbibliothek

D-Lib: Olio

planet code4lib - Fri, 2015-05-15 11:44
Editorial by Laurence Lannom, CNRI

D-Lib: An Assessment of Institutional Repositories in the Arab World

planet code4lib - Fri, 2015-05-15 11:44
Article by Scott Carlson, Rice University

D-Lib: Helping Members of the Community Manage Their Digital Lives: Developing a Personal Digital Archiving Workshop

planet code4lib - Fri, 2015-05-15 11:44
Article by Nathan Brown, New Mexico State University Library

