planet code4lib


Library of Congress: The Signal: Digital Archiving Programming at Four Liberal Arts Colleges

Thu, 2015-05-28 15:30

Clockwise from top left: Vassar, Bryn Mawr, Wheaton and Amherst Colleges. Photo of Thompson Library at Vassar from Wikimedia by Jim Mills.

The following guest post is a collaboration from Joanna DiPasquale (Vassar College), Amy Bocko (Wheaton College), Rachel Appel (Bryn Mawr College) and Sarah Walden (Amherst College) based on their panel presentation at the recent Personal Digital Archiving 2015 conference. I will write a detailed post about the conference, which the Library of Congress helped organize, in a few weeks.

When is the personal the professional? For faculty and students, spending countless hours researching, writing, and developing new ideas, the answer (only sometimes tongue-in-cheek) is “always:” digital archiving of their personal materials quickly turns into the creation of collections that can span multiple years, formats, subjects, and versions. In the library, we know well that “save everything” and “curate everything” are very different. What role, then, could the liberal arts college library play in helping our faculty and students curate their digital research materials and the scholarly communication objects that they create with an eye towards sustainability?

At Vassar, Wheaton, Bryn Mawr, and Amherst Colleges, we designed Personal Digital Archiving Days (PDAD) events to push the boundaries of outreach and archiving, learn more about our communities’ needs, and connect users to the right services needed to achieve their archiving goals. In Fall 2014, we held sessions across each of our campuses (some for the first time, some as part of an ongoing PDAD series), using the Library of Congress personal digital archiving resources as a model for our programming. Though our audiences and outcomes varied, we shared common goals: to provide outreach for the work we do, make the campus community aware of the services available to them, and impart best practices to attendees that will have lasting effects on their digital information management.

Joanna DiPasquale. Photo courtesy of Rachel Appel.

Joanna DiPasquale, digital initiatives librarian at Vassar, learned about personal digital archiving days from the Library of Congress’ resources and how they worked for public or community libraries. She saw these resources as an opportunity to communicate to campus about the library’s new Digital Initiatives Group and how each part of the group complemented other services on campus (such as media services, computing and preservation). Her workshop was geared toward faculty and faculty-developed digital projects and scholarship. Vassar began the workshops in 2012, and faculty continued to request them each year. By 2014, the event featured a case study from a faculty member (and past attendee) about the new strategies he employed for his own work.

Amy Bocko. Photo courtesy of Rachel Appel.

Amy Bocko, Digital Asset Curator at Wheaton, saw PDAD’s success during her time as a Vassar employee. Now at Wheaton, Amy wanted to publicize her brand-new position on campus and ability to offer new digitally-focused services in Library and Information Services, and her Personal Digital Archiving Day brought together a diverse group of faculty members to work on common issues. The reactions were favorable and the attendees were grateful for the help they needed to manage their digital scholarship.

Approaching everything as a whole could have been overwhelming, so Amy boiled it down to “what step could you take today that would improve your digital collection?”, which led to iterative, more effective results. Common responses included “investing in an external hard drive”, “adhering to a naming structure for digital files” and “taking inventory of what I have”. Amy made herself available after her workshop to address the specific concerns of faculty members in relation to their materials. She spoke at length with a printmaking professor who had an extensive collection of both analog slides and digital images with little metadata. They discussed starting small, creating a naming schema that would help her take steps towards becoming organized. The faculty member remarked how just a brief conversation, and knowing that the library was taking steps to help its faculty manage their digital scholarship, put her mind at ease.

Rachel Appel. Photo courtesy of Rachel Appel.

Rachel Appel, digital collections librarian at Bryn Mawr, wanted to focus on student life. Rachel worked directly with Bryn Mawr’s Self-Government Association to reach student clubs, raise awareness of their records, help them get organized and think ahead to filling archival silences in the College Archives. As at the other institutions, PDAD provided a great avenue to introduce her work to campus. The students were also very interested in the concept of institutional memory and creating documented legacies between each generation of students. Rachel was able to hold the workshop again for different groups of attendees and focus on basic personal digital file management.

Sarah Walden. Photo courtesy of Rachel Appel.

Sarah Walden, digital projects librarian at Amherst, focused on student thesis writers for PDAD. Sarah worked with Criss Guy, a post-bac at Amherst, and they developed the workshop together. Their goal was to expose students to immediate preservation concerns surrounding a large research project like a thesis (backups, organization, versioning), as well as to give them some exposure to the idea of longer-term preservation. They offered two versions of their workshop. In the fall, they gave an overview of file identification, prioritization, organization, and backup. The second version of the workshop in January added a hands-on activity in which the students organized a set of sample files using the organizing-software program, Hazel.

Although our workshops had varying audiences and goals, they empowered attendees to become more aware of their digital data management and the records continuum. They also provided an outreach opportunity for the digital library to address issues of sustainability in digital scholarship.

This benefits both the scholar and the library. The potential for sustainable digital scholarship (whether sustained by the library, the scholar or both) increases when we can bring our own best practices to our constituents. We believe that PDAD events like ours provide an opportunity for college libraries to meet our scholars in multiple project phases:

  • While they are potentially worried about their past digital materials
  • While they are actively creating (and curating) their current materials
  • When they move beyond our campus services (particularly for students).

While we dispense good advice, we also raise awareness of our digital-preservation skills, our services and our best practices, and we only see that need growing as digital scholarship flourishes. On the college campus, the personal heavily overlaps with the professional. We anticipate that we will be holding more targeted workshops for specific groups of attendees and would like to hear experiences from other institutions on how their PDADs evolved.

David Rosenthal: Time for another IoT rant

Thu, 2015-05-28 15:00
I haven't posted on the looming disaster that is the Internet of Things You Don't Own since last October, although I have been keeping track of developments in brief comments to that post. The great Charlie Stross just weighed in with a brilliant, must-read examination of the potential the IoT brings for innovations in rent-seeking, which convinced me that it was time for an update. Below the fold, I discuss the Stross business model and other developments in the last 8 months.

Back in February, Stephen Balkam's Guardian article What will happen when the internet of things becomes artificially intelligent? sparked some discussion on Dave Farber's IP list, including this wonderfully apposite Philip K. Dick citation from Ian Stedman via David Pollak. It roused Mike O'Dell to respond with Internet of Obnoxious Things, a really important insight into the fundamental problems underlying the Internet of Things. Just go read it. Mike starts:
The PKDick excerpt cited about a shakedown by a door lock is, I fear, more prescient than it first appears.

I very much doubt that any "Internet of Things" will become Artificially Impudent because long before that happens, all the devices will be co-opted by The Bad Guys who will proceed to pursue shakedowns, extortion, and "protection" rackets on a coherent global scale.

Whether it is even possible to "secure" such a collection of devices empowered with such direct control over physical reality is a profound and, I believe, completely open theoretical question. (We don't even have a strong definition of what that would mean.)

Even if it is theoretically possible, it has been demonstrated in the most compelling possible terms that it will not be done for a host of reasons. The most benign fall under the rubric of "Never ascribe to malice what is adequately explained by stupidity" while others will be aggressively malicious. ...

A close second, however, is a definition of "security" that reads, approximately, "Do what I should have meant." Eg, the rate of technology churn cannot be reduced just because we haven't figured out what we need it to do (or not do) - we'll just "iterate" every time Something Bad(tm) happens.

Charlie goes further, and follows Philip K. Dick more closely, by pointing out that the causes of Something Bad(tm) are not just stupidity and malice, but also greed:
The evil business plan of evil (and misery) posits the existence of smart municipality-provided household recycling bins. ... The bin has a PV powered microcontroller that can talk to a base station in the nearest wifi-enabled street lamp, and thence to the city government's waste department. The householder sorts their waste into the various recycling bins, and when the bins are full they're added to a pickup list for the waste truck on the nearest routing—so that rather than being collected at a set interval, they're only collected when they're full.

But that's not all.

Householders are lazy or otherwise noncompliant and sometimes dump stuff in the wrong bin, just as drivers sometimes disobey the speed limit.

The overt value proposition for the municipality (who we are selling these bins and their support infrastructure to) is that the bins can sense the presence of the wrong kind of waste. This increases management costs by requiring hand-sorting, so the individual homeowner can be surcharged (or fined). More reasonably, households can be charged a high annual waste recycling and sorting fee, and given a discount for pre-sorting everything properly, before collection—which they forefeit if they screw up too often.

The covert value proposition ... local town governments are under increasing pressure to cut their operating budgets. But by implementing increasingly elaborate waste-sorting requirements and imposing direct fines on households for non-compliance, they can turn the smart recycling bins into a new revenue enhancement channel, ... Churn the recycling criteria just a little bit and rely on tired and over-engaged citizens to accidentally toss a piece of plastic in the metal bin, or some food waste in the packaging bin: it'll make a fine contribution to your city's revenue!

Charlie sets out the basic requirements for business models like this:
Some aspects of modern life look like necessary evils at first, until you realize that some asshole has managed to (a) make it compulsory, and (b) use it for rent-seeking. The goal of this business is to identify a niche that is already mandatory, and where a supply chain exists (that is: someone provides goods or service, and as many people as possible have to use them), then figure out a way to colonize it as a monopolistic intermediary with rent-raising power and the force of law behind it.

and goes on to use speed cameras as an example. What he doesn't go into is what the IoT brings to this class of business models: reduced cost of detection, reduced possibility of contest, reduced cost of punishment. A trifecta that means profit! But Charlie brilliantly goes on to incorporate the innovative business model that Yves Smith has dubbed "crapification". A business that can reduce customer choice sufficiently then has a profit opportunity; it can make its product so awful that customers will pay for a slightly less awful version. He suggests:
Sell householders a deluxe bin with multiple compartments and a sorter in the top: they can put their rubbish in, and the bin itself will sort which section it belongs in. Over a year or three the householder will save themselves the price of the deluxe bin in avoided fines—but we don't care, we're not the municipal waste authority, we're the speed camera/radar detector vendor!

Cory Doctorow just weighed in, again, on the looming IoT disaster. This time he points out that although it is a problem that Roomba's limited on-board intelligence means poor obstacle avoidance, solving the problem by equipping them with cameras and an Internet connection to an obstacle-recognition service is an awesomely bad idea:
Roombas are pretty useful devices. I own two of them. They do have real trouble with obstacles, though. Putting a camera on them so that they can use the smarts of the network to navigate our homes and offices is a plausible solution to this problem.
But a camera-equipped networked robot that free-ranges around your home is a fucking disaster if it isn't secure. It's a gift to everyone who wants to use cameras to attack you, from voyeur sextortionist creeps to burglars to foreign spies and dirty cops.

Looking back through the notes on my October post, we see that Google is no longer patching known vulnerabilities in Android before 4.4. There are only about 930 million devices running such software. More details on why nearly a billion users are being left to the mercy of the bad guys are here.

The Internet of Things With Wheels That Kill People has featured extensively. First, Progressive Insurance's gizmo that tracks its customers' driving habits has a few security issues:
"The firmware running on the dongle is minimal and insecure," Thuen told Forbes.

"It does no validation or signing of firmware updates, no secure boot, no cellular authentication, no secure communications or encryption, no data execution prevention or attack mitigation technologies ... basically it uses no security technologies whatsoever."

What's the worst that can happen? The device gives access to the CAN bus.

"The CAN bus had been the target of much previous hacking research. The latest dongle similar to the SnapShot device to be hacked was the Zubie device which examined for mechanical problems and allowed drivers to observe and share their habits."

"Argus Cyber Security researchers Ron Ofir and Ofer Kapota went further and gained control of acceleration, braking and steering through an exploit."

Second, a vulnerability in BMWs, Minis and Rolls-Royces:
"BMW has plugged a hole that could allow remote attackers to open windows and doors for 2.2 million cars."
..."Attackers could set up fake wireless networks to intercept and transmit the clear-text data to the cars but could not have impacted vehicle acceleration or braking systems."
BMW's patch also updated its patch distribution system to use HTTPS."

What were they thinking?

Third, Senator Ed Markey has been asking auto makers questions and the answers are not reassuring. No wonder he was asking questions. At an industry-sponsored hackathon last July a 14-year-old with $15 in parts from Radio Shack showed how easy it was:
"Windshield wipers turned on and off. Doors locked and unlocked. The remote start feature engaged. The student even got the car's lights to flash on and off, set to the beat from songs on his iPhone."

Key to an Internet of Things that we could live with is, as Vint Cerf pointed out, a secure firmware update mechanism. The consequences of not having one can be seen in Kaspersky's revelations of the "Equation group" compromising hard drive firmware. Here's an example of how easy it can be. To be fair, Seagate at least has deployed a secure firmware update mechanism, initially to self-encrypting drives but now, I'm told, to all their current drives.

Cooper Quintin at the EFF's DeepLinks blog weighed in with a typically clear overview of the issue entitled Are Your Devices Hardwired For Betrayal?. The three principles:
  • Firmware must be properly audited.
  • Firmware updates must be signed.
  • We need a mechanism for verifying installed firmware.
would greatly reduce the problem, except that they would make firmware companies targets for Gemalto-like key exfiltration. I agree with Quintin that:
"None of these things are inherently difficult from a technological standpoint. The hard problems to overcome will be inertia, complacency, politics, incentives, and costs on the part of the hardware companies."

Among the Things in the Internet are computers with vulnerable BIOSes:
"Though there's been long suspicion that spy agencies have exotic means of remotely compromising computer BIOS, these remote exploits were considered rare and difficult to attain.

Legbacore founders Corey Kallenberg and Xeno Kovah's Cansecwest presentation ... automates the process of discovering these vulnerabilities. Kallenberg and Kovah are confident that they can find many more BIOS vulnerabilities; they will also demonstrate many new BIOS attacks that require physical access."

GCHQ has the legal authority to exploit these BIOS vulnerabilities, and any others it can find, against computers, phones and any other Things on the Internet wherever they are. It's likely that most security services have similar authority.

Useful reports appeared, including this two part report from Xipiter, and this from Veracode on insecurities, this from DDOS-protection company Incapsula, on the now multiple botnets running on home routers, and this from the SEC Consult Vulnerability Lab about yet another catastrophic vulnerability in home routers. This last report, unlike the industry happy-talk, understands the economics of IoT devices:
"the (consumer) embedded systems industry is always keen on keeping development costs as low as possible and is therefore using vulnerability-ridden code provided by chipset manufacturers (e.g. Realtek CVE-2014-8361 - detailed summary by HP, Broadcom) or outdated versions of included open-source software (e.g. libupnp, MiniUPnPd) in their products."

And just as I was finishing this rant, Ars Technica posted details of yet another botnet running on home routers, this one called Linux/Moose. It collects social network credentials.

That's all until the next rant. Have fun with your Internet-enabled gizmos!

FOSS4Lib Recent Releases: Archivematica - 1.4

Thu, 2015-05-28 14:17

Last updated May 28, 2015. Created by Peter Murray on May 28, 2015.

Package: Archivematica
Release Date: Wednesday, May 27, 2015

Open Library Data Additions: Amazon Crawl: part eu

Thu, 2015-05-28 14:12

Part eu of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

Mark E. Phillips: Effects of subject normalization on DPLA Hubs

Thu, 2015-05-28 14:00

In the previous post I walked through some of the different ways that we could normalize a subject string and took a look at what effects these normalizations had on the subjects in the entire DPLA metadata dataset that I have been using.

In this post I want to continue along those lines and take a look at what happens when you apply these normalizations to the subjects in the dataset, but this time focusing on the Hub level instead of working with the whole dataset.

I applied the normalizations mentioned in the previous post to the subjects from each of the Hubs in the DPLA dataset. This included total values, unique but un-normalized values, case folded, lowercased, NACO, Porter stemmed, and fingerprint. I applied each normalization to the output of the previous one, as a series. Here is what the normalization chain looked like at each step.

total
total > unique
total > unique > case folded
total > unique > case folded > lowercased
total > unique > case folded > lowercased > NACO
total > unique > case folded > lowercased > NACO > Porter
total > unique > case folded > lowercased > NACO > Porter > fingerprint
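
A chain like this can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for this post: it reads "case folded" as stripping diacritics (a guess), uses a simplified fingerprint (drop punctuation, then sort and de-duplicate tokens), and leaves out the NACO and Porter steps because they need third-party libraries. The function names and sample subjects are hypothetical.

```python
import string
import unicodedata


def fold(value):
    """Strip diacritics; one possible reading of 'case folded' (an assumption)."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lowercase(value):
    """Lowercase the whole string."""
    return value.lower()


def fingerprint(value):
    """Simplified fingerprint: drop ASCII punctuation, sort and de-duplicate tokens."""
    cleaned = value.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def chain_counts(subjects, steps):
    """Count values after each step, feeding each step's output into the next."""
    counts = [("total", len(subjects))]
    current = set(subjects)
    counts.append(("unique", len(current)))
    for label, func in steps:
        current = {func(value) for value in current}
        counts.append((label, len(current)))
    return counts


# Hypothetical sample subjects, not taken from the DPLA dataset.
sample = ["Dogs", "dogs", "Dogs", "Caf\u00e9", "Cafe", "Smith, John", "John Smith"]
steps = [("folded", fold), ("lowercased", lowercase), ("fingerprint", fingerprint)]
counts = chain_counts(sample, steps)
```

Because each step works on the set produced by the step before it, the counts can only stay the same or shrink as you move down the chain.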

The number of subjects after each normalization is presented in the first table below.

Hub Name | Total Subjects | Unique Subjects | Folded | Lowercase | NACO | Porter | Fingerprint
ARTstor | 194,883 | 9,560 | 9,559 | 9,514 | 9,483 | 8,319 | 8,278
Biodiversity_Heritage_Library | 451,999 | 22,004 | 22,003 | 22,002 | 21,865 | 21,482 | 21,384
David_Rumsey | 22,976 | 123 | 123 | 122 | 121 | 121 | 121
Digital_Commonwealth | 295,778 | 41,704 | 41,694 | 41,419 | 40,998 | 40,095 | 39,950
Digital_Library_of_Georgia | 1,151,351 | 132,160 | 132,157 | 131,656 | 131,171 | 130,289 | 129,724
Harvard_Library | 26,641 | 9,257 | 9,251 | 9,248 | 9,236 | 9,229 | 9,059
HathiTrust | 2,608,567 | 685,733 | 682,188 | 676,739 | 671,203 | 667,025 | 653,973
Internet_Archive | 363,634 | 56,910 | 56,815 | 56,291 | 55,954 | 55,401 | 54,700
J_Paul_Getty_Trust | 32,949 | 2,777 | 2,774 | 2,760 | 2,741 | 2,710 | 2,640
Kentucky_Digital_Library | 26,008 | 1,972 | 1,972 | 1,959 | 1,900 | 1,898 | 1,892
Minnesota_Digital_Library | 202,456 | 24,472 | 24,470 | 23,834 | 23,680 | 22,453 | 22,282
Missouri_Hub | 97,111 | 6,893 | 6,893 | 6,850 | 6,792 | 6,724 | 6,696
Mountain_West_Digital_Library | 2,636,219 | 227,755 | 227,705 | 223,500 | 220,784 | 214,197 | 210,771
National_Archives_and_Records_Administration | 231,513 | 7,086 | 7,086 | 7,085 | 7,085 | 7,050 | 7,045
North_Carolina_Digital_Heritage_Center | 866,697 | 99,258 | 99,254 | 99,020 | 98,486 | 97,993 | 97,297
Smithsonian_Institution | 5,689,135 | 348,302 | 348,043 | 347,595 | 346,499 | 344,018 | 337,209
South_Carolina_Digital_Library | 231,267 | 23,842 | 23,838 | 23,656 | 23,291 | 23,101 | 22,993
The_New_York_Public_Library | 1,995,817 | 69,210 | 69,185 | 69,165 | 69,091 | 68,767 | 68,566
The_Portal_to_Texas_History | 5,255,588 | 104,566 | 104,526 | 103,208 | 102,195 | 98,591 | 97,589
United_States_Government_Printing_Office_(GPO) | 456,363 | 174,067 | 174,063 | 173,554 | 173,353 | 172,761 | 170,103
University_of_Illinois_at_Urbana-Champaign | 67,954 | 6,183 | 6,182 | 6,150 | 6,134 | 6,026 | 6,010
University_of_Southern_California_Libraries | 859,868 | 65,958 | 65,882 | 65,470 | 64,714 | 62,092 | 61,553
University_of_Virginia_Library | 93,378 | 3,736 | 3,736 | 3,672 | 3,660 | 3,625 | 3,618

Here is a table that shows the cumulative percentage reduction in unique subjects at each step of the normalization chain, relative to the un-normalized unique count. The percentages are a little easier to interpret than the raw counts.

Hub Name | Folded Normalization | Lowercase Normalization | NACO Normalization | Porter Normalization | Fingerprint Normalization
ARTstor | 0.0% | 0.5% | 0.8% | 13.0% | 13.4%
Biodiversity_Heritage_Library | 0.0% | 0.0% | 0.6% | 2.4% | 2.8%
David_Rumsey | 0.0% | 0.8% | 1.6% | 1.6% | 1.6%
Digital_Commonwealth | 0.0% | 0.7% | 1.7% | 3.9% | 4.2%
Digital_Library_of_Georgia | 0.0% | 0.4% | 0.7% | 1.4% | 1.8%
Harvard_Library | 0.1% | 0.1% | 0.2% | 0.3% | 2.1%
HathiTrust | 0.5% | 1.3% | 2.1% | 2.7% | 4.6%
Internet_Archive | 0.2% | 1.1% | 1.7% | 2.7% | 3.9%
J_Paul_Getty_Trust | 0.1% | 0.6% | 1.3% | 2.4% | 4.9%
Kentucky_Digital_Library | 0.0% | 0.7% | 3.7% | 3.8% | 4.1%
Minnesota_Digital_Library | 0.0% | 2.6% | 3.2% | 8.3% | 8.9%
Missouri_Hub | 0.0% | 0.6% | 1.5% | 2.5% | 2.9%
Mountain_West_Digital_Library | 0.0% | 1.9% | 3.1% | 6.0% | 7.5%
National_Archives_and_Records_Administration | 0.0% | 0.0% | 0.0% | 0.5% | 0.6%
North_Carolina_Digital_Heritage_Center | 0.0% | 0.2% | 0.8% | 1.3% | 2.0%
Smithsonian_Institution | 0.1% | 0.2% | 0.5% | 1.2% | 3.2%
South_Carolina_Digital_Library | 0.0% | 0.8% | 2.3% | 3.1% | 3.6%
The_New_York_Public_Library | 0.0% | 0.1% | 0.2% | 0.6% | 0.9%
The_Portal_to_Texas_History | 0.0% | 1.3% | 2.3% | 5.7% | 6.7%
United_States_Government_Printing_Office_(GPO) | 0.0% | 0.3% | 0.4% | 0.8% | 2.3%
University_of_Illinois_at_Urbana-Champaign | 0.0% | 0.5% | 0.8% | 2.5% | 2.8%
University_of_Southern_California_Libraries | 0.1% | 0.7% | 1.9% | 5.9% | 6.7%
University_of_Virginia_Library | 0.0% | 1.7% | 2.0% | 3.0% | 3.2%
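
As a rough check on how these percentages relate to the counts in the first table, here is a minimal Python sketch. It assumes each percentage is the drop from a Hub's unique (un-normalized) subject count to its count after a given normalization; that reading is inferred from the published numbers rather than stated anywhere, and the function name `percent_reduction` is made up for illustration.

```python
def percent_reduction(unique_count, normalized_count):
    """Percentage drop from the unique count to the normalized count, to one decimal."""
    return round(100 * (unique_count - normalized_count) / unique_count, 1)


# ARTstor row from the first table.
artstor_unique = 9560
artstor_counts = {
    "Folded": 9559,
    "Lowercase": 9514,
    "NACO": 9483,
    "Porter": 8319,
    "Fingerprint": 8278,
}

artstor_reductions = {
    name: percent_reduction(artstor_unique, count)
    for name, count in artstor_counts.items()
}
```

For the ARTstor row this gives the same values as the second table, which supports the assumption that the unique count is the baseline.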

Here is that data presented as a graph, which I think shows it even better.

Reduction Percent after Normalization

You can see that for many of the Hubs the biggest reductions come from the Porter and Fingerprint normalizations. A Hub of note is ARTstor, which had the highest percentage of reduction of all the Hubs. This was primarily caused by the Porter normalization, which means that a large percentage of its subjects stemmed to the same stem; often these are plural and singular versions of the same subject. This may be completely valid given how ARTstor chose to create its metadata, but it is still interesting.

Another Hub I found interesting was Harvard, where the biggest reduction happened with the Fingerprint normalization. This might suggest that there are a number of values that are the same except for word order, for example names that occur in both inverted and non-inverted form.

In the end I’m not sure how helpful this is as an indicator of quality within a field. There are fields that would benefit from this sort of normalization more than others.  For example subjects, creator, contributor, publisher will normalize very differently than a field like title or description.

Let me know what you think via Twitter if you have questions or comments.

Hydra Project: Hydra Connect 2015 – 21-24 September, Minneapolis: Request for Program Suggestions

Thu, 2015-05-28 13:39

We’re delighted to be able to tell you that detailed planning is now underway for an exciting program at Hydra Connect 2015 in Minneapolis this Fall.  The program committee would love to hear from those of you who have suggestions for items that should be included.  These might be workshops or demonstrations for the Monday, or they might be for 5, 10 or 20 minute presentations, discussion groups or another format you’d like to suggest during the conference proper.  It may be that you will offer to facilitate or present the item yourself or it may be that you’d like the committee to commission the slot from someone else – you could maybe suggest a name.  As in the past, we shall be attempting to serve the needs of attendees from a wide range of experience and background (potential adopters, new adopters, “old hands”; developers, managers, sysops etc) and, if it isn’t obvious, it may be helpful if you tell us who would be the target audience. Those of you going to Open Repositories 2015 might take the opportunity to talk to others about the possibility of joint workshops, presentations, etc.?

Please let us have your ideas, preferably before Monday 15th June, at or by adding them to the page at HC2015 suggestions for the program.

Advance warning that, as in past years, we shall ask all attendees who are working with Hydra to bring a poster for the popular “poster show and tell” session.  This is a great opportunity to share with colleague Hydranauts what your institution is doing and to forge connections around the work.  Details later…

FYI: we plan on opening booking in the next ten days or so and we hope to see you in Minneapolis for what promises to be another great Hydra Connect meeting!


Peter Binkley, Matt Critchlow, Karen Estlund, Erin Fahy and Anna Headley (the Hydra Connect 2015 Program Committee)

LITA: Amazon Echo Update

Thu, 2015-05-28 13:00

I wrote about Amazon Echo a few months back. At the time, I did not have it, but was looking forward to using it. Now that I have had Echo for a while, I have a better idea of its strengths and weaknesses.

It doesn’t pick up every word I say, but its voice recognition is much better than I anticipated.  The app works nicely on my phone and iPad and I found it easy to link Pandora, my music, and to indicate what news channels I want to hear from. I enjoy getting the weather report, listening to a flash news briefing, adding items to my shopping list, listening to music, and being informed of the best route to work to avoid traffic.

My favorite feature is that it is hands-free. I’m constantly running around my house juggling a lot of things. Often I need an answer to a question, I need to add something to a shopping list as I’m cooking, or I want to hear a certain song as I’m elbow-deep in a project. Having the ability to just “say the words” is wonderful. Now if it just worked for everything…

I hope updates will come soon, though, as I’d like to see increased functionality in its ability to answer questions and to provide traffic information for locations other than the one I can program into the app. I also want to be able to make calls and send text messages using Echo.

In my first post about Amazon Echo, I stated I was really interested in the device as an information retrieval tool. Currently, Echo doesn’t work as well as I was expecting for retrieving information, but with software updates I still see it (and similar tools) having an impact on our research.

Overall, I see it as a device that has amazing potential, but it is still in its infancy.

Has anyone else used Echo? I’d love to hear your thoughts on the device.

FOSS4Lib Recent Releases: Koha - Maintenance release 3.18.7

Thu, 2015-05-28 10:56
Package: Koha
Release Date: Tuesday, May 26, 2015

Last updated May 28, 2015. Created by David Nind on May 28, 2015.

Monthly maintenance release for Koha v 3.18.7. See the release announcements for the details:

Peter Murray: Thursday Threads: Man Photocopies Ebook, Google AutoAwesomes Photos, Librarians Called to HTTPS

Thu, 2015-05-28 10:35

In this week’s threads: a protest — or maybe just an art project — by a reader who saves his e-book copy of Orwell’s 1984 by photocopying each page from his Kindle, the “AutoAwesome” nature of artificial intelligence, and a call to action for libraries to implement encryption on their websites.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

Use Your Photocopier to Backup Your E-book

E-book backup is a physical, tangible, human readable copy of an electronically stored novel. The purchased contents of an e-book reader were easily photocopied and clip-bound to create a shelf-stable backup for the benefit of me, the book consumer. I can keep it on my bookshelf without worry of remote recall. A second hardcover backup has been made with the help of an online self-publishing house.

E-book backup, Jesse England, circa 2012

This project is from around 2012, but it first caught my eye this month. The author, pointing to the time when “some Amazon Kindle users found their copy of George Orwell’s 1984 and Animal Farm had been removed from their Kindles without their prior knowledge or consent”, decided to photocopy each page of his copy of 1984 as it appeared on a Kindle screen and create a bound paper version. The result is as you see in the image to the right.

Eight days ago, someone took the images from Mr. England’s page and uploaded the sequence to imgur. The project again circulated around the ‘net. There is a digital preservation joke in here, but I might not be able to find it unless the original creator took the text of 1984 and printed it out as QR Codes so the resulting book could be read back into a computer.

How Awesome is Artificial Intelligence?

The other day I created a Google+ album of photos from our holiday in France. Google’s AutoAwesome algorithms applied some nice Instagram-like filters to some of them, and sent me emails to let me have a look at the results. But there was one AutoAwesome that I found peculiar. It was this one, labeled with the word “Smile!” in the corner, surrounded by little sparkle symbols.
It’s a nice picture, a sweet moment with my wife, taken by my father-in-law, in a Normandy bistro. There’s only one problem with it. This moment never happened.

It’s Official: A.I.s are Now Re-Writing History, Robert Elliott Smith, 7-Oct-2014

Follow the link above to see the pictures — the two source pictures and the combination that Google’s algorithms created. The differences are subtle. I loaded both of the source images into Gimp and performed a difference operation between the two layers. The result is the image below.

The difference between the two pictures that Google combined in its “AutoAwesome” way.

Black means the pixel color values were identical, so you can see the changes of hand position clearly. (The other artifacts are, I assume, differences caused by the JPEG compression in the original source pictures.)
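For readers who want to see what such a difference operation amounts to, here is a minimal sketch in plain Python. It is not how Gimp implements its layer mode; it only assumes that "difference" means the absolute difference of each color channel. The function names and the tiny 2x2 sample images are made up for illustration.

```python
def pixel_difference(image_a, image_b):
    """Return the per-channel absolute difference of two same-sized RGB images.

    Each image is a list of rows; each row is a list of (r, g, b) tuples.
    Identical pixels come out as (0, 0, 0), i.e. black.
    """
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same height")
    result = []
    for row_a, row_b in zip(image_a, image_b):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same width")
        result.append([
            tuple(abs(ca - cb) for ca, cb in zip(pa, pb))
            for pa, pb in zip(row_a, row_b)
        ])
    return result


def changed_fraction(diff, threshold=0):
    """Fraction of pixels whose largest channel difference exceeds threshold."""
    pixels = [p for row in diff for p in row]
    if not pixels:
        return 0.0
    changed = sum(1 for p in pixels if max(p) > threshold)
    return changed / len(pixels)


# Hypothetical 2x2 images standing in for the two source photos.
first = [[(10, 20, 30), (200, 200, 200)],
         [(0, 0, 0), (255, 255, 255)]]
second = [[(10, 20, 30), (190, 205, 200)],
          [(0, 0, 0), (255, 255, 250)]]

diff = pixel_difference(first, second)
```

Raising `threshold` in `changed_fraction` is one rough way to ignore small differences of the kind compression noise produces, so that only larger changes (such as a moved hand) are counted.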

This reminds me of the trick of taking multiple pictures of the same shot and using a tool like Photoshop to remove the people. Except in this case it is an algorithm deciding what are the best parts from a multitude of pictures and putting together what its programmers deem to be the “best” combination.

Call to Librarians To Implement HTTPS

Librarians have long understood that to provide access to knowledge it is crucial to protect their patrons’ privacy. Books can provide information that is deeply unpopular. As a result, local communities and governments sometimes try to ban the most objectionable ones. Librarians rightly see it as their duty to preserve access to books, especially banned ones. In the US this defense of expression is an integral part of our First Amendment rights.

Access isn’t just about having material on the shelves, though. If a book is perceived as “dangerous,” patrons may avoid checking it out, for fear that authorities will use their borrowing records against them. This is why librarians have fought long and hard for their patrons’ privacy. In recent years, that has included Library Connection’s fight against the unconstitutional gag authority of National Security Letters and, at many libraries, choosing not to keep checkout records after materials are returned.

However, simply protecting patron records is no longer enough. Library patrons frequently access catalogs and other services over the Internet. We have learned in the last two years that the NSA is unconstitutionally hoovering up and retaining massive amounts of Internet traffic. That means that before a patron even checks out a book, their search for that book in an online catalog may already have been recorded. And the NSA is not the only threat. Other patrons, using off-the-shelf tools, can intercept queries and login data merely by virtue of being on the same network as their target.

Fortunately, there is a solution, and it’s getting easier to deploy every day.

What Every Librarian Needs to Know About HTTPS, by Jacob Hoffman-Andrews, Electronic Frontier Foundation, 6-May-2015

That is the beginning of an article that explains what HTTPS means, why it is important, and how libraries can effectively deploy it. The topic has come up in the NISO Patron Privacy in Digital Library and Information Systems working group, which has been holding virtual meetings this month and will culminate in a two-day in-person meeting after the ALA Annual convention in San Francisco next month. As you look at this article, keep an eye out for announcements about the Let’s Encrypt initiative, set to kick off some time this summer; it will give websites free server encryption certificates and provide a mechanism to keep them up-to-date.
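As a small first step toward that kind of deployment, a library could audit which of its public links still use plain HTTP. The sketch below is only an illustration using Python's standard `urllib.parse`; the function name and the example.org URLs are hypothetical, and it checks the URL scheme only, not whether the server's certificate or configuration is sound.

```python
from urllib.parse import urlsplit


def insecure_links(urls):
    """Return the URLs whose scheme is plain http rather than https."""
    return [u for u in urls if urlsplit(u).scheme.lower() == "http"]


# Hypothetical catalog and website links, for illustration only.
links = [
    "https://catalog.example.org/search?q=1984",
    "http://catalog.example.org/login",
    "HTTP://www.example.org/hours",
]

flagged = insecure_links(links)
for url in flagged:
    print("not encrypted:", url)
```

A login page in the flagged list is the most urgent case, since that is where credentials would cross the network unencrypted.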


William Denton: Firefox privacy extensions

Thu, 2015-05-28 02:28

I noticed yesterday that the RequestPolicy Firefox extension wasn’t working because it’s not being developed any more. There’s a replacement in the works but it didn’t look done enough, so I didn’t install it. I did install a couple of other extensions, which I organized in alphabetical order on the right-hand side of the location bar:


They are, in order:

Is there anything else I should use?

I’m still being tracked a lot, even though I deny all third-party cookies and most site-specific cookies. With Lightbeam I can block everything from and and other places that do nothing useful for me.

With good sites, nothing suffers, or when something breaks I don’t care about it. With some sites I need to fire up another browser and allow everything just to achieve some minor goal like buying a ticket. I suffer that now, but maybe I’ll change my mind.

I’m trying to use Tor more often for browsing sites where I don’t have an account.

Sometimes I look at how other people use the web, and I’m appalled at how awful the experience is, with everything filled with ads (which they can see) and cookies and tracking (which they can’t). On the other hand, there’s how Richard Stallman does things:

I am careful in how I use the Internet.

I generally do not connect to web sites from my own machine, aside from a few sites I have some special relationship with. I usually fetch web pages from other sites by sending mail to a program (see git:// that fetches them, much like wget, and then mails them back to me. Then I look at them using a web browser, unless it is easy to see the text in the HTML page directly. I usually try lynx first, then a graphical browser if the page needs it (using konqueror, which won’t fetch from other sites in such a situation).

I occasionally also browse using IceCat via Tor. I think that is enough to prevent my browsing from being connected with me, since I don’t identify myself to the sites I visit.

I’m somewhere in the wide middle.

Open Library Data Additions: Amazon Crawl: part ge

Thu, 2015-05-28 01:20

Part ge of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

DuraSpace News: VIVO Keynote announced: Dr. James Onken

Thu, 2015-05-28 00:00

We are delighted to welcome Dr. James Onken, Senior Advisor to the NIH Deputy Director for Extramural Research, to deliver a keynote talk at the 2015 VIVO Conference.

Dr. James Onken is leading a new NIH initiative to develop a semantic NIH Portfolio Analysis and Reporting Data Infrastructure (PARDI) that leverages community data and requirements, including those from the VIVO community.

DuraSpace News: Late-breaking Call for Posters Open through this Saturday

Thu, 2015-05-28 00:00

Working on a new project? Interested in sharing local research profiling or analysis efforts with attendees of #vivo15? We want to hear from you! Authors are invited to submit abstracts for poster presentations for the Fifth Annual VIVO Conference in August. For details on the Late-breaking Call for Posters, please click here. All submissions are due by midnight PST on Saturday, May 30th.

DuraSpace News: Provider IT Neki Technologies launches Brasiliana Fotográfica Portal

Thu, 2015-05-28 00:00
From Tiago Ferreira, Provider IT Neki Technologies, Rio de Janeiro

DuraSpace News: ArchivesDirect Webinar Series Recordings Available

Thu, 2015-05-28 00:00

Earlier this year Artefactual Systems and the DuraSpace organization launched ArchivesDirect, a complete hosted solution for preserving valuable institutional collections and all types of digital resources. This month Artefactual Systems’ Sarah Romkey and Courtney Mumma curated and presented a Hot Topics: The DuraSpace Community Webinar Series entitled "Digital Preservation with ArchivesDirect: Ready, Set, Go!"

DuraSpace News: Cineca Releases the First Version of DSpace-CRIS Based on DSpace 5

Thu, 2015-05-28 00:00

Bologna, Italy: Cineca is pleased to announce that the first version of DSpace-CRIS based on DSpace 5 was released to the community on May 25, 2015.

DSpace-CRIS 5.2.0 is aligned to the functionalities included in DSpace-CRIS 4.3.0 that was released on April 11, 2015.

The major functionalities shared by these two versions are:

DuraSpace News: Last Seats for June 15-16 DSpace Meetings in Geneva

Thu, 2015-05-28 00:00

From Bram Luyten, @mire

DuraSpace News: Now Available: Author Statistics for DSpace

Thu, 2015-05-28 00:00

From Bram Luyten, @mire

@mire released a new version of its Content and Usage Analysis module. The module’s main goal is to visualize DSpace statistics which are otherwise difficult and time-consuming to interpret. By offering a layer on top of those data, DSpace administrators are able to display usage statistics, content statistics, workflow statistics, search statistics and storage reports.

DuraSpace News: Amy Brand, Digital Science, Receives 2015 CSE Award for Meritorious Achievement

Thu, 2015-05-28 00:00

PHILADELPHIA, May 18, 2015

Industry veteran noted for her dedication to developing a future-facing scholarly communication ecosystem

DuraSpace News: Longsight Contributes New DSpace Storage Options

Thu, 2015-05-28 00:00
Independence, Ohio: Longsight has developed a refactored version of bitstream storage with a Pluggable Assetstore. This reworked version of bitstream storage allows new storage options to be implemented easily in DSpace, including Amazon S3.