planet code4lib
Jonathan Rochkind: Catching HTTP OPTIONS /* request in a Rails app

Tue, 2014-10-07 21:36

Apache sometimes seems to send an HTTP “OPTIONS /*” request to Rails apps deployed under Apache Passenger. (Or is it “OPTIONS *”? I’m not entirely sure.) The requests come with a User-Agent of “Apache/2.2.3 (CentOS) (internal dummy connection)”.

Apache does document that this happens sometimes, although I don’t fully understand why.

I’ve been trying to take my Rails error logs more seriously to make sure I handle any bugs they reveal. 404s can indicate a problem, especially when the referrer is my app itself. So I wanted to get all of those 404s for Apache’s internal dummy connection out of my log. (How I managed to fight with Rails logs enough to actually get useful contextual information on FATAL errors is an entirely different complicated story for another time.)

How can I make a Rails app handle them?

Well, first, let’s do a standards check and see that RFC 2616 (HTTP/1.1) Section 9 (I hope I have a current RFC that hasn’t been superseded) says:

If the Request-URI is an asterisk (“*”), the OPTIONS request is intended to apply to the server in general rather than to a specific resource. Since a server’s communication options typically depend on the resource, the “*” request is only useful as a “ping” or “no-op” type of method; it does nothing beyond allowing the client to test the capabilities of the server. For example, this can be used to test a proxy for HTTP/1.1 compliance (or lack thereof).

Okay, sounds like we can basically reply with whatever we want to this request, it’s a “ping or no-op”.  How about a 200 text/plain with “OK\n”?

Here’s a line I added to my Rails routes.rb file that seems to catch the “*” requests and just respond with such a 200 OK.

match ':asterisk', via: [:options], constraints: { asterisk: /\*/ }, to: lambda {|env| [200, {'Content-Type' => 'text/plain'}, ["OK\n"]]}

Since “*” is a special glob character in Rails routing, it looks like you have to use that weird constraints trick to actually match it. (Thanks to mbklein for the tip; this does not seem to be documented, and I never would have figured it out on my own.)

And then we can use a little “Rack app implemented in a lambda” trick to just return a 200 OK right from the routing file, without actually having to write a controller action somewhere else just to do this.
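The Rack trick is worth seeing on its own. The following standalone sketch (no Rails needed) simulates the call the router would make: a Rack application is just an object that responds to call(env) and returns a [status, headers, body] triplet, so a plain lambda qualifies.

```ruby
# A Rack application is anything that responds to #call(env) and returns
# a [status, headers, body] triplet -- a plain lambda qualifies, which is
# why the whole endpoint can live inside routes.rb.
ok_app = lambda do |env|
  [200, { 'Content-Type' => 'text/plain' }, ["OK\n"]]
end

# Simulate what the router would do for an "OPTIONS *" request by calling
# the app with a minimal env hash and inspecting the response triplet.
env = { 'REQUEST_METHOD' => 'OPTIONS', 'PATH_INFO' => '*' }
status, headers, body = ok_app.call(env)

puts status                  # 200
puts headers['Content-Type'] # text/plain
print body.join              # OK
```

The env hash here is a minimal stand-in; a real server passes a much fuller Rack environment, but this lambda ignores it anyway.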

I have not yet tested this extensively, but I think it works? (I’m still worried that if Apache is really requesting “OPTIONS *” instead of “OPTIONS /*”, it might not. Stay tuned.)

Filed under: General

Library of Congress: The Signal: What Does it Take to Be a Well-rounded Digital Archivist?

Tue, 2014-10-07 18:34

The following is a guest post from Peter Chan, a Digital Archivist at the Stanford University Libraries.

Peter Chan

I am a digital archivist at Stanford University. A couple of years ago, Stanford was involved in the AIMS project, which jump-started Stanford’s thinking about the role of a “digital archivist.” The project ended in 2011, and I am the only digital archivist hired as part of the project who is still on the job on a full-time basis. I recently had discussions with my supervisors about the roles and responsibilities of a digital archivist. This inspired me to look at job postings for “digital archivists” and at the skills and qualifications organizations were currently seeking.

I looked at eight job advertisements for digital archivists published in the past 12 months. The responsibilities and qualifications required of digital archivists varied widely across these organizations. However, all of them required formal training in archival theory and practice. Some institutions placed more emphasis on computer skills and preferred applicants with programming skills such as Perl, XSLT, Ruby and HTML, and with experience working with SQL databases and repositories such as DSpace and Fedora. Others required knowledge of a variety of metadata standards. A few even desired knowledge of computer forensics tools such as FTK Imager, AccessData Forensic Toolkit and write blockers. Most of these tools are at least somewhat familiar to digital archivists/librarians.

Screenshot from the ePADD project.

In my career, however, I have found other skills useful on the job as well. In my experience working on two projects (ePADD and GAMECIP), I found that knowledge of Natural Language Processing and of Linked Open Data/Semantic Web/ontologies was extremely useful. Because of those needs, I became familiar with the Stanford Named Entity Recognizer (NER) and the Apache OpenNLP library, which we used to extract personal names, organizational names and locations from email archives in the ePADD project. Additionally, familiarity with SKOS, Open Metadata Registry and Protégé helped me publish controlled vocabularies as linked open data and model the relationships among concepts in video game consoles in the GAMECIP project.

The list below summarizes the tasks I encountered during the past six years working in the field, along with the knowledge, skills, software and tools useful for each task.

Collection Development (interact with donors, creators, dealers and curators – hereafter “creators”)

  • Gain overall knowledge (computing habits of creators, varieties of digital material, hardware/software used, etc.) of the digital component of a collection: in-depth knowledge of computing habits, varieties of digital material and hardware/software for all formats (PC, Mac, devices, cloud, etc.). Tool: AIMS Born-Digital Material Survey.
  • Explain to creators the topic of digital preservation when necessary, including the differences between “bit preservation” and “preserving the abstract content encoded into bits”; migration/emulation/virtualization; “trusted repository”; and levels of preservation: in-depth knowledge of digital preservation. Background: “Ensuring the Longevity of Digital Information” by Jeff Rothenberg, January 1995 edition of Scientific American (Vol. 272, Number 1, pp. 42-47) (PDF); Reference Model for an Open Archival Information System (OAIS) (PDF); Preserving.Exe: Toward a National Strategy for Software Preservation (PDF); Library of Congress Recommended Format Specifications; NDSA Levels of Preservation; Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) (PDF).
  • Explain to creators how forensic software is used to accession, process and deliver born-digital collections when necessary – especially regarding sensitive/restricted materials: special knowledge of using forensic software in an archival context. Tools: AccessData FTK, EnCase Forensic, etc.
  • Explain to creators the use of natural language processing/data mining/visualization tools to process and deliver born-digital collections when necessary: general knowledge of tools used in processing and delivering born-digital archives, such as entity extraction, networking and visualization software. Tools: Stanford Named Entity Recognizer (NER), Apache OpenNLP, Gephi, D3.js, HTML 5 PivotViewer, etc.
  • Explain to creators publishing born-digital collection metadata and/or contents as semantic web/linked open data vs. Encoded Archival Description finding aids or other HTML-based web publishing methods when necessary: knowledge of linked data/semantic web, EAD finding aids and HTML-based web publishing methods.
  • Explain web archiving to creators: general knowledge of web archiving, cataloging, delivery and preservation of web sites. Knowledge of web archiving software such as Heritrix and HTTrack, and of the Internet Archive’s Wayback Machine.
  • Explain to creators the archives profession in general: knowledge of establishing and maintaining control, and of arranging and describing born-digital archival materials in accordance with accepted standards and practices to ensure the long-term preservation of collections.

Accessioning

  • Copy files from storage media, including obsolete formats such as 5.25 inch floppy disks, computer punch cards, etc.: knowledge of onboard 5.25 inch floppy disk controllers and of hardware interfaces and tools including IDE, SCSI, FireWire, SATA, FC5025, KryoFlux, Catweasel, Zip drives, computer tapes, etc. Knowledge of file systems such as FAT, NTFS, HFS, etc.
  • Ensure source data on storage media will not be erased/changed accidentally during accessioning, while maintaining a proper audit trail when copying files from storage media: knowledge of the write-protect notch/slide switch on floppy disks and of hardware write blockers. Knowledge of forensic software (e.g. FTK Imager for PC and command-line FTK Imager for Mac).
  • Get file counts, file sizes and file categories of collections: knowledge of forensic software (e.g. AccessData FTK, EnCase Forensic, BitCurator, etc.) and of JHOVE, DROID, PRONOM, etc.
  • Ensure computer viruses, if they exist in collection materials, are kept under control during accessioning: knowledge of the unique nature of archival materials (no replacement, etc.), the behavior of viruses stored in file containers, and special procedures for using antivirus software on archival materials.
  • Accession email archives: knowledge of Internet protocols (POP, IMAP) and email formats (Outlook, mbox). Knowledge of commercial software packages to archive and reformat email (Emailchemy, MailStore). Knowledge of open source software such as ePADD (Email: Process, Accession, Discover and Deliver).
  • Archive web sites: knowledge of web archiving software such as Heritrix and HTTrack, of legal issues in archiving web sites, and of web archiving services such as Archive-It.
  • Create accession records for born-digital archives: knowledge of archival data management systems such as Archivists’ Toolkit (AT) with the Multiple Extent Plugin, etc.

Arrangement and Description / Processing

  • Screen out restricted, personal, classified and proprietary information – social security numbers, credit card numbers, classified data, medical records, etc.: knowledge of the sensitivity of personally identifiable information (PII) and of tools to locate PII (e.g. AccessData FTK, Identity Finder). Knowledge of legal restrictions on access to data (DMCA, FERPA, etc.).
  • Classify text elements in born-digital materials into predefined categories, such as the names of persons, organizations and locations, when appropriate: knowledge of entity extraction and of tools to perform it (such as OpenCalais, Stanford Named Entity Recognizer, Apache OpenNLP).
  • Show the network relationships of people in collections when appropriate: knowledge of network graphs and of tools such as Gephi and NodeXL.
  • Create controlled vocabularies to facilitate arrangement and description when appropriate: knowledge of the concepts of controlled vocabularies; of the W3C standard for publishing controlled vocabularies (SKOS); of software for creating controlled vocabularies in SKOS, such as SKOSjs and SKOS Editor; of platforms for hosting SKOS controlled vocabularies, such as Linked Media Framework and Apache Marmotta; and of services for publishing SKOS, such as Open Metadata Registry and PoolParty.
  • Model data in archives in RDF (Resource Description Framework): knowledge of semantic web/linked data; of commonly used vocabularies/schemas such as DC and FOAF; of vocabulary repositories such as Linked Open Vocabularies (LOV); and of tools to generate RDF/XML and RDF/JSON, such as LODRefine and Karma.
  • Model concepts in archives and the relationships between them (e.g. video game consoles) using an ontology when appropriate: knowledge of the W3C standard OWL (Web Ontology Language) and of software to create OWL ontologies, such as Protégé and WebProtégé.
  • Describe files with special formats (e.g. born-digital photographic images): knowledge of image metadata schema standards (IPTC, EXIF) and of software to create/modify such metadata (Adobe Bridge, Photo Mechanic, etc.).
  • Describe image files by the names of the persons pictured, with the help of software, when appropriate: knowledge of facial recognition functions in software such as Picasa and Photoshop Elements.
  • Use visualization tools to represent data in archives when appropriate: knowledge of open source JavaScript libraries for manipulating documents, such as D3.js and HTML 5 PivotViewer, and of commercial tools such as IBM Many Eyes and Cooliris.
  • Assign metadata to archived web sites: knowledge of cataloging options available in web archiving services such as Archive-It or in web archiving software such as HTTrack.
  • Create EAD finding aids: knowledge of accepted standards and practices for creating finding aids, and of XML editors or other software (such as Archivists’ Toolkit) to create EAD finding aids.

Discovery and Access

  • Deliver born-digital archives: knowledge of copyright laws and privacy issues.
  • Deliver born-digital archives on reading room computers: knowledge of security measures required for workstations in reading rooms, such as disabling Internet access and USB ports to prevent unintentional transfer of collection materials; of software to deliver images in collections, such as Adobe Bridge; and of software to read obsolete file formats, such as Quick View Plus.
  • Deliver born-digital archives through the institution’s catalog system: knowledge of the interface required by the catalog system to make the delivery.
  • Deliver born-digital archives through institutional repository systems: knowledge of DSpace, Fedora, Hydra and the interfaces developed to facilitate such delivery.
  • Publish born-digital archives as linked data/semantic web: knowledge of linked data publishing platforms such as Linked Media Framework, Apache Marmotta and OntoWiki, and of linked data publishing services such as Open Metadata Registry.
  • Deliver born-digital archives using exhibition software: knowledge of open source exhibition software such as Omeka.
  • Deliver archived web sites: knowledge of delivery options available in web archiving services such as Archive-It or in web archiving software such as HTTrack.
  • Deliver email archives: knowledge of commercial software such as MailStore and of open source software such as ePADD (Email: Process, Accession, Discover and Deliver).
  • Deliver software collections using emulation/virtualization: knowledge of emulation/virtualization tools such as KEEP, JSMESS, MESS, VMNetX and XenServer.
  • Deliver finding aids for born-digital archives through union catalogs such as OAC: knowledge of the uploading procedures for the respective union catalogs.

Preservation

  • Prepare the technical metadata (checksums; creation, modification and last access dates; file format; file size; etc.) of files in archives for transfer to a preservation repository: knowledge of forensic software such as AccessData FTK, EnCase Forensic, BitCurator, etc. Programming skill in XSLT to extract the information, when appropriate, from reports generated by the software.
  • Use an emulation/virtualization strategy to preserve software collections: knowledge of emulation/virtualization tools such as KEEP, JSMESS, MESS, VMNetX and XenServer.
  • Use migration strategies to preserve digital objects: knowledge of the Library of Congress Recommended Format Specifications; of migration tools such as Xena and Adobe Acrobat Professional; and of the Curl Exemplars in Digital Archives (Cedars) and Creative Archiving at Michigan and Leeds: Emulating the Old on the New (CAMiLEON) projects.
  • Submit items to a preservation repository: knowledge of preservation systems such as Archivematica and LOCKSS, and of preservation services such as Portico, Tessella and DuraSpace. Knowledge of preservation repository interfaces. Advanced knowledge of Excel for batch input to the repository when appropriate.
  • Preserve archived web sites: knowledge of preservation options available in web archiving services such as Archive-It, and of preserving web sites in a preservation repository.

This list may seem dishearteningly comprehensive, but I attained these skills over years of experience working as a digital archivist on a number of challenging projects. I didn’t start off knowing everything on this list. I acquired these skills and this knowledge by going to conferences and workshops, attending Natural Language Processing MOOC classes, and through self-study using resources available online. A digital archivist starting out in this field does not need to have all these skills right off the bat, but does need to be open to, and capable of, consistently learning and applying new knowledge.

Of course, digital archivists in different institutions will have different responsibilities according to their particular situations. I hope this article will generate discussion of the work expected from digital archivists and the knowledge required for them to succeed. Finally, I would like to thank Glynn Edwards, my supervisor, who supports my exploratory investigation into areas which some organizations may consider irrelevant to the job of a digital archivist. As a reminder, my opinions are not necessarily those of my employer or any other organization.

LibraryThing (Thingology): NEW: Easy Share for Book Display Widgets

Tue, 2014-10-07 17:48

LibraryThing for Libraries is pleased to announce an update to our popular Book Display Widgets.

Introducing “Easy Share.” Easy Share is a tool for putting beautiful book displays on Facebook, Twitter, Pinterest, Tumblr, email newsletters and elsewhere. It works by turning our dynamic, moving widgets into shareable images, optimized for the service you’re going to use them on.

Why would I want an image of a widget?

Dynamic widgets require JavaScript. This works great on sites you control, like a library’s blog or home page. But many sites, including some of the most important ones, don’t allow JavaScript. Easy Share bridges that gap, allowing you to post your widgets wherever a photo or other image can go—everywhere from Facebook to your email newsletters.

How do I find Easy Share?

To use Easy Share, move your cursor over a Book Display Widget. A camera icon will appear in the lower right corner of the widget. Click on that to open up the Easy Share box.

How can I share my widgets?

You can share your widget in three ways:

  1. Download. Download an image of your widget. After selecting a size, click the “down” arrow to download the image. Each image is labeled with the name of your widget, so you can find it easily on your computer. Upload this image to Facebook or wherever else you want it to go.
  2. Link. Get a link (URL) to the image. Select the size you want, then click the link icon to get a link to copy into whatever social media site you want.
  3. Dynamic. “Dynamic” images change over time, so you can place what looks like a static image somewhere and have it change as your collection changes. To get a dynamic image, go to the edit page for a widget and use the link there to embed the image into your website or blog. Dynamic images update whenever your widget updates. Depending on users’ browser caching settings, changes may or may not appear immediately, but the image will change over time.

You can also download or grab a link to an image of your widget from the widget edit page. Under the preview section, click “Take Screenshot.” You can see our blog post about that feature here.

Check out the LibraryThing for Libraries Wiki for more instructions.


Find out more about LibraryThing for Libraries and Book Display Widgets. And sign up for a free trial of either by contacting

DPLA: DPLA Community Reps Produce Hackathon Planning Guide, Now Available

Tue, 2014-10-07 16:45

We’re excited to announce the release of a new Community Reps-produced resource, GLAM Hack-in-a-box, a short guide to organizing and convening a hackathon using cultural heritage data from GLAM organizations (Galleries, Libraries, Archives, Museums) including DPLA. We hope this guide will serve as a useful resource for those either unfamiliar with or inexperienced in pulling together a hackathon.

Included in this hackathon guide

What is a hackathon?
Learn about what a hackathon is and who can participate in one. Common examples–and misconceptions–are covered in this introductory section.

Developing your program
Think through the key details of your hackathon’s program. Topics covered include audience, purpose and goals, format, and staffing. Example programs are included as well.

Working through the logistics
Understand the logistical details to consider when planning a hackathon. Topics covered include venue considerations, materials, and project management tips. Example materials are included as well.

Day-of and post-hackathon
Learn how to make the most of your hard work when it counts most: the day-of! Topics covered include key day-of considerations and common concerns.

Handy resources
Find a number of useful resources for planning a GLAM API-based hackathon, including DPLA, as well as guides that we used in the process of writing this document.

This free resource was produced by the DPLA Community Reps over the course of summer 2014.  Many thanks go out to Community Reps Chad Nelson and Nabil Kashyap for volunteering their time and energy to work on this guide. If you happen to use the guide in planning a hackathon for the first time, or, if you’ve planned a hackathon in the past and learned something new and/or have additional recommendations after reading it, let us know!

 All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

Hydra Project: Hydra Connect #2 presentations and posters

Tue, 2014-10-07 15:27

Poster images from the “show and tell” session, and slide packs from many of the presentations given at Hydra Connect #2 last week, can be found linked from the Hydra wiki. The wiki front page also has a nice group photo – the full-size version, should you want it, is linked from the conference program at the time it was taken – 12:30 on Wednesday.

David Rosenthal: Economies of Scale in Peer-to-Peer Networks

Tue, 2014-10-07 15:00
In a recent IEEE Spectrum article entitled Escape From the Data Center: The Promise of Peer-to-Peer Cloud Computing, Ozalp Babaoglu and Moreno Marzolla (BM) wax enthusiastic about the potential for Peer-to-Peer (P2P) technology to eliminate the need for massive data centers. Even more exuberance can be found in Natasha Lomas' Techcrunch piece The Server Needs To Die To Save The Internet (LM) about the MaidSafe P2P storage network. I've been working on P2P technology for more than 16 years, and although I believe it can be very useful in some specific cases, I'm far less enthusiastic about its potential to take over the Internet.

Below the fold I look at some of the fundamental problems standing in the way of a P2P revolution, and in particular at the issue of economies of scale. After all, I've just written a post about the huge economies that Facebook's cold storage technology achieves by operating at data center scale.
Economies of Scale

Back in April, discussing a vulnerability of the Bitcoin network, I commented:
Gradually, the economies of scale you need to make money mining Bitcoin are concentrating mining power in fewer and fewer hands. I believe this centralizing tendency is a fundamental problem for all incentive-compatible P2P networks. ... After all, the decentralized, distributed nature of Bitcoin was supposed to be its most attractive feature.

In June, discussing Permacoin, I returned to the issue of economies of scale:
increasing returns to scale (economies of scale) pose a fundamental problem for peer-to-peer networks that do gain significant participation. One necessary design goal for networks such as Bitcoin is that the protocol be incentive-compatible, or as Ittay Eyal and Emin Gun Sirer (ES) express it:
the best strategy of a rational minority pool is to be honest, and a minority of colluding miners cannot earn disproportionate benefits by deviating from the protocol

They show that the Bitcoin protocol was, and still is, not incentive-compatible.

Even if the protocol were incentive-compatible, the implementation of each miner would, like almost all technologies, be subject to increasing returns to scale.

Since then I’ve become convinced that this problem is indeed fundamental. The simplistic version of the problem is this:
  • The income to a participant in a P2P network of this kind should be linear in their contribution of resources to the network.
  • The costs a participant incurs by contributing resources to the network will be less than linear in their resource contribution, because of the economies of scale.
  • Thus the proportional profit margin a participant obtains will increase with increasing resource contribution.
  • Thus the effects described in Brian Arthur's Increasing Returns and Path Dependence in the Economy will apply, and the network will be dominated by a few, perhaps just one, large participant.
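A toy model makes the simplistic version concrete. The numbers here are illustrative assumptions, not measurements: income is linear in contribution, and cost grows with an assumed exponent of 0.8 (any exponent below 1 models economies of scale and gives the same qualitative result).

```ruby
# Toy model of the argument above. Income is linear in contribution; cost
# grows sublinearly. The coefficients and the 0.8 exponent are arbitrary
# illustrations -- any cost exponent below 1.0 behaves the same way.
def income(contribution)
  10.0 * contribution
end

def cost(contribution)
  8.0 * contribution**0.8
end

def profit_margin(contribution)
  (income(contribution) - cost(contribution)) / income(contribution)
end

[1, 10, 100, 1000].each do |c|
  printf("contribution %5d  margin %.2f\n", c, profit_margin(c))
end
# The margin rises steadily with contribution, which is exactly the
# condition under which Arthur's increasing-returns dynamics concentrate
# the network around its largest participants.
```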
The advantages of P2P networks arise from a diverse network of small, roughly equal resource contributors. Thus it seems that P2P networks which have the characteristics needed to succeed (by being widely adopted) also inevitably carry the seeds of their own failure (by becoming effectively centralized). Bitcoin is an example of this. Some questions arise:
  • Does incentive-compatibility imply income linear in contribution?
  • If not, are there incentive-compatible ways to deter large contributions?
  • The simplistic version is, in effect, a static view of the network. Are there dynamic effects also in play?
Does incentive-compatibility imply income linear in contribution? Clearly, the reverse is true. If income is linear in, and solely dependent upon, contribution there is no way for a colluding minority of participants to gain more than their just reward. If, however:
  • Income grows faster than linearly with contribution, a group of participants can pool their contributions, pretend to be a single participant, and gain more than their just reward.
  • Income goes more slowly than linearly with contribution, a group of participants that colluded to appear as a single participant would gain less than their just reward.
So it appears that income linear in contribution is the limiting case, anything faster is not incentive-compatible.
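The limiting-case argument can be checked mechanically. Taking an assumed reward function reward(r) = r**k (purely illustrative), pooling pays exactly when k > 1 and Sybil-style splitting pays exactly when k < 1:

```ruby
# Check the pooling/splitting argument for reward(r) = r**k, an assumed
# illustrative reward function.
def reward(contribution, k)
  contribution**k
end

# Gain from two size-1 contributors pooling to pose as one size-2 contributor.
# A negative gain means a size-2 contributor does better splitting in two.
def pooling_gain(k)
  reward(2.0, k) - 2 * reward(1.0, k)
end

puts pooling_gain(1.2) > 0   # superlinear reward: pooling pays
puts pooling_gain(0.8) < 0   # sublinear reward: splitting into Sybils pays
puts pooling_gain(1.0) == 0  # linear reward: the knife edge
```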

Are there incentive-compatible ways to deter large contributions? In principle, the answer is yes. Arranging that income grows more slowly than contribution, and depends on nothing else, will do the trick. The problem lies in doing so.

Source: bitcoincharts.com

The actual income received by a participant is the value of the reward the network provides in return for the contribution of resources (the Bitcoin, for example) less the costs incurred in contributing the resources (the capital and running costs of the mining hardware, in the Bitcoin case). As the value of Bitcoin collapsed (as I write, BTC is about $320, down from about $1200 11 months ago and half its value in August) many smaller miners discovered that mining wasn’t worth the candle.

The network has to arrange not just that the reward grows more slowly than the contribution, but that it grows more slowly than the cost of the contribution for any participant. If there is even one participant whose rewards outpace their costs, Brian Arthur’s analysis shows they will end up dominating the network. Herein lies the rub. The network does not know what an individual participant’s costs, or even the average participant’s costs, are, nor how they grow as the participant scales up their contribution.

So the network would have to err on the safe side, and make rewards grow very slowly with contribution, at least above a certain minimum size. Doing so would mean few if any participants above the minimum contribution, making growth dependent entirely on recruiting new participants. This would be hard because their gains from participation would be limited to the minimum reward. It is clear that mass participation in the Bitcoin network was fuelled by the (unsustainable) prospect of large gains for a small investment.

A network that assured incentive-compatibility in this way would not succeed, because the incentives would be so limited. A network that allowed sufficient incentives to motivate mass participation, as Bitcoin did, would share Bitcoin’s vulnerability to domination by, as at present, two participants (pools, in Bitcoin’s case).

Are there dynamic effects also in play? As well as increasing returns to scale, technology markets exhibit decreasing returns through time. Bitcoin is an extreme example of this. Investment in Bitcoin mining hardware has a very short productive life:
the overall network hash rate has been doubling every 3-4 weeks, and therefore, mining equipment has been losing half its production capability within the same time frame. After 21-28 weeks (7 halvings), mining rigs lose 99.3% of their value.

This effect is so strong that it poses temptations for the hardware manufacturers that some have found impossible to resist. The FBI recently caught Butterfly Labs using hardware that customers had bought and paid for to mine on their own behalf for a while before shipping it to the customers. They thus captured the most valuable week or so of the hardware’s short useful life for themselves.

Source: blockchain.info

Even with technology improvement rates much lower than the Bitcoin network hash rate increase, such as Moore’s Law or Kryder’s Law, under which the useful life of hardware is much longer than 6 months, this effect can be significant. When new, more efficient technology is introduced, reducing the cost per unit contribution to a P2P network, it does not become instantly available to all participants. As manufacturing ramps up, the limited supply preferentially goes to the manufacturers’ best customers, who would be the largest contributors to the P2P network. By the time supply has increased enough that smaller contributors can enjoy the lower cost per unit contribution, the most valuable part of the technology’s useful life is over.

Early availability of new technology acts to reduce the costs of the larger participants, amplifying their economies of scale. This effect must be very significant in Bitcoin mining, as Butterfly Labs noticed. At pre-2010 Kryder rates it would be quite noticeable since storage media service lives were less than 60 months. At the much lower Kryder rates projected by the industry storage media lifetimes will be extended and the effect correspondingly less.
Trust

BM admit that there are significant unresolved trust issues in P2P technology:
The people using such a cloud must trust that none of the many strangers operating it will do something malicious. And the providers of equipment must trust that the users won’t hog computer time.

These are formidable problems, which so far do not have general solutions. If you just want to store data in a P2P cloud, though, things get easier: The system merely has to break up the data, encrypt it, and store it in many places.

Unfortunately, even for storage this is inadequate. The system cannot trust the peers claiming to store the shards of the encrypted data but must verify that they actually are storing them. This is a resource-intensive process. Permacoin’s proposal, to re-purpose resources already being expended elsewhere, is elegant but unlikely to be successful. Worse, the verification process consumes not just resources, but time. At each peer there is necessarily a window of time between successive verifications. During that time the system believes the peer has a good copy of the shard, but it might no longer have one.
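To make the verification cost concrete, here is a minimal challenge-response sketch (my own illustration, not the protocol of Permacoin, MaidSafe or any other system): while the verifier still holds the shard it precomputes answers to random challenges, and thereafter a peer can answer correctly only if it actually kept the data.

```ruby
require 'digest'
require 'securerandom'

# Minimal challenge-response storage verification (an illustrative sketch,
# not any specific system's protocol). The verifier precomputes expected
# answers while it still has the shard, then keeps only the answers.
shard = 'encrypted shard bytes ' * 100

challenges = Array.new(3) { SecureRandom.hex(16) }
answers    = challenges.map { |nonce| Digest::SHA256.hexdigest(nonce + shard) }

# An honest peer hashes the shard it kept; a cheating peer no longer has it.
honest_peer   = ->(nonce) { Digest::SHA256.hexdigest(nonce + shard) }
cheating_peer = ->(nonce) { Digest::SHA256.hexdigest(nonce + 'data was dropped') }

honest_ok  = challenges.zip(answers).all? { |n, a| honest_peer.call(n)   == a }
cheater_ok = challenges.zip(answers).all? { |n, a| cheating_peer.call(n) == a }

puts honest_ok    # true
puts cheater_ok   # false
# Each round costs a full hash of the shard, and only proves possession at
# that instant -- the gap until the next challenge is the window in which
# a shard can disappear undetected.
```

The nonce is what stops a peer from caching the answers instead of the data: each challenge forces a fresh hash over the whole shard.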
Edge of the internet
P2P enthusiasts describe the hardware from which their network is constructed in similar terms. Here is BM:
the P2P cloud is made up of a diverse collection of different people’s computers or game consoles or whatever

and here is LM:
Users of MaidSafe’s network contribute unused hard drive space, becoming the network’s nodes. It’s that pooling — or, hey, crowdsourcing — of many users’ spare computing resource that yields a connected storage layer that doesn’t need to centralize around dedicated datacenters.

When the idea of P2P networks started in the 90s:
Their model of the edge of the Internet was that there were a lot of desktop computers, continuously connected and powered-up, with low latency and no bandwidth charges, and with 3.5" hard disks that were mostly empty. Since then, the proportion of the edge with these characteristics has become vanishingly small. The edge is now intermittently powered up and connected, with bandwidth charges, and only small amounts of local storage.
Monetary rewards
This means that, if the network is to gain mass participation, the majority of participants cannot contribute significant resources to it; they don't have suitable resources to contribute. They will have to contribute cash. This in turn means that there must be exchanges, converting between the rewards for contributing resources and cash, allowing the mass of resource-poor participants to buy from the few resource-rich participants.

Both Permacoin and MaidSafe envisage such exchanges, but what they don't seem to envisage is the effect on customers of the kind of volatility seen in the Bitcoin graph above. Would you buy storage from a service with this price history, or from Amazon? What exactly is the value to the mass customer of paying a service such as MaidSafe, by buying SafeCoin on an exchange, instead of paying Amazon directly, that would overcome the disadvantage of the price volatility?

As we see with Bitcoin, a network whose rewards can readily be converted into cash is subject to intense attack, and attracts participants ranging from sleazy to criminal. Despite its admirably elegant architecture, Bitcoin has suffered from repeated vulnerabilities. Although P2P technology has many advantages in resisting attack, especially the elimination of single points of failure and centralized command and control, it introduces a different set of attack vectors.
Measuring contributions
Discussion of P2P storage networks tends to assume that measuring the contribution a participant supplies in return for a reward is easy. A Gigabyte is a Gigabyte after all. But compare two Petabytes of completely reliable and continuously available storage, one connected to the outside world by a fiber connection to a router near the Internet's core, and the other connected via 3G. Clearly, the first has higher bandwidth, higher availability and lower cost per byte transferred, so its contribution to serving the network's customers is vastly higher. It needs a correspondingly greater reward.

In fact, networks would need to reward many characteristics of a peer's storage contribution as well as its size:
  • Reliability
  • Availability
  • Bandwidth
  • Latency
Measuring each of these parameters, and establishing "exchange rates" between them, would be complex, would lead to a very mixed marketing message, and would be the subject of disputes. For example, the availability, bandwidth and latency of a network resource depends on the location in the network from which the resource is viewed, so there would be no consensus among the peers about these parameters.
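To see why such "exchange rates" would be contentious, consider a toy scoring function (every weight and number below is a made-up assumption, not a proposal): two peers offering the same raw capacity can earn very different rewards, and each would lobby for weights favoring its own profile.

```python
def contribution_score(size_gb, availability, bandwidth_mbps, latency_ms,
                       weights=(1.0, 2.0, 0.5, 0.25)):
    # Hypothetical "exchange rates" between the four characteristics.
    w_size, w_avail, w_bw, w_lat = weights
    return (w_size * size_gb
            + w_avail * availability * size_gb  # availability scales usefulness
            + w_bw * bandwidth_mbps
            - w_lat * latency_ms)               # higher latency is penalized

# The two petabytes from the example above: fiber-connected vs. 3G.
fiber = contribution_score(1_000_000, 0.999, 10_000, 5)
cellular = contribution_score(1_000_000, 0.9, 10, 120)
assert fiber > cellular
```

And since availability, bandwidth and latency look different from every vantage point in the network, the peers could not even agree on the inputs to such a function, let alone the weights.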
Conclusion
While it is clear that P2P storage networks can work, and can even be useful tools for small communities of committed users, the non-technical barriers to widespread adoption are formidable. They have been effective in preventing widespread adoption since the late 90s, and the evolution of the Internet has since raised additional barriers.

Terry Reese: MarcEdit 6 Update (10/6/2014)

Tue, 2014-10-07 13:47

I sent this note to the MarcEdit listserv late last night/early this morning, but forgot to post here.  Over the weekend, the Ohio State University Libraries hosted our second annual hackathon on campus.  It’s been a great event, and this year, I had one of the early morning shifts (12 am-5 am), so I decided to use the time to do a little hacking myself.  Here’s a list of the changes:

  • Bug Fix: Merge Records Function: When processing using the control number option (or MARC21 primarily utilizing control numbers for matching) the program could merge incorrect data if large numbers of merged records existed without the data specified to be merged.  The tool would pull data from the previous record used and add that data to the matches.  This has been corrected.
  • Bug Fix: Network Task Directory — this tool was always envisioned as a tool that individuals would point to when an existing folder existed.  However, if the folder doesn’t exist prior to pointing to the location, the tool wouldn’t index new tasks.  This has been fixed.
  • Bug Fix: Task Manager (Importing new tasks) — When tasks were imported with multiple referenced task lists, the list could be unassociated from the master task.  This has been corrected.
  • Bug Fix:  If the plugins folder doesn’t exist, the current Plugin Manager doesn’t create one when adding new plugins.  This has been corrected.
  • Bug Fix: MarcValidator UI issue:  When resizing the form, the clipboard link wouldn’t move appropriately.  This has been fixed.
  • Bug Fix: Build Links Tool — relator terms in the 1xx and 7xx field were causing problems.  This has been corrected.
  • Bug Fix: RDA Helper: When parsing 260 fields with multiple copyright dates, the process would only handle one of the dates.  The process has been updated to handle all copyright values embedded in the 260$c.
  • Bug Fix: SQL Explorer:  The last build introduced a regression error so that when using the non-expanded SQL table schema, the program would crash.  This has been corrected.
  • Enhancement:  SQL Explorer expanded schema has been enhanced to include a column id to help track column value relationships.
  • Enhancement: Z39.50 Cataloging within the MarcEditor — when selecting the Z39.50/SRU Client, the program now seamlessly allows users to search using the Z39.50 client and automatically load the results directly into the open MarcEditor window.

Two other specific notes.  First, a few folks on the listserv have noted trouble getting MarcEdit to run on a Mac.  The issue appears to be MONO related.  Version 3.8.0 appears to have neglected to include a file in the build (which caused GUI operations to fail), and 3.10.0 brings the file back, but there was a build error with the component so the issue continues.  The problems are noted in their release notes as known issues, and the bug tracker seems to suggest that this has been corrected in the alpha channels, but that doesn’t help anyone right now.  So, I’ve updated the Mac instructions to include a link to MONO 3.6.0, the last version tested as a standalone install that I know works.  From now on, I will include the latest MONO version tested, and a link to the runtime, to hopefully avoid this type of confusion in the future.

Second – I’ve created a nifty plugin related to the LibHub project.  I’ve done a little video recording and will be making that available shortly.  Right now, I’m waiting on some feedback.  The plugin will initially be released to LibHub partners to provide a way for them to move any data into the project for evaluation – but hopefully, in time, it will be made more widely available.

Updates can be downloaded automatically via MarcEdit, or can be found at:

Please remember, if you are running a very old copy of MarcEdit 5.8 or lower, it is best practice to uninstall the application prior to installing 6.0.



Thom Hickey: Another JSON encoding for MARC data

Tue, 2014-10-07 13:42

You might think that a well understood format such as MARC would have a single straight-forward way of being represented in JSON.  Not so!  There are lots of ways of doing it, all with their own advantages (see some references below).  Still, I couldn't resist creating yet another.

This encoding grew out of some experimentation with Go (Golang), in which encoding MARC in JSON was one of my test cases, as was the speed at which the resulting encoding could be processed.  Another inspiration was Rich Hickey's ideas about the relationship of data and objects:

...the use of objects to represent simple informational data is almost criminal in its generation of per-piece-of-information micro-languages, i.e. the class methods, versus far more powerful, declarative, and generic methods like relational algebra. Inventing a class with its own interface to hold a piece of information is like inventing a new language to write every short story.

That said, how to represent the data still leaves lots of options, as the multiple encodings of MARC into JSON show.

Go's emphasis on strict matching of types pushed me towards a very flat structure:

  • The record is encoded as an array of objects
  • Each object has a 'Type' and represents either the leader or a field

Here are examples of the different fields:

{"Type":"leader", "Data": "the leader goes here"}

{"Type":"cfield", "Tag":"001", "Data":"12345"}

{"Type":"dfield", "Tag":"245", "Inds":"00", "Data":"aThis is a title$bSubtitle"}

Note that the subfields do not get their own objects.  They are concatenated together into one string using standard MARC subfield delimiters (represented by a $ above), essentially the way they appear in an ISO 2709 encoding.  In Python (and in Go) it is easy to split these strings on the delimiter into subfields as needed.  
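For illustration, here is a small Python sketch of that split (the helper name is mine, not part of any published library):

```python
DELIM = "\u001f"  # the MARC subfield delimiter, shown as $ above

def subfields(data):
    # Split a dfield's Data string into (code, value) pairs. As in the
    # encoding above, each chunk starts with its one-character code.
    return [(chunk[0], chunk[1:]) for chunk in data.split(DELIM) if chunk]

field = {"Type": "dfield", "Tag": "245", "Inds": "14",
         "Data": "aThe freewheelin' Bob Dylan\u001fh[sound recording]."}
print(subfields(field["Data"]))
# [('a', "The freewheelin' Bob Dylan"), ('h', '[sound recording].')]
```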

In addition to making it easy to import the JSON structure into Go (everything is easy in Python), the lack of structure makes reading and writing the list of fields very fast and simple. The main HBase table that supports WorldCat now has some 1.7 billion rows, so fast processing is essential, and we find this encoding much faster to process than the XML representation.  Although we do put the list of fields into a Python object, that object is derived from the list itself, so we can treat it as such, including adding new fields (and Types) as needed, which then get automatically carried along in the exported JSON.

We are also finding that a simple flat structure makes it easy to add information (e.g. administrative metadata) that doesn't fit into standard MARC.

Here are a few MARC in JSON references (I know there have been others in the past).  As far as I can tell, Ross's is the most popular:

Ross Singer:

Clay Fouts:

Galen Charlton:

Bill Dueber:

A more general discussion by Jakob Voss

Here is a full example of a record using the same example Ross Singer uses (although the record itself appears to have changed):

[{"Data": "01471cjm a2200349 a 4500", "Type": "leader"},
{"Data": "5674874", "Tag": "001", "Type": "cfield"},
{"Data": "20030305110405.0", "Tag": "005", "Type": "cfield"},
{"Data": "sdubsmennmplu", "Tag": "007", "Type": "cfield"},
{"Data": "930331s1963 nyuppn eng d", "Tag": "008", "Type": "cfield"},
{"Data": "9(DLC) 93707283", "Tag": "035", "Type": "dfield", "Inds": " "},
{"Data": "a7\u001fbcbc\u001fccopycat\u001fd4\u001fencip\u001ff19\u001fgy-soundrec", "Tag": "906", "Type": "dfield", "Inds": " "},
{"Data": "a 93707283 ", "Tag": "010", "Type": "dfield", "Inds": " "},
{"Data": "aCS 8786\u001fbColumbia", "Tag": "028", "Type": "dfield", "Inds": "02"},
{"Data": "a(OCoLC)13083787", "Tag": "035", "Type": "dfield", "Inds": " "},
{"Data": "aOClU\u001fcDLC\u001fdDLC", "Tag": "040", "Type": "dfield", "Inds": " "},
{"Data": "deng\u001fgeng", "Tag": "041", "Type": "dfield", "Inds": "0 "},
{"Data": "alccopycat", "Tag": "042", "Type": "dfield", "Inds": " "},
{"Data": "aColumbia CS 8786", "Tag": "050", "Type": "dfield", "Inds": "00"},
{"Data": "aDylan, Bob,\u001fd1941-", "Tag": "100", "Type": "dfield", "Inds": "1 "},
{"Data": "aThe freewheelin' Bob Dylan\u001fh[sound recording].", "Tag": "245", "Type": "dfield", "Inds": "14"},
{"Data": "a[New York, N.Y.] :\u001fbColumbia,\u001fc[1963]", "Tag": "260", "Type": "dfield", "Inds": " "},
{"Data": "a1 sound disc :\u001fbanalog, 33 1/3 rpm, stereo. ;\u001fc12 in.", "Tag": "300", "Type": "dfield", "Inds": " "},
{"Data": "aSongs.", "Tag": "500", "Type": "dfield", "Inds": " "},
{"Data": "aThe composer accompanying himself on the guitar ; in part with instrumental ensemble.", "Tag": "511", "Type": "dfield", "Inds": "0 "},
{"Data": "aProgram notes by Nat Hentoff on container.", "Tag": "500", "Type": "dfield", "Inds": " "},
{"Data": "aBlowin' in the wind -- Girl from the north country -- Masters of war -- Down the highway -- Bob Dylan's blues -- A hard rain's a-gonna fall -- Don't think twice, it's all right -- Bob Dylan's dream -- Oxford town -- Talking World War III blues -- Corrina, Corrina -- Honey, just allow me one more chance -- I shall be free.", "Tag": "505", "Type": "dfield", "Inds": "0 "},
{"Data": "aPopular music\u001fy1961-1970.", "Tag": "650", "Type": "dfield", "Inds": " 0"},
{"Data": "aBlues (Music)\u001fy1961-1970.", "Tag": "650", "Type": "dfield", "Inds": " 0"},
{"Data": "3Preservation copy (limited access)\u001fu", "Tag": "856", "Type": "dfield", "Inds": "41"},
{"Data": "aNew", "Tag": "952", "Type": "dfield", "Inds": " "},
{"Data": "aTA28", "Tag": "953", "Type": "dfield", "Inds": " "},
{"Data": "bc-RecSound\u001fhColumbia CS 8786\u001fwMUSIC", "Tag": "991", "Type": "dfield", "Inds": " "}]
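Since the record is just a list of flat objects, field lookup is a simple scan. Here is a sketch (the helper name is mine) pulling the title out of an abbreviated record like the one above:

```python
import json

def first_field(record, tag):
    # Return the first field object with the given tag, or None.
    return next((f for f in record if f.get("Tag") == tag), None)

record = json.loads("""[
 {"Data": "01471cjm a2200349 a 4500", "Type": "leader"},
 {"Data": "5674874", "Tag": "001", "Type": "cfield"},
 {"Data": "aThe freewheelin' Bob Dylan\\u001fh[sound recording].",
  "Tag": "245", "Type": "dfield", "Inds": "14"}
]""")

title = first_field(record, "245")
first_subfield = title["Data"].split("\u001f")[0]
print(first_subfield[1:])  # drop the subfield code 'a'
# The freewheelin' Bob Dylan
```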


Note: As far as I know Rich Hickey and I are not related.

Open Knowledge Foundation: Open Definition v2.0 Released – Major Update of Essential Standard for Open Data and Open Content

Tue, 2014-10-07 11:00

Today Open Knowledge and the Open Definition Advisory Council are pleased to announce the release of version 2.0 of the Open Definition. The Definition “sets out principles that define openness in relation to data and content” and plays a key role in supporting the growing open data ecosystem.

Recent years have seen an explosion in the release of open data by dozens of governments including the G8. Recent estimates by McKinsey put the potential benefits of open data at over $1 trillion, and other estimates put benefits at more than 1% of global GDP.

However, these benefits are at significant risk both from quality problems such as “open-washing” (non-open data being passed off as open) and from fragmentation of the open data ecosystem due to incompatibility between the growing number of “open” licenses.

The Open Definition eliminates these risks and ensures we realize the full benefits of open by guaranteeing quality and preventing incompatibility. See this recent post for more about why the Open Definition is so important.

The Open Definition was published in 2005 by Open Knowledge and is maintained today by an expert Advisory Council. This new version of the Open Definition is the most significant revision in the Definition’s nearly ten-year history.

It reflects more than a year of discussion and consultation with the community including input from experts involved in open data, open access, open culture, open education, open government, and open source. Whilst there are no changes to the core principles, the Definition has been completely reworked with a new structure and new text as well as a new process for reviewing licenses (which has been trialled with governments including the UK).

Herb Lainchbury, Chair of the Open Definition Advisory Council, said:

“The Open Definition describes the principles that define “openness” in relation to data and content, and is used to assess whether a particular licence meets that standard. A key goal of this new version is to make it easier to assess whether the growing number of open licenses actually make the grade. The more we can increase everyone’s confidence in their use of open works, the more they will be able to focus on creating value with open works.”

Rufus Pollock, President and Founder of Open Knowledge said:

“Since we created the Open Definition in 2005 it has played a key role in the growing open data and open content communities. It acts as the “gold standard” for open data and content guaranteeing quality and preventing incompatibility. As a standard, the Open Definition plays a key role in underpinning the “open knowledge economy” with a potential value that runs into the hundreds of billions – or even trillions – worldwide.”

What’s New

In process for more than a year, the new version was collaboratively and openly developed with input from experts involved in open access, open culture, open data, open education, open government, open source and wiki communities. The new version of the definition:

  • Has a complete rewrite of the core principles – preserving their meaning but using simpler language and clarifying key aspects.
  • Introduces a clear separation of the definition of an open license from an open work (with the latter depending on the former). This not only simplifies the conceptual structure but provides a proper definition of open license and makes it easier to “self-assess” licenses for conformance with the Open Definition.
  • The definition of an Open Work within the Open Definition is now a set of three key principles:
    • Open License: The work must be available under an open license (as defined in the following section but this includes freedom to use, build on, modify and share).
    • Access: The work shall be available as a whole and at no more than a reasonable one-time reproduction cost, preferably downloadable via the Internet without charge.
    • Open Format: The work must be provided in a convenient and modifiable form such that there are no unnecessary technological obstacles to the performance of the licensed rights. Specifically, data should be machine-readable, available in bulk, and provided in an open format or, at the very least, can be processed with at least one free/libre/open-source software tool.
  • Includes an improved license approval process that makes it easier for license creators to check their license's conformance with the Open Definition, encourages reuse of existing open licenses, and outlines the process for submitting a license so that it can be checked for conformance against the Open Definition.
More Information
  • For more information about the Open Definition including the updated version visit:
  • For background on why the Open Definition matters, read the recent article ‘Why the Open Definition Matters’

This post was written by Herb Lainchbury, Chair of the Open Definition Advisory Council and Rufus Pollock, President and Founder of Open Knowledge

Open Knowledge Foundation: Brazilian Government Develops Toolkit to Guide Institutions in both Planning and Carrying Out Open Data Initiatives

Tue, 2014-10-07 10:20

This is a guest post by Nitai Silva of the Brazilian government’s open data team and was originally published on the Open Knowledge Brazil blog here.

Recently the Brazilian government released the Kit de Dados Abertos (open data toolkit). The toolkit is made up of documents describing the process, methods and techniques for implementing an open data policy within an institution. Its goal is both to demystify the logic of opening up data and to share with public employees the best practices that have emerged from a number of Brazilian government initiatives.

The toolkit focuses on the Plano de Dados Abertos – PDA (Open Data Plan) as the guiding instrument where commitments, agenda and policy implementation cycles in the institution are registered. We believe that making each public agency build its own PDA is a way to perpetuate the open data policy, making it a state policy and not just a transitory governmental action.

It is organized to facilitate the implementation of the main activity cycles that must be observed in an institution and provides links and manuals to assist in these activities. Emphasis is given to the actors/roles involved in each step and their responsibilities. It also helps to define a central person to monitor and maintain the PDA. The following diagram summarises the macro steps of implementing an open data policy in an institution:


Processo Sistêmico de um PDA


The open data theme has been part of the Brazilian government’s agenda for over three years. Over this period, we have accomplished a number of important achievements, including passing the Lei de Acesso à Informação – LAI (FOIA) (Access to Information Law), making commitments as part of our Open Government Partnership Action Plan and developing the Infraestrutura Nacional de Dados Abertos (INDA) (Open Data National Infrastructure). However, despite these accomplishments, for many public managers, open data activities remain the exclusive responsibility of the Information Technology department of their respective institution. This gap is, in many ways, the cultural heritage of the hierarchical, departmental model of carrying out public policy and is observed in many institutions.

The launch of the toolkit is the first of a series of actions prepared by the Ministry of Planning to leverage open data initiatives in federal agencies, as was defined in the Brazilian commitments in the Open Government Partnership (OGP). The next step is to conduct several tailor-made workshops designed to support major agencies in the federal government in the implementation of open data.

Despite it having been built with the aim of expanding the quality and quantity of open data made available by federal executive branch agencies, we also made a conscious effort to make the toolkit generic enough for other branches and levels of government.

About the toolkit development:

It is also noteworthy to mention that the toolkit was developed on GitHub. Although GitHub is known as an online, distributed environment for developing software, it has long been used for the co-creation of text documents, even by governments. The toolkit is still hosted there, which allows anyone to make changes and propose improvements. The invitation is open; we welcome and encourage your collaboration.

Finally I would like to thank Augusto Herrmann, Christian Miranda, Caroline Burle and Jamila Venturini for participating in the drafting of this post!

LibUX: Designing a Library Website with Zurb Triggers

Mon, 2014-10-06 22:54

A design trigger is a pattern meant to appeal to behaviors and cognitive biases observed in users. Big data and the user experience boom have provided a lot of information about how people actually use the web, which designs work, and–although creepy–how it is possible to cobble together an effective site designed to social engineer users.

So not too long ago Zurb pulled a bunch of triggers together into a pretty useful and interesting collection, and I thought I’d take a crack at prototyping the front page of a library website.

Spoilers: the quality of my prototype kind of sucks, and blah-blah-blah let’s just call it low-fidelity prototyping, but I just want to ask you to, y’know, use your imagination.

Design for the Bottom Line

The Nightvale Public Library is a small, haunted public library that like many others has a bit of an image problem. It derives its annual budget from circulation numbers, library card registration, and resource usage. While there are a ton of interesting and good-looking public library websites to rip off, the Nightvale Public Library Web Librarian knows that a successful #libweb needs to convert traffic into measurements the institution uses to define success.

This website needs to

  • increase catalog usage
  • increase resource awareness and usage
  • increase library card registration

so let’s use design triggers to inspire this behavior.


Anchoring

Any piece of information can act like an “anchor,” serving as a reference point for users to contrast against all other information. It can also bias how they make choices. Oftentimes, the anchor is the initial information you’re exposed to.

The first opportunity we should take is to present the, ah, “modernity” of the NPL – and modern it is, even if its offerings are comparatively small to some of the larger libraries in the state. If public libraries are concerned about future state and federal budgets, then it is worth drawing direct comparisons between the library and other popular web services users will be familiar with. A library website can’t compete with an Amazon, true, but it can and should exist in the same universe. If “relevancy” is in anyway judged by design trends, then it helps NPL’s ulterior motives to rip off a few.

In this example, I used a background image of someone using a library website on their phone and made the catalog search crazy prominent.

Behavioral Goals
  • User’s first impression is of a clean, modern library
  • Draw user’s attention to search

Relative Value

Shoppers often have a reference price in mind for the cost of any particular item. Reference prices are as important for assessing relative value as the price difference between competing products.

We can then use our next opportunity to create an effective double-whammy that at once previews our wares and also draws comparisons to other services our users like. Library web services may not have the, er, video library available to Netflix, but there’s always a one-up: it’s free. Making users aware of the relative value of library services can increase brand loyalty, support, forgiveness for when databases inevitably break or there are troublesome user interfaces, and so on.

Libraries can also use “relative value triggers” to make the case for Friends membership and other donations. I’ll link-up the research later, but there is powerful evidence showing that if you ask people to donate, subscribe, try a beta, or whatever – they will.

Behavioral Goals
  • Impress user with breadth and currency of resources
  • Appeal to user’s sense of community, price consciousness, and so on
  • User feels good about her or his library


Curiosity

Having partial knowledge can create a feeling that drives us to fill those gaps in information. Novel and challenging situations can increase that drive, and give intrinsic value to tasks.

At this point, we have hopefully suggested to the user that the Nightvale Public Library is clean, modern, and otherwise just a pretty awesome feel-good service. We can at this point exploit those good feelings for real engagement. The idea is that this is a section using only the best and most recent content, because boring or badly written content will negate our efforts. It could be an awesome library video or a high quality, link-baity post encouraging use of an obscure resource, but the idea is that it’s good and interesting.

Libraries can use curiosity to guide users to various gems which create delight (lost episode coming soon!) – delight increases loyalty and good-feels all around.

Behavioral Goals
  • User has a glimpse of quality content, increasing the library’s overall sense of value
  • Heck, maybe the user even clicks something

Recognition over Recall

Memories rely on being able to retrace neural patterns and connections that were associated with the original stimulus. Cues make it easier to retrieve memories, over free recall without a cue.

We can use recognition over recall to immediately encourage an action – like card registration. Additionally, libraries can use this tactic to shape the general story they are telling. If even one of the above triggers worked as hoped, then we can use our final opportunity to transform these hazy impressions into a concrete feeling: the library is awesome, it has a lot of cool stuff, and it may even be better than competitors.

There’s also a trigger called the Zeigarnik Effect, which basically translates to, “Well, you came this far. Don’t you want to sign up?” It actually sounds a little evil:

It’s possible to take advantage of our tendency to seek closure. Once the initial hurdle of starting something is overcome, the internal tension caused from not completing a task can be a great motivator

Oh, and libraries really need to make library card registration suck less. Even if you need loads of personal, creepy information, maybe you can at least hide the bulk of it from sight until the user is committed. Here’s an example:

See the Pen Simple Expanding Sign-Up Form by Michael Schofield (@michaelschofield) on CodePen.


Understanding and making use of design triggers is an important next step after libraries embrace usability testing. If anything, they allow for quick and informed prototypes, but more importantly they force you to think about the bottom line and define tangible measures of success.

So, what do you think? Would you register?

FYI: I used Balsamiq Mockups.

The post Designing a Library Website with Zurb Triggers appeared first on LibUX.

Jonathan Rochkind: Umlaut News: 4.0 and two new installations

Mon, 2014-10-06 21:27

Umlaut is, well, now I’m going to call it a known-item discovery layer, usually but not necessarily serving as a front-end to SFX.

Umlaut 4.0.0 has been released

This release is mostly back-end upgrades, including:

  •  Support for Rails 4.x (Rails 3.2 included to make migration easier for existing installations, but recommend upgrading to Rails 4.1 asap, and starting with Rails 4.1 in new apps)
  • Based on Bootstrap 3 (Umlaut 3.x was Bootstrap 2)
  • internationalization/localization support
  • A more streamlined installation process with a custom installer
Recent Umlaut Installations

Princeton University has a beta install of Umlaut, and is hoping to go live in production soon.

Durham University (UK) has a beta/soft launch of Umlaut live. 

Filed under: General

District Dispatch: ALA voice to join FCC Open Internet roundtable

Mon, 2014-10-06 19:27

John Windhausen, network neutrality counsel to the American Library Association (ALA) and president of Telepoly Consulting, will represent libraries and higher education institutions Tuesday as a panelist for an Open Internet roundtable discussion hosted by the Federal Communications Commission (FCC).

Windhausen will advocate for strong net neutrality principles during the panel “Construction of Legally Sustainable Rules” of the roundtable “Internet Openness and the Law,” which takes place at 11:30 a.m. on Tuesday, October 7, 2014. During the panel discussion, Windhausen will explore ways that the FCC can use its Section 706 authority with an “Internet-reasonable” standard that would recognize and support the Internet ecosystem and the Internet as an open platform for free speech, learning, research and innovation.

ALA, along with other higher education and library organizations, affirmed the critical importance of network neutrality to fulfilling our public interest missions and outlined the case for an Internet-reasonable standard in its filings (pdf) in the Open Internet proceeding. The ALA also urged the FCC to use available legal authority to adopt strong, enforceable net neutrality rules in a joint letter (pdf) with the Center for Democracy and Technology.

The roundtable is free, open to the public and will be streamed live.

The post ALA voice to join FCC Open Internet roundtable appeared first on District Dispatch.

Library of Congress: The Signal: We Want You Just the Way You Are: The What, Why and When of Fixity

Mon, 2014-10-06 16:10

Icons for Archive and Checksum, Designed by Iconathon Los Angeles, California, US 2013. Public Domain

Fixity, the property of a digital file or object being fixed or unchanged, is a cornerstone of digital preservation. Fixity information, from simple file counts or file size values to more precise checksums and cryptographic hashes, is data used to verify whether an object has been altered or degraded.

Many in the preservation community know they should be establishing and checking the fixity of their content, but how, when and how often to do so is less well understood. The National Digital Stewardship Alliance Standards and Practices and Infrastructure working groups have published Checking Your Digital Content: What is Fixity and When Should I Be Checking It? (PDF) to help stewards of digital objects answer these questions in a way that makes sense for their organization based on their needs and resources.

The document includes helpful information on the following topics:

  • Definitions of fixity and fixity information.
  • Eleven reasons to collect, check, maintain and verify fixity information.
  • Seven general approaches to fixity check frequency.
  • Six common fixity information-generating instruments compared against each other.
  • Four places to consider storing and referencing fixity information.

Many thanks to everyone else who participated in the drafting and review of the document:

  • Paula De Stefano, New York University.
  • Carl Fleischhauer, Library of Congress.
  • Andrea Goethals, Harvard University.
  • Michael Kjörling, independent researcher.
  • Nick Krabbenhoeft, Educopia Institute.
  • Chris Lacinak, AVPreserve.
  • Jane Mandelbaum, Library of Congress.
  • Kevin McCarthy, National Archives and Records Administration.
  • Kate Murray, Library of Congress.
  • Vivek Navale, National Archives and Records Administration.
  • Dave Rice, City University of New York.
  • Robin Ruggaber, University of Virginia.
  • Kate Zwaard, Library of Congress.

DuraSpace News: VIVO Data from Mars: LASP, VIVO, and MAVEN

Mon, 2014-10-06 00:00

Above are four of the first images taken by the IUVS instrument. IUVS obtained these false-color images about eight hours after the successful completion of MAVEN’s Mars orbital insertion maneuver on September 21, 2014.

From Michael Cox, Laboratory of Atmospheric and Space Physics, University of Colorado at Boulder

Ed Summers: Sign o’ the Times

Sun, 2014-10-05 17:42

a sign, a metaphor by Casey Bisson.

An old acquaintance took this photo in Coaldale, Nevada. I had to have a copy for myself.

Patrick Hochstenbach: Homework assignment #2 Sketchbookskool

Sun, 2014-10-05 11:34
Filed under: Comics Tagged: cartoon, cat, comics, copic, manual, sketchskool, watercolor

Riley Childs: Test Post

Sat, 2014-10-04 21:05

This is a test post to test Dublin Core on the code4lib Planet, It will go away in a few minutes

The post Test Post appeared first on Riley's blog at