Feed aggregator

Ian Davis: Gorecipes: Fin

planet code4lib - Wed, 2016-03-30 18:00

How hard can it be to write a book of small, useful programming recipes? Too hard as it turns out.

About a year ago I was prompted by APress publishing to submit a proposal for a book to be called Go Recipes. Knowing nothing about book publishing this surprised me: I’d always assumed that authors pitched ideas to publishers or publishers pitched ideas to authors. It never occurred to me that a publisher would ask an author to devise an idea to pitch out of the blue. I surmised that they must have spotted my Go Cookbook repository on Github. After a couple of weeks of hesitation I put together the table of contents and blurb for the kind of book I would want to use, and one I hoped I could write. The publisher liked what I put together and asked me to write it. After another couple of weeks of hesitation, knowing it would be a massive time commitment, I agreed. I should have hesitated a bit longer, because I completely underestimated that investment of time.

I had plenty of ideas and within a few weeks I had delivered the first chapter and then a few weeks later I managed a second. Then my enthusiasm and energy began to wane: I was busy with other things in my life; I fretted about writing compelling and relevant recipes; I was writing about things I wasn’t that interested in; I worried about repeating myself over and over when explaining things in Go. Most of all I was anxious that some of the things I was writing might not be complete enough or not perceived as correct by other Go programmers.

If you were waiting for that book then I’m sorry to disappoint you: it’s not going to be completed. I decided that it was going to take too long and APress can’t sit around forever waiting for this awesome book to emerge. I don’t know if they’ll take that idea up with someone else. Hopefully they will because it would be a great resource for new Go programmers.

All is not doom and gloom though: I have taken the best and most complete recipes I came up with and added them to the Go Cookbook repository. These are now in the public domain, free for anyone to use for any purpose. Go wild.

ZBW German National Library of Economics: Turning the GND subject headings into a SKOS thesaurus: an experiment

planet code4lib - Wed, 2016-03-30 17:07

The "Integrated Authority File" (Gemeinsame Normdatei, GND) of the German National Library (DNB), the library networks of the German-speaking countries and many other institutions, is a widely recognized and used authority resource. The authority file comprises persons, institutions, locations and other entity types, in particular subject headings. With more than 134,000 concepts, organized in almost 500 subject categories, the subjects part - the former "Schlagwortnormdatei" (SWD) - is huge. That would make it a nice resource to stress-test SKOS tools - when it would be available in SKOS. A seminar at the DNB on requirements for thesauri on the Semantic Web (slides, in German) provided another reason for the experiment described below.

The GND subject headings are defined using a well-thought-out set of custom classes and properties, the GND Ontology (gndo). The GND links to other vocabularies with SKOS mapping properties, which technically implies that some, but not all, of its subject headings are skos:Concepts. Many of the gndo properties already mirror SKOS or ISO-THES properties. For the experiment, the relevant subset of the whole GND was selected via the gndo:SubjectHeadingSensoStricto class. A single SPARQL CONSTRUCT query does the selection and conversion (execute on an example concept). For skos:prefLabel/altLabel, derived from gndo:preferredNameForTheSubjectHeading and gndo:variantNameForTheSubjectHeading, German language tags are added. The fine-grained hierarchical relations of the GND Ontology - generic, instantial, partitive - are dumbed down to skos:broader/narrower. All original properties of a concept are included in the output of the query.
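
To make the shape of that conversion concrete, here is a minimal sketch in Python with rdflib. It only covers labels and the collapsed hierarchy; the prefix URI, the file names and the exact hierarchy property names are assumptions for illustration, and the real query additionally copies all original properties and handles the subject categories described below.

from rdflib import Graph

CONVERT = """
PREFIX gndo: <http://d-nb.info/standards/elementset/gnd#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

CONSTRUCT {
  ?c a skos:Concept ;
     skos:prefLabel ?pref ;
     skos:altLabel  ?alt ;
     skos:broader   ?broader .
}
WHERE {
  ?c a gndo:SubjectHeadingSensoStricto ;
     gndo:preferredNameForTheSubjectHeading ?prefPlain .
  BIND (STRLANG(STR(?prefPlain), "de") AS ?pref)          # add German language tag
  OPTIONAL {
    ?c gndo:variantNameForTheSubjectHeading ?altPlain .
    BIND (STRLANG(STR(?altPlain), "de") AS ?alt)
  }
  OPTIONAL {
    # generic/instantial/partitive relations collapsed to skos:broader
    # (property names assumed here)
    VALUES ?rel { gndo:broaderTermGeneric gndo:broaderTermInstantial gndo:broaderTermPartitive }
    ?c ?rel ?broader .
  }
}
"""

g = Graph()
g.parse("gnd_subject_headings.ttl", format="turtle")   # local excerpt of the GND dump
g.query(CONVERT).graph.serialize("swdskos_part.ttl", format="turtle")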

Some additional work was required to integrate the GND Subject Categories (gndsc), a skos:ConceptScheme of about 484 skos:Concepts which logically form a hierarchy. (In fact, the currently published file puts all subject categories on one level.) The subject headings invariably link to one or more subject categories, but unfortunately that data has to be downloaded and added separately (with a bit of extension). The linking property from the subject headings, gndo:gndSubjectCategory, was already dumbed down to skos:broader in the query above. Finally, we add an explicit skos:notation and some bits of metadata about the concept scheme.

This gives us a large skos:ConceptScheme, which we call swdskos and which is currently available in a SPARQL endpoint. Now we can proceed and try to show that generic SKOS tools for display, verification and version history comparison work at that scale.

Skosmos for thesaurus display

Skosmos is an open source web application for browsing controlled vocabularies, developed by the National Library of Finland. It requires a triple store with the vocabulary loaded. (The Skosmos wiki provides detailed installation and configuration help for this.) The configuration for the GND/SWD vocabulary takes only a few lines, following the provided template. The result can be found at http://zbw.eu/beta/skosmos/swdskos:


 

With marginal effort, we gained a structured concept display, a very nice browsing and hierarchical view interface, and a powerful search - out of the box. The initial alphabetical display takes a few seconds, due to the large number of terms under most of the index letters. In a production setting, that could be improved by adding a Squid or Varnish cache. Navigation from concept to concept takes far less than a second, so the tool seems well suited for practical use even with larger-than-usual vocabularies. For the GND, it offers an alternative to the existing access via the DNB portal, one more focused on browsing contexts and with a more precise search.

Quality assurance with qSKOS

Large knowledge organization systems are prone to human mistakes, which creep in even with strict rules and careful editing. Some maintenance systems try to catch some of these errors, but let others slip through. So one of the really great things about SKOS as a general format for knowledge organization systems is that generic tools can be developed which catch more and more classes of errors. qSKOS has identified a number of widespread possible quality issues, on which it provides detailed analytic information. Of course, it often depends on the vocabulary which types of issues are considered errors - for example, it is expected that most GND subject headings lack a definition, so a list of 100,000+ such concepts is not helpful, whereas the list of the (in total 3) cyclic hierarchical relations is. The parametrization we use for STW seems to provide useful results here too:

java -jar qSKOS-cmd.jar analyze -np -d -c ol,chr,usr,rc,mc,ipl,dlv,urc swdskos.ttl.gz -o qskos_extended.log

The tool has already been tested with very large vocabularies (e.g., LCSH). On the swdskos dataset it runs for 8 minutes, but it provides results which could not be obtained otherwise. For example, the list of overlapping labels (report) reveals some strange clashes (example). Standard SKOS tools thus could complement the quality assurance procedures which are already in place.

Version comparisons with skos-history

The skos-history method makes it possible to track changes in knowledge organization systems. It was developed in the context of the STW overhaul; with swdskos, it proves to be applicable to much larger KOS. Loading the three versions and computing all version deltas take almost half an hour (on a moderately sized virtual machine). That way, for example, we can see the 638 concepts that were deleted between the Oct 2015 and the Feb 2016 dumps of the GND. Some checked concept URIs return concepts with different URIs but the same preferred label, so we can assume that duplicates have been removed here. The added_concepts query can be extended to make use of the - often underestimated - GND subject categories for organizing the query results, as is shown here (list filtered by the notation for computer science and data processing).

These queries only scratch the surface of what could be done by comparing multiple versions of the GND subject headings. Custom queries could try to reveal maintenance patterns, or, for example, trace the uptake of the finer-grained hierarchical properties (generic/instantial/partitive) used in GND.
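
For illustration, the basic pattern behind such a deleted-concepts comparison is a query over two version graphs. The sketch below is not the actual skos-history query; the endpoint URL and graph names are made up.

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical named graphs holding two loaded versions of swdskos.
OLD = "http://example.org/swdskos/version/2015-10"
NEW = "http://example.org/swdskos/version/2016-02"

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  GRAPH <%s> { ?concept a skos:Concept ; skos:prefLabel ?label }
  FILTER NOT EXISTS { GRAPH <%s> { ?concept a skos:Concept } }
}
ORDER BY ?label
""" % (OLD, NEW)

endpoint = SPARQLWrapper("http://example.org/sparql/swdskos/query")  # endpoint URL assumed
endpoint.setQuery(QUERY)
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["concept"]["value"], row["label"]["value"])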

Summing up

Generic SKOS tools seem to be useful complements to custom tools and processes for specialized knowledge organization systems. The tools considered here have shown no scalability issues with large vocabularies. The publication of an additional, experimental SKOS version of the subject headings part of the GND linked data dump could perhaps instigate further research on the development of the vocabulary.

The code and the data of the experiment are available here.


District Dispatch: Library funding dance: two steps forward, one step back

planet code4lib - Wed, 2016-03-30 16:54

Congratulations . . . and thank you! Over the past two weeks, we’ve issued several calls-to-action asking you to ask your Members of Congress in both chambers to sign four letters urging their colleagues on the powerful Appropriations Committees of the House and Senate to support maximum funding in FY 2017 for the Library Services and Technology Act (LSTA) and Innovative Approaches to Literacy (IAL) programs. And you really responded!

One step forward

Nearly every Member of Congress (529 out of 535) got at least one of the 14,000+ emails, over 740 tweets, and many phone calls with which you answered our call, and some local Congressional offices were even visited personally by library advocates! All that work was rewarded with strong bipartisan support for our four LSTA and IAL “Dear Appropriator” letters, which, on balance, garnered essentially the same solid number of signatures this year as last. That delivered a strong message of support for these critical programs to the House and Senate Appropriations Committees as they begin to sharpen their budget cutting axes in earnest. But . . .

Your support, and those letters, didn’t arrive a moment too soon. For the second year in a row, with the blessing of House Speaker Paul Ryan, the formal Resolution released by the House Budget Committee (a non-binding document that nonetheless conveys the majority’s philosophy) expressly suggests that ALL federal library funding — and with it the Institute of Museum and Library Services that administers LSTA — could be completely eliminated to achieve the House majority’s budget-cutting goals! Regardless of whether this Budget Resolution is adopted (they often aren’t), it’s just the beginning of the FY2017 appropriations process. Sadly, LSTA and IAL will be on the potential “chopping block” almost until year’s end.

So, our advocacy for and defense of LSTA and IAL will need to be an ongoing effort. As critical votes near, there will certainly be more calls-to-action coming. (If you haven’t already, please do take a few seconds now to sign up for future alerts.) Meantime, there’s one sure-fire way to help keep support for LSTA and IAL strong: say “Thanks!” to your Senators and Representative if they signed our LSTA and/or IAL “Dear Appropriator” letters.

That’s as easy as one, two, three:

  1. Check out this handy “Champions” chart of who supported LSTA and/or IAL.
  2. If you see at least one of your Member of Congress’ names, or if you want to find out who your Members are, enter your zip code here.
  3. Just click on their name(s) in the list that pops up to email them directly.

Of course, if you’d prefer to tweet or call our LSTA and IAL supporters instead that would be great too. After checking out our Champions chart, consult the Member directories for each chamber linked below for the handles and numbers you’ll need to reach your Senators’ and/or Representative’s offices.

Members of Congress love to hear from constituents, especially with a well-deserved “thank you,” and there’s no easier or better way to keep them engaged and strong for LSTA and IAL. Please, check out our Champions chart and send your message of appreciation now. It’s strategic. It’s important. And it’s just plain good manners, to boot!

The post Library funding dance: two steps forward, one step back appeared first on District Dispatch.

OCLC Dev Network: Announcing Linked Data Webinar and Blog Series

planet code4lib - Wed, 2016-03-30 16:00

The OCLC Developer Network will be sharing blog posts and hosting practice-based webinars over the next three months to help educate those who have little or no experience working with linked data.

LITA: LITA Top Tech Trends Panel at ALA Annual 2016

planet code4lib - Wed, 2016-03-30 15:00

Help LITA celebrate the kick-off of its 50th year by participating as, or nominating, a Top Tech Trends panelist.

Submit your nomination here.

The LITA Top Technology Trends Committee is currently seeking nominations for panelists to participate in their popular panel discussion session at ALA Annual 2016. We are looking for a diverse panel of speakers ready to offer insights into a range of technology topics impacting libraries today and into the future.

Have someone you’d love to hear share their thoughts about current and future trends in technology? Want to share your own thoughts on some tech topics? Let us know what you or your nominee have to offer to the discussion!

For more details and a chance to nominate yourself or someone else, visit this site.

Nominations are due by April 15th, 2016.

Spread the Word!!!

Emily Clasper
Suffolk Cooperative Library System
LITA Top Tech Trends Committee Chair
emily@suffolknet.org

Max Planck Digital Library: MPG.ReNa via https only

planet code4lib - Wed, 2016-03-30 12:48

The MPG Resource Navigator MPG.ReNa is now accessible via https only.

If in doubt, please double-check any routines and applications loading or embedding content via MPG.ReNa APIs.

Please note that you may need to re-subscribe to resource feeds, or update URLs of RSS widgets in your Content Management System, etc.

We apologize for any inconvenience.

Open Knowledge Foundation: What happened during Open Data Day 2016 in Aix en Provence?

planet code4lib - Wed, 2016-03-30 12:38

This blog post was written by Samuel Goeta and the team in Open Knowledge France

This year, Open Data Day in France left Paris, after the capital had hosted us in several tech hubs: Telecom ParisTech in 2013, Simplon in 2014 and La Paillasse in 2015. However, Paris still celebrated Open Data Day online. Etalab, the French government open data taskforce, published a blog post celebrating their favorite open data apps, and OpenStreetMap announced that there are now 400 million objects in its French database.

 

On the 5th of March, Open Knowledge France headed south to the city of Aix en Provence, near Marseille. Yelloworking hosted the Open Data Day, and it was the first event organized at the coworking space's new villa. Thanks to Open Knowledge’s Open Data Day mini grant, we were able to start the event with a delicious Italian buffet that gave us plenty of energy for hacking.
Anonymal, a local web TV channel, shot a short video report (in French) on the event. In two minutes, it explains the concept of Open Data Day and shows some of the activities that happened on March 5th in Aix en Provence.

Le Yelloworking accueille l’Open Data Day à Aix-en-Provence from anonymal tv on Vimeo.

Our main activities

Activity #1: Hack a coffee machine

The concept:

  1. Open the guts of the coffee machine of the coworking space (ask permission first!)
  2. Connect an Arduino board to the buttons of the machine
  3. Collect data on each coffee made: time, length and strength of the coffee.

This data will be used to monitor the activity of the coworking space as each member or visitor usually consumes a coffee.

Why it matters: Open Knowledge can seriously benefit from open hardware devices for automatic real-time data collection. This activity was one of the rare bridges between the open hardware and open data movements, which rarely speak to each other.

How did it turn out? Well, the Arduino starter kit is indeed… a good start. We managed to connect the buttons, monitor the strength of each coffee, and record in a log when each coffee is made. The code has already been published, but we still need more equipment to actually publish the data.
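
For context, the logging half of this setup can be as small as a script that listens to the Arduino over USB serial and appends a line per coffee. The port name, baud rate and message format below are assumptions, not the published code.

import csv
from datetime import datetime

import serial  # pyserial

PORT = "/dev/ttyACM0"      # whatever port the Arduino shows up on
LOGFILE = "coffee_log.csv"

with serial.Serial(PORT, 9600, timeout=60) as arduino, \
        open(LOGFILE, "a", newline="") as log:
    writer = csv.writer(log)
    while True:
        line = arduino.readline().decode("ascii", errors="ignore").strip()
        if not line:
            continue  # read timed out without a button press
        # Assumed message format: "<length>,<strength>", e.g. "long,strong"
        length, strength = (line.split(",") + ["", ""])[:2]
        writer.writerow([datetime.now().isoformat(timespec="seconds"), length, strength])
        log.flush()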

What’s next: Another hacknight is planned, and the data will be visualised live on the Yelloworking website.


Activity #2: Open Yelloworking data

The concept: Yelloworking is a transparent coworking space which already provides regular updates on its income, expenses and activities. We used Open Data Day to push transparency one step further by opening up data on Yelloworking’s revenues and visualizing it.

Why it matters: Open business models are a promising way to develop transparent businesses and increase consumer trust. This is especially true for coworking spaces, which are about strengthening trust and tightening ties between members. Open data on revenues can be a powerful way to renew transparency in the corporate sector.

How did it turn out? Samuel, as both the host of the event and a partner at Yelloworking, went through every single invoice and reported it in a CSV file. Then, he visualized the data using Raw, an incredible tool made by Density Design. Data has been published on NosDonnees.fr, a CKAN instance run by Open Knowledge France and Regards Citoyens.
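
As a rough sketch of that processing step (the file name and column names are assumptions; RAW then simply takes the aggregated table), monthly revenue can be derived from such an invoice CSV in a few lines of pandas:

import pandas as pd

# Hypothetical invoice file: one row per invoice, with at least a date and an amount column.
invoices = pd.read_csv("yelloworking_invoices.csv", parse_dates=["date"])

monthly = (invoices
           .assign(month=invoices["date"].dt.to_period("M").astype(str))
           .groupby("month", as_index=False)["amount"]
           .sum())

monthly.to_csv("yelloworking_monthly_revenue.csv", index=False)
print(monthly)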

What’s next: Yelloworking now wants to open data on its operating expenses. However, the work to report this information will take much longer.  

 

Activity #3: Deliberations of the city council

The concept: The city council of Aix en Provence, as everywhere in France, votes on deliberations (public debates by city officials). These protocols are online and can be downloaded as PDFs, but for the average citizen, inspecting and understanding the official lingo in these files is a tedious job. PourAix, a young collective dedicated to mobilisation and citizen participation, had an excellent idea: map these complicated documents to make local policymaking more accessible. For each protocol, they identified the places affected by the decision and created a map. The participants crowdsourced this information and recorded the precise place concerned by each deliberation in a file, in order to create a map including the date, location, name of the elected official proposing the measure, the full text of the document and the decisions taken.

Why it matters not only here: We know that much extremely valuable information about local life is still stuck in PDFs. Crowdsourcing can, in a matter of hours, make this information much more accessible and processable. This, in the end, can help foster accountability.

How did it turn out? More than 100 deliberations were mapped. Crowdsourcing helped PourAix to map all 2015 deliberations. The data were published on their data portal data.pouraix.fr and on data.gouv.fr, and mapped using uMap, OSM France’s tool.
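
To give an idea of the data shape, a file like that can be turned into a GeoJSON layer for uMap with nothing but the standard library. The column names below are assumptions (uMap can also import a CSV with latitude/longitude columns directly):

import csv
import json

features = []
with open("deliberations_2015.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["lon"]), float(row["lat"])],
            },
            "properties": {
                "date": row["date"],
                "place": row["place"],
                "official": row["official"],
                "decision": row["decision"],
                "full_text_url": row["full_text_url"],
            },
        })

with open("deliberations_2015.geojson", "w", encoding="utf-8") as out:
    json.dump({"type": "FeatureCollection", "features": features}, out, ensure_ascii=False)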

What’s next:  These crowdsourced data will be used to create a monitoring tool for citizens on the model of NosDeputes.fr, the Parliament Monitoring Tool made by Regards Citoyens.

 

View in full screen

 





District Dispatch: CopyTalk: the libertarians are coming! The libertarians are coming!

planet code4lib - Tue, 2016-03-29 19:18

CopyTalk is back.

When it comes to copyright policy, you will not find strict bipartisanship between Republicans and Democrats. Both parties tend to focus on maintaining the business models that they are familiar with. Do no harm to the content companies, help the starving artists without understanding why they are not making money, and make Internet companies and anyone who provides internet services more accountable for copyright piracy (aka infringement).  But not everyone is a Republican or Democrat; there are others, like Independents, Tea Party people and Libertarians, who favor small government, individual freedom, the market economy and private property.  Of course, like all political parties, there is a spectrum of thought among libertarians— there are libertarians, and then there are libertarians. But in general, what do Libertarians think about copyright law?   Were copyright ever to be reformed, what reform would a libertarian want to see?

ALA is a founding member of Re:Create: a coalition of industry, civil society, trade associations, and libertarians who seek balanced copyright, creativity, freedom of speech and a more understandable copyright law.

The April 7th CopyTalk will feature coalition partner R Street—“a free market think tank advancing real solutions for complex public policy problems.” This webinar will be our first policy-related webinar, so get ready for a trip to Wonky Town!

Details: Thursday, April 7th at 2:00 pm (Eastern) and 11:00 am (Pacific).  This is the URL that will get you into the webinar. Register as a guest and you’re in.  Yes, it’s still FREE because the Office for Information Technology Policy and the Copyright Education Subcommittee want to expand copyright awareness and education opportunities.

The post CopyTalk: the libertarians are coming! The libertarians are coming! appeared first on District Dispatch.

LITA: 2016 LITA Forum – Call for Proposals

planet code4lib - Tue, 2016-03-29 15:38

The 2016 LITA Forum Committee seeks proposals for the 19th Annual Forum of the Library and Information Technology Association in Fort Worth, Texas, November 17-20, 2016, at the Omni Fort Worth Hotel.

Submit your proposal at this site

The Forum Committee welcomes proposals for full-day pre-conferences, concurrent sessions, or poster sessions related to all types of libraries: public, school, academic, government, special, and corporate. Collaborative and interactive concurrent sessions, such as panel discussions or short talks followed by open moderated discussions, are especially welcomed. We deliberately seek and strongly encourage submissions from underrepresented groups, such as women, people of color, the LGBT community and people with disabilities.

The submission deadline is Friday, April 29, 2016.

Proposals could relate to, but are not restricted to, any of the following topics:

  • Discovery, navigation, and search
  • Practical applications of linked data
  • Library spaces (virtual or physical)
  • User experience
  • Emerging technologies
  • Cybersecurity and privacy
  • Open content, software, and technologies
  • Assessment
  • Systems integration
  • Hacking the library
  • Scalability and sustainability of library services and tools
  • Consortial resource and system sharing
  • “Big Data” — work in discovery, preservation, or documentation
  • Library I.T. competencies

Proposals may cover projects, plans, ideas, or recent discoveries. We accept proposals on any aspect of library and information technology. The committee particularly invites submissions from first time presenters, library school students, and individuals from diverse backgrounds.

Vendors wishing to submit a proposal should partner with a library representative who is testing/using the product.

Presenters will submit final presentation slides and/or electronic content (video, audio, etc.) to be made available on the web site following the event. Presenters are expected to register and participate in the Forum as attendees; a discounted registration rate will be offered.

If you have any questions, contact Tammy Allgood Wolf, Forum Planning Committee Chair, at tammy.wolf@asu.edu.

Submit your proposal at this site

More information about LITA is available from the LITA website, Facebook and Twitter.

Tim Ribaric: A hot take on discovery system results

planet code4lib - Tue, 2016-03-29 15:36

Here's an example of Google doing better than a discovery system. First and foremost, your mileage may vary. This is a very specific example, but emblematic of the landscape we find ourselves in.

read more

David Rosenthal: Following Up On The Emulation Report

planet code4lib - Tue, 2016-03-29 15:00
A meeting was held at the Mellon Foundation to follow up on my report Emulation and Virtualization as Preservation Strategies. I was asked to provide a brief introduction to get discussion going. The discussions were confidential, but below the fold is an edited text of my introduction with links to the sources.

I think the two most useful things I can do this morning are:
  • A quick run-down of developments I'm aware of since the report came out.
  • A summary of the key problem areas and recommendations from the report.
I'm going to ignore developments by the teams represented here. Not that they aren't important, but they can explain them better than I can.
Emulators

First, the emulators themselves. Reports of new, enthusiast-developed emulators continue to appear. Among recent ones are:
The quality of the emulators, especially when running legacy artefacts, is a significant concern. A paper at last year's SOSP by Nadav Amit et al entitled Virtual CPU Verification casts light on the causes and cures of fidelity failures in emulators. They observed that the problem of verifying virtualized or emulated CPUs is closely related to the problem of verifying a real CPU. Real CPU vendors sink huge resources into verifying their products, and this team from the Technion and Intel were able to base their research into X86 emulation on the tools that Intel uses to verify its CPU products.

Although QEMU running on an X86 tries hard to virtualize rather than emulate, it is capable of emulating and the team were able to force it into emulation mode. Using their tools, they were able to find and analyze 117 bugs in QEMU, and fix most of them. Their testing also triggered a bug in the VM BIOS:
But the VM BIOS can also introduce bugs of its own. In our research, as we addressed one of the disparities in the behavior of VCPUs and CPUs, we unintentionally triggered a bug in the VM BIOS that caused the 32-bit version of Windows 7 to display the so-called blue screen of death.

Having Intel validate the open source hypervisors, especially doing so by forcing them to emulate rather than virtualize, would be a big step forward. To what extent the validation process would test the emulation of the hardware features of legacy CPUs important for preservation is uncertain, though the fact that their verification caught a bug that was relevant only to Windows 7 is encouraging.

QEMU is supported via the Software Freedom Conservancy. It supported Christoph Hellwig's lawsuit against VMware for GPL violations. As a result the Conservancy is apparently seeing corporate support evaporate, placing its finances in jeopardy.
Frameworks

Second, the frameworks. The performance of the Internet Archive's JSMESS framework, now being called Emularity, depends completely on the performance of the JavaScript virtual machine. Other frameworks are less dependent, but its performance is still important. The movement supported by major browser vendors to replace this virtual machine with a byte-code virtual machine called WebAssembly has borne fruit. A week ago four major browsers announced initial support, all running the same game, a port of Unity's Angry Bots. This should greatly reduce the pressure for multi-core and parallelism support in JavaScript, which was always likely to be a kludge. Improved performance for in-browser emulation is also likely to make in-browser emulation more competitive with techniques that need software installation and/or cloud infrastructure, reducing the barrier to entry.

The report discusses the problems GPUs pose for emulation and the efforts to provide paravirtualized GPU support in QEMU. This limited but valuable support is now mainstreamed in the Linux 4.4 kernel.

Mozilla among others has been working to change the way in which Web pages are rendered in the browser to exploit the capabilities of GPUs. Their experimental "servo" rendering engine gains a huge performance advantage by doing so. For us, this is a double-edged sword. It makes the browser dependent on GPU support in a way it wasn't before, and thus makes the task of browser emulations such as oldweb.today harder. If, on the other hand, it means that GPU capabilities will be exposed to WebAssembly, it raises the prospect of worthwhile GPU-dependent emulations running in browsers, further reducing the barrier to entry.
Collections

Third, the collections. The Internet Archive has continued to release collections of legacy software using Emularity. The Malware Museum, a collection of currently 47 viruses from the '80s and '90s, has proven very popular, with over 850K views in about 6 weeks. The Windows 3.X Showcase, a curated sample of the over 1500 Windows emulations in the collection, has received 380K views in the same period. It is particularly interesting because it includes a stock install of Windows 3.11. Despite that, the team has yet to receive a takedown request from Microsoft.

About the same time as my report, a team at Cornell led by Oya Rieger and Tim Murray produced a white paper for the National Endowment for the Humanities entitled Preserving and Emulating Digital Art Objects. I blogged about it. To summarize my post, I believe that outside their controlled "reading room" conditions the concern they express for experiential fidelity is underestimated, because smartphones and tablets are rapidly replacing PCs. But two other concerns, for emulator obsolescence and the fidelity of access to web resources, are overblown.
Tools

Fourth, the tools. The Internet Archive has a page describing how DOS software to be emulated can be submitted. Currently about 65 submissions a day are being received, despite the somewhat technical process it lays out. Each is given minimal initial QA to ensure that it comes up, and is then fed into the crowd-sourced QA process described in the report. It seems clear that improved tooling, especially automating the process via an interactive Web page that ran the emulation locally before submission, would result in more and better quality submissions.
Internet of Things

The Internet of Things has been getting a lot of attention, especially the catastrophic state of IoT security. Updating the software of Things in the Internet to keep them even marginally secure is often impossible because the Things are so cheap there are no dollars for software support and updates, and because customers have no way to tell that one device is less insecure than another. This is exactly the problem faced by preserved software that connects to the Internet, as discussed in the report. Thus efforts to improve the security of the IoT and efforts such as Freiburg's to build an "Internet Emulator" to protect emulations of preserved software may be highly synergistic.

Off on a tangent, it is worth thinking about the problems of preserving the Internet of Things. The software and hardware are intimately linked, even more so than smartphone apps. So does preserving the Internet of Things reduce to preserving the Things in the Internet, or does emulation have a role to play?
The To-Do List

To refresh your memories, here are the highlights of the To-Do List that ends the report, with some additional commentary. I introduce the list by pointing out the downsides of the lack of standardization among the current frameworks, in particular:
  • There will be multiple emulators and emulation frameworks, and they will evolve through time. Re-extracting or re-packaging preserved artefacts for different, or different versions of, emulators or emulation frameworks would be wasted effort.
  • The most appropriate framework configuration for a given user will depend on many factors, including the bandwidth and latency of their network connection, and the capabilities of their device. Thus the way in which emulations are advertised to users, for example by being embedded in a Web page, should not specify a particular framework or configuration; this should be determined individually for each access.
I stressed that:
If the access paths to the emulations link directly to evanescent services emulating the preserved artefacts, not to the artefacts themselves, the preserved artefacts are not themselves discoverable or preservable.

In summary, the To-Do list was:
  1. Standardize Preserved System Images so that the work of preparing preserved system images for emulation will not have to be redone repeatedly as emulation technology evolves, and
  2. Standardize Access To System Images and
  3. Standardize Invoking Emulators so that the work of presenting emulations of preserved system images to the "reader" will not have to be redone repeatedly as emulation technology evolves.
  4. Improve Tools For Preserving System Images: The Internet Archive's experience shows that even minimal support for submission of system images can be effective. Better support should be a high priority. If the format of system images could be standardized, submissions would be available to any interested archive.
  5. Enhance Metadata Databases: these tools, and standardized methods for invoking emulators, rely on metadata databases, which need significant enhancement for this purpose.
  6. Support Emulators: The involvement of Intel in QA-ing QEMU is a major step forward, but it must be remembered that most emulations of old software depend on enthusiast-supported emulators such as MAME/MESS. Supporting ways to improve emulator quality, such as for example external code reviews to identify critical quality issues, and a "bounty" program for fixing them, should be a high priority. It would be important that any such program be "bottom-up"; a "top-down" approach would not work in the enthusiast-dependent emulator world.
  7. Develop Internet Emulators: oldweb.today is already demonstrating the value of emulating software that connects to the Internet. Doing so carries significant risks, and developing technology to address them (before the risks become real and cause a backlash) needs high priority. The synergies between this and the security of the Internet of Things should be explored urgently.
  8. Tackle Legalities: As always, the legal issues are the hardest to address. I haven't heard that the PERSIST meeting in Paris last November came up with any new ideas in this area. The lack of a reaction to the Internet Archive's Windows 3.X Showcase is encouraging, and I'm looking forward to hearing whether others have made progress in this area.

Open Library Data Additions: Amazon Crawl: part o-4

planet code4lib - Tue, 2016-03-29 14:59

Part o-4 of Amazon crawl..

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

Library of Congress: The Signal: Digital Preservation at the State Library of Massachusetts

planet code4lib - Tue, 2016-03-29 12:47

This is a guest post by Stefanie Ramsay.

The State Library in 1890. Courtesy of the State Library of Massachusetts Special Collections.

How do you capture, preserve and make accessible thousands of born-digital documents produced by state agencies, published to varying websites without any notification or consistency and which are often relocated or removed over time? This is the complex task that the State Library of Massachusetts faces in its legal digital preservation mandate.

My National Digital Stewardship residency involves conducting a comprehensive assessment of existing state publications, assessing how other state libraries and archives handle this challenge and establishing best practices and procedures for the preservation and accessibility of electronic state publications at the State Library. In this post, I’ll cover how we’ve approached this project, as well as our next steps.

State agencies publish thousands of documents for the public, and a legal mandate requires that they send this content to the library for preservation. Unfortunately this is a rare occurrence, leaving library staff to retrieve content using various other methods. The staff relies on a homegrown web crawler to capture publications from agency websites, but they also comb through individual agency pages and check social media and the news to spot mentions of agency publications.
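
A homegrown crawler of this kind usually boils down to fetching each agency page and harvesting the PDF links. A minimal, hypothetical sketch (the URL and politeness settings are placeholders, not the library's actual tool) looks like this:

import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Hypothetical list of agency publication pages to check.
AGENCY_PAGES = [
    "https://www.mass.gov/example-agency/publications",
]

def pdf_links(page_url):
    """Return the absolute URLs of all PDF links found on one agency page."""
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return {urljoin(page_url, a["href"])
            for a in soup.find_all("a", href=True)
            if a["href"].lower().endswith(".pdf")}

for page in AGENCY_PAGES:
    for url in sorted(pdf_links(page)):
        print(url)          # or queue the URL for download and cataloging
    time.sleep(1)           # be polite to the agency servers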

Creative as these approaches may be, they do not form a sustainable practice for handling the large amounts of content that agencies produce. Before establishing a better workflow however, the library needed to get a better understanding of how much material is published, what kinds of material are published and how best to capture these materials for long-term access and preservation. Having this data, we can then begin to build an effective digital preservation program.

The State Library today after a complete renovation. Photo by Stefanie Ramsay.

At the beginning of my residency, we began using web statistics collected from the Massachusetts government’s main portal, Mass.gov. The statistics show publications requested by users of the site per month. Having the URL allows us to see where these are posted and to ascertain the types of documents agencies are publishing.

We found a wide range of documents, such as annual reports, meeting materials, Executive Orders, and my personal favorite, a guide to resolving conflict between people and beavers.

After categorizing the content, we needed to narrow down a collection scope. Rather than attempting to capture every publication, I thought it best to define what types of documents are most valuable to the staff and library users and to focus our efforts on those documents (which is not to say that the lower priority items will be ignored, but that the higher priority items will be handled first, then we will develop a plan for the rest). To determine what documents were high and low priorities, we implemented a ranking process.

Each staff member ranked the publications for individual agencies on a scale of 1-5 (1 being lowest priority, 5 being highest) on shared spreadsheets, and our collective averages started to filter what was most valuable. Documents such as reports, project documents and topical issues rose to the top, while items such as draft reports, requests for proposals and ephemeral information sunk to the low priority tier.
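
The arithmetic behind those tiers is straightforward; a small sketch (the file layout and thresholds are assumptions) of averaging the staff rankings and assigning priorities:

import pandas as pd

# Hypothetical export of the shared spreadsheets: one row per staff member and document type.
ranks = pd.read_csv("staff_rankings.csv")        # columns: staff, agency, doc_type, rank (1-5)
avg = ranks.groupby(["agency", "doc_type"], as_index=False)["rank"].mean()

def tier(score):
    if score >= 4.0:
        return "high"      # e.g. reports, project documents, topical issues
    if score >= 2.5:
        return "medium"
    return "low"           # e.g. draft reports, RFPs, ephemera

avg["priority"] = avg["rank"].apply(tier)
print(avg.sort_values("rank", ascending=False).head(20))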

This process formed the basis of our collection policy statement to be used as a guide when identifying and selecting content for ingestion. This statement is regularly updated as we continue to determine our priorities. We also began collecting metrics on the total number of documents captured by the statistics and the number of documents that fell into each priority tier. This gives us a sense of the bigger picture of not only the amount of content, but how much needs to be handled quickly and forms the basis of an argument for increased resources.

This issue is not unique to Massachusetts; every state library has a mandate to capture state government information and every state takes a different approach based on their resources, staff expertise, and constituents. In my research of how other state libraries and archives handle this mandate, one common thread emerged: at least 24 states use Archive-It as a means for capturing digital content. I was eager to investigate this, as I hoped it could be another resource for Massachusetts to use as well.

The IT department of the Executive Branch of Massachusetts state government, MassIT, has an Archive-It account and has crawled Mass.gov since 2008. Though the account was publicly available, MassIT had not advertised the site, as their focus was on ensuring capture of content rather than accessibility. Seizing this opportunity for collaboration, we reached out to MassIT, who granted us access to the site. We worked together to customize the metadata and I wrote some language for the library’s website that provided instructions for our patrons on how to use Archive-It to access state publications.

The State Library stacks. Courtesy of the State Library of Massachusetts.

Our situation is a bit different in that we do not exclusively maintain the Archive-It account. However, we are using this resource in a similar way to many other state libraries and archives. Archive-It will not be our main repository for accessing state publications; the library has a DSpace digital repository that has been in place since 2006 and will continue to be our central portal for providing enhanced access to high priority publications.

Archive-It will act as a service for crawling Mass.gov, thereby ensuring the capture of more documents than we could hope to collect on our own and allowing us another means of finding material we may not have captured in the past. Using the two in concert goes a long way towards meeting the legal mandate.

With just a few months left in the residency, there is still much work to be done. We’re testing a workflow for batch downloading PDFs using a Firefox add-on called DownThemAll!, investigating how to streamline the cataloging process and conducting outreach efforts to state agencies.

Outreach is crucial in raising awareness of the library’s resources and services, as well as in reminding agencies about that pesky law regarding their publications. These steps form the foundation of a more sustainable digital preservation program at the State Library.

Open Knowledge Foundation: Open Data Day Buenos Aires – planning the open data agenda for 2016

planet code4lib - Tue, 2016-03-29 12:30

This blog was written by Yamila Garcia, Open Knowledge ambassador in Argentina 

For the third time, we celebrated Open Data Day in Argentina, and we invited different groups to celebrate it with us: members of the official open government office; transparency, open data and freedom of information activists, civic innovators, journalists and anyone who is interested in the progress of 21st century open governments.

March 5th marks a day of open data deliberations, where we consider the importance of open data through three main pillars – release, reuse and impact. It is a day to share ideas and projects and to open up channels of dialogue about open public information, the promotion of freedom of information laws, open government in the three branches of the state, strengthening democracy, promoting citizen participation and generating public and civic innovation. Fundación Conocimiento Abierto, together with Argentinian civil society organizations (Democracia en Red, Asociación Civil por la Igualdad y la Justicia, Directorio Legislativo and FOPEA), had the honour of receiving 250 participants at #ODD16. The event was supported by ILDA.

In the last two years, we invited open data projects in Argentina and practitioners from different fields such as academia, government (in all branches), journalists and civic hackers to join us under the same roof and present their projects.  This year we decided to shake things up, and had an event with the following  activities:

  • Panels: we had four central panels with the following topics:
    • “Progress for a law on access to information” with Laura Alonso (anticorruption office holder), Fernando Sanchez (national legislature), government officials and José Crettaz (journalist at the newspaper La Nación).
    • “Challenges for open government” with Rudi Borrmann (National Subsecretary of Innovation and Open Government), Carolina Cornejo (ACIJ), Alvaro Herrero (City government) and Gustavo Bevilaqcua (national legislature).
    • “Civic hackers and open data” with four well-known civic hackers
    • “Local governments and opening information” panel with five representatives of local government innovation

 

  • An open space of dialogue with mentors on the following topics:
    • Innovating in the public sector,
    • Challenges for a law on access to public information
    • Municipalities’ progress in the area of open government: how to achieve citizen participation channels?
    • OGP agenda in Argentina
    • Challenges for open municipal government with open data
    • The impact of open data: How to measure results?
    • Codeando Argentina: cooperation between governments and civic hackers, and Open Parliament.

We accomplished the goal of gathering all the groups that work on open data to shape the Open Data Agenda for 2016 in a collaborative way. Each year this community grows more and more. In 2017, we expect to have an Open Data Day in other parts of the country, not only in Buenos Aires. From year to year we face more challenges, and we are happy to have Open Data Day to tackle them.

 

 

 

Terry Reese: MarcEdit Mac: Export Tab Delimited Records

planet code4lib - Tue, 2016-03-29 04:15

As part of the last update, I added a new feature that is only available in the Mac version of MarcEdit at this point.  One of the things that had been missing in the Export Tab Delimited Tool was the ability to save and load one’s export settings.  I added that as part of the most recent update.  At the same time, I thought that the ability to batch process multiple files using the same criteria might be useful as well.  So this has been added to the Mac interface as well.

In the image above, you initiate the batch processing mode by checking the batch process checkbox.  This will change the MARC file and save file textboxes and buttons to directory paths.  You will also be prompted to select a file extension to process.

I’m not sure if this will be useful — but as I’m working through new functionality, I’ll be noting changes being made to the MarcEdit Mac version.  And this is notable, because this is the first time that the Mac version contains functionality that is not in the Windows version.

–tr

Terry Reese: MarcEdit Bug Fix Update

planet code4lib - Tue, 2016-03-29 04:05

I’ve posted a new update for all versions of MarcEdit this afternoon.  Last night, when I posted the new update, I introduced a bug into the RDA Helper that rendered it basically unusable.  When adding functionality to the tool to enable support for abbreviations at a subfield level, I introduced a problem that removed the subfield codes from fields where abbreviations would take place.

So what does this bug look like?  When processing data, a set field that would look like this:
=300  \\$a1 vol $b ill.

would be replaced to look like:
=300  \\a1 volume b illustrations

As one can see, the delimiter symbol “$” has been removed. This occurred in all fields where abbreviation expansion was taking place. This has been corrected with this update.
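
To illustrate the failure mode in the abstract (this is a toy reconstruction, not MarcEdit's actual code): if a field is split on the subfield delimiter so each piece can have its abbreviations expanded, the pieces have to be rejoined with that same delimiter, otherwise every “$” disappears, which is exactly the symptom above.

ABBREVIATIONS = {"vol": "volume", "ill.": "illustrations"}

def expand(text):
    return " ".join(ABBREVIATIONS.get(word, word) for word in text.split())

field = r"\\$a1 vol $b ill."            # field data from the example above
parts = field.split("$")                # work subfield by subfield

buggy = " ".join(expand(p) for p in parts)   # delimiter lost on rejoin
fixed = "$".join(expand(p) for p in parts)   # delimiter preserved

print(buggy)   # \\ a1 volume b illustrations  <- the reported behaviour
print(fixed)   # \\$a1 volume$b illustrations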

You can get the update from the downloads page: http://marcedit.reeset.net/downloads or via the automated update tool.

–tr

DuraSpace News: VIVO/Fedora Integration Forum

planet code4lib - Tue, 2016-03-29 00:00

From Andrew Woods, Fedora Tech Lead

Austin, TX  There has been increasing interest and discussion around the opportunities of an integration between VIVO and Fedora 4. In an effort to provide a public forum for detailing use cases, identifying related initiatives, gaining consensus on integration patterns, etc, the following mailing list has been created:

DuraSpace News: NOW AVAILABLE: DSpace-CRIS 5.5.0

planet code4lib - Tue, 2016-03-29 00:00

From Andrea Bollini, Head of Open Source and Open Standards Strategy, Cineca

Rome, Italy  I'm glad to announce the availability of the 5.5.0 version of DSpace-CRIS built on top of DSpace JSPUI 5.5.

https://github.com/Cineca/DSpace/tree/dspace-cris-5.5.0
