You are here

Feed aggregator

Nicole Engard: Bookmarks for February 25, 2016

planet code4lib - Thu, 2016-02-25 20:30

Today I found the following resources and bookmarked them on Delicious.

  • Connfa Open Source iOS & Android App for Conferences & Events
  • Paperless Scan, index, and archive all of your paper documents
  • Foss2Serve Foss2serve promotes student learning via participation in humanitarian Free and Open Source Software (FOSS) projects.
  • Disk Inventory X Disk Inventory X is a disk usage utility for Mac OS X 10.3 (and later). It shows the sizes of files and folders in a special graphical way called “treemaps”.
  • Loomio Loomio is the easiest way to make decisions together. Loomio empowers organisations and communities to turn discussion into action, wherever people are.
  • DemocracyOS DemocracyOS is an online space for deliberation and voting on political proposals. It is a platform for a more open and participatory government. The software aims to stimulate better arguments and come to better rulings, as peers.

Digest powered by RSS Digest

The post Bookmarks for February 25, 2016 appeared first on What I Learned Today....

Related posts:

  1. OSCON Keynote: Creating Communities of Inclusion
  2. What happened to Bloglines?
  3. The curious (mis)perception of open-source support

District Dispatch: Everyday Fair Use in Libraries

planet code4lib - Thu, 2016-02-25 20:08

Tammy Ravas, Visual and Performing Arts Librarian and Media Coordinator, University of Montana

Guest Blog Post by Tammy Ravas*

Happy Fair Use Week everyone!

Fair use is one of the most important exceptions to the exclusive rights of copyright holders. It allows people to use copyrighted materials for certain purposes without the need to ask permission from rights holders. Fair use is the safety valve in the law that allows citizens to exercise their First Amendment rights when using copyrighted materials. It balances the rights of copyright holders with those who make uses of their materials for reasons that benefit society at large. In the actual statute, these purposes are: “criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research.” Without fair use, many familiar and everyday experiences could become illegal. For instance:

  • Search engines like Google would not exist.
  • Online stores would not be able to effectively operate.
  • Only the rights holders of original images would be able to create Internet memes that one often finds on sites like Facebook, Twitter, Reddit, and many others.
  • Using a DVR to watch a TV program later in the week would be illegal.
  • This commentary could be next to impossible to create by virtue of the fact that each link and quotation I use from outside sources would require permission from rights holders. If fair use did not exist, the creators behind the content that I am linking to and quoting from could effectively censor this post by either denying my request or by charging me an exorbitant amount of money in licensing fees to use their material.

By looking at these examples, it is easy to see how you and your patrons rely on fair uses of copyrighted materials every single day. Here are a few library-specific scenarios to consider:

1. Making library materials accessible to those with disabilities:

A hearing-impaired patron would like to view a video in your collection that does not have captions. If there isn’t a copy available for purchase, then fair use may help you to make that item accessible to them. The Association of Research Libraries’ 2012 publication, Code of Best Practices in Fair Use for Academic and Research Libraries, provides guidance on this issue on pages 25 and 26:

When fully accessible copies are not readily available from commercial sources, it is fair use for a library to (1) reproduce materials in its collection in accessible formats for the disabled upon request, and (2) retain those reproductions for use in meeting subsequent requests from qualified patrons.

2. Use of digital files and multimedia equipment in your library.

An example of this would be a student creating a presentation for a high school class about movies based on video games. A student may use books and videos from the library’s collection along with other hardware and software to create the presentation. In more and more cases, students are assigned video presentations to post online to their classes. Also, patrons who read books, listen to music, or watch videos on mobile devices are relying on fair use to do so.

3. Digitizing materials in your collection.

In this example you can watch a video about how the New York Public Library relies on fair use to digitize an important collection on the World’s Fair.

To learn more about fair use, as well as fair use week, please visit the following resources:

Copyright and Fair Use – Stanford University Libraries

Fair Use Week


Note: Guest blog author Tammy Ravas is Visual and Performing Arts Librarian and Media Coordinator at the University of Montana.

The post Everyday Fair Use in Libraries appeared first on District Dispatch.

Richard Wallis: Evolving Schema.org in Practice Pt2: Working Within the Vocabulary

planet code4lib - Thu, 2016-02-25 11:24

In the previous post in this series Pt1: The Bits and Pieces I stepped through the process of obtaining your own fork of the Schema.org GitHub repository; working on it locally; uploading your version for sharing; and proposing those changes to the Schema.org community in the form of a GitHub Pull Request.

Having covered the working environment, in this post I now intend to describe some of the important files that make up Schema.org and how you can work with them to create or update examples and term definitions within your local forked version, in preparation for proposing them in a Pull Request.

The File Structure
If you inspect the repository you will see a simple directory structure.  At the top level you will find a few files sporting a .py suffix.  These contain the python application code to run the site you see at http://schema.org.  They load the configuration files and build an in-memory version of the vocabulary that is used to build the html pages containing the definitions of the terms, schema listings, examples displays, etc.  They are joined by a file named app.yaml, which contains the configuration used by the Google App Engine to run that code.
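
For those unfamiliar with App Engine, a generic app.yaml for a Python application looks roughly like the following sketch – the module name here is hypothetical and the repository’s actual file will differ:

runtime: python27
api_version: 1
threadsafe: true

# route every request to the main WSGI application
# (main.app is a hypothetical module name)
handlers:
- url: /.*
  script: main.app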

At this level there are some directories containing supporting files: docs & templates contain static content for some pages; tests & scripts are used in the building and testing of the site; data contains the files that define the vocabulary, its extensions, and the examples used to demonstrate its use.

The Data Files
The data directory itself contains various files and directories.  schema.rdfa is the most important file; it contains the core definitions for the majority of the vocabulary.  Although, most of the time, you will see schema.rdfa as the only file with a .rdfa suffix in the data directory, the application will look for and load any .rdfa files it finds here.  This is a very useful feature when working on a local version – you can keep your enhancements together, only merging them into the main schema.rdfa file when ready to propose them.

Also in the data directory you will find an examples.txt file and several others ending with -examples.txt.  These contain the examples used on the term pages; the application loads all of them.

Amongst the directories in data, there are a couple of important ones.  releases contains snapshots of versions of the vocabulary from version 2.0 onwards.  The directory named ext contains the files that define the vocabulary extensions and examples that relate to them.  Currently you will find auto and bib directories within ext, corresponding to the extensions currently supported.  The format within these directories follows the basic pattern of the data directory – one or more .rdfa files containing the term definitions and -examples.txt files containing relevant examples.
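
Pulling the above together, the data directory layout sketches out roughly as follows (the individual example file name is illustrative):

data/
  schema.rdfa            core vocabulary definitions
  examples.txt           examples for the core terms
  foo-examples.txt       further example files (illustrative name)
  releases/              snapshots of the vocabulary from version 2.0 onwards
  ext/
    auto/                .rdfa and -examples.txt files for the auto extension
    bib/                 .rdfa and -examples.txt files for the bib extension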

Getting to grips with the RDFa

Enough preparation let’s get stuck into some vocabulary!

Take your favourite text/code editing application and open up schema.rdfa. You will notice two things – it is large [well over 12,500 lines!], and it is in the format of an html file.  This second attribute makes it easy for non-technical viewing – you can open it with a browser.

Once you get past a bit of CSS formatting information and a brief introduction text, you arrive [about 35 lines down] at the first couple of definitions – for Thing and CreativeWork.

The Anatomy of a Type Definition
Standard RDFa (RDF in attributes) html formatting is used to define each term.  A vocabulary Type is defined as an RDFa marked-up <div> element with its attributes contained in marked-up <span> elements.

The Thing Type definition:

<div typeof="rdfs:Class" resource="http://schema.org/Thing">   <span class="h" property="rdfs:label">Thing</span>   <span property="rdfs:comment>The most generic type of item.</span> </div>

The attributes of the <div> element indicate that this is the definition of a Type (typeof="rdfs:Class") and give its canonical identifier (resource="http://schema.org/Thing").  The <span> elements fill in the details – it has a label (rdfs:label) of ‘Thing’ and a descriptive comment (rdfs:comment) of ‘The most generic type of item’.  There is one formatting addition to the <span> containing the label.  The class="h" is there to make the labels stand out when viewing in a browser – it has no direct relevance to the structure of the vocabulary.

The CreativeWork Type definition:

<div typeof="rdfs:Class" resource="http://schema.org/CreativeWork">    <span class="h" property="rdfs:label">CreativeWork</span>    <span property="rdfs:comment">The most generic kind of creative work, including books, movies, photographs, software programs, etc.</span>    <span>Subclass of: <a property="rdfs:subClassOf" href="http://schema.org/Thing">Thing</a></span>    <span>Source:  <a property="dc:source" href="http://www.w3.org/wiki/WebSchemas/SchemaDotOrgSources#source_rNews">rNews</a></span> </div>

Inspecting the CreativeWork definition reveals a few other attributes of a Type defined in <span> elements.  The rdfs:subClassOf property, with the associated href on the <a> element, indicates that http://schema.org/CreativeWork is a sub-type of http://schema.org/Thing.

Finally there is the dc:source property and its associated href value.  This has no structural impact on the vocabulary, its purpose is to acknowledge and reference the source of the inspiration for the term.  It is this reference that results in the display of a paragraph under the Acknowledgements section of a term page.

Defining Properties
The properties that can be used with a Type are defined in a very similar way to the Types themselves.

The name Property definition:

<div typeof="rdf:Property" resource="http://schema.org/name">   <span class="h" property="rdfs:label">name</span>   <span property="rdfs:comment">The name of the item.</span>   <span>Domain: <a property="http://schema.org/domainIncludes" href="http://schema.org/Thing">Thing</a></span>   <span>Range: <a property="http://schema.org/rangeIncludes" href="http://schema.org/Text">Text</a></span> </div>

The attributes of the <div> element indicate that this is the definition of a Property (typeof="rdf:Property") and its canonical identifier (resource="http://schema.org/name").  As with Types, the <span> elements fill in the details.

Properties have two specific <span> elements to define the domain and range of a property.  If these concepts are new to you, they are basically simple.  The Type(s) defined as being in the domain of a property are those for which the property is a valid attribute.  The Type(s) defined as being in the range of a property are those that are expected values for that property.  So inspecting the above name example we can see that name is a valid property of the Thing Type with an expected value type of Text.  Also specific to property definitions is rdfs:subPropertyOf, which defines one property as a sub-property of another.  For html/RDFa format reasons this is defined using a link element thus: <link property="rdfs:subPropertyOf" href="http://schema.org/workFeatured" />.
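
Putting those pieces together, here is a minimal sketch of what a complete property definition might look like.  The property exampleProperty, its domain, range, and parent property are purely illustrative and not part of the vocabulary:

<div typeof="rdf:Property" resource="http://schema.org/exampleProperty">
  <span class="h" property="rdfs:label">exampleProperty</span>
  <span property="rdfs:comment">An illustrative property, not a real Schema.org term.</span>
  <span>Domain: <a property="http://schema.org/domainIncludes" href="http://schema.org/CreativeWork">CreativeWork</a></span>
  <span>Range: <a property="http://schema.org/rangeIncludes" href="http://schema.org/Text">Text</a></span>
  <link property="rdfs:subPropertyOf" href="http://schema.org/name" />
</div>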

Those used to defining other RDF vocabularies may question the use of http://schema.org/domainIncludes and http://schema.org/rangeIncludes to define these relationships.  This is a pragmatic approach to producing a flexible data model for the web.  For a more in-depth explanation I refer you to the Schema.org Data Model documentation.

Not an exhaustive tutorial on editing the defining RDFa, but hopefully enough to get you going!


Making Examples

One of the most powerful features of the Schema.org documentation is the Examples section on most of the term pages.  These provide markup examples for most of the terms in the vocabulary that can be used and built upon by those adding Schema.org data to their web pages.  These examples represent how the html of a page or page section may be marked up.  To set context, the examples are provided in several serialisations – basic html, html plus Microdata, html plus RDFa, and JSON-LD.  As the objective is to aid the understanding of how Schema.org may be used, it is usual to provide simple basic html formatting in the examples.

Examples in File
As described earlier, the sources for examples are held in files with a -examples.txt suffix, stored in the data directory or in individual extension directories.

One or more examples per file are defined in a very simplistic format.

An example begins in the file with a line that starts with TYPES:, such as this:

TYPES: #eg2  Place,LocalBusiness, address, streetAddress

This example has a unique identifier prefixed with a # character; there should be only one of these per example.  These identifiers are intended for future feedback mechanisms and as such are not particularly controlled.  I recommend you create your own when creating your examples.  Next comes a comma-separated list of term names.  Adding a term to this list will result in the example appearing on the page for that term.  This is true for both Types and Properties.

Next come four sections, each preceded by a line containing a single label in the following order: PRE-MARKUP:, MICRODATA:, RDFA:, JSON:.  Each section ends when the next label line, or the end of the file, is reached.  The contents of each section of the example are then inserted into the appropriate tabbed area on the term page.  The process that does this is not a sophisticated one – there is no error or syntax checking involved – so if you want to insert the text of the Gettysburg Address as your RDFa example, it will let you do it.
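
To make that format concrete, here is a minimal hand-written sketch of a complete example entry – the identifier and the mark up itself are illustrative rather than taken from the real files:

TYPES: #eg-0001 Thing

PRE-MARKUP:
<div>
  <h1>Example Thing</h1>
</div>

MICRODATA:
<div itemscope itemtype="http://schema.org/Thing">
  <h1 itemprop="name">Example Thing</h1>
</div>

RDFA:
<div vocab="http://schema.org/" typeof="Thing">
  <h1 property="name">Example Thing</h1>
</div>

JSON:
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Thing",
  "name": "Example Thing"
}
</script>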

I am not going to provide tutorials for html, Microdata, RDFa, or JSON-LD here – there are plenty of those about.  I will however recommend a tool I use to convert between these formats when creating examples.  RDF Translator is a simple online tool that will validate and translate between RDFa, Microdata, RDF/XML, N3, N-Triples, and JSON-LD.  A suggestion, to make your examples as informative as possible: when converting between formats, especially when converting to JSON-LD, most conversion tools reorder the statements.  It is worth investing some time in ensuring that the markup order in your example is consistent for all serialisations.

Hopefully this post will clear away some of the mystery of how Schema.org is structured and managed. If you have proposals in mind to enhance and extend the vocabulary or examples, have a go – see if they make sense in a version on your own system, then suggest them to the community on GitHub.

In my next post I will look more at extensions, Hosted and External, and how you work with those, including some hints on choosing where to propose changes – in the core vocabulary, in a hosted or an external extension.

William Denton: There Wasn't No Party Like the Rdio Party, Then the Rdio Party Stopped

planet code4lib - Thu, 2016-02-25 03:39

Back in October 2012 I was in Montreal for Access. One evening I was over at the cool place where Dan Chudnov and his wife were staying. Some cool music was playing: hard bop with a trumpet lead, maybe Lee Morgan. It was all cool.

“Who is this?”

“I don’t know,” said Dan.

“Huh?”

“It’s a playlist on Rdio. I only have one playlist there, called ‘Rudy.’ Every time I come across a Rudy Van Gelder remaster I add it to the list. Then I just put it on shuffle.”

Damn, I thought, that is fantastic. That night I signed up for Rdio. I listened to it almost every day for the next three years.

Rdio was great. Well, for two years or so it was great, then they made some changes and for a year it was just very good; looking back I see the changes were a failed attempt to stave off the problems that ended up putting them into bankruptcy and being bought by Pandora. See Vox’s Why Rdio Died for more about all that.

“Social from the ground up—it sounds like marketing speak, but it was legit,” says Chris Becherer, Rdio’s head of product. “The founding premise was the best music recommendations come from the people you know. That was the whole idea.”

That idea is correct. The social side of Rdio was wonderful: by following people you could see what they were listening to, what albums they’d added to their collection (you could see when someone discovered a new band), what playlists they were making, what comments they’d left, who they were following. I followed about fifty people: some old friends; Dan and a bunch of other Code4Lib people; librarians and archivists I work with or know around Ontario; some writers and musicians I discovered were on the system; and some people I just knew by repute on the web.

The best thing about Rdio (until they changed it and made it harder to find) was that your home page was a feed of what the people you followed had done recently. Every time you checked, you could see what people were listening to. It hepped me to all kinds of new music.

When playing an album—it was album-oriented, as I am—you could see the avatars of people who’d played that album recently. After a while I began to notice the same people showing up as having listened to some of the less popular albums I was checking out, ones only a few people had played in the last few months. I checked their profiles and saw that they listened to some stuff I knew I liked and some stuff I didn’t know at all—but on experimenting I found I liked a lot of it.

So I started following those people, and then all kinds of new music started opening up. These people knew a lot about music and they were happy to share their discoveries! I’d see their comments and recommendations, and if they really dug something, I’d listen. I picked up on a lot of great music that way, new and old.

The social side doesn’t only work starting from people you already know—you can get to know (in an online way) people from the music they listen to and review.

My Rdio profile just before it all went under. I really was listening to Air Supply—the track was on a group Rdio-is-dying playlist.

I couldn’t possibly make a list of all of the musicians and composers I got to like because of recommendations from my Rdio network, so I’ll just pick one: Circle. A fellow named CAW aka the Aquatic Ape—I’m pretty sure he’s a fellow and that he lives in the US, but that’s all—started pointing out albums or tracks by them, and when I listened, I really liked what I heard. I’m still listening. (I’ll do a list of my favourite tracks some day, but if you want to try one, check their cover of Brian Eno’s “Here Come the Warm Jets” from Serpent. Glorious.)

My sincere thanks to CAW and everyone else I followed on Rdio who opened up my ears to new music.

The Soul Rebels, a New Orleans brass band. If they come to your town, do not miss it.

Late last year, Rdio went bankrupt. What to do? Apple’s iTunes and Google’s Play Music are out from the start. I tried Spotify but it was kind of creepy, and on Linux I had to install a proprietary binary, so that’s a no go.

I ended up with Deezer, and a bunch of other Rdio people also went, and tried to make a go of it. It has social features: you can follow people and, to some extent, see what they’ve been listening to or have recommended.

The thing is, Deezer is shite. Everything about it is worse than Rdio—except, as one person pointed out, that it hasn’t gone bankrupt. But the web player, the Android app, the social side, the queue: all shite. I won’t go into details, but believe me: it all works, but it’s basically shite.

Screenshot of Deezer.

That’s a screenshot of Deezer, with a social sidebar where CAW (under a new name) is recommending Bright Lights and Filthy Nights by Nina Walsh, which doesn’t look like the kind of thing I dig, but I’m happy to know about it. Skydivingrhino and geemarcus are two other Rdio people whose recommendations I always liked. They’re on Deezer but no one is as active there as on Rdio, and it’s so much harder to do anything social that it’s tough to find the energy.

Also, the music quality isn’t good. I regularly came across albums where the quality of their digital version was so poor it was distorted. I never had that problem on Rdio. I listen to distorted music, sure, but I want that to be the artist’s intent, not a bad rip.

So: the social side doesn’t work and the music doesn’t sound good.

The hell with that, I decided, and signed up with Tidal.

Screenshot of Tidal.

There is no social side to Tidal. None at all. But the music quality is great. It costs more, but $20 a month is still fine by me, and if more goes to the artists, that’s even better. It doesn’t have everything Rdio had (some Rush albums are missing?!), or Deezer had (no Glass Box), but it has artists they didn’t (Prince), and anyway, if I really want something I’ll buy it, on CD or as FLAC. Of course I already own all the Rush.

The web interface and the app are nice. They are not shite.

Things I hope Tidal does:

  • Goes social.
  • Lets me follow labels.
  • Remembers my listening history.
  • Integrates the app and the web client; being able to control what’s playing on the web version from the app is great when you have a cat on your lap.
  • Improves the tool tips on links so that when I hover over a song or album it gives full information.
  • Adds recommendations based on my listening.
  • Lets me personalize what it thinks I might like, by letting me plus and minus tracks it suggests.
  • Lets me add albums and playlists to queues.

In other words, the more it does to be like Rdio, the better.

In the mean time, I’ll get by. I miss the social side, but I miss it when it worked, on a site that’s gone, a lot more than the shite version on a site that’s still around.

For my part now, I’m going to try to post more about music I’m listening to, and give some favourite new discoveries or playlists I’ve made.

John Lydon’s goodbye at this Public Image Ltd. concert: ‘We do this because we love it. Fuck the system. Good night.’

All the streaming music services are silos. You can export playlists from one to another, and Rdio let me download all my data before it went bust (my thanks to the core developers to whom I suspect this meant so much they made the company do it), but once you sign up with one service, you’re isolated, and no one on other services can see what you’re doing (except, to a limited extent, through Last.fm or Libre.fm).

There’s no indie web solution to online streaming music yet. We need one.

Some sources I use for finding new music:

One last recommendation: “204” by Booker Ervin, from Tex Book Tenor, recorded in 1968 with Ervin on tenor, Woody Shaw on trumpet, Kenny Barron on piano, Jan Arnet on bass and Billy Higgins on drums. Post bop that takes off like a rocket and swings hard, with blistering solos and tasty eights at the end. And recorded by Rudy Van Gelder!

DuraSpace News: “VIVO plus SHARE: Closing the Loop on Tracking Scholarly Activity” Webinar Recording Available

planet code4lib - Thu, 2016-02-25 00:00

Austin, TX – In order to gauge true scholarly impact, efforts to capture scholarly activity have been driven both at the community and institutional level.  VIVO captures information only available at the local level, and SHARE harvests activity from almost 100 community sources, not just repository data.  VIVO and SHARE together bring us closer to a wider picture of today’s scholarship.

DuraSpace News: NEW RELEASE: DSpace-CRIS 5.4.0 Now Available

planet code4lib - Thu, 2016-02-25 00:00

From Andrea Bollini, Michele Mennielli

District Dispatch: A non-transformative argument for orphan works

planet code4lib - Wed, 2016-02-24 21:21

Guest Post by Eric Harbeson, University of Colorado, Boulder

In the last decade, policymakers and advocates have been debating how best to solve the problem of “orphan works”—those works that are, or are presumed to be, under copyright, yet whose rightful owner cannot be identified or found. That orphan works exist (and all the evidence points to their existing in vast quantities) is a tragic flaw in our copyright system. Copyright provides an economic benefit to authors, as an incentive to invest the resources into creation. It also comes at a cost: early writings from Jefferson and Madison indicate the caution with which they and others recommended granting of monopoly rights; they generally abhorred such a thing, but recognized that it was the best option. And it worked very well for a long time. But copyright laws have expanded and evolved, adding automatic assignment of copyright, greatly expanding terms, and greatly devaluing registration, and so we are left with an increasingly frequent condition: some would make use of copyrighted works, but no one is available to authorize and license the use.

Many solutions have been proposed, and in one case even come up for a vote in Congress (which failed). Of all of the solutions, I think the most elegant is fair use. But I think for fair use to work well as an orphan works solution, we may need to expand our understanding of the doctrine.

The public dialogue over fair use in the last several years has focused very heavily on the first evaluative factor—that of the purpose and character of the use—and specifically on whether the use is “transformative.” And for good reason. Beginning with a seminal work on the subject by Judge Pierre Leval, followed shortly after by the Supreme Court’s landmark decision in Campbell v. Acuff-Rose Music, courts and commentators have placed a high importance on whether the use made of a copyrighted work recasts that work in a way that alters it, to quote the Campbell decision, “with new expression, meaning, or message.” There have been interesting efforts to codify how use of orphan works by libraries and archives constitutes fair use on the grounds that it is transformative. I would like to suggest, though, that the focus on transformativeness may miss the point entirely, because it relies on the use that is made, and not on the copyrighted work. To be orphaned is not, after all, a property of the use, but a property of the work. Just as a use is transformative whether or not the work is an orphan, shouldn’t likewise a general theory for orphan works focus not on how the work is being used, but on the work itself?

Photo credit: Pascal Terjan

The second fair use factor is where the courts are invited to consider the “nature of the copyrighted work.” Though courts have rarely spent more than a short paragraph applying the second factor (essentially boiling down to the question, “is the work primarily factual or creative?” and maybe, “is the work published?”), there is room for a much more complex analysis. Associate Register of Copyrights Rob Kasunic discusses this in detail in his excellent article, “Is That All There Is? Reflections on the Second Fair Use Factor” (31 Columbia Journal of Law and the Arts, 529). In fact, Kasunic’s analysis opens the second factor up so much that the simplification applied by the courts to this point almost ridicules itself. Why should we think we can apply such a basic question to all copyrighted works and hope for anything remotely helpful as a result? Inspired by Kasunic’s article, I’d like to suggest that it is here, in the second factor, that an orphan works solution might live.

If we drill down into the “nature of the copyrighted work,” there are many potential questions that might help a court resolve fair use questions for orphan works without having to make the transformative question determinative. Questions a court could ask of the nature of the work might include:

  • Was the copyright bargain a factor in the creation of the work in the first place? It probably would have been for a work of art or book, but almost certainly not for a letter, or other bit of ephemera.
  • Has the expected economic benefit from the work been achieved? A commercial work that has followed the normal arc of commercial viability, only to lie dormant for fifty years, has probably achieved the economic success it expected, likely contributing to the absence of the right holder.
  • Has the right holder made a reasonable effort to protect the work? A work whose commercial life has ended, but which the right holder still makes an effort to license and be found, would clearly point away from fair use under this factor (as well as in the fourth factor, the effect on the market). But abandonment of a work, like abandonment of a child, must push some caretaking responsibility to the public.

The beauty of using fair use to answer the orphan works question is that, when done correctly, it should solve the problem completely, to the satisfaction of both the public good and the right holder interests, and without the need for legislation. Right holder advocates have argued strenuously against cumbersome registration requirements, loss of rights, and loss of revenue that might come with a legislative action, and viewing this through the lens of fair use would obviate all those concerns. A right holder who comes forward after discovering her work could easily re-establish her active custodial interest in the work, possibly at the time she registers the work (required for any legal action). At the same time, a fair use solution would allow users (and especially libraries and archives) to pursue many valuable projects to benefit the public, without the risk of having to pay gargantuan statutory damages to an opportunistic plaintiff, and very likely without anyone ever feeling harmed. Copyright, after all, was never intended as a lottery ticket for a windfall in legal fees, nor should it ever be used as such.

In many cases, uses of orphan works are unquestionably transformative. HathiTrust and Google Books are two examples of the complete recasting that the courts have envisioned. However, the opinion by Judge Leval in the Google Books case makes me worry that the courts may set a higher bar for transformativeness than may previously have been thought. Given that, I think it is worthwhile for us to rethink whether transformativeness is the sine qua non of fair use.

The post A non-transformative argument for orphan works appeared first on District Dispatch.

SearchHub: 2015 Solr Developer Survey

planet code4lib - Wed, 2016-02-24 20:46

The results are in! We’ve got the results of the 2015 Solr Developer Survey – thank you to everyone that participated. It really helps us see a snapshot in time of the vibrant Solr community and how developers all over the world are doing amazing things with Apache Solr.

Basic Demographics

We kicked off the survey asking some basic demographic questions: education, salary, industry, etc.

Most developers work full-time in technology/telecom and have a graduate-level education. More details below:

Location

Not surprisingly, most of our developers surveyed live in the United States, with India a close second and Germany, the UK, Italy, and France following after that.

Version of Solr

We’re always curious about this one: Who is still working with older versions of Solr? It’s good to see Solr 4 at the head of the pack – with some developers still having to create or maintain apps built on Solr 1.4 (released back in November of 2009). 

Connectors and Data Sources

Before you can search your data you’ve got to get it into your Solr index. We asked the Solr developer community what connectors they relied on most for bringing their data sources into their Solr instance ready for indexing. MySQL and local filesystems like internal network drives and other resources were at the top – no surprise there. Other database technologies rounded out the top data sources with Amazon S3, the public web, and Hadoop all making an appearance.

Authentication, Security, and Redaction

Security is paramount when building search-driven applications that are created for a larger user base. The most popular authentication protocols are pretty much standard across the board for most developers: LDAP, Kerberos, and Active Directory.

We also wanted to know what levels and complexity of security developers were using to block users from viewing unauthorized content. 40% said that they had no level of security – which is more than a little distressing. About the same number had deployed document-level security within a search app, with other levels and methods following:

UI Frameworks

It’s always a thorny topic asking developers what frameworks they are using for their application. No surprise to see jQuery at the top with AngularJS.

Query Types

The survey also included a question to gauge the level of sophistication of the query types that developers were including in their Solr apps. Text and keyword search was obvious and remains the foundation of most search projects. It was good to see representation for semantic and conceptual search becoming more prominent. And as mobile devices continue to take over the world, spatial and geo-specific search is more important than ever in helping users find people, resources, products, services, and search results near where they are right now.

ETL Pipelines and Transformations at Indexing Time

We also wanted to know a little about what types of transformations Solr apps were performing at indexing time. The top transformations – each in use by between 40% and 60% of those surveyed – included synonym identification, content extraction, metadata extraction and enrichment, named entity classification, and taxonomies and ontologies. Sentiment analysis and security enrichment were less common.

ETL Pipelines and Transformations at Query Time

Transformations can also take place on the query side – as a query is sent by the user to the app and the list of results is returned to the user. Faceting, auto-suggest, and boost/block were in use by nearly half of the applications that developers were working on. Expect to see user feedback signals move up the chain as more search applications start aggregating user behavior to influence search results for an individual user, a particular cohort, or across the entire user base.

Big Data Integrations

Solr plays well with others, so we wanted to get a sense of what big data libraries and modules developers are adding to the mix. Storage and scalability workhorse Hadoop was part of the picture for over half of devs surveyed, with Mongo and Spark each in about a third. Familiar faces like Cassandra, HBase, Hive, and Pig rounded out the less popular modules.

Custom Coding

And finally we wanted to know the kind of blood, sweat, and tears being poured into custom development. When you can’t find the library or module that you need, what do you do?

And that’s the end of our 2015 developer survey.

Thank you to everyone who participated and we’ll see ya in late 2016 to do an update!

The post 2015 Solr Developer Survey appeared first on Lucidworks.com.

Open Knowledge Foundation: Think big, start small, move fast

planet code4lib - Wed, 2016-02-24 13:43
How the York Museums Trust started opening up its collection – OpenGLAM Case study

More and more libraries, museums and other cultural institutions publish their collections online, often allowing users to reuse the material for research or creative purpose by licensing it openly. For institutions that start planning such a step, it may seem daunting at first: not all of their collection may be digitised, the metadata is not always perfect, copyright information is sometimes missing or the images have been taken a long time ago and are not of the best quality. Working towards having the perfect online collection is such a time-consuming process that it can get in the way of publishing any of the collection at all. Coupled with that is the fear that publishing raw, imperfect material online can damage an institution’s reputation.

Replica Roman Figurine, York Museums Trust, YORYM : 2006.2914

This case study by OpenGLAM (an initiative run by Open Knowledge International that promotes free and open access to digital cultural heritage held by Galleries, Libraries, Archives and Museums) describes how the York Museums Trust went about publishing their online collection, as well as the effect this had, including different examples of the reuse of their content. By publishing the collection fast, and allowing people to reuse their material even though it was not yet perfect, they managed to engage with their audience, stimulate reuse and generate new interest in their collection and museums. It is exactly this type of approach (think big, start small, move fast) that Michael Edson, Associate Director/Head of Digital at United Nations Live Museum for Humanity, identified as one of the patterns that accelerate change in organisations at the Openlab workshop in December 2015 (see How Change Happens).

The study is based on an interview conducted with Martin Fell, Digital Team Leader at York Museums Trust, and has been written within the frame of OpenGLAM’s current involvement in Europeana Space, a project that works on increasing and enhancing reuse of Europeana and other online collections of digital cultural content, especially by creative industries. We hope that the story of how York Museums Trust opened up their rich collections can inspire other institutions to take steps in this direction, because, as Martin put it: “To just say the content is not good enough for us, and therefore no one can see it, did not sit right with me”.

Read the full case study here: OpenGLAM_Case Study_York Museums Trust_Feb2016

DPLA: Maximizing Access to eBooks

planet code4lib - Wed, 2016-02-24 13:30

Since our launch almost three years ago, the Digital Public Library of America has sought to maximize access to our shared culture. Thanks to our many library, archive, and museum partners, we’ve been able to share over 11 million items, including a wide range of artifacts, documents, artworks, photographs, audiovisual materials, and, of course, many books.

We’re extremely proud of what we’ve accomplished in a relatively short time, but always seek to do more to connect the greatest number of people with the largest number of works. Over the last year, the case of popular contemporary ebooks has been of special interest, since we cannot pursue them in the same way as other materials, through digitization and open access on the web. Publishers maintain copyright on most of these books, and many readers wish to read them on tablets and smartphones.

The Open eBooks app is now available for iOS and Android smartphones and tablets.

So we were thrilled when the possibility arose a year ago to contribute to the White House ConnectED Initiative, partnering with The New York Public Library and First Book, with help from Baker & Taylor and financial support from the Institute of Museum and Library Services and the Alfred P. Sloan Foundation, to receive a remarkable $250,000,000 in ebooks from big publishers, and to make them available to those most in need. Critically, qualified kids will be able to read any of these ebooks on a whim, and at the same time as other readers, unlike with apps that require a reader to check a book back in before it can be read by someone else. This is truly “all you can read” for children in low-income areas of the United States.

Last summer the DPLA recruited a diverse corps of librarians to help us shape the collection, selecting great books which represent a variety of perspectives, experiences, and voices. It was our intention, as it is with every library, to create a collection that excites and inspires children to read, sparking their curiosity to learn more.

There is undoubtedly much left to do to fulfill our mission, and no program is perfect or can solve all issues. We see the Open eBooks Initiative as a first—but quite major—step for DPLA to bring more modern ebooks to children and adults in an expansive way. With slightly less fanfare than the Open eBooks Initiative, we have been pursuing alternative means of providing ebooks, by working with publishers and authors, and by assessing the existing riches within our hubs’ open access collections for gems that the public would like easier access to.

Today, with our partners, we launch a national program for children in need, to help them discover the joy of reading and to complement their public library, which holds thousands of other books to explore. Tomorrow, we will provide many more ebooks to even larger audiences. Stay tuned, and join us in this mission.

District Dispatch: Long-awaited Open eBooks app launched

planet code4lib - Wed, 2016-02-24 12:35

A young boy enjoys reading an ebook on his tablet. Image courtesy of Milltown Public Library.

Today, the White House announced that the Open eBooks app, which puts ebooks in the hands of lower-income children and young adults aged 4-18, is up and running. The Open eBooks service, which provides age-appropriate reading materials, was developed in partnership with the Digital Public Library of America, the New York Public Library, FirstBook, and Baker & Taylor. With the app, the Open eBooks service can begin.

The Institute of Museum and Library Services (IMLS) and the Alfred P. Sloan Foundation provided grant funds to get the project up and running. The Open eBooks service will require the help of librarians, teachers and other professionals who can register eligible children and young adults and provide them with an access code to enjoy thousands of books made available by publishers, including the Big Five. The service is modelled after the existing FirstBook services for print books. The ebooks can be accessed and read from any smart phone or tablet that runs on Android or iOS mobile operating systems. The Open eBooks web site notes:

Open eBooks is an app containing thousands of popular and award-winning titles that are free for children from low-income households. These eBooks can be read without checkouts or holds. Children from low-income families can access these eBooks, which include some of the most popular works of the present and past, using the Open eBooks app and read as many as they like without incurring any costs. The goal of Open eBooks is to encourage a love of reading and serve as a gateway to children reading even more often, whether in school, at libraries, or through other ebook reading apps.

I think it goes without saying that implementing the Open eBooks service initially will be a bumpy ride—isn’t everything?—but worthwhile to so many young people. Time and experience with the new app will allow for working out any kinks.  Everyone should have books to read!  And that’s what the new Open eBooks app is all about.

The post Long-awaited Open eBooks app launched appeared first on District Dispatch.

DPLA: Open eBooks Opens World of Digital Reading to Children

planet code4lib - Wed, 2016-02-24 12:30

February 24, 2016 – Open eBooks, a new initiative and e-reader app that will make thousands of popular, top-selling eBooks available to children in need for free, is launching today. First Lady Michelle Obama is releasing a video today raising awareness of the new opportunity for children. The initiative is designed to address the challenge of providing digital reading materials to children living in low-income households, and offers unprecedented access to quality digital content, including a catalog of eBooks valued at more than $250 million.

President Obama announced a nongovernmental eBooks effort in support of the ConnectED Initiative at the April 30 Kids Town Hall held by the White House at the Anacostia Branch of the District of Columbia Public Library. ConnectED is a multi-pronged effort designed to provide all youth with access to high-quality digital learning tools. Since it launched, over 20 million more students have been connected to high-speed broadband in their schools and libraries and millions more are taking advantage of its free private sector resources. Open eBooks complements the new digital infrastructure to provide an opportunity for kids in need to have a world-class eLibrary in their homes.

A coalition of literacy, library, publishing and technology partners joined together to make the Open eBooks program possible. The initiative’s partners — Digital Public Library of America (DPLA), First Book, and The New York Public Library (NYPL), with content support from digital books distributor Baker & Taylor — created the app, curated the eBook collection, and developed a system for distribution and use. They received financial support from the Institute of Museum and Library Services (IMLS) and content contributions from major publishers. National Geographic announced today that they will provide all of their age-appropriate content to the app, joining publishers Bloomsbury, Candlewick, Cricket Media, Hachette, HarperCollins, Lee & Low, Macmillan, Penguin Random House, and Simon & Schuster, who made commitments providing thousands of popular and award-winning titles last year.

The books in the Open eBooks collection were selected by the DPLA Curation Corps, which was established to ensure a diverse, compelling, and appropriately targeted set of thousands of titles—something from which every child could read, enjoy, and learn. The Curation Corps was selected through a competitive process from a pool of more than 140 applicants from across the country, and they bring their extensive experience helping children select titles in school and public libraries.

Adults who work with children in need through libraries, schools, shelters and clinics, out-of-school programs, military family services, early childhood programs and other capacities can qualify for Open eBooks credentials by first signing up with First Book and then requesting Open eBooks access for the children they serve. Students can download the free Open eBooks app to their individual devices from the App Store or Google Play and enter their access code to start enjoying Open eBooks.

“We are thrilled to be a part of this fantastic initiative that will bridge a major gap in our society and help all children discover a love of reading,” said Dan Cohen, DPLA’s executive director. “Maximizing access to our culture has been the Digital Public Library of America’s goal from its inception, and we are so delighted to join together with such great partners to make eBooks much more widely available.”

“The Open eBooks initiative recognizes the critical need for books — in all forms — among children growing up in families in need,” said Kyle Zimmer, president and CEO of First Book. “We’re proud to support this ground-breaking effort to put high quality digital content into the hands of those who need it most, and to welcome the teachers and program leaders seeking access to these resources into the largest national network of educators serving kids in need.”

“The New York Public Library is proud to work with these partners on the Open eBooks initiative, in support of the White House’s ConnectED initiative that is perfectly aligned with NYPL’s mission to provide free and open access to information, education, and opportunity,” said Tony Marx, president and CEO of The New York Public Library.

“This program is the result of an extraordinary public-private partnership, which could not have been made possible without the support of many committed partners, particularly those in our libraries who really stepped forward to help move this vision into reality,” said IMLS Director Kathryn K. Matthew. “Digital books open new doors to learning opportunities for students and can underpin brighter educational futures. IMLS is very proud to be a part of this unique initiative.”

“We hope that by donating our technology to this innovative program, we help expand access to information and create new reading opportunities for school-age children throughout America,” said George Coe, president and CEO of Baker & Taylor.

In the future, the partners will expand the initiative by adding to the collection with new and enhanced content from publishers and public domain titles; broadening the network of Title I schools, preschools, libraries, and other programs; incorporating new features into the app; and researching and sharing the effort’s impact and best practices.

Access and Equality

The Open eBooks initiative is a significant step toward more equitable digital access for all U.S. residents, addressing the need for free, quality digital content for children in pre-kindergarten through high school. Specifically targeting youth in need, Open eBooks aims to ensure that any device can be enjoyed as a tool to deepen a child’s love of reading.

While Internet access and device availability remain major hurdles in closing the digital divide, a recent study funded by the Gates Foundation and published by the Joan Ganz Cooney Center finds that, in surveyed households with children aged 6 to 13, 85% of families below the poverty line have a mobile device (tablet or smartphone). Additionally, a growing number of students can access and borrow electronic reading devices, and connect to the Internet at school and local public libraries. Open eBooks is designed to complement the Wi-Fi, computer, and physical book offerings of public libraries and school libraries, and serve as a gateway to more reading.

The Open eBooks Collection

The catalog of content in the Open eBooks initiative includes contributions of the most exciting, top-selling titles from publishers. Using Open eBooks, children will be able to build their own virtual collection of favorites and access single titles. The major publishers that have committed to make thousands of popular and award-winning titles available to students over a three-year period include: Bloomsbury, Candlewick, Cricket Media, Hachette, HarperCollins, Lee & Low, Macmillan, National Geographic Kids, Penguin Random House, and Simon & Schuster.

Snapshot of the Open eBooks initiative

Each partner has made, and will continue to make, a unique contribution to the success of this initiative:

  • The app: The New York Public Library developed the app that allows users to easily access the full text and illustrations of thousands of titles generously contributed by publishers.
  • The distribution services: Baker & Taylor provided support with publisher relations, content management and the digital distribution technology.
  • The eBook collection: The Digital Public Library of America recruited and enlisted a team of expert librarians to curate the collection to ensure a diverse, compelling, and appropriately targeted set of thousands of titles—something for every child at any age and reading level to read, learn from, and enjoy.
  • Reaching the children: First Book, a non-profit social enterprise that provides books and educational resources to classrooms and programs serving kids in need, will tap into its network of more than 225,000 schools and programs to reach children in Title I schools, Head Start programs, military families, after school or community programs, and others serving low-income families.

Qualifications

The Open eBooks app is available through Title I and Title I-eligible schools as well as libraries, preschools, and community after school programs serving a minimum of 70 percent children in need. The program will also be available through schools and programs serving children whose families are enlisted in the armed forces, or serving special needs children.

How do programs and classrooms get started?

The Open eBooks initiative site, at www.openebooks.net, has full program instructions, including Frequently Asked Questions and links to program registration. From there, qualifying educators, librarians, community program directors, and others working with low-income children and youth must register their organization with First Book. Next, users will request a code and PIN combination for every student they serve or device available, and they should indicate the student’s grade level from one of three categories: elementary, middle or high school. Qualifying educators will be able to obtain enough codes to cover all of the students that they serve. Codes will correspond to Open eBooks Elementary Collection (for PreK – Grade 4), Open eBooks Middle School Collection (for Grades 5 – 8), Open eBooks High School Collection (for Grades 9 – 12). An All Ages code will also be available.

The registrant will receive a confirmation email with the codes and a letter for families and caregivers with instructions on how to download the Open eBooks app and input the code and PIN combination for their child. The app requires a device with an iOS 8.0 or later operating system or Android equivalent.  

The Open eBooks app allows users to instantly borrow up to 10 eBooks at a time to their digital device. Each borrowed eBook will be available for 56 days before it must be renewed, or the eBook will be automatically returned. Because of this automatic return process, there are no late fees or penalties for Open eBooks users. Students and their families can choose eBooks based on the topics that get them excited about reading and learning, and sort by reading level, grade level, or title. The app can be used anywhere with an Internet connection.

The First Book Help Team can be reached at help@firstbook.org or by phone at (866) 732-3669 (8am – 6pm EST).

Information and updates on the initiative will be shared on the Open eBooks website and on Facebook and Twitter.

About the Digital Public Library of America

Launched in April 2013, the Digital Public Library of America brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world. Connecting digital collections of a growing network of the nation’s libraries, archives, and museums, the DPLA provides access to this collection, free to all, through its website and API. Learn more about the DPLA by watching this brief video or by visiting their website at http://dp.la.

About The New York Public Library

The New York Public Library is a free provider of education and information for the people of New York and beyond. With 92 locations—including research and branch libraries—throughout the Bronx, Manhattan, and Staten Island, the Library offers free materials, computer access, classes, exhibitions, programming and more to everyone from toddlers to scholars, and has seen record numbers of attendance and circulation in recent years. The New York Public Library serves more than 18 million patrons who come through its doors annually and millions more around the globe who use its resources at nypl.org. To offer this wide array of free programming, The New York Public Library relies on both public and private funding. Learn more about how to support the Library at nypl.org/support.  

About First Book

First Book is a non-profit social enterprise that has distributed more than 135 million books and educational resources to programs and schools serving children from low-income families throughout the United States and Canada. By making new, high-quality books and educational resources available on an ongoing basis to its network of educators and program leaders, First Book is transforming the lives of children in need and elevating the quality of education. For more information, please visit firstbook.org or follow the latest news on Facebook or Twitter.  

About Baker & Taylor

Baker & Taylor, LLC is the premier worldwide distributor of books, digital content and entertainment products. The company offers cutting-edge digital media services and innovative technology platforms to thousands of publishers, libraries, schools and retailers worldwide. Baker & Taylor also offers industry-leading customized library services and retail merchandising solutions. Charlotte, N.C.-based Baker & Taylor is majority owned by Castle Harlan Partners IV, L.P., an institutional private equity fund managed by Castle Harlan, Inc., a leading private equity investment firm. For more information, please visit www.baker-taylor.com.

Press Contacts:

Info@openebooks.net

The Open eBooks team is proud to support President Obama’s ConnectED Initiative. Learn more at www.wh.gov/ConnectED.

In the Library, With the Lead Pipe: Change In Publication Schedule

planet code4lib - Wed, 2016-02-24 12:00

Hello friends and readers! We’re writing to let you know about a change in our publication strategy, effective immediately.

Moving forward, we will be publishing articles on a rolling basis, instead of on every other Wednesday. At the moment, fewer submissions are coming in, but we still think In The Library With The Lead Pipe is an important dialogue space for our profession. Keep sending us your proposals as you think of topics—we’ll just be sharing them as they come, instead of adhering to a regular publication schedule. Each article will still get a minimum of two weeks on the front page of the site. You may recall that we made a change in our publication strategy last May as well. We think it’s important to be transparent as a journal about when and why we make decisions like this.

Thinking of submitting a proposal? Check out what our editorial board members are most interested in from our most recent calls for articles.

Cynthia Ng: Tips and Error Fixing When Using the Save As DAISY Word Plugin

planet code4lib - Wed, 2016-02-24 00:43
I’ve been using the Save As DAISY Word plugin a lot lately, and while the documentation is pretty good, there are some bugs and other things that pop up. Since the plugin is no longer under development, I thought I would document here some of the workarounds and ways to fix errors caused by existing …

Patrick Hochstenbach: Crosshatching a beard

planet code4lib - Tue, 2016-02-23 21:05
Filed under: portaits, Sketchbook Tagged: beard, copic, crosshatching, ink, moleskine, pen, sketchbook

SearchHub: Secure Fusion: SSL Configuration

planet code4lib - Tue, 2016-02-23 17:00

This is the first in a series of articles on securing your data in Lucidworks Fusion.

The first step in securing your data is to make sure that all data sent to and from Fusion is encrypted by using HTTPS and SSL instead of regular HTTP. Because this encryption happens at the transport layer, not the application layer, once Fusion is configured for HTTPS and SSL the only noticeable change is the lock icon in the browser location display, which indicates that the HTTPS protocol is being used.

SSL encryption keeps your data private as it travels across the wire, preventing intermediate servers from eavesdropping on your conversation, or worse. SSL certificates are used to verify the identity of a server in order to prevent “man-in-the-middle” attacks. Should you always configure Fusion to use SSL? If Fusion is on a secure network and doesn’t accept requests from external servers, no. Otherwise, yes!

To configure Fusion to use SSL, you must configure the Fusion UI service for SSL. All requests to Fusion go through the Fusion UI service. This includes requests to the Fusion REST-API services, because the Fusion UI service contains the Fusion authentication proxy which controls user access permissions. Because the Fusion UI service (currently) uses the Jetty server, most of this post is about configuring the Jetty server for SSL. The Eclipse website provides a good overview and detailed instructions on configuring Jetty for SSL: http://www.eclipse.org/jetty/documentation/current/configuring-ssl.html

There are two conceptual pieces to the configuration puzzle:

  • SSL keypairs, certificates, and the JSSE keystore
  • Fusion UI service configuration

Puzzle Piece One: SSL Keypairs, Certificates, and the JSSE Keystore

In order to get started, you need a JSSE keystore for your application which contains an SSL keypair and a signed SSL certificate. The SSL protocol uses a keypair consisting of a publicly shared key and a private key which is never shared. The public key is part of the SSL certificate. The server’s keystore contains both the keypair and the certificate.

At the start of the session, the client and server exchange series a of messages according to the SSL Handshake protocol. The handshake process generates a shared random symmetric encryption key which is used for all messages exchanged subsequent to the handshake. During the initial message exchange, the server sends its SSL certificate containing its public key to the client. The next two turns of the conversation establish the shared symmetric key. Because of clever properties of the keypair, the client uses the public key to generate a message which can only be decrypted by the holder of the private key, thus proving the authenticity of the server. Since this process is computationally expensive, it is carried out only once, during the handshake; after that, the shared symmetric key is used with an agreed-on encryption algorithm, details of which are beyond the scope of this blog post. A nice overview of this process, with schematic, is available from this IBM docset. For the truly curious, I recommend reading this writeup of the math behind the handshake for non-mathematicians.
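If you want to watch this handshake in action against your own server, the openssl s_client utility will print the certificate chain the server presents and the negotiated protocol and cipher. A minimal sketch, assuming the Fusion UI is listening for HTTPS on port 8764 (the port is an assumption; substitute whatever port your deployment uses):

> openssl s_client -connect localhost:8764 -showcerts

The output includes the server certificate chain and a summary of the agreed-on protocol version and cipher suite.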

In addition to the public key, a certificate contains the web site name, contact email address, and company information. Certificates are very boring to look at:

Bag Attributes
    friendlyName: localhost
    localKeyID: 54 69 6D 65 20 31 34 35 35 38 34 30 33 35 36 37 37 35
Key Attributes: <No Attributes>
-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3-CBC,E2BCF2C42A11885A

tOguzLTOGTZUaCdW3XzoP4xDPZACEayuncv0HVtNRR3PZ5uQNUzZaNX0OgbSUh5/
/w6Fo7yENJdlTgMC4XafMRN+rTCfVj3XBsnOvQVj7hLiDq1K26XpvD79Uvb2B4QU
... (omitting many similar lines) ...
x3LI5ApQ2G2Oo3OnY5TZ+EYuHgWSICBZApViaNlZ4ErxXp1Xfj4iFtfi50hcChco
poL9RdLpOx/CyLuQZZn5cjprIjDA3FcvmjBfOlmE+xm+eNMIKpS54w==
-----END RSA PRIVATE KEY-----
Bag Attributes
    friendlyName: localhost
    localKeyID: 54 69 6D 65 20 31 34 35 35 38 34 30 33 35 36 37 37 35
subject=/C=NA/ST=NA/L=Springfield/O=some org/OU=some org unit/CN=firstname lastname
issuer=/C=NA/ST=NA/L=Springfield/O=some org/OU=some org unit/CN=firstname lastname
-----BEGIN CERTIFICATE-----
MIIDqzCCApOgAwIBAgIEIwsEjjANBgkqhkiG9w0BAQsFADB4MQswCQYDVQQGEwJO
QTELMAkGA1UECBMCTkExFDASBgNVBAcTC1NwcmluZ2ZpZWxkMREwDwYDVQQKEwhz
... (omitting many similar lines) ...
fALku9VkH3j7PidVR5SJeFzwjvS+KvjpmxAsPxyrZyZwp2qMEmR6NPjLjYjE+i4S
04UG7yrKTm9CuElddLFAnuwaNAuifbbZ6P3BR3rFaA==
-----END CERTIFICATE-----
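To inspect the human-readable fields buried in those Base64 blocks, openssl can decode them. A hedged sketch (the file names are placeholders for your own certificate and PKCS12 bundle):

> openssl x509 -in my.crt -noout -text
> openssl pkcs12 -in my.keystore.p12 -info -nokeys

The first command prints the subject, issuer, validity dates, and public key of a certificate; the second lists the certificates in a PKCS12 bundle without dumping the private key.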

Certificates are signed by a CA (Certificate Authority), either a root CA or an intermediate CA. Intermediate CAs provide enhanced security, so these should be used to generate the end user certificate.

You need to get a signed certificate and an SSL keypair from your sys admin and put them into the keystore used by the Fusion UI Jetty server. In a production environment, you will need to set up your keystore file in a secure location with the appropriate permissions and then configure the Fusion UI Jetty server to use this keystore. If you don’t have a signed SSL certificate, you can get a keystore file which contains a self-signed certificate suitable for development and demos by running the Jetty start.jar utility; details in the next section.
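Alternatively, you can generate your own self-signed certificate directly with the JDK keytool. A minimal sketch, where the alias, distinguished name, keystore name, and one-year validity are all illustrative values to adjust for your environment:

> keytool -genkeypair -alias localhost -keyalg RSA -keysize 2048 \
    -keystore my.keystore.jks -validity 365 \
    -dname "CN=localhost, OU=some org unit, O=some org, L=Springfield, ST=NA, C=NA"

Like the demo keystore, a certificate produced this way is fine for development but will trigger browser warnings, since no CA has signed it.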

The Java keytool utility, which is part of the JDK, can be used to store the server certificate and private key in the keystore. There are several file formats used to bundle together the private key and the signed certificate. The most commonly used formats are PKCS12 and PEM. PKCS12 files usually have filename extension “.p12” or “.pfx”, and PEM files usually have filename extension “.pem”. In a Windows environment, you will most likely have a “.pfx” file which contains both the private key and the signed certificate and can be uploaded into the keystore directly (see example below). In a *nix environment, if you have a bundle of certification files and a keypair file, you will have to use the openssl tool to create a PKCS12 file, which can then be uploaded into the keystore via the keytool. Signed certificate files have suffix “.crt” and private key files have suffix “.key”; however, you should always check whether these are binary files or ascii files, the latter most likely already in the PEM format shown above.
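A hedged sketch of that openssl step, using placeholder file names for the signed certificate, the private key, and the CA bundle:

> openssl pkcs12 -export \
    -in my.crt -inkey my.key -certfile ca-bundle.crt \
    -name localhost -out my.keystore.p12

The -name option sets the alias (the “friendlyName” shown in the PEM dump above) that the keytool import below will report.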

The following example uses the Java keytool utility to create a new keystore named “my.keystore2.jks” from the private key and signed certificate bundle “my.keystore.p12”, which is in PKCS12 format. The keytool prompts for keystore passwords for both the source and destination keystore files:

> keytool -importkeystore \
>   -srckeystore my.keystore.p12 \
>   -srcstoretype pkcs12 \
>   -destkeystore my.keystore2.jks \
>   -deststoretype JKS
Enter destination keystore password:
Re-enter new password:
Enter source keystore password:
Entry for alias localhost successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled

To check your work, you can use the keytool command “-list” option:

> keytool -list -keystore my.keystore2.jks
Enter keystore password:

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entry

localhost, Feb 18, 2016, PrivateKeyEntry,
Certificate fingerprint (SHA1): 63:1E:56:59:65:3F:83:2D:49:F1:AC:87:15:04:1A:E4:0C:E1:26:62

Puzzle Piece Two: Fusion UI Service Configuration

The Fusion UI service uses the Jetty server, which means that you must first configure the Fusion UI service Jetty server, found in the Fusion directory apps/jetty/ui, and then change the port and protocol information in the Fusion UI service start script, found in the Fusion directory bin.

Jetty Server Configuration

The configuration files for Fusion services which run on the Jetty server are found in the Fusion distribution directory apps/jetty. The directory apps/jetty/ui contains the Jetty server configured to run the Fusion UI service for the default Fusion deployment. The directory apps/jetty/home contains the full Jetty distribution.

The following information is taken from the http://www.eclipse.org/jetty/documentation/current/quickstart-running-jetty.html#quickstart-starting-https documentation.

The full Jetty distribution home directory contains a file “start.jar” which is used to configure the Jetty server. The command line argument “--add-to-startd” is used to add additional modules to the server. Specifying “--add-to-startd=https” adds the ini files needed to run an SSL connector that supports the HTTPS protocol, as follows:

  • creates start.d/ssl.ini, which configures an SSL connector (e.g. port, keystore, etc.) by adding etc/jetty-ssl.xml and etc/jetty-ssl-context.xml to the effective command line.
  • creates start.d/https.ini, which configures the HTTPS protocol on the SSL connector by adding etc/jetty-https.xml to the effective command line.
  • checks for the existence of an etc/keystore file and, if one is not present, downloads a demonstration keystore file.

Step 1: run the start.jar utility

To configure the Fusion UI service Jetty server for SSL and HTTPS, run the following command from the directory apps/jetty/ui:

> java -jar ../home/start.jar --add-to-startd=https

Unless there is already a file called “keystore” present, this utility will install a local keystore file that contains a self-signed certificate and corresponding keypair which can be used for development and demo purposes. In addition, it adds the files “https.ini” and “ssl.ini” to the local “start.d” subdirectory:

> ls start.d
http.ini	https.ini	ssl.ini

This set of “.ini” files controls the configuration of the Jetty server. These files are Java properties files that are used to configure the http, https, and ssl modules. In the default Fusion distribution, the directory apps/jetty/ui/start.d contains only the file “http.ini”, so the Fusion UI service runs over HTTP. The HTTPS module requires the SSL module, so “.ini” files for both are added.

Step 2: edit file start.d/ssl.ini

As installed, the “ssl.ini” file is configured to use the demonstration keystore file. This is convenient if you are just doing a local install for development purposes, and especially so if you don’t yet have the requisite certificate bundles and keystore: at this point, your configuration is complete. But if you have a real keystore, you’ll need to edit all keystore-related properties in the “ssl.ini” file:

### SSL Keystore Configuration
## Setup a demonstration keystore and truststore
jetty.keystore=etc/keystore
jetty.truststore=etc/keystore
## Set the demonstration passwords.
## Note that OBF passwords are not secure, just protected from casual observation
## See http://www.eclipse.org/jetty/documentation/current/configuring-security-secure-passwords.html
jetty.keystore.password=OBF:1vny1zlo1x8e1vnw1vn61x8g1zlu1vn4
jetty.keymanager.password=OBF:1u2u1wml1z7s1z7a1wnl1u2g
jetty.truststore.password=OBF:1vny1zlo1x8e1vnw1vn61x8g1zlu1vn4

Note the obfuscated passwords. The Jetty password utility described at http://www.eclipse.org/jetty/documentation/current/configuring-security-secure-passwords.html can be used to obfuscate, checksum, or encrypt passwords.
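As a hedged example of running that utility (the jetty-util jar name and the output values below are illustrative; the exact jar version and path depend on your Jetty distribution):

> java -cp ../home/lib/jetty-util-9.2.10.v20150310.jar \
    org.eclipse.jetty.util.security.Password fusion mysecret
mysecret
OBF:1vny1zlo1x8e1vnw
MD5:06c219e5bc8378f3a8a3f83b4b7e4649
CRYPT:fuqs7zqxkuBlk

Paste the OBF: form into the password properties above; Jetty de-obfuscates it at startup.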

Step 3 (optional): disable use of HTTP

You can entirely disable use of HTTP by removing the HTTP connector from the jetty startup configuration:

> rm start.d/http.ini

Fusion UI Service Configuration

The Fusion UI service startup script is found in the Fusion distribution bin directory, in files bin/ui and bin/ui.cmd for *nix and Windows respectively. It sets a series of environment variables and then starts the Jetty server. The arguments to the exec command that starts the Jetty server require one change:
change the command line argument from “jetty.port=$HTTP_PORT” to “https.port=$HTTP_PORT”
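A simplified before/after sketch of that change (this is not the literal contents of bin/ui, which varies by Fusion version; only the property name change is the point):

# before: Jetty listens for plain HTTP
exec "$JAVA" -jar "$JETTY_HOME/start.jar" jetty.port=$HTTP_PORT ...
# after: Jetty listens for HTTPS on the same port variable
exec "$JAVA" -jar "$JETTY_HOME/start.jar" https.port=$HTTP_PORT ...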

The UI service startup script is called from the main start script bin/fusion. Once Fusion startup is complete, you should be able to access it securely in the browser via the HTTPS protocol, with the lock icon showing in the browser location display. If the server is using a self-signed certificate, both Firefox and Chrome browsers will issue warnings requiring you to acknowledge that you really want to trust this server. The Chrome browser also flags sites using a self-signed certificate by displaying a warning icon instead of the secure lock icon.

Discussion

SSL is the seatbelt required for cruising the information highway: before sending your data to some remote server, buckle up! Future releases of Fusion will provide even more security in the form of “SSL everywhere”, meaning that it will be possible to configure Fusion’s services so that all traffic between the components will be encrypted, keeping your data safe from internal prying eyes.

SSL provides data security at the transport layer. Future posts in this series will cover:

  • Security at the application layer, using Fusion’s fine-grained permissions
  • Security at the document layer with MS Sharepoint security-trimming
  • Fusion and Kerberos

The post Secure Fusion: SSL Configuration appeared first on Lucidworks.com.

Terry Reese: MarcEdit: Build New Field Enhancement

planet code4lib - Tue, 2016-02-23 16:20

I’m wrapping up a few odds and ends prior to releasing the next MarcEdit update – mostly around the linked data work and how the tool works with specific linked data services – but one of the specific changes that should make folks using the Build New Field tool happy is the addition of a new macro that can be used to select specific data elements when building a new field. 

So, for those that might not be aware, the Build New Field tool is a pattern-based tool that allows users to select information from various MARC fields in a record and create a new field.  You can read the initial description at: http://blog.reeset.net/archives/1782 and the enhancements that added a kind of macro language to the tool here: http://blog.reeset.net/archives/1853

When the tool runs, one of the assumptions made is that the tool pulls the data for the pattern from the first field/subfield combination that meets the pattern criteria.  This works well if your record has only a single field for the data that you need to capture.  But what if you have multiple fields?  Say, for example, the user needs to create a call number, and one of its elements will be the ISBN – however, the record has multiple ISBN fields, like:
=020  \\$a123456 (ebook)
=020  \\$a654321 (hardcopy)

Say I need to specifically get the ISBN from the hardcopy.  In the current Build New Field function, this wouldn’t be possible without first changing the first 020 to something else (like an 021) – then changing it back when the operation was completed.  This is because if I used, say:
=099  \\$aMyCall Number {020$a}

I would get the first 020$a value.  There hasn’t been a way to ask the tool to find specific field data in this function.  But that has changed – I’ve introduced: find. 

Function: .find
Arguments: needle
Example: {020$a.find(“hardcopy”)}

Find will allow you to selectively find data in a field.  So, in the example above, I can now select the correct 020.
=099  \\$aMyCall Number {020$a.find(“hardcopy”).replace(“(hardcopy)”,””)}

This will output:
=099  \\$aMyCall Number 654321

A couple of notes about usage: find must always be the first option in a chain of macros.  This is because the tool performs the other operations, like substitutions, on the data first – so the criteria being queried must reflect the data in the record as read, not after it has been processed.  If you place find in any other position, you may invalidate your pattern. 
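To make that ordering rule concrete, here is a hypothetical pair of patterns (my illustration, not from the original post).  The first is valid; the second puts find after replace, so the “(hardcopy)” text it searches for has already been stripped from the data and the match fails:

=099  \\$a{020$a.find("(hardcopy)").replace("(hardcopy)","")}
=099  \\$a{020$a.replace("(hardcopy)","").find("(hardcopy)")}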

This will be part of the next upcoming MarcEdit update.

–tr
