Feed aggregator

Islandora: Islandora CLAW Lessons - Update!

planet code4lib - Mon, 2016-03-21 13:31

We are three lessons into our series of webinars detailing how to develop in Islandora CLAW, led by CLAW Committer Diego Pino. If you haven't been attending, you've missed out on some great expressions of the CLAW stack via colorful doodles, such as the difference between the Islandora 7.x-1.x hamburger and the Islandora 7.x-2.x lobster chimera.


If that doesn't make any sense to you, then good news! You can catch up on what you've missed by viewing the recorded sessions:

Week One: Intro to Fedora 4.x

Week Two: Hands-on Creating Fedora 4.x Resources

Week Three: Data Flow in the CLAW

Once you have caught up, why not join us for the rest of the lessons in real time? They will continue on for another six weeks, every Tuesday at 11AM Eastern time, on Adobe Connect. Here's what's in store:

General Outline:
  • Basic Notions of Fedora 4 (sessions 1 and 2)
    • How Fedora 4 Works - General Intro and differences between Fedora 3 and 4
      • RDF instead of XML
      • Fedora 4 REST API
  • Introduction to CLAW
    • How Data Flows (session 3)
    • Sync Gateway (how to trigger the sync) (session 3)
      • Basics of Camel (session 4)
    • Adding/creating new content type (session 5 - 6)
    • PHP Microservices (session 7 - 8)
      • Intro/Overview
      • Basics of Silex
      • Dissecting a Service
      • Interacting with Fedora via Microservices
  • How to Join a Sprint (session 9)

With thanks to our hosts:


Access Conference: Discounts and Scholarships 2016

planet code4lib - Mon, 2016-03-21 05:23

We all know that Access is one of the best deals around for a tech conference. There are great speakers, great activities and great food, all for one amazingly low price. You get the hackfest, two and a half days of our single-stream conference and the workshop for one great price of $450 Canadian (I know, right?).

The keeners and well-organized won’t miss out on the Early Bird tickets, which should hit store shelves in June for an incredible $350. That translates into less than $100 per day of awesomeness and we feed you. Shut the front door!

If you are working on an even tighter budget, there are still some options for you:

  1. Students – if you are a full-time student and trying to save for Access, we are also making available 25 deeply discounted tickets just for you at the rock-bottom price of $200.
  2. Be a Presenter – submit a proposal, rock our world and we’ll hook you up for $300. That’s a 33.33333% savings for sharing your awesome project, idea or words of wisdom with your peers. What can go wrong?

Finally, if attending Access is still a stretch for your budget, we will once again have two Diversity Scholarships available. To qualify, you need to be from a “traditionally underrepresented and/or marginalized group,” be unable to attend the conference without some financial assistance and must not have received a scholarship to attend either of the previous two conferences. Meet the criteria and you’ll be eligible for a draw for one of our $1000 Diversity Scholarships to help you attend the conference.

We hope to see you in Fredericton.

Jonathan Rochkind: “Apple Encryption Engineers, if Ordered to Unlock iPhone, Might Resist”

planet code4lib - Mon, 2016-03-21 03:33

From the NYTimes, “Apple Encryption Engineers, if Ordered to Unlock iPhone, Might Resist”:

SAN FRANCISCO — If the F.B.I. wins its court fight to force Apple’s help in unlocking an iPhone, the agency may run into yet another roadblock: Apple’s engineers.

Apple employees are already discussing what they will do if ordered to help law enforcement authorities. Some say they may balk at the work, while others may even quit their high-paying jobs rather than undermine the security of the software they have already created, according to more than a half-dozen current and former Apple employees.

Do software engineers have professional ethical responsibilities to refuse to do some things even if ordered by their employers?

Filed under: General

DuraSpace News: NOW AVAILABLE: DSpace 5.5 With Security Fixes/Bug Fixes to 5.x

planet code4lib - Mon, 2016-03-21 00:00

From Tim Donohue, DSpace Tech Lead on behalf of the DSpace developers

Austin, TX – DSpace 5.5 is now available, providing security fixes to both the XMLUI and JSPUI, along with bug fixes to the DSpace 5.x platform.

DuraSpace News: VIVO Updates for March 20–User Group Meeting, Summit Recap++

planet code4lib - Mon, 2016-03-21 00:00

From Mike Conlon, VIVO Project Director

Tara Robertson: digitization: just because you can, doesn’t mean you should

planet code4lib - Sun, 2016-03-20 17:44

I learned this week that Reveal Digital has digitized On Our Backs (OOB), a lesbian porn magazine that ran from 1984-2004. This is a part of the Independent Voices collection that “chronicles the transformative decades of the 60s, 70s and 80s through the lens of an independent alternative press.” For a split second I was really excited — porn that was nostalgic for me was online! Then I quickly thought about friends who appeared in this magazine before the internet existed. I am deeply concerned that this kind of exposure could be personally or professionally harmful for them.

While Reveal Digital went through the proper steps to get permission from the copyright holder, there are ethical issues with digitizing collections like this. Consenting to a porn shoot that would be in a queer print magazine is a different thing to consenting to have your porn shoot be available online. I’m disappointed in my profession. Librarians have let down the queer community by digitizing On Our Backs.

Why is this collection different?

The nature of this content makes it different from digitizing textual content or non-pornographic images. We think about porn differently than other types of content.

Most of the OOB run was published before the internet existed. Consenting to appear in a limited run print publication is very different than consenting to have one’s sexualized image be freely available on the internet. These two things are completely different. Who in the early 90s could imagine what the internet would look like in 2016?

In talking to some queer pornographers, I’ve learned that some of their former models are now elementary school teachers, clergy, professors, child care workers, lawyers, mechanics, health care professionals, bus drivers and librarians. We live and work in a society that is homophobic and not sex positive. Librarians have an ethical obligation to steward this content with care for both the object and with care for the people involved in producing it.

How could this be different?

Reveal Digital does not have a clear takedown policy on their website. A takedown policy describes the mechanism for someone to request that digital content be taken off a website or digital collection. HathiTrust’s takedown policy is a good example of a policy around copyright. When I spoke to Peggy Glahn, Program Director for Reveal Digital, she explained that there isn’t a formal takedown policy. Someone could contact the rights holder (the magazine publisher, the photographer, or the person who owns the copyright to the content) and have them make the takedown request to Reveal Digital. Even for librarians it’s sometimes tricky to track down the copyright holder of a magazine that’s not being published anymore. By being stewards of this digital content I believe that Reveal Digital has an ethical obligation to make this process clearer.

I noticed that not all issues are available online. Peggy Glahn said that they digitized copies from Sallie Bingham Center for Women’s History & Culture at Duke University and Charles Deering McCormick Library of Special Collections at Northwestern University but they are still missing many of the later issues. More issues should not be digitized until formal ethical guidelines have been written. This process should include consultation with people who appeared in OOB.

There are ways to improve access to the content through metadata initiatives. I’m really, really excited by Bobby Noble and Lisa Sloniowski’s proposed project exploring linked data in relation to Derrida and feminism. I’ve loved hearing how Lisa’s project has shifted from a physical or digital archive of feminist porn to a linked data project documenting the various relationships between different people. I think the current iteration avoids dodgy ethics while exploring new ways of thinking about the content and people through linked data. Another example of this is Sarah Mann’s index of the first 10 years of OOB for the Canadian Gay and Lesbian Archive.

We need to have an in-depth discussion about the ethics of digitization in libraries. The Zine librarian’s Code of Ethics is the best discussion of these issues that I’ve read. The two ideas that are relevant to my concerns are about consent and about balancing interests between access to the collection and respect for individuals.

Whenever possible, it is important to give creators the right of refusal if they do not wish their work to be highly visible.

Because of the often highly personal content of zines, creators may object to having their material being publicly accessible. Zinesters (especially those who created zines before the Internet era) typically create their work without thought to their work ending up in institutions or being read by large numbers of people. To some, exposure to a wider audience is exciting, but others may find it unwelcome. For example, a zinester who wrote about questioning their sexuality as a young person in a zine distributed to their friends may object to having that material available to patrons in a library, or a particular zinester, as a countercultural creator, may object to having their zine in a government or academic institution.

Consent is a key feminist and legal concept. Digitizing a feminist porn publication without consideration for the right to be forgotten is unethical.

The Zine librarian’s Code of Ethics does a great job of articulating the tension that sometimes exists between making content available and the safety and privacy of the content creators:

To echo our preamble, zines are “often weird, ephemeral, magical, dangerous, and emotional.” Dangerous to whom, one might ask? It likely depends on whom one asks, but in the age of the Internet, at least one prospectively endangered population are zinesters themselves. Librarians and archivists should consider that making zines discoverable on the Web or in local catalogs and databases could have impacts on creators – anything from mild embarrassment to the divulging of dangerous personal information.

Zine librarians/archivists should strive to make zines as discoverable as possible while also respecting the safety and privacy of their creators.

I’ve heard similar concerns with lack of care by universities when digitizing traditional Indigenous knowledge without adequate consultation, policies or understanding of cultural protocols. I want to learn more about Indigenous intellectual property, especially in Canada. It’s been a few years since I’ve looked at Mukurtu, a digital collection platform that was built in collaboration with Indigenous groups to reflect and support cultural protocols. Perhaps queers and other marginalized groups can learn from Indigenous communities about how to create culturally appropriate digital collections.

Librarians need to take more care with the ethical issues that go far beyond simple copyright clearances when digitizing and putting content online.

Karen G. Schneider: Margin of error

planet code4lib - Sat, 2016-03-19 22:38

I just had a wonderful stroke of luck that bailed me out of a big ole boneheaded error I made yesterday. It is the kind of error that I have a certain notoriety for — not all the time, just once in a while, when I am on overload and stop reading email all the way through, forget to review checklists, and otherwise put myself in a dangerous position with decision-making. The stroke of luck was due to someone who had a solid sixth sense that something was not quite right.

This error reminded me of my most illustrious “did not read the memo” gaffe, which I share here for the first time ever.

At my last university, I was invited to participate in a university president’s inauguration ceremony and quickly scanned the invitational email. Wear regalia and process to a stage? Sounds easy enough! Ok, on to the next problem!

But after we were seated (on a large, brightly-lit stage facing an audience of, oh, several hundred), I gradually realized that everyone else on stage was getting up one by one, and giving a speech. My hands started trembling. I had no speech. I looked out into the audience. There were the other library people, gazing calmly at their fearless leader. I mean, if anyone likes to give a speech and can knock one out of the park, it would be me, right? The woman who has presented seventy-bazillion times?

My mouth turned to ancient parchment and I could feel cold perspiration wending its way down my torso. I suspect if you had been able to see my eyes, they would have been two fully-dilated orbs in my panicked face. I could feel the hair on my head whitening.

Out of about two dozen people on stage, I could see that I was scheduled to go next to last. The speakers walked to the podium one by one. What to do, what to do?

Breathe. What tools did I have at hand? Breathe. I have a small paper program for the inauguration. Breathe. What is going on with the speeches? Breathe. Observation: the speeches are mostly too long. Breathe. Try to still my hands. Notice that the audience is getting restless. Breathe. Smile out at the audience. Breathe.

It was my turn–a turn that for once in my life came far too quickly. I walked to the podium, looked out at the audience, and smiled. I slowly unfolded the small program and frowned at it for a moment as if it were my speaking notes while I mentally rehearsed the two or three points I would make. I began with a joke about not wanting to speak too long. Other words, now forgotten, ensued, as I winged it onstage. I could hear laughter and appreciative rustling, though I was so anxious my vision was too blurred to see past the lectern for the next two or three minutes. I summed up my speech by noting that the university, like our library, was small and mighty, a joke which if you know me has a visual cue as well.

As soon as I was outside, I owned up to that mistake to my team. Not to brag about getting through a disastrous mistake unscathed (well, maybe a little), but also to fully claim my error. This situation was awful and funny and educational, all at once. It was about my strengths, but also about my weaknesses. I believe I slept 14 hours that night. It became part of our library lore.

There were many clues that I was in the vulnerability zone for error yesterday. Distraction, overflowing email, too many simultaneous “channels”; I had even remarked the previous week that I was trying hard, but sometimes not succeeding, at not responding to email messages while I was in a face-to-face meeting.  The people I was interacting with were equally busy and besides, it wasn’t their job to see that the conditions for making major errors had become highly favorable. That was my job, as the senior mechanic in charge of this project, and I wasn’t doing it.  Clues abounded, but as my overload factor increased, I missed them — a classic case of being unaware that I was unaware. And I ignored the checklist sitting in front of me just waiting to help me, if only I would let it do so.

I had excellent training in the Air Force about the value of using checklists, and I have touted their use in libraries. People often need convincing that checklists work and that checklists are not an indication that they are somehow dumb or stupid for not being able to extemporize major tasks, even though there is a preponderance of evidence underscoring their utility. In aircraft maintenance, failure to follow checklists could, and sometimes did, cost lives; even when lives were not at stake, failure to follow checklists sometimes led to expensive errors. And yes, for yesterday’s mistake, there was a perfectly reasonable checklist, but I didn’t review it. Just as there were email messages I didn’t read all the way through, and just as I didn’t catch that I wasn’t shifting my attention to where it needed to be.

As I reflected today about awareness, checklists, and stumbling toward errors, I looked outward and thought, this is what this presidential campaign feels like to me. There are cues and signs swirling around us, and an abundance of complementary cautionary tales spanning the entire history of human civilization. Anger, vulgarity, and veiled hints at violence abound. The standards for public discourse have declined to the point where children are admonished not to listen to possible future leaders. We worry, with half a mind, that what looked like a lame but forgettable joke a few months back is simultaneously surfacing and fomenting an ugliness that has been burbling under the body politic for some time now. We watch people dragged away and sucker-punched at rallies as they clumsily try to be an early-warning system for what they fear lies ahead. We have all learned what “dog-whistle” means–and yet as the coded words and actions fly around us, we still do not understand why this is happening. We sit on this stage, programs wadded in our sweating hands, watching and watched by the restive audience until our vision blurs; and we do not have a checklist, but we do have our sixth sense.

Patrick Hochstenbach: Comics Art in Relationship

planet code4lib - Sat, 2016-03-19 06:45
Homework for the California College of the Arts online course: create a design based on a given existing script.   Filed under: Comics Tagged: art, comic, comics, homework, monster

FOSS4Lib Recent Releases: TemaTres Vocabulary Server - 2.1

planet code4lib - Fri, 2016-03-18 20:00

Last updated March 18, 2016. Created by Peter Murray on March 18, 2016.

Package: TemaTres Vocabulary Server
Release Date: Monday, March 14, 2016

pinboard: Twitter

planet code4lib - Fri, 2016-03-18 17:28
We named our API-first archives data model after the beloved Trapper Keeper #c4l16 #code4lib

pinboard: RAD’s Code4Lib 2016 presentation is now online!...

planet code4lib - Fri, 2016-03-18 17:28
We named our API-first archives data model after the beloved Trapper Keeper #c4l16 #code4lib

FOSS4Lib Recent Releases: Evergreen - 2.10.0

planet code4lib - Fri, 2016-03-18 15:38
Package: Evergreen
Release Date: Thursday, March 17, 2016

Last updated March 18, 2016. Created by gmcharlt on March 18, 2016.

New features and enhancements of note in Evergreen 2.10.0 include:

Open Knowledge Foundation: International Open Data Day in Addis Ababa, Ethiopia

planet code4lib - Fri, 2016-03-18 15:14

This blog post was written by Solomon Mekonnen, Co-founder, Code4Ethiopia & Local Organizer, Open Knowledge

An open data interest group representing 25 participants from universities, NGOs, CSOs and government ministries attended an open data event on 5th March, 2016, with the theme “Raising Open Data awareness in the grass root community of Ethiopia”. The event was organized by Code4Ethiopia and Open Knowledge Ethiopia, with the support of Open Knowledge International and Addis Ababa University, in connection with Open Data Day, which is a global celebration of openness.

The event was opened by Mr. Mesfin Gezehagn, a University Librarian at the Addis Ababa University (AAU). Mr. Mesfin briefed the participants that Addis Ababa University has been providing training on open research data and open science to postgraduate students and academicians to see more researchers practicing open data sharing (making data free to use, reuse, and redistribute) and open science (making scientific research, data and other results and work flows available to all). He also stated that the University collaborates with open data communities like Open Knowledge Ethiopia and Code4Ethiopia.

Mr. Mesfin also informed the participants that AAU has started drafting a policy to ensure mandatory submission of research data for research that is sponsored by the University, in order to open the data to the public.

Following the opening, three of the Cofounders of Code4Ethiopia (Solomon Mekonnen, Melkamu Beyene and Teshome Alemu), and a Lecturer at the Department of Computer Science of AAU (Desalegn Mequanint) presented discussion areas for participants. The presentations were focused on Code4Ethiopia and Open Knowledge Ethiopia Programmes, raising Open Data awareness in the grass root Community of Ethiopia, open data experience in African countries, and social, cultural & economic factors affecting open data implementation in Ethiopia.

Following the presentations, the workshop was opened for discussion by Daniel Beyene, co-founder of Code4Ethiopia. The participants recommended that advocacy should be done from the top down, starting from the policy makers to the grass root community of Ethiopia, and they also proposed that Code4Ethiopia and Open Knowledge Ethiopia, in collaboration with international partners, should organize a national sensitization Open Data Hackathon to reach various stakeholders.

The workshop also identified challenges in Ethiopia for open data implementation, including lack of awareness, absence of policy level commitment from governments and lack of appropriate data science skills & data literacy. The participants also selected data sets that need priority for the country’s development and that interest the general public, including budget data, expenditure (accounts) data, census, trade information, election data, health and educational data.

The workshop was concluded by thanking our partners Open Knowledge International and Addis Ababa University for their contribution to the success of the event. All of the participants have also been invited to join Code4Ethiopia and the Open Knowledge community. Most of the participants have agreed to join these two communities to build an open data ecosystem in Ethiopia.

State Library of Denmark: CDX musings

planet code4lib - Fri, 2016-03-18 13:16

This is about web archiving, corpus creation and replay of web sites. No fancy bit fiddling here, sorry.

There is currently some debate on CDX, used by the Wayback Engine, Open Wayback and other web archive oriented tools, such as Warcbase. A CDX Server API is being formulated, as is the CDX format itself. Inspired by a post by Kristinn Sigurðsson, here comes another input.

CDX components, diagram by Ulrich Karstoft Have

CDX Server API

There is an ongoing use case-centric discussion of needed features for a CDX API. Compared to that, the CDX Server API – BETA seems a bit random. For example: A feature such as regexp-matching on URLs can be very heavy on the backend and open up for easy denial of service (intentional as well as unintentional). It should only be in the API if it is of high priority. One way of handling all wishes is to define a core API with the bare necessities and add extra functionality as API extensions. What is needed and what is wanted?

The same weighing problem can be seen for the fields required for the server. Kristinn discusses CDX as format and boils the core fields down to canonicalized URL, timestamp, original URL, digest, record type, WARC filename and WARC offset. Everything else is optional. This approach matches the core vs. extensions functionality division.

With optional features and optional fields, a CDX server should have some mechanism for stating what is possible.
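As a rough illustration of the core-versus-extension split, here is a minimal Python sketch. The seven core fields are the ones listed above; the field names, their order on the line, the space-separated layout and the `describe_capabilities` helper are all assumptions made for the example, not part of any agreed CDX specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence

# The seven core fields named in the text. The identifiers and their order
# are illustrative assumptions, not a published standard.
CORE_FIELDS = ("urlkey", "timestamp", "original", "digest",
               "record_type", "filename", "offset")


@dataclass
class CdxEntry:
    urlkey: str        # canonicalized URL
    timestamp: str     # capture time, e.g. 20160318131600
    original: str      # original URL
    digest: str        # payload digest
    record_type: str   # WARC record type
    filename: str      # WARC filename
    offset: int        # byte offset into the WARC file
    extras: Dict[str, str] = field(default_factory=dict)  # optional extensions


def parse_line(line: str, extra_names: Sequence[str] = ()) -> CdxEntry:
    """Parse one space-separated line: core fields first, then optional extras."""
    parts = line.split()
    if len(parts) < len(CORE_FIELDS):
        raise ValueError("line has fewer fields than the core requires")
    core = parts[:len(CORE_FIELDS)]
    extras = dict(zip(extra_names, parts[len(CORE_FIELDS):]))
    return CdxEntry(core[0], core[1], core[2], core[3], core[4], core[5],
                    int(core[6]), extras)


def describe_capabilities(extra_names: Sequence[str] = ()) -> dict:
    """Hypothetical self-description a server could expose to its clients."""
    return {"core_fields": list(CORE_FIELDS),
            "extension_fields": list(extra_names)}
```

A client that only understands the core can ignore `extras` entirely, while a richer client can first ask what extension fields are on offer.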

URL canonicalization and digest

The essential lookup field is the canonicalised URL. Unfortunately that is also the least formalised, which is really bad from an interoperability point of view. When the CDX Server API is formalised, a strict schema for URL canonicalisation is needed.

Likewise, the digest format needs to be fixed, to allow for cross-institution lookups. As the digests do not need to be cryptographically secure, the algorithm chosen (hopefully) does not become obsolete with age.

It would be possible to allow for variations of both canonicalisation and digest, but in that case it would be as extensions rather than core.
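To make the interoperability point concrete, here is a small Python sketch of what a fixed canonicalisation and digest scheme could look like. The specific rules (drop scheme, port and fragment, strip a leading `www.`, sort query parameters) and the choice of Base32-encoded SHA-1 are assumptions picked for illustration; they are not the rule set of any particular tool. Two institutions would only get matching keys if they agreed on every one of these choices.

```python
import base64
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit


def canonicalize(url: str) -> str:
    """Reduce a URL to a lookup key using a few illustrative rules."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()   # scheme and port are dropped
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    # Sort query parameters so that parameter order does not matter.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    key = host + path
    if query:
        key += "?" + query
    return key                               # the fragment is dropped


def surt(canonical: str) -> str:
    """Reverse the host labels so that entries sort by domain hierarchy."""
    host, sep, rest = canonical.partition("/")
    return ",".join(reversed(host.split("."))) + ")" + sep + rest


def digest(payload: bytes) -> str:
    """Base32-encoded SHA-1 of the payload (an assumed, fixed digest format)."""
    return base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")
```

With these rules, `http://www.example.dk/a?b=2&a=1#frag` and `https://EXAMPLE.dk/a?a=1&b=2` map to the same key, which is exactly the kind of behaviour a strict schema would have to pin down.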

CDX (external) format

CDX can be seen as a way of representing a corpus, as discussed at the RESAW workshop in December 2015.

  • From a shared external format perspective, tool-specific requirements such as sorting of entries or order of fields are secondary or even irrelevant.
  • Web archives tend to contain non-trivial amounts of entries, so the size of a CDX matters.

With this in mind, the pure minimal amount of fields would be something like original URL, timestamp and digest. The rest is a matter of expanding the data from the WARC files. On a more practical level, having the WARC filename and WARC offset is probably a good idea.

The thing not to require is the canonicalized URL: It is redundant, as it can be generated directly from the original URL, and it unnecessarily freezes the canonicalization method to the CDX format.

Allowing for optional extra fields after the core is again pragmatic. JSON is not the most compact format when dealing with tables of data, but it has the flexibility of offering entry-specific fields. CDXJ deals with this, although it does not specify the possible keys inside of the JSON blocks, meaning that the full CDX file has to be iterated to get those keys. There is also a problem of simple key:value JSON entries, which can be generically processed, vs. complex nested JSON structures, which require implementation-specific code.
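The key-discovery problem can be sketched in a few lines of Python. The line layout assumed here (a URL key, a timestamp, then a JSON block, separated by single spaces) is a common CDXJ-style arrangement, but treat it and the function names as assumptions for the example.

```python
import json
from typing import Dict, Iterable, Set, Tuple


def parse_cdxj_line(line: str) -> Tuple[str, str, Dict]:
    """Split one line into URL key, timestamp and the decoded JSON block."""
    urlkey, timestamp, blob = line.rstrip("\n").split(" ", 2)
    return urlkey, timestamp, json.loads(blob)


def collect_keys(lines: Iterable[str]) -> Set[str]:
    """Every line must be read: nothing declares the possible keys up front."""
    keys: Set[str] = set()
    for line in lines:
        if line.strip():
            keys.update(parse_cdxj_line(line)[2])
    return keys


def is_flat(fields: Dict) -> bool:
    """True if the block is simple key:value and can be processed generically."""
    return not any(isinstance(v, (dict, list)) for v in fields.values())
```

A header line that declared the keys in use would let a consumer skip the full pass in `collect_keys`, and `is_flat` marks the boundary between entries any tool can handle and entries that need implementation-specific code.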

CDX (internal) format

Having a canonicalized and SURTed URL as the primary field and having the full CDX file sorted is an optimization towards specific tools. Kris touches lightly on the problem with this by suggesting that the record type might be better positioned as the secondary field (as opposed to timestamp) in the CDX format.

It follows easily that the optimal order or even representation of fields depends on tools as well as use case. But how the tools handle CDX data internally really does not matter as long as they expose the CDX Server API correctly and allow for export in an external CDX format. The external format should not be dictated by internal use!

CDX (internal vs. external) format

CDX is not a real format yet, but tools do exist and they expect some loosely defined common representation to work together. As such it is worth considering if some of the traits of current CDXes should spill over to an external format. Primarily that would be the loosely defined canonicalized URL as first field and the sorted nature of the files. In practice that would mean a near doubling of file size due to the redundant URL representation.

CDX implementation

The sorted nature of current CDX files has two important implications: Calculating the intersections between two files is easy and lookup on primary key (canonicalised URL) can algorithmically be done in O(log n) time using binary search.
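Both properties are easy to show on an in-memory list of lines. The Python sketch below assumes each line starts with the URL key followed by a space, and that the lines are sorted by plain string order with no characters below the space in the key; the function names are made up for the example.

```python
from bisect import bisect_left
from typing import List, Sequence


def lookup(sorted_lines: Sequence[str], urlkey: str) -> List[str]:
    """Binary search for the first line with this key, then scan the run."""
    prefix = urlkey + " "
    i = bisect_left(sorted_lines, prefix)   # O(log n) comparisons
    hits = []
    while i < len(sorted_lines) and sorted_lines[i].startswith(prefix):
        hits.append(sorted_lines[i])
        i += 1
    return hits


def intersect_keys(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Merge-style walk over two sorted files, collecting shared URL keys."""
    i = j = 0
    common: List[str] = []
    while i < len(a) and j < len(b):
        key_a = a[i].split(" ", 1)[0]
        key_b = b[j].split(" ", 1)[0]
        if key_a == key_b:
            if not common or common[-1] != key_a:
                common.append(key_a)
            i += 1
            j += 1
        elif key_a < key_b:
            i += 1
        else:
            j += 1
    return common
```

The intersection needs only a single pass over each file, and the lookup touches a logarithmic number of lines. On disk, however, each of those probes can land in a different block of the file.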

In reality, simple binary search works poorly on large datasets, due to the lack of locality. This is worsened by slow storage types such as spinning drives and/or networked storage. There are a lot of tricks to remedy this, from building indexes to re-ordering the entries. The shared theme is that the current non-formalised CDX files are not directly usable: They require post-processing and extensions by the implementation.

The takeaway is that the value of a binary search optimized external CDX format is diminishing as scale goes up and specialized implementations are created. Wholly different lookup technologies, such as general purpose databases or search engines, have zero need for the format-oriented optimizations.



Library Tech Talk (U of Michigan): Developing Pathways to Full-text Resources with User Journeys

planet code4lib - Fri, 2016-03-18 00:00

How does a library present the right information to patrons at the right time and place in the face of changing services, new technologies and vendors? User Journeys provide a way to create and improve what information, services and tools will help users on their path to the resources and services they seek. Find out what insights our team gained from developing User Journeys and we'll tell you about tools, resources and templates you can use to make your own!

Evergreen ILS: Evergreen 2.10.0 released

planet code4lib - Thu, 2016-03-17 22:57

Thanks to the efforts of many contributors, the Evergreen community is pleased to announce the release of version 2.10.0 of the Evergreen open source integrated library system. Please visit the download page to get it!

New features and enhancements of note in Evergreen 2.10.0 include:

  • Improved password management and authentication. Evergreen user passwords are now stored with additional layers of encryption and may only be accessed directly by the database, not the application layer.
  • To improve privacy and security, Evergreen now stores less data about credit card transactions.
  • A new library setting has been added which enables a library to prevent their patrons from being opted in at other libraries.
  • To improve patron privacy, patron checkout history is now stored in a separate, dedicated database table instead of being derived from the main circulation data.
  • Patrons can now delete titles that they do not wish to appear in their checkout history.
  • A new action/trigger event definition (“New User Created Welcome Notice”) has been added that will allow you to send a notice after a new patron has been created.
  • The web staff client now includes a patron editor/registration form.
  • Funds are now marked as paid when the invoice is marked as closed rather than when the invoice is created.
  • A new “paid” label appears along the bottom of each line item in the PO display when every non-canceled copy on the line item has been invoiced.
  • The MARC stream importer is now able to import authority records as well as bibliographic records.
  • When inspecting a queue in MARC Batch Import/Export, there is now a link to download to a MARC file any records in the queue that were not imported into the catalog.
  • Coded value maps have been added for a variety of fixed fields.
  • MARC batch import can now assign monograph part labels when adding or overlaying copies.
  • The stock indexing definitions now include a search and facet index on the form/genre field (tag 655).
  • The web staff client now includes preview functionality for cataloging, including MARC record editing, authority maintenance, and volume/copy management.
  • HTML reports can now be sorted by clicking on the header for a given column.
  • Evergreen’s unAPI support now includes access to many more record types.

With the release of 2.10.0, bugfixes for the web staff client will now be considered for backporting to maintenance releases in the 2.10.x series, particularly in the areas of circulation and patron management.  Libraries are encouraged to try out the web staff client and file bug reports for it.

Support for PostgreSQL 9.1 is deprecated as of the release of Evergreen 2.10. Users are recommended to install Evergreen on PostgreSQL 9.2 or later. In the next major release following 2.10, Evergreen will no longer officially support PostgreSQL 9.1.

For more information about what’s in the release, check out the release notes.

Jenny Rose Halperin: Media for Everyone?

planet code4lib - Thu, 2016-03-17 21:42
Media for Everyone? User empowerment and community in the age of subscription streaming media

The Netflix app is displayed alongside other streaming media services. (Photo credit: Matthew Keys / Flickr Creative Commons)

Fragments of an Information Architecture

In 2002, Tim O’Reilly wrote the essay “Piracy is Progressive Taxation and other thoughts on the evolution of online distribution,” which makes several salient points that remain relevant as unlimited, native, streaming media continues to take the place of the containerized media product. In the essay, he predicts the rise of streaming media as well as the intermediary publisher on the Web that serves its purpose as content exploder. In an attempt to advocate for flexible licensing in the age of subscription streaming media, I’d like to begin by discussing two points in particular from that essay: “Obscurity is a far greater threat to authors and creative artists than piracy” and “’Free’ is eventually replaced by a higher-quality paid service.”

As content becomes more fragmented and decontainerized across devices and platforms (the “Internet of Things”), I have faith that expert domain knowledge will prevail in the form of vetted, quality materials, and streaming services provide that curation layer for many users. Subscription services could provide greater visibility to artists by providing unlimited access and new audiences. However, the current licensing regulations surrounding content on streaming subscription services privilege the corporation rather than the creator, further exercising the hegemony of the media market. The first part of this essay will discuss the role of serendipity and discovery in streaming services and how they contribute to user engagement. Next, I will explore how Creative Commons and flexible licensing in the age of unlimited subscription media can return power to the creator by supporting communities of practice around content creation in subscription streaming services.

Tim O’Reilly’s second assertion that “’Free’ is eventually replaced by a higher-quality paid service” is best understood through the lens of information architecture. In their seminal work Information Architecture for the World Wide Web, Morville, Arango, and Rosenfeld write about how most software solutions are designed to solve specific problems, and as they outgrow their shells they become ecosystems, thereby losing clarity and simplicity. While the physical object’s data is constrained within its shell, the digital object provides a new set of metadata based on the user’s usage patterns and interests. As media is spread out among a variety of types, devices, and spaces, platforms cease to define the types of content that people consume, with native apps replacing exportable, translatable solutions like the MP3 or PDF. Paid services utilize the data from these ecosystems and create more meaningful consumption patterns within a diverse media landscape.

What content needs is coherency, that ineffable quality that helps us create taxonomy and meaning across platforms. Streaming services provide a comfortable architecture so users don’t have to rely on the shattered, seemingly limitless, advertising-laden media ecosystem of the Internet. Unlimited streaming services provide the coherency that users seek in content, and their focus should be on discoverability and engagement.

If you try sometimes, you might get what you need: serendipity and discoverability in streaming media

Not all streaming services operate within the same content model, which provides an interesting lens to explore the roles of a variety of products. Delivering the “sweet spot” of content to users is an unfulfillable ideal for most providers, and slogging through a massive catalog of materials can quickly cause information overload.

When most content is licensed and available outside of the service, discoverability and user empowerment should be the primary aim of the streaming media provider.

While Spotify charges $9.99 per month for more music than a person can consume in their entire lifetime, the quality of the music is not often listed as a primary reason why users engage with the product. In fact, most of the music on Spotify I can describe as “not to my taste,” and yet I pay every month for premium access to the entire library. At Safari Books Online, we focused on content quality in addition to scope, with “connection to expert knowledge” and subject matter coherency being a primary reason why subscribers paid premium prices rather than relying on StackOverflow or other free services.

Spotify’s marketing slogan, “Music for everyone” focuses on its content abundance, freemium model, and ease of use rather than its quality. The marketing site does not mention the size of Spotify’s library, but the implications are clear: it’s huge.

These observations raise a few questions:

  1. Would I still pay $9.99 per month for a similar streaming service that only provided music in the genres I enjoy like jazz, minimal techno, or folk by women in the 1970s with long hair and a bone to pick with Jackson Browne?
  2. What would I pay to discover more music in these genres? What about new music created by lesser-known artists?
  3. What is it about Spotify that brought me back to the service after trying Apple Music and Rdio? What would bring me back to Safari if I tried another streaming media service like Lynda or Pluralsight?
  4. How much will users pay for what is, essentially, an inflexible native discoverability platform that exists to allow them access to other materials that are often freely available on the Web in other, more exportable formats?

Serendipity and discoverability were the two driving factors in my decision to stay with Spotify as a streaming music service. Spotify allows for almost infinite taste flexibility and makes discoverability easy through playlists and simple search. In addition, a social feed allows me to follow my friends and discover new music. Spotify bases its experience on my taste preferences and social network, and I can easily skip content that is irrelevant or not to my liking.

In contrast, at Safari, while almost every user lauded the diversity of content, most found the amount of information overwhelming and discoverability problematic. As a counter-example, the O’Reilly Learning Paths product has been immensely popular on Safari, even though the “paths” consist of recycled content from O’Reilly Media repackaged to improve discoverability. While the self-service discovery model worked for many users, for most of our users, guidance through the library in the form of “paths” provides a serendipitous adventure through content that keeps them wanting more.

Music providers like Tidal have experimented with exclusive content, but content wants to be free on the Internet, and streaming services should focus on user need and findability, not exclusivity. Just because a Beyonce single drops first on Tidal, it doesn’t mean I can’t torrent it soon after. In Spotify, the “Discover Weekly” playlists as well as the ease of use of my own user-generated playlists serve the purpose of “exclusive content.” By providing me the correct dose of relevant content through playlists and social connection, Spotify delivers a service that I cannot find anywhere else, and these discoverability features are my primary product incentive. Spotify’s curated playlists, even algorithmically calculated ones, feel home-spun, personal, and unique, which is close to product magic.

There seems to be an exception to this rule in the world of streaming television, where users generally want to be taken directly to the most popular exclusive content. I would argue that the Netflix ecosystem is much smaller than that of a streaming business or technical library or music service. This is why Netflix can provide a relatively limited list of rotating movies and focus on its exclusive content, while services like Spotify and Safari consistently grow their libraries to delight their users with the extensive amount of content available.

In fact, most people subscribe to Netflix for its exclusive content, and streaming television providers that lag behind (like Hulu), often provide access to content that is otherwise easily discoverable other places on the Web. Why would I watch Broad City with commercials on Hulu one week after it airs when I can just go to the Free TV Project and watch it an hour later for free? There is no higher quality paid service than free streaming in this case, and until Hulu strikes the balance between payment, advertising, licensed content, and exclusive content, they will continue to lag behind Netflix.

As access to licensed content becomes centralized and ubiquitous among a handful of streaming providers, it should be the role of the streaming service to provide a new paradigm that supports the role of artists in the 21st Century and challenges the dominant power structures within the licensed media market.

Shake it off, Taylor: the dream of Creative Commons and the power of creators

As a constantly evolving set of standards, Creative Commons is one way that streaming services can focus on a discoverability and curation layer that provides the maximum benefit to both users and creators. If we allow subscription media to work with artists rather than industry, we can increase the power of the content creator and loosen stringent, outdated copyright regulations. I recognize that much of this section is a simplification of the complex issue of copyright, but I wish to create a strawman that brings to light what Lawrence Lessig calls “a culture in which creators get to create only with the permission of the powerful, or of creators from the past.” The unique positioning of streaming, licensed content is no longer an issue that free culture can ignore, and creating communities of practice around licensing could ease some of the friction between artists and subscription services.

When Taylor Swift withheld her album from Apple Music because the company would not pay artists for its temporary three-month trial period, it sent a message to streaming services that withholding pay from artists is not acceptable. I believe that Swift made the correct choice to take a stand against Apple for not paying artists, but I want to throw a wrench into her logic.

Copies of 1989 have probably been freely available on the Internet since before its “official” release. (The New Yorker ran an excellent piece on leaked albums last year.) By not providing her album to Apple Music but also not freely licensing it, Swift chose to operate under the old rules that govern content, where free is the exception, not the norm.

Creative Commons provides the framework and socialization that could provide streaming services relevancy and artists the new audiences they seek. The product that users buy via streaming services is not necessarily music or books (they can buy those anywhere), it is the ability to consume it in a manner that is organized, easy, and coherent across platforms: an increased Information Architecture. The flexible licensing of Creative Commons could at least begin the discussion to cut out the middle man between streaming services, licensing, and artists, allowing these services to act more like Soundcloud, Wattpad, or Bandcamp, which provide audience and voice to lesser-known artists. These services do what streaming services have so far failed to do because of their licensing rules: they create social communities around media based on user voice and community connection.

The outlook for both the traditional publishing and music industries is similarly grim, and to ignore the power of the content creator is to lapse into obscurity. While many self-publishing platforms present Creative Commons licensing as a matter of course and pride, subscription streaming services usually present all content as equally, stringently licensed. Spotify’s largest operating costs are licensing costs and most of the revenue in these transactions goes to the licensor, not the artist. To rethink a model that puts trust and power in the creator could provide a new paradigm under which creators and streaming services thrive. This could take shape in a few ways:

  • Content could be licensed directly from the creator and promoted by the streaming service.
  • Content could be exported outside of the native app, allowing users to distribute and share content freely according to the wishes of its creator.
  • Content could be directly uploaded to the streaming service, vetted or edited by the service, and signal boosted according to the editorial vision of the streaming content provider.

When Safari moved from books as exportable PDFs to a native environment, some users threatened to leave the service, viewing the native app as a loss of functionality. This exodus reminds me that while books break free of their containers, the coherence of the ecosystem maintains that users want their content in a variety of contexts, usable in a way that suits them. Proprietary native apps do not provide that kind of flexibility. By relying on native apps as a sole online/offline delivery mechanism, streaming services ultimately disenfranchise users who rely on a variety of IoT devices to consume media. Creative Commons could provide a more ethical licensing layer to rebalance the power differential that streaming services continue to uphold.

The right to read, listen, and watch: streaming, freedom, and pragmatism

Several years ago, I would probably have scoffed at this essay, wondering why I even considered streaming services as a viable alternative to going to the library or searching for content through torrents or music blogs, but I am fundamentally a pragmatist and seek to work on systems that present the most exciting vision for creators. 40 million Americans have a Netflix account and Spotify has over 10 million daily active users. The data they collect from users is crucial to the media industry’s future.

To ignore or deny the rise of streaming subscription services as a content delivery mechanism has already damaged the free culture movement. While working with subscription services feels antithetical to its goals, content has moved closer and closer toward Stallman’s dystopian vision from 1997 and we need to continue to create viable alternatives or else continue to put the power in the hands of the few rather than the many.

Licensed streaming services follow the through line of unlimited content on the Web, and yet most users want even more content, and more targeted content for their specific needs. The archetype of the streaming library is currently consumption, with social sharing as a limited exception. Services like Twitter’s Vine and Google’s YouTube successfully create communities based on creation rather than consumption and yet they are constantly under threat, with large advertisers still taking the lion’s share of profits.

I envision an ecosystem of community-centered content creation services that are consistently in service to their users, and I think that streaming services can take the lead by considering licensing options that benefit artists rather than corporations.

The Internet turns us all into content creators, and rather than expanding ecosystems into exclusivity, it would be heartening to see a streaming app that is based on community discoverability and the “intertwingling” of different kinds of content, including user-generated content. The subscription streaming service can be considered as industry pushback in the age of user-generated content, yet it’s proven to be immensely popular. For this reason, conversations about licensing, user data, and artistic community should be a primary focus within free culture and media.

The final lesson of Tim O’Reilly’s essay is: “There’s more than one way to do it,” and I will echo this sentiment as the crux of my argument. As he writes, “’Give the wookie what he wants!’… Give it to [her] in as many ways as you can find, at a fair price, and let [her] choose which works best for [her].” By amplifying user voice in curation and discoverability as well as providing a more fair, free, and open ecosystem for artists, subscription services will more successfully serve their users and creators in ways that make the artistic landscape more humane, more diverse, and yes, more remixable.

DPLA: Announcing our fourth class of Community Reps

planet code4lib - Thu, 2016-03-17 17:00

We are extremely excited to introduce and welcome our fourth class of DPLA Community Reps: volunteers who engage their local communities by leading DPLA outreach activities. We received a great response to our fourth call for applicants, and we’re pleased to now add another fantastic group of Community Reps to our outstanding and dedicated corps of volunteers from the first three classes.

Our fourth class continues our success at bringing together volunteers from all over the US representing diverse fields and backgrounds. Our newest reps work in K-12 schools, public libraries, state libraries, municipal archives, public history and museums, technology, genealogy, education technology, and many areas of higher education. This round we are excited to have a very strong cohort of educators as well as representation from diverse disciplines including psychology, social work, art history, and studio art.

Our newest reps have already shared some of their great ideas for connecting new communities with DPLA and we’re eager to support this new class’ creative outreach and engagement work.  We thank them for helping us grow the DPLA community! For more detailed information about our Reps and their plans, including the members of the fourth class, please visit our Meet the Reps page.

The next call for our fifth class of Reps will take place early next year (January 2017).  To learn more about this program and follow our future calls for applicants, check out our Community Reps page.

David Rosenthal: Dr. Pangloss loves technology roadmaps

planet code4lib - Thu, 2016-03-17 15:00
It's nearly three years since we last saw the renowned Dr. Pangloss chuckling with glee at the storage industry's roadmaps. But last week he was browsing Slashdot and found something much to his taste. Below the fold, an explanation of what the good Doctor enjoyed so much.

The Slashdot entry that caught the Doctor's eye was this:
Several key technologies are coming to market in the next three years that will ensure data storage will not only keep up with but exceed demand. Heat-assisted magnetic recording and bit-patterned media promise to increase hard drive capacity initially by 40% and later by 10-fold, or as Seagate's marketing proclaims: 20TB hard drives by 2020. At the same time, resistive RAM technologies, such as Intel/Micron's 3D XPoint, promise storage-class memory that's 1,000 times faster and more resilient than today's NAND flash, but it will be expensive — at first. Meanwhile, NAND flash makers have created roadmaps for 3D NAND technology that will grow to more than 100 layers in the next two to three generations, increasing performance and capacity while ultimately lowering costs to that of hard drives.
"Very soon flash will be cheaper than rotating media," said Siva Sivaram, executive vice president of memory at SanDisk.

ASTC roadmap (2015)

The article by Lucas Mearian that sparked the post has the wonderfully Panglossian title These technologies will blow the lid off data storage, and it has some quotes that the Doctor really loves, such as the one above from Siva Sivaram, and the ASTC technology roadmap showing a 30% Kryder rate from next year with HAMR and BPM shipping in 2021.

But the good Doctor tends not to notice that the article is actually careful to balance things like "HAMR technology will eventually allow Seagate to achieve a linear bit density of around 10 trillion (10Tbits) per square inch" with "Seagate has already demonstrated HAMR HDDs with 1.4Tbits per square inch" (my emphasis). If you pay attention to these caveats it is actually a good survey of the technologies still out on the roadmap.
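The gap between the demonstrated and the eventual HAMR density can be put in rough time terms. The sketch below is only an illustration, under the assumption that the roadmap's 30% Kryder rate is read as compound annual growth in areal density and that it held steadily from the demonstrated 1.4Tbits per square inch; the function name and variables are hypothetical, not taken from the article.

```python
import math


def years_to_reach(start, target, annual_rate):
    """Years of compound growth at annual_rate needed to go from start to target."""
    return math.log(target / start) / math.log(1.0 + annual_rate)


# Figures quoted in the article; the steady 30% rate is an assumption.
demonstrated_tbit_per_sq_in = 1.4
eventual_tbit_per_sq_in = 10.0
kryder_rate = 0.30

years = years_to_reach(demonstrated_tbit_per_sq_in,
                       eventual_tbit_per_sq_in,
                       kryder_rate)
print(f"Years at a steady 30% rate: {years:.1f}")
```

Under those assumptions the eventual figure is several years of uninterrupted roadmap-rate progress beyond what has been demonstrated, which is why the caveats matter.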

But curmudgeons like me remember that back in 2013 the Doctor was rubbing his hands over statements like:
Seagate is projecting HAMR drives in 2014 and WD in 2016.

In 2016 we hear that:
Seagate plans to begin shipping HAMR HDDs next year.

So in three years HAMR has gone from next year to "next year". Not to mention the graph I keep pointing to from 2008 showing HAMR taking over in 2009 and BPM taking over in 2013. So actually HAMR has taken 8 years to go from next year to next year. And BPM has taken 8 years to go from 5 years out to 5 years out.

Why is this? As the technologies get closer and closer to the physical limits, the difficulty and cost of moving from "demonstration" to "shipping" increase. For example, let's suppose Seagate could demonstrate HAMR in 2013 and will ship it in 2017. BPM is even harder than HAMR, so if it is going to ship in 2021 it should be demonstrable this year. Has anyone heard that it will be?

The article also discusses 3D NAND flash, which also featured in Robert Fontana's wonderful presentation to the Library of Congress Storage Architecture workshop. From his slides I extracted the cost ratio between flash and hard disk for the period 2008-2014, showing that it was converging very slowly. Eric Brewer made the same point in his FAST 2016 keynote. Flash is a better medium than hard disk, so even if the manufacturing cost per byte were the same, the selling price for flash would be higher. But, as the article points out:
factories to build 3D NAND are vastly more expensive than plants that produce planar NAND or HDDs -- a single plant can cost $10 billion

so no-one is going to make the huge investment needed for 3D NAND to displace hard disks from the cloud storage market because it wouldn't generate a return.

The article also covers "Storage Class Memories" (SCM) such as Intel/Micron's Xpoint, mentioning price:
even if Intel's Xpoint ReRAM technology enters the consumer PC marketplace this year, its use will be limited to the highest-end products due to cost.

Actually, Intel isn't positioning Xpoint as a consumer storage technology but, as shown in the graph, as initially being deployed as an ultra-fast but non-volatile layer between DRAM and flash.

As I commented on Fontana's presentation:
The roadmaps for the post-flash solid state technologies such as 3D Xpoint are necessarily speculative, since they are still some way from shipping in volume. But by analogy with flash we can see the broad outlines. They are a better technology than flash, 1000 times faster than NAND, 1000 times the endurance, and 100 times denser. So even if the manufacturing cost were the same, they would command a price premium. The manufacturing cost will initially be much higher because of low volumes, and will take time to ramp down.

So, despite the good Doctor's enthusiasm, revolutionary change in the storage landscape is unlikely. We are unlikely to see ASTC's 30% Kryder rate, 3D NAND will not become cheaper for bulk storage than hard disk, and SCM will not have a significant effect on the cost of storage in the foreseeable future.

Islandora: The State of the CLAW

planet code4lib - Thu, 2016-03-17 14:25

The state of the work is that it is in progress. Like Fedora 4, CLAW is a complete rewrite of the entire Islandora stack. It is a collaborative community effort, and needs the resources of the community. An update on the project was included in the most recent Islandora Community Newsletter. You can check that out here.

  • We have weekly CLAW Calls that you are more than welcome to join us on, and add items to the agenda.
  • We send updates to the list each week after each call, and you can view them all here.
  • We have monthly sprints which are held during the last two weeks of the month. If you (or your colleagues) are in a position to join, you are more than welcome to join us there too.
  • We also have weekly CLAW lessons which are led by Diego Pino Navarro. You can find more information on them here.
  • Data model, and Hydra interoperability? We're working on implementing the Portland Common Data Model (PCDM). More is available on that here and here.

If you want to see CLAW completed faster, you can help! 

  • Contribute developer time. Either your own, or some developer time from your institution. Not comfortable with the stack? That's what CLAW lessons are for!
  • Contribute funds: The Islandora Foundation is very close to having the necessary membership funding to hire a Technical Lead, who could devote a lot more time to coordinating the development of CLAW than our current volunteer team has available. Joining the Islandora Foundation has many benefits, but adding a Technical Lead to the project will be a big one in the CLAW corner.
  • Contribute opinions: We need to know how you want CLAW to work. You are welcome to attend the weekly CLAW Call detailed above. Please also watch the listserv for questions about features and use cases.

