Planet Code4Lib - http://planet.code4lib.org

HangingTogether: Slam bam WAM: Wrangling best practices for web archiving metadata

Wed, 2016-08-24 01:14

The OCLC Research Library Partnership Web Archiving Metadata Working Group (WAM, of course) was launched last January and has been working hard–really hard–ever since. Twenty-five members from Partner libraries and archives have dug in to address the challenge of devising best practices for describing websites–which are, it turns out, very odd critters compared to other types of material for which descriptive standards and guidelines already exist. In addition, user needs and behaviors are quite different from those we’re familiar with.

Our plan at the outset: do an extensive literature review on both user needs and existing metadata practices in the web context, study relevant descriptive standards and institution-specific web archiving metadata guidelines, engage the community along the way to confirm the need for this work and obtain feedback, and, ultimately, issue two reports: the first on user needs and behaviors specific to archived web content, the second outlining best practices for metadata. The heart of the latter will be a set of recommended data elements accompanied by definitions and the types of content that each should contain.

At this juncture we’ve drawn several general conclusions:

  • Descriptive standards don’t address the unique characteristics of websites.
  • Local metadata guidelines have little in common with each other.
  • It’ll therefore be challenging to sort it all out and arrive at recommended best practices that will serve the needs of users of archived websites.

We’ve reviewed nine sets of institution-specific guidelines. The table below shows the most common data elements, some of which are defined very differently from one institution to another. Only three appear in all nine guidelines: creator/contributor, title, and description.

Collection name/title    Language
Creator/contributor      Publisher
Date of capture          Rights/access conditions
Date of content          Subject
Description              Title
Genre                    URL
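
To make those elements concrete, a record for an archived website built from them might look something like the purely hypothetical sketch below (expressed here as a Python dict; every value is invented for illustration and is not drawn from any of the nine guidelines):

# Purely illustrative: a hypothetical descriptive record for one archived
# website, using the common data elements from the table above.
archived_website_record = {
    "Collection name/title": "Example City Government Web Archive",
    "Creator/contributor": "Example City Council",
    "Date of capture": "2016-08-24",
    "Date of content": "2014-2016",
    "Description": "Official website of the Example City Council, including agendas and minutes.",
    "Genre": "Websites",
    "Language": "English",
    "Publisher": "Example City Council",
    "Rights/access conditions": "Open access",
    "Subject": ["Municipal government"],
    "Title": "Website of the Example City Council",
    "URL": "http://example.gov/ (seed URL)",
}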

Our basic questions: Which types of content are the most important to include in metadata records describing websites? And which generic data elements should be designated for each of these concepts?

Here are some of the specific issues we’ve come across:

  • Website creator/owner: Is this the publisher? Creator? Subject? All three?
  • Publisher: Does a website have a publisher? If so, is it the harvesting institution or the creator/owner of the live site?
  • Title: Should it be transcribed verbatim from the head of the home page? Or edited to clarify the nature/scope of the site? Should acronyms be spelled out? Should the title begin with, e.g., “Website of the …”?
  • Dates: Beginning/end of the site’s existence? Date of capture by a repository? Content? Copyright?
  • Extent: How should this be expressed? “1 online resource”? “6.25 Gb”? “approximately 300 websites”?
  • Host institution: Is the institution that harvests and hosts the site the repository? Creator? Publisher? Selector?
  • Provenance: In the web context, does provenance refer to the site owner? The repository that harvests and hosts the site? Ways in which the site has evolved?
  • Appraisal: Does this mean the reason why the site warrants being archived? The collection of a set of sites as named by the harvesting institution? The scope of the parts of the site that were harvested?
  • Format: Is it important to be clear that the resource is a website? If so, how best to do this?
  • URL: Which URLs should be linked to? Seed? Access? Landing page?
  • MARC21 record type: When coded in the MARC 21 format, should a website be considered a continuing resource? Integrating resource? Electronic resource? Textual publication? Mixed material? Manuscript?

We’re getting fairly close to completing our literature review and guidelines analysis, at which point we’ll turn to determining the scope and substance of the best practices report. In addition to defining a set of data elements, it’ll be important to set the problem in context and explain how our analysis has led to the conclusions we draw.

So stay tuned! We’ll be sending out a draft for community review and are hoping to publish both reports within the next six months. In the meantime, please send your own local guidelines, as well as pointers to a few sample records, to me at dooleyj@oclc.org. Help us make sure we get it right!

About Jackie Dooley

Jackie Dooley leads OCLC Research projects to inform and improve archives and special collections practice.


DuraSpace News: VIVO Updates for August 21–Conference Wrap-Up, Improved Documentation

Wed, 2016-08-24 00:00

From Mike Conlon, VIVO project director

Equinox Software: Evergreen 2009: Not Just Code

Tue, 2016-08-23 19:13

This is the fourth in our series of posts leading up to Evergreen’s Tenth birthday.  

I first became aware of Evergreen in 2007 when I saw a posting on a library technology listserv.  As an open source advocate and a librarian, I began following its progress. Skip forward to a cold morning in January 2009 and I was letting IT managers and library directors from around the state of South Carolina into a meeting room.  I was the IT manager at the Florence County Library and two months previously we, as a library, had decided to move to Evergreen.  We had written a long term technology plan and a critical part was our Integrated Library System.  Aside from Georgia, we saw Evergreen being adopted in Michigan and Indiana.  I knew that in time Evergreen would match and surpass our other options.

We also knew that an open source community was going to require changing our perspective of what our relationship to the ILS looked like.  The old proprietary vendor had legal control over aspects of the community and there were limits to what we could share among ourselves as customers.  Libraries had to strike special deals with strict non-disclosure agreements to gain access to source code and the insight to how the ILS worked behind the user interface.

To say that this was going to be different would be an understatement.  The source code was not only not confidential but openly published.  People developed reports and freely published them on community wikis while articles appeared in journals and on personal blogs.  The lack of a corporate gatekeeper was both invigorating and a little overwhelming.  Bringing in a vendor to run an ILS as a service made sense to us but could we convince others to join us?

We asked if other libraries would be interested in an Evergreen consortium.  The answer was yes.  There were a lot of concerns but the experts we called in, Equinox Software, seemed the perfect choice.  No one knew Evergreen as a team as well as they did and they had worked with small libraries and big consortiums.  Partnering with Equinox allowed us to start the migration process quickly despite very little in-house knowledge of Evergreen across our libraries.  And without a proprietary gatekeeper, other libraries in the consortium could dig into the deeper technologies to the degree they were comfortable doing so.  My library was definitely one of the others that wanted more.

We knew that the user community was important.  Even with its limitations, the user group of our previous ILS had been valuable.  2009 was the year I went to the inaugural Evergreen Conference.  2009 was the year I became active on the listservs, mostly watching but answering questions where I could.  2009 was the year I first volunteered to help out with community activities.  2009 was the year I first gave feedback on new features and bugs.  2009 was the year I, as a user, became a part of the community and saw an impact from it.  And, frankly, it was kind of easy.  I had an advantage, being both a librarian and having a technical background, but as I met others as new to the community as I was, I saw them doing the same thing.  Where they became involved varied based on their interests and skills, but everyone who wanted to get involved found a place.  I even recognized a few from the user groups of my old ILS.  Before, these people had been names and faces I vaguely recognized from meetings at ALA and listservs.  They had been users of the same thing I used.  Now, in the Evergreen community, they were fellows and peers.

The open development process meant that I got a chance to provide feedback on features being developed that we weren’t paying for.  I had participated in feedback about features for proprietary ILSes.  It always felt like throwing pennies in a wishing well and crossing my fingers.  I didn’t work at libraries large enough to drive the process of development so we had to hope that the really big customers wanted the same things we did.  Here, the process wasn’t just open in name but discussion about requirements and needs was being done in public forums.  Input was not just allowed but encouraged.  It was clearly a matter of pride for the developers to know that their work was as widely useful as possible.  I could follow the process and choose to participate if it was a feature I was interested in.  And behind each of these things was a person, someone I got to know on a listserv or a conference.  

In December of 2009 our third wave of libraries went live.  Things had calmed down from the hectic early days of migrations.  It had been almost a year now since that early meeting when we went from “we want to” to “we are doing this.”  I remember having time to spend looking at bugs because we had the Christmas slowdown common to public libraries when a developer at another consortium sent me an email.  I commented on a bug that wasn’t a high community priority but it, well, bugged me.  I had helped this developer with testing some patches both to help him out and to give myself more patching experience.  He carved out time and fixed that bug for me.  The truth is that any human endeavor involves an economy of personalities.  But in the software world of meritocracy, open source projects are often more about code than people.  They are often tightly focused projects that do specific things.  An ILS isn’t tightly focused.  It touches on a vast swath of a library’s operations.  It took me a while to realize what now seems obvious in hindsight, but Evergreen isn’t tightly focused and it’s not about code.  Code is critical to the project as it is the means to an end but Evergreen is about people.  I learned a lot in 2009 but things have changed.  I’ve changed jobs and code has changed but the fact that Evergreen is about people hasn’t changed.

–Rogan Hamby, Project and Data Analyst

SearchHub: Solr Troubleshooting: Treemap Approach

Tue, 2016-08-23 17:58

As we countdown to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting a talk from the newest Solr committer, Alexandre Rafalovitch. Alexandre will also be presenting at Lucene/Solr Revolution 2016.

Solr is too big of a product to troubleshoot as if it were a black box or by random tests. Fortunately, there is a way to use Solr’s API, logs, specific queries, and even source code to iteratively narrow a problem definition to something concrete and fixable. Solr already has most of the tools for good troubleshooting, but they are not positioned or documented as such. Additionally, there are various operating system tools that can be used for troubleshooting Solr. This talk provides viewers with the mental model and practical tools to become better troubleshooters with their own Solr projects.

Alexandre is a full-stack IT specialist with more than 20 years of industry and non-profit experience, including in Java, C# and HTML/CSS/JavaScript. He develops projects on Windows, Mac and Linux. His current focus is a consultancy specializing in popularizing Apache Solr. Alex has written one book about Solr already (Apache Solr for Indexing Data How-to). He has presented at Lucene/Solr Revolution 2014 and 2015, as well as multiple times at JavaOne and various smaller venues. Alexandre became an Apache Lucene/Solr committer in August 2016.

Solr Troubleshooting – Treemap Approach: Presented by Alexandre Rafalovitch, United Nations from Lucidworks

Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Solr Troubleshooting: Treemap Approach appeared first on Lucidworks.com.

Library of Congress: The Signal: Congress.gov Nominated for Award

Tue, 2016-08-23 17:24

Poster of the legislative process. Congress.gov

FedScoop, the Washington DC government tech media company, announced that Congress.gov is one of their nominees for the 2016 FedScoop 50 awards.

Features on Congress.gov (which In Custodia Legis has been posting about throughout its development) include:

  • Ability to narrow and refine search results
  • Ability to simultaneously search all content across all available years
  • Bill summaries and bill status from the 93rd Congress through the present
  • Bill text from the 103rd Congress through the present
  • Congressional Record
  • Committee landing pages
  • Comprehensive searching across bill text
  • Congressional Record index
  • Congressional reports
  • Easier identification of current bill status
  • Effective display on mobile devices
  • Executive communications
  • House and Senate calendars
  • Links to video of the House and Senate floor
  • Members’ legislative history and biographical profiles
  • Nominations
  • Persistent URLs
  • Top searched bills
  • Treaties

The FedScoop website states, “Congress.gov is the official website for U.S. federal legislative information. The site provides free access to accurate, timely, and complete legislative information. The Library of Congress manages Congress.gov and ingests data from the Office of the Clerk of the U.S. House of Representatives, the Office of the Secretary of the Senate, the Government Publishing Office, and the Congressional Budget Office. Congress.gov is fully responsive and intuitive. The success of Congress.gov has enabled the Library of Congress to retire legacy systems, better serve the public, members of Congress and congressional staff, and to work more effectively with data partners.”

Vote for your favorite Tech Program of the Year.

LibUX: Blueprint for Trello

Tue, 2016-08-23 15:21

If you think of a journey map as an aerial view, the top-down plot of your user’s tour through a service — imagine each step involved in registering for a new library card — then the service blueprint is its cross-section.

Think of the service blueprint as the cross-section of a journey map.

Sussing out the systems and processes that underlie that journey returns a lot of insight for the time you spend, and I would guess that, like the journey map, the service blueprint might provide the most bang for your buck.

Blueprinting was born out of sticky notes and conference rooms and lends itself to being lo-fi, especially because it’s a team sport. And although the best UI for group-think is a wall, there is a need for remote tools like Mural that are priced for teams without an enterprise budget.

I noticed in Erik Flowers and Megan Miller’s Practical Service Blueprinting Guide, this example —

www.practicalservicedesign.com

— that, to me, looks a little like Trello. Can you see it?

Trello is an organize-how-you-will collaborative kanban board, sort of.

Trello brings a lot to the table but in the world of budget-usability its real boons are that it’s free, ubiquitous, and extensible. It can be integrated into just about any workflow and accessed anywhere from any device. Updates are real-time. Shoot, I almost wish I had an affiliate link. I use it for everything — and, chances are, a lot of you do too — and for that alone it makes sense to me to leverage its consistency in our user experience work as a blueprinting tool, if we can.

So, inspired by Scrum for Trello — a browser extension that adds Fibonacci numbers and a burn down chart — I started Blueprint for Trello. All it really does is make items using shortcodes like [touchpoint] look more like the practical service blueprint.

Really beta

This is my first browser extension — so it’s just Chrome, for now — and at the time of this writing there is an important to-do item that until checked makes this pretty incomplete. Namely, it only applies the skin on page load – so as you add cards or move things around, they lose their styling and need a refresh. Still, I want to give Blueprint for Trello its first spin at a workshop I’m teaching this afternoon, so I thought I’d share what I have.

Installation
  1. Download and unzip the folder (Github).
  2. Using Chrome, navigate to chrome://extensions and check the box “Developer Mode”
  3. Choose “Load unpacked extensions”
    1. Find the folder blueprint-for-trello or blueprint-for-trello-master
    2. And, depending how you downloaded it, choose the folder of the same name inside it.

 

Usage

Blueprint for Trello applies styling to cards that include in their title a number of short codes:

  • [touchpoint]
  • [actor]
  • [system]
  • [stakeholder]
  • [observation]
  • [data]
  • [question]
  • [critical]
  • [policy]
  • [idea]

It replaces these with icons and applies a color to the cards.

From there, the Practical Service Blueprinting Guide should take you the rest of the way.

David Rosenthal: Content negotiation and Memento

Tue, 2016-08-23 15:00
Back in March Ilya Kreymer summarized discussions he and I had had about a problem he'd encountered building oldweb.today thus:
“a key problem with Memento is that, in its current form, an archive can return an arbitrarily transformed object and there is no way to determine what that transformation is. In practice, this makes interoperability quite difficult.”

What Ilya was referring to was that, for a given Web page, some archives have preserved the HTML, the images, the CSS and so on, whereas some have preserved a PNG image of the page (transforming it by taking a screenshot). Herbert van de Sompel, Michael Nelson and others have come up with a creative solution. Details below the fold.


I suggested that what we were really talking about was yet another form of content negotiation; Memento (RFC7089) specifies content negotiation in the time dimension, HTTP specifies content negotiation in the format and language "dimensions", and what Ilya wanted was content negotiation in the "transform" dimension to allow a requestor to choose between transformed and untransformed versions of the page. Ilya's list of transforms was:
  • none - the URL content exactly as originally received.
  • screenshot - an image of the rendered page.
  • altered-dom - the DOM altered as, for example, by archive.is.
  • url-rewritten - URLs in the page rewritten to point to preserved pages in the archive.
  • banner-inserted - the page framed by archival metadata as, for example, by the Wayback Machine.
Ilya's and my idea was that a new HTTP header would be defined to support this form of content negotiation.

Banner-inserted content outlined in red

Shawn Jones, Herbert and Michael objected that defining new HTTP headers was hard, and wrote a detailed post which explained the scope of the problem:
“In the case of our study, we needed to access the content as it had existed on the web at the time of capture. Research by Scott Ainsworth requires accurate replay of the headers as well. These captured mementos are also invaluable to the growing number of research studies that use web archives. Captured mementos are also used by projects like oldweb.today, that truly need to access the original content so it can be rendered in old browsers. It seeks consistent content from different archives to arrive at an accurate page recreation. Fortunately, some web archives store the captured memento, but there is no uniform, standard-based way to access them across various archive implementations.”

Their proposal was to use two different Memento TimeGates, one for the transformed and one for the un-transformed content.

The elegance of Herbert et al's latest proposal comes from eliminating the need to define new HTTP headers or to use multiple TimeGates. Instead, they propose using the standard Prefer header from RFC7240. They write:
Consider a client that prefers a true, raw memento for http://www.cnn.com. Using the Prefer HTTP request header, this client can provide the following request headers when issuing an HTTP HEAD/GET to a memento.

GET /web/20160721152544/http://www.cnn.com/ HTTP/1.1
Host: web.archive.org
Prefer: original-content, original-links, original-headers
Connection: close
As we see above, the client specifies which level of raw-ness it prefers in the memento. In this case, the client prefers a memento with the following features:
  1. original-content - The client prefers that the memento returned contain the same HTML, JavaScript, CSS, and/or text that existed in the original resource at the time of capture.
  2. original-links - The client prefers that the memento returned contain the links that existed in the original resource at the time of capture.
  3. original-headers - The client prefers that the memento response uses X-Archive-Orig-* to express the values of the original HTTP response headers from the moment of capture.
The memento that is returned can carry the Preference-Applied HTTP response header indicating which of the requested preferences have been applied to the returned content. This is closely analogous to the earlier suggestion of content negotiation but doesn't require either new headers or multiple TimeGates.
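
As a rough sketch of how a client might exercise this negotiation from Python using the requests library (illustrative only; whether any given archive honors these preferences depends on its implementation of the proposal):

import requests

# The example memento from the request above.
memento_url = "http://web.archive.org/web/20160721152544/http://www.cnn.com/"

# Ask for the raw capture: original content, original links, original headers.
resp = requests.get(
    memento_url,
    headers={"Prefer": "original-content, original-links, original-headers"},
)

# Preference-Applied reports which of the requested preferences were honored.
print("Preference-Applied:", resp.headers.get("Preference-Applied", "(not present)"))

# Original response headers, if preserved, come back as X-Archive-Orig-* headers.
for name, value in resp.headers.items():
    if name.lower().startswith("x-archive-orig-"):
        print(name + ":", value)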

The details of their proposal are important; you should read it.

Open Knowledge Foundation: Come talk Open Data and Land Governance with Cadasta on the LandPortal! Join the online discussion Sept 6-20th, 2016

Tue, 2016-08-23 11:39

Earlier this year, Open Knowledge International announced a joint-initiative with the Cadasta Foundation to explore property rights data with the ultimate goal of defining the land ownership dataset for the Global Open Data Index. Lindsay Ferris from the Cadasta team shares more on how you can get involved on issues related to open data in land governance. Register as a user on the Land Portal and take part in September 2016’s LandDebate.

Are you interested in ensuring land governance is open and transparent? Do you want to understand your role in using land administration information and how it can help provide greater security of  property rights? We are excited to hear from you!

Cadasta Foundation is pleased to announce that we are partnering with the Land Portal Foundation to organize an online discussion on Open Data and Land. From September 6th – 20th, 2016, we will facilitate a LandDebate on the Land Portal, posing questions to spark conversation. The Land Portal is an online resource hub and networking platform for the land community. We will hear from leading members of the open data and land governance communities on the topic of open data in land governance — all stakeholders, including CSOs, government officials, private sector actors and researchers are invited to be involved! To find out more about LandDebates, take a look at some past topics here.

Image credit: Cadasta

The land administration sector plays a critical role in governing what is often the most valuable asset of states – the land and natural resources. Unfortunately, given the high value of land, and the power that goes along with access to it, the land sector is ripe for potential abuse. As such, it is a sector where greater transparency plays a critical role in ensuring accountability and equitable access and enforcement of land rights. Opacity in land governance can enable major corruption in land management, increase difficulty in unlocking the value of the land as an asset, and foster a lack of awareness of land policies and legal frameworks by citizens; all of which can undermine land tenure security.

Unfortunately, land administration data ranging from property registries and cadastres to datasets collected through participatory mapping and research is often inaccessible. The information needed to close these gaps to understand who has a right to what property and under what terms remains closed, often at the expense of the most vulnerable populations. Further, due to privacy and security concerns associated with sharing information on vulnerable populations, opinions remain mixed on what should be released as “open data” for anyone to access, reuse and share. The hope is that opening up the data in a way that takes these concerns into account can level the playing field and reduce information asymmetry so that everyone — individuals, communities, NGOs, governments and the private sector — can benefit from land information.

As part of Cadasta Foundation’s on-going research on open data in land, the aim of this discussion is to bring together these stakeholders to address the implications of open data for land governance, including understanding the links between transparency and global challenges, such as overcoming poverty, strengthening property rights for vulnerable populations, enhancing food security and combating corruption. We also hope to broaden consensus on this issue, define what data is important for the community to be open and begin to collect examples of best practices that can be used as an advocacy point going forward. All of Cadasta’s open data research resources can be found here.

To be a part of the LandDebate, simply register as a user on the Land Portal. Then, you’ll be able to dive right in when the conversation begins on September 5th through the Open Data and Land page. If you’d like to reach out with questions on the content, how to get involved or to contribute comments in advance of the LandDebate, contact us at open@cadasta.org! Finally, to get some background on open data in land, check out Cadasta’s existing resources on the topic here. We’re excited to hear from you.

This piece was written by Lindsay Ferris and is cross-posted on the Cadasta blog.

FOSS4Lib Recent Releases: ePADD - 2.0

Mon, 2016-08-22 19:56

Last updated August 22, 2016. Created by Peter Murray on August 22, 2016.

Package: ePADD
Release Date: Friday, August 19, 2016

SearchHub: Where Search Meets Machine Learning

Mon, 2016-08-22 18:38

As we countdown to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Verizon’s Joaquin Delgado and Diana Hu’s talk, “Where Search Meets Machine Learning”.

Joaquin and Diana discuss ML-Scoring, an open source framework they’ve created that tightly integrates machine learning models into popular search engines, replacing the default IR-based ranking function.

Joaquin A. Delgado, PhD. is currently Director of Advertising and Recommendations at OnCue (acquired by Verizon). Previous to that he held CTO positions at AdBrite, Lending Club and TripleHop Technologies (acquired by Oracle). He was also Director of Engineering and Sr. Architect Principal at Yahoo! His expertise lies in distributed systems, advertising technology, machine learning, recommender systems and search. He holds a Ph.D in computer science and artificial intelligence from Nagoya Institute of Technology, Japan.

Diana Hu is currently the lead data scientist on the architecture team at Verizon OnCue. She steers the algorithm efforts to bring models from research to production in machine learning, NLP, and computer vision TV projects in recommender systems and advertising. Previously, she worked at Intel labs, where she researched large scale machine learning frameworks. She holds an MS and BS in Electrical and Computer Engineering from Carnegie Mellon University where she graduated with highest honors and was inducted into the Electrical Engineering Honor Society – Eta Kappa Nu.

Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado, Verizon from Lucidworks

Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Where Search Meets Machine Learning appeared first on Lucidworks.com.

Equinox Software: Evergreen 2008

Mon, 2016-08-22 17:42

This is the third in our series of blog posts leading up to Evergreen’s 10th birthday.  The other posts can be found here and here.

At the beginning of 2008, I was working for the South Carolina State Library. Like many libraries, we were joining the OSS movement by implementing open source tools to market services and increase discoverability of our collections. We were encouraging our public libraries to do the same, offering classes on blogging, wikis and social media. It was no surprise when the State Library hosted visitors from PINES and Equinox to introduce Evergreen.

Having worked for an ILS vendor previously, I was intrigued by the possibility of an open source ILS. The idea that libraries could take back some control over how their ILS developed was exciting! There was a fair amount of skepticism in the audience. Evergreen was in its toddler years and some needed to see it mature a little before jumping on board. Others were ready for a change and saw Evergreen as the opportunity they needed to move their libraries into the future. This is where SC LENDS started to form, but I’ll let my colleague Rogan tell you that story.

For me, 2008 was the year I packed my bags for Georgia to join the Equinox team. I was inspired by what PINES had started and wanted to be a part of building Evergreen further. I knew libraries would gravitate quickly toward the open source business model and developing their own solutions. After all, libraries have been centered around open access and community since their inception.

It’s been a privilege to watch Evergreen grow from those early days to adulthood. I no longer have to talk to potential customers about open source concerns or the maturity of Evergreen. Our discussions are centered around the robust features of Evergreen and how it can work for their library. I still encounter skepticism but it often results in the best discussions. On those occasions where skeptics become true believers, we find our strongest community supporters.

Evergreen is turning 10 years old! It feels more like a 21st birthday celebration because we’ve come so far so fast. I raise my glass to Evergreen and everyone who has been a part of this first 10 years. I can’t wait to see what the next 10 years will bring!

–Shae Tetterton, Director of Sales

Harvard Library Innovation Lab: Summer Fellows Share, Join Us

Mon, 2016-08-22 16:36

LIL fellows are wrapping up their terms this week! Please join us and learn from our Fellows as they present their research into ways we can explore and utilize technology to preserve, prepare, and present information for the common good.

Over 12 weeks, the Fellows produced everything from book chapters and web applications to board games, and, ultimately, an immeasurable amount of inspiration that extends far beyond the walls of Langdell.  They explored subjects such as text data modeling, web archiving, opening legal data, makerspaces, and preserving local memory in places disrupted by disaster.

Please RSVP to Gail Harris

Our fellows will be sharing their work on these fascinating topics on Wednesday, August 24 from 1:00 to 3:00 in the Caspersen Room.

LibUX: Library Service Design with Joe Marquez

Mon, 2016-08-22 13:06

In our January design trends episode of the podcast, we guessed that this year would begin the library service design zeitgeist. We were pretty self-congratulatory when several months later ALA published Library Service Design, written by Joe Marquez and Annie Downey. Here is the gist:

Service design is a holistic, co-creative, and user-centered approach to understanding user behavior for creating or refining services. Use this LITA Guide as a toolkit for implementing service design studies and projects at all types of libraries. It begins with directions for how to create a service design team and assemble a user working group for your library, and moves through the various phases of a service design journey. The authors outline the tools required to gain insights into user behavior and expectations, and how to diagnose the difference between a symptom and a problem users face when interacting within the library environment. The guide features a series of examples that the service design team can use to learn how to work with library staff and patrons to find out what the current user experience is like and how to refine services to better meet user expectations. Publisher’s summary

In this episode of the LibUX Podcast, Joe and I talk about service design, the role of the UX designer, organizational inertia and inherited ecology, blueprinting, and a lot more.

Please like, subscribe, and review. We appreciate it!

Notes

Help us out and say something nice. Your sharing and positive reviews are the best marketing we could ask for.

If you like, you can download the MP3 or subscribe to LibUX on Stitcher, iTunes, YouTube, Soundcloud, Google Play Music, or just plug our feed straight into your podcatcher of choice.

Open Knowledge Foundation: Open Knowledge Brazil summer 2016 update

Mon, 2016-08-22 11:19

This blog post is part of our summer chapter updates and was written by the OK Brazil team.

Brazil is not only about the Olympics. A lot has been going on in the Brazilian chapter of the Open Knowledge Network as well. Here we highlight the significant chapter developments, including some new faces and some exciting plans.

Personnel

One of the most crucial changes in the chapter is in the area of human resources.  Ariel Kogan, an OK Brazil longtime member, took over as CEO from Tom (Everton) Zanella Alvarenga. We wish Tom much luck in his new path and would like to thank him for the work he has done for the chapter so far.
We also have a new addition to our chapter, Elza Albuquerque, who joined us as our communications officer.  Lastly, we have a new advisory board. You can meet our new board at this link.

 

Open Spending News
The Where Did My Money Go website already has the executive budget data for four Brazilian cities: São Paulo (SP), Belo Horizonte (MG), Curitiba (PR) and Recife (PE). The Brazilian Open Spending team is looking for more information about other cities so they can add them to the platform.

We also welcome a new developer to the OpenSpending team, Lucas Ansei. He will be responsible for the next system implementations.

 

Our latest publications

Open Knowledge Brasil planning sessions Credit: Open Knowledge Brasil – Rede pelo Conhecimento Livre Facebook

Global events
– Trip to Estonia, digital government laboratory.

In July, Ariel Kogan and Thiago Rondon (Open Spending coordinator and Adviser for Open Knowledge Brazil) travelled to Estonia to learn about the country’s experience with e-government, e-voting, data security and administration. The trip was supported by Fundacion Avina, in the context of the EuVoto (I Vote) project.
– OKBr participation in the Berlin International Open Knowledge leadership course by Rufus Pollock. Participation in this meeting was also made possible thanks to Fundacion Avina’s support.

Transparency

Check our accounts and balance –

Copy of Bank Statement

Trial Balance

 

Final words…

Lastly, OK Brasil is in the process of planning ahead. We initiated a new strategic planning process for the chapter for 2016-2018. The goal is to validate what was built in previous stages and gather new contributions, in order to present the first OKBr planning document for 2016 to 2018.

Have a look at the Open Knowledge Brazil retrospective and next steps and let us know what you think. We are looking forward to hearing from the global community and connecting more with what others are up to.  Follow us on Facebook or Twitter for more live updates!

Karen Coyle: Wikipedia and the numbers fallacy

Mon, 2016-08-22 03:44
One of the main proposed solutions to the lack of women on Wikipedia is to encourage more women to come to Wikipedia and edit. The idea is that greater numbers of women on Wikipedia will result in greater equality on the platform; that there will be more information about women and women's issues, and a hoped-for "civilizing influence" on the brutish culture.

This argument is so obviously specious that it is hard for me to imagine that it is being put forth by educated and intelligent people. Women are not a minority - we are around 52% of the world's population and, with a few pockets of exception, we are culturally, politically, sexually, and financially oppressed throughout the planet. If numbers created more equality, where is that equality for women?

The "woman problem" is not numerical and it cannot be solved with numbers. The problem is cultural; we know this because attacks against women can be traced to culture, not numbers: the brutal rapes in India, the harassment of German women by recently arrived immigrant men at the Hamburg railway station on New Year's Eve, the racist and sexist attacks on Leslie Jones on Twitter -- none of these can be explained by numbers. In fact, the stats show that over 60% of Twitter users are female, and yet Jones was horribly attacked. Gamergate arose at a time when the number of women in gaming was quite high, with data varying from 40% to over 50% of gamers being women. Women gamers are attacked not because there are too few of them, and there does not appear to be any safety in numbers.

The numbers argument is not only provably false, it is dangerous if mis-applied. Would women be safer walking home alone at night if we encouraged more women to do it?  Would having more women at frat parties reduce the rape culture on campus? Would women on Wikipedia be safer if there were more of them? (The statistics from 2011 showed that 13% of editors were female. The Wikimedia Foundation had a goal to increase the number to 25% by 2015, but Jimmy Wales actually stated in 2015 that the number of women was closer to 10% than 25%.) I think that gamergate and Twitter show us that the numbers are not the issue.

In fact, Wikipedia's efforts may have exacerbated the problem. The very public efforts to bring more women editors into Wikipedia (there have been and are organized campaigns both for women and about women) and the addition of more articles by and about women is going to be threatening to some members of the Wikipedia culture. In a recent example, an edit-a-thon produced twelve new articles about women artists. They were immediately marked for deletion, and yet, after analysis, ten of the articles were determined to be suitable, and only two were lost. It is quite likely that twelve new articles about male scientists (Wikipedia greatly values science over art, another bias) would not have produced this reaction; in fact, they might have sailed into the encyclopedia space without a hitch. Some editors are rebelling against the addition of information about women on Wikipedia, seeing it as a kind of reverse sexism (something that came up frequently in the attack on me).

Wikipedia's culture is a "self-run" society. So was the society in the Lord of the Flies. If you are one of the people who believe that we don't need government, that individuals should just battle it out and see who wins, then Wikipedia might be for you. If, instead, you believe that we have a social obligation to provide a safe environment for people, then this self-run society is not going to be appealing. I've felt what it's like to be "Piggy" and I can tell you that it's not something I would want anyone else to go through.

I'm not saying that we do not want more women editing Wikipedia. I am saying that more women does not equate to more safety for women. The safety problem is a cultural problem, not a numbers problem. One of the big challenges is how we can define safety in an actionable way. Title IX, the US statute mandating equality of the sexes in education,  revolutionized education and education-related sports. Importantly, it comes under the civil rights area of the Department of Justice. We need a Title IX for the Internet; one that requires those providing public services to make sure that there is no discrimination based on sex. Before we can have such a solution, we need to determine how to define "non-discrimination" in that context. It's not going to be easy, but it is a pre-requisite to solving the problem.

DuraSpace News: MEET the Fedora Camp NYC Instructors; Curriculum Available

Mon, 2016-08-22 00:00

Austin, TX – Fedora Camp NYC, hosted by Columbia University Libraries, will be offered at Columbia University’s Butler Library in New York City November 28-30, 2016. Final details, including instructors and curriculum, are now available on the wiki.

District Dispatch: ALA submits comments on digital deposit

Fri, 2016-08-19 20:11

ALA, as a member of the Library Copyright Alliance (LCA), submitted comments to the U.S. Copyright Office (CO) regarding “Mandatory Deposit of Electronic Books and Sound Recordings Available Only Online.” The comments describe the importance of deposit in ensuring that the Library of Congress (Library) continues to build and preserve a national collection of works.

ALA submitted comments to the U.S. Copyright Office this week

In January 2010 the Copyright Office implemented an interim rule regarding the mandatory deposit of electronic works not available in a physical format, which previously had been exempted from the mandatory deposit rules. The LCA now recommends that the Copyright Office expand this interim rule to include online-only books and sound recordings in its deposit program. The LCA also strongly recommends that, as digital security has improved since the current interim rule was considered, the CO provide broader public access to digital deposit content.

The post ALA submits comments on digital deposit appeared first on District Dispatch.

District Dispatch: CopyTalk webinar on finding the public domain

Fri, 2016-08-19 18:46

What would it take to pull together the expertise to perform copyright reviews on 11 million digitized books? The Copyright Review Management System (CRMS) was recognized by the ALA in 2016 as the recipient of the prestigious L. Ray Patterson Award for copyright advocacy – the first group effort so honored. CRMS is a University of Michigan-led initiative to answer this question, coordinating trained reviewers from 17 institutions over 8 years to assess the copyright status of digitized books held in the HathiTrust Digital Library.  To date, partners on the IMLS-supported Copyright Review Management System projects have made over 300,000 copyright determinations for US-published books and over 100,000 determinations for books published in Canada, Australia and the United Kingdom. What does this mean for a responsible search for the public domain? Join Melissa Levine, Lead Copyright Officer and PI, University of Michigan Library, and Kristina Eden, Copyright Review Project Manager, HathiTrust, to learn about the recently published CRMS Toolkit, lessons from CRMS for shared work across libraries to tackle hard problems, and what’s in store for CRMS.

Day/Time: Thursday, September 1, 2016 at 2pm Eastern/11am Pacific for our hour long free webinar.

Go to http://ala.adobeconnect.com/copytalk/ and sign in as a guest. You’re in.

This program is brought to you by OITP’s copyright education subcommittee.

Read the CRMS Toolkit online for free: http://dx.doi.org/10.3998/crmstoolkit.14616082.0001.01

You can order this book through Amazon. For more information, visit: http://quod.lib.umich.edu/c/crmstoolkit

The post CopyTalk webinar on finding the public domain appeared first on District Dispatch.

Jez Cope: Software Carpentry: SC Build; or making a better make

Fri, 2016-08-19 18:30

Software tools often grow incrementally from small beginnings into elaborate artefacts. Each increment makes sense, but the final edifice is a mess. make is an excellent example: a simple tool that has grown into a complex domain-specific programming language. I look forward to seeing the improvements we will get from designing the tool afresh, as a whole…
Simon Peyton-Jones, Microsoft Research (quote taken from SC Build page)

Most people who have had to compile an existing software tool will have come across the venerable make tool (which usually these days means GNU Make). It allows the developer to write a declarative set of rules specifying how the final software should be built from its component parts, mostly source code, allowing the build itself to be carried out by simply typing make at the command line and hitting Enter.

Given a set of rules, make will work out all the dependencies between components and ensure everything is built in the right order and nothing that is up-to-date is rebuilt. Great in principle but make is notoriously difficult for beginners to learn, as much of the logic for how builds are actually carried out is hidden beneath the surface. This also makes it difficult to debug problems when building large projects. For these reasons, the SC Build category called for a replacement build tool engineered from the ground up to solve these problems.

The second round winner, ScCons, is a Python-based make-like build tool written by Steven Knight. While I could find no evidence of any of the other shortlisted entries, this project (now renamed SCons) continues in active use and development to this day.

I actually use this one myself from time to time and to be honest I prefer it in many cases to trendy new tools like rake or grunt and the behemoth that is Apache Ant. Its Python-based SConstruct file syntax is remarkably intuitive and scales nicely from very simple builds up to big and complicated projects, with good dependency tracking to avoid unnecessary recompiling. It has a lot of built-in rules for performing common build & compile tasks, but it’s trivial to add your own, either by combining existing building blocks or by writing a new builder with the full power of Python.

A minimal SConstruct file looks like this:

Program('hello.c')

Couldn’t be simpler! And you have the full power of Python syntax to keep your build file simple and readable.
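
For a sense of how it scales, here’s a slightly bigger sketch (file and target names are hypothetical), using standard SCons builders to compile a static library and link a program against it:

# Hypothetical example: a static library plus a program that links against it.
env = Environment(CCFLAGS='-O2')

# Build libgreeting from two source files; SCons tracks their dependencies.
env.StaticLibrary('greeting', ['greeting.c', 'format.c'])

# Link the program against the library; build order is worked out automatically.
env.Program('hello', ['hello.c'], LIBS=['greeting'], LIBPATH=['.'])

Running scons in the same directory builds everything, and scons -c cleans it all up again.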

It’s interesting that all the entries in this category apart from one chose to use a Python-derived syntax for describing build steps. Python was clearly already a language of choice for flexible multi-purpose computing. The exception is the entry that chose to use XML instead, which I think is a horrible idea (oh how I used to love XML!) but has been used to great effect in the Java world by tools like Ant and Maven.

Equinox Software: Evergreen 2007

Fri, 2016-08-19 16:34

This is the second in our series of posts leading up to Evergreen’s birthday.  The series starts with Jason’s post from yesterday.  Please do read that if you haven’t already!

In 2007 Evergreen became Open Source software in practice, not just in name.

The three committers to the Subversion repository in use at the time, Jason, Bill, and myself, had, over nearly three years, personally typed all of the code that made up the software, gladly accepting the many suggestions for improvement we received via email and in person, of course.

On Monday, March 5, 2007 that changed when Bill applied the first actual patch we’d received over the previous weekend. It wasn’t a huge new feature or reams of documentation, but it was a watershed moment for the project — with that one commit, pictured below, Evergreen was now software owned by more than one entity, locking in the promise of its Open Source licence by making sure nobody could hide the code away in the future.

Soon after that we began receiving contributions of code from more individuals, some folks from libraries, such as Travis Schafer, and some, like Scott McKellar, just folks that make a hobby of “nosing about in other people’s code.” By the end of 2007, the list of committers had grown from 3 to 4 — Dan Scott got the commit bit on September 7 — and contributors were in the double digits. Both accelerated quickly in later years.

Some of these early contributors were at institutions either testing or deploying Evergreen. Some of them are still there, running those installations. Of the production deployments, most were small compared to PINES. However, the seeds of large sites were being planted.

It was in late 2007 that the first two SITKA libraries migrated to Evergreen: Prince Rupert Library and Fort Nelson Public Library. Now SITKA numbers approximately 200, but I’ll remember those first two clearly forever. I’ll remember them because 2007 was also the year Equinox began providing services to libraries interested in Evergreen, and I had the opportunity to personally perform those migrations in the fall of 2007.

It’s interesting, to me, at least, to remember that SITKA was Equinox’s very first customer; they signed their support and migration contract with us more than a month before PINES. In a way I think of that as emblematic of Evergreen’s past and its future promise.  We are a community that innovates. We are largely a community of early-adopters-cum-leaders.  And we are a community focused on both the promise and the pledge of Open Source — the development methodology and philosophy.  Looking back, all the way to the start, I see that’s always been the case and I’m proud to have been and still be a part of that.

— Mike Rylander, President

 
