
Feed aggregator

Nicole Engard: Bookmarks for March 24, 2015

planet code4lib - Tue, 2015-03-24 20:30

Today I found the following resources and bookmarked them on Delicious.

  • Scrollback Create rooms based on your interest or follow existing ones. Share ideas, discuss realtime and redefine your online community experience with Scrollback.

Digest powered by RSS Digest

The post Bookmarks for March 24, 2015 appeared first on What I Learned Today....

Related posts:

  1. Koha Users and Developers to Meet at KohaCon 2009
  2. Can you say Kebberfegg 3 times fast
  3. Capturing, Sharing and Acting on Ideas

FOSS4Lib Recent Releases: Archivematica - 1.3.2

planet code4lib - Tue, 2015-03-24 17:05

Last updated March 24, 2015. Created by Peter Murray on March 24, 2015.

Package: Archivematica
Release Date: Wednesday, March 18, 2015

David Rosenthal: The Opposite Of LOCKSS

planet code4lib - Tue, 2015-03-24 15:00
Jill Lepore's New Yorker "Cobweb" article has focused attention on the importance of the Internet Archive, and on the analogy with the Library of Alexandria. In particular, it has highlighted the risks implicit in the fact that both represent single points of failure because they are so much larger than any other collection.

Typically, Jason Scott was first to respond with an outline proposal to back up the Internet Archive, by greatly expanding the collaborative efforts of ArchiveTeam. I think Jason is trying to do something really important, and extremely difficult.

The Internet Archive's collection is currently around 15PB. It has doubled in size in about 30 months. Suppose it takes another 30 months to develop and deploy a solution at scale. We're talking crowd-sourcing a distributed backup of at least 30PB growing at least 3PB/year.

To get some idea of what this means, suppose we wanted to use Amazon's Glacier. This is, after all, exactly the kind of application Glacier is targeted at. As I predicted shortly after Glacier launched, Amazon has stuck with the 1c/GB/mo price. So in 2017 we'd be paying Amazon $3.6M a year just for the storage costs. Alternately, suppose we used Backblaze's Storage Pod 4.5 at their current price of about 5c/GB, for each copy we'd have paid $1.5M in hardware cost and be adding $150K worth per year. This ignores running costs and RAID overhead.
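
A quick back-of-the-envelope check of those figures (a sketch only; the 1c/GB/month Glacier price and ~5c/GB Storage Pod cost are the numbers quoted above, in decimal units):

```python
# Rough storage-cost arithmetic for a ~30PB backup growing ~3PB/year.
PB_GB = 1_000_000                      # gigabytes per petabyte (decimal)
backup_gb = 30 * PB_GB
growth_gb_per_year = 3 * PB_GB

# Amazon Glacier at $0.01/GB/month, storage cost only.
glacier_per_year = backup_gb * 0.01 * 12
print(f"Glacier: ${glacier_per_year:,.0f}/year")          # ~$3,600,000/year

# Backblaze Storage Pod hardware at ~$0.05/GB per copy, ignoring running
# costs and RAID overhead.
pod_capital = backup_gb * 0.05
pod_growth_per_year = growth_gb_per_year * 0.05
print(f"Storage Pod: ${pod_capital:,.0f} + ${pod_growth_per_year:,.0f}/year")
# ~$1,500,000 up front plus ~$150,000 of new hardware per year, per copy
```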

It will be very hard to crowd-source resources on this scale, which is why I say this is the opposite of Lots Of Copies Keep Stuff Safe. The system is going to be short of storage; the goal of a backup for the Internet Archive must be the maximum of reliability for the minimum of storage.

Nevertheless, I believe it would be well worth trying some version of his proposal and I'm happy to help any way I can. Below the fold, my comments on the design of such a system.
Reliability

Why is reliability so important for this system? After all, I've been arguing elsewhere that reliability isn't as important as people think. Let's suppose that somehow we have a single copy of the 30PB on disks, and that we perform an integrity check via checksums 10 times a year. Optimistically, we assume these disks never fail for any reason, but they achieve their specified Unrecoverable Bit Error Rate (UBER) of 10^-15. There are 2.4*10^17 bits in the copy, so on average every time we do an integrity check we will get 240 bad bits. Pessimistically, we assume these bits are randomly distributed (this makes the analysis much easier).

If, as Jason suggests, the backup is divided into 70K 500GB blocks, the probability that any of them will have more than 1 bad bit is small, so we will lose 10*240*500GB of data every year, or 1.2PB, or about a third of the incoming data. Of course, we can repair these failures from the Internet Archive itself, at the cost of increasing the bandwidth impact from about an additional quarter of the Archive's current bandwidth to about a third (see below). But the probability that the Archive and the backup would lose the same data becomes significant.
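
A sketch of the arithmetic behind those loss numbers, under the same simplifying assumptions (a single 30PB copy, UBER of 10^-15, randomly distributed errors, ten checks a year, 500GB blocks):

```python
# Expected loss from unrecoverable bit errors under the assumptions above.
copy_bits = 30e15 * 8                     # 30PB expressed in bits: 2.4e17
bad_bits_per_check = copy_bits * 1e-15    # UBER of 1e-15 -> ~240 bad bits
checks_per_year = 10
block_gb = 500

# With 70K blocks and only ~240 errors per check, two errors almost never
# land in the same block, so each bad bit costs roughly one 500GB block.
lost_pb_per_year = checks_per_year * bad_bits_per_check * block_gb / 1e6
print(bad_bits_per_check)                 # ~240 bad bits per check
print(lost_pb_per_year)                   # ~1.2 PB lost per year
```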

This argues for much smaller blocks, to reduce the impact of the UBER at the cost of increasing the overhead of the system. Smaller blocks would also make it possible for more people to contribute storage, both from the cost of their contribution and from the impact on their bandwidth. Downloading 500GB on my DSL link would take its entire capacity for two weeks.

In real life, even in data centers disks fail in all sorts of ways that make UBER fairly unimportant. The crowd-sourced disks are likely to be much less reliable still. So the system needs to replicate the data.
Replication

The discussions I've seen so far assume that the data is simply replicated, as it is in the LOCKSS system, but only three times. Even replicating by a factor of three means the demand for storage in the backup network by 2018 is nearly 100PB. Clearly, some scheme that gave adequate reliability but used less storage would help significantly. There are two techniques that might help, erasure coding and entanglement. Warning: the following discussion is radically simplified, see here.
Erasure Coding

Erasure coding is like a distributed version of RAID; files are divided into storage blocks, and these data blocks are organized into groups of N. For each group, M (M > N) blocks are stored, containing the data from the N blocks mixed together so that the original data can be recovered from any N blocks in the group. This allows for non-integer replication factors; the replication factor is (M/N). There are two ways to do this:
  • The N data blocks can be stored unchanged, and (M-N) parity blocks computed from the N blocks can be added. This is the way RAID typically works; it has the advantage that, if nothing has gone wrong, reading a data block requires accessing only a single block. Writing a data block requires writing (1+M-N) blocks, as the parity blocks need to be updated to reflect the new data.
  • The N data blocks are not stored. Instead, M blocks are stored, each computed from all of the N data blocks in the group. Reading a block requires accessing N blocks; writing a data block requires accessing M blocks.
The second form is much more expensive, so why would you do it? In a distributed system these accesses can happen in parallel, so the impact is less.  Also, reads of the backup, other than for integrity checking, will be rare, and integrity checks do not need to recover data "in the clear"; the performance costs are not significant in this application.
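
To make the (N, M) bookkeeping concrete, here is a toy example of the first, RAID-like form using a single XOR parity block (so M = N + 1 and the replication factor is (N + 1)/N). It is an illustration only; real systems such as Tahoe-LAFS and Cleversafe use the second form with Reed-Solomon-style codes that tolerate more than one missing block.

```python
# Toy systematic erasure code: N data blocks plus one XOR parity block.
from functools import reduce

def xor(blocks):
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def encode(data_blocks):
    """Store M = N + 1 blocks: the N data blocks plus their XOR parity."""
    return list(data_blocks) + [xor(data_blocks)]

def recover(stored):
    """Rebuild the N data blocks when at most one stored block is missing."""
    missing = [i for i, b in enumerate(stored) if b is None]
    if missing:
        survivors = [b for b in stored if b is not None]
        stored[missing[0]] = xor(survivors)   # XOR of survivors restores it
    return stored[:-1]                        # drop the parity block

data = [b"spam", b"eggs", b"milk"]            # N = 3 equal-sized blocks
stored = encode(data)                         # M = 4, replication factor 4/3
stored[1] = None                              # lose any one stored block
assert recover(stored) == data
```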

The real importance is that no individual storage node can, if compromised, reveal any data. In the context of a crowd-sourced backup of the Internet Archive, this is important. If a node in the backup network contains data from the Archive "in the clear" the owner of the node might be in trouble if the relevant authorities considered that content undesirable. If the owner has deniability, in the sense that they can say "there is no way I can know what the data I am storing is, and no way anyone can recover usable data from my disk alone" it is much harder for the authorities to claim that the owner is doing something bad.

The second form of erasure coding has desirable properties for a backup of the Archive, and it can significantly reduce the demand for storage. Examples of systems using the second form of erasure coding are Tahoe-LAFS and Cleversafe.
Entanglement

Entanglement was introduced in two 2001 papers, Tangler: A Censorship-Resistant Publishing System Based On Document Entanglements by Marc Waldman and David Mazières, and Dagster: Censorship-Resistant Publishing Without Replication by Adam Stubblefield and Dan Wallach. It has recently been revived in a strengthened form by Verónica Estrada Galiñanes and Pascal Felber in their paper Helical Entanglement Codes: An Efficient Approach for Designing Robust Distributed Storage Systems.

Like the second form of erasure coding, entanglement does not store the data blocks themselves. Each stored block contains data derived from multiple data blocks. The key difference is that erasure coding mixes the data from a fixed group of blocks, whereas entanglement does not organize the blocks into groups but mixes each incoming block with a pseudo-randomly chosen set of stored blocks. This has the following effects:
  • Since the information from which a data block can be recovered is spread across the whole set of stored blocks, deleting or over-writing a data block will affect other data blocks. If the spread is wide enough, selective censorship is effectively blocked and the system is append-only. For the Internet Archive backup application, this is a good thing.
  • Entanglement supports only integer replication factors:
    • Tangler's publishing algorithm takes two stored blocks and a data block and outputs two new stored blocks, thus its replication factor is two. A data block can be recovered from any three of the four (two input and two output) stored blocks.
    • In the default three-strand configuration of the Helical Entanglement Code (HEC) system publishing takes three stored blocks, one from each strand, and a data block and outputs three new stored blocks,  one for each strand. Its replication factor is thus three. Absent data loss, recovering a data block requires accessing two successive stored blocks from any of the three strands. If data loss means that none of the three strands can supply the necessary blocks, a search process can recover the lost stored blocks from information in other stored blocks.
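
A radically simplified sketch of the entanglement idea follows. It is closer in spirit to Dagster's XOR mixing than to the actual Tangler or HEC algorithms, and it has a replication factor of one rather than the integer factors discussed above; the point is only to show how each published block comes to depend on pseudo-randomly chosen stored blocks, so that deleting any stored block damages other documents too.

```python
# Simplified entanglement illustration (not the real Tangler/Dagster/HEC
# algorithms): store only the XOR of new data with k existing stored blocks
# chosen pseudo-randomly, so recovery needs those partner blocks as well.
import os, random

BLOCK = 16
store = [os.urandom(BLOCK) for _ in range(4)]       # bootstrap blocks

def xor(blocks):
    out = bytes(BLOCK)
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

def publish(data, k=2):
    partners = random.sample(range(len(store)), k)  # entangle with k blocks
    store.append(xor([data] + [store[i] for i in partners]))
    return len(store) - 1, partners                 # the recovery recipe

def recover(index, partners):
    return xor([store[index]] + [store[i] for i in partners])

data = os.urandom(BLOCK)
idx, partners = publish(data)
assert recover(idx, partners) == data
# Overwriting any partner block (e.g. store[partners[0]]) would also corrupt
# this document, which is what makes selective censorship hard.
```
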
Entanglement systems vary in how they spread the information about a data block among the stored blocks. In Towards A Theory of Data Entanglement, James Aspnes et al. introduced two criteria for this:
  • A system provides document dependency if a document cannot be recovered if any document it is entangled with is lost.
  • A system provides all-or-nothing integrity if no document can be recovered if any document is lost.
They show that Dagster and Tangler do not meet these criteria. Helical Entanglement is claimed to provide all-or-nothing integrity, the stronger of the criteria. In her brief Work-In-Progress talk at FAST15, Verónica Estrada Galiñanes showed that HEC systems could be configured with large numbers of strands and devices and a replication factor of four to have very high tolerance for failures.

Entanglement has several desirable properties for a backup of the Archive, but it has too high a replication factor to be practical.
Requirements

If I were doing the design, I would start from the end I haven't seen any discussion of so far. It's neat to have a backup copy of the archive, but if it won't actually work when it is needed, what's the point? So I'd start the design by looking not at how the data gets out there, but at the use cases when the data needs to get back. Two obvious cases are:
  • The archive loses say 10TB, perhaps because it suffers a rash of correlated disk failures. How does the archive get it back from the backup?
  • The Big One hits the Bay Area and the entire archive is lost. How can the service be re-constituted from the backup?
Note that time is a big issue here. If it is theoretically possible to recover the needed data, but only in a timescale that's so long everyone will have forgotten about the archive by the time it's back, nothing has really been achieved. Recovery needs to be modelled with realistic upstream bandwidths, which will be much less than downstream for most nodes, replication factors, and proportion of accessible nodes.

Once I had a good recovery design, then I'd figure out:
  • How to get data from the archive into a system like that.
  • How to continually audit the system to verify that it is in good enough shape to work when needed, something systems used only in an emergency frequently fail to do.
So let's say the requirements are:
  • Capacity of 35PB by late 2017.
  • Replication factor less than 1.5, to limit storage demand by late 2017 to less than about 50PB, or say 100K volunteers each providing 500GB.
  • Provides deniability (see above).
  • Ingest bandwidth of 15PB/year by late 2017 (the current content of the archive needs to be backed up in say 2 years while it is growing say 5PB/year). Note that this is about 4Gb/s leaving the archive, or roughly an additional 25% outbound bandwidth.
  • 95% probability of correctly recovering 10TB in 5 days (it is assumed that much of the content will be off-line most of the time, so instant recovery cannot be a requirement).
  • 95% probability of correctly recovering 95% of the entire archive in 90 days.
  • Meet these requirements in the face of 5% malign nodes conspiring with each other, and realistic error and availability probabilities for the non-malign nodes.
  • The system self-configures as available storage resources change so as to:
    • Ensure all content is stored.
    • Minimize the variation of replication factor across the content.
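
The capacity and bandwidth numbers above are easy to sanity-check. A sketch, in decimal units, treating the 1.5 replication factor as an upper bound:

```python
# Sanity-checking the requirement numbers (decimal units, 1PB = 1e6 GB).
storage_pb = 35 * 1.5                    # capacity x maximum replication factor
volunteers = storage_pb * 1e6 / 500      # at 500GB contributed per volunteer
print(storage_pb, int(volunteers))       # 52.5PB at most, ~100K volunteers

seconds_per_year = 365 * 24 * 3600
ingest_gbps = 15e15 * 8 / seconds_per_year / 1e9
print(round(ingest_gbps, 1))             # ~3.8Gb/s leaving the archive
```
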
As I said, this is an extremely difficult problem.

District Dispatch: Upcoming National Library Legislative Day deadlines

planet code4lib - Tue, 2015-03-24 12:30

National Library Legislative Day is May 4-5, 2015 and it will be here soon. Here are a few dates you should know:

  • March 31st is the last day to receive the discount rate available at the Liaison Hotel as part of our room block.
  • April 1st is the last day we are accepting nominations for the WHCLIST award.
  • April 24th is the last day to register online for National Library Legislative Day.

If you haven’t already, please take a moment to share the following information with your networks, on social media, and through any listservs you moderate. Encourage everyone to join us, and to pass the information along to friends and associates who may also be interested:

“It is so important to have library advocates in Washington, DC to participate in impactful face-to-face meetings. And in the wake of the sweeping changes to both the House and the Senate in the 2014 Congressional elections, it is more important than ever that library supporters rally together to speak up on behalf of libraries and the communities they serve.

This year, National Library Legislative Day will be held May 4-5, 2015. If you haven’t already, please take a moment to consider joining us this year. Registration information and the discount code for the hotel room block are both available on the ALA Washington Office website.

Know a non-librarian who gets fired up about library issues? First-time participants are eligible for a unique scholarship opportunity. The White House Conference on Library and Information Services Taskforce (WHCLIST) and the ALA Washington Office are calling for nominations for the 2015 WHCLIST Award. The award provides a $300 stipend and two free nights at a D.C. hotel to a non-librarian participant in National Library Legislative Day.

The promotional video for National Library Legislative Day this year can be found here.

Any questions regarding National Library Legislative Day can be directed to the Washington Office Grassroots Communications Specialist, Lisa Lindle.

The post Upcoming National Library Legislative Day deadlines appeared first on District Dispatch.

Library of Congress: The Signal: DPOE Interview with Sam Meister

planet code4lib - Tue, 2015-03-24 12:16

The following is a guest post by Barrie Howard, IT Project Manager at the Library of Congress.

This interview is part of a series about digital preservation training inspired by the Library’s Digital Preservation Outreach & Education (DPOE) Program. Today’s interview is with Sam Meister, University of Montana-Missoula, who is a DPOE Train-the-Trainer Workshop instructor and is also an instructor for the Society of American Archivists Digital Archives Specialist (DAS) Curriculum and Certificate Program.

Sam Meister

Barrie: Sam, you attended the inaugural DPOE Train-the-Trainer Workshop in Washington, DC, correct? Can you tell the readers about your experience and how you and others have benefited as a result?

Sam: I was fortunate to have had the opportunity to participate in the inaugural DPOE Train-the-trainer workshop in September 2011, very soon after I began my position as Digital Archivist at the University of Montana. This was a very valuable and timely opportunity as the experience kick-started my thinking and practice around digital preservation in the context of my new institutional setting.

When I recall the experience of those two days in September I think of a group of professionals who were excited and energized, eager to learn how they could play a role in sharing and building a community of digital preservation knowledge around the United States. While some of us came to the workshop with less professional experience than others, there was a feeling of collaboration, mutual benefit and respect amongst all of the participants. Additionally, we had an amazing opportunity to learn directly from leading experts and practitioners in the digital preservation field. The two workshop days were filled with exercises, conversations and debate, and by the end I felt ready to return to my corner of the country and share my new knowledge and skills with others in my region.

I feel I have benefited from my experience in a few different ways. Through workshops sponsored by the Montana Historical Society I have been able to present the DPOE curriculum to representatives from multiple cultural heritage organizations around the state of Montana. These workshops provided an opportunity to better understand the particular needs and concerns of smaller institutions, grounding and solidifying the importance of educational materials such as DPOE to encourage and empower individuals to take some action now, even if small. Additionally, I believe that education is integral to the sustainability of digital preservation, whether that is at a small historical society or a national digital preservation network. The DPOE experience has provided me with a set of materials and approaches to continue to further my own educational activities in the digital preservation arena.

Graphic used in previous DPOE workshops.

Barrie: Since becoming an official DPOE Trainer, you’ve taught SAA DAS courses and helped revise the DPOE Workshop Curriculum. Did your DPOE training inform the development of the SAA DAS course(s) you teach?

Sam: Yes, my DPOE training definitely informed my approach to teaching in the SAA DAS program. There is some crossover in the type of audience, made up of professionals who have been in the field, but may be in the beginning stages of developing and implementing local solutions for the long-term management of digital content. In that way, I have drawn from my experiences teaching DPOE workshops and applied this in the SAA DAS settings.

I would say that the approaches I have utilized have more to do with form and structure rather than the course content itself. In a one- to two-day workshop setting with around 20 people there is an opportunity to find a balance between presenting the course content and making room for a more conversational atmosphere. I have found that workshop participants desire real-world examples of the concepts that they are learning about, so I attempt to ground the course materials in my own experiences as a digital archivist. This may sometimes lead to slightly tangential conversations, but often these end up being very valuable to attendees.

Some of the desired outcomes for either the DPOE program or the SAA DAS courses are to instill in the participants the skills, knowledge and confidence to return to their institutions and start doing something, even if small, to tackle the digital preservation challenges they face. Learning about how others have dealt with similar situations helps to strengthen and build this confidence.

Barrie: Have you developed any other training materials from the DPOE Curriculum?

Sam: I have started developing a set of training materials drawing from the DPOE Curriculum that could be utilized in a workshop or training event on personal digital archiving. I’m in the early stages of developing these materials, but am excited about the potential of the DPOE Curriculum applying to this type of audience.

Barrie: Regarding training opportunities, what do you think are the strengths and challenges of traditional in-person learning environments versus distance learning options?

Sam: To date, I have only presented training opportunities for in-person settings, so I can speak to my experience as instructor in that regard. As a student, I have had experiences with both traditional and distance learning environments. Drawing on my experience as both an instructor and a student I would say that one of the challenges in a distance learning setting is creating an environment for meaningful exchange and dialogue between the students and instructor and amongst the students themselves.

While digital preservation as a subject may not be post-structuralism or philosophy, I have found that the opportunities for direct and immediate exchange, whether as an entire group, small groups, or as pairs, clearly benefit students by allowing for a deeper engagement with new, unfamiliar and seemingly abstract concepts. As an instructor, I know what parts of the course content may be difficult, and I can sense via body language or facial expressions if I need to spend additional time on a particular section to clarify. Additionally, as an audience that is primarily practicing professionals, the sense of community and support that results from attendance at in-person workshops is a very valuable outcome that will be of assistance well after they return to their particular institutions.

That said, digital preservation is a global challenge, and distance learning technology is steadily advancing to improve the experience of both instructors and students. While at this time we may not be able to fully replicate the in-person setting in an online environment, we should continue to make any and all efforts to expand the knowledge and skills needed to develop sustainable digital preservation solutions around the globe.

Barrie: What are your plans as a digital preservation training instructor for 2015?

Sam: I’m looking forward to an initial opportunity to participate as an instructor in a train-the-trainer workshop sometime in 2015, to help educate a new set of trainers and expand the digital preservation education network even further.

William Denton: Click here to scab

planet code4lib - Tue, 2015-03-24 00:29

The strike at York University, where I work, enters its fourth week tomorrow.

The university’s Labour Disruption Update site (they are always very careful to call it a “labour disruption”, not a strike) has some information for people in CUPE 3903 Units 1 (TAs) and 3 (GAs) who want to come back to work. If you follow through and log in with your York account, this is what you get.

“By clicking this checkbox, I agree that I am resuming all of my work assignments.”

It’s sad to see York University, home to decades of progressive thinking and social activism, doing this.

DuraSpace News: A View from the DuraSpace Summit

planet code4lib - Tue, 2015-03-24 00:00

Jonathan Markow, DuraSpace CSO, speaking at the DuraSpace Summit where members gathered on March 11-12, 2015

HangingTogether: Digitization challenges – a discussion in progress

planet code4lib - Mon, 2015-03-23 22:52

Internet Archive book scanner | Wikimedia Commons

It has been some time since we hosted our Digitization Matters symposium, which led to our report, Shifting Gears. This event and findings from the surveys of archives and special collections in the US and Canada, and  the UK and Ireland have helped to shape our work in the OCLC Research Library Partnership for some time. However, we felt like enough time had gone by, and enough had changed that it was time for us to begin some new discussions in order to frame future work.

We often hear from library colleagues that they continue to experience challenges associated with digitization of collections, so earlier this month we hosted some discussions (via WebEx) to try to get a handle on what some of those challenges are. Prior to the conversations, we asked participants to characterize their digitization challenges, and then did some rough analysis on the responses. Challenges fell into a number of areas.

  • Rights issues (copyright, privacy)
  • Born Digital, web harvesting
  • Issues with digital asset management systems (DAMS) or institutional repositories (IR)
  • Storage and preservation
  • Metadata: Item-level description vs collection descriptions
  • Process management / workflow / shift from projects to programs
  • Selection – prioritizing users over curators and funders
  • Audio/Visual materials
  • Access: are we putting things where scholars can find them?

We opted not to include the first four issues in our initial discussion — copyright, and rights issues in general, are quite complicated (and with a group that includes people from Canada, Europe, Asia, Australia, and New Zealand I’m not sure we could address it well). We have done quite a bit of work on born digital (and are currently investigating some areas related to web harvesting). At least for our first foray, discussions on DAMS and IRs seemed like they could have gone down a very tool-specific path. Likewise with storage and preservation. Even taking these juicy topics off the table, we still found we had plenty to chew on.

Metadata: Item-level description vs collection descriptions

Many of our discussion participants are digitizing archival collections — there is an inherent challenge in digitizing collections at the item or page level when the bulk of the description is at a collection level. People described “resistance” to costly item level description, and a desire to find an “adequate” aggregate description. On the other hand, there was an acknowledgement of the tension between keeping costs down and satisfying users who may have different expectations. A key here may be a more nuanced view of context — for correspondence, an archival approach may be fine. In other circumstances, not. Some institutions are digitizing collections (such as papyri) where the ability to describe the items is not resident in the library. How can we engage scholars to help us with this part of our work?

Process management / workflow / shift from projects to programs

Many institutions are still very much in project mode, looking to transition to programs. For those who have or are working towards digitization programs, there is a struggle to get stakeholders all on the same page: at some institutions, the content owners, metadata production unit, and technical teams seldom if ever come together; here, getting all parties together to establish shared expectations is essential. Some institutions are looking to establish workflows that will more effectively allow them to leverage patron-driven requests, while others are thinking about the implications of contributing content to aggregators like DPLA. One institution has started scanning with student employees — when students have a few minutes here or there, they can sit down at a scanning station and scan for 10-15 minutes — this leads to a steady stream of content.

Selection – prioritizing users over curators and funders

Many institutions are still operating under a model whereby curators or subject librarians feed the selection pool, either through a formal or informal process. Even in these models, it can be difficult to get input from all — there tends to be a small pool of people who engage in the process. At one institution, people who come with a digitization request are also asked to serve as “champions” and are expected to bring something to the project — contributing student hours to enhance metadata, for example. One institution views selection as coming through three streams — donor initiated, vendor or commercial partner initiated, and initiated by the curatorial group (emphasizing that the three are not mutually exclusive). Another institution is looking at analytics and finding that curator initiated requests generate less online traffic than patron initiated requests. In a similar vein, a third institution is looking at what is being used in the reading room and considering making digitization requests based on that information. Even though people’s survey responses indicated that they would like to move selection more towards directly serving researchers’ needs, from the discussion I’d observe that few institutions have established models to do so.

Audio/Visual materials

As with born digital, everyone has A/V materials in their collection, and making them more accessible is a concern. A participant from one institution observed that they see key differences in interest for these formats — for example, filmmakers, not scholars, are the people who will seek out video. If there is a transcript for materials, that may impact demand. A/V projects tend to focus on at-risk materials, since costs are so high. Some institutions are beefing up their reformatting capacities, in anticipation of needing to act on these materials. If you are interested in this area, you will want to track the activities of the  (US based) Federal Agencies Audio-Visual Working Group.

Access: are we putting things where scholars can find them?

For many institutions, aggregation is the name of the game, and thinking as a community about aggregating content is key: “Standalone silos don’t help users find our things.” Whether materials are in discovery repositories that are hosted by the institution or elsewhere, discoverability and user experience are concerns. One institution assigns students to search for materials via Google and in repositories. Are collections findable?

Thanks to all who took part in our discussions! I hope we’ll have more to report in the future.

 

 


Shelley Gullikson: UXLibs conference: thoughts

planet code4lib - Mon, 2015-03-23 21:53

My first post on UXLibs was bits taken from my conference notes. This is what shook out when I reread all my notes and reflected a bit.

Matthew Reidsma (who was somehow even more inspiring in person than online, and I’m not sure how that’s even possible) spoke in his keynote about Heidegger, including his concept of being-in-the-world, and the question “How does the world reveal itself to us through our encounters with it?” In my notes, I continued “How does the library reveal itself through our encounters with it?” and – more pertinent to my work – “How does the library website reveal itself through our encounters with it?” Matt went on to explain that by interacting with things, we are making meaning. So, by interacting with the library website, what meaning are we helping our students make?

This made me think of the great workshop I’d had with Andrew Asher on the first day. One of the many things we did was watch videos of students trying to find information. A second year student needed to find peer reviewed articles but clearly had no idea what this meant. A fourth year student came upon an article on her topic from the Wall Street Journal and thought it could be useful in her paper because it sounded like it was on her topic and came from a credible source (not seeming to realize that a credible source is not the same as a scholarly source).  I found it striking that neither of these students seemed to understand what scholarship looked like; what it meant for a thing to be a scholarly source.

So, taking those two points together, is there a way we can help students make meaning of scholarship through interacting with our website? And I don’t just mean, how can we help them understand how to find various scholarly materials (you find books in this way, you find journal articles in that way), but can we help them understand how to interact with a journal article in a scholarly context? Can we help them use that article to first create understanding and then create their own scholarly work?

This in turn circles back to Donna Lanclos’ keynote on the first day where she challenged us to move beyond helping our users with wayfinding, and engage with them in the act of creation. She challenged us to move beyond the model of the bodiless scholar whose chair is hard, who can’t leave the library to eat, and who has to endure horrible searching on crappy library websites to find what they need. The finding part doesn’t have to be so hard. The hard part should be thinking about what you’ve found and then making something new out of it.

So, to grab a phrase from Paul-Jervis Heath’s keynote, “how might we” design a library website that helps students make meaning out of the scholarship they are finding? How might we design a library website that helps students focus less on finding and more on thinking and creating?

Since reading Emma Coonan’s great piece in UKSG News, “The ‘F’ word,” about moving away from a focus on finding in the context of information literacy, I’ve been wondering how we could do this in the context of the library website. UXLibs has prodded me further, and – even better – given me some tools, techniques, and a giant mound of inspiration to get out and try to start working on it.


Nicole Engard: Bookmarks for March 23, 2015

planet code4lib - Mon, 2015-03-23 20:30

Today I found the following resources and bookmarked them on Delicious.

  • Tutanota An Open Source Encrypted Gmail Alternative

Digest powered by RSS Digest

The post Bookmarks for March 23, 2015 appeared first on What I Learned Today....

Related posts:

  1. Search in Gmail Attachments
  2. Open Source Options for Education
  3. Handy Spam Tip

DPLA: Remembering Sabra Statham

planet code4lib - Mon, 2015-03-23 19:41

The DPLA family was devastated to hear that one of our Community Reps, Sabra Statham, passed away suddenly on Friday. Sabra was a Digital Project Coordinator at Pennsylvania State University and had joined the Reps program in 2014 as part of the second class. In the last year, she worked enthusiastically to represent DPLA in conversation with local Pennsylvania genealogy groups and in collaboration with her fellow Pennsylvania reps.  She was multitalented: in addition to her innovative work in the library at Penn State, she was an accomplished musician and a scholar of musical modernism.

On a personal note, I met Sabra when I visited Penn State last year, and was enormously impressed with her and her many projects. She cared deeply about reaching a broader audience through digital means, and worked on many fronts toward that goal, including through DPLA.

Recently Sabra was selected to receive a 2015 DPLA + DLF Cross-Pollinator Travel Grant to attend DPLAfest 2015. It’s hard for all of us to understand that she won’t be joining us in Indianapolis next month.

Our thoughts and prayers are with Sabra’s family and friends.

DPLA: National Endowment for the Humanities announces award to support expansion of DPLA Hub network

planet code4lib - Mon, 2015-03-23 17:15

The National Endowment for the Humanities awarded $250,000 to the Digital Public Library of America today, in support of DPLA’s effort to continue to build its network of Service Hubs across the United States. The funds will be used to help cover states that currently do not have an on-ramp, through a state or regional digital library, for their collections to get into DPLA’s national, open collection. The award is being made as part of NEH’s new Common Good initiative, which is highlighting and demonstrating the importance of the humanities to the general public.

“We see our mission of bringing the riches of America’s libraries, archives, and museums to everyone as in great harmony with Chairman Adams’ Common Good initiative,” said Dan Cohen, DPLA’s Executive Director. “We deeply appreciate receiving this funding under that banner, and look forward to working with NEH and its other grantees to connect the public with the works of the humanities for years to come.”

The National Endowment for the Humanities has been a major supporter of DPLA from its inception. Today’s grant supplements the $1,000,000 that NEH provided to help launch DPLA in April 2013.

“These supplemental funds from NEH will allow us to continue to grow the map by supporting DPLA’s efforts to assist in the development of new state and regional based Service Hubs,” said Emily Gore, DPLA Director for Content. “The increase in hubs will allow us to come closer to reaching our goal of having an on-ramp to DPLA for all interested cultural heritage institutions in the US.”

The NEH’s official announcement can be found here.

About the National Endowment for the Humanities

Created in 1965 as an independent federal agency, the National Endowment for the Humanities supports research and learning in history, literature, philosophy, and other areas of the humanities by funding selected, peer-reviewed proposals from around the nation. Additional information about the National Endowment for the Humanities and its grant programs is available at: www.neh.gov.

About DPLA

The Digital Public Library of America (http://dp.la) strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science. Since launching in April 2013, it has aggregated over 8.5 million items from over 1,700 institutions. The DPLA is a registered 501(c)(3) non-profit.

 

Islandora: Long Tail Updates

planet code4lib - Mon, 2015-03-23 15:22

Time to take another tour of the edges of Islandora, where work is being done in the community on tools and modules that you can use (or help to build!) to make your Islandora better. Previous iterations of this blog have featured modules that long since became part of the Islandora release stack, and we hope that some of today's featured modules will also make that journey someday:

Manuscript Solution Pack

The Manuscript Solution Pack, developed by discoverygarden, is an important step forward for digital humanities support in Islandora, allowing users to create and view manuscripts with the upload of TEI, XSLT, and CSS documents.

Users will be able to view the transformed manuscript TEI (via the uploaded XSLT) side by side with the image(s) of the manuscript (via the Open Seadragon viewer), and browse all of this via Box/Folder hierarchies as defined by their record in an associated finding aid, such as EAD.
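
The rendering itself (TEI pushed through an uploaded XSLT) happens inside Islandora; purely as an illustration of that transform step, here is what the equivalent looks like in Python with lxml (the file names are made up):

```python
# Illustration only: apply an uploaded XSLT stylesheet to a TEI document,
# the same kind of transform the solution pack runs to produce the viewable
# manuscript text. File names here are hypothetical.
from lxml import etree

tei = etree.parse("manuscript_tei.xml")               # the uploaded TEI
transform = etree.XSLT(etree.parse("tei2html.xsl"))   # the uploaded XSLT
html = transform(tei)
print(str(html))    # markup to display beside the page images
```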

Database Solution Pack

This module is explicitly a work in progress and not ready for use, but when it is finished it will allow a user to ingest database dumps (.SQL files) as Islandora objects that are then spun up into live, browsable (read-only) database instances using Docker and Adminer.

Its creator, Alex Garnett from Simon Fraser, is looking for feedback and collaboration, so if you like the sound of this solution pack, jump in on the GitHub issues.

Remote Object Copy

This module is being developed by Ashok Modi of Cherry Hill to support a project where Fedora objects that were in Islandora 6.x had to be moved over to Islandora 7.x and into new content models. To use it, you give it credentials for the Islandora 6.x site, provide a PID, and that object can be copied over to Islandora 7.x.

Tuque connects to the remote repo, gets the object and its datastreams, and maps them to the appropriate 7.x solution pack content model to bring over. 

There are still some bugs to work out and an ongoing discussion here on the listserv if you want to know more.

Generate/Regenerate Collection Derivatives

The brainchild of discoverygarden's Dan Aitken (you may know him as QADan), this module adds two new fieldsets to collection management pages, to handle derivatives on one object or many objects at once. The first allows a user to generate or regenerate any and all available derivative datastreams on selected or all objects in a collection. The second allows a user to regenerate the DC datastream for selected or all objects in a collection. 

Islandora Taxonomy Autocompletes

Our last Long Tail module comes from Donald Moses at UPEI. Islandora Taxonomy Autocompletes works with Drupal taxonomies and Islandora XML Form Builder form fields to bundle up some authority driven autocomplete options for your MODS metadata. If you want to add new authorities to the list, Don would love to hear from you.

CrossRef: Discounts for Large Deposits of Datasets and Book Chapters

planet code4lib - Mon, 2015-03-23 15:09

CrossRef is happy to announce new discounts for bulk deposits of datasets and book chapters or reference entries that will take effect from April 1st, 2015. The goal of the discounts is to make it easier for CrossRef member publishers to deposit high volumes of datasets and book chapters or entries in reference works that are currently too expensive.

The following discount fee schedule will apply for deposits from a single title or database made in a CrossRef billing quarter effective April 1st, 2015.

The following discount fee schedule will apply for deposits from a single book title or reference work made in a CrossRef billing quarter effective April 1st, 2015.

If you have any questions about our fees please contact info@crossref.org.

District Dispatch: WHCLIST award nomination deadline approaching

planet code4lib - Mon, 2015-03-23 14:00

The deadline for WHCLIST nominations is quickly approaching! If you are planning to nominate a non-librarian, first-time attendee of National Library Legislative Day, nominations are due no later than April 1, 2015. The winner of this award receives a stipend of $300 and two free nights at the Liaison hotel.

The White House Conference on Library and Information Services Taskforce (WHCLIST) has been an effective force in library advocacy nationally, statewide and locally since the White House Conferences on Library and Information Services in 1979 and 1991. WHCLIST has provided its assets to the ALA Washington Office to transmit its spirit of dedicated, passionate library support to a new generation of advocates. Both ALA and WHCLIST are committed to ensuring the American people get the best library services possible.

The criteria for the WHCLIST Award are:

  • The recipient should be a library supporter (trustee, friend, general supporter) and not a professional librarian.
  • Recipient should be a first-time attendee of NLLD.

Representatives of WHCLIST and the ALA Washington office will choose the recipient. The ALA Washington Office will contact the recipient’s senators and representatives to announce the award. The winner of the WHCLIST Award will be announced at NLLD.

To apply for the WHCLIST award, please submit a completed NLLD registration form; a letter explaining why you should receive the award; and a letter of reference from a library director, school librarian, library board chair, Friend’s group chair, or other library representative to:

Lisa Lindle
Grassroots Communications Coordinator
American Library Association
1615 New Hampshire Ave., NW
First Floor
Washington, DC 20009
202-628-8419 (fax)
llindle@alawash.org

Note: Applicants must register for NLLD and pay all associated costs. Applicants must make their own travel arrangements. The winner will be reimbursed for two free nights in the NLLD hotel in D.C. and receive the $300 stipend to defray the costs of attending the event.

The post WHCLIST award nomination deadline approaching appeared first on District Dispatch.

Harvard Library Innovation Lab: Link roundup March 23, 2015

planet code4lib - Mon, 2015-03-23 12:50

A sprinkling of spring links.

When to Wend or Wind? | Pulio’s Word Blog

When to wend and why to wind

Be The First | Schulz Library Blog

Be The First display at the library highlights hidden gems that have never been checked out

Memory in the Flesh

Transferable, regrowable, distributed memories. Maybe libraries contain storehouses for flatworms and not text.

How Carrots Became The New Junk Food | Fast Company | Business + Innovation

If baby carrots successfully market themselves as junk food, libraries should market themselves as? Extreme coupons?

Nintendo Forms Partnership to Develop Mobile Games

Software is eating the world. So is mobile.

Mark E. Phillips: Metadata Edit Events: Part 1 – When

planet code4lib - Mon, 2015-03-23 12:30

This is going to be another multi-post series as I wade through some of the data we have been collecting for the past year related to metadata editing and various events within a metadata record’s lifecycle.

Background

For the past few years the UNT Libraries has been collecting data about how long our metadata editors spend editing records in our systems. We’ve written about the overall change of metadata in our digital library and presented those findings at last year’s Dublin Core Metadata Initiative conference in Austin, Texas, in a paper called “How Descriptive Metadata Changes in the UNT Libraries’ Collection: A Case Study“. The goal of collecting data about metadata change is to get a better idea of how our metadata editors are interacting with our systems.

What is an edit event?

Our metadata system will create a log entry when a user opens a record to begin editing. This log entry acts as the start of a timer for the given edit session of that specific record by a given user. When the user publishes that metadata record back into the system, the log entry is queried and the amount of time that has passed is recorded, along with the metadata editor’s username, an identifier for the record, and the state (hidden or unhidden) the item is in when it is saved. This information is submitted to the Metadata Event Service and logged.

An edit event ends up looking like this once it has been created

id: 73515
event_date: 2014-01-04T22:57:00
duration: 24
username: mphillips
record_id: ark:/67531/metadc265646
record status: 1
record status change: 0
record quality: 1
record quality change: 0
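
A minimal sketch of the bookkeeping described above (the function and variable names here are hypothetical, not the actual UNT system’s API): a timestamp is logged when a record is opened, and the duration and editor details are recorded when it is published.

```python
# Hypothetical sketch of edit-event logging: start a timer when a record is
# opened for editing, emit an event when it is published back to the system.
from datetime import datetime, timezone

open_log = {}   # (username, record_id) -> time the record was opened
events = []     # completed edit events

def open_record(username, record_id):
    open_log[(username, record_id)] = datetime.now(timezone.utc)

def publish_record(username, record_id, status):
    opened = open_log.pop((username, record_id))
    duration = int((datetime.now(timezone.utc) - opened).total_seconds())
    events.append({
        "event_date": opened.strftime("%Y-%m-%dT%H:%M:%S"),
        "duration": duration,            # seconds spent editing
        "username": username,
        "record_id": record_id,
        "record_status": status,         # hidden or unhidden at save time
    })
```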

With this information we are able to create a number of views into the metadata editing workflow in our environment: we can easily see the number of metadata edits on a given day, within a month, and for the entire period we’ve been collecting data. We can view the total number of edits, the number of unique records edited, and finally the number of hours that our users have spent editing records within a given period.

Below are a few screenshots from our Edit Event Service web-interface.

Homepage for the UNT Libraries Edit Event Service

Daily View for the UNT Libraries Edit Event Service

Monthly View for the UNT Libraries Edit Event Service

Yearly View for the UNT Libraries Edit Event Service

User Detail View for the UNT Libraries Edit Event Service

We are able to query a given day, month, year to view statistics as well as show the rankings and information for a specific user or digital object in the system.

Analyzing a year of data.

We were interested in taking a deeper look at the metadata edit events, and that is what the following posts in this series will cover. A year’s worth of metadata edit data was extracted from the event service. This was paired with two other datasets: descriptive metadata about the items edited, including contributing institution, collection, resource type and format fields; and a classification of each user’s status as either UNT employee or non-UNT employee, along with their rank as Librarian, Staff, Student, or Unknown. These datasets were merged to form a complete record for each metadata event in the Edit Events Dataset, and then added to a Solr index that was used in analyzing this data.
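
As an illustration of the kind of tallies reported below, counting events by month, weekday, and hour is straightforward once the merged events are in hand. This sketch uses plain Python rather than the Solr index actually used, and assumes events is a list of dictionaries shaped like the example record above:

```python
# Tally edit events by month, weekday, and hour of day.
from collections import Counter
from datetime import datetime

def tally(events):
    by_month, by_weekday, by_hour = Counter(), Counter(), Counter()
    for e in events:
        dt = datetime.strptime(e["event_date"], "%Y-%m-%dT%H:%M:%S")
        by_month[dt.strftime("%B")] += 1
        by_weekday[dt.strftime("%A")] += 1
        by_hour[dt.hour] += 1
    return by_month, by_weekday, by_hour
```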

A total of 94,222 edit events occurred from January 1, 2014 to December 31, 2014 and are the base dataset for the analysis presented here.

Month, Day, Hour

During 2014 we averaged 7,852 metadata edits per month

Month       Edits
January    10,133
February    5,082
March       5,960
April       5,543
May         6,622
June        5,136
July        8,099
August     10,508
September  10,989
October    12,840
November    7,712
December    5,598

Monthly Metadata Edit Events for the University of North Texas

Looking at the day of the week that metadata edits occurred shows the expected pattern of the majority of metadata editing activities taking place during the week with fewer happening on the weekend.  The breakdown by day of the week is presented in the table below.

Day         Edits
Sunday      2,765
Monday     17,506
Tuesday    19,580
Wednesday  16,876
Thursday   20,838
Friday     14,416
Saturday    2,241

Metadata Edit Events for the University of North Texas by weekday

The hour of the day that metadata is edited is interesting to look at. For the most part, the majority of editing is done during the work week, with the afternoons being the time of day when most records are edited. The full data is presented below.

Hour    Edit Events
0:00        237
1:00         77
2:00         58
3:00         41
4:00         19
5:00         86
6:00        290
7:00        601
8:00      1,836
9:00      6,189
10:00     8,948
11:00     8,868
12:00     8,134
13:00    10,760
14:00    11,653
15:00    11,184
16:00     9,114
17:00     4,868
18:00     3,564
19:00     2,439
20:00     1,947
21:00     1,787
22:00       937
23:00       585

Presented as a graph you can easily see the swell of metadata editing in the afternoons.

Metadata Edit Events for the University of North Texas by hour of the day

If you combine the day of the week and hour of the day data into a single table you will get something like this.

94,222 edit events plotted to the time and day they were performed

In the image above, green represents lower numbers of edits and red represents higher numbers. It shows that Thursday afternoons tend to be very busy, while Friday is much lighter compared to other days of the week.

That’s it for the first post in this series. I have plans for posts about Who is editing records, What records they are editing, and finally How Much time we are spending on metadata editing. Check back for future posts.

As always feel free to contact me via Twitter if you have questions or comments.

 

 

Karen G. Schneider: Snowglobes and my research quest

planet code4lib - Mon, 2015-03-23 02:35

Today I was stopped at a red light in downtown Santa Rosa, and I looked over to see a tough guy in a muscle car with sheer delight plastered across his face. We were enjoying the same magical scene: thousands of tiny white petals scudding across the avenue, swirling in the air, drifting onto benches and signs and people.

This could explain the sneezing fit I had last night, but that snowglobe moment was worth it. When we were contemplating this move, no one said we would experience this beautiful warm snowfall. No one has commented on it to me at all. I guess it’s just me and Tough Guy, thrilled by the floor show.

I had no idea how beautiful this small city, and our neighborhood in particular, would be in the spring. The neighbors’ gardens are not even in full bloom, yet every block is resplendent with color and redolent with fragrance. My rosebushes, brave little souls who survived five years on a cold, partially shaded, windswept deck in San Francisco, are stretching their limbs toward the warmth and the light, their foliage thick and lush, their buds fat, the first rose gorgeously impeccable.

I am stretching my own limbs to the light as well, professionally and in my growth as a scholar (and with leadership studies, of course, the two are ever entwined). Coming back from some reasonably tolerable conference, I realized I was happy to walk into the library. It is a human institution and not the Good Ship Lollypop, but it’s filled with caring people determined to make a difference in other people’s lives. (I wonder what things were really like on GSL, anyway. Probably lots of dental issues.)

Last night I turned in my last short homework assignment for the doctoral program. Assuming it doesn’t bounce back to me with a request for revision (Lord please no — I cannot write anything more about net neutrality), I have completed my last class for this program. Up next: completing my qualifying paper, studying for and taking comprehensive exams, developing and defending a dissertation proposal, then doing the research for, writing, and defending my dissertation.

Piece of cake, eh?

Yes, a lot of work, and the doctoral work is folded under a lot of work-work, and (since some of you may be wondering) compounded by my mother’s health care crisis, which has its four-month anniversary in two days. It’s one of those life crises many of us will deal with at some point — a foreign land that, when you get there, you find populated with a lot of people you know.

But I get a lot of sustenance from my doctoral work. My qualifying paper is about the lived experiences of openly gay and lesbian academic library directors. (A friend of mine teased me that I should interview myself, which reminded me of a stern lecture everyone in my class in the MFA program received about The Crime Of Solipsism, which sounded like something we should stand in a corner for.)

I deeply love this research project, and I earned this love. I did the hard thing — prolonging this project by over a year by torpedoing two papers that were too small, too meaningless, too insufficient, too lacking in rigor; papers I wouldn’t want to see my name on — to find my literary-research beshert, that topic I was meant to wrap myself around. The kind of topic that pulls me into its own snowglobe, where I stand arms upraised in its center, watching meaning swirl around me, its brilliant small bits glinting in the sunlight.

Later on, I hope, I’ll write a bit more about my research. I owe a lot to the great people who shared their time and thoughts about my work in this area, giving me courage to ditch the crap and focus on the gold, and to the subjects who provided fascinating, heartening, hilarious, heart-tugging, thoughtful, surprising, invigorating, and fully real interviews for my research. The Association of Openly Gay and Lesbian Academic Library Directors could fit in a hotel suite, but it’s a group I’d share that suite or even a foxhole with, hands-down.


Shelley Gullikson: UXLibs conference: notes

planet code4lib - Mon, 2015-03-23 00:29

I’ve been quiet of late, as we’ve not been doing any user testing this term; instead we’ve been taking a step back and thinking bigger about our website. But after attending the User Experience in Libraries conference (UXLibs) last week, I’m excited to move forward with user testing/research and thinking big.

St. Catharine’s College, Cambridge, site of UXLibs

UXLibs was amazing amazing. Don’t believe me? Check out the #UXLibs Twitter stream during the week of the conference. I’m not going to try to capture the essence of the conference (see these posts by Ned Potter, and conference organizer Andy Priestner for that). Rather, I’m just pulling out particular bits from my notes that resonate most strongly with me. Many of these may not make sense out of context, but I’m happy to provide context if you ask.

From the keynote by Donna Lanclos:

  • What happens if we decenter staff expertise?
  • Find out what users understand not what they want
  • Not helping with wayfinding but engagement with creation
  • If an activity has intrinsic value, does it need to be assessed?
  • We want people to “revel in independent thought” (Revel!)
  • If we’re going to do ethnography, we have to be okay with feeling uncomfortable, and with feeling comfortable with ambiguity. We need institutional support for uncertainty.
  • A pedagogy of questions involves “a voracious not-knowing” (from @jessifer)
  • Do a small proof-of-concept project and use ethnography to see if it’s working

From a workshop with Andrew Asher:

[we explored a couple of ethnographic techniques: cognitive mapping (e.g. asking people to draw a map of the library from memory, or mapping out where they went when and what they did there), and retrospective process interviews (asking people to draw each step of a step-by-step process as you ask them about that process)]

  • The location of mapping exercises (i.e. in the library or away from it) doesn’t seem to influence the content of the maps created
  • Mapping can demonstrate where prime real estate is being used for low-impact things
  • Commuter campuses [and so probably commuter students] are very different from residential, when looking at mapping journals
  • Drawing can help with specificity but don’t get too hung up on the drawing

From the keynote by Paul-Jervis Heath:

  • People are fundamentally unable to tell you what will help them (they don’t know or don’t notice)
  • Should vs want creates an interesting tension -> how do you help people be the better version of themselves?
  • Books are sharks!
  • Rules of improv are good rules for ideation
  • I really have to read Gamestorming one of these days

From a workshop with Matt Borg and Matthew Reidsma:

[we were introduced to the wonderful world of grouping post-its with affinity mapping (by voice, pain points and then categories) and empathy mapping (by what people say, what they think, what they do, and what they feel)]

  • Maybe we should add “games” to our “search books, articles and more” Summon box
  • We need to have empathy with our colleagues, as well as with our users
  • Add the demographic, etc. metadata to post-its to make it easier to find patterns

From the keynote by Matthew Reidsma:

  • All those links on the website – people put them there
  • Interacting with things = making meaning
  • Usability is beyond functional, it’s making sure people have meaningful interactions with the world
  • It’s easy to recover from breakdowns [errors, confusion] when you understand how the thing you’re using/doing works
  • Usability could be helping people better understand our tools/services so they can better recover
  • Test to learn, not just perfect; learn how people cope

There was so so so much more than this. I have a follow-up post on some bigger picture stuff. But there’s so much more than that too. I’m going to be processing this conference for a while.

