planet code4lib

Hydra Project: OR2015 NEWS: Registration Opens; Speakers from Mozilla and Google Announced

Mon, 2015-03-16 09:39

Of interest to many in the Hydra Community:

We are pleased to announce that registration is now open for the 10th International Conference on Open Repositories, to be held on June 8-11, 2015 in Indianapolis, Indiana, United States of America. Full registration details and a link to the registration form may be found at:

OR2015 is co-hosted by Indiana University Bloomington Libraries, University of Illinois at Urbana-Champaign Library, and Virginia Tech Libraries.

OR2015 Registration and Fees:

An early registration fee of $450 USD will be available until May 8. After May 8, the registration fee will increase to $500 USD. This registration fee covers participation in general conference sessions, workshops, and interest group sessions, as well as the conference dinner on Wednesday, June 10 and poster reception on Tuesday, June 9. For a draft outline of the conference schedule, please see:

Participants may register online at: If you have any questions about registering for OR2015, please contact the Conference Registrar at Any other questions about the conference may be directed to the conference organizing committee by using the form at:

Hotel Reservations:

The OR2015 conference will take place at the Hyatt Regency Indianapolis hotel, conveniently located in the heart of downtown Indianapolis. Special room rates at the Hyatt starting at $159 USD per night have been negotiated for conference attendees and will be available for booking through May 16. More information on hotel reservations and travel is available at:

Keynote and Featured Speakers:

Reflecting the significant milestone of the 10th Open Repositories conference and this year’s theme of “Looking Back, Moving Forward: Open Repositories at the Crossroads,” we are pleased to announce the conference’s two plenary speakers:

Kaitlin Thaney will be giving the opening keynote talk on the morning of Tuesday, June 9. Kaitlin is director of the Mozilla Science Lab, an open science initiative of the Mozilla Foundation focused on innovation, best practice and skills training for research. Prior to Mozilla, she served as the Manager of External Partnerships at Digital Science, a technology company that works to make research more efficient through better use of technology. Kaitlin also advises the UK government on infrastructure for data intensive science and business, serves as a Director for DataKind UK, and is the founding co-chair for the Strata Conference series in London on big data. Prior to Mozilla and Digital Science, Kaitlin managed the science program at Creative Commons, worked with MIT and Microsoft, and wrote for the Boston Globe. You can learn more about the Science Lab at  and follow Kaitlin online at @kaythaney.

Anurag Acharya will be the featured speaker at the plenary session on the morning of Wednesday, June 10, presenting on “Indexing repositories: pitfalls and best practices.” Anurag is a Distinguished Engineer at Google and creator of Google Scholar, and he previously led the indexing group at Google. He has a Bachelors in Computer Science from the Indian Institute of Technology, Kharagpur and a PhD in Computer Science from Carnegie Mellon. Prior to joining Google, he was a post-doctoral researcher at the University of Maryland, College Park and an assistant professor at the University of California, Santa Barbara.

We look forward to seeing you at OR2015!

Jon Dunn, Julie Speer, and Sarah Shreeves
OR2015 Conference Organizing Committee

Holly Mercer, William Nixon, and Imma Subirats
OR2015 Program Co-Chairs

Galen Charlton: Henriette Avram versus the world: Is COBOL capable of processing MARC?

Mon, 2015-03-16 02:17

Is the COBOL programming language capable of processing MARC records?

A computer programmer in 2015 could be excused for thinking to herself, what kind of question is that!?! Surely it’s obvious that any programming language capable of receiving input can parse a simple, antique record format?

In 1968, it apparently wasn’t so obvious. I turned up an article by Henriette Avram and a colleague, MARC II and COBOL, that was evidently written in response to a review article by a Hillis Griffin where he stated

Users will require programmers skilled in languages other than FORTRAN or COBOL to take advantage of MARC records.

Avram responded to Griffin’s concern in the most direct way possible: by describing COBOL programs developed by the Library of Congress to process MARC records and generate printed catalogs. Her article even includes source code, in case there were any remaining doubts!

I haven’t yet turned up any evidence that Henriette Avram and Grace Hopper ever met, but it was nice to find a close, albeit indirect connection between the two of them via COBOL.

Is the debate between Avram and Griffin in 1968 regarding COBOL and MARC anything more than a curiosity? I think it is: many of the discussions she participated in are reminiscent of debates that are taking place now. To be fair to Griffin, I don’t know enough about the computing environment of the late sixties to be able to say definitively that his statement was patently ill-informed at the time. But given that by 1962 IBM had announced that they were standardizing on COBOL, it seems hardly surprising that Avram and her group would be writing MARC processing code in COBOL on an IBM/360 by 1968. To me, the concerns that Griffin raised seem on par with objections to Library Linked Data that assume that each library catalog request would necessarily mean firing off a dozen requests to RDF providers: objections that have rejoinders that are obvious to programmers, but perhaps not so obvious to others.

Plus ça change, plus c’est la même chose?

FOSS4Lib Recent Releases: Avalon Media System - 3.3

Mon, 2015-03-16 01:12
Package: Avalon Media System
Release Date: Tuesday, March 10, 2015

Last updated March 15, 2015. Created by Peter Murray on March 15, 2015.

Indiana University and Northwestern University are pleased to announce Avalon Media System 3.3. Release 3.3 adds the following capabilities:

  • MARC Metadata Import
  • Ingestion of pre-transcoded derivatives with multiple quality levels 
  • Script for recovering disk space taken up by temporary Matterhorn files
  • UI Improvements and Bug fixes

Users of Avalon 3.2 can take advantage of these new features by Upgrading Avalon 3.2 to Avalon 3.3.

FOSS4Lib Updated Packages: MediaSCORE

Sun, 2015-03-15 23:41

Last updated March 15, 2015. Created by Peter Murray on March 15, 2015.

MediaSCORE (Media Selection: Condition, Obsolescence, and Risk Evaluation) enables a detailed analysis of degradation and obsolescence risk factors for most analog and physical digital audio and video formats.

MediaRIVERS (Media Research and Instructional Value Evaluation and Ranking System) guides a structured assessment of research and instructional value for media holdings.

Some additional key features of the software include:

  • Browser-based web application that works on Windows and Mac operating systems with all popular browsers.
  • Enables teams to enter and edit data simultaneously.
  • Permissions-based access and views across MediaSCORE and MediaRIVERS.
  • Controlled vocabularies and field validation to help ensure consistent data entry.
  • Provides an auditing path to help with quality assurance and transparency.

The two applications are bundled together but may be used separately. They can be found along with a detailed user guide on GitHub at . Also available is a conceptual document that explores assessment of research and instructional value.

The software requires installation and configuration on a server, requiring the appropriate expertise. AVPreserve is also offering MediaSCORE/RIVERS as a hosted application on a monthly subscription basis.

Package Type: Data Preservation and Management
License: Apache 2.0
Development Status: In Development
Operating System: Browser/Cross-Platform
Programming Language: PHP

Jenny Rose Halperin: Productivity rules

Sun, 2015-03-15 21:44

Last month I blogged for Safari about successfully changing fields, being fearless, and improving yourself through reading and learning. My blog post received a wonderful response, and I am proud to share that this month I begin my official new position as Safari’s Customer Success Manager. I am staying with my team and am super excited to make our product more useful to both our new and existing customers. I’ll be blogging intermittently about what I’m doing, learning, and making with Safari.

While I believe what I wrote in that post, I’ve felt a bit hypocritical because it’s come to my attention throughout this long, dark winter that

I waste a lot of time.

So much time! From time spent writing emotive letters that I never send to time reading the Wikitravel details of places I want to visit. I sleep late on weekends, occasionally drink too many glasses of wine on weeknights, often eat way more than an allotted portion while distractedly checking my phone during dinner, and spend hours looking at pairs of black pants on the Internet that I will never buy. (I love black pants, particularly loose, comfortable ones. Let this link be a hint to anyone who ever wants to buy me a present.) I believe strongly that there is a healthy balance between time-wasting and productivity, and I am afraid that this winter I crossed my own line and need to work on getting myself back to my center.

I’ve always been an over-achieving time waster; I’m the kind of person who knows all the details of Madonna’s Wikipedia page and still somehow finds the time to do all the things. I manage to consistently find the time for birthday parties, lazy afternoons, potlucks, puppet shows, and performing while always submitting applications, papers, and my taxes on time. I have always volunteered with my community, whether gardening or teaching or manning a booth, and I try to be there both in time and spirit for my friends. I am a master of very little and a generalist who can do a lot of things adequately, including playing music, speaking German and Spanish, and holding intelligent conversation on about a million topics. My lack of focus is what drew me to the interdisciplinarity of American Studies and later Library Science, but

because I am okay at a lot of things, I have often felt like I am not good at anything.

My lack of mastery augments an incredible social knowledge that makes me great at cocktail parties, but not so great at specialized skills, particularly those that I have tried and failed to learn repeatedly like drawing or programming computers.

Lounging around and wasting time makes me stressed, and yet I find myself in Wikipedia holes, on Buzzfeed lists, mindlessly thumbing through Instagram, and Googling ex-boyfriends more than I would like to admit. I have an addictive information-seeking brain, and the Internet has been both an asset and a curse for me as I find myself up late, watching the bar below my apartment close, absorbing both everything and nothing at once. (Pro-tip for other addictive minds: Never begin a television program with a seemingly unlimited number of episodes at 9PM on a week night. You will regret it.)

The Internet has made it easier to live vicariously through others, which is another double-edged sword that often makes life feel more complicated than it actually is. All my friends, professional contacts, and the celebrities who interest me seem to be living fulfilled lives, so I submit to the worst kind of voyeurism, one that’s tinged with envy and the feeling that this life could be mine if I were only more “_______.” This kind of time wasting makes me want to delete all my Internet history, take a shower, and maybe smash my phone against a wall. Even admitting that I do it in a public manner makes me feel slightly uncomfortable, but I think it’s important to recognize this is a human byproduct of the Internet age.

I’m not taking the capitalist tack that says all time has to be productive, self-improvement time, and one only has to read a Romantic novel to realize that people actually probably were not more productive in “olden days.” (I wonder how much time a Jane Austen heroine spent staring at the wall?) Instead of judging or feeling shame (both feelings that society unfortunately encourages), I want to practice weaning myself off behaviors that don’t make me feel like my best self and hope that others feel inspired to make similar changes for their health and the health of their communities.

In order to kick off this process, I did what I do best, and what I do to make most of my decisions: I made a chart.

I titled the page:

“Be more productive. Overcome winter blues. Get moving.”

The chart’s four cardinal directions pointed to:

  • Have to do
  • Want to do
  • Do less of
  • Do more of

I brainstormed for about 25 minutes and then wrote a list of the immediate tasks I needed to do within the next week in order to make these “productivity hacks” a reality (excuse the jargon).

I wasn’t sure what was going to come out of the exercise, but when I looked at the page, I was surprised to see that most of my “negative” behaviors revolved around a few distinct categories. In making the chart, I saw that “worrying about the opinion of others” came up 4 times, “relying too much on technology” came up 5 times, and “drinking less frequently” came up 2 times. (My 26-year-old hangovers are much worse than my 21-year-old hangovers!)

In contrast, doing creative work like playing music, dancing, and writing came up 7 times and giving back to my community came up 4 times on my “positive” behavior list. Being kinder to my environment, both in terms of resources and social awareness also came up frequently.

I am going to use the weeks leading up to my 27th birthday to take some steps towards doing my best work and realizing my unique talents through this exercise and others encouraged by productivity experts. I am also going to use this month to research improving productivity and share out my findings on this blog.

It’s time to focus on my creative and nurturing self and feel more alive in my body this spring. Winter has been hard on all of us Bostonians, but in adapting my behaviors to fit my goals, I am taking the first steps toward a daily practice to be my best self.



Nicole Engard: SxSW: A New Generation: Creativity and Open Source

Sun, 2015-03-15 21:24

For my final session of the day I attended an interview of Ryan Leslie by Matt Mullenweg titled “A new generation: Creativity and Open Source”.

Ryan Leslie graduated from Harvard at 19 and is at the forefront of technology and music. Ryan gave us all his cell phone number so we could text him during the session because he’s very into being open and giving back. “As creators we try to find our way in the dark – we don’t have any concrete data on who’s buying our music”. When his second album was released he wondered why the label couldn’t just email everyone who bought his first album to tell them about the new one. The problem is that the labels don’t have that info. So Ryan shares his info with his fans so that he can keep in touch with them. He used the Twilio API to create a tool that replies to the text messages he gets, asking the sender for their email address.
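A minimal sketch of that kind of auto-reply, not Ryan’s actual tool. It assumes Twilio’s documented webhook behaviour, where an incoming text arrives as form fields such as `From` and `Body` and the reply is returned as TwiML. The function name `build_reply`, the email pattern, and the message wording are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Deliberately loose email pattern; an assumption for illustration only.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def build_reply(form):
    """Return a TwiML document answering one incoming text message.

    `form` is a dict of the fields posted by the SMS webhook.
    """
    body = form.get("Body", "")
    if EMAIL_PATTERN.search(body):
        text = "Thanks! You're on the list."
    else:
        text = "Thanks for the text. What's the best email address to reach you?"

    # TwiML reply: a <Response> element wrapping a single <Message>.
    response = ET.Element("Response")
    ET.SubElement(response, "Message").text = text
    return ET.tostring(response, encoding="unicode")


print(build_reply({"From": "+15555550100", "Body": "Loved the show!"}))
```

A web framework would call something like `build_reply` from the route that Twilio is configured to POST to, and would store the sender’s number alongside any email address found.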

Ryan decided that he was going to sell records on his own instead of through a label and has made significantly more money that way. Working with a label is like taking the worst business loan ever. Instead you can use tools like Tilt, Kickstarter and other crowdfunding sites to get that loan to start up the business and publish your own records. When you’re able to connect directly with your users you can identify who your real supporters are.

Instead of spending money on a tool like Salesforce, Ryan decided to spend 40 days on Codecademy, learned Ruby on Rails, and wrote his own tool using Twilio to gather info on his audience. Before, he would just get sales reports from iTunes – once he started selling direct he got a better feel for who was listening to his music. “Everyone you know, whether they buy your album or not, can contribute to your project” – contacts as currency. “It’s beautiful when the communication can be two way”.

Ryan shared a story where he and several other musicians were all together and started coming up with a song together and collaborating on the sound. Matt said this was the story that was the most like open source – where several artists come together to collaborate and build something special together.

Ryan made a very open source comment – some of the most successful solutions are when they solve for a problem they face personally. He wants software that was developed by people like him. He mentioned TopSpin which never really worked for him because he didn’t know who wrote it – or where they came from (life experiences). It comes down to shared experiences – even moving up in the music industry is about the people they know and their relationship equity.

Matt recommended that we all read 1,000 True Fans by Kevin Kelly.

“When creators are inspired they share it” – Ryan Leslie

The post SxSW: A New Generation: Creativity and Open Source appeared first on What I Learned Today....

Nicole Engard: SxSW: Magical UX and the Internet of Things

Sun, 2015-03-15 18:31

This afternoon Josh Clark spoke to us about ‘Magical UX and the Internet of Things’.

A lot of what we’re seeing these days with tech interaction has come with mobile technology. Touch started it all and now we have things like voice and facial recognition. So now makers of digital products need to think about these new ways we can interact with the digital world. There is now a way for us to cast “spells” – wave our arms and something happens. We can even get our own magic wand, and Josh showed us how his wand could be used to light candles. It’s not all novelty like that though; there are some real business and practical uses for this.

Josh showed us a video where the developer created a hack that lets him grab things from his TV and put them on his mobile phone. While it looks awesome, it’s simple to do with our household devices.

Magic and Technology

“Any sufficiently advanced technology is indistinguishable from magic” – Arthur C. Clarke.

For example, people used to say that no one is ever going to want to wear a computer on their body – but now we have smart watches. Sometimes it’s what we see now that prevents us from developing things that seem like magic.

“Fantasy fulfills a need for a simpler more controllable world” – Alan Kay

We need to make technology seem like magic. Touch for example makes it seem like we’re actually touching the data. We need to use fantasy to think about how we want to interact – Alan Kay also said “One goal: the computer disappears in to the environment.” An example of this is the magic wand. But we don’t have to go to Ollivander’s to find that wand – we all have one already in our smart phone. Our phone is the magic wand for everyone. The phone is the first IoT (Internet of Things) device for us all. Now we want to put more of the smarts our phone has into other things. Our phones (and other IoT devices) have “Sensors + Smarts + Connectivity”.

Think of the first time you used Shazam – that was another kind of magic – it was paying attention to the world around you to listen to music. Now we’re seeing this kind of thing with our cameras and translation apps (we used Google Translate to translate signs while in France last week). This also means that we can carry fewer things with us – we don’t carry maps, or cameras or alarm clocks – we use our phones for this. Mobile phones actually bring computing power to immobile objects because we have these phones with us all the time – locks, light bulbs, etc. We can embed smartphone brains in everything and anything. The nappy notifier is a device and app to help you manage your baby’s diapers.

On average we spend over 3 hours a day looking at our phone screen. The more connected we are the more disconnected we are – this means that those who have been designing mobile interfaces these last few years have been doing too good a job. We want to move these things off of the screen in our hands and out into the world. Neiman Marcus has this magic mirror that lets you see your entire outfit and compare it to others you have tried on.

Josh recommends that we read ‘Enchanted Objects’ by David Rose. We have many magically smart devices in our homes these days – the Roomba for example is like the broom in Fantasia or Google Now can be like the sorting hat. There is even a device that emulates the ruby slippers in the Wizard of Oz! is a tool that’s like the easy button for life. There’s also IFTTT that is all about magically making things happen! There’s also Zapier which is similar to IFTTT.

So the point of Magic and Technology is to make the computer invisible – we want it to be easier not harder. The magic happens at the point of inspiration – we embed the smarts in our devices around the house. We have centuries of UX ideas to pull from (Wizard of Oz and Fantasia for example).

Up until now we have been tying the digital only to screens. Now we can interact with the world – the world is the interface. For example, how do we make the physical shopping experience as easy as the online shopping experience? Each environment has info that you can’t get when you are in the other.

The world is a data source – we need tools that gather data from around us. For example the Snapshot device from Progressive Insurance. Automatic though provides info to the driver (us) to help the car talk to us. Propellerhealth is a device you can add to an inhaler that will help those with asthma learn more about their disease – because they’re also used in communities the devices can gather group data and explain the environment. These devices are passive – they’re the modern crystal balls. They gather data from the physical world and push it back to the digital world.

The world is reactive – the things we do in the world cause a reaction. Our actions are a source of data! The Ares Sand Table is an example of this – this way physical and digital are totally in sync. The Minkoff Mirror is a device to help bring the digital in to the dressing room in the physical store – so now the world is interacting with the data. These are intentional interfaces.

The world is a big canvas – if you thought designing for mobile and desktop was tough – imagine designing for an entire room. The Immersion Room is one example. These are thereables (instead of wearables) – these are smart environments – which means we can wear fewer bracelets.

The world has depth and mass – we’re used to 2 dimensional interfaces – the world is not flat! Our magic has to account for that. MIT Thaw is about this. Thaw shows us how to have these expensive devices (our phone and laptop) talk to each other better. What would be a better way to move music from your phone to your computer – why not have the phone and computer work together better?

So we want to gather data for insight, channel intention in to action, use the whole big canvas of the world – make the world smarter and keep in mind that interactions have mass.

Magic Imagined

This is not a challenge of technology, it’s a challenge of imagination!

Let’s start with Google Glass – it always looked like an engineering project. Instead they should have asked ‘what if this thing was magic?’

Let’s look at a coffee cup and ask ‘what if this thing were magic’ – what is this cup witness to and what actions is it next to? What can it hear, what can it see and how can it serve us more? We’re not turning it into something else, we’re designing for the thing’s essential thingness. The goal is not to make things talk – the goal is to improve the conversation. We want things that do their jobs better. We should be bending technology to our lives – make us more human – not less. Colorup takes the color from around it for the light. We need to bank on illusion and embrace misdirection. Context aware experiences should reflect the lies we tell ourselves about how these things work. One of the ways we do this is to expose as little technology as possible (make the computer invisible). If everything can be an interface we don’t want everything shouting at us. Oliver for example weaves the number of unread messages we have in our email in to our lives – instead of shouting at us. Remember that magic can be a little ridiculous. It’s okay to make things a little bit ridiculous right now.

What happens when magic goes wrong? Technology lets us down all the time. As we have more and more technology in our lives it becomes more real. “How smart does your bed have to be before you are afraid to go to bed at night” – Rich Gold. Humans know better. We need to build systems that know they’re not smart enough. Sometimes we build these systems with people first like Lyft and Uber – then when technology catches up (self driving cars) we can bring in the magic. It’s not Harry Potter’s wand that’s magic – it’s Harry. Humans are needed.

It’s not “can we do this?” – it’s “how will we?”!

The post SxSW: Magical UX and the Internet of Things appeared first on What I Learned Today....

Mark Leggott

Sun, 2015-03-15 16:30

It pains me to say this after a number of good Loomware years, but it is increasingly challenging for me to keep Loomware updated, so I am "archiving" the site with this post - leaving it open but making the last post the one that says "No more posts!".

Patrick Hochstenbach: First time in my life to a barber

Sun, 2015-03-15 09:07
Filed under: Doodles Tagged: arthur, barber, brushpen, cartoon, cartoons, cat, doodle

John Miedema: A first rough cut at Lila’s calculation of association between slips. Keep it simple to scale for large volumes of text.

Sun, 2015-03-15 03:43

How will Lila calculate the strength of association between slips? Here is a first rough cut. The key thing I want to illustrate is the use of simple steps to decide the subject of a slip and compute a quantitative measure of association with other slips. The method must be simple and quick so that it scales to large volumes of text. A chess program has to contend with a practically unlimited number of combinations. It uses a simple evaluation of game rules and a sum of chess piece points. E.g., if a move leaves white with ten points and black with nine points, then that move is a good one for white. This simple calculation (applied for as many iterations as a game level will allow) is sufficient to beat most chess players on the planet. Lila uses a comparable approach.
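As a rough illustration of the chess analogy, the sketch below sums piece points for each side. The piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) are the conventional ones and are an assumption here, as are the function names; real chess programs do considerably more than this.

```python
# Conventional piece values; an assumption, not taken from any particular program.
PIECE_POINTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}


def material(pieces):
    """Sum the points of the pieces one side still has on the board."""
    return sum(PIECE_POINTS[piece] for piece in pieces)


def evaluate(white_pieces, black_pieces):
    """Positive scores favour white, negative scores favour black."""
    return material(white_pieces) - material(black_pieces)


# White is left with a queen and a pawn (ten points), black with a queen (nine).
print(evaluate(["queen", "pawn"], ["queen"]))
```

The point of the analogy is that each evaluation is cheap, so it can be repeated for a very large number of candidate positions.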

The simple rules for text analysis are the following:

  1. Extract nouns. Basic grammar says that you find the subject of a sentence in the nouns: people, places and things.
  2. Use word properties to rank their relative meaningfulness. For example, if a word has a lower frequency of usage it can be considered more interesting and important. There are several such word properties that can be applied by a simple calculation. Just word frequency will be used here. Based on the properties, rank the nouns.
  3. Use synonyms and variant forms to match on meaning rather than just a single surface form. Variant forms and synonyms are a simple and powerful semantic matching technique.

Here is an example. Suppose Stephen Hawking used Lila when writing A Brief History of Time. Suppose this line was a slip in his writing project:

Most people would find the picture of our universe as an infinite tower of tortoises rather ridiculous, but why do we think we know better? What do we know about the universe, and how do we know it?

In this first rough cut, Lila analyzes the slip and generates the following table:

Noun     | Frequency of usage (academic) | Rank | Synonym
tortoise | 123                           | 1    | turtle
tower    | 1563                          | 2    | castle
universe | 4868                          | 3    | cosmos

Lila has applied the rules:

  1. Nouns were extracted.
  2. For each noun, frequency of usage was used to calculate rank. An arbitrary rule for this example limits the nouns of interest to the top three ranked nouns.
  3. Synonyms, e.g., tortoise = turtle, were generated by simple look-up from a list.

The nouns are used to find other related slips and compute their strength of association.
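The ranking step might look something like the sketch below. It assumes the nouns have already been extracted and that frequencies and synonyms come from simple look-up tables; the dictionaries hold only the values from the example table, and the function names `rank_nouns` and `build_table` are hypothetical.

```python
# Look-up tables holding only the values from the example above.
FREQUENCY = {"tortoise": 123, "tower": 1563, "universe": 4868}
SYNONYMS = {"tortoise": "turtle", "tower": "castle", "universe": "cosmos"}


def rank_nouns(nouns, frequency, top_n=3):
    """Order nouns from rarest to most common and keep the top_n.

    Nouns without a known frequency are skipped in this sketch.
    """
    known = [noun for noun in nouns if noun in frequency]
    return sorted(known, key=lambda noun: frequency[noun])[:top_n]


def build_table(nouns, frequency, synonyms, top_n=3):
    """Return (noun, frequency, rank, synonym) rows for the top-ranked nouns."""
    ranked = rank_nouns(nouns, frequency, top_n)
    return [
        (noun, frequency[noun], rank, synonyms.get(noun))
        for rank, noun in enumerate(ranked, start=1)
    ]


for row in build_table(["universe", "tower", "tortoise"], FREQUENCY, SYNONYMS):
    print(row)
```

Lower frequency means higher rank, so "tortoise" comes first even though it appears last in the input.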

A first Google search was performed on [tortoise tower universe]. It would make sense to apply a boost factor to the keywords based on the ranking; in this case I trusted Google to use word order. Many results were nearly identical to the original slip. Nearly identical slips may be interesting to Hawking but will not add much insight.

A second search was performed on the synonyms [turtle castle cosmos]. Divergent results were found, such as a website about Turtle’s ice cream. A snippet was selected from the website for analysis by the Lila algorithm:

“Turtles Oreo
There’s more to see…
Sign up to discover and save different things to try in 2015.
About this map
Ice Cream

28 Pins •
[Image: Cosmic Castle”

Noun   | Frequency of usage (academic) | Rank | Association
cream  | 419                           | 1    | No Match
pins   | 724                           | 2    | No Match
castle | 788                           | 3    | Match

This site is about an unrelated subject, ice cream. Limiting again to the top three ranked nouns, there is only one match — between Hawking’s term “tower” and its synonym “castle.” A measure of association of 1/3 or 0.33 is computed. This low value could be used to obscure or exclude the slip.

Another result matched better, a blog about turtles in cosmology. A snippet was analyzed using Lila’s algorithm:

The Cosmic Turtle Around the World


In Japanese mythology, the tortoise supports the ‘Abode of the Immortals’ and the ‘Cosmic Mountain’, where the Cosmic Mountain relates to the axis mundi – the world axis.

Noun          | Frequency of usage (academic) | Rank | Association
cosmos/cosmic | 955                           | 1    | Match
turtle        | 1116                          | 2    | Match
axis          | 2046                          | 3    | No Match

Perhaps Hawking would not be interested in such a blog. Perhaps he would. In this case there are two matches, so a measure of association of 2/3 or about 0.67 is computed. A more refined algorithm would weigh in the triple use of “cosmic.” This site covers related subject matter.
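The two association measures above (1/3 for the ice cream page and 2/3 for the cosmology blog) can be reproduced with a short sketch. It assumes each key noun from the original slip is stored together with its synonyms and variant forms, and that each candidate snippet has already been reduced to its top three nouns; `SLIP_TERMS` and `association` are hypothetical names.

```python
# Each key noun from the Hawking slip with its synonyms and variant forms.
SLIP_TERMS = [
    {"tortoise", "tortoises", "turtle", "turtles"},
    {"tower", "castle"},
    {"universe", "cosmos", "cosmic"},
]


def association(slip_terms, candidate_nouns):
    """Fraction of the candidate's top-ranked nouns that match a slip term."""
    if not candidate_nouns:
        return 0.0
    matches = sum(
        1
        for noun in candidate_nouns
        if any(noun.lower() in forms for forms in slip_terms)
    )
    return matches / len(candidate_nouns)


ice_cream = association(SLIP_TERMS, ["cream", "pins", "castle"])
cosmic_turtle = association(SLIP_TERMS, ["cosmic", "turtle", "axis"])
print(round(ice_cream, 2), round(cosmic_turtle, 2))
```

A threshold on this value could then decide whether a slip is shown, obscured, or excluded.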

This is a first rough cut of how Lila will calculate the strength of association between slips. Certainly a more sophisticated algorithm is required, taking into account multiple word properties. The algorithm should weigh words as more important if they repeat within a slip, especially if they repeat in author-suggested categories and tags. But sophistication must always answer to the need for a simple algorithm. Simplicity is the only way to achieve reasonable performance when analyzing large quantities of text.

Nicole Engard: SxSW: Building the Open Source Society

Sat, 2015-03-14 18:30

This was a core conversation led by Stephanie Geerlings and Jesse Cooke.

How do you promote your project:

  • Articles/blog posts
  • Twitter – not as powerful as a blog post
  • Screencasts – really gets people interested – it’s important to note that this can be time consuming but practice makes perfect
  • Get users to promote/educate people – especially in government (in Hawaii they have released over 400 state and government sites on WordPress – but the people there still seem to think they need to pay for a system)
  • Get community members to educate/promote/mentor because the bigger the project the higher the barrier to entry
  • Going where the developers are – be in the right IRC channels or on the right mailing lists
  • Documentation is key – if people can’t use your documentation no one will use your product – a great example of documentation is the VagrantDocs
  • Have first experiences be pleasant – website and personal experiences

How do we sustain the collaboration:

  • Text does not lend itself to working together well – sometimes opening up a hangout or Skype will save a project
    • Open communication – if you use something like a hangout to communicate then the log of that conversation is lost – it’s not transparent.
  • One of the things we haven’t done well as a community is to explain that open source is not free – we need to take into consideration the time it takes to support the project – and promote it – this includes peer review
  • Get companies using your product to help financially – if those companies can’t give hours it would be great if they helped with crowdfunding
  • IEEE is releasing a tool this summer to help with open source communities and collaboration
  • Don’t be an echo-chamber – don’t only hang out with people in the same field – keep it multidisciplinary to get the most out of it

How do we thank people who don’t participate in writing code:

  • Badges or some sort of equity system where people can show their worth
  • Self promoting – explain where the project would be without you/your contribution
  • If people are designing logos or something that isn’t code related get them set up on git anyway so they can play too – people want to see their name on the project and get credit for their contribution even if it’s not code
  • Put an acknowledgements page together to thank those who don’t write code
  • Thank people by sending them to conferences (if you have the funding) maybe give it an award name so people can put it on their resume to show what they achieved

The post SxSW: Building the Open Source Society appeared first on What I Learned Today....

Related posts:

  1. ATO2014: Building a premier storytelling platform on open source
  2. Keynote: Licensing Models and Building an Open Source Community
  3. ATO2014: How Raleigh Became an Open Source City

Ranti Junus: Favourite excerpt for the time being

Sat, 2015-03-14 17:30

[Mark] “…Codfish Islands was infested with feral cats. In other words, cats that have returned to the wild.”

[Doug] “I always think that’s an artificial distinction. I think all cats are wild cats. They just act tame if they think they’ll get a saucer of milk out of it…”

— Douglas Adams and Mark Carwardine, Last Chance to See

Ranti Junus: Digital Collections, Data Visualization, and Accessibility: What to Do? (repost)

Sat, 2015-03-14 17:15

[This is another crosspost from the Digital Scholarship Collaborative Sandbox blog from the MSU Libraries. The original blog post can be read there.]

In my earlier post “Digital Collections and Accessibility”, I touched upon the considerations we would need to address when building or creating digital collections (or other things that rely heavily on utilizing images such as data visualization) for public use. Here are the questions I put down in that post:

“Given the ubiquitous nature of digital collections, the goal that these collections would be used as part of scholarly activities, and the library’s mission to disseminate the information as widely as possible, there is one aspect that many of us need to address when we plan for a digitization project: how do people with disabilities access these collections without getting lost? Can they also get the same access and benefit from our collections if they only rely on their screen readers (or refreshable Braille, or any other assistive technology)? Can people move around our website easily using just a keyboard (for those with hand-coordination difficulty who cannot use a mouse)?”

So: planning. Planning is an important part of incorporating accessibility into building a collection. Typically, building a digital collection starts with designing the metadata (PDF) and then proceeds to further development activities such as database design, content creation, data entry, and coding/front end development. Whichever process we develop, we would like to see that the website is well designed and the information presented is useful for our audience. (I am assuming that most digital collections created and made available are designed for web access, with an added bonus if they also employ a responsive design.)

Image Display

If you visit digital collections developed by various institutions, you’ll see that they present their collections differently. Many display the collections like a catalog that shows an image, the physical description, and related information such as the owner, creator, and copyright statement at the minimum. Some also include an interpretation of the object (think of the label of an object or painting displayed in a museum).

Regardless of how the object is presented (by description or interpretation), the accessibility considerations are the same. The most common considerations: the web page needs to be properly structured by using proper headings; the flow of information presented on the page needs to make sense for screen reader users or keyboard-only users; search forms need to be properly labeled; images need to have alternative text (usually referred to as “alt-text”). This is when the planning for the page design and coding becomes important.

Consider this page:

and consider how the flow of information would be read by a screen reader and how a screen reader user might hear it:

Typical screen readers read the information displayed as if the CSS is disabled; they read web content in the order that it appears in the code.
(Bonus: if you have not seen or heard how screen reader users interact with a website, you can view the recording of an accessibility test of our e-resources page (.mp4) done by my blind student. We did this as part of our accessibility test routines for the library electronic resources.)

Both images above should be sufficient to give us ideas of how a sighted user might interact with the page and how a screen reader user might hear it. Our eyes can focus on and narrow down to a certain section faster, while screen reader users need to listen to the whole thing first before they can work on distinguishing the part that provides the actual information of the object being displayed. Hence, careful planning when designing the metadata and the page is needed to make sure our collection is both useful and usable for our audience regardless of how they access it.
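To get a rough feel for the order in which a page would be heard, here is a minimal Python sketch using the standard library's `html.parser`. It is an approximation under assumptions, not how any particular screen reader works: it simply walks the markup in source order, collects visible text and image alt-text, and lists images that have no `alt` attribute at all. The class and function names are made up for this illustration, and it does not skip script or style content.

```python
from html.parser import HTMLParser


class ReadingOrder(HTMLParser):
    """Collect text and alt-text in the order they appear in the markup."""

    def __init__(self):
        super().__init__()
        self.spoken = []        # what would be read, in source order
        self.missing_alt = []   # src of images with no alt attribute

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        if alt:
            self.spoken.append(f"image: {alt}")
        elif alt is None:
            # No alt attribute (or no value): flag it for review.
            self.missing_alt.append(attributes.get("src", "(no src)"))
        # alt="" marks a decorative image, so it is skipped silently.

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.spoken.append(text)


def reading_order(html):
    """Return (spoken items in source order, images missing alt-text)."""
    parser = ReadingOrder()
    parser.feed(html)
    parser.close()
    return parser.spoken, parser.missing_alt
```

Running `reading_order` on the HTML of an item page gives a quick list to read aloud during planning: if the object's title and description come after long navigation or boilerplate text, the code order probably needs rethinking.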

Data visualization

Many data visualizations rely on colored graphics to convey information. They are trickier to tackle because of the colors used and because, unlike most images used in digital collections, data visualization conveys very rich information.

Consider this example with three different color representations:

(Data visualization of world population of children age 0-14 years old. The information is grouped by regions (South Asia, East Asia, Africa, South America, Middle East, Europe and Russia, and North America.) Original data can be found at

By looking at the colors used on the image above, we can see that the information is grouped based on the region (South Asia, East Asia and Pacific, Africa, Europe, etc.) and the color density of each individual block reflects the population density of the area.

(Data visualization of world population ages 0-14 years old as seen by a person with red-green color blindness, such as protanopia.)

The second image shows how the visualization might be seen by those with red-green color blindness (protanopia), one of the most common types of color blindness. Here, East Asian and African regions are no longer distinguishable. Similarly, South American, Russian, and European regions are also no longer distinguishable.

(Data visualization of world population of children age 0-14 years old as seen by a person with total color blindness (achromatopsia).)

This last image shows how those colors don’t really convey the grouping of the regions to those with total color blindness (achromatopsia, which is a rare condition but still exists.)

The point of these examples: do not use color alone to convey meaning.

As far as I know, there is no practical solution yet for making data visualization fully accessible. Several options can help increase accessibility: supplement the color with text, or provide summaries or a text description right after the image (alt-text or image caption). If the description is too long to be listed on the same page, create a separate page and link to it. As with designing for a digital collection, designing for visualization also needs careful planning.
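One way to check a palette during planning is to ask whether two colors would still look different once hue is taken away. The Python sketch below uses the relative luminance and contrast ratio formulas from WCAG 2.x as a rough stand-in for how the colors might appear to someone with total color blindness. It is not a simulation of protanopia, and the cutoff of 3.0 and the function names are assumptions made for this example.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color, 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lum_a = relative_luminance(rgb_a)
    lum_b = relative_luminance(rgb_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def relies_on_hue(rgb_a, rgb_b, min_ratio=3.0):
    """True when two colors differ mostly in hue, not lightness.

    min_ratio is an assumed cutoff for this sketch, not a rule from the post.
    """
    return contrast_ratio(rgb_a, rgb_b) < min_ratio
```

For example, `relies_on_hue((255, 0, 0), (0, 128, 0))` flags a pure red and a medium green as a pair that depends on hue, which is a hint to add text labels or patterns to those regions.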


Designing for accessibility for our digital collection or data visualization should be done as part of the planning phase. This would allow us to optimize the output of our work and eliminate or reduce the need to revisit the design for corrections later on. Careful planning on how we want to display the information and to convey the meaning of the graphics/images would benefit all of our users regardless of how they access our collections.


Nicole Engard: SxSW: End to Brogramming: How Women are Shaping Tech

Fri, 2015-03-13 22:47

Leah Cheyrnikoff, the moderator, started off with this quote from Newsweek:

A combination of that very traditional Wall Street wolf-ism among Northern California’s venture capital boys’ club and the socially stunted boy-men that the money men like to finance has created a particularly toxic atmosphere for women in Silicon Valley.

On the panel were: Danika Laszuk, Nicole Sanchez, and Nellie Bowles, all successful women in the technology field.

Danika said that she started in the tech industry pretty much by accident. She did grow up in a house where her father was a tinkerer and so she was very interested in computers – but she studied playwriting in college. Nellie’s path was also by accident – she wanted to be a science writer and started writing about tech parties and somehow that got her into the tech blogging world. Now she finds herself (recently) in court blogging about the Ellen Pao and Kleiner Perkins case. Nicole didn’t imagine herself in the tech world even though she went to fancy schools and was in school with Mark Zuckerberg – it never occurred to her. She called this a tragedy – and I agree – all three women got into tech by accident … pretty much the same way I did.

Nicole said she likes tech – it helps her get things done faster. She said you just have to see someone who looks like you do it and then you believe you can do it too.

Leah asked the panel how real are the gender issues in the tech world. Nellie says that with much of her research she has found that these issues are well founded and sexism is very real. Danika says that in addition to these overt actions, there are also a lot of things in the culture that don’t fit with the lifestyles of most women. The tech world can foster working from noon to late in the night and this doesn’t work for most people who want a family (not just women). Nicole says that while this is all true and scary it’s not all negative – no one has ever made her feel really horrible – not everyone is out to get you (this is the experience I have personally so I love that she said this).

Nicole brought up the fact that there isn’t always intentional bias – the market looks for people who look like them – so finding investors, for example, as a black woman is extremely difficult because she is like a ‘unicorn’. Nellie talked about ‘pattern matching’ – investors look for people who fit a certain image. They look for a 22 year old white man who went to an amazing school (Stanford or Harvard). It’s a very specific image that investors look for and that image is not us (women).

Leah mentioned that men are raised differently than women – they are raised to be more confident and more aggressive. This makes it hard for women to break into the industry. Nicole mentioned that she wants to come across as confident but doesn’t want to come across as an ‘arrogant know it all’ – we worry about that. Unlike most women Nicole loves to pitch. Most women though seem to often apologize – and sometimes they like to talk about the team – what the team has done versus what they personally have done. Danika talked about how when a man is assertive in negotiating for a raise he is seen as a good employee – if a woman does the same thing she is seen as aggressive instead of assertive. You see this in the pay gap.

Danika talked about how diversity is great for business! Racial, gender, etc. This diversity breeds more innovation because everyone thinks differently. This is going to make companies stronger. She has been lucky that she works at a company that values diversity (Jawbone). This does mean that those doing the hiring have to work harder to find these people for the betterment of the company.

Another good point that was brought up is that men probably also want to spend time with their kids and coach teams – it’s just that we always pin this on women because it’s a cultural norm. It’s unfair – especially to me as a woman without children to be pigeon-holed in that way.

Nicole brought up a great point – a horrible – but true point. She was at a meetup for tech people at SxSW a couple of years ago and a black woman approached her and started talking to her about the tech world and Nicole’s first thought was ‘what do you know about tech’? Men aren’t the only ones with these prejudices – women have them too. So, we need more women talking to women about tech and getting them thinking about what they can do. In addition to the employers hiring from more diverse pools we also have to hold universities accountable for teaching more women what they can do.

Nellie says that finding a mentor is the most important thing. She had to force herself to be a mentee – she sort of latched on to people and forced them to teach her. This mentor doesn’t have to be a woman, but you might have to force the issue with men because of fears they might have about being inappropriate. Nicole mentioned the difference between a mentor and a sponsor. A mentor will help you get and keep a job, but a sponsor is in charge of pushing you to the next level. These two roles can be (but don’t have to be) the same person. Danika took a different approach – she created her own personal board of directors – instead of one mentor. She has someone on her board who is an amazing manager and someone who is a killer deal maker and someone who is a great marketer – she crafted her board around these different ‘super powers’. This group of mentors or advisers will help you with different topics and at different parts of your career. I like this model – it’s kind of what I have because I can name several mentors in several different areas of my career.

The post SxSW: End to Brogramming: How Women are Shaping Tech appeared first on What I Learned Today....

Related posts:

  1. ATO2014: Women in Open Source Panel
  2. ATO2014: Women in Open Source
  3. SxSW: Curious Bridges: How Designers Grow the Future

Nicole Engard: SxSW: Behind the Social at WGBH

Fri, 2015-03-13 21:11

WGBH is the number one producer of content for PBS. Great content is not a problem for the people on this panel – the problem they sometimes have is doing more than the bare minimum with social.

Today’s panel was made up of:

Molly Jacobs, works with American Experience on PBS and spends about 20% of her time on social networks. Hannah Auerbach from Antiques Roadshow spends up to 40% of her time on social networks. Olivia Wong works on Masterpiece on PBS and spends about 100% of her time on social media.

Social Media Best Practices:

  1. Know your audience
  2. Have a unique voice
  3. Plan ahead
  4. Prioritize your platforms
  5. Take advantage of partnerships
  6. Be visual
  7. Engage

For #1 Molly says ‘content only’ instead of ‘content first’ for the fans of her American history series. The goal is to be their audience – be total history nerds – not just them posting stuff for the audience but them posting for everyone. Instead of saying that the broadcast is going to happen she posts about the events that the broadcast covers to get interest ahead of time.

For #2 Hannah said they decided that their voice was one of a trade magazine – they post content from a lot of other related folks.

For Olivia the biggest challenge they have is that there is a lot of buzz out there ahead of the time because Downton airs in the UK before it airs here. For #2 the voice of Masterpiece is a knowledgeable, fun voice that knows a lot about the history of Masterpiece and entertainment.

Molly plans ahead (#3) by posting facts that will then later be shown on an episode of American Experience. Hannah does this by giving context to conversations that are going on from her show. For example Antiques Roadshow had nothing to say about whether a dress is gold or blue so they passed on joining in on that conversation. “Be nimble. Instead of news hijacking, figure out how you can add context to what’s happening now”

When talking about #4 Olivia has just recently started using tools like Vine and Instagram because it’s hard to figure out what you can offer on each platform that’s different because you have overlap in followers. You don’t want to repeat content on all outlets. You have to figure out how to diversify and keep it unique. Hannah is primarily focused on Facebook and Pinterest with content from Antiques Roadshow. Molly encourages us not to be afraid to fail – try out the tools – if you fail it’s no big deal. You can choose the one tool that you love and start there.

Side note – give Vine a try because it’s a great way to share little nuggets with your audience.

We live in a sharing economy. So no one is going to listen to you if you’re pointing at yourself all the time – you have to share others’ content. Olivia talked about how (for #5) they formed partnerships with Jane Austen fan bloggers to get them to live tweet during Downton Abbey. They even found fashion bloggers to live tweet about the fashion in the show. They aren’t paid, they’re just asked if they want to tweet while they watch and get promotion by doing so. “Always be on the lookout for high profile people who love your content. They are the best ambassadors”

Olivia brings in cast to do video interviews for #6 (visuals). Molly mentioned that everyone loves visuals – but keep in mind that different types of audiences like different types of content. Keep diversification in mind.

Olivia is always looking for ways to do things a little bit differently to engage folks (#7). What are the new things they can do that are familiar at the same time? Again – don’t be afraid to fail. Hannah finds that they have different audiences – their social media (with the exception of Facebook) audience is younger than their broadcast audience. It excites her to see that their social media is succeeding in exciting users. Facebook users though get really excited and participate in the online community – which shows that the social media policies are succeeding. Molly gets excited when a post on Facebook shows a reach higher than the number of followers they have on the page.

Each speaker gave us one final lesson they have from using social media on a shoestring budget.

Olivia started by telling us about her experience with the finale of Downton. Her team bought a poll app for Facebook for $200 (My Polls) and posted a poll every day up to the finale asking people what they thought was going to happen – they got over 30,000 viewers of the polls – but they saw an uptake of 70% participation rate on their page. At the end they did an infographic of the most popular answers and they took all the responses and posted them to the Masterpiece website.

Hannah has been emailing appraisers before their show letting them know when they were going to air. Recently she started including social media links and hashtags in the emails to get more interaction – this is a free way to get others tweeting/posting about the show.

The post SxSW: Behind the Social at WGBH appeared first on What I Learned Today....

Related posts:

  1. SxSW: New Social Networks Are Changing Entire Industries
  2. ATO2014: Social media for slackers
  3. Social Mention

CrossRef: New CrossRef Members

Fri, 2015-03-13 20:16

Updated March 9, 2015

Voting Members
American Mental Health Counselors Association
Audio Engineering Society
Auricle Technologies, Pvt., Ltd.
Austrian Geological Society (OGG)
Croatian Society of Art Historians
Editorial Board of Journal Radioelectronics, Nanosystems, Information Technology RENSIT
Entomological Society of Israel
Eurasian Academy of Sciences
Future Energy Service and Publishing
Harvard Education Publishing Group
Institute of Environmental Sciences and Technology (IEST)
International Academy Publishing (IAP)
Paul Mellon Centre for Studies in British Art
Pyatigorsk State Linguistic University
Techmind Research Society
The Finnish Society of Photogrammetry and Remote Sensing

Represented Members
Bumhan Philosophical Society
Centro Universitario de Maringa
Daegu Historical Association
Historical and Social Educational Ideas
Institute of Archaeology and Ethnography SB RAS
Journal of Security Strategies
Kazan Medical Journal
Modern Studies in English
Panorama of Brazilian Law
Raizes e Amidos Tropicais/Tropical Roots and Starches
Real Economy Publishing
Sociedade Brasileira de Dermatologia

Last updated March 2, 2015

Voting Members
Asian Scientific Publishers
Global Business Publications
Institute of Polish Language
Journal of Case Reports
Journal Sovremennye Tehnologii v Medicine
Penza Psychological Newsletter
Science and Education, Ltd.
The International Child Neurology Association (ICNA)
Universidad de Antioquia

Represented Members
Balkan Journal of Electrical & Computer Engineering (BAJECE)
EIA Energy in Agriculture
Faculdade de Enfermagem Nova Esperanca
Faculdade de Medicina de Sao Jose do Rio Preto - FAMERP
Gumushane University Journal of Science and Technology Institute
Innovative Medical Technologies Development Foundation
Laboratorio de Anatomia Comparada dos Vertebrados
Nucleo para o Desenvolvimento de Tecnologia e Ambientes Educacionais (NPT)
The Journal of International Social Research
The Korean Society for the Study of Moral Education
Turkish Online Journal of Distance Education
Uni-FACEF Centro Universitario de Franca
Yunus Arastirma Bulteni

Eric Hellman: 16 of the top 20 Research Journals Let Ad Networks Spy on Their Readers

Fri, 2015-03-13 13:55
A recent query to the "LibLicense" listserv asked:
Is there any kind of organization that has put together a website or list of database providers/publishers that indicate the extent to which they respect patron privacy?

The answer is "no", but I thought it would be useful to look at the top journal publishers to see if their websites are built with an orientation towards reader privacy.

I came up with a list of 20 top journals. I took the 10 journals with the most citations and the 10 journals with the most citations per published article, according to the SCImago journal rankings.

I used Ghostery to count the number of trackers present on the web page for an article in each journal. Each of these trackers gets a feed of each user's browsing behavior. I looked at the trackers to see if user browsing behavior was being sent to advertising networks. I also determined whether the journal supported secure connections. Based on these results, I assigned a letter grade for each journal.
Passing, Grade A

None of the scholarly journals I looked at earned excellent grades for reader privacy.

Passing, Grade B

Two journals, both published by the American Physical Society, earned good grades for reader privacy. They use a social sharing widget that respects privacy.

Reviews of Modern Physics. Ranked #2 in citations/article. 1 tracker (Google Analytics). No advertising networks. Supports HTTPS, but allows insecure connections.
Physical Review Letters. Ranked #9 in total citations, #393 in citations/article. 1 tracker (Google Analytics). No advertising networks. Supports HTTPS, but allows insecure connections.

Passing, Grade C

Two journals, both published by Annual Reviews, earned acceptable grades for reader privacy.

Annual Review of Immunology. Ranked #3 in citations/article. 1 tracker (Google Analytics). No advertising networks. Insecure connections only.
Annual Review of Biochemistry. Ranked #5 in citations/article. 1 tracker (Google Analytics). No advertising networks. Insecure connections only.

Failing, Grade D

Failing grades are earned by publishers that allow their readers to be tracked by advertising networks. These networks get access to the full browsing history of a user and track them with cookies; it's difficult for users to maintain anonymity when most of their web browsing is exposed to tracking.

Science, published by AAAS. Ranked #5 in total citations, #49 in citations/article. 10 trackers. Multiple advertising networks. Science gets a D rather than an F because it supports HTTPS, although it allows insecure connections.

Failing, Grade F

15 journals earned failing grades because their participation in advertising networks exposes their readers to tracking and spying. Some of the publishers are more flagrant about this than others. Maybe I should have given F+ to some and F- to others. All of these journals force insecure connections.

PLoS One, published by the Public Library of Science. #1 in total citations, #1776 in citations/article. 3 trackers. One advertising network.
Proceedings of the National Academy of Sciences of the United States, published by the National Academy of Sciences. #2 in total citations, #155 in citations/article. 3 trackers. One advertising network.
Journal of Biological Chemistry, published by the American Society for Biochemistry and Molecular Biology. #8 in total citations, #513 in citations/article. 3 trackers. One advertising network.
Quarterly Journal of Economics, published by Oxford Journals. #6 in citations/article. 4 trackers. One advertising network.
Chemical Communications, published by the Royal Society of Chemistry. #10 in total citations, #680 in citations/article. 6 trackers. Multiple advertising networks.
Journal of the American Chemical Society, published by the American Chemical Society. #4 in total citations, #185 in citations/article. 7 trackers. Multiple advertising networks.
Chemical Reviews, published by the American Chemical Society. #10 in citations/article. 8 trackers. Multiple advertising networks.
CA: A Cancer Journal for Clinicians, published by Wiley. #1 in citations/article. 9 trackers. Multiple advertising networks.
, published by Elsevier. #4 in citations/article. 9 trackers. Multiple advertising networks.
Angewandte Chemie - International Edition, published by Wiley. #6 in total citations, #202 in citations/article. 11 trackers. Multiple advertising networks.
Nature Genetics, published by Nature Publishing Group. #7 in citations/article. 11 trackers. Multiple advertising networks.
, published by Nature Publishing Group. #3 in total citations, #11 in citations/article. 11 trackers. One advertising network.
Nature Reviews Genetics, published by Nature Publishing Group. #8 in citations/article. 12 trackers. Multiple advertising networks.
Nature Reviews Molecular Cell Biology, published by Nature Publishing Group. #9 in citations/article. 13 trackers. Multiple advertising networks.
New England Journal of Medicine, published by the Massachusetts Medical Society. #7 in total citations, #41 in citations/article. 14 trackers. Multiple advertising networks.
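
The letter grades in this post can be read as a small decision rule. The Python sketch below is my reconstruction from the results listed here, not a rubric stated anywhere in the post: the function name and parameters are hypothetical, and grade A is left out because the post does not say what an excellent grade would require.

```python
def privacy_grade(ad_networks, supports_https):
    """Assign a reader-privacy grade from two observations about a journal site.

    ad_networks: number of advertising networks among the trackers found.
    supports_https: True if the site offers secure connections at all.
    Grade A is not modeled; its criteria are not given in the post.
    """
    if ad_networks > 0:
        # Any advertising network is a failing grade; HTTPS softens F to D.
        return "D" if supports_https else "F"
    # No advertising networks: passing, with HTTPS separating B from C.
    return "B" if supports_https else "C"
```

With this rule, a site with only Google Analytics and HTTPS support lands on B, the same site without HTTPS lands on C, and any site feeding an advertising network fails.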
Remarks

I'm particularly concerned about the medical journals that participate in advertising networks. Imagine that someone is researching clinical trials for a deadly disease. A smart insurance company could target such users with ads that mark them for higher premiums. A pharmaceutical company could use advertising targeting researchers at competing companies to find clues about their research directions. Most journal users (and probably most journal publishers) don't realize how easily online ads can be used to gain intelligence as well as to sell products.
In defense of the publishers, it should be noted that the web advertising business has developed very rapidly over the past few years due to intense competition. A few years ago, the attacks on user privacy enabled by the ad networks' massive data collection were mostly theoretical. But competition has led the networks to increase their targeting ability and scoop up more and more "demographic" data. What was theory a few years ago is today's reality. We still have time to prevent tomorrow's privacy disaster, but change will only happen if the institutions that purchase and fund these journals learn what's really going on and start to demand the privacy that readers deserve.

HangingTogether: Complete* List of Terry Pratchett’s Discworld Novels

Fri, 2015-03-13 13:29

In honor of Terry Pratchett, I want to share with everyone one of my favorite places in all the worlds – Terry Pratchett’s Discworld.  If you know it, you love it.  If you don’t know it, I highly encourage you to explore it.  There are over 40 books in the series, and I’ve read them all – many more than once. Well, to be honest, many more than a dozen times. It is a world populated by many creatures including, but not limited to: humans, sentient luggage, dwarves, trolls, witches, wizards, vampires, werewolves, heroes, gods, and one Nobby Nobbs.  These books never fail to inspire me, and I want to share them with you.

Fortunately, lots and lots of libraries around the world hold these books.  To help you find them, I’ve compiled the complete* list of all the Discworld books with links to WorldCat (so that you can find them near you).  Enjoy! And be warned – reading one book usually leads to reading 3 or more.  (It’s the original binge watching. I know, I’ve been binge reading Pratchett since the 1990s.)

Complete* List of Terry Pratchett’s Discworld Novels (In order of publication date)

Not Discworld – but I love it; Good Omens with Neil Gaiman.

There are more Discworld books that are not novels; mapps, cookbooks, portfolios, and handbooks.  You can find these in your library too, WorldCat can help:

And now, if you’ll excuse me I have appointments with Rincewind, Commander Vimes, Tiffany Aching, Granny Weatherwax, Moist von Lipwig, and the Librarian**. I’ll send a clacks to let you know when I’ll be back.

Terry Pratchett, I will never forget you and thank you for sharing the Discworld with our world.


* The list is as complete as I could make it. But I’m only human, and the series is impressive, if not magical unto itself and does mysterious things. (I don’t have proof, but I think the books change a bit on every 3rd reading. Text can be slippery that way.) Whatever I’ve missed, please post it in the comments with a link to WorldCat if you can.

** the Librarian is a side character in many stories, but he’s a personal favorite.  But don’t call him a monkey, unless you want your arm ripped off.

^ these two are, strictly speaking, picture books and not novels. But I put them on the list anyway.

*^Also, if you’ve never read Pratchett before, I recommend you just pick one and read.  The Discworld series is actually made of several series and they do have a reading order. A quick internet search for “Discworld reading order” can lead you to some guides. You don’t really need it though. Read what you think looks interesting and just dive in.  I don’t think the Discworld would approve of that much order imposed upon it anyway.

About JD Shipengrover

JD Shipengrover. OCLC Research. Information Architect. My primary focus is to bring user-centered interface design and usability principles to the web applications created by OCLC Research. I have been with OCLC for over 7 years and have been working as a Web Creative for 15+ years.


Library of Congress: The Signal: Creating Workflows for Born-Digital Collections: An NDSR Project Update

Fri, 2015-03-13 13:28

The following is a guest post by Julia Kim, National Digital Stewardship Resident at New York University Libraries.

Julia Kim analyzing Jeremy Blake’s digital artwork. Photo by Elena Olivo.

I’m now into the last leg of my nine-month residency, and I’m amazed by what has been accomplished and the major steps still ahead of me. In this post, I’ll give a project update on my primary task: to create, test and implement access-driven workflows of born-digital collections at New York University Libraries.

My residency is very broad; I am tasked with investigating and implementing workflows that encompass the entirety of the born-digital process, from accession to access (project overview). This means that while I spent a month learning digital forensics techniques, I have also researched and implemented workflow steps that occur before acquisition and after ingest. Rather than signing off when the bits have been checked, duplicated and dispersed in multiple locations to long-term storage, I’ve also focused on access. In the past five months, I’ve worked on many collections. Such depth and breadth has been crucial. Time and again, I’ve been challenged to revise and refine my sense of the workflow.

The ingestion of incoming born-digital material is time consuming. In many cases, I only create a bit-exact disk image or copy of the content for ingest, with minimal metadata from my end. NYU’s three archives (and now Abu Dhabi) collect actively. Imaging or copying files, validating, bagging and ingesting such increasingly large collections ties up our dedicated imaging station and localized storage. This past week, for example, I finished ingesting into the repository a collection that arrived on 2 TB, 5 TB and 3 TB hard drives. It took the full weekend to create the initial image of the 2 TB hard drive and validate it with checksums, and approximately the same amount of time to ingest it into the repository. The Digital Forensics Lab, however, contains a number of other computers at my disposal in addition to the imaging desktop, so other work can continue while the imaging station is busy. These machines are also extremely helpful with collections that rely on other operating systems.
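To make the "validate with checksums" step concrete, here is a minimal sketch in Python. It is an illustration only, not the tooling used at NYU: the function names `sha256_of` and `copies_match`, the choice of SHA-256, and the 1 MiB chunk size are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks.

    Hypothetical helper for illustration; the algorithm and chunk size
    are assumptions, not details taken from the post.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so a multi-terabyte image never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source: Path, copy: Path) -> bool:
    """Return True when a source file and its copy have identical digests."""
    return sha256_of(source) == sha256_of(copy)
```

Hashing has to read every byte of the image at least once, which is part of why validating a 2 TB drive can take about as long as imaging it.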

NYU’s Digital Forensics Laboratory.

Over the course of my residency I’ve also worked with the digital counterparts of previously published hybrid collections including Exit Art Archive (2 TB organizational RAID) and the Robert Fitch Papers (several floppy disks with easily renderable text files and no researcher restrictions). The collection I’ve spent the most time with is the Jeremy Blake Papers which were acquired in 2007. These “papers” include files copied on-site at the donor’s house from Blake’s MacBook Pro, an external hard drive and a flash drive. NYU also acquired several hundred optical disks, three additional hard drives, dozens of zip disks and digital linear tapes. The Blake Papers present many of the challenges that hinder access: sheer data size and variety of media format types, a prevalence of incompletely documented or misunderstood proprietary file formats, and complicated rights and privacy restrictions.

Jeremy Blake’s PSD files, accessed with a Power PC.

The bulk of the Blake Papers is composed of Photoshop files (PSD) that span the late 1990s to 2007. To create his work, Blake would collage different sources in Photoshop. These sources would be layered and further processed to create the dense and dreamlike imagery characteristic of his final moving image work. Blake would share these layered PSD files with close collaborators who animated his still images and composed the soundtracks under his close supervision.

PSD file format normalization was not a viable preservation solution. Normalization would render a file with fifty layers, turned on and off in different ways, into a single flat image. Any normalization process would lose Blake’s working process, the area in which we thought his archive could be most valuable to future researchers. We cannot simply migrate the files to TIFF 6.0. Paradoxically, any TIFF that did encompass layers would no longer be a true TIFF.

While Photoshop has retained robust backward and forward compatibility with its files and software, Blake’s working methods are very much a product of the intersection of developing technologies and art-making practices of his time. His methods were cutting edge at the time, but they seem unimaginably labor-intensive today. For these reasons, his works will be migrated through Photoshop software to the current version of Photoshop, but they will also be migrated and made accessible through emulations of the approximate software versions and operating systems used. Some of my recent focus has been on creating these emulations.

Emulated Access of Blake’s artwork.

Next month, I will design and lead a usability test of representative portions of the Jeremy Blake Papers and the Exit Art Collection with a small, representative group of NYU’s Fales Library & Special Collections researchers. This will serve as a pilot test for making emulations of complex media accessible. It will also be an opportunity to test my documentation as I explain these concepts and strategies to researchers unused to the idea of archival research done with only a (non-networked) laptop.

A secondary purpose will be to note which qualities of the material interest researchers. This may seem an odd thing to ask, but given the still enormous effort needed to stabilize and make accessible this type of work, it is worth knowing what researchers actually want from it. Their subjects of research and even their definition of “content” may differ. A digital humanist may be more interested in the timestamps across a large digital collection than in any of the text and image “content” in the files themselves. Some researchers may be well versed in Photoshop’s changes, while some may only be interested in the finalized moving images. Through these pilot studies, I hope to answer some of these questions while creating a template for other archivists interested in replicating and adding to the data gathered from this study.

In addition to this technical work, I’m also coordinating a born-digital workflows CURATEcamp (April 23), which will be hosted at the beautiful landmark Brooklyn Historical Society in Brooklyn Heights. This un-conference will bring together digital archivists, stewards, repository managers, and staff involved in managing born-digital collections for discussions, presentations and demonstrations. In addition to two streams of small groups that will tackle issues like the Forensic Toolkit’s integration into workflows, we will also have a larger stream of demonstrations and workshops to highlight developments with BitCurator Access, for example.

In addition to CURATEcamp, I will be sharing updates on my work at the American Institute for Conservation conference (May 2015), as well as at the Society of American Archivists (August 2015). It’s been especially gratifying to be able to learn from different intersecting worlds and competencies, whether moving images, digital curation, fine art or archiving.

The activities and tasks mentioned in this post should keep me busy for the next two months. As someone who loves investigation and research with tangible “hands-on” components and outputs, I have found this a great experience. I’d like to note that without the administrative and technical support of my mentors, Don Mennerich and Lisa Darms, this work would not have been possible at all. I have been able to explore very interesting questions with not only exceptional collections, but exceptional mentors.