Planet Code4Lib - http://planet.code4lib.org

DuraSpace News: Update 5: Beta Pilot Projects Set to Kick-Off

Tue, 2014-09-09 00:00
From David Wilcox, Fedora Product Manager, Winchester, MA. This is the fifth in a series of updates on the status of Fedora 4.0 as we move from the Beta [1] to the Production Release. The updates are structured around the goals and activities outlined in the July-December 2014 Planning document [2], and will serve both to demonstrate progress and to call for action as needed. New information since the last status update is highlighted in bold text.

Library of Congress: The Signal: Hybrid Born-Digital and Analog Special Collecting: Megan Halsband on the SPX Comics Collection

Mon, 2014-09-08 17:29

Megan Halsband, Reference Librarian with the Library of Congress Serial and Government Publications Division.

Every year, the Small Press Expo in Bethesda, Md., brings together a community of alternative comic creators and independent publishers. Given the Library’s significant history of collecting comics, it made sense for the Library of Congress’ Serial and Government Publications Division and the Prints & Photographs Division to partner with SPX to build a collection documenting alternative comics and comics culture. Over the last three years, this collection has been developing and growing.

While the collection itself is quite fun (what’s not to like about comics?), it is also a compelling example of the way that web archiving can complement and fit into the work of developing a special collection. To that end, I am excited to talk with Megan Halsband, Reference Librarian with the Library of Congress Serial and Government Publications Division and one of the key staff working on this collection, as part of our Content Matters interview series.

Trevor: First off, when people think Library of Congress I doubt “comics” is one of the first things that comes to mind. Could you tell us a bit about the history of the Library’s comics collection, the extent of the collections and what parts of the Library of Congress are involved in working with comics?

Megan: I think you’re right – the comics collection is not necessarily one of the things that people associate with the Library of Congress – but hopefully we’re working on changing that! The Library’s primary comics collections are two-fold – first there are the published comics held by the Serial & Government Publications Division, which appeared in newspapers/periodicals and later in comic books, as well as the original art, which is held by the Prints & Photographs Division.

Example of one of the many comics available through The Library of Congress National Digital Newspaper Program. The End of a Perfect Day. Mohave County miner and our mineral wealth (Kingman, Ariz.) October 14, 1921, p.2.

The Comic Book Collection here in Serials is probably the largest publicly available collection in the country, with over 7,000 titles and more than 125,000 issues. People wonder why our section at the Library is responsible for the comic books – and it’s because most comic books are published serially. Housing the comic collection in Serials also makes sense, as we are also responsible for the newspaper collections (which include comics). The majority of our comic books come through the US Copyright Office via copyright deposit, and we’ve been receiving comic books this way since the 1930s/1940s.

The Library tries to have complete sets of all the issues of major comic titles but we don’t necessarily have every issue of every comic ever published (I know what you’re thinking and no, we don’t have an original Action Comics No. 1 – maybe someday someone will donate it to us!). The other main section of the Library that works with comic materials is Prints & Photographs – though Rare Book & Special Collections and the area studies reading rooms probably also have materials that would be considered ‘comics.’

Trevor: How did the idea for the SPX collection come about? What was important about going out to this event as a place to build out part of the collection? Further, in scoping the project, what about it suggested that it would also be useful/necessary to use web archiving to complement the collection?

Megan: The executive director of SPX, Warren Bernard, has been working in the Prints & Photographs Division as a volunteer for a long time, and the collection was established in 2011 after a Memorandum of Understanding was signed between the Library and SPX. I think Warren really was a major driving force behind this agreement, but the curators in both Serials and Prints & Photographs realized that our collections didn’t include materials from this particular community of creators and publishers in the way that they should.

Small Press Expo floor in 2013

Given that SPX is a local event with an international reputation and awards program (SPX awards the Ignatz), and the fact that we know staff at SPX, I think it made sense for the Library to have an ‘official’ agreement that serves as an acquisition tool for material that we probably wouldn’t otherwise obtain. Actually going to SPX every year gives us the opportunity to meet with the artists, see what they’re working on and pick up material that is often only available at the show – in particular mini-comics or other free things.

Something important to note is that the SPX Collection – the published works, the original art, everything – is all donated to the Library. This is huge for us – we wouldn’t be able to collect the depth and breadth of material (or possibly any material at all) from SPX otherwise.  As far as including online content for the collection, the Library’s Comics and Cartoons Collection Policy Statement (PDF) specifically states that the Library will collect online/webcomics, as well as award-winning comics. The SPX Collection, with its web archiving component,  specifically supports both of these goals.

Trevor:  What kinds of sites were selected for the web archive portion of the collection? In this case, I would be interested in hearing a bit about the criteria in general and also about some specific examples. What is it about these sites that is significant? What kinds of documentation might we lose if we didn’t have these materials in the collection?

Archived web page from the American Elf web comic.

Megan: Initially the SPX web archive (as I refer to it – though its official name is the Small Press Expo and Comic Art Collection) was extremely selective – only the SPX website itself and the annual winner of the Ignatz Award for Outstanding Online Comic were captured. The staff wanted to see how hard it would be to capture websites with lots of image files (of various types). Turns out it works just fine (if there’s no paywall or subscriber login required) – so we expanded the collection to include all the Ignatz nominees in the Outstanding Online Comic category as well.

Some of these sites, such as Perry Bible Fellowship and American Elf, are long-running online comics whose creators have been awarded Eisner, Harvey and Ignatz awards. There’s a great deal of content on these websites that isn’t published or available elsewhere – and I think that this is one of the major reasons for collecting this type of material. Sometimes the website might have initial drafts or ideas that later are published, sometimes the online content is not directly related to published materials, but for in-depth research on an artist or publication, often this type of related content is extremely useful.

Trevor: You have been working with SPX to build this collection for a few years now. Could you give us an overview of what the collection consists of at this point? Further, I would be curious to know a bit about how the idea of the collection is playing out in practice. Are you getting the kinds of materials you expected? Are there any valuable lessons learned along the way that you could share? If anyone wants access to the collection how would they go about that?

Megan: At this moment in time, the SPX Collection materials that are here in Serials include acquisitions from 2011-2013, plus two special collections that were donated to us, the Dean Haspiel Mini-Comics Collection and the Heidi MacDonald Mini-Comics Collection.  I would say that the collection has close to 2,000 items (we don’t have an exact count since we’re still cataloging everything) as well as twelve websites in the web archive. We have a wonderful volunteer who has been working on cataloging items from the collection, and so far there are over 550 records available in the Library’s online catalog.

Mini comics from the SPX collection

Personally, I didn’t have any real expectations of what kinds of materials we would be getting – I think that definitely we are getting a good selection of mini-comics, but it seems like there are more graphic novels than I anticipated. One of the fun things about this collection is the new and exciting things that you end up finding at the show – like an unexpected tiny comic that comes with its own magnifying glass or an oversize newsprint series.

The process of collecting has definitely gotten easier over the years. For example, the Head of the Newspaper Section, Georgia Higley, and I just received the items that were submitted in consideration for the 2014 Ignatz Awards. We’ll be able to prep permission forms/paperwork for the materials we’re keeping in advance of the show, which will help us cut down on potential duplication. This is definitely a valuable lesson learned! We’ve also come up with a strategy for visiting the tables at the show – there are 287 tables this year – so we divide up the ballroom among the four of us (Georgia and I, as well as two curators from Prints & Photographs – Sara Duke and Martha Kennedy) to make it manageable.

We also try to identify items that we know we want to ask for in advance of the show – such as ongoing serial titles or debut items listed on the SPX website – to maximize our time when we’re actually there. Someone wanting to access the collection would come to the Newspaper & Current Periodical Reading Room to request the comic books and mini-comics. Any original art or posters from the show would be served in the Prints & Photographs Reading Room. As I mentioned – there is still a portion of this collection that is unprocessed – and may not be immediately accessible.

Trevor: Stepping back from the specifics of the collection, what about this do you think stands for a general example of how web archiving can complement the development of special collections?

Megan: One of the true strengths of the Library of Congress is that our collections often include not only the published version, but also the ephemeral material related to the published item/creator, all in one place. From my point of view, collecting webcomics gives the Library the opportunity to collect some of this ‘ephemera’ related to comics collections and only serves to enhance what we are preserving for future research. And as I mentioned earlier, some of the content on the websites provides context, as well as material for comparison, to the physical collection materials that we have, which is ideal from a research perspective.

Trevor: Is there anything else with web archiving and comics on the horizon for your team? Given that web comics are such a significant part of digital culture, I’m curious to know if this is something you are exploring. If so, is there anything you can tell us about that?

Megan: We recently began another web archive collection to collect additional webcomics beyond those nominated for Ignatz Awards – think Dinosaur Comics and XKCD. It’s very new (and obviously not available for research use yet) – but I am really excited about adding materials to this collection. There are a lot of webcomics out there – and I’m glad that the Library will now be able to say we have a selection of this type of content in our collection! I’m also thinking about proposing another archive to capture comics literature and criticism on the web – stay tuned!

HangingTogether: Innovative solutions for dealing with born-digital content in obsolete forms – Part 1

Mon, 2014-09-08 17:00

[Tweet] AB Schmuland: Obsolete media brings them in at 8 am EDT on a Saturday! #saa14 #s601 http://t.co/9BaDz0IhOs

I chaired a lightning talk session at SAA 2014 in Washington DC on August 16. The premise was that many archives have received materials in forms that they cannot even read. Archives are acquiring born-digital content at increasing rates and it’s hard enough to keep up with current formats. It makes sense to reach out to the community for help with more obscure media. I found ten speakers who had confronted this problem and figured out innovative solutions to getting material into a form that could be more easily managed.

[Tweet] Jennifer Schaffner: “my name is ___ and I have born-digital on crazy old media that I can barely identify that I have no idea what to do with” #saa14 #s601

The speakers’ stories were so encouraging to others in similar situations that I wanted to share them further.

This is the first of three posts. We start with a talk about the array of media an archives might confront, followed by a talk about an effort to test how much can be done in house.

Lynda Schmitz Fuhrig, the Electronic Records Archivist at the Smithsonian Institution Archives, urged archivists to ingest materials off removable media as soon as possible — if possible. She itemized some of the more typical physical media the SI Archives has and the workstations they maintain to access them. Then she told of some successes they’d had getting content off less typical forms, like Digital Audio Tapes, data tapes, interactive compact discs and digital videocassettes.

[Tweet] Kevin Schlottmann: National air and space website from 1994 recovered from tape in 2012 #s601 #saa14

Finally she cautioned about some of the media we may be overly confident about: CDs and DVDs – not just that drives to read them are no longer standard issue, but that their life spans can vary dramatically.

She suggested looking to schools, eBay, craigslist, and listservs to obtain out of date equipment and considering whether another archives could help with your format. For formats that simply cannot be read, she raised the possibility of waiting until a researcher wants it and seeing if the researcher is willing to pay to have a vendor transfer the data.

Moryma Aydelott, Special Assistant to the Director of Preservation at the Library of Congress, described developing cross-division in-house workflows for processing 3 ½” and 5 ¼” floppy disks.

The goal was to get a backup copy of the items stored on long term storage, while encouraging standard practices and increasing staff digital competencies. She described the software used (xcopy and FTK Imager) to get complete and unchanged copies of the content. Tabs that make the floppies read-only were used to prevent disks being accidentally overwritten during copying. After reading data off the disks, the workflow included steps to create checksums and other files using the BagIt specification, and for items to be inventoried as they’re saved to tape-based long term storage. The workflows were documented, staff was trained, and processes were customized to particular situations.
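For teams putting together a similar workflow, the checksum and manifest steps can be scripted. The sketch below is illustrative only (it is not the Library of Congress’s code, and the directory name is hypothetical): it walks a folder of files copied off a disk and writes a BagIt-style SHA-256 manifest. The bagit-python library maintained by the Library of Congress can produce a complete bag, including such manifests, in a single call.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(data_dir: str, manifest_name: str = "manifest-sha256.txt") -> None:
    """Write a BagIt-style manifest (checksum, then relative path) for every file in data_dir."""
    root = Path(data_dir)
    lines = [
        f"{sha256_of(p)}  {p.relative_to(root)}"
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    (root / manifest_name).write_text("\n".join(lines) + "\n")

# Hypothetical directory holding files copied off a single floppy disk.
write_manifest("floppy_0042")
```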

[Tweet] Sasha Griffin: Balance outsourcing with developing staff competences in-house #s601 #saa14

Curatorial divisions had been contemplating transferring data off of these media but were unsure how to start, and this project gave them some help and confidence to get going. Now the Preservation Reformatting Division is assembling a lab with scanners, portable drives, and a FRED machine. It will be available to staff in all LC curatorial divisions and those staff are helping to determine other hardware and software the lab should include. A committee has formed to develop scalable ways of processing materials that can’t be processed in house.

Next up: Part 2 will continue with four speakers talking about solutions to particularly challenging formats.

About Ricky Erway

Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born digital archives to research data curation.


OCLC Dev Network: Enhancements Planned for September 14

Mon, 2014-09-08 14:00

In addition to the upcoming VIAF changes we shared last week (currently planned for September 16), a separate  release on September 14 will bring enhancements to a couple of our WorldShare Web services.

Hydra Project: Hydra Connect #2 is a sell-out!

Mon, 2014-09-08 09:39

We’re more than pleased to tell you that Hydra’s second Connect meeting, to be held in Cleveland 30 September – 3 October, is a sell-out!  Not only have we sold all the tickets, we have a waiting list of people hoping we might manage to find a little more space.  We’re looking forward to seeing 160 faces, friends old and new, at Case Western Reserve University in three weeks.

HangingTogether: Linked Data Survey results 6 – Advice from the implementers

Mon, 2014-09-08 08:00

OCLC Research conducted an international linked data survey for implementers between 7 July and 15 August 2014. This is the sixth--and last--post in the series reporting the results.  

An objective in conducting this survey was to learn from the experiences of those who had implemented or were implementing linked data projects/services.  We appreciate that so many gave advice. About a third of those who have implemented or are implementing a linked data project are planning to implement another within the next two years; another third are not sure.

Asked what they would do differently if they were starting their project again, respondents answered with issues clustered around organizational support and staffing, vocabularies, and technology. One noted that legal issues seriously delayed the release of their linked data service and that legal aspects need to be addressed early.

Organizational support and staffing:

  • Have a clear mandate for the project. Our issues have stemmed from our organization, not the technology or concept.
  • It would have been useful to have greater in-house technical input.
  • With hindsight we have more realistic expectations. If funding allowed, I would hire a programmer for the project.
  • Attempt to garner wider organisational support and resources before embarking on what was in essence a very personal project.
  • We also would have preferred to have done this as an official project, with staff resources allocated, rather than as an ad-hoc project that we’ve crammed into our already full schedules.
  • Have a dedicated technical project manager – or at least a bigger chunk of time.
  • Have more time planned and allocated for both myself and team members.

Vocabularies

  • Build an ontology and formal data model from the ground up.
  • Align concepts we are publishing with other authorities, most of which didn’t exist at the time.
  • Vocabulary selection, avoid some of the churn related to that process.
  • Make more accurate and detailed records so that it is easier for people using the data to clear up ambiguity of similar names.
  • I might seek a larger number of partners to contribute their controlled vocabularies or thesauri in advance.

Technology

  • We would immediately use Open Refine to extract and clean up data after the first export from Access
  • We would provide a SPARQL endpoint for the data if we had the opportunity.
  • We would give more thought to service resilience from the perspective of potential denial of service attacks.
  • Define the schema well first, before generating the records. Use the schema to validate all of the records before storing them in the system’s database.
  • It is still a pity that the Linked Data Pilot is not more integrated into the production system. It would have been easier if the LOD principles had been included in this production system from the beginning.
  • We might have done more to help our vendor understand the complexity of the LCNAF data service as well as the complexity of the MARC authority format.
  • Better user experience; we chose to focus on data mining vs data use.
  • Transform the source data into semantic form before attempting processing (clustering, clean-up, matching).
  • A stable infrastructure is vital for the scalability of the project.

General advice

Much of the advice for both those considering projects to consume linked data and those considering projects to publish linked data clusters around preparation and project management:

  • Ask what benefit doing linked data at all will really have.
  • There is more literature and online information relating to consuming linked data than there was when we started so our advice would be to read as widely as possible and consult with experts in the community.
  • Get a semantic web expert in the team
  • The same as any other project: have a detailed programme.
  • Have a focus. Do your research. Overestimate time spent.
  • Take a Linked Data class
  • Estimate the time required for the project and then double it.  The time to explain details of MARC, EAD, and local practices and standards to the vendor, to test functionality of the application, and to test navigational design elements of the application all require dedicated blocks of time.
  • Bone up on your tech skills.  It’s not magic; there is no wand you can wave.
  • Basic project management, basic data management, basic project planning are really important at the onset.
  • Having a detailed program before starting. Get institutional commitment.  Unless the plan is to do the smallest thing… the investment is great enough to warrant some kind of administrative blessing, at the minimum.
  • Take advantage of the many great (and free) resources for learning about RDF and linked data.
  • Start with a small project and then apply the knowledge gained and the tools built to larger scale projects.
  • Find people at other institutions who are doing linked data so you can bounce ideas off of each other.
  • Plan, plan, plan! Do research. Understand all that there is going on and what challenges you will have before you reach them.
  • Automate, automate, automate

Advice for those considering a project to consume linked data

  • Linking to external datasets is very important but also very difficult.
  • Find authorities for your specific domain from the outset, and if they don’t exist don’t be afraid to define and publish your own concepts.
  • Firm understanding of ontologies
  • Use CIDOC CRM / FRBRoo for cultural heritage sources. It will be far more cost effective and provide the highest quality of data that can be integrated while preserving the variability and language of your data.
  • Pick a problem you can solve. Start with schema.org as core vocabulary. Lean toward JSON-LD instead of RDF/XML. Like agile, fail quick and often. Store the index in a triplestore.
  • Decide what granularity of data you want to make available as linked data – no semantics for now. We cannot transform our data to linked data as a one-to-one relationship – there will be data that is not available as linked data. If you want to make your data discoverable, then schema.org semantics will work best.
  • Sometimes the data available just won’t work with your project. Keep in mind that something may look like a match at first but the devil is in the details. 
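To make the consuming-side advice above a little more concrete, here is a minimal sketch using Python’s SPARQLWrapper library; the endpoint URL, the schema.org property and the query are placeholders invented for the example, not a service named by any respondent.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint -- substitute the SPARQL endpoint of the dataset you want to consume.
endpoint = SPARQLWrapper("https://example.org/sparql")
endpoint.setQuery("""
    SELECT ?resource ?label
    WHERE { ?resource <http://schema.org/name> ?label . }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["resource"]["value"], row["label"]["value"])
```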

Advice for those considering a project to publish linked data

General advice: “Try to consume it first!”

Project management

  • It’s possible to participate in linked data projects even by producing data and leaving the work of linking to others.
  • Managing expectations of content creators is tough – people often have expectations of linked data that aren’t possible. The promise of being able to share and link things up can efface the work required to prepare materials for publication.
  • Always look at what others have done before you. Build a good relationship with the researcher with whom you are working; leverage the knowledge and experience of that person or persons. Carefully plan your project ahead of time, in particular the metadata.
  • Look at the larger surrounding issues.  It is not enough to just dump your data out there.  Be prepared to perform some sort of analytics to capture information as to uses of the data.  Also include a mechanism for feedback about the data and requested improvements/enhancements.  The social contract of linked data is just as important as the technical aspects of transforming and publishing the data.
  • Just do it, but consider if you’re just adding more bad data to the web — dumping a set of library records to RDF is pointless. Consider the value of publishing data. Reusing data is probably more interesting.
  • The assumption that the data needs to be there in order to be used is, I think, wrong. The usefulness of data is in its use; create a service one uses oneself and it is valuable and useful. Whether others actually use it is irrelevant.
  • Pay attention to reuse existing ontologies in order to improve interoperability and user comprehension of your published data. 

Technical advice

  • Publish the highest quality possible that will also achieve semantic and contextual harmonisation. You will end up doing it again otherwise, and therefore it is far more cost effective and gets the best results.
  • Don’t use fixed field/ value data models. For cultural heritage data use CIDOC CRM / FRBRoo.
  • Offer a SPARQL endpoint to your data.
  • Use JSON-LD.
  • Museums need to take a good look at their data and make sure that they create granular data, i.e. each concept (actors, keywords, terms, objects, events, …) needs to have unique ids, which in turn will be referenced in URIs. Also, publishing linked data means embracing a graph data structure, which is a total departure from traditional relational data structure: linked data forces you to make explicit what is only implicit in the database.  Modeling data for events is challenging but rewarding. Define what data entities your museum is responsible for… Being able to define URIs for entities means being able to give them unique identifiers, and there are many data issues that need to be taken care of within an institution.  Also, very important is that producing LOD requires the data manager to think differently about data, and not about information.  LOD requires that you make explicit knowledge that is only implicit in a traditional relational database.
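As a small illustration of the “granular data with unique ids” and JSON-LD/schema.org points above, here is a minimal sketch of one entity described with schema.org terms; the URIs and values are invented for the example.

```python
import json

# One entity, one stable URI, described with schema.org terms and serialized as JSON-LD.
record = {
    "@context": "http://schema.org/",
    "@id": "http://example.org/id/book/123",        # a unique identifier the institution controls
    "@type": "Book",
    "name": "An Example Title",
    "author": {
        "@id": "http://example.org/id/person/456",  # the author is its own entity with its own URI
        "@type": "Person",
        "name": "An Example Author",
    },
}
print(json.dumps(record, indent=2))
```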

 Recommended Resources

This is a compilation of resources–conferences, linked data projects, listservs, websites–respondents found particularly valuable in learning more about linked data.

Conferences valuable in learning more about linked data: American Medical Informatics Association meetings,  Computer Applications in Archaeology, Code4Lib conferences, Digital Library Federation’s forums, Dublin Core Metadata Initiative, European Library Automation Group, European Semantic Web Conferences, International Digital Curation Conference, International Semantic Web Conference, Library and Information Technology Association’s national forums, Metadata and Digital Object Roundtable (in association with the Society of American Archivists), Scholarly Publishing and Academic Resources Coalition conferences, Semantic Web in Libraries, Theory and Practice of Digital Libraries

Linked data projects implementers track:

  • 270a Linked Dataspaces
  • AMSL, an electronic management system based on linked data technologies
  • Library of Congress’ BIBFRAME (included in the survey responses)
  • Bibliothèque Nationale de France’s Linked Open Data project
  • Bibliothèque Nationale de France’s OpenCat: Interesting data model – lightweight FRBR model together with reuse of commonly used web ontologies (DC; FOAF, etc.); scalable open source platform (cubicweb). Opencat aims to demonstrate that data published on data.bnf.fr can be re-used by other libraries, in particular public libraries.
  • COMSODE (Components Supporting the Open Data Exploitation)
  • Deutsche National Bibliothek’s Linked Data Service
  • Yale Digital Collections Center’s Digitally Enabled Scholarship with Medieval Manuscripts, linked data-based.
  • ESTC (English Short-Title Catalogue): Moving to a linked data model; tracked because one of the aims is to build communities of interest among researchers.
  • Libhub: Of interest because it has the potential to assess the utility of BIBFRAME as a successor to MARC21.
  • LIBRIS, the Swedish National Bibliography
  • Linked Data 4 Libraries (LD4L): “The use cases they created are valuable for communicating the possible uses of linked data to those less familiar with linked data and it will be interesting to see the tools that are developed as a result of the projects.” (Included in the survey responses)
  • Linked Jazz: Reveals relationships of the jazz community, something similar to what a survey respondent wants to accomplish.
  • North Carolina State University’s Organization Name Linked Data: Of interest because it demonstrates concepts in practice (included in the survey responses).
  • Oslo Public Library’s Linked Data Cataloguing: “It is attempting to look at implementing linked data from the point of view of actual need… of a real library for implementation. Cataloguing and all aspects of the system will be designed around linked data.” (Included in the survey responses)
  • Pelagios: Uses linked data principles to increase the discoverability of ancient data through place associations and a major spur for a respondent’s project.
  • PeriodO:  A gazetteer of scholarly assertions about the spatial and temporal extents of historical and archaeological periods; addresses spatial temporal definitions.
  • Spanish Subject Headings for Public Libraries Published as Linked Data (Lista de Encabezamientos de Materia para las Bibliotecas Públicas en SKOS)
  • OCLC’s WorldCat Works (included in the survey responses)

Listservs: bibframe@listserv.loc.gov (Bibliographic Framework Transition Initiative Forum), Code4lib@listserv.nd.edu, DCMI (Dublin Core Metadata Initiative) listservs, data-ac-uk@jiscmail.ac.uk,  dlf-announce@lists.clir.org (Digital Library Federation), lod-lam@googlegroups.com, public-ldp@w3.org (linked data platform working group), semantic-web@w3.org

Websites:

Analyze the responses yourself!

If you’d like to apply your own filters to the responses, or look at them more closely, the spreadsheet compiling all survey responses (minus the contact information which we promised we’d keep confidential) is available at: http://www.oclc.org/content/dam/research/activities/linkeddata/oclc-research-linked-data-implementers-survey-2014.xlsx


About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.


Patrick Hochstenbach: Creating Cat Bookmarks

Sun, 2014-09-07 08:45
Filed under: Comics Tagged: bookmark, books, cartoon, cat, Cats, comic, Illustrator, literature, Photoshop, reading

Patrick Hochstenbach: Trying out caricature styles

Sat, 2014-09-06 15:10
Filed under: Doodles Tagged: belgië, belgium, caricature, karikatuur, kris peeters, politics, politiek

Cynthia Ng: Google Spreadsheets Tip: Show Data from All Sheets in One

Sat, 2014-09-06 05:13
I’ve been working with spreadsheets a lot lately, and while anything Excel-related is well documented and I’m more familiar with it, Google Spreadsheets does things differently. Today’s post is just a quick tip really, but I thought I’d document it because it took a long time for me to find the solution, plus I had to play around […]

Ed Summers: Agile in Academia

Sat, 2014-09-06 01:44

I’m just finishing up my first week at MITH. What a smart, friendly group to be a part of, and with such exciting prospects for future work. Riding along the Sligo Creek and Northwest Branch trails to and from campus certainly helps. Let’s just say I couldn’t be happier with my decision to join MITH, and will be writing more about the work as I learn more, and get to work.

But I already have a question, that I’m hoping you can help me with.

I’ve been out of academia for over ten years. In my time away I’ve focused on my role as an agile software developer — increasingly with a lower case “a”. Working directly with the users of software (stakeholders, customers, etc), and getting the software into their hands as early as possible to inform the next iteration of work has been very rewarding. I’ve seen it work again, and again, and I suspect you have too on your own projects.

What I’m wondering is if you know of any tips, books, articles, etc. on how to apply these agile practices in the context of grant-funded projects. I’m still re-acquainting myself with how grants are tracked and reported, but it seems to me that they often encourage fairly detailed schedules of work, and cost estimates based on time spent on particular tasks, which (from 10,000 ft) reminds me a bit of the waterfall.

Who usually acts as the product owner in grant-driven software development projects? How easy is it to adapt schedules and plans based on what you have learned in a past iteration? How do you get working software into the hands of its potential users as soon as possible? How often do you meet, and what is the focus of discussion? Are there particular funding bodies that appreciate agile software development? Are grants normally focused on publishing research and data instead of software products?

Any links, references, citations, tips or advice you could send my way here, @edsu, or email would be greatly appreciated. I’ve already got Bethany Nowviskie’s Lazy Consensus bookmarked for re-reading.

CrossRef: CrossRef Indicators

Fri, 2014-09-05 19:14

Updated July 25, 2014

Total no. participating publishers & societies 5100
Total no. voting members 2433
% of non-profit publishers 57%
Total no. participating libraries 1885
No. journals covered 35,406
No. DOIs registered to date 68,416,081
No. DOIs deposited in previous month 552,871
No. DOIs retrieved (matched references) in previous month 34,385,296
DOI resolutions (end-user clicks) in previous month 98,365,532

CrossRef: New CrossRef Members

Fri, 2014-09-05 19:11

Updated September 3, 2014

Voting Members
Annex Publishers, LLC
Association for Medical Education in Europe (AMEE)
Breakthrough Institute, Rockefeller Philanthropy Advisors
COMESA - Leather and Leather Products Institute (COMESA/LLPI)
Hebrew Union College Press
Incessant Nature Science Publishers Pvt Ltd.
Instituto Brasileiro de Avaliacao Psicologica (IBAP)
Instituto Nanocell
Scandinavian Psychologist
Servicios Ecologicos y Cientificos SA de CV
Shared Science Publishers OG
Society of Biblical Literature/SBL Press
Universidad Adolfo Ibanez
Vanderbilt University Library
Visio Mundi Academic Media Group

Represented Members
Global E-Business Association
Institute of Philosophy
Korea Consumer Agency
Korea Distribution Science Association
Korea Employment Agency for the Disabled/Employment Development Institute
Korea Society for Hermeneutics
Korean Association for Japanese Culture
Korean Society for Curriculum Studies
Korean Society for Drama
Korean Society for Parenteral and Enteral Nutrition
Korean Society of Biology Education
Korean Society on Communication in Healthcare
Korean Speech-Language and Hearing
Soonchunhyang Medical Research Institute
The Discourse and Cognitive Linguistics Society of Korea
The Society of Korean Dances Studies

Last updated August 26, 2014

Voting Members
A Fundacao para o Desenvolvimento de Bauru (FunDeB)
Associacao de Estudos E Pesquisas Em Politicas E Practicas Curriculares
Association For Child and Adolescent Mental Health (ACAMH)
International Journal of Advanced Information Science and Technology
School of Electrical Engineering and Informatics (STEI) ITB
Universidad Autonoma del Caribe

Sponsored Members
Journal of Nursing and Socioenvironmental Health
California Digital Library

Represented Members
Association of East Asian Studies
Korea Computer Graphics Society
Korea Institute for Health and Social Affairs
Korea Service Management Society
Korean Association of Multimedia-Assisted Learning
Korean Association for Government Accounting
Korean Counseling Association
The Korean Poetics Studies Society
The Society of Korean Language and Literature

Updated September 3, 2014

CrossRef: CrossRef Indicators

Fri, 2014-09-05 19:07

Updated September 3, 2014

Total no. participating publishers & societies 5339
Total no. voting members 2548
% of non-profit publishers 57%
Total no. participating libraries 1898
No. journals covered 35,763
No. DOIs registered to date 69,191,919
No. DOIs deposited in previous month 582,561
No. DOIs retrieved (matched references) in previous month 35,125,120
DOI resolutions (end-user clicks) in previous month N/A

Harvard Library Innovation Lab: Link roundup September 5, 2014

Fri, 2014-09-05 17:36

This is the good stuff.

Photogrammar

So nice, could even be taken further, I’d imagine they’ve got a lot of ideas in the works –

Our Cyborg Future: Law and Policy Implications | Brookings Institution

Whoa, weird. Our devices and us.

Evolution of the desk

The desk becomes clear of its tools as those tools centralize in the digital space.

Mass Consensual Hallucinations with William Gibson

Technology trumps ideology.

Awesomeness: Millions Of Public Domain Images Being Put Online

Mining the archive for ignored treasure.

John Miedema: The four steps Watson uses to answer a question. An example from literature.

Fri, 2014-09-05 16:01

Check out this excellent video on the four steps Watson uses to answer a question. The Jeopardy-style question (i.e., an answer) comes from the topic of literature, so quite relevant here: “The first person mentioned by name in ‘The Man in the Iron Mask’ is this hero of a previous book by the same author.” This video is not sales material, but a good overview of the four (not so simple) steps: 1. Question Analysis, 2. Hypothesis Generation, 3. Hypothesis & Evidence Scoring, 4. Final Merging & Ranking. “Who is d’Artagnan?” I am so pleased that IBM is sharing its knowledge in this way. I had new insight watching it.

Library of Congress: The Signal: Studying, Teaching and Publishing on YouTube: An Interview With Alexandra Juhasz

Fri, 2014-09-05 15:07

Alexandra Juhasz, professor of Media Studies at Pitzer College

The following is a guest post from Julia Fernandez, this year’s NDIIPP Junior Fellow. Julia has a background in American studies and in working with folklife institutions, and she worked on a range of projects leading up to CurateCamp Digital Culture in July. This is part of a series of interviews Julia conducted to better understand the kinds of born-digital primary sources folklorists, and others interested in studying digital culture, are making use of for their scholarship.

The numbers around user-generated video are staggering. YouTube, one of the largest user-generated video platforms, has more than 100 hours of video content uploaded to it every minute. What does this content mean for us and our society? What of it should we aspire to ensure long-term access to?

As part of the NDSA Insights interview series, I’m delighted to interview Alexandra Juhasz, professor of Media Studies at Pitzer College. Dr. Juhasz has written multiple articles on digital media and produced the feature films “The Owls” and “The Watermelon Woman.” Her innovative “video-book” “Learning from YouTube” was published by MIT Press, but partly enabled through YouTube itself, and is available for free here. In this regard, her work is relevant to those working in digital preservation both in better understanding the significance of user-generated video platforms like YouTube and in understanding new hybrid forms of digital scholarly publishing.

Julia: In the intro to your online video-book “Learning From YouTube” you say “YouTube is the Problem, and YouTube is the solution.” Can you expand on that a bit for us?

Alex: I mean “problem” in two ways. The first is more neutral: YouTube is my project’s problematic, its subject or concern. But I also mean it more critically as well: YouTube’s problems are multiple–as are its advantages–but our culture has focused much more uncritically on how it chooses to sell itself: as a democratic space for user-made production and interaction. The “video-book” understands this as a problem because it’s not exactly true. I discuss how YouTube isn’t democratic in the least; how censorship dominates its logic (as does distraction, the popular and capital).

YouTube is also a problem in relation to the name and goals of the course that the publication was built around (my undergraduate Media Studies course also called “Learning from YouTube” held about, and also on, the site over three semesters, starting in 2007). As far as pedagogy in the digital age is concerned, the course suggests there’s a problem if we do all or most or even a great deal of our learning on corporate-owned platforms that we have been given for free, and this for many reasons that my students and I elaborate, but only one of which I will mention here as it will be most near and dear to your readers’ hearts: it needs a good archivist and a reasonable archiving system if it’s to be of any real use for learners, teachers or scholars. Oh, and also some system to evaluate content.

YouTube is the solution because I hunkered down there, with my students, and used the site to both answer the problem, and name the problems I have enumerated briefly above.

Julia: What can you tell us about how you approached the challenges of teaching a course about YouTube? What methods of analysis did you apply to its content? How did you select which materials to examine given the vast scope and diversity of YouTube’s content?

Alex: I have taught the course three times (2007, 2008, 2010). In each case the course was taught on and about YouTube. This is to say, we recorded class sessions (the first year only), so the course could be seen on YouTube; all the class assignments needed to take the form of YouTube “writing” and needed to be posted on YouTube (as videos or comments); and the first time I taught it, the students could only do their research on YouTube (thereby quickly learning the huge limits of its vast holdings). You can read more about my lessons learned teaching the course here and here.

The structure of the course mirrors many of the false promises of YouTube (and web 2.0 more generally), thereby allowing students other ways to see its “problems.” It was anarchic, user-led (students chose what we would study, although of course I graded them: there’s always a force of control underlying these “free” systems), public, and sort of silly (but not really).

As the course developed in its later incarnations, I developed several kinds of assignments (or methods of analysis as you put it), including traditional research looking at the results of published scholars, ethnographic research engaging with YouTubers, close-textual analysis (of videos and YouTube’s architecture), and what I call YouTours, where students link together a set of YouTube videos to make an argument inside of and about and with its holdings. I also have them author their own “Texteo” as their final (the building blocks, or pages, of my video-book; texteo=the dynamic linking of text and video), where they make a concise argument about some facet of YouTube in their own words and the words of videos they make or find (of course, this assignment allows them to actually author a “page” of my “book,” thereby putting into practice web 2.0’s promise of the decline of expertise and the rise of crowd-sourced knowledge production).

Students choose the videos and themes we study on YouTube. I like this structure (giving them this “control”) because they both enjoy and know things I would never look at, and they give me a much more accurate reading of mainstream YouTube than I would ever find on my own. My own use of the site tends to take me into what I call NicheTube (the second, parallel structure of YouTube, underlying the first, where a few videos are seen by many, many people and are wholly predictable in their points of view and concerns). On YouTube it’s easy to find popular videos. On NicheTube content is rarely seen, hard to find and easy to lose; everything might be there, but very few people will ever see it.

Now that YouTube Studies has developed, I also assign several of the book-length studies written about it from a variety of disciplines (I list these below). When I first taught the class in 2007, my students and I were generating the primary research and texts of YouTube Studies: producing work that was analytical and critical about the site, in its vernaculars, and on its pages.

Julia: What were some of the challenges of publishing an academic work in digital form? A large part of the work depends on linking to YouTube videos that you did not create and/or are no longer available. What implications are there for long-term access to your work?

Alex: I discuss this in greater length in the video-book because another one of its self-reflexive structures, mirroring those of YouTube, is self-reflexivity: an interest in its own processes, forms, structures and histories.

While MIT Press was extremely interested and supportive, they had never “published” anything like this before. The problems were many, and varied, and we worked through them together. I’ve detailed answers to your question in greater details within the video-book, but here’s one of the lists of differences I generated:

  • Delivery of the Work
  • Author’s Warranty
  • Credit
  • Previous Publication
  • Size of the Work
  • Royalties
  • Materials Created by Other Persons
  • Upkeep
  • Editing
  • Author’s Alterations
  • Promotion
  • Index

Many of these differences are legal and respond directly to the original terms in the contract they gave me that made no sense at all with a born-digital, digital-only object, and in particular about writing a book composed of many things I did not “own,” about “selling” a book for free, making a book that was already-made, or moving a book that never needed to be shipped.

One solution is that the video-book points to videos, but they remain “owned” by YouTube (I backed up some of the most important and put them on Critical Commons knowing that they might go away). But, in the long run, I do not mind that many of the videos fade away, or that the book itself will probably become quickly unreadable (because the systems it is written on will become obsolete). It is another myth of the Internet that everything there is lasting, permanent, forever. In fact, by definition, much of what is housed or written there is unstable, transitory, difficult to find, or difficult to access as platforms, software and hardware change.

In “On Publishing My YouTube “Book” Online (September 24, 2009)” I mention these changes as well:

  1. Audience. When you go online your readers (can) include nonacademics.
  2. Commitment. Harder to command amid the distractions.
  3.  Design. Matters more; and it has meaning.
  4.  Finitude. The page(s) need never close.
  5.  Interactivity. Should your readers, who may or may not be experts, author too?
  6.  Linearity. Goes out the window, unless you force it.
  7.  Multimodality. Much can be expressed outside the confines of the word.
  8.  Network. How things link is within or outside the author’s control.
  9.  Single author. Why hold out the rest of the Internet?
  10.  Temporality. People read faster online. Watching video can be slow. A book is long.

Now, when I discuss the project with other academics, I suggest there are many reasons to write and publish digitally: access, speed, multi-modality, etc. (see here), but if you want your work to be seen in the future, better to publish a book!

Julia: At this point you have been studying video production since the mid 90s. I would be curious to hear a bit about how your approach and perspective have developed over time.

Alex: My research (and production) interests have stayed consistent: how might everyday people’s access to media production and distribution contribute to people’s and movement’s empowerment? How can regular citizens have a voice within media and therefore culture more broadly, so that our interests, concerns and criticisms become part of this powerful force?

Every time I “study” the video of political people (AIDS activists, feminists, YouTubers), I make video myself. I theorize from my practice, and I call this “Media Praxis” (see more about that here). But what has changed during the years I’ve been doing this and thinking about it is that more and more people really do have access to both media production and distribution than when I first began my studies (and waxed enthusiastically about how camcorders were going to foster a revolution). Oddly, this access can be said to have produced many revolutions (for instance the use of people-made media in the Arab Spring) and to have quieted just as many (we are more deeply entrenched in both capitalism’s pull and self-obsessions than at any time in human history, it seems to me!). I think a lot about that in the YouTube video-book and in projects since (like this special issue on feminist queer digital media praxis that I just edited for the online journal Ada).

Julia: You end up being rather critical of how popularity works on YouTube. You argue that “YouTube is not democratic. Its architecture supports the popular. Critical and original expression is easily lost to or censored by its busy users, who not only make YouTube’s content, but sift and rate it, all the while generating its business.” You also point to the existence of what you call “NicheTube,” the vast sea of little-seen YouTube videos that are hard to find given YouTube’s architecture of ranking and user-generated tags. Could you tell us a bit more about your take on the role of filtering and sorting in its system?

Alex: YouTube is corporate owned, just as is Facebook, and Google, and the many other systems we use to find, speak, navigate and define our worlds, words, friends, interests and lives. Filtering occurs in all these places in ways that benefit their bottom lines (I suggest in “Learning From YouTube” that a distracted logic of attractions keeps our eyeballs on the screen, which is connected to their ad-based business plan). In the process, we get more access to more and more immediate information, people, places and ideas than humans ever have, but it’s filtered through the imperatives of capitalism rather than say those of a University Library (that has its own systems to be sure, of great interest to think through, and imbued by power like anything else, but not the power of making a few people a lot of money).

The fact that YouTube’s “archive” is unorganized, user-tagged, chaotic and uncurated is their filtering system.

Julia: If librarians, archivists and curators wanted to learn more about approaches like yours to understanding the significance and role of online video what examples of other scholars’ work would you suggest? It would be great if you could mention a few other scholars’ work and explain what you think is particularly interesting about their approaches.

Alex: I assign these books in “Learning from YouTube”: Patrick Vonderau, “The YouTube Reader”; Burgess and Green, “YouTube” and Michael Strangelove, “Watching YouTube.” I also really like the work of Michael Wesch and Patricia Lange who are anthropologists whose work focuses on the site and its users.

Outside of YouTube itself, many of us are calling this kind of work “platform studies,” where we look critically and carefully at the underlying structures of Internet culture. Some great people working here are Caitlin Benson-Allott, danah boyd, Wendy Chun, Laine Nooney, Tara McPherson, Siva Vaidhyanathan and Michelle White.

I also think that as a piece of academic writing, Learning from YouTube (which I understand to be a plea for the longform written in tweets, or a plea for the classroom written online) is in conversation with scholarly work that is thinking about the changing nature of academic writing and publishing (and all writing and publishing, really). Here I like the work of Kathleen Fitzpatrick or Elizabeth Losh, as just two examples.

Julia: I would also be interested in what ways of thinking about the web you see this as being compatible or incompatible with other approaches to theorizing the web. How is your approach to studying video production online similar or different from other approaches in new media studies, internet research, anthropology, sociology or the digital humanities?

Alex: “Learning from YouTube” is new media studies, critical Internet studies, and DH, for sure. As you say above, my whole career has looked at video; since video moved online, I did too. I think of myself as an artist and a humanist (and an activist) and do not think of myself as using social science methods, although I do learn a great deal from research done within these disciplines.

After “Learning from YouTube” I have done two further web-based projects: a website that tries to think about and produce alternatives to corporate-made and owned Internet experiences (rather than just critique this situation), www.feministonlinespaces.com; and a collaborative criticism of the MOOC (Massive Online Open course), what we call a DOCC (Distributed Open Collaborative Course): http://femtechnet.newschool.edu.

In all three cases I think that “theorizing the web” is about making and using the web we want and not the version that corporations have given to us for free. I do this using the structures, histories, theories, norms and practices of feminism, but any ethical system will do!

FOSS4Lib Recent Releases: Library Instruction Recorder - 1.0.0

Fri, 2014-09-05 13:31
Topics: bibliographic instruction, instruction, instruction scheduling, library, library instruction, library instruction recorder, teaching. Package: Library Instruction Recorder. Release Date: Friday, August 29, 2014

Last updated September 5, 2014. Created by Cliff Landis on September 5, 2014.

Initial release of Library Instruction Recorder

FOSS4Lib Updated Packages: Library Instruction Recorder

Fri, 2014-09-05 13:28

Last updated September 5, 2014. Created by Cliff Landis on September 5, 2014.

The Library Instruction Recorder (LIR) is a WordPress plugin designed to record library instruction classes and provide statistical reports. It is simple, easy-to-use, and intuitive.

Features

Accessible only from the WordPress Dashboard, allowing it to be used on either internally- or externally-facing WordPress instances.
Displays classes by: Upcoming, Incomplete, Previous and My Classes
Customizable fields for Department, Class Location, Class Type and Audience.
Customizable flags (i.e. "Do any students have disabilities or special requirements?" "Is this a First Year Experience class?")
Ability to duplicate classes for multiple sessions.
Statistical reports can be narrowed by date range or primary librarian. Reports are downloadable as .csv files.
Email reminder to enter the number of students who attended the class.

Package Links: Releases for Library Instruction Recorder. Technology: License: GPLv3; Development Status: Production/Stable; Operating System: Browser/Cross-Platform; Programming Language: PHP; Database: MySQL

HangingTogether: Linked Data Survey results 5 – Technical details

Fri, 2014-09-05 13:00

OCLC Research conducted an international linked data survey for implementers between 7 July and 15 August 2014. This is the fifth post in the series reporting the results.   

20 of the linked data projects that publish linked data are not yet accessible. Of those that are, 25 make their data accessible through Web pages and 24 through SPARQL Endpoint. Most offer multiple methods; when only one method is offered it’s by SPARQL Endpoint or file dumps.

The alphabetical list below shows the ways survey respondents make their linked data accessible; those that include methods used by Dewey, FAST, ISNI, VIAF, WorldCat.org and WorldCat.org Works are checked.

Of the 59 responses to the question about the serializations of linked data used, the majority use RDF/XML (47 projects/services).  Here’s the alphabetical list of the serializations used; those that include uses by Dewey, FAST, ISNI, VIAF, WorldCat.org and WorldCat.org Works are checked.

TriX was the only other serialization cited. The remaining “other” responses came from projects that had not yet been implemented or where the respondent was not sure.
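Since RDF/XML dominates but several other serializations are in use, a common task is converting between them. Here is a minimal sketch using Python's rdflib, with a small invented RDF/XML snippet; any serialization that rdflib supports could be substituted.

    # A minimal sketch of working with the serializations mentioned above:
    # parse RDF/XML and re-serialize it as Turtle and N-Triples.
    # The sample data is invented.
    from rdflib import Graph

    rdf_xml = """<?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dcterms="http://purl.org/dc/terms/">
      <rdf:Description rdf:about="http://example.org/work/1">
        <dcterms:title>An example bibliographic resource</dcterms:title>
      </rdf:Description>
    </rdf:RDF>"""

    g = Graph()
    g.parse(data=rdf_xml, format="xml")    # read RDF/XML
    print(g.serialize(format="turtle"))    # write Turtle
    print(g.serialize(format="nt"))        # write N-Triples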

The technologies respondents use for consuming linked data overlap with those used for publishing it. Most are mentioned only once or twice, so both lists are given in alphabetical order; a brief sketch of loading data into one of these stores follows the second list.

Technologies mentioned for consuming linked data:

  • Apache Fuseki
  • ARC2 on PHP
  • Bespoke Jena applications (or bespoke local software tools)
  • CURL API
  • eXist database
  • HBase/Hadoop
  • Javascript
  • jQuery
  • Map/Reduce
  • Orbeon Xforms (or just Xforms)
  • RDF Store
  • Reasoning
  • SKOS repository
  • Solr
  • SPARQL
  • Web browsers
  • XML
  • Xquery

Technologies mentioned for publishing linked data:

  • 4store
  • AllegroGraph
  • Apache Digester
  • Apache Fuseki
  • ARC2 on PHP
  • Django
  • Drupal7
  • EADitor (https://github.com/ewg118/eaditor)
  • Fedora Commons
  • Google Refine
  • HBase/Hadoop
  • Humfrey
  • Java
  • JAX-RS
  • Jena applications
  • Lodspeakr
  • Map/Reduce
  • MarkLogic XML Database
  • Orbeon Xforms
  • OWLIM RDF triple store API
  • OWLIM-SE Triple Store by Ontotext Software
  • Perl
  • Pubby
  • Python
  • RDF Store
  • RDFer by British Museum
  • Saxon/XSLT
  • Solr
  • SPARQL
  • Sublima topic tool
  • The European Library Linked Data Platform
  • Tomcat
  • Virtuoso Universal Server (provide SPARQL endpoint)
  • xEAC (https://github.com/ewg118/xEAC)
  • XSLT
  • Zorba
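Several of the stores listed above (for example Fuseki, 4store, Virtuoso and OWLIM) can be loaded over HTTP using the SPARQL 1.1 Graph Store Protocol. The sketch below assumes a local Apache Jena Fuseki instance with a dataset named “ds” and an invented graph URI; the exact endpoint path and setup will differ per store.

    # A minimal sketch of publishing (loading) linked data into a triple store
    # over the SPARQL 1.1 Graph Store Protocol. The Fuseki instance, dataset
    # name "ds" and graph URI are assumptions for illustration only.
    import requests

    turtle_data = """
    @prefix dcterms: <http://purl.org/dc/terms/> .
    <http://example.org/work/1> dcterms:title "An example bibliographic resource" .
    """

    resp = requests.post(
        "http://localhost:3030/ds/data",            # assumed Fuseki graph store endpoint
        params={"graph": "http://example.org/graph/catalogue"},
        data=turtle_data.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
    )
    print(resp.status_code)  # 200/201 indicates the triples were stored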

Coming next: Linked Data Survey results - Advice from the implementers (last in the series)

About Karen Smith-Yoshimura

Karen Smith-Yoshimura, program officer, works on topics related to renovating descriptive and organizing practices with a focus on large research libraries and area studies requirements.


Lukas Koster: Looking for data tricks in Libraryland

Fri, 2014-09-05 12:12

IFLA 2014 Annual World Library and Information Congress Lyon – Libraries, Citizens, Societies: Confluence for Knowledge

After attending the IFLA 2014 Library Linked Data Satellite Meeting in Paris I travelled to Lyon for the first three days (August 17-19) of the IFLA 2014 Annual World Library and Information Congress. This year’s theme “Libraries, Citizens, Societies: Confluence for Knowledge” was named after the confluence or convergence of the rivers Rhône and Saône where the city of Lyon was built.

This was the first time I attended an IFLA annual meeting, and it was very much unlike any conference I have attended before. Most of those are small and focused. The IFLA annual meeting is very big (though not as big as ALA) and covers a lot of domains and interests. The main conference lasts a week, including all kinds of committee meetings, and has more than 4,000 participants, many parallel tracks and very specialized Special Interest Group sessions. Separate Satellite Meetings are organized before the actual conference in different locations; this year there were more than 20 of them. These Satellite Meetings actually resemble the smaller and more focused conferences that I am used to.

A conference like this requires a lot of preparation and organization. Many people are involved, but I especially want to mention the hundreds of volunteers who were present not only in the conference centre but also at the airport, the railway stations, on the road to the location of the cultural evening, etc. They were all very friendly and helpful.

Another feature of such a large global conference is that presentations are held in a number of official languages, not only English, with a team of interpreters available for simultaneous translation. I attended a couple of talks in French without a translation headset, but I managed to understand most of what was presented, mainly because the presenters provided their slides in English.

It is clear that you have to prepare for the IFLA annual meeting and select in advance the sessions and tracks you want to attend; with a large multi-track conference like this it is not always possible to attend all interesting sessions. In the light of a new data infrastructure project I recently started at the Library of the University of Amsterdam, I decided to focus on tracks and sessions related to aspects of data in libraries in the broadest sense: “Cloud services for libraries – safety, security and flexibility” on Sunday afternoon, the all-day track “Universal Bibliographic Control in the Digital Age: Golden Opportunity or Paradise Lost?” on Monday, and “Research in the big data era: legal, social and technical approaches to large text and data sets” on Tuesday morning.

Cloud Services for Libraries

It is clear that “cloud” is a very ambiguous term and consequently a rather unclear concept. Which is good, because clouds are elusive objects anyway.

In the Cloud Services for Libraries session there were five talks in total. Kee Siang Lee of the National Library Board of Singapore (NLB) described the cloud-based NLB IT infrastructure, which consists of three parts: a private, a public and a hybrid cloud. The private (restricted access) cloud is used for virtualization, an extensive service layer for discovery, content and personalization, and “Analytics as a service”, which is used for pushing and recommending related content from different sources and in various formats to end users. This “contextual discovery” is based on text analytics technologies across multiple sources, using a Hadoop cluster on virtual servers. The public cloud is used for the Web Archive Singapore project, which is aimed at archiving a large number of Singapore websites. The hybrid cloud is used for what is called the Enquiry Management System (EMS), where “sensitive data is processed in-house while the non-sensitive data resides in the cloud”. It seems that in Singapore “cloud” is just another word for a group of real or virtual servers.

In the talk given by Beate Rusch of the German Library Network Service Centre for Berlin and Brandenburg (KOBV), the term “cloud” meant the shared management of data on servers located somewhere in Germany. KOBV is one of the German regional library networks involved in the CIB project, which is aimed at developing a unified national library data infrastructure. This infrastructure may consist of a number of individual clouds. Beate Rusch described three possible outcomes: one cloud serving as a master for the others, a data roundabout linking the other clouds, and a cross-cloud dataspace with an overlapping shared environment between the individual clouds. An interesting aspect of the CIB project is that cooperation with two large commercial library system vendors, OCLC and Ex Libris, is part of the official agreement. This is of interest to other countries that have vested interests in these two companies, like the Netherlands.

Universal Bibliographic Control in the Digital Age

The Universal Bibliographic Control (UBC) session was an all day track with twelve very diverse presentations. Ted Fons of OCLC gave a good talk explaining the importance of the transition from the description of records to the modeling of entities. My personal impression lately is that OCLC all in all has been doing a good job with linked data PR, explaining the importance and the inevitability of the semantic web for libraries to a librarian audience without using technical jargon like URI, ontology, dereferencing and the like. Richard Wallis of OCLC, who was at the IFLA 2014 Linked Data Satellite Meeting and in Lyon, is spreading the word all over the globe.

Of the rest of the talks the most interesting ones were given in the afternoon. Anila Angjeli of the National Library of France (BnF) and Andrew MacEwan of the British Library explained the importance, similarities and differences of ISNI and VIAF, both authority files with identifiers used for people (both real and virtual). Gildas Illien (also one of the organizers of the Linked Data Satellite Meeting in Paris) and Françoise Bourdon, both BnF, described the future of Universal Bibliographic Control in the web of data, which is a development closely related to the topic of the talks by Ted Fons, Anila Angjeli and Andrew MacEwan.

The ONKI project, presented by the National Library of Finland, is a very good example of how bibliographic control can be moved into the digital age. The project entails the transfer of the general national library thesaurus YSA to the new YSO ontology, from libraries to the whole public sector and from closed to open data. The new ontology is based on concepts (identified by URIs) instead of monolingual text strings, with multilingual labels and machine readable relationships. Moreover the management and development of the ontology is now a distributed process. On top of the ontology the new public online Finto service has been made available.
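To make the contrast with a monolingual, string-based thesaurus concrete, here is a minimal sketch of a concept-based SKOS entry in Python/rdflib: one URI-identified concept with multilingual preferred labels and a machine-readable broader relationship. The URIs and labels are invented for illustration and are not taken from YSO.

    # A minimal sketch of a concept-based, multilingual SKOS entry of the kind
    # the YSA-to-YSO transition implies. URIs and labels are invented examples.
    from rdflib import Graph, URIRef, Literal
    from rdflib.namespace import SKOS

    g = Graph()
    concept = URIRef("http://example.org/onto/p12345")   # hypothetical concept URI
    broader = URIRef("http://example.org/onto/p10000")

    g.add((concept, SKOS.prefLabel, Literal("libraries", lang="en")))
    g.add((concept, SKOS.prefLabel, Literal("kirjastot", lang="fi")))
    g.add((concept, SKOS.prefLabel, Literal("bibliotek", lang="sv")))
    g.add((concept, SKOS.broader, broader))              # machine-readable relationship

    print(g.serialize(format="turtle"))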

The final talk of the day, “The local in the global: universal bibliographic control from the bottom up” by Gordon Dunsire, applied the “think globally, act locally” aphorism to Universal Bibliographic Control in the semantic web era. Universal top-down control should make way for local bottom-up control. There are so many old and new formats for describing information that we are facing a new biblical confusion of tongues: RDA, FRBR, MARC, BIBO, BIBFRAME, DC, ISBD, etc. What is needed is a set of translators between local and global data structures, on a logical level: Schema Translator, Term Translator, Statement Maker, Statement Breaker, Record Maker, Record Breaker. These black boxes are a challenge to developers. Indeed, mapping and matching data of various types, formats and origins are vital in the new web-of-information age.
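As a deliberately simplified illustration of what such a “Schema Translator” black box does, the sketch below rewrites statements that use local predicates into statements that use global ones via a small crosswalk table. The mapping and sample data are invented; real crosswalks (MARC or RDA to BIBFRAME, Dublin Core, etc.) are of course far more involved.

    # A deliberately simplified "schema translator": rewrite statements that use
    # local predicates into statements that use global ones. The mapping and the
    # sample triple are invented for illustration only.
    from rdflib import Graph, URIRef, Literal

    LOCAL_TO_GLOBAL = {  # hypothetical crosswalk
        URIRef("http://example.org/local/mainTitle"): URIRef("http://purl.org/dc/terms/title"),
        URIRef("http://example.org/local/creatorName"): URIRef("http://purl.org/dc/terms/creator"),
    }

    local = Graph()
    local.add((URIRef("http://example.org/work/1"),
               URIRef("http://example.org/local/mainTitle"),
               Literal("An example bibliographic resource")))

    global_graph = Graph()
    for s, p, o in local:
        global_graph.add((s, LOCAL_TO_GLOBAL.get(p, p), o))   # translate known predicates

    print(global_graph.serialize(format="turtle"))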

Research in the big data era

The Research in the big data era session had five presentations on essentially two different topics: data and text mining (four talks) and research data management (one talk). Peter Leonard of Yale University Library started the day with a very interesting presentation of how advanced text mining techniques can be used for digital humanities research. Using the digitized archive of Vogue magazine, he demonstrated how long-term analysis of the statistical distribution of related terms, like “pants”, “skirts”, “frocks”, or “women”, “girls”, can help visualise social trends and identify research questions. A number of free tools are available for this, like Google Books N-Gram Search and Bookworm. To make this type of analysis possible, researchers need full access to all data and text. However, rights issues come into play here, as Christoph Bruch of the Helmholtz Association, Germany, explained. What is needed is “intelligent openness” as defined by the Royal Society: data must be accessible, assessable, intelligible and usable. Unfortunately European copyright law stands in the way of the idea of fair use, and many European researchers are forced to perform their data analysis projects outside Europe, in the USA. The plea for openness was also supported by LIBER’s Susan Reilly: data and text mining should be regarded as just another form of reading, one that doesn’t need additional licenses.
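The core of this kind of analysis is easy to sketch: for each year of a corpus, compute the relative frequency of each term of interest and compare the resulting time series. Below is a toy example in Python with invented data; real projects such as the Vogue study run over the full digitized text with far more sophisticated tooling.

    # A toy sketch of term-frequency-over-time analysis of the kind described
    # above. The corpus is invented; real analyses run over full digitized text.
    from collections import Counter

    corpus_by_year = {  # hypothetical tokenized corpus
        1950: "frocks frocks skirts women women women".split(),
        1980: "pants pants skirts women girls girls".split(),
        2010: "pants pants pants skirts girls girls".split(),
    }
    terms = ["pants", "skirts", "frocks"]

    for year, tokens in sorted(corpus_by_year.items()):
        counts = Counter(tokens)
        total = len(tokens)
        freqs = {t: counts[t] / total for t in terms}   # relative frequency per term
        print(year, {t: round(f, 2) for t, f in freqs.items()})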

IdeasBox

IdeasBox packed

A very impressive and sympathetic library project that deserves everybody’s support was not an official programme item, but a bunch of crates, seats, tables and cushions spread across the central conference venue square. The whole set of furniture and equipment, which comes on two industrial pallets, constitutes a self-supporting mobile library/information centre to be deployed in emergency areas, refugee camps, etc. It is called IdeasBox, provided by Libraries without Borders. It contains mobile internet, servers, power supplies, ereaders, laptops, board games, books, etc., chosen to suit the circumstances, culture and needs of the target users and regions. The first IdeasBoxes are now used in Burundi, in camps for refugees from Congo; others will soon go to Lebanon for Syrian refugees. If librarians can make a difference, it’s here. You can support Libraries without Borders and IdeasBox in all kinds of ways: http://www.ideas-box.org/en/support-us.html.

IdeasBox unpacked

Conclusion

The questions about data management in libraries that I brought with me to the conference were only partly addressed, and actual practical answers and solutions were very rare. The management and mapping of heterogeneous and redundant data from all the types of sources, across all the domains that libraries cover, in a flexible, efficient and system-independent way is apparently not yet a mainstream topic. For things like that you have to attend the Satellite Meetings. Legal issues, privacy, copyright, text and data mining, and cloud-based data sharing and management, on the other hand, are topics that were discussed. It turns out that attending an IFLA meeting is a good way to find out what is discussed, and more importantly what is NOT discussed, among librarians, library managers and vendors.

The quality and content of the talks vary a lot. As always, the value of informal contacts and meetings cannot be overestimated. All in all, looking back I can say that my first IFLA has been a positive experience, not least because of the positive spirit and enthusiasm of all the organizers, volunteers and delegates.

(Special thanks to Beate Rusch for sharing IFLA experiences)
