Planet Code4Lib

Peter Murray: Thursday Threads: Web Time Travel, Fake Engine Noise, The Tech Behind Delivering Pictures of Behinds

Thu, 2015-02-05 11:21

In this week’s DLTJ Thursday Threads: the introduction of a web service that points you to old copies of web pages, dispelling illusions of engine noise, and admiring the technical architecture of Amazon Web Services that gives us the power to witness Kim Kardashian’s back side.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to Pinboard are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

Introducing the Memento Web Time Travel Service

The Time Travel service helps you find versions of a page that existed at some time in the past. These prior versions of web pages are named Mementos. Mementos can be found in web archives or in systems that support versioning such as wikis and revision control systems.

When you enter the web address of a page and a time in the past, the Time Travel service tries to find a Memento for that page as it existed around the time of your choice. This will work for addresses of pages that currently exist on the web but also for those that have meanwhile vanished.

- About the Time Travel Service, Last updated: 19-Jan-2015

The folks at Los Alamos National Laboratory have been working on web time travel for years. What started with browser plugins has now become a web service that can be used to find old copies of web pages held in caches and archives throughout the world. Thought the Internet Archive’s Wayback Machine was the only game in town? Check out the Memento Time Travel service.
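The service exposes a simple HTTP lookup: you give it a URI and a time, and it redirects you to a nearby Memento. As a hedged sketch (the `timetravel.mementoweb.org/api/json/<timestamp>/<uri>` endpoint pattern is an assumption drawn from the service's public documentation, and the helper name is mine), building a lookup URL might look like:

```python
from datetime import datetime

def time_travel_api_url(target_uri: str, when: datetime) -> str:
    """Build a Memento Time Travel JSON API lookup URL.

    The API expects a 14-digit YYYYMMDDhhmmss timestamp followed by
    the URI whose past versions (Mementos) we want to find.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"http://timetravel.mementoweb.org/api/json/{timestamp}/{target_uri}"

# Look for a copy of example.com as it existed around mid-2010.
url = time_travel_api_url("http://example.com/", datetime(2010, 6, 15))
print(url)
```

Fetching that URL returns JSON listing the closest Mementos found across participating archives; the underlying content negotiation is specified in the Memento protocol (RFC 7089).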

America’s best-selling cars and trucks are built on lies: The rise of fake engine noise

Stomp on the gas in a new Ford Mustang or F-150 and you’ll hear a meaty, throaty rumble — the same style of roar that Americans have associated with auto power and performance for decades.

It’s a sham. The engine growl in some of America’s best-selling cars and trucks is actually a finely tuned bit of lip-syncing, boosted through special pipes or digitally faked altogether. And it’s driving car enthusiasts insane.

- America’s best-selling cars and trucks are built on lies: The rise of fake engine noise, by Drew Harwell, The Washington Post, 21-Jan-2015

I knew they were adding “engine noise” to the hybrid-electric Prius because it was so quiet at low speeds that it could startle pedestrians, but I didn’t know it was happening to so-called “muscle cars”.

A look at Amazon’s world-class data-center ecosystem

Amazon VP and Distinguished Engineer James Hamilton shares what makes the company’s armada of data centers run smoothly.

- A look at Amazon’s world-class data-center ecosystem, by Michael Kassner, TechRepublic, 8-Dec-2014

Among the geek community, there must be some awe at how Amazon seems to create infinitely big data centers that can be used for everything from powering Netflix to this humble blog. Amazon is also notoriously secretive about how it does things. This article provides a glimpse into how Amazon Web Services achieves the scale that it does.

How PAPER Magazine’s web engineers scaled their back-end for Kim Kardashian

On November 11th 2014, the art-and-nightlife magazine PAPER “broke the Internet” when it put a Jean-Paul Goude photograph of a well-oiled, mostly-nude Kim Kardashian on its cover and posted the same nude photos of Kim Kardashian to its website (NSFW). It linked together all of these things—and other articles, too—under the “#breaktheinternet” hashtag. There was one part of the Internet that PAPER didn’t want to break: The part that was serving up millions of copies of Kardashian’s nudes over the web.

Hosting that butt is an impressive feat. You can’t just put Kim Kardashian nudes on the Internet and walk away —that would be like putting up a tent in the middle of a hurricane. Your web server would melt. You need to plan.

- How PAPER Magazine’s web engineers scaled their back-end for Kim Kardashian (SFW), by Paul Ford, Medium, 21-Jan-2015

Speaking of how Amazon can seemingly scale to infinite levels, this article tells the story of how one online publisher ramped up their server capacity to meet the demands of users flocking to see Kim Kardashian’s rear end. (And who said the internet wasn’t a valuable tool…)


Nicole Engard: Bookmarks for February 4, 2015

Wed, 2015-02-04 20:30

Today I found the following resources and bookmarked them:

  • Greenfoot: Teach and learn Java programming
  • Blockly Games: a series of educational games that teach programming. It is designed for children who have not had prior experience with computer programming. By the end of these games, players are ready to use conventional text-based languages.
  • Blockly: a library for building visual programming editors

Digest powered by RSS Digest

The post Bookmarks for February 4, 2015 appeared first on What I Learned Today....

Related posts:

  1. How To Get More Kids To Code
  2. Learn from The Sims
  3. NFAIS 2009: Born Digital – Born Mobile

HangingTogether: The Evolving Scholarly Record, Washington, DC edition

Wed, 2015-02-04 19:32


Brian Lavoie, presenting in the GWU International Brotherhood of Teamsters Labor History Research Center

On December 10th, we held our second Evolving Scholarly Record Workshop at George Washington University in Washington, DC (you can read Ricky Erway’s summary of the first workshop, starting here). Many thanks to Geneva Henry and all the staff at GWU for hosting us in the fabulous International Brotherhood of Teamsters Labor History Research Center. This workshop, and others, build on the framework presented in the OCLC Research report, The Evolving Scholarly Record.

At George Washington Gelman library attending OCLC workshop #esrworkshop

— Martha Kyrillidou (@kyrillidou) December 10, 2014

Our first speaker, Brian Lavoie (OCLC Research), presented the ESR Framework and put it into context. What is considered part of the record is constantly expanding – for example, blogs and social media, which would previously not have been included. The evolution of how scholarship is recorded makes it challenging to organize the record in consistent and reliable ways. The ecosystem of stakeholders is evolving as well. It became clear to Brian and others involved in discussions around the problem space that a framework was necessary in order to support strategic discussions across stakeholders and across domains.

Formats shift (print to dig), boundaries blur (from books to data sets), characteristics change (static works to dynamic) #esrworkshop

— Keith Webster (@CMKeithW) December 10, 2014

In addition to traditional scholarly outcomes, the framework covers two additional areas of focus: process and aftermath.

Process is what leads up to the publication of the outcomes – in the framework, process is composed of method, evidence and discussion (important because outcomes usually consolidate thanks to discussions with peers). Anchoring outcomes in process will help reproducibility. Scholarly activities continue in aftermath: discussion (including post publication reviews and commentaries), revision (enhancement, clarification), re-use (including repackaging for other audiences).

In the stakeholder ecosystem, the traditional roles (create, fix, collect, use) are being reconfigured. For example, in addition to libraries, service providers like Portico and JSTOR are now important in the collect role. Social media and social storage services, which are entirely outside the academy, are now part of create and use.  New platforms, like figshare, are taking on the roles of fix and collect. The takeaway here? The roles are constant, but the configurations of the stakeholders beneath them are changing.

How does the traditional "scholarly record" fit into today's consumer-producer model of web content? Not well, it seems. #esrworkshop

— Scott W. H. Young (@hei_scott) December 10, 2014

Our second speaker, Herbert van de Sompel (Los Alamos National Laboratory) gave perspective from the network point of view. His talk was a modified reprise of his presentation at the June OCLC / DANS workshop in Amsterdam, which Ricky nicely summarized in a previous posting. Herbert will also be speaking at our workshop coming up in March, so if you’d like to catch him in action, sign up for that session.

Wow @hvdsomp is basically reading my mind. "The web will fundamentally change from human-readable to machine-actionable." #esrworkshop

— Scott W. H. Young (@hei_scott) December 10, 2014

Our third speaker was Geneva Henry (George Washington University), who represented the view from the campus. We will be posting her viewpoint in a separate blog post later this week, but her remarks touched on the various campus stakeholders in the scholarly record: scholars, media relations, the promotion and tenure committee, the office of research, and the library.

Look to your students: How are they communicating? asks Henry. They’re future scholars; don’t expect drastic behavior changes #esrworkshop

— Mark Newton (@libmark) December 10, 2014

Daniel Hook (Digital Science) shared his “view from the platform.” (Digital Science is the parent company of several platform services, such as FigShare, AltMetrics, Symplectic Elements, and Overleaf.) Daniel stressed the importance of transparency and reproducibility in research – there is a need for a demonstrable pay-off for investors in research. There is a delicate balance to be reached between collaboration and competition in research. We are in an era of increased collaboration, and the “fourth age of research” is marked by international collaboration. Who “owns” research, and the scholarly record? Individual researchers? Their institutions? Evaluation of research increasingly calls for demonstrating impact. Identifiers are glue – identifiers for projects, for researchers, for institutions. The future will be in dynamically making assertions of value and impact across institutions, and in building confidence in those assertions.

Funders keen to assess impact of research they funded at a macro-level – have you influenced policy? The economy? #esrworkshop

— Keith Webster (@CMKeithW) December 10, 2014

Finally, Clifford Lynch (Coalition for Networked Information) gave some additional remarks, highlighting stress points. Potentially, the scholarly record is huge, especially with an expanded range of media and channels. The minutes of science are being recorded every minute, year in and year out. Selection issues are challenging, to say the least. Is it sensible to consider keeping everything? Cliff called for hard questions to be asked, and for studies to be done. Some formats seem to be overlooked — video, for example.

We concluded the meeting with a number of break-out sessions that took up focused topics. The groups came back with tons of notes, and also some possible “next steps” or actions that could be taken to move us forward. Those included:

  • Promulgating name identifiers and persistent IDs for use by other stakeholders
  • Focusing on research centers and subject/disciplinary repositories to see what kinds of relationships are needed
  • Mining user studies/reviews to pull out research needs/methods/trends/gaps and find touch-points to the library
  • Following the money in the ESR ecosystem to see whether there are disconnects between shareholder interests and scholar value
  • Pursuing with publishers whether they will collect the appropriate contextual processes and aftermaths
  • Investigating funding, ROI, and financial tradeoffs
  • Getting involved during the grant planning processes so that materials flow to the right places instead of needing to be rescued after the fact

Thanks to all of our participants, but particularly to our hosts, our speakers, our notetakers and those who helped record the event on Twitter. We’re looking forward to another productive workshop in Chicago (in March) and expect the series to culminate at the ESR workshop in San Francisco (in June), where we’ll focus on how we can collaboratively move things forward to do our best to ensure stewardship of the scholarly record now and into the future.

I have 19 pages of notes from 5 breakout sessions. We'll boil down discussions and talking points for next steps soon #esrworkshop

— Merrilee Proffitt (@MerrileeIAm) December 10, 2014

The @DANSKNAW & #esrworkshop events on archiving the future scholarly record have given me focus on tech aspects that must be tackled first

— Herbert (@hvdsomp) December 11, 2014

About Merrilee Proffitt


LITA: Jobs in Information Technology: February 4

Wed, 2015-02-04 17:58

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Digital Initiatives Librarian, University of North Carolina Wilmington, Wilmington, NC

Director of Library Services, Marymount California University, Rancho Palos Verdes, CA

Sr. UNIX Systems Administrator, University Libraries, Virginia Tech, Blacksburg, VA

Technology Projects Coordinator, Oak Park Public Library, Oak Park, IL

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.


LITA: Agile Development: What is a User Story?

Wed, 2015-02-04 13:00

Image courtesy of Paul Downey’s photostream.

So far in this series, I’ve talked about the pros and cons of Agile, and reviewed the methodology’s core values. Today I want to move beyond the “what” and into more of the “how.” I’ll start by looking at user stories.

A user story is the basic unit of Agile development. User stories should be written by the business, not by the development team. They should clearly state the business value that the project is expected to create, as well as the user that will benefit. The focus should be on the problem being solved, not the software being built. This not only increases efficiency, but also provides flexibility for the development team: how they solve the problem is up to them.

There’s a generally accepted template for writing user stories: “As a [user type], I want to [specific functionality] so that [tangible benefit].” I’m not crazy about using this convention because it seems contrived to me, but it does make it easier to understand the priorities of Agile development: a feature exists to provide a benefit for a specific user or user group. If you can’t express functionality in this manner, then it is either superfluous or a technical requirement (there’s a separate document for those, which is written during and after development, not before).
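Contrived or not, the template is regular enough to check mechanically. As an illustrative sketch (the regex and function name are mine, not part of any Agile tooling), a story can be split into its three parts like this:

```python
import re

# "As a [user type], I want to [specific functionality] so that [tangible benefit]."
STORY_PATTERN = re.compile(
    r"^As an? (?P<user>.+?), I want to (?P<functionality>.+?) "
    r"so that (?P<benefit>.+?)\.?$"
)

def parse_user_story(story: str):
    """Return the user/functionality/benefit parts of a story, or None
    if the story does not follow the conventional template."""
    match = STORY_PATTERN.match(story.strip())
    return match.groupdict() if match else None

parsed = parse_user_story(
    "As a task list creator, I want to download a task list "
    "so that I can share it with project stakeholders."
)
print(parsed)
```

A story that fails to parse is exactly the signal described above: the functionality is either superfluous or belongs in the technical requirements instead.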

A great user story should follow the INVEST model: user stories should be Independent, Negotiable, Valuable, Estimable, Small, and Testable (you can read about this in more detail in the links provided below). The main thing to remember, though, is that we’re really just trying to create software where every component can be proven to solve a specific problem for a specific user. It all comes back to justifying programming effort in terms of the value it provides once it’s been released into the wild. Let’s look at some examples, based on developing a tool to keep track of tasks:

  • “As a task list creator, I can see all of my tasks together.” This story is too vague, and will result in developers guessing about the true purpose of this feature.
  • “As a task list creator, I can see all of my tasks together so I can download them to MS Excel.” This one is too specific. MS Excel is a technical requirement, and should not be part of the user story text. The real need is for a downloadable version of the task list; limiting it to one specific format at this point may lead to problems later on.
  • “As a task list creator, I can see all of my tasks together so I can download them.” This is better, but it still doesn’t answer the question of value. Why do I need to download the tasks? It looks OK, but in reality I have created a false dependency between two separate features.
  • “As a task list creator, I can download a task list so I can share it with project stakeholders.” Now we’re getting somewhere! The user needs to share the list with other members of the team, which is why she needs a downloadable version.

User story writing is iterative and investigative. At this point, I could argue that downloading, just like display, is an unnecessary step, and that the real feature is for some mechanism that allows all project members to see the task list, and the true need is for the team to work on the list together. That’s where the value-add is. Everything else is bells and whistles. Maybe downloading is the most efficient way to share the list, but that decision is part of the development process and should be documented in the technical requirements. Maybe there are other reasons to add a download feature; those belong on separate stories.

As a business-side stakeholder with an engineering background, my first attempts at creating user stories did not go well. I like to tinker and get my hands dirty, so trying to keep design out of my user stories proved difficult; it’s easy for me to get bogged down in technical details and focus on fixing what’s in front of me, rather than asking whether it’s even necessary in the first place. Any time design creeps into product requirements, it adds a layer of abstraction that makes it harder for a development team to understand what it is you really want. It took me a while to learn that lesson (you could argue that I’m still learning it). Besides, when a product owner gets involved in designing software, it’s hard to avoid creating an attachment to the specific design, regardless of whether it meets user needs or not. It’s best to stay out of that process altogether (no matter how much fun it may be) and maintain the focus on the user.

Writing user stories can be frustrating, especially if you’re new to Agile, but they are a great way to discover the true user needs that should drive your software development project. If you want to learn more about user stories, you can go here, here, or here. I’ll be back next month to talk about prioritization and scheduling.

What’s your experience with user stories? Do you have any tips on writing a great user story?

Library Tech Talk (U of Michigan): Web Accessibility, Part 1: How Do You Internet?

Wed, 2015-02-04 00:00
The University of Michigan Library is working hard to improve the accessibility of all our websites. This brings up a simple question: what does it mean to make a website accessible?

DuraSpace News: CALL for Proposals for Sixth Annual VIVO Conference Workshops

Wed, 2015-02-04 00:00

Boston, MA  The Sixth Annual VIVO Conference will be held August 12-14, 2015 at the Hyatt Regency Cambridge, overlooking Boston. The VIVO Conference creates a unique opportunity for people from across the country and around the world to come together to explore ways to use semantic technologies and linked open data to promote scholarly collaboration and research discovery.

OCLC Dev Network: Developer House Project: More of the Same - Faster Results with Related Resources

Tue, 2015-02-03 20:00

We have started sharing projects created at our Developer House in December, and this week we’re happy to share another. This project and post come to you from Bill Jones and Steelsen Smith.    


Tim Ribaric: OLA SuperConference 2015 Presentation Material and Recap

Tue, 2015-02-03 17:23


(View from the Podium)

OLA SuperConference 2015 was last week. I had the good opportunity to attend as well as to present.


Mita Williams: Be future compatible

Tue, 2015-02-03 16:42
Hmmm, I thought I had published my last post, but the RSS feed did not update, so I made this re-post:

On February 1st, I gave a presentation at the American Library Association Midwinter Conference in Chicago, Illinois as part of the ALA Masters Series called Mechanic Institutes, Hackerspaces, Makerspaces, TechShops, Incubators, Accelerators, and Centers of Social Enterprise. Where do Libraries fit in?
But after inspection, it looks like the RSS feed that is generated by Feedburner has been updated in such a way that I (using feedly) needed to re-subscribe. Now, I'm not sure who is at fault for this: Feedburner, feedly, or myself for using a third party to distribute a perfectly good rss feed.

I don't follow my reading statistics very closely, but I do know that the traffic to this site is largely driven by Twitter and Facebook -- much more than hits from, say, other blogs. And yet, I'm disturbed that the 118 readers using the now-defunct feedly rss feed will not know about what I'm writing now. I'm sad because, while I've always had a small audience for this blog, I have always been very proud and humbled that I had this readership, because attention is a gift.

LITA: Why Everyone Should be a STEMinist

Tue, 2015-02-03 16:10

Flickr, ITU/R. Farrell 2013

This blog post is not solely for the attention of women. Everyone can benefit from challenging themselves in the STEM field. STEM stands for Science Technology Engineering and Math. Though there is debate on whether there is a general shortage of Americans in STEM fields, women and minorities represent a large deficit in these areas of study. It goes without saying that all members of the public should invest in educating themselves in at least one of the STEM fields.

STEM is versatile

There is nothing wrong with participating in the humanities and liberal arts. In fact, we need those facets of our cultural identity. The addition of STEM skills can greatly enhance those areas of knowledge and make you a more diverse, dynamic and competitive employee. You can be a teacher, but you can also be an algebra or biology teacher for K-12 students. You can be a librarian, but you can also be a systems librarian, medical reference librarian, web and data librarian etc. Organization, communication and creative skills are not lost in the traditionally analytical STEM fields. Fashion designers and crafters can use computer-aided design (CAD) to create graphic representations of a product. CAD can also be used to create 3D designs. Imagine a 22nd Century museum filled with 3D printed sculptures as a representation of early 21st Century art.

Early and mid-career professionals

You’re not stuck with a job title/description. Most employers would be more than happy if you inquired about being the technology liaison for your department. Having background knowledge in your area, as well as tech knowledge, could place you in a dynamic position to recommend training, updates or software to improve your organization’s management. It is also never too late to learn. In fact, having spent decades in an industry, you’re more experienced. Experience brings with it a treasure of potential.

Youth and pre-professionals

According to the National Girls Collaborative Project, at the K-12 level girls are holding their ground with boys in the math and science curriculum. The reality is that, contrary to previously held beliefs, girls and young women are entering the STEM fields. The drop-off occurs at the post-secondary level. Women account for more than half of the population at most American universities. However, women earn the fewest university degrees in math, engineering and computer science (approx. 18%). Hispanics, blacks and Native Americans combined accounted for 13% of science and engineering degrees earned in 2011. Though there is an abundance of articles listing the reasons why women are dropping like flies from STEM education, many of them boil down to options: either there aren’t any, or not enough. Young females are discouraged at a young age, so they opt to shift toward other areas of study where they believe their skills are better suited. There is also the unfortunate tale of the post-graduate woman who is forced to trade away her dream career in order to care for her children (or is marginalized for employment because she has children).

A little encouragement

As a female and a minority, the statistics mirror my personal narrative. I was a late arrival to the information technology field because, when I was younger, I was terrible at math. I unfortunately assumed, like many other girls, that I would never be able to work in the sciences because I wasn’t “naturally” talented. I also didn’t receive my first personal computer until I was in middle school (late 90s to early 2000s). Back then a computer would easily set you back a couple grand. The few computer courses I took championed boys as future programmers and technicians. I thought that boys were naturally great with technology without realizing that it was a symptom of their environment. It would have been great if someone had pulled me aside and explained that natural talent is a myth, and that if I was willing to work diligently, I could be as good as the boys.

Organizations like STEMinist exist to remind everyone that women and minorities are capable of holding positions in STEM fields. This is by no means an endorsement of STEMinist; I just thought the addition of STEM as another frontier for feminism should be recognized. If you type “women in STEM” into a search engine, you’ll be inundated with other organizations that are adding to the conversation. No matter where your views fall on the concept of feminism, or on females and minorities in the sciences, we all have a role to play in encouraging them to pursue their interests as a career.


Are you or do you know a woman/minority who is contributing to science and technology? No matter how small you believe the contribution to be, leave a comment in hopes of encouraging someone.

DPLA: Getting Involved: Our Expanding Community Reps Program

Tue, 2015-02-03 15:51

There are so many ways to get involved with the Digital Public Library of America, each of which contributes enormously to our mission of connecting people with our shared cultural heritage. Obviously we have our crucial hubs and institutional partners, who work closely with us to bring their content to the world. If you’re a software developer, you can build upon our code, write your own, and create apps that help to spread that content far and wide. And if you want to provide financial support, that’s easy too.

But I’m often asked about more general ways that one can advance DPLA’s goals, especially in your local community, and perhaps if you’re not necessarily a librarian, archivist, museum professional, or technologist.

Our Community Reps program exists precisely for that reason. DPLA staff and partners can only get the word out so far each year, and we need people from all walks of life who appreciate what we do to step forward and act as our representatives across the country and around the globe. We currently have another round of applications open, and I ask you to strongly consider joining the program.

Two hundred people, from all 50 states and five foreign countries, have already done so. They orchestrate public meetings, hold special events, go into classrooms, host hackathons, and use their creativity and voices to let others know about DPLA and what everyone can do with it. I know this first-hand through my interactions with Reps in virtual meetings, on our special Reps email list, and from meeting many in person as I travel across the country. Reps are a critical part of the DPLA family, who often have innovative ideas that make a huge difference to our efforts.

We’re incredibly fortunate to have so many like-minded and energetic people join us and the expanding DPLA community through the Reps program. It’s a great chance to be a part of our mission. Apply today!


Library of Congress: The Signal: Office Opens up with OOXML

Tue, 2015-02-03 14:40

The following is a guest post by Carl Fleischhauer, a Digital Initiatives Project Manager in the Office of Strategic Initiatives.

Before VisiCalc, Lotus 1-2-3, and Microsoft Excel, spreadsheets were manual although their compilers took advantage of adding machines. And there were contests, natch. This 1937 photograph from the Library’s Harris & Ewing collection portrays William A. Offutt of the Washington Loan and Trust Company. It was produced on the occasion of Offutt’s victory over 29 competitors in a speed and accuracy contest for adding machine operators sponsored by the Washington Chapter, American Institute of Banking.

We are pleased to announce the publication of nine new format descriptions on the Library’s Format Sustainability Web site. This is a closely related set, each of which pertains to a member of the Office Open XML (OOXML) family.

Readers should focus on the word Office, because these are the most recent expression of the formats associated with Microsoft’s family of “Office” desktop applications, including Word, PowerPoint and Excel. Formerly, these applications produced files in proprietary, binary formats that carried the filename extensions doc, ppt, and xls. The current versions employ an XML structure for the data and an x has been added to the extensions: docx, pptx, and xlsx.

In addition to giving the formats an XML expression, Microsoft also decided to move the formats out of proprietary status and into a standardized form (now focus on the word Open in the name). Three international organizations cooperated to standardize OOXML. Ecma International, an international, membership-based organization, published the first edition in 2006. At that time, Caroline Arms (co-compiler of the Library’s Format Sustainability Web site) served on the ECMA work group, which meant that she was ideally situated to draft these descriptions.

In 2008, a modified version was approved as a standard by two bodies who work together on information technology standards through a Joint Technical Committee (JTC 1): International Organization for Standardization and International Electrotechnical Commission. These standards appear in a series with identifiers that lead off with ISO/IEC 29500. Subsequent to the initial publication by ISO/IEC, ECMA produced a second edition with identical text. Clarifications and corrections were incorporated into editions published by this trio in 2011 and 2012.

Here’s a list of the nine:

  • OOXML_Family, OOXML Format Family — ISO/IEC 29500 and ECMA 376
  • OPC/OOXML_2012, Open Packaging Conventions (Office Open XML), ISO 29500-2:2008-2012
  • DOCX/OOXML_2012, DOCX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
  • DOCX/OOXML_Strict_2012, DOCX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
  • PPTX/OOXML_2012, PPTX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
  • PPTX/OOXML_Strict_2012, PPTX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
  • XLSX/OOXML_2012, XLSX Transitional (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 1-4
  • XLSX/OOXML_Strict_2012, XLSX Strict (Office Open XML), ISO 29500:2008-2012; ECMA-376, Editions 2-4
  • MCE/OOXML_2012, Markup Compatibility and Extensibility (Office Open XML), ISO 29500-3:2008-2012, ECMA-376, Editions 1-4
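As an aside for the technically curious: the XML in these formats is packaged under the Open Packaging Conventions (the ISO 29500-2 entry above), which means a docx, pptx, or xlsx file is simply a ZIP archive of XML "parts." A minimal Python sketch of that package shape (the part names here are simplified stand-ins for what a real .docx contains):

```python
import io
import zipfile

# Build a tiny ZIP in memory to mimic the OPC packaging used by .docx files:
# an OOXML document is just a ZIP archive of XML "parts".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("[Content_Types].xml", "<Types/>")
    z.writestr("word/document.xml", "<w:document/>")

# Reading the package back shows the XML parts inside it
with zipfile.ZipFile(buf) as z:
    parts = z.namelist()
print(parts)  # ['[Content_Types].xml', 'word/document.xml']
```

Renaming a real .docx to .zip and unpacking it shows the same structure, with many more parts.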

Microsoft is not the only corporate entity to move formerly proprietary specifications into the realm of public standards. Over the last several years, Adobe has done the same thing with the PDF family. There seems to be a new business model here: Microsoft and Adobe are proud of the capabilities of their application software–that is where they can make money–and they feel that wider implementation of these data formats will help their business rather than hinder it.

Office work in the days before computer support. This photograph of the U.S. Copyright Office (part of the Library of Congress) was made in about 1920 by an unknown photographer. Staff members are using typewriters and a card file to track and manage copyright information. The original photograph is held in the Geographical File in the Library’s Prints and Photographs Division.

Although an aside in this blog, it is worth noting that Microsoft and Adobe also provide open access to format specifications that are, in a strict sense, still proprietary. Microsoft now permits the dissemination of its specifications for binary doc, ppt, and xls, and copies have been posted for download at the Library’s Format Sustainability site. Meanwhile, Adobe makes its DNG photo file format specification freely available, as well as its older TIFF format specification.

Both developments–standardization for Office XML and PDF and open dissemination for Office, DNG and TIFF–are good news for digital-content preservation. Disclosure is one of our sustainability factors and these actions raise the disclosure levels for all of these formats, a good thing.

Meanwhile, readers should remember that the Format Sustainability Web site is not limited to formats that we consider desirable. We list as many formats (and subformats) as we can, as objectively as we can, so that others can choose the ones they prefer for a particular body of content and for particular use cases.

The Library of Congress, for example, has recently posted its preference statements for newly acquired content. The acceptable category for textual content on that list includes the OOXML family as well as OpenDocument (aka Open Document Format or ODF), another XML-formatted office suite. ODF was developed by the Organization for the Advancement of Structured Information Standards, an industry consortium. ODF’s standardization as ISO/IEC 26300 in 2006 predates ISO/IEC’s standardization of OOXML. The Format Sustainability team plans to draft descriptions for ODF very soon.

Nick Ruest: An Exploratory look at 13,968,293 #JeSuisCharlie, #JeSuisAhmed, #JeSuisJuif, and #CharlieHebdo tweets

Tue, 2015-02-03 14:29
#JeSuisCharlie #JeSuisAhmed #JeSuisJuif #CharlieHebdo

I've spent the better part of a month collecting tweets with the #JeSuisCharlie, #JeSuisAhmed, #JeSuisJuif, and #CharlieHebdo hashtags. Last week, I pulled together all of the collection files, did some cleanup, and ran some more analysis on the data set (76G of json!). This time I was able to take advantage of Peter Binkley's twarc-report project. According to the report, the earliest tweet in the data set is from 2015-01-07 11:59:12 UTC, and the last tweet in the data set is from 2015-01-28 18:15:35 UTC. This data set includes 13,968,293 tweets (10,589,910 retweets - 75.81%) from 3,343,319 different users over 21 days. You can check out a word cloud of all the tweets here.
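For anyone following along at home, the headline counts (total tweets, retweets, distinct users) can be recomputed from the hydrated line-delimited JSON with a short script. This is just an illustration of the file format (one tweet object per line), not twarc-report's own code:

```python
import json

def summarize(path):
    """Count tweets, retweets, and distinct users in a line-delimited
    JSON file of the kind twarc produces (one tweet object per line)."""
    total = retweets = 0
    users = set()
    with open(path) as f:
        for line in f:
            tweet = json.loads(line)
            total += 1
            # Retweets carry the original tweet under retweeted_status
            if "retweeted_status" in tweet:
                retweets += 1
            users.add(tweet["user"]["screen_name"])
    return total, retweets, len(users)
```

On the full 76G data set this takes a while, but it streams line by line so memory stays flat.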

First tweet in data set (numeric sort of tweet ids):


— Thierry Puget (@titi1960) January 7, 2015


If you want to experiment/follow along with what I've done here, you can "rehydrate" the data set with twarc. You can grab the Tweet ids for the data set from here (Data & Analysis tab).

% twarc.py --hydrate JeSuisCharlie-JeSuisAhmed-JeSuisJuif-CharlieHebdo-tweet-ids-20150129.txt > JeSuisCharlie-JeSuisAhmed-JeSuisJuif-CharlieHebdo-tweets-20150129.json

The hydration process will take some time. I'd highly suggest using GNU Screen or tmux, and grabbing approximately 15 pots of coffee.


In this data set, we have 133,970 tweets with geo coordinates available. This represents about 0.96% of the entire data set.

The map is available here in a separate page since the geojson file is 83M and will potato your browser while everything loads. If anybody knows how to stream that geojson file to Leaflet.js so the browser doesn't potato, please comment! :-)
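For reference, a GeoJSON FeatureCollection like the one behind that map can be derived from the hydrated tweets along these lines. This is a sketch, not the actual script used; note that the tweet `geo` field stores [lat, lon] while GeoJSON wants [lon, lat]:

```python
import json

def to_geojson(tweets):
    """Convert tweets that carry point coordinates into a GeoJSON
    FeatureCollection. tweet['geo'] stores [lat, lon]; GeoJSON
    coordinates are [lon, lat], so the pair is swapped."""
    features = []
    for t in tweets:
        geo = t.get("geo")
        if geo and geo.get("type") == "Point":
            lat, lon = geo["coordinates"]
            features.append({
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {"screen_name": t["user"]["screen_name"]},
            })
    return {"type": "FeatureCollection", "features": features}
```

The resulting dict can be dumped with json.dump and handed straight to Leaflet.js.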


These are the top 10 users in the data set.

  1. 35,420 tweets Promo_Culturel
  2. 33,075 tweets BotCharlie
  3. 24,251 tweets YaMeCanse21
  4. 23,126 tweets yakacliquer
  5. 17,576 tweets YaMeCanse20
  6. 15,315 tweets iS_Angry_Bird
  7. 9,615 tweets AbraIsacJac
  8. 9,318 tweets AnAnnoyingTweep
  9. 3,967 tweets rightnowio_feed
  10. 3,514 tweets russfeed

This comes from twarc-report:

$ ~/git/twarc-report/ -o text JeSuisCharlie-JeSuisAhmed-JeSuisJuif-CharlieHebdo-tweets-20150129.json

Hashtags

These are the top 10 hashtags in the data set.

  1. 8,597,175 tweets #charliehebdo
  2. 7,911,343 tweets #jesuischarlie
  3. 377,041 tweets #jesuisahmed
  4. 264,869 tweets #paris
  5. 186,976 tweets #france
  6. 177,448 tweets #parisshooting
  7. 141,993 tweets #jesuisjuif
  8. 140,539 tweets #marcherepublicaine
  9. 129,484 tweets #noussommescharlie
  10. 128,529 tweets #afp
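If you'd rather tally hashtags yourself instead of via twarc-report, a Counter over each tweet's entities block produces the same kind of list. A sketch (it case-folds tags the way the list above does):

```python
import json
from collections import Counter

def top_hashtags(path, n=10):
    """Tally hashtags across a line-delimited JSON tweet file.
    Tags are lowercased so #CharlieHebdo and #charliehebdo merge."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            tweet = json.loads(line)
            for tag in tweet.get("entities", {}).get("hashtags", []):
                counts["#" + tag["text"].lower()] += 1
    return counts.most_common(n)
```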

These are the top 10 URLs in the data set. 3,771,042 tweets (27.00%) had a URL associated with them.

These are all shortened URLs; I'm still working through an issue with resolving them.

  1. (43,708)
  2. (19,328)
  3. (17,033)
  4. (14,118)
  5. (13,252)
  6. (12,407)
  7. (9,228)
  8. (9,044)
  9. (8,721)
  10. (8,581)
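The shortened-URL problem stems from Twitter's entities block: every link is wrapped in a t.co short URL, and the expanded_url recorded behind it is often itself a bit.ly or ow.ly link that still needs resolving. A sketch of tallying the expanded form (an illustration, not twarc-report's code):

```python
import json
from collections import Counter

def top_urls(path, n=10):
    """Tally the expanded form of links embedded in tweets. Twitter's
    entities block carries the t.co short link and, where known, the
    expanded_url behind it; fall back to the short link otherwise."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            for u in json.loads(line).get("entities", {}).get("urls", []):
                counts[u.get("expanded_url") or u["url"]] += 1
    return counts.most_common(n)
```

Fully unshortening the remaining bit.ly-style links would mean an HTTP request per unique URL, which is the slow part.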

This comes from twarc-report:

$ ~/git/twarc-report/ -o text JeSuisCharlie-JeSuisAhmed-JeSuisJuif-CharlieHebdo-tweets-20150129.json

Media

These are the top 10 media urls in the data set. 8,141,552 tweets (58.29%) had a media URL associated with them.

36,753 Occurrences

35,942 Occurrences

33,501 Occurrences

31,712 Occurrences

29,359 Occurrences

26,334 Occurrences

25,989 Occurrences

23,974 Occurrences

22,659 Occurrences

22,421 Occurrences

This comes from twarc-report:

$ ~/git/twarc-report/ -o text JeSuisCharlie-JeSuisAhmed-JeSuisJuif-CharlieHebdo-tweets-20150129.json

LITA: To Infinity (Well, LibGuides 2.0) And Beyond

Tue, 2015-02-03 13:00

Ah, but do I? Credit: Buffy Hamilton


LibGuides is a content management system distributed by Springshare and used by approximately 4800 libraries worldwide to curate and annotate resources online. Generally librarians use it to compile subject guides, but more and more libraries are also using it to build their websites. In 2014, Springshare went public with a new and improved version called LibGuides 2.0.

When my small university library upgraded to LibGuides 2.0, we went the whole hog. After migrating our original LibGuides to version 2, I redid the entire library website using LibGuides, integrating all our content into one unified, flexible content management system (CMS).

Today’s post considers my library’s migration to LibGuides 2.0 as well as assessing the product. My next post will look at how we turned a bunch of subject guides into a high-performing website.

A faculty support page built using LibGuides 2.0 (screenshot credit: Michael Rodriguez)


According to the LibGuides Community pages, 913 libraries worldwide are running LibGuides v1, 439 are running LibGuides v1 CMS, and 1005 are running some version of LibGuides 2.0. This is important because (1) a lot of libraries haven’t upgraded yet, and (2) Springshare has a virtual monopoly on the market for library resource guides. Notwithstanding, Springshare does offer a quality product at a reasonable price. My library pays about $2000 per year for LibGuides CMS, which adds enhanced customization and API features to the regular LibGuides platform.

We did consider dropping LibGuides in favor of WordPress or another open source system, but we concluded that consolidating our web services as much as possible would enhance ease-of-use and ease training and transitions among staff. Our decision was also influenced by the fact that we use LibCal 2.0 for our study room reservation system, while Florida’s statewide Ask a Librarian virtual reference service, in which we participate, is switching to LibAnswers 2.0 by summer 2015. LibGuides, LibCal, and LibAnswers now all integrate seamlessly behind a single “LibApps” login.

LibApps admin interface (screenshot credit: Michael Rodriguez)


Since the upgrade is free, we decided to migrate before classes recommenced in September 2014. We relentlessly weeded redundant, dated, or befuddling content. I deleted or consolidated four or five guides, eliminated the inconsistent tagging system, and rearranged the subject categories. We picked a migration date, and Springshare made it happen within 30 minutes of the hour we chose.

I do suggest carefully screening your asset list prior to migration, because you have the option of populating your A-Z Database List from existing assets simply by checking the box next to each link you want to add to the database list. We overlooked this stage of the process and then had to manually add 140 databases to our A-Z list post-migration. Otherwise, migration was painless. Springshare’s tech support staff were helpful and courteous throughout the process.

Check out Margaret Heller and Will Kent’s ALA TechConnect blog post on Migrating to LibGuides 2.0 or Bill Coombs’ review of LibGuides 2 for other perspectives on the product migration.

Benefits of LibGuides 2.0

Mobile responsive. All the pages automatically realign themselves to match the viewport (tablet, smartphone, laptop, or desktop) through which they are accessed. This is huge.

Modern code. LibGuides 2.0 is built in compliance with HTML5, CSS3, and Bootstrap 3.2, which is a vast improvement given that the previous version’s code seems to date from 1999.

Custom URLs. Did you know that you can work with Springshare and your IT department to customize the URLs for your Guides? Or that you can create a custom homepage with a simple redirect? My library’s website now lives at a delightfully clean URL:

Hosting. Springshare hosts its apps on Amazon servers, so librarians can focus on building content instead of dealing with IT departments, networks, server crashes, domain renewals, or FTP.

A-Z Database List. Pool all your databases into one easily sortable, searchable master list. Sort by subject, database type, and vendor and highlight “best bets” for each category.

Customizations. Customize CSS and Bootstrap for individual guides and use the powerful new API to distribute content outside LibGuides (for LibGuides CMS subscribers only). The old API has been repurposed into a snazzy widget generator to which any LibGuides subscriber has access.

Dynamic design. New features include carousels, image galleries, and tabbed boxes.

Credit: Flickr user Neal Jennings


Drawbacks of LibGuides 2.0

Hidden code. As far as I can tell, librarians can’t directly edit the CSS or HTML as in WordPress. Instead, you have to add custom code to empty boxes in order to override the default code.

Inflexible columns. LibGuides 2.0 lacks the v1 feature wherein librarians could easily adjust the width of guides’ columns. Now we are assigned a preselected range of column widths, which we can only alter by going through the hassle of recoding certain guide templates. Grr.

Slow widgets. Putting multiple widgets on one page can slow load times, and occasionally a widget won’t load at all in older versions of IE, forcing frustrated users to refresh the page.

Closed documentation. Whereas the older documentation is available on the open web for anyone to see, Springshare has locked most of its v2 documentation behind a LibApps login wall.

No encryption. Alison Macrina’s recent blog post on why we need to encrypt the web resonated because LibGuides’ public side isn’t encrypted. Springshare can provide SSL for domains, but they can’t help with custom domains maintained by library IT on local servers.

Proprietary software. As a huge advocate for open source, I wince at relying on a proprietary CMS rather than on WordPress or Drupal, even though Springshare beats most vendors hollow.



That said, we are delighted with the new and improved LibGuides. The upgrade has significantly enhanced our website’s user-friendliness, visual appeal, and performance. The next post in this two-part series will look at how we turned a bunch of subject guides into a library website.

Over to you, dear readers! What is your LibGuides experience? Any alternatives to suggest?

Open Knowledge Foundation: India’s Science and Technology Outputs are Now Under Open Access

Tue, 2015-02-03 11:13

This is a cross-post from the Open Knowledge India blog, see the original here.

As a new year 2015 gift to the scholars of the world, the two departments (the Department of Biotechnology [DBT] and the Department of Science and Technology [DST]) under the Ministry of Science and Technology, Government of India, have unveiled an Open Access policy for all of their funded research.

The policy document dated December 12, 2014 states that “Since all funds disbursed by the DBT and DST are public funds, it is important that the information and knowledge generated through the use of these funds are made publicly available as soon as possible, subject to Indian law and IP policies of respective funding agencies and institutions where the research is performed“.

As the Ministry of Science and Technology funds basic, translational and applied scientific research in the country through various initiatives and schemes for individual scientists, scholars, institutes, start-ups, etc., this policy assumes great significance and brings almost all of the science and technology outputs (here, published articles only) generated at various institutes under Open Access.

The policy underscores that providing free online access to publications is the most effective way of ensuring that publicly funded research is accessed, read and built upon.

The Ministry under this policy has set up two central repositories of its own and a central harvester, which will harvest the full-text and metadata from these repositories and from other repositories of the various institutes established/funded by DBT and DST in the country.

According to the Open Access policy, “the final accepted manuscript (after refereeing, revision, etc. [post-prints]) resulting from research projects, which are fully or partially funded by DBT or DST, or were performed using infrastructure built with the support of these organizations, should be deposited“.

The policy is not only limited to the accepted manuscripts, but extends to all scholarship and data which received funding from DBT or DST from the fiscal year 2012-13 onwards.

As mentioned above, many of the research projects at various institutes in the country are funded by DBT or DST, so this policy will definitely encourage the establishment of Open Access institutional repositories by the institutes and the opening up of access to all publicly funded research in the country.

Terry Reese: MarcEdit 6 Update

Tue, 2015-02-03 06:04

This MarcEdit update includes a couple fixes and an enhancement to one of the new validation components.  Updates include:

** Bug Fix: Task Manager: When selecting the Edit Subfield function, once the delete subfield checkbox is selected and saved, you cannot reopen the task to edit.  This has been corrected.
** Bug Fix: Validate ISBNs: When processing ISBNs, validation appeared to be working incorrectly.  This has been corrected.  The ISBN validator now automatically validates $a and $z of any field specified.
** Enhancement: Validate ISBNs: When selecting the field to validate — if just the field is entered, the program automatically examines the $a and $z.  However, you can specify a specific field and subfield for validation. 


Validate ISBNs

This is a new function (as of the last update) that uses the standard mathematical formula to examine an ISBN and determine whether the number is mathematically correct.  Moving forward, I’ll add functionality to let users verify that an ISBN is actually in use and linked to the record that references it.  To use the function, open the MarcEditor, select the Reports menu, and then Validate ISBNs.

Once selected, you will be asked to specify a field or field and subfield to process.  If just the field is selected, the program will automatically evaluate the $a and $z if present.  If the field and subfield is specified, the program will only evaluate the specified subfield.

When run, the program will output any ISBN fields that cannot be mathematically validated.
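MarcEdit's own source isn't shown here, but the mathematical check described is the standard ISBN check-digit formula, which works for both 10- and 13-digit forms. A Python sketch of the same test:

```python
def valid_isbn(isbn):
    """Return True if the ISBN-10 or ISBN-13 check digit is
    mathematically valid (hyphens and spaces are ignored)."""
    s = isbn.replace("-", "").replace(" ", "").upper()
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        # ISBN-10: weighted sum with weights 10..1 must be divisible by 11;
        # a final "X" stands for the value 10.
        total = sum((10 - i) * (10 if ch == "X" else int(ch))
                    for i, ch in enumerate(s))
        return total % 11 == 0
    if len(s) == 13 and s.isdigit():
        # ISBN-13: alternating weights 1 and 3; sum must be divisible by 10.
        total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(s))
        return total % 10 == 0
    return False
```

A check like this catches transcription errors, but (as noted above) it cannot tell whether the ISBN is actually in use.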


To get the update, use the automated update utility or go to the MarcEdit downloads page to get the current download.


William Denton: Measure the Library Freedom

Tue, 2015-02-03 01:41

The winners of the Knight News Challenge: Libraries were announced a few days ago. I didn’t know about the Knight Foundation (those are the same Knights as in Knight Ridder (“Not to be confused with Knight Rider or Night Rider”)) but they’re giving out lots of money to lots of good projects. DocumentCloud got funded a few years ago, and the Internet Archive got $600,000 this round, and well deserved it is. I was struck by how two winners fit together: the Library Freedom Project, which got $244,700, and Measure the Future, which got $130,000.

The Library Freedom Project has this goal:

Providing librarians and their patrons with tools and information to better understand their digital rights by scaling a series of privacy workshops for librarians.

Measure the Future says:

Imagine having a Google-Analytics-style dashboard for your library building: number of visits, what patrons browsed, what parts of the library were busy during which parts of the day, and more. Measure the Future is going to make that happen by using simple and inexpensive sensors that can collect data about building usage that is now invisible. Making these invisible occurrences explicit will allow librarians to make strategic decisions that create more efficient and effective experiences for their patrons.

Our goal is to enable libraries and librarians to make the tools that measure the future of the library as physical space. We are going to build open tools using open hardware and open source software, and then provide open tutorials so that libraries everywhere can build the tools for themselves.

Moss is boss.

I like collecting and analyzing data, I like measuring things, I like small computers and embedded devices, even smart dust—it always comes back to Vernor Vinge, this time A Deepness In the Sky—but I must say I don’t like Google Analytics even though we use it at work. Any road up:

We will be producing open tutorials that outline both the open hardware and the open source software we will be using, so that any library anywhere will be able to purchase inexpensive parts, put them together, and use code that we provide to build their own sensor networks for their own buildings.

The people behind Measure the Future are all top in the field, but, cripes, it looks like they want to combine users, analytics, metrics, sensors, embedded devices, free software, open hardware and “library as place” into a well-intentioned ROI-demonstrating panopticon.

Delicious Mondrian cake. So moist, so geometric.

I’m not going to get all Michel Foucault on you, but I recently read The Inspection House: An Impertinent Field Guide to Modern Surveillance by Tim Maly and Emily Horne:

The panopticon is the inflexion point and the culmination point of this new regime. It is the platonic ideal of the control the disciplinary society is trying to achieve. Operation of the panopticon does not require special training or expertise; anyone (including the children or servants of the director, as Bentham suggests) can provide the observation that will produce the necessary effects of anxiety and paranoia in the prisoner. The building itself allows power to be instrumentalized, redirecting it to the accomplishment of specific goals, and the institutional architecture provides the means to achieve that end.

Measure the Future has all the best intentions and will use safe methods, but still, it vibes hinky, this idea of putting sensors all over the library to measure where people walk and talk and, who knows, where body temperature goes up or which study rooms are the loudest … and then that would get correlated with borrowing or card swipes at the gate … and knowing that the spy agencies can hack into anything unless the most extreme security measures are taken and there’s never a moment’s lapse … well, it makes me hope they’ll be in close collaboration with the Library Freedom Project.

And maybe the Library Freedom Project can ask them why, when we’re trying to help users protect themselves as their own governments try to eliminate privacy forever, we’re planting sensors around our buildings because we now think that neverending monitoring of users will help us improve our services and show our worth to our funders.