Feed aggregator

Erin White: Easier access for databases and research guides at VCU Libraries

planet code4lib - Wed, 2015-01-07 15:00

Today VCU Libraries launched a couple of new web tools that should make it easier for people to find or discover our library’s databases and research guides.

This project’s goal was to help connect “hunters” to known databases and help “gatherers” explore new topic areas in databases and research guides [1]. Our web redesign task force identified these issues in 2012 user research.

1. New look for the databases list

Since the dawn of library-web time, visitors to our databases landing page were presented with an A to Z list of hundreds of databases with a list of subject categories tucked away in the sidebar.

The new design for the databases list presents a few ways to get at databases, in this order:

For the hunters:

  • Search by title with autocomplete (new functionality)
  • A to Z links

For the gatherers:

  • Popular databases (new functionality)
  • Databases by subject

And, on database subject pages and database search results, there are links to related research guides.

2. Suggested results for search

Building on the search feature in the new database list, we created an AJAX Google Adwords-esque add-on to our search engine (Ex Libris’ Primo) that recommends databases or research guides results based on the search query. For longer, more complex queries, no suggestions are shown.
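
The behavior described above (suggestions for short queries, none for longer ones) could be sketched roughly like this. It is a hypothetical illustration in Python, not the actual add-on, which is an AJAX frontend to Primo; the `MAX_WORDS` cutoff, the `suggest` function, and the sample records are all assumptions.

```python
# Hypothetical sketch: names, cutoff, and sample data are assumptions,
# not taken from the VCU Libraries code.
MAX_WORDS = 3  # assumed cutoff for a "longer, more complex" query

RESOURCES = [
    {"title": "PubMed", "kind": "database"},
    {"title": "JSTOR", "kind": "database"},
    {"title": "Nursing Research Guide", "kind": "research guide"},
]


def suggest(query, resources=RESOURCES, max_words=MAX_WORDS):
    """Return resources whose title contains every word of a short query."""
    words = query.lower().split()
    # Empty or long queries get no suggestions at all.
    if not words or len(words) > max_words:
        return []
    return [
        r for r in resources
        if all(w in r["title"].lower() for w in words)
    ]
```

Matching every word against the title keeps the sketch simple; a real system would more likely query the search index than scan a list in memory.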

Try these queries:

Included in the suggested results:

3. Updates to link pathways for databases

To highlight the changes to the databases page, we also made some changes to how we are linking to it. Previously, our homepage search box linked to popular databases, the alphabet characters A through Z, our subject list, and “all”.

The intent of the new design is to surface the new databases list landing page and wean users off the A-Z interaction pattern in favor of search.

The top three databases are still on the list both for easy access and to provide “information scent” to clue beginner researchers in on what a database might be.

Dropping the A-Z links will require advanced researchers to make a change in their interaction patterns, but it could also mean that they’re able to get to their favorite databases more easily (and possibly unearth new databases they didn’t know about).

Remaining questions/issues
  • Research guides search is just okay. The results are helpful a majority of the time and wildly nonsensical the rest of the time. And, this search is slowing down the overall load time for suggested results. The jury is still out on whether we’ll keep this search around.
  • Our database subject categories need work, and we need to figure out how research guides and database categories should relate to each other. They don’t connect right now.
  • We don’t know if people will actually use the suggested search results and are not sure how to define success. We are tracking the number of clicks on these links using Google Analytics event tracking – but what’s good? How do we know to keep this system around?
  • The change away from the A-Z link list will be disruptive for many and was not universally popular among our librarians. Ultimately it should be faster for “hunters”, but we will likely hear groans.
  • The database title search doesn’t yet account for common and understandable misspellings [2] of database names, which we hope to rectify in the future with alternate titles in the metadata.
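
The alternate-titles idea in the last item might work along these lines. This is only a sketch under stated assumptions: the record layout, the `alternate_titles` field name, and the sample misspellings are hypothetical and do not come from the actual database metadata.

```python
# Hypothetical sketch: the record layout and alternate titles are assumptions.
DATABASES = [
    {"title": "PsycINFO", "alternate_titles": ["Psych Info", "PsychINFO"]},
    {"title": "PubMed", "alternate_titles": ["Pub Med"]},
    {"title": "JSTOR", "alternate_titles": ["J-STOR"]},
]


def autocomplete(prefix, databases=DATABASES):
    """Return official titles whose title or any alternate starts with prefix."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    matches = []
    for db in databases:
        # Check the official title and every known alternate spelling.
        names = [db["title"], *db["alternate_titles"]]
        if any(name.lower().startswith(needle) for name in names):
            matches.append(db["title"])
    return matches
```

With this layout, typing "psych" would still surface PsycINFO, because the match runs against the alternate spellings while the official title is what gets displayed.
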
Necessary credits

Shariq Torres, our web engineer, provided the programming brawn behind this project, completely rearchitecting the database list in Slim/Ember and writing an AJAX frontend for the suggested results. Shariq worked with systems librarians Emily Owens and Tom McNulty to get a Dublin Core XML file of the databases indexed and searchable in Primo. Web designer Alison Tinker consulted on look and feel and responsified the design for smaller-screen devices. A slew of VCU librarians provided valuable feedback and QA testing.

  1. I believe this hunter-gatherer analogy for information-seeking behaviors came from Sandstrom’s An Optimal Foraging Approach to Information Seeking and Use (1994) and have heard it in multiple forms from smart librarians over the years.
  2. Great info from Ken Varnum’s Database Names are Hard to Learn (2014)

DPLA: What’s Ahead for DPLA: Our New Strategic Plan

planet code4lib - Wed, 2015-01-07 14:45

The Digital Public Library of America launched on April 18, 2013, less than two years ago. And what a couple of years it has been. From a staff of three people, a starting slate of two million items, and 500 contributing institutions, we are now an organization of 12, with over eight million items from 1,300 contributing institutions. We have materials from all 50 states—and from around the world—in a remarkable 400 languages. Within our collection are millions of books and photographs, maps of all shapes and sizes, material culture and works of art, the products of science and medicine, and rare documents, postcards, and media.

But focusing on these numbers and their growth, while gratifying and a signal DPLA is thriving, is perhaps less important than what the numbers represent. DPLA has always been a community effort, and that community, which became active in the planning phase to support the idea of a noncommercial effort to bring together American libraries, archives, and museums, and to make their content freely available to the world, has strengthened even more since 2013. A truly national network and digital platform is emerging, although we still have much to do. A strong commitment to providing open access to our shared cultural heritage, and a deeply collaborative spirit, is what drives us every day.

Looking back, 2013 was characterized by a start-up mode: hiring staff, getting the site and infrastructure live, and bringing on a first wave of states and collections. 2014 was a year in which we juggled so much: many new hubs, partners, and content, lining up additional future contributors, and beginning to restructure our technology behind the scenes to prepare for an even more expansive collection and network.

Beginning this year, and with the release of our strategic plan for the next three years, the Digital Public Library of America will hit its stride. We encourage you to read the plan to see what’s in store, but also to know that it will require your help and support; so much in the plan is community-driven, and will be done with that same emphasis on widespread and productive collaboration.

We will be systematically putting in place what will be needed to ensure that there’s an on-ramp to the DPLA for every collection in the United States, in every state. We call this “completing the map,” or making sure that we have a DPLA service hub available to every library, archive, museum, and cultural heritage site that wishes to get their materials online and described in such a way as to be broadly discoverable. We also plan to make special efforts around certain content types—areas where there are gaps in our collection, or where we feel DPLA can make a difference as an agent of discovery and serendipity.

We have already begun to make some major technical improvements that will make ingesting content and managing metadata even better. This initiative will accelerate and be shared with our network. Moreover, we will make a major effort in the coming years to make sure that our valuable unified collection reaches every classroom, civic institution, and audience, to educate, inform, and delight.

There’s a lot to do. We just put a big pot of coffee on. Please join us for this next phase of rapid growth and communal effort!

LibUX: Links Should Open in the Same Window

planet code4lib - Wed, 2015-01-07 07:53

A question came up on ALA Think Tank:

What do you prefer: to click a link and it open in a new tab or for it to open in the same page? Is there a best practice?

There is. The best practice is to leave the default link behavior alone. Usually, this means that the link on a website will open in that same window or tab. Ideas about what links should do are taken for granted, and “best practices” that favor links opening new windows – well, aren’t exactly.

It’s worth taking a look at the original thread because I really hesitate to misrepresent it. I’m not bashing. Well-meaning Think Tankers were in favor of links opening new tabs. Below, I cherry-picked a few comments to communicate the gist:

  • “Most marketing folks will tell you that If it is a link outside your website open in a new tab, that way they don’t lose your site. Within your own site then stay with the default.”
  • “New tab because it’s likely that I want to keep the original page open. And, as [name redacted] mentions, you want to keep them on your site.”
  • “External links open in new tabs.”
  • “I choose to open in a new tab, so the user can easily return to the website in the original tab.”
  • “I was taught in Web design to always go with a new tab. You don’t want to navigate people away from your site.”
  • “I prefer a new tab.”
  • “I prefer a new tab” – not a duplicate.
  • “Marketers usually tell you new tab so people don’t move away from your page as fast.”
  • “I like new tabs because then I don’t lose the original page.”
  • “I prefer new tabs.”
  • “I think best practice is to open links on a new tab.”

There were three themes that kept recurring:

  1. We don’t want users to leave the website
  2. Users find new tabs or windows convenient
  3. I prefer …

I linked these up to a little tongue-in-cheek section at the bottom, but before we get squirrelly I want to make the case for linking within the same window.

Links should open in the same window

Best-in-show user experience researchers Nielsen Norman Group write that “links that don’t behave as expected undermine users’ understanding of their own system,” where unexpected external linking is particularly hostile. See, one of the benefits of the browser itself is that it frees users “from the whims of particular web page or content designers.” For as varied and unique as sites can be, browsers bake in consistency. Consistency is crucial.


Jakob’s Law of the Web User Experience
Users spend most of their time on other websites.

Design conventions are useful. The menu bar isn’t at the top of the website because that’s the most natural place for it; it’s at the top because that is where every other website put it. The conventions set by the sites that users spend the most time on–Facebook, Google, Amazon, Yahoo, and so on–are conventions users expect to be adopted everywhere.

Vitaly Friedman summarizes a bunch of advice from usability-research powerhouses with this:

[A] user-friendly and effective user interface places users in control of the application they are using. Users need to be able to rely on the consistency of the user interface and know that they won’t be distracted or disrupted during the interaction.

And in case this just feels like a highfalutin excuse to rip off a design, interaction designer and expert animator Rachel Nabors makes the case that

Users … may be search-navigators or link-clickers, but they all have additional mental systems in place that keep them aware of where they are on the site map. That is, if you put the proper markers in place. Without proper beacons to home in on, users will quickly become disoriented.

This is all to stress the point that violating conventions, such as the default behaviors of web browsers, is a big no-no. The default behavior of hyperlinks is that they open within the same page.

While not addressing this question directly, Kara Pernice, the managing director at Nielsen Norman Group, wrote last month about the importance of confirming the person’s expectation of what a link is and where the link goes. Breaking that promise actually endangers the trust and credibility of the brand – in this case, the library.

Accessibility Concerns

Pop-ups and new windows have certain accessibility issues which can cause confusion for users relying on screen readers to navigate the website. WebAIM says:

Newer screen readers alert the user when a link opens a new window, though only after the user clicks on the link. Older screen readers do not alert the user at all. Sighted users can see the new window open, but users with cognitive disabilities may have difficulty interpreting what just happened.

Compatibility with WCAG 2.0 involves an “Understanding Guideline” which suggests that the website should “provide a warning before automatically opening a new window or tab.” Here is the technique.


On Twitter, I said:

Your links shouldn't open new windows. There are exceptions, but this is a 90/10 rule here. #alatt #libux

— Michael Schofield (@schoeyfield) January 6, 2015

Hugh Rundle, who you might know, pointed out a totally legit use case:

@schoeyfield I don’t disbelieve you, but I do find it difficult to comprehend. If I’m reading something I want to look at the refs later.

— Hugh Rundle (@HughRundle) January 6, 2015

Say you’re reading In the Library with the Lead Pipe where the articles can get pretty long, and you are interested in a bunch of links peppered throughout the content. You don’t want to be just midway through the text then jump to another site before you’re ready. Sometimes, having a link open in a new tab or window makes sense.

But hijacking default behavior isn’t a light decision. Chris Coyier shows how to use target attributes in hyperlinks to force link behavior, but gives you no less than six reasons why you shouldn’t. Consider this: deciding that such-and-such link should open in a new window actually eliminates navigation options.

If a link is just marked up without any frills, like <a href=>, users’ assumed behavior of that link is that it will open in the same tab/window, but by right-clicking, using a keyboard command, or lingering touch on a mobile device, the user can optionally open it in a new window. When you add target=_blank to the mix, alternate options are mostly unavailable.
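
If you are curious how often a page overrides the default, a short script can list the links that force a new window. The following is a minimal sketch using Python’s standard-library html.parser; the class name, function name, and sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class NewWindowLinkFinder(HTMLParser):
    """Collect the href of every <a> whose target is _blank."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Attribute values may be None (a bare `target`), so guard first.
        target = attributes.get("target") or ""
        if target.lower() == "_blank":
            self.found.append(attributes.get("href"))


def find_new_window_links(html):
    """Return the hrefs of links in `html` that would open a new window."""
    finder = NewWindowLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Illustrative markup only; example.org is a placeholder.
sample = (
    '<p><a href="/guides">Guides</a> and '
    '<a href="https://example.org/terms" target="_blank">terms</a></p>'
)
print(find_new_window_links(sample))
```

A list like this is only a starting point for review: each hit still needs a human to decide whether it is one of the rare cases where a new window is justified.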

I think opening reference links in new windows midway through long content is a compelling use case, but it’s worth considering whether the inconvenience of default link behavior really outweighs the interaction cost, and the downward drag on overall user experience, of overriding it.

Uh, you said “exceptions” …

In my mind, I do think it is a good idea to use target=_blank when opening the link will interrupt an ongoing process:

  • the user is filling out a form and needs to click on a link to review, say, terms of service
  • the user is watching video or listening to audio

Jakob Nielsen and Vitaly Friedman think it’s a-okay to open a new window when linking to a non-HTML document, like a PDF or MP3. Chris Coyier, however, doesn’t think so.

So, yeah, there are exceptions.

So, is there a best practice?

The best practice is to leave the default link behavior alone. It is only appropriate to open a link in a new tab or window in the rarest use cases.

Frequent Comments, or, Librarians Aren’t Their Users

We don’t want users to leave the website.

Marketing folks say this sort of thing. They are the same people who demand carousels, require long forms, and make ads that look like regular content. Using this reasoning, opening a link in a new window isn’t just an antipattern, it is a dark pattern – a user interface designed to trick people.

Plus, poor user experiences negatively impact conversion rate and the bottom line. Tricks like the above are self-defeating.

Users find new tabs or windows convenient.

No they don’t.

I prefer ….

You are not your user.

The post Links Should Open in the Same Window appeared first on LibUX.

Peter Sefton: Letter of resignation

planet code4lib - Wed, 2015-01-07 04:02

to: The boss

cc: Another senior person, HR

date: 2014-12-10

Dear <boss>,

As I discussed with you last week, I have accepted a position with UTS, starting Feb 9th 2015, and I resign my position with UWS. My last day will be Feb 6th 2015.

Regards, Peter

Dr PETER SEFTON Manager, eResearch, Office of the Deputy Vice-Chancellor (Research & Development) University of Western Sydney

Anticipated FAQ:

  • What? eResearch Support Manager – more or less the same gig as I’ve had at UWS, in a tech-focussed uni with a bigger team, dedicated visualisation service and HPC staff, an actual budget and mostly within a few city blocks.

  • Why UTS? A few reasons.

    • There was a job going, I thought I’d see if they liked me. They did. I already knew some of the eResearch team there. I’m confident we will be good together.

    • It’s a continuing position, rather than the five-year, more-than-half-over contract I was on, not that I’m planning to settle down for the rest of my working life as an eResearch manager or anything.

    • The physical concentration should be conducive to Research Bazaar #resbaz activities such as Hacky Hour.

  • But what about the travel!!!!? It will be 90 minutes laptop time each way on a comfy, reasonably cheap and fairly reliable train service with almost total mobile internet coverage, with a few hundred metres walking on either end. That’s a change from 35-90 minutes each way depending on what campus(es) I was heading for that day and the mode of transport, which unfortunately was mostly motor vehicle. I do not like adding yet another car to Sydney’s M4, M7 or M5, despite what I said in my UWS staff snapshot. I think I’ll be fine with the train. If not, oops. Anyway, there are inner-Sydney family members and mates I’ll see more of if only for lunch.

    When the internets stop working the view is at its best. OK, apart from the tunnel and the cuttings.

  • What’s the dirt on UWS? It’s not about UWS, I’ve had a great time there, learned how to be an eResearch manager, worked on some exciting projects, made some friends, and I’ll be leaving behind an established, experienced eResearch team to continue the work. I’m sorry to be going. I’d work there again.

  • Why did you use this mode of announcement? I was inspired by Titus Brown, a few weeks ago.

[updated 2015-01-07 – typo]

Library Tech Talk (U of Michigan): Website Refresh: Really Thinking It Through

planet code4lib - Wed, 2015-01-07 00:00
The Digital Library Production Service (DLPS) recently did a thoughtful and comprehensive update of its web presence on the University of Michigan Library website. This post summarizes the process and calls out the value of having a web content strategist in the mix.

Harvard Library Innovation Lab: Link roundup January 6, 2015

planet code4lib - Tue, 2015-01-06 19:15

Whoa! A batch of links in one day.

STUDIO for Creative Inquiry » Balance from Within

The sofa provides a space for a range of social interactions.

Career Spotlight: What I Do as a Librarian

Librarian career spotlight. “Customer service is always my number one goal.”

Cartoon: Dewey

Hilarious provisional additions to the Dewey Decimal System

Lincoln Book Tower | Ford’s Theatre

A 34 foot tower of books about Abraham Lincoln lives at the Ford’s Theatre Center for Education and Leadership

Watch This 3D-Printed Object Fold and Launch Paper Airplanes | Mental Floss

Use your 3D printer to make an all-in-one paper airplane folder and launcher

Library of Congress: The Signal: Report Available for the 2014 DPOE Training Needs Assessment Survey

planet code4lib - Tue, 2015-01-06 18:49

The following is a guest post by Barrie Howard, IT Project Manager at the Library of Congress.

In September, the Digital Preservation Outreach and Education (DPOE) program wrapped up the “2014 DPOE Training Needs Assessment Survey” in an effort to get a sense of current digital preservation practice, a better understanding about what capacity exists for organizations and professionals to effectively preserve digital content and some insight into their training needs. An executive summary (PDF) and full report (PDF) about the survey results are now available.

The respondents expressed an overwhelming concern for making their content accessible for at least a ten-year horizon, and showed strong support for educational opportunities, like the DPOE Train-the-Trainer Workshop, which provides training to working professionals, increasing organizational capacity to provide long-term access to digital content.

As mentioned in a previous blog post announcing the survey results, this survey was a follow-up to an earlier survey conducted in the summer and fall of 2010.  The questions addressed issues such as the primary function of an organization (library, archive, museum, etc.), staff size and responsibilities, collection items, preferred training content and delivery options and financial support for professional development and training. There was good geographic coverage in the responses from organizations in 48 states, Washington D.C. and Puerto Rico, and none of the survey questions were skipped by any of the respondents. Overall, the distribution of responses was about the same from libraries, archives, museums and historical societies between 2010 and 2014, although there was a notable increase in participation from state governments.

The most significant takeaways from the 2014 survey are:

1) an overwhelming expression of concern that respondents ensure their digital content is accessible for ten or more years (84%);

2) evidence of a strong commitment to support employee training opportunities (83%, which is an increase from 66% reported in 2010), and;

3) similar results between 2010 and 2014. This trend will be of particular interest when the survey is conducted again in 2016.

Other important discoveries reveal changes in staff size and configuration over the last four years. There was a marked 6% decrease in staff size at smaller organizations (those with 1-50 employees), and a slight 2% drop in staff size at large organizations with over 500 employees. In comparison, medium-size organizations reported a 4% uptick in the staff range of 51-200, and 3% for the 201-500 tier. There was a substantial 13% increase across all organizations in paid full-time or part-time professional staff with practitioner experience, and a 5% drop in organizations reporting no staff at all. These findings suggest positive trends across the digital preservation community, which bodes well for the long-term preservation of our collective cultural heritage. Born-digital content was not offered as a choice in the 2010 survey question about content held by respondents, yet in 2014 it is a close second to reformatted materials. This will be another closely-monitored data point in 2016.

Preparation of charts and graphs by Mr. Robert L. Bostick and Mrs. Florence A. Phillips, 1951. Library of Congress Prints and Photographs Division.

Regarding training needs, online delivery is trending upward across many sectors to meet the constraints of reduced travel and professional development budgets. However, results of the 2014 survey reveal respondents still value intimate, in-person workshops as one of their most preferred delivery options with webinars and self-paced, online courses as the next two choices. Respondents demonstrated a preference for training focused on applicable skills, rather than introductory material on basic concepts, and show a preference to travel off-site within a 100-mile radius for half- to full-day workshops over other options.

DPOE currently offers an in-person, train-the-trainer workshop, and is exploring options for extending the workshop curriculum to include online delivery options for the training modules. These advancements will address some of the issues raised in the survey, and may include regularly scheduled webinars, on-demand videos, and pre- and post-workshop videos. Keep a watchful eye on the DPOE website and The Signal for subsequent DPOE training materials as they become available.

District Dispatch: Grab E-rate Order CliffsNotes and join PLA webinar to get a jumpstart on “New Year, New E-rate”

planet code4lib - Tue, 2015-01-06 17:49

Photo by Pearl Avenue Branch Library

For those of us who did not take Marijke Visser’s advice for light holiday reading, ALA Office for Information Technology Policy Fellow Bob Bocher has a belated gift. Click here to get the library “CliffsNotes” of the E-rate Order adopted by the Federal Communications Commission (FCC) in December 2014.

This summary provides a high-level overview of the 76-page Order, focusing on four key changes:
1) Ensuring all libraries and schools have access to high-speed broadband connectivity.
2) Increasing the E-rate fund by $1.5 billion annually.
3) Taking actions to be reasonably certain all applications will be funded.
4) Correcting language in the July Order that defined many rural libraries and schools as “urban,” thus reducing their discounts.

This document book-ends nearly two years of ALA and library advocacy and joins a similar summary of the July 14 FCC E-rate Order. In addition to the summaries, we encourage you to go to the USAC website where there is a dedicated page for the most up-to-date information concerning the E-rate program.

Bob, Marijke and I also invite you to usher in a “New Year, New E-rate” with a free webinar from the Public Library Association, Thursday, January 8. We’ll review important changes in the program and discuss how libraries can take advantage of new opportunities in 2015 and 2016. The webinar is free, but registration is required, and space is limited! The archive will be made available online following the session.

Stay tuned. The #libraryerate conversation will continue at the ALA Midwinter Meeting in Chicago.

The post Grab E-rate Order CliffsNotes and join PLA webinar to get a jumpstart on “New Year, New E-rate” appeared first on District Dispatch.

Islandora: Call for 7.x-1.5 Release Team Volunteers

planet code4lib - Tue, 2015-01-06 17:31

The 7.x-1.5 Release Team will be working on the next release very soon, and you could be our very next release team member!

We are looking for members for all three release team roles.

Release Team Roles

Documentation: Documentation will need to be updated for the next release. Any new components will also need to be documented. This is your chance to help the community improve the Islandora documentation while updating it for the new release! Volunteers will be provided with a login to the Islandora Confluence wiki and will work alongside the Islandora Documentation Interest Group to update the wiki in time for the new release.

Testers: All components with Jira issues set to 'Ready for Test' will need to be tested and verified. Testers will also test basic functionality of their components, and audit README and LICENSE files.

Component Managers: Responsible for the code base of their components.

Timeline
  • Code Freeze: Friday, February 27, 2015
  • Release Candidate: Friday, March 6, 2015
  • Release: Thursday, April 30, 2015

If you are interested in being a member of the release team, please let me know which role you are interested in and which components you'd like to volunteer for. A preliminary list of components can be found here. If you have any questions about being a member of the release team, please reply here.

District Dispatch: OITP releases report exploring policy implications of 3D printing

planet code4lib - Tue, 2015-01-06 17:13

Photo by Subhashish Panigrahi

3D printers can do incredible things – from creating food, to rendering human organs, to building spare parts for the International Space Station. A small but growing number of libraries make 3D printers available as a library service. Library 3D printers may not be able to make you a pizza (yes, that’s possible) or operate in zero gravity, but they are being used to do some pretty amazing things in their own right. Library users are building functioning prosthetic limbs, creating product prototypes and making educational models for use in classwork.

While 3D printing technology is advancing at a meteoric pace, policymakers are just beginning to develop frameworks for its use. This presents the library community with an exciting opportunity—as providers of 3D printing services to the public, we can begin to shape the policy that coalesces around this technology in the years to come.

To advance this work, ALA’s Office for Information Technology Policy (OITP) today released “Progress in the Making: 3D Printing Policy Considerations through the Library Lens,” a report that examines numerous policy implications of 3D printing, including those related to intellectual property, intellectual freedom and product liability. The report seeks to provide library professionals with the knowledge they need to craft 3D printer user policies that minimize liability risks while encouraging users to innovate, learn and have fun.

The report states:

“As this technology continues to take off, library staff should continue to encourage patrons to harness it to provide innovative health care solutions, launch business ventures and engage in creative learning. In order to do so, library staff must have a clear understanding of basic 3D printer mechanics; the current and potential future uses of 3D printers inside and outside of library walls; and the economic and public policy considerations regarding 3D printing.”

ALA’s Office for Intellectual Freedom contributed a piece to the report entitled, “Intellectual Freedom and Library Values,” which offers guidance to library professionals seeking to craft a 3D printer acceptable use policy that accords with the fundamental library value of free expression. Additionally, Tomas A. Lipinski, dean and professor at University of Wisconsin—Milwaukee’s School of Information, provides a sample warning notice that libraries may use with patrons to demonstrate awareness of the legal issues involved in the use of 3D printing technologies in libraries.

The report was released as part of the OITP Perspectives series of short publications that discuss and analyze specialized policy topics. It is the second publication in ALA’s “Progress in the Making” series, an effort to elucidate the policy implications of 3D printing in the library context. The first document was a tip sheet jointly released by OITP, the Public Library Association and United for Libraries.

The post OITP releases report exploring policy implications of 3D printing appeared first on District Dispatch.

David Rosenthal: Stretching the "peer reviewed" brand until it snaps

planet code4lib - Tue, 2015-01-06 16:00
The very first post to this blog, seven-and-a-half years and 265 posts ago, was based on an NSF/JISC workshop on scholarly communication. I expressed skepticism about the value added by peer review, following Don Waters by quoting work from Diane Harley et al:
They suggest that "the quality of peer review may be declining" with "a growing tendency to rely on secondary measures", "difficult[y] for reviewers in standard fields to judge submissions from compound disciplines", "difficulty in finding reviewers who are qualified, neutral and objective in a fairly closed academic community", "increasing reliance ... placed on the prestige of publication rather than ... actual content", and that "the proliferation of journals has resulted in the possibility of getting almost anything published somewhere" thus diluting "peer-reviewed" as a brand.
My prediction was:
The big problem will be a more advanced version of the problems currently plaguing blogs, such as spam, abusive behavior, and deliberate subversion.
Since then, I've returned to the theme at intervals, pointing out that reviewers for top-ranked journals fail to perform even basic checks, that the peer-reviewed research on peer review shows that the value even top-ranked journals add is barely detectable, even before allowing for the value subtracted by their higher rate of retraction, and that any ranking system for journals is fundamentally counter-productive. As recently as 2013 Nature published a special issue on scientific publishing that refused to face these issues by failing to cite the relevant research. Ensuring relevant citation is supposed to be part of the value top-ranked journals add.

Recently, a series of incidents has made it harder for journals to ignore these problems. Below the fold, I look at some of them.

In November, Ivan Oransky at Retraction Watch reported that BioMed Central (owned by Springer) recently found about 50 papers in their editorial process whose reviewers were sock-puppets, part of a trend:

Journals have retracted more than 100 papers in the past two years for fake peer reviews, many of which were written by the authors themselves.

Many of the sock-puppets were suggested by the authors themselves, functionality in the submission process that clearly indicates the publisher's lack of value-add. Nature published an overview of this vulnerability of peer review by Cat Ferguson, Adam Marcus and Oransky entitled Publishing: The peer-review scam that included jaw-dropping security lapses in major publishers' systems:

[Elsevier's] Editorial Manager's main issue is the way it manages passwords. When users forget their password, the system sends it to them by e-mail, in plain text. For PLOS ONE, it actually sends out a password, without prompting, whenever it asks a user to sign in, for example to review a new manuscript.

In December, Oransky pointed to a study published in PNAS by Kyle Siler, Kirby Lee and Lisa Bero entitled Measuring the effectiveness of scientific gatekeeping. They tracked 1008 manuscripts submitted to three elite medical journals:

Of the 808 eventually published articles in our dataset, our three focal journals rejected many highly cited manuscripts, including the 14 most popular; roughly the top 2 percent. Of those 14 articles, 12 were desk-rejected. This finding raises concerns regarding whether peer review is ill-suited to recognize and gestate the most impactful ideas and research.

Desk-rejected papers never even made it to review by peers. It's fair to say that Siler et al conclude:

Despite this finding, results show that in our case studies, on the whole, there was value added in peer review.

These were elite journals, so a small net positive value add matches earlier research. But again, the fact that it was difficult to impossible for important, ground-breaking results to receive timely publication in elite journals is actually subtracting value. And, as Oransky says:

Perhaps next up, the authors will look at why so many “breakthrough” papers are still published in top journals – only to be retracted. As Retraction Watch readers may recall, high-impact journals tend to have more retractions.

Also in December, via Yves Smith, I found Scholarly Mad Libs and Peer-less Reviews in which Marjorie Lazoff comments on the important article For Sale: “Your Name Here” in a Prestigious Science Journal from December's Scientific American (owned by Nature Publishing). In it Charles Seife investigates sites such as:

MedChina, which offers dozens of scientific "topics for sale" and scientific journal "article transfer" agreements. Among other services, these sites offer "authorship for pay" on articles already accepted by journals.

He also found suspicious similarities in wording among papers, including:

"Begger's funnel plot" gets dozens of hits, all from China. “Beggers funnel plot” is particularly revealing. There is no such thing as a Beggers funnel plot. ... "It's difficult to imagine that 28 people independently would invent the name of a statistical test,"

Some of the similarities may be due to authors with limited English using earlier papers as templates when reporting valid research, but some such as the Begger's funnel plot papers are likely the result of "mad libs" style fraud. And Lazoff points out they likely used sockpuppet reviewers:

Last month, Retraction Watch published an article describing a known and partially-related problem: fake peer reviews, in this case involving 50 BioMed Central papers. In the above-described article, Seife referred to this BioMed Central discovery; he was able to examine 6 of these titles and found that all were from Chinese authors, and shared style and subject matter to other “paper mill-written” meta-analyses.

Lazoff concludes:

Research fraud is particularly destructive given traditional publishing’s ongoing struggle to survive the transformational Electronic Age; the pervasive if not perverse marketing of pharma, medical device companies, and self-promoting individuals and institutions using “unbiased” research; and today’s bizarrely anti-science culture.

but goes on to say:

Without ongoing attention and support from the entire medical and science communities, we risk the progressive erosion of our essential, venerable research database, until it finally becomes too contaminated for even our most talented editors to heal.

I'm much less optimistic. These recent examples, while egregious, are merely a continuation of a trend publishers themselves started many years ago of stretching the "peer reviewed" brand by proliferating journals. If your role is to act as a gatekeeper for the literature database, you better be good at being a gatekeeper. Opening the gate so wide that anything can get published somewhere is not being a good gatekeeper.

The fact that even major publishers like Nature Publishing are finally facing up to problems with their method of publishing that the scholars who research such methods have been pointing out for more than seven years might be seen as hopeful. But even if their elite journals could improve their ability to gatekeep, the fundamental problem remains. An environment where anything will get published, the only question is where (and the answer is often in lower-ranked journals from the same publishers), renders even good gatekeeping futile. What is needed is better mechanisms for sorting the sheep from the goats after the animals are published. Two key parts of such mechanisms will be annotations, and reputation systems.

John Mark Ockerbloom: Public Domain Day 2015: Ending our own enclosures

planet code4lib - Tue, 2015-01-06 14:13

It’s the start of the new year, which, as many of my readers know, marks another Public Domain Day, when a year’s worth of creative work becomes free for anyone to use in many countries.

In countries where copyrights have been extended to life plus 70 years, works by people like Piet Mondrian, Edith Durham, Glenn Miller, and Ethel Lina White enter the public domain.  In countries that have resisted ongoing efforts to extend copyrights past life + 50 years, 2015 sees works by people like Flannery O’Connor, E. J. Pratt, Ian Fleming, Rachel Carson, and T. H. White enter the public domain. And in the US, once again no published works enter the public domain due to an ongoing freeze in copyright expirations (though some well-known works might have if we still had the copyright laws in effect when they were created.)

But we’re actually getting something new worth noting this year.  Today we’re seeing scholarship-quality transcriptions of tens of thousands of early English books, the EEBO Text Creation Partnership Phase I texts, become available free of charge to the general public for the first time.  (As I write this, the books aren’t accessible yet, but I expect they will be once the folks in the project come back to work from the holiday.)  (Update: It looks like files and links are now on GitHub; hopefully more user-friendly access points are in the works as well.)

This isn’t a new addition to the public domain; the books being transcribed have been in the public domain for some time.  But it’s the first time many of them are generally available in a form that’s easily searchable and isn’t riddled with OCR errors.  For the rarer works, it’s the first time they’re available freely across the world in any form.  It’s important to recognize this milestone as well, because taking advantage of the public domain requires not just copyrights expiring or being waived, but also people dedicated to making the public domain available to the public.

And that is where we who work in institutions dedicated to learning, knowledge, and memory have unique opportunities and responsibilities.   Libraries, galleries, archives, and museums have collected and preserved much of the cultural heritage that is now in the public domain, and that is often not findable– and generally not shareable– anywhere else.  That heritage becomes much more useful and valuable when we share it freely with the whole world online than when we only give access to people who can get to our physical collections, or who can pay the fees and tolerate the usage restrictions of restricted digitized collections.

So whether or not we’re getting new works in the public domain this year, we have a lot of work to do this year, and the years to follow, in making that work available to the world.  Wherever and whenever possible, those of us whose mission focuses more on knowledge than commerce should commit to having that work be as openly accessible as possible, as soon as possible.

That doesn’t mean we shouldn’t work with the commercial sector, or respect their interests as well.  After all, we wouldn’t have seen nearly so many books become readable online in the early years of this century if it weren’t for companies like Google, Microsoft, and ProQuest digitizing them at much larger scale than libraries had previously done on their own.  As commercial firms, they’re naturally looking to make some money by doing so.  But they need us as much as we need them to digitize the materials we hold, so we have the power and duty to ensure that when we work with them, our agreements fulfill our missions to spread knowledge widely as well as their missions to earn a profit.

We’ve done better at this in some cases than in others.   I’m happy that many of the libraries who partnered with Google in their book scanning program retained the rights to preserve those scans themselves and make them available to the world in HathiTrust.   (Though it’d be nice if the Google-imposed restrictions on full-book downloads from there eventually expired.)  I’m happy that libraries who made deals with ProQuest in the 1990s to digitize old English books that no one else was then digitizing had the foresight to secure the right to make transcriptions of those books freely available to the world today.  I’m less happy that there’s no definite release date yet for some of the other books in the collection (the ones in Phase II, where the 5-year timer for public release doesn’t count down until that phase’s as-yet-unclear completion date), and that there appears to be no plan to make the page images freely available.

Working together, we in knowledge institutions can get around the more onerous commercial restrictions put on the public domain.  I have no issue with firms that make a reasonable profit by adding value– if, for instance, Melville House can quickly sell lots of printed and digitally transcribed copies of the US Senate Torture report for under $20, more power to them.  People who want to pay for the convenience of those editions can do so, and free public domain copies from the Senate remain available for those who want to read and repurpose them.

But when I hear about firms like Taylor and Francis charging as much as $48 to nonsubscribers to download a 19th century public domain article from their website for the Philosophical Magazine, I’m going to be much more inclined to take the time to promote free alternatives scanned by others.  And we can make similar bypasses of not-for-profit gatekeepers when necessary.  I sympathize with Canadian institutions having to deal with drastic funding cuts, which seem to have prompted Early Canadiana Online to put many of their previously freely available digitized books behind paywalls– but I still switched my links as soon as I could to free copies of most of the same books posted at the Internet Archive.  (I expect that increasing numbers of free page scans of the titles represented in Early English Books Online will show up there and elsewhere over time as well, from independent scanning projects if not from ProQuest.)

Assuming we can hold off further extensions to copyright (which, as I noted last year, is a battle we need to show up for now), four years from now we’ll finally have more publication copyrights expiring into the public domain in the US.  But there’s a lot of work we in learning and memory institutions can do now in making our public domain works available to the world.  For that matter, there’s a lot we can do in making the many copyrighted works we create available to the world in free and open forms.  We saw a lot of progress in that respect in 2014: Scholars and funders are increasingly shifting from closed-access to open-access publication strategies.  A coalition of libraries has successfully crowdfunded open-access academic monographs for less cost to them than for similar closed-access print books.  And a growing number of academic authors and nonprofit publishers are making open access versions of their works, particularly older works, freely available to the world while still sustaining themselves.  Today, for instance, I’ll be starting to list on The Online Books Page free copies of books that Ohio State University Press published in 2009, now that a 5-year-limited paywall has expired on those titles.  And, as usual, I’m also dedicating a year’s worth of 15-year-old copyrights I control (in this case, for work I made public in 2000) to the public domain today, since the 14-year initial copyright term that the founders of the United States first established is plenty long for most of what I do.

As we celebrate Public Domain Day today, let’s look to the works that we ourselves oversee, and resolve to bring down enclosures and provide access to as much of that work as we can.

LITA: Agile Development: Core Values

planet code4lib - Tue, 2015-01-06 13:00

Image courtesy of Planbox via Wikimedia Commons

In my last post, I talked about some of the advantages of and potential problems with using Agile as your development philosophy. Today I’d like to build on that topic by talking about the fundamental principles that guide Agile development. There are four, each seemingly described as a choice between two competing priorities:

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan

In reality, the core values should not be taken as “do this, NOT that” statements, but rather as reminders that help the team prioritize the activities and attitudes that create the most value.

1. Individuals and interactions over processes and tools

The first core value is my favorite one: start with the right people, then build your processes and select your tools to best fit them, rather than the other way around. A good development team will build good software; how they build it is a secondary concern, albeit still a valid one: just because your star engineer likes to code in original Fortran, it doesn’t mean you should fill a couple of rooms with IBM-704s. Choosing the right tool is important, and will improve your team’s ability to produce quality software, as well as team recruitment.

Still, it’s the people that matter, and in particular their interactions with each other and with other parts of the organization. The key to building great software is teamwork. Individual skill plays a role, but without open communication and commitment to the team’s goals, the end product may look great, but it will likely not fulfill the original customer need, or it will do so in an inefficient manner. Agile’s daily standup meetings and end-of-iteration evaluations are a way to encourage the team to communicate freely and check egos at the door.

2. Working software over comprehensive documentation

This is the one that often makes developers jump for joy! An Agile team’s focus should be on finding the most efficient way to build software that solves an identified need, and therefore should not spend a lot of time on paperwork. Agile documentation should answer two basic questions: what are we going to build (project requirements and user stories) and how did we build it (technical specifications). The former is crucial for keeping the team focused on the ultimate goal during the fast and furious development sprints, and the latter is needed later on for the purpose of revisiting a certain project, be it to make enhancements or corrections or to reuse a particular feature. Anything else is typically overkill.

3. Customer collaboration over contract negotiation

The best way I can think of to explain this core value is: the development team needs to think of the customer as another member of the team. The customer-team relationship should not be managed by a signed piece of paper, but rather by the ongoing needs of the project. Contract negotiations (you can  calm your legal department down at this point; yes, there will be a contract) should be focused on identifying the problem that needs to be solved and a clear set of success criteria that will tell us we’ve solved it, rather than the tool or process to be delivered. Provisions should be made for regular customer-team interactions (say, by involving customer representatives in sprint planning and review meetings) and a clearly defined change management process: software development is a journey, and the team should have the flexibility to change course midstream if doing so will make the end product a better fit for the customer’s need.

4. Responding to change over following a plan

I talked about requirements documentation earlier, so there is, in fact, an overall plan. What this core value means is that those requirements are a suggested path to solving a customer need, and they can be modified throughout the project if prior development work uncovers a different, better path to the solution, or even a better solution altogether. And in this case, better means more efficient. In fact, everything I’ve described can be summarized in one overarching principle: identify the problem to be solved or the need to be fulfilled, and find the least costly way to get to that end point; do this at the beginning of the project, and keep doing it over, and over, and over again until everyone agrees that a solution has been reached. Everything else (processes, tools, plans, documentation) either makes it easier for the team to find that solution, or is superfluous and should be eliminated.

State Library of Denmark: Finding the bottleneck

planet code4lib - Tue, 2015-01-06 11:57

While our Net Archive search performs satisfactorily, we would like to determine how well-balanced the machine is. To recap, it has 16 cores (32 with Hyper Threading), 256GB RAM and 25 Solr shards @ 900GB. When running it uses about 150GB for Solr itself, leaving 100GB memory for disk cache.

Test searches are for 1-3 random words from a Danish dictionary, with faceting on 1 very large field (billions of unique values, billions of references), 2 large fields (millions of unique values, billions of references) and 3 smaller fields (thousands of unique values, billions of references). Unless otherwise noted, searches were issued one request at a time.

Scaling cores

Under Linux it is quite easy to control which cores a process utilizes, by using the command taskset. We tested scaling by repeating the following steps with different sets of cores:

  1. Shut down all Solr instances
  2. Clear the disk cache
  3. Start up all Solr instances, limited to specific cores
  4. Run the standard performance test
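The pinning in step 3 can be sketched roughly as follows. This is only an illustration, not our actual test harness: the start script name is hypothetical, and which logical CPU numbers are Hyper Threading siblings of each other depends on the machine (lscpu shows the mapping). The only thing taken from taskset itself is the -c option, which takes a list of CPU numbers.

```python
def taskset_command(cores, command):
    """Build a taskset invocation that pins `command` to the given CPUs."""
    # taskset -c accepts a comma-separated list of logical CPU numbers.
    cpu_list = ",".join(str(core) for core in cores)
    return ["taskset", "-c", cpu_list] + list(command)


# Hypothetical start script; replace with whatever launches a Solr instance.
start_solr = ["./start_solr.sh"]

# Pin to the first 8 logical CPUs. Whether that means "8" or "4ht" depends
# on how the kernel numbers Hyper Threading siblings on the machine.
cmd = taskset_command(range(8), start_solr)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run to actually start the process with the restriction in place.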

In the chart below, ht means that there is the stated number of cores + their Hyper Threaded counterpart. In other words, 8ht means 8 physical cores but 16 virtual ones.

Scaling number of cores and use of Hyper Threading


  • Hyper Threading does provide a substantial speed boost.
  • The differences between 8ht cores and 16 or 16ht cores are not very big.

Conclusion: For standard single searches, which is the design scenario, 16 cores seem to be overkill. More complex queries would likely raise the need for CPU though.

Scaling shards

Changing the number of shards on the SolrCloud setup was simulated by restricting queries to run on specific shards, using the argument shards. This was not the best test as it measured the combined effect of the shard-limitation and the percentage of the index held in disk cache; e.g. limiting the query to shard 1 & 2 meant that about 50GB of memory would be used for disk cache per shard, while limiting to shard 1, 2, 3, & 4 meant only 25GB of disk cache per shard.
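A minimal sketch of the shard restriction and of the disk cache arithmetic, assuming hypothetical host names and collection paths (ours are not shown here). The shards request parameter is standard Solr; everything else is made up for illustration.

```python
from urllib.parse import urlencode


def shard_limited_query(base_url, shard_urls, query):
    """Build a Solr select URL that only searches the listed shards."""
    # Solr's `shards` parameter is a comma-separated list of shard addresses.
    params = {"q": query, "shards": ",".join(shard_urls)}
    return base_url + "?" + urlencode(params)


def disk_cache_per_shard(total_cache_gb, active_shards):
    """Disk cache available to each shard if only those shards are queried."""
    return total_cache_gb / active_shards


# Hypothetical addresses for shards 1 and 2.
shards = ["localhost:9001/solr/shard1", "localhost:9002/solr/shard2"]
url = shard_limited_query("http://localhost:9001/solr/shard1/select", shards, "hest")
print(url)
print(disk_cache_per_shard(100, 2), disk_cache_per_shard(100, 4))
```

The second function makes the confounding factor explicit: with 100GB of disk cache, 2 active shards get about 50GB each and 4 active shards about 25GB each.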

Scaling number of active shards

Note: These tests were done on performance-degraded drives, so the actual response times are too high. The relative difference should be representative enough.


  • Performance for 1-8 shards is remarkably similar.
  • Going from 8 to 16 shards is 100% more data at half performance.
  • Going from 16 to 24 shards is only 50% more data, but also halves performance.

Conclusion: Raising the number of shards further on an otherwise unchanged machine would likely degrade performance fast. A new machine seems like the best way to increase capacity, the less guaranteed alternative being more RAM.

Scaling disk cache

A Java program was used to reserve part of the free memory, by allocating a given amount of memory as long[] and randomly changing the content. This effectively controlled the amount of memory available for disk cache for the Solr instances. The Solr instances were restarted and the disk cache cleared between each test.
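The original program was written in Java; the sketch below shows the same idea in Python and is not the code we ran. It allocates a block of memory and writes to it, first once per page so the pages are actually backed by RAM, then at random positions. The function name, the 4096-byte page size and the number of random writes are all assumptions made for the example.

```python
import random


def reserve_memory(n_bytes, touches=1000, page_size=4096, seed=0):
    """Allocate n_bytes and write to them so the memory stays in use."""
    block = bytearray(n_bytes)  # zero-filled allocation
    # Write once per page so the operating system has to back every page.
    for offset in range(0, n_bytes, page_size):
        block[offset] = 1
    # Randomly change the content, as the Java program did with its long[].
    rng = random.Random(seed)
    for _ in range(touches):
        block[rng.randrange(n_bytes)] = rng.randrange(256)
    return block


# Tiny demonstration size; a real test would reserve many gigabytes and
# keep the block referenced for the duration of the measurement.
held = reserve_memory(1 << 20)
print(len(held))
```

As long as the returned block stays referenced, that memory is unavailable to the operating system's disk cache.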

Scaling free memory for disk cache


  • A maximum running time of 10 minutes was far too little for this test, leaving very few measuring points for 54GB, 27GB and 7GB disk cache.
  • Performance degrades exponentially when the amount of disk cache drops below 100GB.

Conclusion: While 110GB (0.51% of the index size) memory for disk cache delivers performance well within our requirements, it seems that we cannot use much of the free memory for other purposes. It would be interesting to see how much performance would increase with even more free memory, for example by temporarily reducing the number of shards.

Scaling concurrent requests

Due to limited access, we only need acceptable performance for one search at a time. Due to the high cardinality (~6 billion unique values) URL-field, the memory requirements for a facet call is approximately 10GB, severely limiting the maximum number of concurrent requests. Nevertheless, it is interesting to see how much performance changes when the number of concurrent requests rises. To avoid reducing the disk cache, we only tested with 1 and 2 concurrent requests.

Scaling concurrent request with heavy faceting

Observations (over multiple runs; only one run is shown in the graph):

  • For searches with small to medium result sets (aka “normal” searches), performance for 2 concurrent requests was nearly twice as bad as for 1 request.
  • For searches with large result sets, performance for 2 concurrent requests was more than twice as bad as for 1 request. This is surprising as a slightly better than linear performance drop was expected.

Conclusion: Further tests seem to be in order due to the surprisingly bad scaling. One possible explanation is that memory speed is the bottleneck. Because of the memory overhead, limiting the number of concurrent requests to 1 or 2 and queuing further requests is the best option for maintaining a stable system.
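One way to limit and queue requests is a semaphore in front of the search call. The sketch below is an illustration under assumptions, not our production setup: LimitedSearcher and the fake search function are hypothetical names, and a real deployment might do the limiting in a proxy or servlet filter instead.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LimitedSearcher:
    """Allow at most max_concurrent searches at once; the rest wait."""

    def __init__(self, search_fn, max_concurrent=2):
        self._search_fn = search_fn
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def search(self, query):
        with self._slots:  # blocks here while all slots are taken
            return self._search_fn(query)


def demo(max_concurrent=2, requests=8):
    """Issue several requests at once and record the peak concurrency."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_search(query):  # stand-in for a heavy facet request
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)
        with lock:
            state["active"] -= 1
        return query.upper()

    searcher = LimitedSearcher(fake_search, max_concurrent)
    queries = [f"q{i}" for i in range(requests)]
    with ThreadPoolExecutor(max_workers=requests) as pool:
        results = list(pool.map(searcher.search, queries))
    return results, state["peak"]


results, peak = demo()
print(len(results), peak)
```

With roughly 10GB needed per facet call, the limit directly bounds the memory taken away from the disk cache.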

Scaling requirements

Admittedly, the whole facet-on-URL-thing might not be essential for the user experience. If we avoid faceting on that field and only facet on the more sane fields, such as host, domain and 3 smaller fields, we can turn up the number of concurrent requests without negative impact on disk cache.

Scaling concurrent requests with moderate faceting


  • Mean performance with 1 concurrent request is 10 times better, compared to the full faceting scenario.
  • From 1-4 threads, latency drops and throughput improves.
  • From 4-32 threads, latency drops but throughput does not improve.

Conclusion: As throughput does not improve for more than 4 concurrent threads, using a limit of 4 seems beneficial. However, as we are planning to add faceting on links_domains and links_hosts, as well as grouping on URL, the measured performance is not fully representative of future use of search in the Net Archive.

James Cook University, Library Tech: Random notes from "Data for ROI and Benchmarking Ebook Collections"

planet code4lib - Tue, 2015-01-06 06:15
I registered for Library Journal's webcast "Data for ROI and Benchmarking Ebook Collections" (the webcast can now be viewed on demand) but, as usual, couldn't make it. They did record it, so I'm just writing down the points that stuck out for me. Ying Zhang (Acquisitions Librarian from University of Central Florida) did some analysis of her institution's use of ebooks acquired using one of

William Denton: NO AD

planet code4lib - Tue, 2015-01-06 03:46

Re+Public has made NO AD, an augmented reality smartphone app that replaces ads in the New York City subway system with art.

They say:

New York is a city of commuters. 5.5 million riders move through its expansive subway system on an average weekday. Advertisers take advantage of this huge, captive audience by bombarding everyone with commercial messages.

Over the years, artists have attempted to take back this public space and our attention, but the system remains full of ads. This is why we created NO AD, a mobile app available now for FREE on iOS and Android. NO AD uses augmented reality technology to replace ads with artwork in realtime through your mobile device.

I wrote this in “Lunch with Zoia,” a short short story at the start of Libraries and Archives Augmenting the World:

Zoia was meeting George at a pub a ten-minute walk from her university that was also easy to get to from George’s public library, especially when the subways were working. She enjoyed the view as she left the university: she ran Adblock Lens, which she’d customized so it disabled every possible ad on campus as well as in the bus stops and on the billboards on the city streets. Sometimes she replaced them with live content, but today she just had the blanked spaces colored softly to blend in with what was around them. No ads, just a lot of beige and brown, slightly glitchy.

It’s not my original idea, but it’s nice to see it become real.

(In my browsers I use Adblock Plus, by the way, and I configure it to block the “unobtrusive” ads too. Wonderful. When I see people using the web without an ad blocker the experience revolts me.)

Mark E. Phillips: Metadata Quality, Completeness, and Minimally Viable Records

planet code4lib - Tue, 2015-01-06 03:20

The quality of metadata records for digital library objects is a subject that comes up pretty often at work.  We haven’t stumbled upon any solid answers to overall questions about measuring, improving, or evaluating metadata but we have given a few things a try.  Here are a few examples of one of these components that we have found useful.

UNTL Metadata

The UNT Libraries’ Digital Collections consists of The Portal to Texas History, the UNT Digital Library, and The Gateway to Oklahoma History.  At the time of writing this post we have 1,049,483 metadata records in our metadata store.  Our metadata model uses the primary fifteen Dublin Core Metadata Elements, which we describe as “locally qualified”: for example, a Title with a qualifier of “Main Title” or “Added Title”, and a Subject that is a namedPerson, LCSH, or MESH.  In addition to those fifteen elements we have added a few other qualified fields such as Citation, Degree, Partner, Collection, Primary Source, Note, and Meta.  These all make up a metadata format we locally call UNTL.  This is all well documented by our Digital Projects Unit on our Metadata Guidelines pages.  All of the controlled vocabularies we use as qualifiers to our metadata elements are available in our Controlled Vocabularies App.  We typically serialize our metadata format as an XML record on disk, and each item in our system exposes the raw UNTL metadata record in addition to other formats.  Here is an example record for an item in The Portal.  To simplify the reading, writing and processing of metadata records in our system we have a Python module called pyuntl that we use for all things UNTL metadata related.


The group of folks in the Digital Libraries Division that were interested in metadata quality have been talking about ways to measure quality in our systems for some time.  As the conversation isn’t new,  we have quite a bit of literature on the subject to review.  We noticed that when librarians begin to talk about “quality of metadata” we tend to get a bit bristly, saying “well what really is quality” and “but not in all situations” and so on.  We wanted to come up with a metric that we could use as a demonstration of the value of defining some of the concepts of quality and moving them into a system in an actionable way.  We decided that a notion of completeness could be a good way of moving forward because once the required fields for a record are defined,  it is easy to assess in a neutral fashion whether a metadata record has the required fields or not.

For our completeness metric we identified the following fields as needing to be present in a metadata record in order for us to consider it a “minimally viable record” in our system.

  • Title
  • Description
  • Subject/Keywords
  • Language
  • Collection
  • Partner
  • Resource Type
  • Format
  • Meta Information for Record

The idea was that even if one did not know much of anything about an object, we would be able to describe it at the surface level: assign it a title, language value, and subject/keyword, and give it a resource type and format.  The meta information about the item and the Institution and Collection elements are important for keeping track of where items come from and who owns them, and for making the system work properly.  We also assigned a weight to each of these fields, since some of the elements carry more weight than others.  Here is that breakdown.

  • Title = 10
  • Description = 1
  • Subject/Keywords = 1
  • Language = 1
  • Collection = 1
  • Partner = 10
  • Resource Type = 5
  • Format = 1
  • Meta Information for Record = 20

The pyuntl library has functionality built into it that calculates a completeness score from 0.0 to 1.0 based on these weights, with a record with a score of 1.0 being “complete” or at least “minimally viable” and records with a score lower than 1.0 being deficient in some way.
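To make the idea concrete, here is a minimal sketch of a weighted completeness score using only the weights listed above. It is not pyuntl's API: the function name, the dictionary keys and the flat record layout are assumptions for the example. Note also that the listed weights sum to 50, while the scores reported in the breakdown of records look like multiples of 1/59, so the real implementation presumably weighs things somewhat differently.

```python
# Weights as listed above; the key names are hypothetical.
WEIGHTS = {
    "title": 10,
    "description": 1,
    "subject": 1,
    "language": 1,
    "collection": 1,
    "partner": 10,
    "resource_type": 5,
    "format": 1,
    "meta": 20,
}


def completeness_score(record, weights=WEIGHTS):
    """Return the weighted share of required fields that have a value."""
    total = sum(weights.values())
    # A field counts as present when it has a non-empty value.
    present = sum(w for field, w in weights.items() if record.get(field))
    return present / total


full = {field: "some value" for field in WEIGHTS}
no_description = dict(full, description="")

print(completeness_score(full))
print(completeness_score(no_description))
```

A record missing only a low-weight field such as Description stays close to 1.0, while a record missing its Meta information drops much further.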

In addition to this calculated metric we try and provide metadata creators with visual cues indicating that a “required” field is missing or partial.  The image below is an example of what is shown to a metadata editor as they are creating metadata records.

Metadata Editing Interface for the UNT Libraries

Detail of fields sidecar for uncompleted metadata record

Metadata editing interface for partially completed record in the UNT Libraries Digital Collections

Detail of fields sidecar for partially completed metadata record.

Our hope is that by providing this information to metadata creators,  they will know when they have created at least a minimally viable metadata record, or if they pull up a record to edit that is not “complete” that they can quickly assess and fix the problem.

When we are indexing our items into our search system we calculate and store this metric so that we can take a look at our metadata records as a whole and see how we are doing. While we currently only use this metric behind the scenes,  we are hoping to move it more front and center in the metadata editing interface.  As it stands today here is the breakdown of records in the system.

Completeness Score    Number of Records
1.0                   1,007,503
0.9830508             17,307
0.9661017             12,142
0.9491525             12,526
0.9322034             1
0.7966102             3
0.779661              1

As can be seen, there is still quite a bit of cleanup to be done on the records in the system to make the whole dataset “minimally viable”, but it is now pretty easy to identify which records are missing what and edit them accordingly. This lets us focus directly on metadata editing tasks that can be measured as “improvements” to the system as a whole.
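As a quick illustration of how much cleanup that is, the counts from the breakdown above can be tallied in a few lines. The variable names are made up for this sketch; the numbers are copied from the breakdown.

```python
# Counts copied from the breakdown above: completeness score -> record count.
score_counts = {
    1.0: 1_007_503,
    0.9830508: 17_307,
    0.9661017: 12_142,
    0.9491525: 12_526,
    0.9322034: 1,
    0.7966102: 3,
    0.779661: 1,
}

total = sum(score_counts.values())
complete = score_counts[1.0]
incomplete = total - complete

print(f"total records:       {total:,}")
print(f"complete records:    {complete:,} ({complete / total:.1%})")
print(f"records to clean up: {incomplete:,}")
```

That works out to 41,980 records out of 1,049,483 that still score below 1.0.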

How are other institutions addressing this problem? Are we missing something? Hit me up on Twitter if this is an area of interest.

William Denton: 3d printing a Sierpinski tetrix

planet code4lib - Tue, 2015-01-06 01:03

Late last year I got a MakerBot Replicator Mini at work for some research I’m doing. I’ve been printing off some science and math things from Thingiverse, most recently four iterations built from this customizable Sierpinski tetrix.

A Sierpinski tetrix is the three-dimensional analogue of the Sierpinski triangle, which is a fractal shape generated by taking a triangle, dividing it into four equal smaller triangles, removing the middle one, then repeating the operation on each of the three remaining triangles, and so on, forever. You end up with a shape that has a constant outer perimeter (though the total length of all its edges grows without bound), zero area (!), and a dimension of about 1.585 (less than the two-dimensional figure it seems to be). That’s pretty wild.
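The numbers behind that can be checked with a short sketch. It assumes the starting triangle has area 1 and uses the similarity dimension (three self-similar copies, each scaled by one half); the function name is made up for illustration.

```python
import math


def sierpinski_triangle(level):
    """Count and total area of the triangles left after `level` removals.

    The starting triangle is assumed to have area 1.
    """
    count = 3 ** level         # each triangle leaves 3 smaller ones behind
    area = (3 / 4) ** level    # each step keeps 3 of the 4 equal parts
    return count, area


for level in range(5):
    count, area = sierpinski_triangle(level)
    print(level, count, round(area, 4))

# Similarity dimension: 3 self-similar copies, each scaled by 1/2.
dimension = math.log(3) / math.log(2)
print(round(dimension, 3))  # 1.585
```

The area shrinks by a factor of 3/4 at every step, which is why it heads to zero in the limit.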

The Wikipedia and Wolfram MathWorld articles explain more and have diagrams and math. Here are some pictures of what I printed.

Three Sierpinski tetrixes (levels 1, 2, 3)

With the tetrix, you start with a tetrahedron, divide it up and remove the middle bit so you have four equal smaller tetrahedra, as shown on the left. Repeat the process with each of the four tetrahedra and you get 16 (4^2) smaller tetrahedra, as shown in the middle. Repeat again to the third iteration and you have 64 (4^3) smaller tetrahedra, as shown on the right.

At the limit you end up with a shape with a constant surface area (unchanged through each iteration), zero volume (!), and 2 dimensions (!!). You can’t print to an infinite level of detail on this printer, which is understandable, but it turns out you can’t reliably print to the fourth iteration, either.
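The same kind of bookkeeping works for the tetrix. This is a sketch under stated assumptions: the starting tetrahedron is given surface area 1 and volume 1, each iteration halves the edge length, and the function name is hypothetical.

```python
import math


def tetrix(level):
    """Count, total surface area, and total volume after `level` iterations.

    The starting tetrahedron is assumed to have surface area 1 and volume 1.
    """
    count = 4 ** level                  # 4, 16, 64, 256, ...
    # Edges are scaled by (1/2)**level, so each small tetrahedron has
    # (1/4)**level of the original surface and (1/8)**level of the volume.
    surface = count * (1 / 4) ** level
    volume = count * (1 / 8) ** level
    return count, surface, volume


for level in range(1, 5):
    print(level, *tetrix(level))

# Similarity dimension: 4 self-similar copies, each scaled by 1/2.
print(math.log2(4))  # 2.0
```

The surface area stays at 1 for every level while the volume halves each time, matching the constant surface area and zero volume described above.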

Four Sierpinski tetrixes (levels 1, 3, 2, 4)

There are supposed to be 256 (4^4) little tetrahedra in that one on the right, but it glitched up while printing and went wonky, and then it broke apart when I removed the supporting material from the sides. It took over five hours to print.

I’ll try again (and I’d like to try it on a better printer), but if that doesn’t work I can print three more level-3 iterations and put them together: you don’t have to make a big thing smaller, you can also make small things bigger.

Stacked Sierpinski tetrixes

I like the malformed fourth iteration … it looks like the ruin of a futuristic building, decayed after the nanobots that maintained it lost their energy and weird climate change-adapted vines started to grow.

LITA: Do You Really Need a CMS?

planet code4lib - Mon, 2015-01-05 23:58

By Thecodeintellects (Own work) [CC BY-SA 3.0], via Wikimedia Commons

No, this post isn’t about another license, credential, or degree to put after your name. CMS stands for content management system, and in this case I’m referring to any of the applications that allow for publishing, editing, and organizing of content on web pages. Content management systems are powerful tools that make it easy to create, manage, and update websites and web content.

But do you really need a content management system for your website? Due to their wide range of capabilities, these systems can be very large and slow, which might not be a suitable trade-off if you’re trying to build a very simple website. Below, I outline some* considerations you should make before deciding whether to use a CMS.


How much content will be posted, and how often? If you only have a fixed amount of content to post — maybe you just need the basics, like library hours, location, contact information, etc. — then you can get away with coding the pages yourself. However, if you’re planning on a lot of publishing activity, a CMS can be a time-saver in several ways. For one, most content management systems will provide you with a way to view a list of all the content you created, and let you perform batch actions such as publishing/unpublishing and deleting content. Furthermore, a CMS will provide a simple, familiar interface to input your content — whether it’s text, images, PDFs, video, etc. — which means your users won’t need any HTML expertise in order to make contributions to the website.

There are other benefits of a CMS’s graphical user interface (GUI). It allows greater control over content by enabling you to make certain fields mandatory, like a title and author name. Additionally, the CMS will automagically tag those fields in the rendered HTML, so you can customize the look of each field through the CSS stylesheet.

Another content question to ask is: will you embed dynamic content from other sources, such as social media? Most popular content management systems have extensions (a.k.a. modules or plugins) that will display the content from your social media accounts directly on your website. However, if you will do little more than post the occasional Flickr photo and YouTube video, then a CMS will be overkill if you already know how to embed externally-hosted photos and videos in HTML pages.


How many users will post content? A CMS does more than content management — it also does user management. This is especially useful if you have several types of content and you need to assign user permissions based on content type. For example, you may want to give your reference librarians permission to publish blog posts, but you might not want them editing the page on computer use policy.
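As a rough illustration of permissions assigned by content type, here is a minimal sketch. The role names, content types, and the `can` helper are all hypothetical and chosen only to mirror the example above; no particular CMS is claimed to model permissions exactly this way.

```python
# Hypothetical roles and content types mirroring the example above;
# real content management systems each have their own permission model.
PERMISSIONS = {
    "reference_librarian": {
        "blog_post": {"create", "edit", "publish"},
    },
    "web_admin": {
        "blog_post": {"create", "edit", "publish"},
        "policy_page": {"create", "edit", "publish"},
    },
}


def can(role, action, content_type):
    """Return True if `role` may perform `action` on `content_type`."""
    return action in PERMISSIONS.get(role, {}).get(content_type, set())


print(can("reference_librarian", "publish", "blog_post"))  # True
print(can("reference_librarian", "edit", "policy_page"))   # False
print(can("web_admin", "edit", "policy_page"))             # True
```

A CMS gives you this kind of lookup, plus the screens for managing it, without your having to write or maintain it yourself.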


What are the web development and design skill levels of you/your staff? The benefit of using a CMS is that a relatively simple installation process lets you skip the development phase, and the theme marketplace lets you skip the design phase. You can have an attractive, functioning (if only basic) website up and running in less than a day’s work.

If you and your staff lack the technical skills, but have sufficient monetary resources to hire someone to develop and design a website, then you’ll also have to factor in the cost of maintaining the site once it’s live.

Another consideration is your users’ technical abilities. You may have some users who are very comfortable with embedding images, and you may have other users who have a difficult time even with a simple web form. If you care at all about accessibility — and you do, right?! — then you should also consider technical/web ability as an accessibility concern. Whether you decide to go with a CMS or not, cater to your users.


If you decide to use a content management system, there is no shortage of options, and the most popular today are Drupal, Joomla, and WordPress. But I wanted to close this post with some alternative options.

The intrepid web developer may want to roll her own CMS, perhaps with the help of a web application framework such as Yii or Zend. For organizations that lack the technical skills or time and money, there are website builder services such as Weebly and Squarespace that will help you get a slick-looking website up with minimal time and effort.

If you really don’t have much content to post, and your discovery vendor allows access, why not piggy-back on your online catalog and add your custom pages there?

*This isn’t a complete list of considerations. Let us know in the comments what I’ve missed!

District Dispatch: IRS officials to discuss library tax form program

planet code4lib - Mon, 2015-01-05 23:52

Photo by AgriLifeToday via Flickr

Want to comment on the Internal Revenue Service’s (IRS) tax form delivery service? Discuss your experiences obtaining tax forms for your library at “Tell the IRS: Tax Forms in the Library,” a session that takes place during the 2015 American Library Association (ALA) Midwinter Meeting in Chicago. The session will be held from 11:30 a.m. to 12:30 p.m. on Sunday, February 1, 2015, in room W187 of McCormick Place, Chicago’s convention center.

Trish Evans, administrator of distribution for the IRS, will lead the discussion that will explore library participation in the agency’s Tax Forms Outlet Program (TFOP). The TFOP offers tax forms and products to the American public primarily through participating libraries and post offices.

View other ALA Washington Office Midwinter Meeting conference sessions

The post IRS officials to discuss library tax form program appeared first on District Dispatch.

