
Feed aggregator

District Dispatch: 5 things you shouldn’t forget this holiday season

planet code4lib - Mon, 2015-11-16 18:39

5.  Thaw the turkey!
4.  Accidentally don’t remember to buy glitter.
3.  Stock up on elf repellent
2.  Open the fireplace flue all the way (whether you’re expecting Santa or not).
1.  Bake an “Advocake”!

“What the ^%$&#@?!@ is an Advocake?” we hear you saying. Glad you asked! We happen to have the recipe right here. There’s no better way to maximize holiday satisfaction than by wishing Members of Congress a happy holiday and inviting them for a quick visit to their local library while they are back in town. Here’s how to take advantage of the holiday recess and use it for library advocacy!

The post 5 things you shouldn’t forget this holiday season appeared first on District Dispatch.

Harvard Library Innovation Lab: Link roundup November 16, 2015

planet code4lib - Mon, 2015-11-16 18:03

This is the good stuff.


Make amazing shirts from web image searches. Love this. Fog is a winner. Grass too.

Rebellious Group Splices Fruit-Bearing Branches Onto Urban Trees | Mental Floss

Guerrilla Grafters splice fruit-bearing branches onto urban trees

Idea Sex: How New Yorker Cartoonists Generate 500 Ideas a Week – 99u

“One idea is never enough”

Google Cardboard’s New York Times Experiment Just Hooked a Generation on VR

The new (Cardboard) made of the old (cardboard) bundled with the old (printed newspaper).

Amazon is opening its first physical bookstore today | The Verge

Amazon opens a store

Islandora: Islandora Fundraising

planet code4lib - Mon, 2015-11-16 15:00

The Islandora Foundation is growing up. As a member-supported nonprofit, we have been very fortunate to have the support of more than a dozen wonderful universities, private companies, and like-minded projects: enough support that we were solvent within our first year of operation. As of 2015, we have a small buffer in our budget, which is a comfortable place to be.

But comfortable isn't enough. Not when our mission is to steward the Islandora project and ensure that it is the best software that it can be. With the launch of Fedora 4 last December, we started work on a version of Islandora that would work with this new major upgrade to the storage layer of our sites, recognizing that our community is going to want and need to move on to Fedora 4 someday and we had better be ready with a front-end for them when the time comes. Islandora 7.x-2.x was developed to the prototype stage with special funding from some of our supporters, and development continues by way of volunteer sprints. Meanwhile, Islandora 7.x-1.x (which works with Fedora 3) continues to be supported and improved, also by volunteers.

It's a lot to coordinate, and we have determined through consultation with our interest groups, committees, and the community in general that in order to do this right, we need to have someone with the right skill set dedicated to coordinating these projects. We need a Tech Lead.

Right now, the Islandora Foundation has a single employee (*waves*). I am the Project & Community Manager, which means I work to support community groups and initiatives, organize Islandora events, handle communications (both public and private) and promotions, and just generally do everything I can to help our many wonderful volunteers do the work that keeps this project thriving. We've been getting by with that because many of the duties that would belong to a Tech Lead have been fulfilled by members of the community on a volunteer basis, but we are swiftly outgrowing that model. The Fedora 4 project that inspired us to take on a new major version of Islandora has had great success with a two-person team of employees (plus many groups for guidance): Product Manager David Wilcox (more or less my counterpart) and Tech Lead Andrew Woods.

Now to the point: we need money. We have a confirmed membership revenue of $86,000 per year*, which is plenty for one employee plus some travel and general expenses, but not enough to hire this second position that we need to get the project to the next level. About a month ago I contacted many of the institutions in our community to see if they could consider becoming members of the Islandora Foundation, and we had a gratifying number of hopeful responses (thank you to those folks!), but we're still short of where we need to be. 

And so, the Funding Lobster (or Lobstometre). In the interest of transparency, and perhaps as motivation, this little guy is showing you exactly where things stand with our Tech Lead goal. If we get $160,000 in memberships we can do it (but we'll be operating without a net), $180,000 and we're solid, and if we hit $200,000 or above that's just unmitigated awesome (and would get turned into special projects, events, and other things to support the community). He's the Happy Lobster, and not the Sad Lobster, because we do believe we'll get there with your help, and soon.

How can you help? Become a member. It would be great if we could frame this as a funding drive and take one-time donations, but since the goal is to hire a real live human being who will want to know that they can pay their rent and eat beyond their first year of employment, we need to look for renewable commitments. Our membership levels are as follows:

Institutional Membership:

  • Member - $2000
  • Collaborator - $4000
  • Partner - $10,000

Individual Membership:

  • $10 - $250+ (at your discretion)

There are many benefits to membership, including things like representation on governing committees and discounts at events. Check out the member page or drop me an email if you want to know more.

Many thanks,

- Melissa

* some of our members were able to allocate more funding to support 7.x-2.x development than their typical membership dues. It is currently unknown how many will be able to maintain that funding level at renewal, but yearly membership revenue could be as high as $122,000. I went with the number we can be sure of.

Mark E. Phillips: Finding figures and images in Electronic Theses and Dissertations (ETD)

planet code4lib - Mon, 2015-11-16 14:17

One of the things we are working on at UNT is a redesign of The Portal to Texas History’s interface. In doing so, I’ve been looking around quite a bit at other digital libraries for ideas about features that we could incorporate into our new user experience.

One feature that I found that looked pretty nifty was the “peek” interface for the Carolina Digital Repository. They make the code for this interface available to others via the peek repository on the UNC Libraries GitHub. I think this is an interesting interface, but I still had the question: “How did you decide which images to choose?” I came across the peek-data repository, which suggested that choosing the images was a manual process, and I also found a PowerPoint presentation titled “A Peek Inside the Carolina Digital Repository” by Michael Daines that confirmed this. These slides are a few years old, so I don’t know if the process is still manual.

I really like this idea and would love to try to implement something similar for some of our collections, but the thought of manually choosing images doesn’t sound like fun at all. I looked around a bit to see if I could borrow from prior work that others have done. I know that the Internet Archive and the British Library have released some large image datasets that appear to be the “interesting” images from books in their collections.

Less and More interesting images

I ran across a blog post called “Extracting images from scanned book pages” by Chris Adams, who works on the World Digital Library at the Library of Congress. It seemed close to what I wanted to do, but wasn’t exactly it.

I remembered a Code4Lib lightning talk from a few years back by Eric Larson called “Finding image in book page images” and the companion GitHub repository, picturepages, that contains the code he used. In reviewing the slides and looking at the code, I think I found what I was looking for, or at least a starting point.


What Eric proposed for finding interesting images was to take an image, convert it to grayscale, increase the contrast dramatically, squash it into a single-pixel-wide image that is 1500 pixels tall, and sharpen it. The resulting image is then inverted, thresholded so that every pixel is either black or white, and inverted again. Finally, the black and white pixel values are analyzed to see whether there are any areas 200 or more pixels long that are solid black.
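To see what the invert/threshold/invert sequence actually does, it helps to trace a single 8-bit gray value through it. The sketch below is a model, not Eric's code: it assumes ImageMagick's documented -threshold rule (values above the threshold become white, everything else black), and the function names are hypothetical. It also checks the Pillow lambda used later in this post against the same rule.

```python
def imagemagick_style(value: int) -> int:
    """Model of `-negate -threshold 0 -negate` applied to one 8-bit gray value.

    Assumption: -threshold 0 turns values above 0 white (255), the rest black (0).
    """
    negated = 255 - value
    thresholded = 255 if negated > 0 else 0
    return 255 - thresholded


def pillow_style(value: int) -> int:
    """Same step as the Pillow version: invert, `0 if x < 1 else 255`, invert."""
    inverted = 255 - value
    mapped = 0 if inverted < 1 else 255
    return 255 - mapped


def collapse_to_black(value: int) -> int:
    """Net effect of both: only pure white stays white; everything else is black."""
    return 255 if value == 255 else 0
```

Checking every value from 0 to 255 shows all three functions agree. If this model is right, any row of the squashed column that picked up even a little ink ends up black, which is why long unbroken black runs are a reasonable signal for a figure.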

`convert #{file} -colorspace Gray -contrast -contrast -contrast -contrast -contrast -contrast -contrast -contrast -resize 1X1500! -sharpen 0x5 miff:- | convert - -negate -threshold 0 -negate TXT:#{filename}.txt`

The command above uses ImageMagick to convert an input image to grayscale, apply -contrast eight times, resize the image to 1x1500 pixels, and then sharpen the result. It pipes this output into convert again, which inverts the colors, applies a threshold, and inverts the colors back. The output is saved as a text file instead of an image, with one line per pixel. The output looks like this:

# ImageMagick pixel enumeration: 1,1500,255,srgb
...
0,228: (255,255,255) #FFFFFF white
0,229: (255,255,255) #FFFFFF white
0,230: (255,255,255) #FFFFFF white
0,231: (255,255,255) #FFFFFF white
0,232: (0,0,0) #000000 black
0,233: (0,0,0) #000000 black
0,234: (0,0,0) #000000 black
0,235: (255,255,255) #FFFFFF white
0,236: (255,255,255) #FFFFFF white
0,237: (0,0,0) #000000 black
0,238: (0,0,0) #000000 black
0,239: (0,0,0) #000000 black
0,240: (0,0,0) #000000 black
0,241: (0,0,0) #000000 black
...

The next step was to loop through each of the lines in the file to see if there was a sequence of 200 black pixels.
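A minimal sketch of that loop in Python, assuming the TXT: format shown above (integer channel values, one pixel per line, lines starting with # are comments). The names parse_column, longest_black_run, and has_figure are hypothetical, and a row counts as black only when every channel is 0.

```python
import re
from itertools import groupby

# Matches pixel lines such as "0,232: (0,0,0) #000000 black"
PIXEL_LINE = re.compile(r"^\s*(\d+),(\d+):\s*\(([\d.]+(?:,[\d.]+)*)\)")


def parse_column(lines):
    """Return booleans (True = black), ordered by row, for a 1-pixel-wide image."""
    rows = []
    for line in lines:
        if line.startswith("#"):
            continue  # header / comment line
        match = PIXEL_LINE.match(line)
        if not match:
            continue  # skip "..." or anything that isn't a pixel line
        y = int(match.group(2))
        channels = [float(c) for c in match.group(3).split(",")]
        rows.append((y, all(c == 0 for c in channels)))
    rows.sort()  # order by row number in case lines arrive out of order
    return [is_black for _, is_black in rows]


def longest_black_run(column):
    """Length of the longest stretch of consecutive black rows (0 if none)."""
    return max(
        (sum(1 for _ in group) for is_black, group in groupby(column) if is_black),
        default=0,
    )


def has_figure(txt_path, min_run=200):
    """True if the enumeration file contains a black run of at least min_run rows."""
    with open(txt_path) as handle:
        return longest_black_run(parse_column(handle)) >= min_run
```

Using itertools.groupby keeps the run counting to one line; a plain counter loop works just as well.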

I pulled a set of images from an ETD in the UNT Digital Library and tried a Python port of Eric’s code that I hacked together. Things worked pretty well for me: it identified the same pages that I would have picked out as “interesting” on my own.
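The port itself isn't included in the post. One way to drive the same two-stage convert pipeline from Python is sketched below, assuming ImageMagick's convert is on the PATH; build_commands and run_pipeline are hypothetical names. Passing argument lists to subprocess instead of a shell string keeps file names from being reinterpreted by a shell.

```python
import subprocess

CONTRAST_PASSES = 8  # the original command repeats -contrast eight times


def build_commands(image_path, txt_path):
    """Return the argv lists for `convert ... miff:- | convert - ... TXT:out`."""
    first = (
        ["convert", image_path, "-colorspace", "Gray"]
        + ["-contrast"] * CONTRAST_PASSES
        + ["-resize", "1X1500!", "-sharpen", "0x5", "miff:-"]
    )
    second = ["convert", "-", "-negate", "-threshold", "0", "-negate", f"TXT:{txt_path}"]
    return first, second


def run_pipeline(image_path, txt_path):
    """Run both convert processes, piping the first one's output into the second."""
    first_cmd, second_cmd = build_commands(image_path, txt_path)
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout)
    first.stdout.close()  # lets `first` see a broken pipe if `second` exits early
    second_status = second.wait()
    first_status = first.wait()
    if first_status or second_status:
        raise RuntimeError(
            f"convert failed ({first_status}, {second_status}) for {image_path}"
        )
```

For example, run_pipeline("page.jpg", "page.txt") would write the pixel enumeration to page.txt (both file names are placeholders). This still runs ImageMagick once per page, so it presumably carries the same per-image cost as the shell version.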

But I ran into a problem: the process was pretty slow.

I pulled a few more sets of page images from ETDs and found that for those images it could take the ImageMagick convert process up to 23 seconds per image to create the text files I needed. This made me wonder whether I could implement the same processing workflow in pure Python.

I need a Pillow

I have worked with the Python Imaging Library (PIL) a few times over the years and had a feeling it could do what I was interested in. I ended up using Pillow, which is a “friendly fork” of the original PIL library. My thought was to apply the same processing workflow as Eric’s script and see whether doing it all in Python would be reasonable.

I ended up with an image processing workflow that looks like this:

from PIL import Image, ImageEnhance, ImageFilter, ImageOps

# Open image file (filename is a placeholder for the page image path)
im = Image.open(filename)
# Convert image to grayscale image
g_im = ImageOps.grayscale(im)
# Create enhanced version of image using aggressive Contrast
e_im = ImageEnhance.Contrast(g_im).enhance(100)
# resize image into a tiny 1x1500 pixel image
# ANTIALIAS, BILINEAR, and BICUBIC work, NEAREST doesn't
# (newer Pillow releases removed the ANTIALIAS name; LANCZOS is the same filter)
t_im = e_im.resize((1, 1500), resample=Image.BICUBIC)
# Sharpen skinny image file
st_im = t_im.filter(ImageFilter.SHARPEN)
# Invert the colors
it_im = ImageOps.invert(st_im)
# If a pixel isn't black (0), make it white (255)
fixed_it_im = it_im.point(lambda x: 0 if x < 1 else 255, 'L')
# Invert the colors again
final = ImageOps.invert(fixed_it_im)

I was then able to iterate through the pixels in the final image with the getdata() method and apply the same logic, identifying pages that contain a sequence of 200 or more black pixels.
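A sketch of that check, assuming final is the 1x1500 mode "L" image from the Pillow workflow above, so getdata() yields integers where 0 is black. The name has_black_run is hypothetical, and min_run=200 mirrors the threshold used in the post. It returns as soon as a long enough run appears instead of scanning the whole column.

```python
def has_black_run(pixels, min_run=200):
    """Return True once `min_run` consecutive black (0) values have been seen.

    `pixels` can be any iterable of 0/255 values, e.g. final.getdata().
    """
    run = 0
    for value in pixels:
        if value == 0:
            run += 1
            if run >= min_run:
                return True  # no need to look at the rest of the column
        else:
            run = 0  # a white pixel breaks the run
    return False
```

For example, has_black_run(final.getdata()) would be True for a page that gets kept as “interesting.”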

Here are some thumbnail examples from three ETDs: first all of the pages, then just the pages identified by the above algorithm as “interesting”.

Example One

Thumbnails for ark:/67531/metadc699990/ including interesting and less visually interesting pages.

Thumbnails for ark:/67531/metadc699999/ with just visually interesting pages shown.



Example Two

Thumbnails for ark:/67531/metadc699999/ including interesting and less visually interesting pages.

Thumbnails for ark:/67531/metadc699999/ with just visually interesting pages shown.

Example Three

Thumbnails for ark:/67531/metadc699991/ including interesting and less visually interesting pages.

Thumbnails for ark:/67531/metadc699991/ with just visually interesting pages shown.

So in the end I was able to implement the code in Python with Pillow and a fancy little lambda function. The speed improved a lot as well: images that took up to 23 seconds to process with the ImageMagick version of the workflow took a little over a second with the Python version.

The full script I was using for these tests is below. You will need to download and install Pillow in order to get it to work.

I would love to hear other ideas or methods for this kind of work. If you have thoughts or suggestions, or if I missed something, please let me know via Twitter.


D-Lib: Reminiscing About 15 Years of Interoperability Efforts

planet code4lib - Mon, 2015-11-16 14:14
Opinion by Herbert Van de Sompel, Los Alamos National Laboratory and Michael L. Nelson, Old Dominion University

D-Lib: MapAffil: A Bibliographic Tool for Mapping Author Affiliation Strings to Cities and Their Geocodes Worldwide

planet code4lib - Mon, 2015-11-16 14:14
Article by Vetle I. Torvik, University of Illinois at Urbana-Champaign

D-Lib: Structured Affiliations Extraction from Scientific Literature

planet code4lib - Mon, 2015-11-16 14:14
Article by Dominika Tkaczyk, Bartosz Tarnawski and Lukasz Bolikowski, Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Poland

D-Lib: PubIndia: A Framework for Analyzing Indian Research Publications in Computer Science

planet code4lib - Mon, 2015-11-16 14:14
Article by Mayank Singh, Soumajit Pramanik and Tanmoy Chakraborty, Indian Institute of Technology, Kharagpur, India

D-Lib: Using Scenarios in Introductory Research Data Management Workshops for Library Staff

planet code4lib - Mon, 2015-11-16 14:14
Article by Sam Searle, Griffith University, Brisbane, Australia

D-Lib: Collaborative Construction of Digital Cultural Heritage: A Synthesis of Research on Online Sociability Determinants

planet code4lib - Mon, 2015-11-16 14:14
Article by Chern Li Liew, Victoria University of Wellington, New Zealand

D-Lib: Semantometrics in Coauthorship Networks: Fulltext-based Approach for Analysing Patterns of Research Collaboration

planet code4lib - Mon, 2015-11-16 14:14
Article by Drahomira Herrmannova, KMi, The Open University and Petr Knoth, Mendeley Ltd.

D-Lib: Efficient Table Annotation for Digital Articles

planet code4lib - Mon, 2015-11-16 14:14
Article by Matthias Frey, Graz University of Technology, Austria and Roman Kern, Know-Center GmbH, Austria

D-Lib: NLP4NLP: The Cobbler's Children Won't Go Unshod

planet code4lib - Mon, 2015-11-16 14:14
Article by Gil Francopoulo, IMMI-CNRS + TAGMATICA, France; Joseph Mariani, IMMI-CNRS + LIMSI-CNRS, France; Patrick Paroubek, LIMSI-CNRS, France

D-Lib: Holiday Reading

planet code4lib - Mon, 2015-11-16 14:14
Editorial by Laurence Lannom, CNRI

D-Lib: Developing Best Practices in Digital Library Assessment: Year One Update

planet code4lib - Mon, 2015-11-16 14:14
Article by Joyce Chapman, Duke University Libraries, Jody DeRidder, University of Alabama Libraries and Santi Thompson, University of Houston Libraries

D-Lib: The OpenAIRE Literature Broker Service for Institutional Repositories

planet code4lib - Mon, 2015-11-16 14:14
Article by Michele Artini, Claudio Atzori, Alessia Bardi, Sandro La Bruzzo, Paolo Manghi and Andrea Mannocci, Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo" -- CNR, Pisa, Italy

