Feed aggregator

DPLA: DPLA at ALA Annual 2015 in San Francisco, June 25-30

planet code4lib - Mon, 2015-06-22 14:00

The American Library Association’s (ALA) annual conference is right around the corner, so we here at the DPLA have pulled together a nifty little schedule of talks, panels, and presentations that feature members of our staff, Board, Committees, and Hubs. Sessions involving DPLA staff are marked [S], while sessions involving Board, Committee, or Hub members are marked with [A], for ‘affiliate’.


[S] Knight Foundation Grantee Demo Booth, Day #1
2:00 PM – 3:00 PM / Moscone Convention Center, Exhibit Hall, Booth 3731

Connect with DPLA staffers at the Knight Foundation Grantee Demo Booth on Monday, June 29 from 10 AM – 11 AM.

Participants: Dan Cohen (DPLA Executive Director), Emily Gore (DPLA Director for Content), Amy Rudersdorf (DPLA Assistant Director for Content)


[S] Knight Foundation Grantee Demo Booth, Day #2
12:00 PM – 1:00 PM / Moscone Convention Center, Exhibit Hall, Booth 3731

Connect with DPLA staffers at the Knight Foundation Grantee Demo Booth on Sunday, June 28 from 12 PM – 1 PM.

Participants: Dan Cohen (DPLA Executive Director), Emily Gore (DPLA Director for Content), Amy Rudersdorf (DPLA Assistant Director for Content)

[S]  What’s Next for the Digital Public Library of America
4:30 PM – 5:30 PM / Moscone Convention Center, Room 131 (N)
After a busy first two years of bringing together the collections of America’s libraries, archives, and museums, the Digital Public Library of America is looking ahead to the next few years and some important strategic initiatives. DPLA will seek to complete its national network of hubs, work to simplify and harmonize right statements, make an effort to improve the landscape for ebooks, and create new technical infrastructure. DPLA staff will briefly detail some of these efforts, and will respond to questions from the audience. There will be ample time to mingle and interact with the staff and others from the DPLA community who are attending the ALA Annual meeting.

Speakers: Dan Cohen (DPLA Executive Director), Emily Gore (DPLA Director for Content), Amy Rudersdorf (DPLA Assistant Director for Content)



[A] CLA Preconference: Relationship-building and Community Engagement
8:30 AM – 4:00 PM / Moscone Convention Center, Room 2010 (W)
Survival of the public library is about relevance, and the key to relevance is engagement. That’s our future. Engagement, with customers, community, stakeholders, partners, and staff, is about people being in relationships. Public libraries need to approach relationships with the confidence that we have something of value to offer, and clarity about what we hope to gain from others that will move our strategic initiatives forward. In this session we’ll explore the various meanings of community engagement, talk about staff engagement, and discuss what it takes to build relationships in both our outward and inward worlds. We’ll hear about strategies for building productive relationships with staff, communities, partners, and stakeholders. We’ll talk about how to rightsize our relationships, recognizing there should be a correlation between the level of effort we put into nurturing relationships and the value we both offer and receive. We’ll discuss how to seek out strategic relationships that align with organizational priorities, and practice having conversations to build relationships in which you might have something to teach, want to learn, or hope to collaborate. Please show up ready to be engaged, interactive, and appreciative of all that you are offered, contribute, and acquire in this daylong session. Our guest speakers are Susan Hildreth, Gary Wasdin, Luis Herrera, and Jan Sanders. Cheryl Gould and Sam McBane Mulford will facilitate the workshop. If you are a member of CLA, use the special code AFL2015 to receive the price of $219.

Speakers: Cheryl Gould, Gary Wasdin, Jan Sanders, Luis Herrera (City Librarian, San Francisco Public Library, and member of the DPLA Board of Directors), Sam McBane Mulford, Susan Hildreth


[A]  Looking to the Future: Strategic Foresight and Scenario Planning
9:00 AM – 4:00 PM / Moscone Convention Center, Room 228-230 (S)
Go beyond trend spotting and learn how professional futurists leverage strategic foresight tools and approaches to see the big picture of where we are headed. Join our presenters, two consultants trained in Foresight by the University of Houston, as they lead hands-on activities where you will learn tools and techniques you can use to create future scenarios at your own organization.

Speakers: Jamie Hollier (Member of the DPLA Board of Directors), Jen Chang

[A]  Building the New Nostalgia: Making the Case for Why Libraries Matter
2:00 PM – 3:00 PM / Marriott Marquis San Francisco, Yerba Buena Salon 13-15
The new book “BiblioTech” argues that libraries are crucial institutions for serving Americans’ 21st century information needs, but that they are also at risk. Librarians and their allies explore how we can best position libraries to thrive in the digital age by leveraging existing—and new—assets amid dwindling government support.

Moderators: Carol Coletta (VP of Community and National Initiatives, John S. and James L. Knight Foundation), John Bracken (VP of Media Innovation, John S. and James L. Knight Foundation)

Speakers: Dale Dougherty (Founder and Executive Chairman, Maker Media), John Palfrey (Head of School, Phillips Academy, and former Chair of DPLA Board of Directors), Meaghan O’Connor (Assistant Director, Programs and Partnerships, District of Columbia Public Library)


[A]  Herding the fuzzy bits: What do you do after crowdsourcing?
8:30 AM – 10:00 AM / Moscone Convention Center, Room 133 (N)
Once you’ve invited the crowd to help, how do you use what they provide? Presenters will share ideas for incorporating crowdsource-enhanced data from many sources (Flickr, transcription, Twitter) back into collections, along with approaches–including “whoopsies” and remaining challenges–for quality control, data discovery, data disagreement, building communities, and scalability. The session will be interactive on Twitter, locally and remotely, and will include fun activities to demonstrate some of the issues and methods at play. Additional information will be made available in an open online space for viewing and editing during the session at:

Speakers: Grace Costantino (Outreach and Communication Manager, Biodiversity Heritage Library, Smithsonian Institution Libraries), Jacqueline Chapman (Digital Collections Librarian, Biodiversity Heritage Library, Smithsonian Institution Libraries), Martin Kalfatovic (Associate Director, Digital Services and Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries), Suzanne Pilsk (Head, Metadata Unit, Smithsonian Institution Libraries)

[S]  Data Clean-up: Let’s Not Sweep it Under the Rug
1:00 PM – 2:30 PM / Moscone Convention Center, Room 2022 (W)
Data migration is inevitable in a world in which technological infrastructures and data standards continue to evolve. Whether you work in a catalog database or a digital library/archives/institutional repository, working with library resource data means that you will eventually be required to usher data from one system or standard to another. Three speakers working in different library contexts will share their data normalization experiences.

Speakers: Amy Rudersdorf (DPLA Assistant Director for Content), Kyle Banerjee (Digital Collections and Metadata Librarian, Oregon Health and Science University), Terry Reese (Associate Professor, Head, Digital Initiatives, Ohio State University)


[A]  Getting Started with Library Linked Open Data: Lessons from UNLV and NCSU
8:30 AM – 9:30 AM / Moscone Convention Center, Room 2002 (W)

This program will focus on the practical steps involved in creating and publishing linked data including data modeling, data clean up, enhancing the data with links to other data sets, converting the data to various forms of RDF, and publishing the data set. At each step of the process, the speakers will share their experiences and the tools they used to give the audience multiple perspectives on how to approach linked data creation.

Speakers: Cory Lampert (Head, Digital Collections, University of Nevada, Las Vegas, and former DPLA Community Rep), Eric Hanson (Electronic Resources Librarian, North Carolina State University Libraries), Silvia Southwick (Digital Collections Metadata Librarian, University of Nevada, Las Vegas)

[A]  How to Work with Government Officials on Community Wide Issues
8:30 AM – 9:30 AM / Moscone Convention Center, Room 2016 (W)
A panel discussion of library leaders and local officials. Discussions will center around the library as an important community partner/leader and how we can lead change in our local community.

Speakers: Hydra Mendoza (Mayor’s Education Policy Advisor and SF Unified School District Board member), Karen Danczak Lyons (Director, Evanston Public Library), Luis Herrera (City Librarian, San Francisco Public Library, and member of the DPLA Board of Directors), Siobhan Reardon (Director, Free Library of Philadelphia), Wally Bobkiewicz (Evanston (Ill.) City Manager)

[A]  Transforming Neighborhoods, One Library at a Time: The San Francisco Experience
10:30 AM – 11:30 AM / Moscone Convention Center, Room 121 (N)
The San Francisco Public Library’s Branch Library Improvement Program (BLIP) was the largest capital project in the library’s history and completely transformed the neighborhood branch system. During its 14-year span, the $200 million program undertook the renovation of sixteen neighborhood libraries and the construction of eight new buildings. The completion of the ambitious program resulted in seismically safe, ADA-accessible, 21st century libraries. Each project required strong community engagement in the midst of one of the more politically charged cities in the nation.

Speakers: Charles Higueras, Jewell Gomez, Luis Herrera (City Librarian, San Francisco Public Library, and member of the DPLA Board of Directors), Mindy Linetzky


[S][A] Digital Archiving for Humans
10:30 AM – 11:30 AM / Moscone Convention Center, Room 120 (N)
Socializing the archive! Hear about the overlaps and differences between traditional archives and digital startups engaging the social web to make material more interoperable, searchable and usable.

Speakers: Alexis Rossi (Director of Web Services, Internet Archive), Anne Wootton (Co-founder & CEO, Pop Up Archive), Dan Cohen (DPLA Executive Director), P. Toby Graham (University Librarian and Associate Provost, University of Georgia Libraries)

LITA: Tips for Improving Onsite Workshops

planet code4lib - Mon, 2015-06-22 14:00

The catchy all-encompassing title

Courtesy of Jirka Matousek (2012). Flickr

The title of the program is the catch. It serves as a brief description and hooks the interested party into reading the scope and objectives of the program. When a potential participant is browsing through a list of upcoming workshops in an e-mail, website, or course catalog, certain terms and phrases will be the only reason for them to read the course description. “Building a Successful Website” is not as provocative as “Website Management with Google Analytics.” Usually the length of the course name does not make a difference unless it requires two lines. Keep your audience in mind: busy people are inundated with information. If you’re a member of multiple Listservs, you’ll receive an excessive number of emails a day. I personally scan my list of new e-mails for subject lines that interest me, read those, and delete the rest. The title can function as a minor descriptor of what the course entails and a summary of its main objectives. If you’re only going to refer to Google Analytics for fifteen minutes during a two-hour workshop, then don’t put it in the title. Workshop participants will feel that you have wasted their time if you create a misleading description of your course.

Set objectives and goals upfront

The list of objectives can be a deciding factor. Providing a course outline ahead of time is an often-overlooked practice. I personally like to pace myself by knowing which topics will be covered and for how long. Many times, after receiving the course outline in class, I have realized that the topic I was interested in is not being covered or is only a small component of the lecture. I suspect workshop coordinators are either still revising their outlines or guarding them like trade secrets. A lecture outline with timetables is a great resource for attendees to have upfront, and it also works as a time-management tool for lecturers to prepare from. It’s a great organizational tool for everyone involved.

Make time for questions

For short workshops, answer questions after the session. Believe it or not, you can easily get off track and end up answering questions instead of meeting your objectives. If you are taking questions during a lecture, don’t hesitate to interrupt in order to get back on track. Asking participants to write down questions as they come up, for discussion later, is also a great idea. As a presenter you should be prepared for a cold crowd: sometimes participants don’t have immediate questions. Ahead of time, make a list of questions commonly asked about the topic, and if no one responds to your prompt at the end of the workshop, be prepared to present those frequently asked questions. Provide your contact information so attendees can reach you with follow-up questions after the workshop has ended.

The phrase “refreshments will be served” goes a long way

Food…and snacks. Everyone loves free food. Feeding a group of 30 can be pricey, but charging a nominal fee for attendance can work like crowdfunding for group catering. Many workshops cost upwards of $200 to attend; if you charge everyone $10 instead, attendance is more inviting and the payments easily cover the cost of catering for a sizable group. Refreshments go a long way and are highly appreciated during workshop breaks. One caveat about serving heavier food at workshops: participants will be so busy trying to eat their meal that they won’t have time to mingle. Keep it light and simple. When serving food, also consider food allergies, vegetarians, vegans, and other special diets. In other words, don’t place peanut butter cookies next to the fresh fruit bowl.

There is always room for improvement

Conducting a user survey is one way to gauge the user experience of your attendees. You will want to know if any improvements are needed in terms of the presenter, handouts/materials, technology, seating arrangement, number of breaks, disability/accessibility accommodations, etc. You should also include an option to suggest other course topics they are interested in for the next class.
Rating systems are great, but don’t make them complicated. The goal is improvement, but if you make the process difficult you will not receive thorough and complete responses, which defeats the purpose and effort of conducting surveys in the first place.

You may want to consider making surveys anonymous. Getting someone to participate in a survey that will somehow be associated with them may not be an easy task. Anonymity allows everyone to respond honestly without fear of the instructor or coordinators knowing who they are.

Consider whether the survey should be web- or paper-based. Web-based surveys are easy and convenient: they remain available for as long as your survey service allows, and people can complete them quickly when they have time. Paper-based surveys are also effective and can be done in class, which is the best time to have participants’ undivided attention. Scheduling time at the end of the workshop to conduct the survey is beneficial because participants will better recall their experience; give them a reasonable amount of time to complete it. If you choose paper, digitization for long-term review is an option, but consider recycling: years of paper-based surveys can leave a hefty carbon footprint.

The survey should focus on the class, not the instructor. A participant’s experience in the class will inevitably color their review of both the class and the instructor. You can include a few questions about the instructor, but you want a survey that evaluates the worth of the class.

If you build it, they will come. Do you have unique tips for creating a successful onsite workshop? Please share them in the comments section.

Open Library Data Additions: Amazon Crawl: part er

planet code4lib - Mon, 2015-06-22 09:56

Part er of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

DuraSpace News: Tecnalia Research and Innovation Launches its Institutional Repository

planet code4lib - Mon, 2015-06-22 00:00

From Emilio Lorenzo, Arvo Consulting 

Tecnalia Research & Innovation Foundation, a technological applied-research centre with over 1500 employees, teamed up with Arvo Consulting, to build its Institutional Repository. The main objective was to expose and provide wider visibility to its multidisciplinary research results.

Open Library Data Additions: Amazon Crawl: part cf

planet code4lib - Sun, 2015-06-21 14:15

Part cf of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Data, Metadata, Text

Patrick Hochstenbach: Homework assignment #6 Sketchbookskool #BootKamp

planet code4lib - Sat, 2015-06-20 12:48
Filed under: Sketchbook Tagged: art, drawing, portrait, sketch, sketchbook, sketchbookskool

Patrick Hochstenbach: Homework assignment #5 Sketchbookskool #BootKamp

planet code4lib - Sat, 2015-06-20 12:46
Filed under: Sketchbook Tagged: art, inspiration, picasso, sketch, sketchbook, sketchbookskool

State Library of Denmark: Dubious guesses, counted correctly

planet code4lib - Fri, 2015-06-19 21:55

We do have a bit of a performance challenge with heavy faceting on large result sets in our Solr-based Net Archive Search. The usual query speed is < 2 seconds, but if the user requests aggregations based on large result sets, such as all resources from a whole year, processing time jumps to minutes. To get an idea of how bad it is, here’s a chart of response times when faceting on a field with 600M unique values.

Faceting performance for field links with 600M unique values on a 900GB / 250M document index

Yes, the 80M hits query does take 16 minutes! As outlined in Heuristically correct top-X facets, it seems possible to use sampling to determine the top-X terms of the facet result and then fine count only those terms. The first version of heuristically correct top-X facets has now been implemented (download the latest Sparse faceting WAR to try it out), so time for evaluation.
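The mechanics can be sketched in a few lines of Python (a toy model with hypothetical helpers, not the actual Solr implementation): guess the top-X terms by counting facet values over a sample of the documents, then fine-count only those candidate terms over the full result set.

```python
from collections import Counter

def heuristic_top_x(docs, sample_idx, x):
    """Sketch of heuristically correct top-X faceting: docs is a list of
    documents, each a list of facet values; sample_idx picks the sampled
    documents. Returned counts are exact; only the *choice* of terms is
    heuristic."""
    # Phase 1: approximate counts from the sampled documents only.
    sampled = Counter(v for i in sample_idx for v in docs[i])
    candidates = [term for term, _ in sampled.most_common(x)]
    # Phase 2: exact counts, but only for the candidate terms.
    fine = {t: sum(doc.count(t) for doc in docs) for t in candidates}
    return sorted(fine.items(), key=lambda tc: -tc[1])

docs = [["a", "b"], ["a"], ["a", "c"], ["b"], ["b", "c"], ["a"]]
print(heuristic_top_x(docs, sample_idx=[0, 2, 4], x=2))
```

If a genuinely popular term is missed by the sample it never gets fine-counted, which is exactly the failure mode the validity scores in this test measure.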

Three facet fields

For this small-scale evaluation we use just a single 900GB shard with 250M documents, generated from harvested web resources. The three fields of interest are

  • domain, with 1 value/document and 1.1M unique values. Of these, 230K are only referenced by a single document. The most popular domains are referenced by 4M documents.
    Intuitively, domain seems fitting for sampling, with relatively few unique values, not too many single-instance values, and a high number of popular domains.
  • url, with 1 value/document and 200M unique values. Of these, 185M are only referenced by a single document. The most popular urls are referenced by 65K documents.
    Contrary to domain, url seems more problematic to sample, with relatively many unique values, a great many single-instance values, and not very many popular urls.
  • links, with 10 values/document and 600M unique values. Of these, 420M are only referenced by a single document. The most popular links are referenced by 8M documents.
    In between domain and url is links, with relatively many unique values, but only 10% of the 6 billion references going to single-instance values, and a high number of popular links.

Caveat lector: This test should not be seen as authoritative, but rather as an indicator of trade-offs. It was done on a heavily loaded machine, so real-world performance should be better; however, the relative differences in speed should not be too far off (this was tested ad hoc at a time when the machine was not under heavy load).

11 very popular terms were extracted from the general text field and used as query terms, to simulate queries that are heavy in terms of the number of hits.

Term          Hits
og            77M
a             54M
10            50M
to            45M
ikke          40M
søg           33M
denne         25M
også          22M
under         18M
telefon       10M
indkøbskurv    7M

The top 25 terms were requested with facet.limit=25 and sampling was performed by using only part of the result set to update the facet counters. The sampling was controlled by 2 options:

  • fraction (facet.sparse.heuristic.fraction=0.xx): How much of the total number of documents to sample. If fraction is 0.01, this means 1% or 0.01*250M = 2.5M documents. Note that these are all the documents, not only the ones in the result set!
  • chunks (facet.sparse.heuristic.sample.chunks=xxx): How many chunks to split the sampling in. If chunks is 10 and fraction is 0.01, the 2.5M sample documents will be checked by visiting the first 250K, skipping ahead, visiting another 250K etc. 10 times.
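As a rough illustration (hypothetical helper, not Solr’s code), the two options translate into evenly spaced document ranges like this:

```python
def sample_ranges(num_docs, fraction, chunks):
    """Split fraction * num_docs sample documents into `chunks` evenly
    spaced ranges across the index, as (start, end) document offsets."""
    per_chunk = int(num_docs * fraction) // chunks  # docs visited per chunk
    stride = num_docs // chunks                     # distance between chunk starts
    return [(i * stride, i * stride + per_chunk) for i in range(chunks)]

# fraction=0.01, chunks=10 over 250M docs: visit 250K docs, skip ahead,
# visit another 250K, etc., 10 times -- as described above.
ranges = sample_ranges(250_000_000, 0.01, 10)
```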

To get a measure of validity, a full count was performed for each facet with each search term. The results from the sample runs were then compared to the full count by counting the number of correct terms from the top to the first error. Example: If the fully counted result is

  • a (100)
  • b (80)
  • c (50)
  • d (20)
  • e (20)

and the sample result is

  • a (100)
  • b (80)
  • c (50)
  • e (20)
  • f (18)

then the score would be 3. Note that the counts themselves are guaranteed to be correct. Only the terms are unreliable.
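The scoring rule is simple enough to state as code (a sketch of the evaluation, not the actual test harness): count matching terms from the top and stop at the first mismatch.

```python
def validity_score(full_top, sampled_top):
    """Number of leading terms in sampled_top that agree with the fully
    counted full_top, counted from the top to the first error."""
    score = 0
    for full_term, sampled_term in zip(full_top, sampled_top):
        if full_term != sampled_term:
            break
        score += 1
    return score

# The example above: d is missing from the sample result, so the score is 3.
print(validity_score(["a", "b", "c", "d", "e"],
                     ["a", "b", "c", "e", "f"]))  # 3
```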

Measurements

Facet field domain (1.1M unique values, 1 value/document)

First we sample using half of all documents (sample fraction 0.5), for varying numbers of chunks: c10 means 10 chunks, c10K means 10,000 chunks. As facet.limit=25, the highest possible validity score is 25.

Term          Hits  c10  c100  c1K  c10K  c100K
og            77M    19     9   25    25     25
a             54M    20     4   25    25     25
10            50M    20     5   25    25     25
to            45M    18    14   25    25     25
ikke          40M    16    15   25    25     25
søg           33M    16    15   23    25     24
denne         25M    17    18   23    24     25
også          22M    17    12   25    25     25
under         18M     4    12   23    23     25
telefon       10M    16     8   23    23     25
indkøbskurv    7M     8     2   16    21     25

Heuristic faceting for field domain with 50% sampling

Looking at this, it seems that c1k (1000 chunks) is good, except for the last term indkøbskurv, and really good for 10000 chunks. Alas, sampling with half the data is nearly the full work.

Looking at a sample fraction of 0.01 (1% of total size) is more interesting:

Term          Hits  c10  c100  c1K  c10K  c100K
og            77M     4     9   24    23     25
a             54M     4     4   23    24     25
10            50M     3     4   23    25     20
to            45M     0     0   24    24     24
ikke          40M     5    13   25    24     25
søg           33M     0     0   20    21     25
denne         25M     0     0   18    22     23
også          22M     6    12   23    25     25
under         18M     3     4   22    23     24
telefon       10M     5     7   12    12     25
indkøbskurv    7M     0     1    4    16     23

Heuristic faceting for field domain with 1% sampling

Here it seems that c10K is good and c100K is really good, using only 1% of the documents for sampling. If we were only interested in the top-10 terms, the over-provisioned call for top-25 would yield valid results for both c10K and c100K. If we want all top-25 terms to be correct, over-provisioning to top-50 or so should work.

The results are viable, even with a 1% sample size, provided that the number of chunks is high enough. So how fast is it to perform heuristic faceting, as opposed to full count?

Faceting performance for field domain with 1% sampling

The blue line represents the standard full-counting faceting, with no sampling. It grows linearly with result size, with a worst case of 14 seconds. Sample-based counting (all the other lines) also grows linearly, but with a worst case of 2 seconds. Furthermore, the speed difference between the numbers of chunks is so small that choosing 100K chunks, and thereby the best chance of getting viable results, is not a problem.

In short: Heuristic faceting on the domain field for large result sets is 4-7 times faster than standard counting, with a high degree of viability.

Facet field url (200M unique values, 1 value/document)

Heuristic faceting for field url with 1% sampling

Faceting performance for field url with 1% sampling

The speed-up is a modest 2-4 times for the url field, and, worse, the viability is low, even when using 100K chunks. Raising the minimum result set size for heuristic faceting to 20M hits could conceivably work, but the url field still seems a poor fit. Considering that the url field does not have very many recurring values, this is not too surprising.

Facet field links (600M unique values, 10 values/document)

Heuristic faceting for field links with 1% sampling

Faceting performance for field links with 1% sampling

The heuristic viability of the links field is just as good as with the domain field: as long as the number of chunks is above 1000, sampling with 1% yields great results. The performance is 10-30 times that of standard counting. This makes the links field an exceptionally good fit for heuristic faceting.

Removing the full count from the chart above reveals that worst-case in this setup is 22 seconds. Not bad for a result set of 77M documents, each with 10 references to any of 600M values:

Faceting performance for field links with 1% sampling, no baseline shown


Heuristically correct faceting for large result sets allows us to reduce the runtime of our heaviest queries by an order of magnitude. Viability and relative performance are heavily dictated by the term count distribution of the concrete fields (the url field was a poor fit) and by cardinality. Anyone considering heuristic faceting should test viability on their corpus before enabling it.

Word of caution

Heuristic faceting as part of Solr sparse faceting is very new and not tested in production. It is also somewhat rough around the edges; simple features such as automatic over-provisioning have not been implemented yet.

Nicole Engard: Bookmarks for June 19, 2015

planet code4lib - Fri, 2015-06-19 20:30

Today I found the following resources and bookmarked them on Delicious.

  • Write down a command-line to see the help text that matches each argument

Digest powered by RSS Digest

The post Bookmarks for June 19, 2015 appeared first on What I Learned Today....

Related posts:

  1. Herding Cattle
  2. Live Ink
  3. Car 5.0

Terry Reese: MarcEdit User-Group Meeting @ALA, June 26, 2015

planet code4lib - Fri, 2015-06-19 20:24

Time: 6:00 – 7:30 pm, Friday, June 26, 2015
Place: Marriott Marquis (map)
Room: Pacific H, capacity: 30


The MarcEdit user community is large and diverse, and honestly, I get to meet far too few community members.  This meeting has been put together to give members of the community a chance to come together and talk about the development road map, hear about the work to port MarcEdit to the Mac, and give me an opportunity to hear from the community.  I’ll talk about future work and areas of potential partnership, as well as hear from you about what you’d like to see in the program to make your metadata lives a little easier.  If this sounds interesting to you — I really hope to see you there.


A *big* thank you to John Chapman and OCLC for allowing this to happen.  As folks might guess, finding space at ALA can be a challenging and expensive endeavor, so when I originally broached the idea with OCLC, I had pretty low expectations.  But they truly went above and beyond any reasonable expectation, working with the hotel and ALA so this meeting could take place.  And while they didn’t ask for it, they have my personal thanks and gratitude.  If you can attend the event, or heck, wish you could have but your schedule made it impossible — make sure you let OCLC know that this was appreciated.

District Dispatch: IMLS announces next immigration webinar in series

planet code4lib - Fri, 2015-06-19 20:15


Photo Credit: U.S. Citizenship and Immigration Services

On July 2, 2015, the Institute of Museum and Library Services (IMLS) and U.S. Citizenship and Immigration Services (USCIS) will host a free webinar for public librarians on the topic of immigration and U.S. citizenship. Join in to learn more about what resources are available to assist libraries that provide immigrant and adult education services. The webinar, Overview of myE-Verify, will explore a new online service for the general public. Representatives will be on hand to discuss how members of the public can use the service to:

  • Confirm their work eligibility with Self Check
  • Create a myE-Verify account
  • Protect their Social Security number in E-Verify with Self Lock
  • Access myResources, a multimedia resource center to learn about their rights and their employer’s responsibilities.

Webinar Details:
Date: July 2, 2015
Time: 2:00 – 3:00 p.m. EDT
Click here to register

This series was developed as part of a partnership between IMLS and USCIS to ensure that librarians have the necessary tools and knowledge to refer their patrons to accurate and reliable sources of information on immigration-related topics. To find out more about the partnership and the webinar series, visit the Serving New Americans page of the IMLS website or on the USCIS website.

The post IMLS announces next immigration webinar in series appeared first on District Dispatch.

Harvard Library Innovation Lab: Link roundup June 19, 2015

planet code4lib - Fri, 2015-06-19 20:04

More rounded up than ever

Spot the Ball: Women’s World Cup 2015 –

Fun and interactive method for displaying images

Toby Glanville’s brilliant images of workers in the late 90s

“I think perhaps that a real portrait is one that suggests to the viewer that the subject portrayed is alive”

The construction of the Statue of Liberty – Google Cultural Institute

Love the windowpane slider at the bottom.

The Humans Who Dream Of Companies That Won’t Need Us | Fast Company | Business + Innovation

An army of accountant-robots is coming for you

Giphoscopes from Officina K | The Public Domain Review

Giphoscopes are hand cranked animated gifs

FOSS4Lib Recent Releases: Fedora Repository - 3.8.1

planet code4lib - Fri, 2015-06-19 17:29

Last updated June 19, 2015. Created by Peter Murray on June 19, 2015.

Package: Fedora Repository
Release Date: Wednesday, June 17, 2015

David Rosenthal: EE380 talk on eBay storage

planet code4lib - Fri, 2015-06-19 15:00
Russ McElroy & Farid Yavari gave a talk to Stanford's EE380 course describing how eBay's approach to storage (YouTube) is driven by their Total Cost of Ownership (TCO) model. As shown in this screengrab, by taking into account all the cost elements, they can justify the higher capital cost of flash media in much the same way that Ian Adams, Ethan Miller and I did in our 2011 paper Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes, but with much more realistic data and across a broader span of applications.

We were inspired by a 2009 paper, FAWN: A Fast Array of Wimpy Nodes, in which David Andersen and his co-authors from CMU showed that a network of large numbers of small CPUs coupled with modest amounts of flash memory could process key-value queries at the same speed as the networks of beefy servers used by, for example, Google, but using two orders of magnitude less power.

As this McElroy slide shows, power cost is important and it varies over a 3x range (a problem for Kaminska's thesis about the importance of 21 Inc's bitcoin mining hardware). He specifically mentions the need to get the computation close to the data, with ARM processors in the storage fabric. In this way the amount of data to be moved can be significantly reduced, and thus the capital cost, since as he reports the cost of the network hardware is 25% of the cost of the rack, and it burns a lot of power.

At present, eBay relies on tiering, moving data to less expensive storage such as consumer hard drives when it hasn't been accessed in some time. As I wrote last year:
Fundamentally, tiering, like most storage architectures, suffers from the idea that in order to do anything with data you need to move it from the storage medium to some compute engine. Thus an obsession with I/O bandwidth rather than what the application really wants, which is query processing rate. By moving computation to the data on the storage medium, rather than moving data to the computation, architectures like DAWN and Seagate's and WD's Ethernet-connected hard disks show how to avoid the need to tier, and thus the need to be right in your predictions about how users will access the data.

That post was in part about Facebook's use of tiering, which works well because Facebook has highly predictable data access patterns. McElroy's talk suggests that eBay's data accesses are somewhat predictable, but much less so than Facebook's. This makes his implication that tiering isn't a good long-term approach plausible.

District Dispatch: 3D and IP at VT

planet code4lib - Fri, 2015-06-19 14:49


3D printers are finding their way into an ever-growing number of libraries, schools, museums and universities. Together, these institutions facilitate creative learning through 3D modeling, scanning and printing. Virginia Tech is one of the latest universities within this far-reaching “creative learning community” to build a makerspace replete with cutting-edge 3D technology.

The facility, located in the Resource Center at the Virginia Tech Northern Virginia Center in Fairfax, VA, is the brainchild of Kenneth Wong, Associate Dean of the Graduate School and Director of the Northern Virginia Center. Under the leadership of Associate Dean Wong and the day-to-day management of an expert team of library professionals led by Coordinator Debbie Cash, the facility is open to the public, giving everyone the chance to bring their imaginations to life.

On Tuesday, as part of the Obama Administration’s “Week of Making,” the Northern Virginia Center held a 3D Printing Day. Dean Wong kindly invited me to kick off the day with a talk on 3D printing and intellectual property (IP). I spent the hour talking about the copyright, trademark and patent implications of 3D printing. Over the course of the ALA Office for Information Technology Policy’s work on 3D printing, we have exhorted library professionals to be undaunted by the specter of IP infringement. Our message has been a positive one: If we’re proactive in figuring out where our rights begin and end as users and providers of 3D printers, we can set the direction of the public policy that coalesces around 3D printing technology in the coming years.


Makerspaces like the one at Virginia Tech underscore just how imperative it is that we get out ahead of rights holders in setting the bounds of our 3D printing IP rights. Using the printers and scanners at Virginia Tech, Dean Wong built models of the human hand in an effort to design better prosthetics, and other users have created figurines, sculptures, and more. The sort of creativity this facility enables inspires hope for the future of connected learning. To ensure that such creativity can continue unfettered, we must be fearless in our approach to intellectual property.

IP law is not simply a series of stipulations that hamstring our ability to be creative; in the context of 3D printing, it represents something of a blank slate. Copyright, patent and trademark have yet to be interpreted in the context of 3D printing. As a result, we, as 3D printing leaders, can influence the efforts of lawmakers, regulators and the courts as they work to create frameworks for the use of this technology.

The ALA Office for Information Technology Policy’s 3D Printing Task Force works to do just that. The task force is dedicated to advancing policies that will allow Dean Wong and his Virginia Tech colleagues – and other “making enthusiasts” – to democratize creation and empower people of all ages to solve personal and community problems through 3D printing.

I would like to thank Kenneth Wong and Debbie Cash for the opportunity to speak at Virginia Tech, and to congratulate the Virginia Tech Northern Virginia Center for getting a state of the art makerspace up and running. ALA wishes Dean Wong and his team all the best in their efforts to unlock opportunities for all through 3D printing, scanning and design.

The post 3D and IP at VT appeared first on District Dispatch.

Open Knowledge Foundation: What should we include in the Global Open Data Index? From reference data to civil society audit.

planet code4lib - Thu, 2015-06-18 13:05

Three years ago we decided to begin to systematically track the state of open data around the world. We wanted to know which countries were the strongest and which national governments were lagging behind in releasing the key datasets as open data so that we could better understand the gaps and work with our global community to advocate for these to be addressed.

In order to do this, we created the Global Open Data Index, which was a global civil society collaboration to map the state of open data in countries around the world. The result was more than just a benchmark. Governments started to use the Index as a reference to inform their priorities on open data. Civil society actors began to use it as a tool to teach newcomers about open data and as an advocacy mechanism to encourage governments to improve their performance in releasing key datasets.

Three years on we want the Global Open Data Index to become much more than a measurement tool. We would like it to become a civil society audit of the data revolution. As a tool driven by campaigners, researchers and advocacy organisations, it can help us, as a movement, determine the topics and issues we want to promote and to track progress on them together. This will mean going beyond a “baseline” of reference datasets which are widely held to be important. We would like the Index to include more datasets which are critical for democratic accountability but which may be more ambitious than what is made available by many governments today.

The 10 datasets we have now and their scores in France

To do this, we are today opening a consultation on which themes and datasets civil society thinks should be included in the Global Open Data Index. We want you to help us decide on the priority datasets that we should track and advocate to have opened up. We want to work with our global network to collaboratively determine the datasets that are most important for making progress on different issues – from democratic accountability, to stronger action on climate change, to tackling tax avoidance and tax evasion.

Drawing inspiration from our chapter Open Knowledge Belgium, which ran its own local open data census, we decided to conduct a public consultation. This public consultation will be divided into two parts:

Crowdsourced Survey – Using a Wiki Survey platform inspired by Kittenwar (and, as we all know, anything inspired by viral kittens cannot be bad), we are asking which datasets you think are most important. The platform is simple: of the two datasets shown, choose the one you see as the higher priority to include in the Global Open Data Index. Can’t find a dataset that you think is important? Add your own idea to the pool. There is no vote limit, so vote as much as you want and shape the Index. SUBMIT YOUR DATA NOW

Our Wiki Survey


Focused consultation with civil society organisations – This survey will be sent to a group of NGOs working on a variety of issues to find out what they think about which specific datasets are needed and how they can be used. We will add ideas from the survey to the general pool as they come in. Want to answer the survey as well? You can find it here.
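The ranking mechanism behind a pairwise Wiki Survey can be sketched very simply: each vote records a winner between two items, and a win rate orders the pool. This is a toy illustration with made-up votes and dataset names, not the actual scoring used by the survey platform.

```python
# Sketch of pairwise voting: each vote picks a winner between two
# candidate datasets; a simple win rate then ranks the pool.
from collections import defaultdict

votes = [  # (winner, loser) pairs from hypothetical participants
    ("government budget", "postcodes"),
    ("government budget", "election results"),
    ("election results", "postcodes"),
    ("government budget", "postcodes"),
]

wins = defaultdict(int)
appearances = defaultdict(int)
for winner, loser in votes:
    wins[winner] += 1
    appearances[winner] += 1
    appearances[loser] += 1

# Rank by fraction of contests won, most-preferred dataset first.
ranking = sorted(appearances, key=lambda d: wins[d] / appearances[d], reverse=True)
print(ranking)
```

Real systems (such as All Our Ideas, which popularised this format) use statistically sounder scoring that accounts for which opponents an item happened to face, but the win-rate sketch captures the idea.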

This public consultation will be open for the next 10 days, closing on June 28th. At the end of the process we will analyse the results and share them with you.

We hope that this new process that we are starting today will lead to an even better Index. If you have thoughts about the process, please share them with us on our new forum.

Peter Murray: Thursday Threads: Let’s Encrypt is coming, Businesses want you coming to the office, OR2015 Summary

planet code4lib - Thu, 2015-06-18 10:47

This week’s threads:

Funding for my current position at LYRASIS runs out at the end of June, so I am looking for new opportunities and challenges for my skills. Check out my resume/c.v. and please let me know of job opportunities in library technology, open source, and/or community engagement.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted there are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

Let’s Encrypt Launch Schedule

Let’s Encrypt has reached a point where we’re ready to announce our launch schedule.

  • First certificate: Week of July 27, 2015
  • General availability: Week of September 14, 2015
Let’s Encrypt Launch Schedule, by Josh Aas, 16-Jun-2015

As you might recall from an earlier edition of DLTJ Thursday Threads, the Let’s Encrypt initiative will allow anyone who has a domain name to get an encryption certificate at no cost. Not only that, but the effort is also building software to automatically create, update, install, and securely configure those certificates. This will make it very easy for small sites — like libraries, archives, and museums — to use HTTPS-encrypted connections. There has been a great deal of talk within the library patron privacy community about how to best make this happen, including a proposal by Eric Hellman for a “Digital Library Privacy Pledge” that will encourage libraries to adopt encrypted web connections across all of their services. Keep your eye out for more about “Let’s Encrypt.”
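The reason free certificates matter is that clients verify them: a browser or script refuses a connection whose certificate doesn't check out. As a small illustration, Python's standard library enforces exactly that by default when you ask for a standard TLS context.

```python
# A client-side view of why a library site needs a valid certificate:
# Python's default TLS context refuses connections unless the server's
# certificate chain verifies and the hostname matches.
import ssl

ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True: certificates are checked
print(ctx.check_hostname)                    # True: hostnames must match
```

A site without a valid certificate fails both checks for every visitor, which is why a no-cost certificate authority removes a real barrier for small institutions.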

Five trends that are reshaping your office

But lots of companies wrestling with how to get people to show their face at work, in an era where telecommuting is increasingly popular, are trying to lure them back rather than mandate it. While organizations have long embraced the benefits of “hoteling,” where employees reserve desks for themselves rather than getting a dedicated space to work every day, many are taking that concept even further, adding concierge-like staff and other perks to give workers more reasons to come onsite.

Five trends that are reshaping your office, by Jena McGregor, Washington Post, 15-Jun-2015

I’m not sure this applies to many of our offices, but it is useful to know that these things are happening. As someone who has worked remotely for the past five years, I don’t know if these kinds of perks from my employer would get me to come into an office more. It is hard to beat face-to-face interaction for its power to convey information and build community. We are using tools like Slack to reproduce that kind of interaction as best we can, and the tools are getting better at making it easier for remote teams to form cohesion and effectively get work done.

Open Repositories 2015 Summary

Technology take away from #OR2015 is the Hydra-Fedora 4 stack is shaping up to be very impressive; plus ambitious plans from #DSpace camp

— Open Repository (@OpenRepository) June 11, 2015

Tweet from @OpenRepository, as quoted by Hardy Pottinger in his 2015 Recap

That tweet is a summary of what happened at Open Repositories 2015 last week, and Hardy’s summary matches what I heard about the conference activities from afar. The keynote from Google Scholar’s Anurag Acharya on pitfalls and best practices for indexing repository content was a big hit. His slides are online, as is a collection of tweets curated by Eileen Clancy, and I highly recommend software developers and repository users look over these do’s and don’ts for their own systems.


DuraSpace News: Amy Brand Named New Director of the MIT Press

planet code4lib - Thu, 2015-06-18 00:00

From Peter Dizikes, MIT News Office

Cambridge, MA – The MIT Press has named Amy Brand PhD ’89, an executive with a wide array of experience in academic publishing and communications, as its new director. She will begin in this position on July 20.

DuraSpace News: VIVO 2015 Update: Early Registration Deadline is Tomorrow; Conference Program Released!

planet code4lib - Thu, 2015-06-18 00:00

Early Bird Registration for the VIVO 2015 Conference Ends Friday... Don't Delay! Registration is open and the lowest registration rate is only available through tomorrow, June 19th. Register online today.

The $375 Early Bird registration rate is only available through June 19th.

