planet code4lib

Planet Code4Lib - http://planet.code4lib.org
Updated: 2 hours 35 min ago

Open Library Data Additions: Amazon Crawl: part 2-bb

6 hours 44 min ago

Part 2-bb of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

Open Library Data Additions: Amazon Crawl: part 2-aq

6 hours 44 min ago

Part 2-aq of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

Open Library Data Additions: Amazon Crawl: part 2-ao

7 hours 50 sec ago

Part 2-ao of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

Open Library Data Additions: Amazon Crawl: part 2-ar

7 hours 51 sec ago

Part 2-ar of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

Open Library Data Additions: Amazon Crawl: part 24

7 hours 1 min ago

Part 24 of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

DPLA: Apply to host DPLAfest 2017

7 hours 2 min ago

Following a successful DPLAfest 2016 in Washington, DC, we’re looking for next year’s location for another great interactive, productive, and exciting fest. DPLAfest is an annual event that brings together hundreds of people from the cultural, education, and technology communities to celebrate the Digital Public Library of America, our many partners across the country, and our large and growing community of practitioners and members of the public who contribute to, and benefit from, DPLA.

DPLAfest 2016 was co-hosted by the Library of Congress, the National Archives, and the Smithsonian Institution. Those great institutions were proud to host over 450 attendees from across the world for two days of discussions, workshops, hands-on activities, and fun events.

DPLAfest host organizations are essential contributors to one of the most prominent gatherings in the country involving librarians, archivists, and museum professionals, developers and technologists, publishers and authors, teachers and students, and many others who work together to further the mission of providing maximal access to our shared cultural heritage. For colleges and universities, DPLAfest is the perfect opportunity to directly engage your students, educators, archivists, librarians and other information professionals in the work of a diverse national community of information and technology leaders. For public libraries, hosting DPLAfest brings the excitement and enthusiasm of our community right to your hometown, enriching your patrons’ understanding of library services through free and open workshops, conversations, and more. For museums, archives, and other cultural heritage institutions, it’s a great way to promote your collections and spotlight innovative work taking place at your organization. Hosting DPLAfest also affords the chance to promote your institution nationally and internationally, given the widespread media coverage of DPLAfest and the energy around the event.

If this opportunity sounds right for you and your organization, let us know! We are calling on universities and colleges, public libraries, archives, museums, historical societies, and others to submit expressions of interest to serve as hosts or co-hosts for DPLAfest 2017, which will take place in mid-April 2017.

To apply, review the information below and submit an expression of interest on behalf of your organization via the form at the bottom of this page. The deadline to apply is Tuesday, June 7, 2016. We will follow up with the most promising proposals shortly following the deadline.

Collaborative applications (such as between a university and a nearby public library) are encouraged. Preference will be given to applicants who can provide venue spaces which are located in the same building complex or campus. Please note that some host partners can contribute staffing or other day-of support in lieu of venue space.

Requirements of a DPLAfest 2017 Host Site

  • Willingness to make local arrangements and coordinate with DPLA staff and any/all staff at host institution.
  • An auditorium or similar space suitable for a keynote presentation (minimum 300 people).
  • 10 or more smaller rooms for “breakout” sessions (30 – 50 people).
    • Preference will be given to hosts that can provide breakout rooms equipped with projection/display capabilities.
  • Co-location of proposed event spaces (i.e., enough session spaces in the same building or same campus).
  • Availability of wireless network for all attendees, potentially in excess of 350 simultaneous clients, for free or via conference sponsorship.
  • An organizational commitment to donate use of all venue spaces. (As a small nonprofit with limited funds, as well as a strong desire to keep DPLAfest maximally open to the public, we’re unable to pursue host proposals that cannot offer free or deeply discounted use of venue spaces.)
  • Ability to provide at least one staff person for every session room to help with day-of AV support, logistical support, etc.
  • Commitment to diversity, inclusion, and openness to all.

Additional Desirable Qualities

  • Proximity to a major airport and hotels.
  • Location outside of the Northeast corridor and the Midwest (we’re rotating the location of DPLAfest each year; we celebrated DPLAfest 2013 in Boston, DPLAfest 2015 in Indianapolis, and DPLAfest 2016 in Washington, DC).

Apply to host DPLAfest 2017

You can learn more about DPLAfest here. Questions? Email info@dp.la.

Open Library Data Additions: Amazon Crawl: part 2-as

7 hours 6 min ago

Part 2-as of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

Open Library Data Additions: Amazon Crawl: part o-5

7 hours 14 min ago

Part o-5 of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

Open Library Data Additions: Amazon Crawl: part 2-al

7 hours 15 min ago

Part 2-al of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

FOSS4Lib Upcoming Events: Managing Assets as Linked Data with Fedora 4

7 hours 17 min ago
Date: Tuesday, May 24, 2016 - 09:00 to 12:00
Supports: Fedora Repository

Last updated May 3, 2016. Created by Peter Murray on May 3, 2016.

Andrew Woods, Fedora technical lead, will offer a Fedora 4 workshop at the Texas Conference on Digital Libraries (TCDL) on Tuesday, May 24 from 9:00 AM-12:00 PM. Space is limited, so please register in advance.

FOSS4Lib Upcoming Events: Publishing Assets as Linked Data with Fedora 4

7 hours 20 min ago
Date: Wednesday, May 18, 2016 - 13:00 to 15:30
Supports: Fedora Repository

Last updated May 3, 2016. Created by Peter Murray on May 3, 2016.

David Wilcox, Fedora product manager, will offer a workshop entitled “Publishing Assets as Linked Data with Fedora 4” at the Library Publishing Forum (LPForum 2016), to be held at the University of North Texas Libraries, Denton, Texas, on May 18 from 1:00 PM-3:30 PM. All LPForum 2016 attendees are welcome; there is no need to pre-register for this introductory-level workshop.

LITA: Transmission #3

7 hours 20 min ago

In our third episode of Begin Transmission, we’re lucky enough to sit down with none other than Cinthya Ippoliti. Cinthya is a LITA Blogger and Associate Dean for Research and Learning Services at Oklahoma State University. Enjoy her library tech wisdom and perspectives in this short interview.

Begin Transmission will return May 15, 2016.

Open Library Data Additions: Amazon Crawl: part o-9

7 hours 20 min ago

Part o-9 of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

Open Library Data Additions: Amazon Crawl: part 21

7 hours 27 min ago

Part 21 of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

Open Library Data Additions: Amazon Crawl: part 2-ad

7 hours 27 min ago

Part 2-ad of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

Open Library Data Additions: Amazon Crawl: part hb

7 hours 29 min ago

Part hb of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

Mark E. Phillips: DPLA Description Fields: Language used in descriptions.

7 hours 32 min ago

This is the last post in a series of posts related to the Description field found in the Digital Public Library of America.  I’ve been working with a collection of 11,654,800 metadata records for which I’ve created a dataset of 17,884,946 description fields.

This past Christmas I received a copy of Thing Explainer by Randall Munroe. If you aren’t familiar with this book, Randall uses only the ten hundred most used words (thousand isn’t one of them) to describe very complicated concepts and technologies.

After seeing this book I started to wonder how much of the metadata we create for our digital objects uses just the 1,000 most frequent words. Frequently used words, as well as less complex words (words with fewer syllables), often figure in the calculation of the reading level of various texts, so that also got me thinking about the reading level required to understand some of our metadata records.

Along that train of thought, one of the things that we hear from aggregations of cultural heritage materials is that K-12 users are a target audience and that many of the resources we digitize are digitized with them in mind. With that being said, how often do we take them into account when we create our descriptive metadata?

When I was indexing the description fields I calculated three metrics related to this.

  1. What percentage of the tokens are in the 1,000 most frequently used English words?
  2. What percentage of the tokens are in the 5,000 most frequently used English words?
  3. What percentage of the tokens are words in a standard English dictionary?
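
A minimal sketch of how metrics like these three could be computed is shown here. It is not the code used for the post: the tokenizer (lowercased runs of letters and apostrophes), the function name `token_coverage`, and the tiny stand-in word lists are all assumptions made for illustration. Real 1,000-word, 5,000-word, and dictionary lists would have to be loaded from whatever source is chosen.

```python
import re


def token_coverage(description, vocabulary):
    """Return the percentage of tokens in `description` found in `vocabulary`.

    Tokenization is an assumption: lowercase the text and keep runs of
    letters and apostrophes. An empty description scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", description.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return 100.0 * hits / len(tokens)


# Hypothetical stand-ins for the real word lists (illustration only).
top_1000 = {"the", "of", "a"}
top_5000 = top_1000 | {"photograph", "letter"}
dictionary = top_5000 | {"verso"}

description = "Photograph of the verso of a letter"
for name, words in [("1,000", top_1000), ("5,000", top_5000), ("dictionary", dictionary)]:
    print(name, round(token_coverage(description, words), 1))
```

Because each stand-in list contains the previous one, the three percentages can only stay level or rise from one metric to the next.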

From there I was curious about how the different providers compared to each other.

Average for 1,000, 5,000 and English Dictionary

1,000 most Frequent English Words

The first thing we will look at is the average share of a description that is composed of words from the list of the 1,000 most frequently used English words.

Average percentage of description consisting of 1000 most frequent English words.

The providers/hubs that stand out to me are, of course, bhl, which makes very little use of the 1,000 word vocabulary, followed by smithsonian, gpo, hathitrust and uiuc.  On the other end of the scale is virginia, which has an average of 70%.

5,000 most Frequent English Words

Next up is the average percentage of the descriptions that consist of words from the 5,000 most frequently used English words.

Average percentage of description consisting of 5000 most frequent English words.

This graph ends up looking very much like the 1,000 words graph, just a bit higher percentage-wise.  This is of course because the 5,000 word list includes the 1,000 word list.  You do see a few changes in the ordering, though; for example, gpo switches places with hathitrust relative to the 1,000 words graph above.

English Dictionary Words

Next is the average percentage of descriptions that consist of words from a standard English dictionary.  Again this includes the 1,000 and 5,000 words in that dictionary so it will be even higher.

Average percentage of description consisting of English dictionary words.

You see that the virginia hub has almost 100% of their descriptions consisting of English dictionary words.  The hubs that are the lowest in their use of English words for descriptions are bhl, smithsonian, and nypl.

The graph below has 1,000, 5,000, and English Dictionary words grouped together for each provider/hub so you can see at a glance how they stack up.

1,000, 5,000 most frequent English words and English dictionary words by Provider

Stacked Percent 1,000, 5,000, English Dictionary

Next we will look at the percentages per provider/hub if we group the percentage utilization into 25% buckets.  This gives a more granular view of the data than just the averages presented above.

Percentage of descriptions by provider that use 1,000 most frequent English words.

Percentage of descriptions by provider that use 5,000 most frequent English words.

Percentage of descriptions by provider that use English dictionary words.
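
The 25% bucketing described above could be sketched roughly as follows. The bucket labels, the function names, and the handling of boundary values (a score of exactly 25% goes into the second bucket, and 100% is folded into the top bucket) are assumptions for illustration; the post does not say how edge cases were treated.

```python
from collections import Counter

# Hypothetical labels for the four 25% buckets.
BUCKETS = ["0-25%", "25-50%", "50-75%", "75-100%"]


def bucket_label(percentage):
    """Map a 0-100 percentage to one of four 25% buckets."""
    # Integer-divide by 25, then clamp so that 100% lands in the top bucket.
    index = min(int(percentage // 25), len(BUCKETS) - 1)
    return BUCKETS[index]


def bucket_shares(percentages):
    """Return the share of descriptions (in percent) falling in each bucket."""
    total = len(percentages)
    if total == 0:
        return {label: 0.0 for label in BUCKETS}
    counts = Counter(bucket_label(p) for p in percentages)
    return {label: 100.0 * counts[label] / total for label in BUCKETS}


# Made-up per-description scores for one provider (illustration only).
print(bucket_shares([10.0, 30.0, 60.0, 100.0]))
```

Run once per provider/hub, this yields the four shares that a stacked bar for that provider would display.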

Closing

I don’t think it is that much of a stretch to draw parallels between the language used in our descriptions and the intended audience of our metadata records. How often are we writing metadata records for ourselves instead of our users?  A great example that comes to mind is “recto” and “verso”, which we often use for the “front” and “back” of items. In the dataset I’ve been using there are 56,640 descriptions with the term “verso” and 5,938 with the term “recto”.

I think we should be taking into account our various audiences when we are creating metadata records.  I know this sounds like a very obvious suggestion but I don’t think we really do that when we are creating our descriptive metadata records.  Is there a target reading level for metadata records? Should there be?

Looking at the description fields in the DPLA dataset has been interesting.  The kind of analysis that I’ve done so far can be seen as a kind of distant reading of these fields: big round numbers that are pretty squishy and only show the general shape of the field.  Diving in and doing a close reading of the metadata records is probably needed to better understand what is going on in these records.

Based on experience of mapping descriptive metadata into the Dublin Core metadata fields, I have a feeling that the description field is generally a dumping ground for information that many of us might not consider “description”.  I sometimes wonder if we would do our users a greater service by adding a true “note” field to our metadata models so that we have a proper location to dump “notes and other stuff” instead of muddying up a field that should have an obvious purpose.

That’s about it for this work with descriptions,  or at least it is until I find some interest in really diving deeper into the data.

If you have questions or comments about this post,  please let me know via Twitter.

Open Library Data Additions: Amazon Crawl: part 2-am

7 hours 32 min ago

Part 2-am of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text

FOSS4Lib Recent Releases: Fedora Repository - 4.5.1

7 hours 36 min ago

Last updated May 3, 2016. Created by Peter Murray on May 3, 2016.

Package: Fedora Repository
Release Date: Thursday, April 28, 2016

Open Library Data Additions: Amazon Crawl: part 2-ab

7 hours 40 min ago

Part 2-ab of Amazon crawl.

This item belongs to: data/ol_data.

This item has files of the following types: Data, Archive BitTorrent, Data, Metadata, Text
