Planet Code4Lib - http://planet.code4lib.org

FOSS4Lib Upcoming Events: Fedora 4.0 in Action at The Art Institute of Chicago and UCSD

Fri, 2014-09-19 20:16
Date: Wednesday, October 15, 2014 - 13:00 to 14:00
Supports: Fedora Repository

Last updated September 19, 2014. Created by Peter Murray on September 19, 2014.

Presented by: Stefano Cossu, Data and Application Architect, Art Institute of Chicago and Esmé Cowles, Software Engineer, University of California San Diego
Join Stefano and Esmé as they showcase new pilot projects built on Fedora 4.0 Beta at the Art Institute of Chicago and the University of California San Diego. These projects demonstrate the value of adopting Fedora 4.0 Beta and taking advantage of new features and opportunities for enhancing repository data.

HangingTogether: Talk Like a Pirate – library metadata speaks

Fri, 2014-09-19 19:32

Pirate Hunter, Richard Zacks

Friday, 19 September is of course well known as International Talk Like a Pirate Day. In order to mark the day, we created not one but FIVE lists (rolled out over this whole week). This is part of our What In the WorldCat? series (#wtworldcat lists are created by mining data from WorldCat in order to highlight interesting and different views of the world’s library collections).

If you have a suggestion for something you’d like us to feature, let us know or leave a comment below.


FOSS4Lib Upcoming Events: VuFind Summit 2014

Fri, 2014-09-19 19:18
Date: Monday, October 13, 2014 - 08:00 to Tuesday, October 14, 2014 - 17:00
Supports: VuFind

Last updated September 19, 2014. Created by Peter Murray on September 19, 2014.

This year's VuFind Summit will be held on October 13-14 at Villanova University (near Philadelphia).

Registration for the two-day event is $40 and includes morning refreshments and a full lunch on both days.

It is not too late to submit a talk proposal and, if accepted, have your registration fee waived.

State Library of Denmark: Sparse facet caching

Fri, 2014-09-19 14:40

As explained in Ten times faster, distributed faceting in standard Solr is two-phase:

  1. Each shard performs standard faceting and returns the top limit*1.5+10 terms. The merger calculates the top limit terms. Standard faceting is a two-step process:
    1. For each term in each hit, update the counter for that term.
    2. Extract the top limit*1.5+10 terms by running through all the counters with a priority queue.
  2. Each shard returns the number of occurrences of each term in the top limit terms, calculated by the merger from phase 1. This is done by performing a mini-search for each term, which takes quite a long time. See Even sparse faceting is limited for details.
    1. Addendum: If the number for a term was returned by a given shard in phase 1, that shard is not asked for that term again.
    2. Addendum: If the shard returned a count of 0 for any term as part of phase 1, that means it has delivered all possible counts to the merger. That shard will not be asked again.
Sparse speedup

Sparse faceting speeds up phase 1 step 2 by only visiting the updated counters. It also speeds up phase 2 by repeating phase 1 step 1, then extracting the counts directly for the wanted terms. Although it sounds heavy to repeat phase 1 step 1, the total time for phase 2 with sparse faceting is a lot lower than with standard Solr. But why repeat phase 1 step 1 at all?

Caching

Today, caching of the counters from phase 1 step 1 was added to Solr sparse faceting. Caching is a tricky business to get just right, especially since the sparse cache must hold a mix of empty counters (to avoid re-allocation of large structures on the Java heap) and filled structures (from phase 1, intended for phase 2). In principle, though, it is simple: when phase 1 step 1 finishes, the counter structure is kept and re-used in phase 2. A sketch of the idea is below, and then it is time for testing.
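To make the mechanics concrete, here is a minimal sketch of such a counter cache in Java. It is not the actual sparse faceting code: the class and method names are invented for illustration, and the clearing step is deliberately simplified (a real sparse counter only clears the entries that were actually updated).

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of counter caching between the two faceting phases.
public class CounterCache {
    private final int numTerms;                                        // one slot per unique term in the facet field
    private final int maxFilled;                                       // cap on how many filled structures are kept
    private final ArrayDeque<int[]> emptyPool = new ArrayDeque<>();    // cleared counters, ready for phase 1
    private final Map<String, int[]> filledByQuery = new HashMap<>();  // phase 1 results kept around for phase 2

    public CounterCache(int numTerms, int maxFilled) {
        this.numTerms = numTerms;
        this.maxFilled = maxFilled;
    }

    // Phase 1, step 1: hand out a cleared counter without allocating a new large array on the heap.
    public synchronized int[] acquire() {
        int[] counters = emptyPool.poll();
        return counters != null ? counters : new int[numTerms];
    }

    // End of phase 1: keep the filled counters so phase 2 can read counts directly.
    public synchronized void stash(String queryKey, int[] filledCounters) {
        if (filledByQuery.size() < maxFilled) {
            filledByQuery.put(queryKey, filledCounters);
        } else {
            release(filledCounters);               // cache is full: clean the structure and recycle it as empty
        }
    }

    // Phase 2: if phase 1 was stashed for this query, each requested count is a plain array lookup.
    // Returns null when nothing was cached, in which case the caller repeats phase 1, step 1.
    public synchronized int[] countsFor(String queryKey, int[] termOrdinals) {
        int[] filled = filledByQuery.remove(queryKey);
        if (filled == null) {
            return null;
        }
        int[] counts = new int[termOrdinals.length];
        for (int i = 0; i < termOrdinals.length; i++) {
            counts[i] = filled[termOrdinals[i]];
        }
        release(filled);                           // done with it: recycle for the next phase 1
        return counts;
    }

    private void release(int[] counters) {
        Arrays.fill(counters, 0);                  // simplified: a sparse implementation clears only touched entries
        emptyPool.offer(counters);
    }
}

The overhead discussed under Observations below follows directly from stash(): if phase 2 never happens, the filled structure was kept for nothing and still has to be cleaned or evicted.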

15TB index / 5B docs / 2565GB RAM, faceting on 6 fields, facet limit 25, unwarmed queries

Note that there are no measurements of standard Solr faceting in the graph. See the Ten times faster article for that. What we have here are 4 different types of search:

  • no_facet: Plain searches without faceting, just to establish the baseline.
  • skip: Only phase 1 sparse faceting. This means inaccurate counts for the returned terms, but as can be seen, the overhead is very low for most searches.
  • cache: Sparse faceting with caching, as described above.
  • nocache: Sparse faceting without caching.
Observations

For 1-1000 hits, nocache is actually a bit faster than cache. The peculiar thing about this hit-range is that chances are high that all shards return all possible counts (phase 2 addendum 2), so phase 2 is skipped for a lot of searches. When phase 2 is skipped, caching the filled counter structure was wasted effort: it must either be cleaned for re-use or discarded if the cache is getting too big. This means a bit of overhead.

For more than 1000 hits, cache wins over nocache. Filter out the graph noise by focusing on the medians. Since the difference between cache and nocache is that the base faceting time is skipped with cache, the difference between their medians should be about the same as the difference between the medians for no_facet and skip. Are they? Sorta-kinda. This should be repeated with a larger sample.

Conclusion

Caching with distributed faceting means a small performance hit in some cases and a larger performance gain in others. Nothing Earth-shattering, and since it works best when more memory is allocated for caching, it is not clear in general whether it is best to use it or not. Download a Solr sparse WAR from GitHub and try it for yourself.


Library of Congress: The Signal: Emerging Collaborations for Accessing and Preserving Email

Fri, 2014-09-19 13:02

The following is a guest post by Chris Prom, Assistant University Archivist and Professor, University of Illinois at Urbana-Champaign.

I’ll never forget one lesson from my historical methods class at Marquette University.  Ronald Zupko–famous for his lecture about the bubonic plague and a natural showman–was expounding on what it means to interrogate primary sources–to cast a skeptical eye on every source, to see each one as a mere thread of evidence in a larger story, and to remember that every event can, and must, tell many different stories.

He asked us to name a few documentary genres, along with our opinions as to their relative value.  We shot back: “Photographs, diaries, reports, scrapbooks, newspaper articles,” along with the type of ill-informed comments graduate students are prone to make.  As our class rattled off responses, we gradually came to realize that each document reflected the particular viewpoint of its creator–and that the information a source conveyed was constrained by documentary conventions and other social factors inherent to the medium underlying the expression. Settling into the comfortable role of skeptics, we noted the biases each format reflected.  Finally, one student said: “What about correspondence?”  Dr Zupko erupted: “There is the real meat of history!  But, you need to be careful!”

Dangerous Inbox by Recrea HQ. Photo courtesy of Flickr through a CC BY-NC-SA 2.0 license.

Letters, memos, telegrams, postcards: such items have long been the stock-in-trade for archives.  Historians and researchers of all types, while mindful of the challenges in using correspondence, value it as a source for the insider perspective it provides on real-time events.   For this reason, the library and archives community must find effective ways to identify, preserve and provide access to email and other forms of electronic correspondence.

After I researched and wrote a guide to email preservation (pdf) for the Digital Preservation Coalition’s Technology Watch Report series, I concluded that the challenges are mostly cultural and administrative.

I have no doubt that with the right tools, archivists could do what we do best: build the relationships that underlie every successful archival acquisition.  Engaging records creators and donors in their digital spaces, we can help them preserve access to the records that are so sorely needed for those who will write histories.  But we need the tools, and a plan for how to use them.  Otherwise, our promises are mere words.

For this reason, I’m so pleased to report on the results of a recent online meeting organized by the National Digital Stewardship Alliance’s Standards and Practices Working Group.  On August 25, a group of fifty-plus experts from more than a dozen institutions informally shared the work they are doing to preserve email.

For me, the best part of the meeting was that it represented the diverse range of institutions (in terms of size and institutional focus) that are interested in this critical work. Email preservation is of interest not only to large government archives or small collecting repositories, but to every repository in between. That said, the representatives displayed a surprisingly similar vision for how email preservation can be made effective.

Robert Spangler, Lisa Haralampus, Ken Hawkins and Kevin DeVorsey described challenges that the National Archives and Records Administration has faced in controlling and providing access to large bodies of email. Concluding that traditional records management practices are not sufficient to the task, NARA has developed the Capstone approach, which seeks to identify particular accounts that must be preserved as a record series, and is currently revising its transfer guidance.  Later in the meeting, Mark Conrad described the particular challenge of preserving email from the Executive Office of the President, highlighting the point that “scale matters”–a theme that resonated across the board.

The whole account approach that NARA advocates meshes well with activities described by other presenters.  For example, Kelly Eubank from North Carolina State Archives and the EMCAP project discussed the need for software tools to ingest and process email records while Linda Reib from the Arizona State Library noted that the PeDALS Project is seeking to continue their work, focusing on account-level preservation of key state government accounts.

Functional comparison of selected email archives tools/services. Courtesy Wendy Gogel.

Ricc Ferrante and Lynda Schmitz Fuhrig from the Smithsonian Institution Archives discussed the CERP project which produced, in conjunction with the EMCAP project, an XML schema for email objects among its deliverables. Kate Murray from the Library of Congress reviewed the new email and related calendaring formats on the Sustainability of Digital Formats website.

Harvard University was up next.  Andrea Goethels and Wendy Gogel shared information about Harvard’s Electronic Archiving Service.  EAS includes tools for normalizing email from an account into EML format (conforming to the Internet Engineering Task Force RFC 2822), then packaging it for deposit into Harvard’s digital repository.
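Since “EML format” may be unfamiliar: an EML file is simply a single message serialized as plain text per RFC 2822, headers followed by a blank line and the body. A minimal, entirely made-up example (real messages carry many more headers, such as Received, MIME-Version and Content-Type, but the structure is the same):

From: Jane Researcher <jane@example.edu>
To: University Archives <archives@example.edu>
Subject: Meeting notes
Date: Thu, 18 Sep 2014 10:15:00 -0400
Message-ID: <20140918101500.1234@example.edu>

Thanks for meeting today. Notes to follow.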

One of the most exciting presentations was provided by Peter Chan and Glynn Edwards from Stanford University. With generous funding from the National Historical Publications and Records Commission, as well as some internal support, the ePADD Project (“Email: Process, Appraise, Discover, Deliver”) is using natural language processing and entity extraction tools to build an application that will allow archivists and records creators to review email, then process it for search, display and retrieval. Best of all, the web-based application will include a built-in discovery interface, and users will be able to define a lexicon and to provide visual representations of the results. Many participants in the meeting commented that the ePADD tools may provide a meaningful focus for additional collaborations. A beta version is due out next spring.

In the discussion that followed the informal presentations, several presenters congratulated the Harvard team on a slide Wendy Gogel shared, comparing the functions provided by various tools and services (reproduced above).

As is apparent from even a cursory glance at the chart, repositories are doing wonderful work—and much yet remains.

Collaboration is the way forward. At the end of the discussion, participants agreed to take three specific steps to drive email preservation initiatives to the next level: (1) providing tool demo sessions; (2) developing use cases; and (3) working together.

The bottom line: I’m more hopeful about the ability of the digital preservation community to develop an effective approach toward email preservation than I have been in years.  Stay tuned for future developments!

LITA: Tech Yourself Before You Wreck Yourself – Vol. 1

Fri, 2014-09-19 12:30
Art from Cécile Graat

This post is for all the tech librarian caterpillars dreaming of one day becoming empowered tech butterflies. The internet is full to the brim with tools and resources for aiding in your transformation (and your job search). In each installment of Tech Yourself Before You Wreck Yourself – TYBYWY, pronounced tie-buy-why – I’ll curate a small selection of free courses, webinars, and other tools you can use to learn and master technologies. I’ll also spotlight a presentation opportunity so that you can consider putting yourself out there – it’s a big, beautiful community and we all learn through collaboration.

MOOC of the Week -

Allow me to suggest you enroll in The Emerging Future: Technology Issues and Trends, a MOOC offered by the School of Information at San Jose State University through Canvas. Taking a Futurist approach to technology assessment, Sue Alman, PhD offers participants an opportunity to learn “the planning skills that are needed, the issues that are involved, and the current trends as we explore the potential impact of technological innovations.”

Sounds good to this would-be Futurist!

Worthwhile Webinars –

I live in the great state of Texas, so it is with some pride that I recommend the recurring series, Tech Tools with Tine, from the Texas State Library and Archives Commission.  If you’re like me, you like your tech talks in manageable bite-size pieces. This is just your style.

September 19th, 9-10 AM EST – Tech Tools with Tine: 1 Hour of Google Drive

September 26th, 9-10 AM EST – Tech Tools with Tine: 1 Hour of MailChimp

October 3rd, 9-10 AM EST – Tech Tools with Tine: 1 Hour of Curation with Pinterest and Tumblr

Show Off Your Stuff –

The deadline to submit a proposal to the 2015 Library Technology Conference at Macalester College in beautiful St. Paul is September 22nd. Maybe that tight timeline is just the motivation you’ve been looking for!

What’s up, Tiger Lily? -

Are you a tech caterpillar or a tech butterfly? Do you have any cool free webinars or opportunities you’d like to share? Write me all about it in the comments.

District Dispatch: OITP Director appointed to University of Maryland Advisory Board

Fri, 2014-09-19 08:46

This week, the College of Information Studies at the University of Maryland appointed Alan Inouye, director of the American Library Association’s (ALA) Office for Information Technology Policy (OITP), to the inaugural Advisory Board for the university’s Master of Library Science (MLS) degree program.

“This appointment supports OITP’s policy advocacy and its Policy Revolution! initiative,” said OITP Director Alan S. Inouye. “Future librarians will be working in a rapidly evolving information environment. I look forward to the opportunity to help articulate the professional education needed for success in the future.”

The Advisory Board comprises 17 leaders and students in the information professions who will guide the future development of the university’s MLS program. The Board’s first task will be to engage in a strategic “re-envisioning the MLS” discussion.

Serving three-year terms, the members of the Board will:

  • Provide insights on how the MLS program can enhance the impact of its services on various stakeholder groups;
  • Provide advice and counsel on strategy, issues, and trends affecting the future of the MLS Program;
  • Strengthen relationships with libraries, archives, industry, and other key information community partners;
  • Provide input for assessing the progress of the MLS program;
  • Provide a vital link to the community of practice for faculty and students to facilitate research, inform teaching, and further develop public service skills;
  • Support the fundraising efforts to support the MLS program; and
  • Identify the necessary entry-level skills, attitudes and knowledge competencies as well as performance levels for target occupations.

Additional Advisory Board Members include:

  • Tahirah Akbar-Williams, Education and Information Studies Librarian, McKeldin Library, University of Maryland
  • Brenda Anderson, Elementary Integrated Curriculum Specialist, Montgomery County Public Schools
  • R. Joseph Anderson, Director, Niels Bohr Library and Archives, American Institute of Physics
  • Jay Bansbach, Program Specialist, School Libraries, Instructional Technology and School Libraries, Division of Curriculum, Assessment and Accountability, Maryland State Department of Education
  • Sue Baughman, Deputy Executive Director, Association of Research Libraries
  • Valerie Gross, President and CEO, Howard County Public Library
  • Lucy Holman, Director, Langsdale Library, University of Baltimore
  • Naomi House, Founder, I Need a Library Job (INALJ)
  • Erica Karmes Jesonis, Chief Librarian for Information Management, Cecil County Public Library
  • Irene Padilla, Assistant State Superintendent for Library Development and Services, Maryland State Department of Education
  • Katherine Simpson, Director of Strategy and Communication, American University Library
  • Lissa Snyders, MLS Candidate, University of Maryland iSchool
  • Pat Steele, Dean of Libraries, University of Maryland
  • Maureen Sullivan, Immediate Past President, American Library Association
  • Joe Thompson, Senior Administrator, Public Services, Harford County Public Library
  • Paul Wester, Chief Records Officer for the Federal Government, National Archives and Records Administration

The post OITP Director appointed to University of Maryland Advisory Board appeared first on District Dispatch.

OCLC Dev Network: Release Scheduling Update

Thu, 2014-09-18 21:30

To accommodate additional performance testing and optimization, the September release of WMS, which includes changes to the WMS Vendor Information Center API, is being deferred.  We will communicate the new date for the release as soon as we have confirmation.

District Dispatch: The Goodlatte, the bad and the ugly…

Thu, 2014-09-18 20:55

My Washington Office colleague Carrie Russell, ALA’s copyright ace in the Office of Information Technology Policy, provides a great rundown here in DD on the substantive ins and outs of the House IP Subcommittee’s hearing yesterday. The Subcommittee met to take testimony on the part of the 1998 Digital Millennium Copyright Act (Section 1201, for those of you keeping score at home) that prohibits anyone from “circumventing” any kind of “digital locks” (aka, “technological protection measures,” or “TPMs”) used by their owners to protect copyrighted works. The hearing was also interesting, however, for the politics of the emerging 1201 debate on clear display.

First, the good news.  Rep. Bob Goodlatte (VA), Chairman of the full House Judiciary Committee, made time in a no doubt very crowded day to attend the hearing specifically for the purpose of making a statement in which he acknowledged that targeted reform of Section 1201 was needed and appropriate.  As one of the original authors of 1201 and the DMCA, and the guy with the big gavel, Mr. Goodlatte’s frank and informed talk was great to hear.

Likewise, Congressman Darrell Issa of California (who’s poised to assume the Chairmanship of the IP Subcommittee in the next Congress and eventually to succeed Mr. Goodlatte at the full Committee’s helm) agreed that Section 1201 might well need modification to prevent it from impeding technological innovation — a cause he’s championed over his years in Congress as a technology patent-holder himself.

Lastly, Rep. Blake Farenthold added his voice to the reform chorus.  While a relatively junior Member of Congress, Rep. Farenthold clearly “gets” the need to assure that 1201 doesn’t preclude fair use or valuable research that requires digital locks to be broken precisely to see if they create vulnerabilities in computer apps and networks that can be exploited by real “bad guys,” like malware- and virus-pushing lawbreakers.

Of course, any number of other members of the Subcommittee were singing loudly in the key of “M” for yet more copyright protection.  Led by the most senior Democrat on the full Judiciary Committee, Rep. John Conyers (MI), multiple members appeared (as Carrie described yesterday) to believe that “strengthening” Section 1201 in unspecified ways would somehow thwart … wait for it … piracy, as if another statute and another penalty would do anything to affect the behavior of industrial-scale copyright infringers in China who don’t think twice now about breaking existing US law.  Sigh….

No legislation is yet pending to change Section 1201 or other parts of the DMCA, but ALA and its many coalition partners in the public and private sectors will be in the vanguard of the fight to reform this outdated and ill-advised part of the law (including the triennial process by which exceptions to Section 1201 are granted, or not) next year.  See you there!

The post The Goodlatte, the bad and the ugly… appeared first on District Dispatch.

SearchHub: Say Hello to Lucidworks Fusion

Thu, 2014-09-18 20:43

The team at Lucidworks is proud to announce the release of our next-generation platform for building powerful, scalable search applications: Lucidworks Fusion.

Fusion extends any Solr deployment with the enterprise-grade capabilities you need to deliver a world-class search experience:

Full support for any Solr deployment including Lucidworks Search, SolrCloud, and stand-alone mode.

Deeper support for recommendations including Item-to-Query, Query-to-Item, and Item-to-Item with aggregated signals.

Advanced signal processing including any datapoint (click-through, purchases, ratings) – even social signals like Twitter.

Enhanced application development with REST APIs, index-side and query-time pipelines, and sophisticated connector frameworks.

Advanced web and filesystem crawlers with multi-threaded HTML/document connectors, de-duping, and incremental crawling.

Integrated security management for roles and users supporting HTTPS, form-based, Kerberos, LDAP, and native methods.

Search, log, and trend analytics for any log type with real-time and historical data with SiLK.

Ready to learn more? Join us for our upcoming webinar:

Webinar: Meet Lucidworks Fusion

Join Lucidworks CTO Grant Ingersoll for a ‘first look’ at our latest release, Lucidworks Fusion. You’ll be among the first to see the power of the Fusion platform and how it gives you everything you need to design, build, and deploy amazing search apps.

Webinar: Meet Lucidworks Fusion
Date: Thursday, October 2, 2014
Time: 11:00 am Pacific Daylight Time (San Francisco, GMT-07:00)

Click here to register for this webinar.

Or learn more at http://lucidworks.com/product/fusion/

John Miedema: Wilson iteration plans: Topics on text mining the novel.

Thu, 2014-09-18 20:27

The Wilson iteration of my cognitive system will involve a deep dive into topics on text mining the novel. My overly ambitious plans are the following, roughly in order:

  • Develop a working code illustration of genre detection.
  • Develop another custom entity recognition model for literature, using an annotated corpus.
  • Visualization of literary concepts using time trends.
  • Collection of open data, open access articles, and open source tools for text analysis of literature.
  • Think about a better teaching tool for building models. Distinguish teaching computers from programming.

We’ll see where it goes.

DPLA: Nearly 100,000 items from the Getty Research Institute now available in DPLA

Thu, 2014-09-18 20:03

More awesome news from DPLA! Hot on the heels of announcements earlier this week about newly added materials from the Medical Heritage Library and the Government Printing Office, we’re excited to share today that nearly 100,000 items from the Getty Research Institute are now available via DPLA.

To view the Getty in DPLA, click here.

From an announcement posted today on the Getty Research Institute Blog:

As a DPLA content hub, the Getty Research Institute has contributed metadata—information that enables search and retrieval of material—for nearly 100,000 digital images, documentary photograph collections, archives, and books dating from the 1400s to today. We’ve included some of the most frequently requested and significant material from our holdings of more than two million items, including some 5,600 images from the Julius Shulman photography archive, 2,100 images from the Jacobson collection of Orientalist photography, and dozens of art dealers’ stockbooks from the Duveen and Knoedler archives.

The Getty will make additional digital content available through DPLA as their collections continue to be cataloged and digitized.


Alf Eaton, Alf: Archiving and displaying tweets with dat

Thu, 2014-09-18 19:47

First, make a new directory for the project:

mkdir tweet-stream && cd $_

Install node.js (nodejs in Debian/Ubuntu, node in Homebrew), update npm if needed (npm install -g npm) and install dat:

npm install -g maxogden/dat

dat is essentially git for data, so the data repository needs to be initialised before it can be used:

dat init

Next, start the dat server to listen for incoming connections:

dat listen

Data can be piped into dat as line-delimited JSON (i.e. one object per line - the same idea as CSV but with optional nested data). Happily, this is the format in which Twitter’s streaming API provides information, so it's ideal for piping into dat.
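As an illustration, line-delimited JSON is just one complete JSON object per line. The two objects below are made up and heavily trimmed (real tweets from the streaming API carry many more fields):

{"id": 1, "text": "First example tweet", "user": {"screen_name": "example_account"}}
{"id": 2, "text": "Second example tweet", "user": {"screen_name": "example_account"}}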

I used a PHP client to connect to Twitter’s streaming API as I was interested in seeing how it handled the connection (the client needs to watch the connection and reconnect if no data is received in a certain time frame). There may be a command-line client that is even easier than this, but I haven’t found one yet…

Install Phirehose using Composer:

composer init && composer require fennb/phirehose:dev-master && composer install

The streaming API uses OAuth 1.0 for authentication, so you have to register a Twitter application to get an OAuth consumer key and secret, then generate another access token and secret for your account. Add these to this small PHP script that initialises Phirehose, starts listening for filtered tweets and outputs each tweet to STDOUT as it arrives:

Run the script to connect to the streaming API and start importing data: php stream.php | dat import -json

The dat server that was started earlier with dat listen is listening on port 6461 for clients, and is able to emit each incoming tweet as a Server-Sent Event, which can then be consumed in JavaScript using the EventSource API.

I’m in the process of making a twitter-stream Polymer element, but in the meantime this is how to connect to dat’s SSE endpoint:

var server = new EventSource('http://your-dat-ip-address:6461/api/changes?data=true&style=sse&live=true&limit=1&tail=1');
server.addEventListener('data', function(event) {
  var item = JSON.parse(event.data).value; // do something with the tweet
});

Patrick Hochstenbach: Hard Reset

Thu, 2014-09-18 18:52
Joining Hard Reset, a playground for illustrators to draw cartoons about a post-apocalyptic world. These are doodles I can draw during my 20-minute commute from Brugge to Ghent.

Jonathan Rochkind: Umlaut 4.0 beta

Thu, 2014-09-18 18:39

Umlaut is an open source specific-item discovery layer, often used on top of SFX, and based on Rails.

Umlaut 4.0.0.beta2 is out! (Yeah, don’t ask about beta1 :) ).

This release is mostly back-end upgrades, including:

  • Support for Rails 4.x (Rails 3.2 is still supported to make migration easier for existing installations, but we recommend starting with Rails 4.1 in new apps)
  • Based on Bootstrap 3 (Rails 3 was Bootstrap 2)
  • internationalization/localization support
  • A more streamlined installation process with a custom installer

Anyone interested in beta testing? Probably most interesting if you have an SFX to point it at, but you can take it for a spin either way.

To install a new Umlaut app, see: https://github.com/team-umlaut/umlaut/wiki/Installation



Andromeda Yelton: jQuery workshop teaching techniques, part 3: ruthless backward design

Thu, 2014-09-18 17:04

I’m writing up what I learned from teaching a jQuery workshop this past month. I’ve already posted on my theoretical basis, pacing, and supporting affective goals. Now for the technique I invested the most time in and got the most mileage out of…

Ruthless backward design

Yes, yes, we all know we are supposed to do backward design, and I always have a general sense of it in my head when I design courses. In practice it’s hard, because you can’t always carve out the time to write an entire course in advance of teaching it, but for a two-day bootcamp I’m doing that anyway.

Yeah. Super ruthless. I wrote the last lesson, on functions, first. Along the way I took notes of every concept and every function that I relied on in constructing my examples. Then I wrote the second-to-last lesson, using what I could from that list (while keeping the pacing consistent), and taking notes on anything else I needed to have already introduced – again, right down to the granularity of individual jQuery functions. Et cetera. My goal was that, by the time they got to writing their own functions (with the significant leap in conceptual difficulty that entails), they would have already seen every line of code that they’d need to do the core exercises, so they could work on the syntax and concepts specific to functions in isolation from all the other syntax and concepts of the course. (Similarly, I wanted them to be able to write loops in isolation from the material in lessons 1 and 2, and if/then statements in isolation from the material in lesson 1.)

This made it a lot easier for me to see both where the big conceptual leaps were and what I didn’t need. I ended up axing .css() in favor of .addClass(), .removeClass(), and .hasClass() – more functions, but all conceptually simpler ones, and more in line with how I’ve written real-world code anyway. It meant that I axed booleans – which in writing out notes on course coverage I’d assumed I’d cover (such a basic data type, and so approachable for librarians!) – when I discovered I did not need their conceptual apparatus to make the subsequent code make sense. It made it clear that .indexOf() is a pain, and students would need to be familiar with its weirdness so it didn’t present any hurdles when they had to incorporate it into bigger programs.

Funny thing: being this ruthless and this granular meant I actually did get to the point where I could have done real-world-ish exercises with one more session. I ended up providing a few as further practice options for students who chose jQuery practice rather than the other unconference options for Tuesday afternoon. By eliminating absolutely everything unnecessary, right down to individual lines of code, I covered enough ground to get there. Huh!

So yeah. If I had a two-day workshop, I’d set out with that goal. A substantial fraction of the students would feel very shaky by then – it’s still a ton of material to assimilate, and about a third of my survey respondents’ brains were full by the time we got to functions – but including a real-world application would be a huge motivational payoff regardless. And group work plus an army of TAs would let most students get some mileage out of it. Add an option for people to review earlier material in the last half-day, and everyone’s making meaningful progress. Woot!

Also, big thanks to Sumana Harihareswara for giving me detailed feedback on a draft of the lesson, and helping me see the things I didn’t have the perspective to see about sequencing, clarity, etc. You should all be lucky enough to have a proofreader so enthusiastic and detail-oriented.

Later: where I want to go next.

Open Knowledge Foundation: Announcing a Leadership Update at Open Knowledge

Thu, 2014-09-18 15:05

Today I would like to share some important organisational news. After 3 years with Open Knowledge, Laura James, our CEO, has decided to move on to new challenges. As a result of this change we will be seeking to recruit a new senior executive to lead Open Knowledge as it continues to evolve and grow.

As many of you know, Laura James joined us to support the organisation as we scaled up, and stepped up to the CEO role in 2013. It has always been her intention to return to her roots in engineering at an appropriate juncture, and we have been fortunate to have had Laura with us for so long – she will be sorely missed.

Laura has made an immense contribution and we have been privileged to have her on board – I’d like to extend my deep personal thanks to her for all she has done. Laura has played a central role in our evolution as we’ve grown from a team of half-a-dozen to more than forty. Thanks to her commitment and skill we’ve navigated many of the tough challenges that accompany “growing-up” as an organisation.

There will be no change in my role (as President and founder) and I will be here both to continue to help lead the organisation and to work closely with the new appointment going forward. Laura will remain in post, continuing to manage and lead the organisation, assisting with the recruitment and bringing the new senior executive on board.

For a decade, Open Knowledge has been a leader in its field, working at the forefront of efforts to open up information around the world and see it used to empower citizens and organisations to drive change. Both the community and original non-profit have grown – and continue to grow – very rapidly, and the space in which we work continues to develop at an incredible pace with many exciting new opportunities and activities.

We have a fantastic future ahead of us and I’m very excited as we prepare Open Knowledge to make its next decade even more successful than its first.

We will keep everyone informed in the coming weeks as our plans develop, and there will also be opportunities for the Open Knowledge community to discuss. In the meantime, please don’t hesitate to get in touch with me if you have any questions.

District Dispatch: Free webinar: Helping patrons set financial goals

Thu, 2014-09-18 14:51

On September 23rd, the Consumer Financial Protection Bureau and the Institute for Museum and Library Services will offer a free webinar on financial literacy. This session has limited space so please register quickly.

Sometimes, if you’re offering programs on money topics, library patrons may come to you with questions about setting money goals. To assist librarians, the Consumer Financial Protection Bureau and the Institute of Museum and Library Services are developing financial education tools and sharing best practices with the public library field.

The two agencies created the partnership to help libraries provide free, unbiased financial information and referrals in their communities, build local partnerships and promote libraries as community resources. As part of the partnership, both agencies gathered information about libraries and financial education. Their surveys focused on attitudes about financial education, and how librarians can facilitate more financial education programs.

Join both groups on Tuesday, September 23, 2014, from 2:30–3:30 p.m. Eastern Time for the free webinar “Setting money goals,” which will explore the basics of money management. The webinar will teach participants how to show patrons how to create effective money goals.

Webinar Details

September 23, 2014
2:30–3:30 p.m. Eastern
Join the webinar (No need to RSVP)

  • Conference number: PW8729932
  • Audience passcode: LIBRARY

If you are participating only by phone, please dial the following number:

  • Phone: 1-888-947-8930
  • Participant passcode: LIBRARY

The post Free webinar: Helping patrons set financial goals appeared first on District Dispatch.

OCLC Dev Network: Reminder: Developer House Nominations Close on Monday

Thu, 2014-09-18 14:45

If you've been thinking about nominating someone – including yourself – for Developer House this December, there’s no time like the present to submit that nomination form.
