Feed aggregator

State Library of Denmark: Sparse facet caching

planet code4lib - Fri, 2014-09-19 14:40

As explained in Ten times faster, distributed faceting in standard Solr is two-phase:

  1. Each shard performs standard faceting and returns the top limit*1.5+10 terms. The merger calculates the top limit terms. Standard faceting is a two-step process:
    1. For each term in each hit, update the counter for that term.
    2. Extract the top limit*1.5+10 terms by running through all the counters with a priority queue.
  2. Each shard returns the number of occurrences of each term in the top limit terms, calculated by the merger from phase 1. This is done by performing a mini-search for each term, which takes quite a long time. See Even sparse faceting is limited for details.
    1. Addendum: If the number for a term was returned by a given shard in phase 1, that shard is not asked for that term again.
    2. Addendum: If the shard returned a count of 0 for any term as part of phase 1, that means it has delivered all possible counts to the merger. That shard will not be asked again. (A sketch of this refinement decision follows the list.)
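
As a rough illustration of those two addenda, the refinement decision the merger makes for each shard in phase 2 could look like the Java sketch below. This is not Solr’s actual merge code; the class and method names are assumptions made for the example.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the phase 2 refinement decision described in the two
// addenda above. Not Solr's actual merge code; names are assumptions.
public class RefinementPlanner {
    // topTerms: the merged top "limit" terms from phase 1.
    // shardCounts: the counts this particular shard already returned in phase 1.
    // Returns the terms the merger still has to ask this shard about.
    public static List<String> termsToRefine(List<String> topTerms,
                                             Map<String, Integer> shardCounts) {
        // Addendum 2: a count of 0 means the shard has already delivered every
        // non-zero count it has, so it is not asked again at all.
        if (shardCounts.containsValue(0)) {
            return new ArrayList<>();
        }
        List<String> missing = new ArrayList<>();
        for (String term : topTerms) {
            // Addendum 1: skip terms whose count this shard already reported.
            if (!shardCounts.containsKey(term)) {
                missing.add(term);
            }
        }
        return missing;
    }
}
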
Sparse speedup

Sparse faceting speeds up phase 1 step 2 by only visiting the updated counters. It also speeds up phase 2 by repeating phase 1 step 1, then extracting the counts directly for the wanted terms. Although it sounds heavy to repeat phase 1 step 1, the total time for phase 2 with sparse faceting is a lot lower than with standard Solr. But why repeat phase 1 step 1 at all?
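
To make the sparse idea concrete, here is a minimal Java sketch of such a counter structure (the class and method names are assumptions for illustration, not the actual sparse-faceting patch): increment() corresponds to phase 1 step 1, top() to phase 1 step 2, and count() to the phase 2 lookup.

import java.util.PriorityQueue;

// Minimal sketch of the sparse-counting idea (assumed names, not the actual
// Solr sparse patch): counting remembers which slots were touched, so the
// top-N extraction and later lookups only visit those slots.
public class SparseCounter {
    private final int[] counts;   // one slot per term ordinal in the facet field
    private final int[] touched;  // ordinals updated by the current query
    private int touchedSize = 0;

    public SparseCounter(int numTerms) {
        counts = new int[numTerms];
        touched = new int[numTerms];
    }

    // Phase 1 step 1: called once per term occurrence in the hits.
    public void increment(int termOrdinal) {
        if (counts[termOrdinal]++ == 0) {     // first touch of this ordinal
            touched[touchedSize++] = termOrdinal;
        }
    }

    // Phase 1 step 2: extract the top-n ordinals, visiting only touched slots.
    public int[] top(int n) {
        PriorityQueue<Integer> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(counts[a], counts[b]));
        for (int i = 0; i < touchedSize; i++) {
            queue.add(touched[i]);
            if (queue.size() > n) {
                queue.poll();                 // drop the current minimum
            }
        }
        int[] result = new int[queue.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = queue.poll();         // min-first removal, filled backwards
        }
        return result;
    }

    // Phase 2: direct lookup for a term requested by the merger.
    public int count(int termOrdinal) {
        return counts[termOrdinal];
    }

    // Reset for reuse: only the touched slots need clearing.
    public void clear() {
        for (int i = 0; i < touchedSize; i++) {
            counts[touched[i]] = 0;
        }
        touchedSize = 0;
    }
}

Because top() and clear() only visit the touched slots, the cost of extraction and of counter reuse scales with the number of terms actually seen in the hits rather than with the total number of unique terms in the facet field.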

Caching

Today, caching of the counters from phase 1 step 1 was added to Solr sparse faceting. Caching is tricky business to get just right, especially since the sparse cache must contain a mix of empty counters (to avoid re-allocation of large structures on the Java heap) as well as filled structures (from phase 1, intended for phase 2). But theoretically, it is simple: When phase 1 step 1 is finished, the counter structure is kept and re-used in phase 2. So time for testing:
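
As an illustration of the idea (again with assumed names, building on the SparseCounter sketch above, and not the actual implementation), such a cache could hold a pool of clean counters for re-use plus filled counters keyed by query for phase 2:

import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Rough sketch of such a cache (assumed names, not the actual implementation):
// a pool of clean counters to avoid re-allocation, plus filled counters keyed
// by query so phase 2 can reuse the phase 1 result instead of recounting.
public class CounterCache {
    private final ArrayDeque<SparseCounter> emptyPool = new ArrayDeque<>();
    private final Map<String, SparseCounter> filled = new HashMap<>();
    private final int maxFilled;
    private final int numTerms;

    public CounterCache(int maxFilled, int numTerms) {
        this.maxFilled = maxFilled;
        this.numTerms = numTerms;
    }

    // Phase 1: hand out a clean counter, reusing an old allocation if possible.
    public synchronized SparseCounter acquire() {
        SparseCounter counter = emptyPool.poll();
        return counter != null ? counter : new SparseCounter(numTerms);
    }

    // End of phase 1: keep the filled counter for the expected phase 2 call.
    public synchronized void storeFilled(String queryKey, SparseCounter counter) {
        if (filled.size() < maxFilled) {
            filled.put(queryKey, counter);
        } else {
            counter.clear();          // cache full: clean it and recycle instead
            emptyPool.add(counter);
        }
    }

    // Phase 2: reuse the phase 1 counts if they are still cached.
    public synchronized SparseCounter takeFilled(String queryKey) {
        return filled.remove(queryKey);
    }

    // When phase 2 is skipped or finished: clean the counter and recycle it.
    public synchronized void release(SparseCounter counter) {
        counter.clear();
        emptyPool.add(counter);
    }
}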

15TB index / 5B docs / 2565GB RAM, faceting on 6 fields, facet limit 25, unwarmed queries

Note that there are no measurements of standard Solr faceting in the graph. See the Ten times faster article for that. What we have here are 4 different types of search:

  • no_facet: Plain searches without faceting, just to establish the baseline.
  • skip: Only phase 1 sparse faceting. This means inaccurate counts for the returned terms, but as can be seen, the overhead is very low for most searches.
  • cache: Sparse faceting with caching, as described above.
  • nocache: Sparse faceting without caching.
Observations

For 1-1000 hits, nocache is actually a bit faster than cache. The peculiar thing about this hit-range is that chances are high that all shards return all possible counts (phase 2 addendum 2), so phase 2 is skipped for a lot of searches. When phase 2 is skipped, the caching of a filled counter structure is wasted: the structure has to be either cleaned for re-use or discarded if the cache is getting too big. The result is a bit of overhead.

For more than 1000 hits, cache wins over nocache. Filter through the graph noise by focusing on the medians. As the difference between cache and nocache is that the base faceting time is skipped with cache, the difference of their medians should be about the same as the difference of the medians for no_facet and skip. Are they? Sorta-kinda. This should be repeated with a larger sample.

Conclusion

Caching with distributed faceting means a small performance hit in some cases and a larger performance gain in others. Nothing Earth-shattering, and since it works best when more memory is allocated for caching, it is not clear in general whether it is best to use it or not. Download a Solr sparse WAR from GitHub and try for yourself.


Library of Congress: The Signal: Emerging Collaborations for Accessing and Preserving Email

planet code4lib - Fri, 2014-09-19 13:02

The following is a guest post by Chris Prom, Assistant University Archivist and Professor, University of Illinois at Urbana-Champaign.

I’ll never forget one lesson from my historical methods class at Marquette University.  Ronald Zupko–famous for his lecture about the bubonic plague and a natural showman–was expounding on what it means to interrogate primary sources–to cast a skeptical eye on every source, to see each one as a mere thread of evidence in a larger story, and to remember that every event can, and must, tell many different stories.

He asked us to name a few documentary genres, along with our opinions as to their relative value.  We shot back: “Photographs, diaries, reports, scrapbooks, newspaper articles,” along with the type of ill-informed comments graduate students are prone to make.  As our class rattled off responses, we gradually came to realize that each document reflected the particular viewpoint of its creator–and that the information a source conveyed was constrained by documentary conventions and other social factors inherent to the medium underlying the expression. Settling into the comfortable role of skeptics, we noted the biases each format reflected.  Finally, one student said: “What about correspondence?”  Dr Zupko erupted: “There is the real meat of history!  But, you need to be careful!”

Dangerous Inbox by Recrea HQ. Photo courtesy of Flickr through a CC BY-NC-SA 2.0 license.

Letters, memos, telegrams, postcards: such items have long been the stock-in-trade for archives.  Historians and researchers of all types, while mindful of the challenges in using correspondence, value it as a source for the insider perspective it provides on real-time events.   For this reason, the library and archives community must find effective ways to identify, preserve and provide access to email and other forms of electronic correspondence.

After I researched and wrote a guide to email preservation (pdf) for the Digital Preservation Coalition’s Technology Watch Report series, I concluded that the challenges are mostly cultural and administrative.

I have no doubt that with the right tools, archivists could do what we do best: build the relationships that underlie every successful archival acquisition.  Engaging records creators and donors in their digital spaces, we can help them preserve access to the records that are so sorely needed for those who will write histories.  But we need the tools, and a plan for how to use them.  Otherwise, our promises are mere words.

For this reason, I’m so pleased to report on the results of a recent online meeting organized by the National Digital Stewardship Alliance’s Standards and Practices Working Group.  On August 25, a group of fifty-plus experts from more than a dozen institutions informally shared the work they are doing to preserve email.

For me, the best part of the meeting was that it represented the diverse range of institutions (in terms of size and institutional focus) that are interested in this critical work. Email preservation is not something of interest only to large government archives, or to small collecting repositories, but also to every repository in between. That said, the representatives displayed a surprisingly similar vision for how email preservation can be made effective.

Robert Spangler, Lisa Haralampus, Ken Hawkins and Kevin DeVorsey described challenges that the National Archives and Records Administration has faced in controlling and providing access to large bodies of email. Concluding that traditional records management practices are not sufficient to the task, NARA has developed the Capstone approach, seeking to identify particular accounts that must be preserved as a record series, and is currently revising its transfer guidance. Later in the meeting, Mark Conrad described the particular challenge of preserving email from the Executive Office of the President, highlighting the point that “scale matters”–a theme that resonated across the board.

The whole account approach that NARA advocates meshes well with activities described by other presenters.  For example, Kelly Eubank from North Carolina State Archives and the EMCAP project discussed the need for software tools to ingest and process email records while Linda Reib from the Arizona State Library noted that the PeDALS Project is seeking to continue their work, focusing on account-level preservation of key state government accounts.

Functional comparison of selected email archives tools/services. Courtesy Wendy Gogel.

Ricc Ferrante and Lynda Schmitz Fuhrig from the Smithsonian Institution Archives discussed the CERP project which produced, in conjunction with the EMCAP project, an XML schema for email objects among its deliverables. Kate Murray from the Library of Congress reviewed the new email and related calendaring formats on the Sustainability of Digital Formats website.

Harvard University was up next. Andrea Goethals and Wendy Gogel shared information about Harvard’s Electronic Archiving Service. EAS includes tools for normalizing email from an account into EML format (conforming to the Internet Engineering Task Force RFC 2822), then packaging it for deposit into Harvard’s digital repository.

One of the most exciting presentations was provided by Peter Chan and Glynn Edwards from Stanford University. With generous funding from the National Historical Publications and Records Commission, as well as some internal support, the ePADD Project (“Email: Process, Appraise, Discover, Deliver”) is using natural language processing and entity extraction tools to build an application that will allow archivists and records creators to review email, then process it for search, display and retrieval. Best of all, the web-based application will include a built-in discovery interface, and users will be able to define a lexicon and provide visual representations of the results. Many participants in the meeting commented that the ePADD tools may provide a meaningful focus for additional collaborations. A beta version is due out next spring.

In the discussion that followed the informal presentations, several presenters congratulated the Harvard team on a slide Wendy Gogel shared, comparing the functions provided by various tools and services (reproduced above).

As is apparent from even a cursory glance at the chart, repositories are doing wonderful work—and much yet remains.

Collaboration is the way forward. At the end of the discussion, participants agreed to take three specific steps to drive email preservation initiatives to the next level: (1) providing tool demo sessions; (2) developing use cases; and (3) working together.

The bottom line: I’m more hopeful about the ability of the digital preservation community to develop an effective approach toward email preservation than I have been in years.  Stay tuned for future developments!

LITA: Tech Yourself Before You Wreck Yourself – Vol. 1

planet code4lib - Fri, 2014-09-19 12:30
Art from Cécile Graat

This post is for all the tech librarian caterpillars dreaming of one day becoming empowered tech butterflies. The internet is full to the brim with tools and resources for aiding in your transformation (and your job search). In each installment of Tech Yourself Before You Wreck Yourself – TYBYWY, pronounced tie-buy-why – I’ll curate a small selection of free courses, webinars, and other tools you can use to learn and master technologies. I’ll also spotlight a presentation opportunity so that you can consider putting yourself out there – it’s a big, beautiful community and we all learn through collaboration.

MOOC of the Week -

Allow me to suggest you enroll in The Emerging Future: Technology Issues and Trends, a MOOC offered by the School of Information at San Jose State University through Canvas. Taking a Futurist approach to technology assessment, Sue Alman, PhD offers participants an opportunity to learn “the planning skills that are needed, the issues that are involved, and the current trends as we explore the potential impact of technological innovations.”

Sounds good to this would-be Futurist!

Worthwhile Webinars –

I live in the great state of Texas, so it is with some pride that I recommend the recurring series, Tech Tools with Tine, from the Texas State Library and Archives Commission.  If you’re like me, you like your tech talks in manageable bite-size pieces. This is just your style.

September 19th, 9-10 AM EST – Tech Tools with Tine: 1 Hour of Google Drive

September 26th, 9-10 AM EST – Tech Tools with Tine: 1 Hour of MailChimp

October 3rd, 9-10 AM EST – Tech Tools with Tine: 1 Hour of Curation with Pinterest and Tumblr

Show Off Your Stuff –

The deadline to submit a proposal to the 2015 Library Technology Conference at Macalester College in beautiful St. Paul is September 22nd. Maybe that tight timeline is just the motivation you’ve been looking for!

What’s up, Tiger Lily? -

Are you a tech caterpillar or a tech butterfly? Do you have any cool free webinars or opportunities you’d like to share? Write me all about it in the comments.

District Dispatch: OITP Director appointed to University of Maryland Advisory Board

planet code4lib - Fri, 2014-09-19 08:46

This week, the College of Information Studies at the University of Maryland appointed Alan Inouye, director of the American Library Association’s (ALA) Office for Information Technology Policy (OITP), to the inaugural Advisory Board for the university’s Master of Library Science (MLS) degree program.

“This appointment supports OITP’s policy advocacy and its Policy Revolution! initiative,” said OITP Director Alan S. Inouye. “Future librarians will be working in a rapidly evolving information environment. I look forward to the opportunity to help articulate the professional education needed for success in the future.”

The Advisory Board comprises 17 leaders and students in the information professions who will guide the future development of the university’s MLS program. The Board’s first task will be to engage in a strategic “re-envisioning the MLS” discussion.

Serving three-year terms, the members of the Board will:

  • Provide insights on how the MLS program can enhance the impact of its services on various stakeholder groups;
  • Provide advice and counsel on strategy, issues, and trends affecting the future of the MLS Program;
  • Strengthen relationships with libraries, archives, industry, and other key information community partners;
  • Provide input for assessing the progress of the MLS program;
  • Provide a vital link to the community of practice for faculty and students to facilitate research, inform teaching, and further develop public service skills;
  • Support the fundraising efforts to support the MLS program; and
  • Identify the necessary entry-level skills, attitudes and knowledge competencies as well as performance levels for target occupations.

Additional Advisory Board Members include:

  • Tahirah Akbar-Williams, Education and Information Studies Librarian, McKeldin Library, University of Maryland
  • Brenda Anderson, Elementary Integrated Curriculum Specialist, Montgomery County Public Schools
  • R. Joseph Anderson, Director, Niels Bohr Library and Archives, American Institute of Physics
  • Jay Bansbach, Program Specialist, School Libraries, Instructional Technology and School Libraries, Division of Curriculum, Assessment and Accountability, Maryland State Department of Education
  • Sue Baughman, Deputy Executive Director, Association of Research Libraries
  • Valerie Gross, President and CEO, Howard County Public Library
  • Lucy Holman, Director, Langsdale Library, University of Baltimore
  • Naomi House, Founder, I Need a Library Job (INALJ)
  • Erica Karmes Jesonis, Chief Librarian for Information Management, Cecil County Public Library
  • Irene Padilla, Assistant State Superintendent for Library Development and Services, Maryland State Department of Education
  • Katherine Simpson, Director of Strategy and Communication, American University Library
  • Lissa Snyders, MLS Candidate, University of Maryland iSchool
  • Pat Steele, Dean of Libraries, University of Maryland
  • Maureen Sullivan, Immediate Past President, American Library Association
  • Joe Thompson, Senior Administrator, Public Services, Harford County Public Library
  • Paul Wester, Chief Records Officer for the Federal Government, National Archives and Records Administration

The post OITP Director appointed to University of Maryland Advisory Board appeared first on District Dispatch.

OCLC Dev Network: Release Scheduling Update

planet code4lib - Thu, 2014-09-18 21:30

To accommodate additional performance testing and optimization, the September release of WMS, which includes changes to the WMS Vendor Information Center API, is being deferred.  We will communicate the new date for the release as soon as we have confirmation.

District Dispatch: The Goodlatte, the bad and the ugly…

planet code4lib - Thu, 2014-09-18 20:55

My Washington Office colleague Carrie Russell, ALA’s copyright ace in the Office of Information Technology Policy, provides a great rundown here in DD on the substantive ins and outs of the House IP Subcommittee’s hearing yesterday. The Subcommittee met to take testimony on the part of the 1998 Digital Millennium Copyright Act (Section 1201, for those of you keeping score at home) that prohibits anyone from “circumventing” any kind of “digital locks” (aka, “technological protection measures,” or “TPMs”) used by their owners to protect copyrighted works. The hearing was also interesting, however, for the politics of the emerging 1201 debate on clear display.

First, the good news.  Rep. Bob Goodlatte (VA), Chairman of the full House Judiciary Committee, made time in a no doubt very crowded day to attend the hearing specifically for the purpose of making a statement in which he acknowledged that targeted reform of Section 1201 was needed and appropriate.  As one of the original authors of 1201 and the DMCA, and the guy with the big gavel, Mr. Goodlatte’s frank and informed talk was great to hear.

Likewise, Congressman Darrell Issa of California (who’s poised to assume the Chairmanship of the IP Subcommittee in the next Congress and eventually to succeed Mr. Goodlatte at the full Committee’s helm) agreed that Section 1201 might well need modification to prevent it from impeding technological innovation — a cause he’s championed over his years in Congress as a technology patent-holder himself.

Lastly, Rep. Blake Farenthold added his voice to the reform chorus.  While a relatively junior Member of Congress, Rep. Farenthold clearly “gets” the need to assure that 1201 doesn’t preclude fair use or valuable research that requires digital locks to be broken precisely to see if they create vulnerabilities in computer apps and networks that can be exploited by real “bad guys,” like malware- and virus-pushing lawbreakers.

Of course, any number of other members of the Subcommittee were singing loudly in the key of “M” for yet more copyright protection.  Led by the most senior Democrat on the full Judiciary Committee, Rep. John Conyers (MI), multiple members appeared (as Carrie described yesterday) to believe that “strengthening” Section 1201 in unspecified ways would somehow thwart … wait for it … piracy, as if another statute and another penalty would do anything to affect the behavior of industrial-scale copyright infringers in China who don’t think twice now about breaking existing US law.  Sigh….

No legislation is yet pending to change Section 1201 or other parts of the DMCA, but ALA and its many coalition partners in the public and private sectors will be in the vanguard of the fight to reform this outdated and ill-advised part of the law (including the triennial process by which exceptions to Section 1201 are granted, or not) next year.  See you there!

The post The Goodlatte, the bad and the ugly… appeared first on District Dispatch.

SearchHub: Say Hello to Lucidworks Fusion

planet code4lib - Thu, 2014-09-18 20:43

The team at Lucidworks is proud to announce the release of our next-generation platform for building powerful, scalable search applications: Lucidworks Fusion.

Fusion extends any Solr deployment with the enterprise-grade capabilities you need to deliver a world-class search experience:

  • Full support for any Solr deployment including Lucidworks Search, SolrCloud, and stand-alone mode.
  • Deeper support for recommendations including Item-to-Query, Query-to-Item, and Item-to-Item with aggregated signals.
  • Advanced signal processing including any datapoint (click-through, purchases, ratings) – even social signals like Twitter.
  • Enhanced application development with REST APIs, index-side and query-time pipelines, and sophisticated connector frameworks.
  • Advanced web and filesystem crawlers with multi-threaded HTML/document connectors, de-duping, and incremental crawling.
  • Integrated security management for roles and users supporting HTTPS, form-based, Kerberos, LDAP, and native methods.
  • Search, log, and trend analytics for any log type with real-time and historical data, using SiLK.

Ready to learn more? Join us for our upcoming webinar:

Webinar: Meet Lucidworks Fusion

Join Lucidworks CTO Grant Ingersoll for a ‘first look’ at our latest release, Lucidworks Fusion. You’ll be among the first to see the power of the Fusion platform and how it gives you everything you need to design, build, and deploy amazing search apps.

Webinar: Meet Lucidworks Fusion
Date: Thursday, October 2, 2014
Time: 11:00 am Pacific Daylight Time (San Francisco, GMT-07:00)

Click here to register for this webinar.

Or learn more at http://lucidworks.com/product/fusion/

John Miedema: Wilson iteration plans: Topics on text mining the novel.

planet code4lib - Thu, 2014-09-18 20:27

The Wilson iteration of my cognitive system will involve a deep dive into topics on text mining the novel. My overly ambitious plans are the following, roughly in order:

  • Develop a working code illustration of genre detection.
  • Develop another custom entity recognition model for literature, using an annotated corpus.
  • Visualization of literary concepts using time trends.
  • Collection of open data, open access articles, and open source tools for text analysis of literature.
  • Think about a better teaching tool for building models. Distinguish teaching computers from programming.

We’ll see where it goes.

DPLA: Nearly 100,000 items from the Getty Research Institute now available in DPLA

planet code4lib - Thu, 2014-09-18 20:03

More awesome news from DPLA! Hot on the heels of announcements earlier this week about newly added materials from the Medical Heritage Library and the Government Printing Office, we’re excited to share today that nearly 100,000 items from the Getty Research Institute are now available via DPLA.

To view the Getty in DPLA, click here.

From an announcement posted today on the Getty Research Institute Blog:

As a DPLA content hub, the Getty Research Institute has contributed metadata—information that enables search and retrieval of material—for nearly 100,000 digital images, documentary photograph collections, archives, and books dating from the 1400s to today. We’ve included some of the most frequently requested and significant material from our holdings of more than two million items, including some 5,600 images from the Julius Shulman photography archive, 2,100 images from the Jacobson collection of Orientalist photography, and dozens of art dealers’ stockbooks from the Duveen and Knoedler archives.

The Getty will make additional digital content available through DPLA as their collections continue to be cataloged and digitized.

All written content on this blog is made available under a Creative Commons Attribution 4.0 International License. All images found on this blog are available under the specific license(s) attributed to them, unless otherwise noted.

Alf Eaton, Alf: Archiving and displaying tweets with dat

planet code4lib - Thu, 2014-09-18 19:47

First, make a new directory for the project:

mkdir tweet-stream && cd $_

Install node.js (nodejs in Debian/Ubuntu, node in Homebrew), update npm if needed (npm install -g npm) and install dat:

npm install -g maxogden/dat

dat is essentially git for data, so the data repository needs to be initialised before it can be used:

dat init

Next, start the dat server to listen for incoming connections:

dat listen

Data can be piped into dat as line-delimited JSON (i.e. one object per line - the same idea as CSV but with optional nested data). Happily, this is the format in which Twitter’s streaming API provides information, so it's ideal for piping into dat.

I used a PHP client to connect to Twitter’s streaming API as I was interested in seeing how it handled the connection (the client needs to watch the connection and reconnect if no data is received in a certain time frame). There may be a command-line client that is even easier than this, but I haven’t found one yet…

Install Phirehose using Composer:

composer init && composer require fennb/phirehose:dev-master && composer install

The streaming API uses OAuth 1.0 for authentication, so you have to register a Twitter application to get an OAuth consumer key and secret, then generate another access token and secret for your account. Add these to this small PHP script that initialises Phirehose, starts listening for filtered tweets and outputs each tweet to STDOUT as it arrives:

Run the script to connect to the streaming API and start importing data: php stream.php | dat import -json

The dat server that was started earlier with dat listen is listening on port 6461 for clients, and is able to emit each incoming tweet as a Server-Sent Event, which can then be consumed in JavaScript using the EventSource API.

I’m in the process of making a twitter-stream Polymer element, but in the meantime this is how to connect to dat’s SSE endpoint:

var server = new EventSource('http://your-dat-ip-address:6461/api/changes?data=true&style=sse&live=true&limit=1&tail=1');
server.addEventListener('data', function (event) {
  var item = JSON.parse(event.data).value;
  // do something with the tweet
});

Patrick Hochstenbach: Hard Reset

planet code4lib - Thu, 2014-09-18 18:52
Joining Hard Reset, a playground for illustrators to draw cartoons about a post-apocalyptic world. These doodles I can draw during my 20-minute commute from Brugge to Ghent.
Filed under: Comics Tagged: art, cartoon, comic, comics, commute, copic, doodle,

Jonathan Rochkind: Umlaut 4.0 beta

planet code4lib - Thu, 2014-09-18 18:39

Umlaut is an open source specific-item discovery layer, often used on top of SFX, and based on Rails.

Umlaut 4.0.0.beta2 is out! (Yeah, don’t ask about beta1 :) ).

This release is mostly back-end upgrades, including:

  • Support for Rails 4.x (Rails 3.2 still included to make migration easier for existing installations, but we recommend starting with Rails 4.1 for new apps)
  • Based on Bootstrap 3 (Umlaut 3.x used Bootstrap 2)
  • internationalization/localization support
  • A more streamlined installation process with a custom installer

Anyone interested in beta testing? Probably most interesting if you have an SFX to point it at, but you can take it for a spin either way.

To install a new Umlaut app, see: https://github.com/team-umlaut/umlaut/wiki/Installation


Filed under: General

Andromeda Yelton: jQuery workshop teaching techniques, part 3: ruthless backward design

planet code4lib - Thu, 2014-09-18 17:04

I’m writing up what I learned from teaching a jQuery workshop this past month. I’ve already posted on my theoretical basis, pacing, and supporting affective goals. Now for the technique I invested the most time in and got the most mileage out of…

Ruthless backward design

Yes, yes, we all know we are supposed to do backward design, and I always have a general sense of it in my head when I design courses. In practice it’s hard, because you can’t always carve out the time to write an entire course in advance of teaching it, but for a two-day bootcamp I’m doing that anyway.

Yeah. Super ruthless. I wrote the last lesson, on functions, first. Along the way I took notes of every concept and every function that I relied on in constructing my examples. Then I wrote the second-to-last lesson, using what I could from that list (while keeping the pacing consistent), and taking notes on anything else I needed to have already introduced – again, right down to the granularity of individual jQuery functions. Et cetera. My goal was that, by the time they got to writing their own functions (with the significant leap in conceptual difficulty that entails), they would have already seen every line of code that they’d need to do the core exercises, so they could work on the syntax and concepts specific to functions in isolation from all the other syntax and concepts of the course. (Similarly, I wanted them to be able to write loops in isolation from the material in lessons 1 and 2, and if/then statements in isolation from the material in lesson 1.)

This made it a lot easier for me to see both where the big conceptual leaps were and what I didn’t need. I ended up axing .css() in favor of .addClass(), .removeClass(), and .hasClass() – more functions, but all conceptually simpler ones, and more in line with how I’ve written real-world code anyway. It meant that I axed booleans – which in writing out notes on course coverage I’d assumed I’d cover (such a basic data type, and so approachable for librarians!) – when I discovered I did not need their conceptual apparatus to make the subsequent code make sense. It made it clear that .indexOf() is a pain, and students would need to be familiar with its weirdness so it didn’t present any hurdles when they had to incorporate it into bigger programs.

Funny thing: being this ruthless and this granular meant I actually did get to the point where I could have done real-world-ish exercises with one more session. I ended up providing a few as further practice options for students who chose jQuery practice rather than the other unconference options for Tuesday afternoon. By eliminating absolutely everything unnecessary, right down to individual lines of code, I covered enough ground to get there. Huh!

So yeah. If I had a two-day workshop, I’d set out with that goal. A substantial fraction of the students would feel very shaky by then – it’s still a ton of material to assimilate, and about a third of my survey respondents’ brains were full by the time we got to functions – but including a real-world application would be a huge motivational payoff regardless. And group work plus an army of TAs would let most students get some mileage out of it. Add an option for people to review earlier material in the last half-day, and everyone’s making meaningful progress. Woot!

Also, big thanks to Sumana Harihareswara for giving me detailed feedback on a draft of the lesson, and helping me see the things I didn’t have the perspective to see about sequencing, clarity, etc. You should all be lucky enough to have a proofreader so enthusiastic and detail-oriented.

Later, where I want to go next.

Open Knowledge Foundation: Announcing a Leadership Update at Open Knowledge

planet code4lib - Thu, 2014-09-18 15:05

Today I would like to share some important organisational news. After 3 years with Open Knowledge, Laura James, our CEO, has decided to move on to new challenges. As a result of this change we will be seeking to recruit a new senior executive to lead Open Knowledge as it continues to evolve and grow.

As many of you know, Laura James joined us to support the organisation as we scaled up, and stepped up to the CEO role in 2013. It has always been her intention to return to her roots in engineering at an appropriate juncture, and we have been fortunate to have had Laura with us for so long – she will be sorely missed.

Laura has made an immense contribution and we have been privileged to have her on board – I’d like to extend my deep personal thanks to her for all she has done. Laura has played a central role in our evolution as we’ve grown from a team of half-a-dozen to more than forty. Thanks to her commitment and skill we’ve navigated many of the tough challenges that accompany “growing-up” as an organisation.

There will be no change in my role (as President and founder) and I will be here both to continue to help lead the organisation and to work closely with the new appointment going forward. Laura will remain in post, continuing to manage and lead the organisation, assisting with the recruitment and bringing the new senior executive on board.

For a decade, Open Knowledge has been a leader in its field, working at the forefront of efforts to open up information around the world and see it used to empower citizens and organisations to drive change. Both the community and original non-profit have grown – and continue to grow – very rapidly, and the space in which we work continues to develop at an incredible pace with many exciting new opportunities and activities.

We have a fantastic future ahead of us and I’m very excited as we prepare Open Knowledge to make its next decade even more successful than its first.

We will keep everyone informed in the coming weeks as our plans develop, and there will also be opportunities for the Open Knowledge community to discuss. In the meantime, please don’t hesitate to get in touch with me if you have any questions.

District Dispatch: Free webinar: Helping patrons set financial goals

planet code4lib - Thu, 2014-09-18 14:51

On September 23rd, the Consumer Financial Protection Bureau and the Institute of Museum and Library Services will offer a free webinar on financial literacy. This session has limited space so please register quickly.

Sometimes, if you’re offering programs on money topics, library patrons may come to you with questions about setting money goals. To assist librarians, the Consumer Financial Protection Bureau and the Institute of Museum and Library Services are developing financial education tools and sharing best practices with the public library field.

The two agencies created the partnership to help libraries provide free, unbiased financial information and referrals in their communities, build local partnerships and promote libraries as community resources. As part of the partnership, both agencies gathered information about libraries and financial education. Their surveys focused on attitudes about financial education, and how librarians can facilitate more financial education programs.

Join both groups on Tuesday, September 23, 2014, from 2:30–3:30p.m. Eastern Time for the free webinar “Setting money goals,” which will explore the basics of money management. The webinar will teach participants how to show patrons how to create effective money goals.

Webinar Details

September 23, 2014
2:30–3:30p.m. Eastern
Join the webinar (No need to RSVP)

  • Conference number: PW8729932
  • Audience passcode: LIBRARY

If you are participating only by phone, please dial the following number:

  • Phone: 1-888-947-8930
  • Participant passcode: LIBRARY

The post Free webinar: Helping patrons set financial goals appeared first on District Dispatch.

OCLC Dev Network: Reminder: Developer House Nominations Close on Monday

planet code4lib - Thu, 2014-09-18 14:45

If you've been thinking about nominating someone – including yourself – for Developer House this December, there’s no time like the present to submit that nomination form.

Open Knowledge Foundation: Launching a new collaboration in Macedonia with Metamorphosis and the UK Foreign & Commonwealth Office

planet code4lib - Thu, 2014-09-18 14:07

As part of the The Open Data Civil Society Network Project, School of Data Fellow, Dona Djambaska, who works with the local independent nonprofit, Metamorphosis, explains the value of the programme and what we hope to achieve over the next 24 months.

“The concept of Open Data is still very fresh among Macedonians. Citizens, CSOs and activists are just beginning to realise the meaning and power hidden in data. They are beginning to sense that there is some potential for them to use open data to support their causes, but in many cases they still don’t understand the value of open data, how to advocate for it, how to find it and most importantly – how to use it!

Metamorphosis was really pleased to get this incredible opportunity to work with the UK Foreign Office and our colleagues at Open Knowledge, to help support the open data movement in Macedonia. We know that an active open data ecosystem in Macedonia, and throughout the Balkan region, will support Metamorphosis’s core objectives of improving democracy and increasing quality of life for our citizens.

It’s great to help all these wonderful minds join together and co-build a community where everyone gets to teach and share. This collaboration with Open Knowledge and the UK Foreign Office is a really amazing stepping-stone for us.

We are starting the programme with meet-ups and then moving to more intense (online and offline) communications and awareness raising events. We hope our tailored workshops will increase the skills of local CSOs, journalists, students, activists or curious citizens to use open data in their work – whether they are trying to expose corruption or find new efficiencies in the delivery of government services.

We can already see the community being built, and the network spreading among Macedonian CSOs and hope that this first project will be part of a more regional strategy to support democratic processes across the Balkan region.”

Read our full report on the project: Improving governance and higher quality delivery of government services in Macedonia through open data

Dona Djambaska, Macedonia.

Dona graduated in the field of Environmental Engineering and has been working with the Metamorphosis foundation in Skopje for the past six years assisting on projects in the field of information society.

There she has focused on organising trainings for computer skills, social media, online promotion, photo and video activism. Dona is also an active contributor and member of the Global Voices Online community. She dedicates her spare time to artistic and activism photography.

Ed Summers: Satellite of Art

planet code4lib - Thu, 2014-09-18 13:26

… still there

FOSS4Lib Recent Releases: BitCurator - 0.9.20

planet code4lib - Thu, 2014-09-18 12:36

Last updated September 18, 2014. Created by Peter Murray on September 18, 2014.

Package: BitCurator
Release Date: Friday, September 5, 2014

Peter Murray: Thursday Threads: Patron Privacy on Library Sites, Communicating with Developers, Kuali Continued

planet code4lib - Thu, 2014-09-18 10:58

In the DLTJ Thursday Threads this week: an analysis of how external services included on library web pages can impact patron privacy, pointers to a series of helpful posts from OCLC on communication between software users and software developers, and lastly an update on the continuing discussion of the Kuali Foundation Board’s announcement forming a commercial entity.

Before we get started on this week’s threads, I want to point out a free online symposium that LYRASIS is hosting next week on sustainable cultural heritage open source software. Details are on the FOSS4Lib site, you can register on the LYRASIS events site, and then join the open discussion on the discuss.foss4lib.org site before, during and after the symposium.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to my Pinboard bookmarks are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

Analysis of Privacy Leakage on a Library Catalog Webpage

My post last month about privacy on library websites, and the surrounding discussion on the Code4Lib list prompted me to do a focused investigation, which I presented at last week’s Code4Lib-NYC meeting.
I looked at a single web page from the NYPL online catalog. I used Chrome developer tools to trace all the requests my browser made in the process of building that page. The catalog page in question is for The Communist Manifesto. It’s here: http://nypl.bibliocommons.com/item/show/18235020052907_communist_manifesto. …

So here are the results.

- Analysis of Privacy Leakage on a Library Catalog Webpage, by Eric Hellman, Go To Hellman, 16-Sep-2014

Eric goes on to note that he isn’t criticizing the New York Public Library, but rather looking at a prominent system run by people who are careful about privacy concerns (and also because NYPL was the host of the Code4Lib-NYC meeting). His analysis of what goes on behind the scenes of a web page is illuminating, though, as is the way all the careful work to protect patrons’ privacy while browsing the library’s catalog can be brought down by the inclusion of one simple JavaScript widget.

Series of Posts on Software Development Practices from OCLC

This is the first post in a series on software development practices. We’re launching the series with a couple of posts aimed at helping those who might not have a technical background communicate their feature requests to developers.

- Software Development Practices: What’s the Problem?, by Shelly Hostetler, OCLC Developer Network, 22-Aug-2014

OCLC has started an excellent set of posts on how to improve communication between software users and software developers. The first three have been posted so far with another one expected today:

  1. Software Development Practices: What’s the Problem?
  2. Software Development Practices: Telling Your User’s Story
  3. Software Development Practices: Getting Specific with Acceptance Criteria

I’ve bookmarked them and will be referring to them when talking with our own members about software development needs.

Kuali 2.0 Discussion Continues

…I thought of my beehives and how the overall bee community supports that community/hive. The community needs to be protected, prioritized, supported and nourished any way possible. Each entity, the queen, the workers and the drones all know their jobs, which revolve around protecting, supporting and nourishing the community.

Even if something disrupts the community, everyone knows their role and they get back to work in spite of the disruption. The real problem within the Kuali Community, with the establishment of the Kuali Commercial Entity now is that various articles, social media outlets, and even the communication from the senior Kuali leadership to the community members, have created a situation in which many do not have a good feel for their role in protecting, prioritizing, supporting and nourishing the community.

- The Evolving Kuali Narrative, by Kent Brooks, “I was just thinking”, 14-Sep-2014

The Kuali Foundation Board has set a direction for our second decade and at this time there are many unknowns as we work through priorities and options with each of the Kuali Project Boards. Kuali is a large and complex community of many institutions, firms, and individuals. We are working with projects now and hope to have some initial roadmaps very soon.

- Updates – Moving at the Speed of Light, by Jennifer Foutty, Kuali 2.0 Blog, 17-Sep-2014

As the library community that built a true next-generation library management system, the future of OLE’s development and long-term success is in our hands. We intend to continue to provide free and open access to our community designed and built software. The OLE board is strongly committed to providing a community driven option for library management workflow.

- Open Library Environment (OLE) & Kuali Foundation Announcement, by Bruce M. Taggart (Board Chair, Open Library Environment (OLE)), 9-Sep-2014

Building on previous updates here, the story of the commercialization of the Kuali collaborative continues. I missed the post from Bruce Taggart in last week’s update, and for the main DLTJ Thursday Threads audience this status update from the Open Library Environment project should be most interesting. Given the lack of information, it is hard not to parse each word of formal statements for underlying meanings. In the case of Dr. Taggart’s post about OLE, I’m leaning heavily on wondering what “community designed and built software” means. The Kuali 2.0 FAQ still says “the current plan is for the Kuali codebase to be forked and relicensed under the Affero General Public License (AGPL).” As Charles Severance points out, the Affero license can be a path to vendor lock-in. So is there to be a “community” version that has a life of its own under the Educational Community License while the KualiCo develops features only available under the Affero license? It is entirely possible that too much can be read into too few words, so I (for one) continue to ponder these questions and watch for the plan to evolve.

