New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week:
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
“We’ve got a pipeline problem, so let’s build a better pipeline.” –Bess Sadler, Code4Lib 2014 Conference
I should add, right here: I’m no longer trying to get a librarian-coder position*. This post isn’t about me, although it is, of course, from my perspective and informed by my experiences. This post is about a field I love, which is currently shooting itself in the foot, and that frustrates me.
Bess is right: libraries need 1) more developers and 2) more diversity among them. Libraries are hamstrung by expensive, insufficient vendor “solutions.” (I’m not hating on the vendors, here; libraries’ problems are complex, and fragmentation and a number of other issues make it difficult for vendors to provide really good solutions.) Libraries and librarians could be so much more effective if we had good software, with interoperable APIs, designed specifically to fill modern libraries’ needs.
Please, don’t get me wrong: I know some libraries are working on this. But they’re too few, and their developers’ demographics do not represent the demographics of libraries at large, let alone our patron bases. I argue that the dearth and the demographic skew will continue and probably worsen, unless we make a radical change to our hiring practices and training options for technical talent.
Building technical skills among librarians
The biggest issue I see is that we offer a fair number of very basic learn-to-code workshops, but we don’t offer a realistic path from there to writing code as a job. To put a finer point on it, we do not offer “junior developer” positions in libraries; we write job ads asking for unicorns, with expert- or near-expert-level skills in at least two areas (I’ve seen ones that wanted strong skills in development, user experience, and devops, for instance).
This is unfortunate, because developing real fluency with any skill, including coding, requires practicing it regularly. In the case of software development, there are things you can really only learn on the job, working with other developers (ask me about Git, sometime); only, nobody seems willing to hire for that. And, yes, I understand that there are lots of single-person teams in libraries—far more than there should be—but many open source software projects can fill in a lot of that group learning and mentoring experience, if a lone developer is allowed to participate in them on work time. (OSS is how I am planning to fill in those skills, myself.)
From what I can tell, if you’re a librarian who wants to learn to code, you generally have two really bad options: 1) learn in your spare time, somehow; or 2) quit libraries and work somewhere else until your skills are built up. I’ve been down both of those roads, and as a result I no longer have “be a [paid] librarian-developer” on my goals list.
Option one: Learn in your spare time
This option is clown shoes. It isn’t sustainable for anybody, really, but it’s especially not sustainable for people in caretaker roles (e.g. single parents), people with certain disabilities (who have less energy and free time to start with), people who need to work more than one job, etc.—that is, people from marginalized groups. Frankly, it’s oppressive, and it’s absolutely a contributing factor to libtech’s largely male, white, middle to upper-middle class, able-bodied demographics—in contrast to the demographics of the field at large (which is also most of those things, but certainly not predominantly male).
“I’ve never bought this ‘do it in your spare time’ stuff. And it turns out that doing it in your spare time is terribly discriminatory, because … a prominent aspect of oppression is that you have more to do in less spare time.” – Valerie Aurora, during her keynote interview for Code4Lib 2014
“It’s become the norm in many technology shops to expect that people will take care of skills upgrading on their own time. But that’s just not a sustainable model. Even people who adore late night, just-for-fun hacking sessions during the legendary ‘larval phase’ of discovering software development can come to feel differently in a later part of their lives.” – Bess Sadler, same talk as above
I tried to make it work, in my last library job, by taking one day off every other week** to work on my development skills. I did make some headway—a lot, arguably—but one day every two weeks is not enough to build real fluency, just as fiddling around alone did not help me build the skills that a project with a team would have. Not only do most people not have the privilege of dropping to 90% of their work time, but even if you do, that’s not an effective route to learning enough!
And, here, you might think of the coding bootcamps (at more than $10k per) or the (free, but you have to live in NYC) Recurse Center (which sits on my bucket list, unvisited), but, again: most people can’t afford to take three months away from work, like that. And the Recurse Center isn’t so much a school (hence the name change away from “Hacker School”) as it is a place to get away from the pressures of daily life and just code; realistically, you have to be at a certain level to get in. My point, though, is that the people for whom these are realistic options tend to be among the least marginalized in other ways. So, I argue that they are not solutions and not something we should expect people to do.
Option two: Go work in tech
If you can’t get the training you need within libraries or in your spare time, it kind of makes sense to go find a job with some tech company, work there for a few years, build up your skills, and then come back. I thought so, anyway. It turns out, this plan was clown shoes, too.
Every woman I’ve talked to who has taken this approach has had a terrible experience. (I also know of a few women who’ve tried this approach and haven’t reported back, at least to me. So my data is incomplete, here. Still, tech’s horror stories are numerous, so go with me here.) I have a theory that library vendors are a safer bet and may be open to hiring newer developers than libraries currently are, but I don’t have enough data (or anecdata) to back it up, so I’m going to talk about tech-tech.
Frankly, if we expect members of any marginalized group to go work in tech in order to build up the skills necessary for a librarian-developer job, we are throwing them to the wolves. In tech, even able-bodied straight cisgender middle class white women are a badly marginalized group, and heaven help you if you’re on any other axis of oppression.
And, sure, yeah. Not all tech. I’ll agree that there are non-terrible jobs for people from marginalized groups in tech, but you have to be skilled enough to get to be that choosy, which people in the scenario we’re discussing are not. I think my story is a pretty good illustration of how even a promising-looking tech job can still turn out horrible. (TLDR: I found a company that could talk about basic inclusivity and diversity in a knowledgeable way and seemed to want to build a healthy culture. It did not have a healthy culture.)
We just can’t outsource that skill-building period to non-library tech. It isn’t right. We stand to lose good people that way.
We need to develop our own techies—I’m talking code, here, because it’s what I know, but most of my argument expands to all of libtech and possibly even to library leadership—or continue offering our patrons sub-par software built within vendor silos and patched together by a small, privileged subset of our field. I don’t have to tell you what that looks like; we live with it, already.
What to do?
I’m going to focus on what you, as an individual organization, or leader within an organization, can do to help; I acknowledge that there are some systemic issues at play, beyond what my relatively small suggestions can reach, and I hope this post gets people talking and thinking about them (and not just to wave their hands and sigh and complain that “there isn’t enough money,” because doomsaying is boring and not helpful).
First of all, when you’re looking at adding to the tech talent in your organization, look within your organization. Is there a cataloger who knows some scripting and might want to learn more? (Ask around! Find out!) What about your web content manager, UX person, etc.? (Offer!) You’ll probably be tempted to look at men, first, because society has programmed us all in evil ways (seriously), so acknowledge that impulse and look harder. The same goes for race and disability and having the MLIS, which is too often a stand-in for socioeconomic class; actively resist those biases (and we all have those biases).
If you need tech talent and can’t grow it from within your organization, sit down and figure out what you really need, on day one, versus what might be nice to have, but could realistically wait. Don’t put a single nice-to-have on your requirements list, and don’t you dare lose sight of what is and isn’t necessary when evaluating candidates.
Recruit in diverse and non-traditional spaces for tech folks — dashing off an email to Code4Lib is not good enough (although, sure, do that too; they’re nice folks). LibTechWomen is an obvious choice, as are the Spectrum Scholars, but you might also look at the cataloging listservs or the UX listservs, just to name two options. Maybe see who tweets about #libtechgender and #critlib (and possibly #lismicroaggressions?), and invite those folks to apply and to share your linted job opening with their networks.
Don’t use whiteboard interviews! They are useless and unnecessarily intimidating! They screen for “confidence,” not technical ability. Pair-programming exercises, with actual taking turns and pairing, are a good alternative. Talking through scenarios is also a good alternative.
Don’t give candidates technology vocabulary tests. Not only is it nearly useless as an evaluation tool (and a little insulting); it actively discriminates against people without formal CS education (or, cough, people with CS minors from more than a decade ago). You want to know that they can approach a problem in an organized manner, not that they can define a term that’s easily Googled.
(I have a whole slew of comments about hiring, and I’ll make those—and probably repeat the list above—in another post.)
Once you have someone in a position, or (better) you’re growing someone into a position, be sure to set reasonable expectations and deadlines. There will be some training time for any tech person; you want this, because something built with enough forethought and research will be better than something hurriedly duct-taped (figuratively, you hope) together.
Give people access to mentorship, in whatever form you can. If you can’t give them access to a team within your organization, give them dedicated time to contribute to relevant OSS projects. Send them to—just to name two really inclusive and helpful conferences/communities—Code4Lib (which has regional meetings, too) and/or Open Source Bridge.
So… that’s what I’ve got. What have I missed? What else should we be doing to help fix this gap?
* In truth, as excited as I am about starting my own business, I wouldn’t turn down an interview for a librarian-coder position local to Pittsburgh, but 1) it doesn’t feel like the wind is blowing that way, here, and 2) I’m in the midst of a whole slew of posts that may make me unemployable, anyway ;)
** To be fair, I did get to do some development on the clock, there. Unfortunately, because I wore so many hats, and other hats grew more quickly, it was not a large part of my work. Still, I got most of my PHP experience there, and I’m glad I had the opportunity.
The post How Twitter Uses Apache Lucene for Real-Time Search appeared first on Lucidworks.
Chad Haefele sat down with Amanda and me (Michael) to talk about his new book, WordPress for Libraries. If you’ve been paying attention, then you know WordPress is our jam, so we were chomping at the bit.
As of this posting, you only have until tomorrow, but if you jump on it, you can enter to win a free copy.
Also, Chad has a lot to say about usability testing, especially with Optimal Workshop’s tools, about organizations allocating a user experience design budget, and about the inglorious end of Google Wave.
We’re pleased to announce a new report, “Open Budget Data: Mapping the Landscape” undertaken as a collaboration between Open Knowledge, the Global Initiative for Financial Transparency and the Digital Methods Initiative at the University of Amsterdam.
The report offers an unprecedented empirical mapping and analysis of the emerging issue of open budget data, which has appeared as ideals from the open data movement have begun to gain traction amongst advocates and practitioners of financial transparency.
In the report we chart the definitions, best practices, actors, issues and initiatives associated with the emerging issue of open budget data in different forms of digital media.
In doing so, our objective is to enable practitioners – in particular civil society organisations, intergovernmental organisations, governments, multilaterals and funders – to navigate this developing field and to identify trends, gaps and opportunities for supporting it.
How public money is collected and distributed is one of the most pressing political questions of our time, influencing the health, well-being and prospects of billions of people. Decisions about fiscal policy affect everyone – determining everything from the resourcing of essential public services, to the capacity of public institutions to take action on global challenges such as poverty, inequality or climate change.
Digital technologies have the potential to transform the way that information about public money is organised, circulated and utilised in society, which in turn could shape the character of public debate, democratic engagement, governmental accountability and public participation in decision-making about public funds. Data could play a vital role in tackling the democratic deficit in fiscal policy and in supporting better outcomes for citizens.
The report includes the following recommendations:
- CSOs, IGOs, multilaterals and governments should undertake further work to identify, engage with and map the interests of a broader range of civil society actors whose work might benefit from open fiscal data, in order to inform data release priorities and data standards work. Stronger feedback loops should be established between the contexts of data production and its various contexts of usage in civil society – particularly in journalism and in advocacy.
- Governments, IGOs and funders should support pilot projects undertaken by CSOs and/or media organisations in order to further explore the role of data in the democratisation of fiscal policy – especially in relation to areas which appear to have been comparatively under-explored in this field, such as tax distribution and tax base erosion, or tracking money through from revenues to results.
- Governments should work to make data “citizen readable” as well as “machine readable”, and should take steps to ensure that information about flows of public money and the institutional processes around them are accessible to non-specialist audiences – including through documentation, media, events and guidance materials. This is a critical step towards the greater democratisation and accountability of fiscal policy.
- Further research should be undertaken to explore the potential implications and impacts of opening up information about public finance which is currently not routinely disclosed, such as more detailed data about tax revenues – as well as measures needed to protect the personal privacy of individuals.
- CSOs, IGOs, multilaterals and governments should work together to promote and adopt consistent definitions of open budget data, open spending data and open fiscal data in order to establish the legal and technical openness of public information about public money as a global norm in financial transparency.
This year I had the pleasure of meeting Dr. Peggy Spitzer Christoff, lecturer in Asian and Asian American Studies at Stony Brook University. She shared with me how she’s using the Library of Congress’ Viewshare tool to engage her students in an introduction to Asian Studies course. Peg talked about using digital platforms as a way to improve writing, visual and information literacy skills in her students. In this interview, she talks about why and how Viewshare is useful in connecting the students’ time “surfing the web” to creating presentations that require reflection and analysis.
Abbey: How did you first hear about Viewshare and what inspired you to use it in your classes?
Peg: I heard about it through the monthly Library of Congress Women’s History Discussion Group, about three years ago. At the time, Trevor Owens [former Library of Congress staff member] was doing presentations throughout the Library and he presented Viewshare to that group. It sounded like a neat way to organize information. Around the same time, I was developing the Department of Asian and Asian American Studies’ introductory (gateway) course for first and second year students at Stony Brook University. Faculty in our department were concerned that students couldn’t find Asian countries on a map and had very little understanding of basic information about Asia. I thought that developing a student project using Viewshare would enable each student to identify, describe and visually represent aspects of Asia of their choosing — as a launching pad for further exploration. Plus, I liked the idea of students writing paragraphs to describe each of the items they selected because it could help them become better writers. Finally, I wanted students to learn how to use an Excel spreadsheet in the context of a digital platform.
Abbey: So it sounds like the digital platforms project is allowing your students to explore a specific topic they may not be familiar with (i.e., Asian Studies) with a resource they are probably more familiar with (i.e., the web) while at the same time exposing them to basic data curation principles. Would you agree?
Peg: Yes. Combining these into one project has been so popular because we’ve broadened student interest in how collections are developed and organized.
Abbey: Why do you think Viewshare works well in the classroom?
Peg: Because students have the freedom to develop their own collections of Asian artifacts and, at the end of the semester, share their collections with each other. Students approach the assignment differently and it’s surprising to them (and me) to see how their interests in “Asia” change throughout the semester, as they develop their collections.
Abbey: Please walk us through how you approach teaching your students to use Viewshare in their assignments.
Peg: I introduce the Viewshare platform to engage students in critical thinking. The project requires students to select, classify, and describe the significance of Asian artifacts relating to subjects of common concern — education, health, religion and values, consumer issues, family and home, mobility, children, careers and work, entertainment and leisure, etc. Also, I want students to think about cultured spaces in India, Southeast Asia, China, Korea, Japan and Asian communities in the United States. I encourage students to consider the emotional appeal of the items, which could include anything from a photograph of the Demilitarized Zone (DMZ) in Korea, to ornamental jade pieces from China, to ancient religious texts from India, to anime from Japan. Food has a particularly emotional appeal, especially for college students.
Undergrad TAs have developed PowerPoint slides as “tutorials” on how to use Viewshare, which I post on Blackboard. We explore the website in class and everyone signs up for an account at the very beginning of the semester. The TA helps with troubleshooting. Four times throughout the semester, the students add several artifacts, I grade their written descriptions and the TA reviews their Excel spreadsheet to correct format problems. Then, around the last few weeks of the semester, the students upload their Excel spreadsheet into the Viewshare platform and generate maps, timelines, pie charts, etc. Here’s an example of a typical final project.
Abbey: How have your students reacted to using Viewshare?
Peg: Sometimes they are frustrated when they can’t get the platform to load correctly. Almost always they enjoy seeing the final result and would like to work more on it — if we only had more time during the semester.
Abbey: Do you see any possibilities for making more use of Viewshare?
Peg: I’d like to keep track of the Asian artifacts the students select and how they describe them over long periods of time — to interpret changes in student interests. (We have a large Asian population on campus and over 50% of my students are either Asian or Asian American.)
Also, my department would like to use the Viewshare platform to illustrate a collection of Asian connections to Long Island.
Abbey: Anything else to add?
Peg: I think Viewshare is really ideal for student projects. And I have used Viewshare in academic writing to organize data and illustrate patterns. I just cited a Viewshare view in a footnote.
Academic libraries support certain software by virtue of what they have available on their public computers, what their librarians are trained to use, and what instruction sessions they offer. Sometimes libraries don’t have a choice in the software they are tasked with supporting, but often they do. If the goal of the software support is to simply help students achieve success in the short term, then any software that the library already has a license for is fair game. If the goal is to teach them a tool they can rely on anywhere, then libraries must consider the impact of choosing open tools over commercial ones.
Suppose we have a student, we’ll call them “Student A”, who wants to learn about citation management. They see a workshop on EndNote, a popular piece of citation management software, and they decide to attend. Student A becomes enamored with EndNote and continues to grow their skills with it throughout their undergraduate career. Upon graduating, Student A gets hired and is expected to keep up with the latest research in their field, but suddenly they no longer have access to EndNote through their university’s subscription. They can either pay for an individual license, or choose a new piece of citation management software (losing all of their hard earned EndNote-specific skills in the process).
Now let’s imagine Student B who also wants to learn about citation management software but ends up going to a workshop promoting Zotero, an open source alternative to EndNote. Similar to Student A, Student B continues to use Zotero throughout their undergraduate career, slowly mastering it. Since Zotero requires no license to use, Student B continues to use Zotero after graduating, allowing the skills that served them as a student to continue to do so as a professional.
Which one of these scenarios do you think is more helpful to the student in the long run? By teaching our students to use tools that they will lose access to once outside of the university system, we are essentially handing them a ticking time bomb that will explode as they transition from student to professional, which happens to be one of the most vulnerable and stressful periods in one’s life. Any academic library that cares about the continuing success of its students once they graduate should definitely take a look at its list of currently supported software and ask, “Am I teaching a tool or a time bomb?”
“Telling VIVO Stories” is a community-led initiative aimed at introducing project leaders and their ideas to one another while providing details about VIVO implementations for the community and beyond. The following interview includes personal observations that may not represent the opinions and views of Duke University or the VIVO Project. Carol Minton Morris from DuraSpace interviewed Julia Trimmer from Duke University to learn about Scholars@Duke.
Signals in Lucidworks Fusion leverage information about external activity, e.g., information collected from logfiles and transaction databases, to improve the quality of search results. This post follows on my previous post, Basics of Storing Signals in Solr with Fusion for Data Engineers, which showed how to index and aggregate signal data. In this post, I show how to write and debug query pipelines using this aggregated signal information.
User clicks provide a link between what people ask for and what they choose to view, given a set of search results, usually with product images. In the aggregate, if users have winnowed the set of search results for a given kind of thing down to a set of products that are exactly that kind of thing, e.g., if the logfile entries link queries for “Netgear”, “router”, or “netgear router” to clicks on products that really are routers, then this information can be used to improve new searches over the product catalog.
The Story So Far
To show how signals can be used to improve search in an e-commerce application, I created a set of Fusion collections:
- A collection called “bb_catalog”, which contains Best Buy product data, a dataset comprising over 1.2M items, mainly consumer electronics such as household appliances, TVs, computers, and entertainment media such as games, music, and movies. This is the primary collection.
- An auxiliary collection called “bb_catalog_signals”, created from a synthetic dataset over Best Buy query logs from 2011. This is the raw signals data, meaning that each logfile entry is stored as an individual document.
- An auxiliary collection called “bb_catalog_signals_aggr” derived from the data in “bb_catalog_signals” by aggregating all raw signal records based on the combination of search query, field “query_s”, item clicked on, field “doc_id_s”, and search categories, field “filters_ss”.
All documents in collection “bb_catalog” have a unique product ID stored in field “id”. All items belong to one or more categories, which are stored in the field “categories_ss”.
The following screenshot shows the Fusion UI search panel over collection “bb_catalog”, after using the Search UI Configuration tool to limit the document fields displayed. The gear icon next to the search box toggles this control open and closed. The “Documents” settings are set so that the primary field displayed is “name_t”, the secondary field is “id”, and additional fields are “name_t”, “id”, and “category_ss”. The document in the yellow rectangle is a Netgear router with product id “1208844”.
For collection “bb_catalog_signals”, the search query string is stored in field “query_s”, the timestamp is stored in field “tz_timestamp_txt”, the id of the document clicked on is stored in field “doc_id_s”, and the set of category filters are stored in fields “filters_ss” as well as “filters_orig_ss”.
The following screenshot shows the results of a search for raw signals where the id of the product clicked on was “1208844”.
The collection “bb_catalog_signals_aggr” contains aggregated signals. In addition to the fields “doc_id_s”, “query_s”, and “filters_ss”, aggregated click signals contain these fields:
- “count_i” – the number of raw signals found for this query, doc, filter combo.
- “weight_d” – a real-number used as a multiplier to boost the score of these documents.
- “tz_timestamp_txt” – all timestamps of raw signals, stored as a list of strings.
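To make the roll-up from “bb_catalog_signals” to “bb_catalog_signals_aggr” concrete, here is a minimal Python sketch that groups raw click records by query, clicked document, and filters, and then fills in the three aggregate fields listed above. It works on plain dicts rather than Solr, and it assumes, purely for illustration, that “weight_d” equals the click count; Fusion’s actual weighting is not described in this post. The function name, the “Networking” category, and the timestamps are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Signal = Dict[str, Any]


def aggregate_signals(raw: List[Signal]) -> List[Signal]:
    """Roll raw click signals up by (query_s, doc_id_s, filters_ss)."""
    groups: Dict[Tuple[str, str, Tuple[str, ...]], List[str]] = defaultdict(list)
    for sig in raw:
        # Sort the filters so the same set of categories always yields the same key.
        key = (sig["query_s"], sig["doc_id_s"], tuple(sorted(sig["filters_ss"])))
        groups[key].append(sig["tz_timestamp_txt"])

    aggregated = []
    for (query, doc_id, filters), timestamps in groups.items():
        aggregated.append({
            "query_s": query,
            "doc_id_s": doc_id,
            "filters_ss": list(filters),
            "count_i": len(timestamps),
            # Assumption: weight equals the click count; Fusion's own weighting may differ.
            "weight_d": float(len(timestamps)),
            "tz_timestamp_txt": sorted(timestamps),
        })
    return aggregated


# Made-up raw signals: three clicks on the Netgear router after searching "netgear".
raw_signals = [
    {"query_s": "netgear", "doc_id_s": "1208844", "filters_ss": ["Networking"],
     "tz_timestamp_txt": f"2011-09-0{day}T12:00:00Z"}
    for day in (1, 2, 3)
] + [
    {"query_s": "router", "doc_id_s": "1208844", "filters_ss": ["Networking"],
     "tz_timestamp_txt": "2011-09-04T12:00:00Z"},
]

for row in aggregate_signals(raw_signals):
    print(row["query_s"], row["doc_id_s"], row["count_i"])
```

Including the filters in the grouping key mirrors the description of “bb_catalog_signals_aggr”: the same query and click under a different set of category filters becomes a separate aggregated record.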
The following screenshot shows aggregated signals for searches for “netgear”. There were 3 raw signals where the search query “netgear” and some set of category choices resulted in a click on the item with id “1208844”:
Using Click Signals in a Fusion Query Pipeline
Fusion’s Query Pipelines take as input a set of search terms and process them into a Solr query request. The Fusion UI Search panel has a control which allows you to choose the processing pipeline. In the following screenshot of the collection “bb_catalog”, the query pipeline control is just below the search input box. Here the pipeline chosen is “bb_catalog-default” (circled in yellow):
The pre-configured default query pipelines consist of 3 stages:
- A Search Fields query stage, used to define common Solr query parameters. The initial configuration specifies that the 10 best-scoring documents should be returned.
- A Facet query stage which defines the facets to be returned as part of the Solr search results. No facet field names are specified in the initial defaults.
- A Solr query stage which transforms a query request object into a Solr query and submits the request to Solr. The default configuration specifies the HTTP method as a POST request.
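One way to picture a query pipeline is as an ordered list of functions, each taking a request and returning a modified copy. The Python sketch below models only the first two default stages and stops before the Solr query stage, since that step would need a live Solr server. Everything here is hypothetical and is not Fusion’s API: the stage functions, the “defType” choice (assumed to be a dismax-style parser, consistent with the DisjunctionMaxQuery that shows up in the parsed query), and the “qf” fallback to “text”, the default search field named in this post.

```python
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Stage = Callable[[Request], Request]


def search_fields_stage(fields: List[str], rows: int = 10) -> Stage:
    """Hypothetical stand-in for the Search Fields stage: common Solr parameters."""
    def stage(req: Request) -> Request:
        out = dict(req)
        out["rows"] = rows
        # Assumption: a dismax-style parser, so that "qf" lists the fields to search.
        out["defType"] = "edismax"
        # Fall back to "text", the default search field, when no fields are configured.
        out["qf"] = " ".join(fields) if fields else "text"
        return out
    return stage


def facet_stage(facet_fields: List[str]) -> Stage:
    """Hypothetical stand-in for the Facet stage: adds facets only if any are configured."""
    def stage(req: Request) -> Request:
        out = dict(req)
        if facet_fields:
            out["facet"] = "true"
            out["facet.field"] = list(facet_fields)
        return out
    return stage


def run_pipeline(stages: List[Stage], req: Request) -> Request:
    """Apply each stage in order; a Solr query stage would then send the result to Solr."""
    for stage in stages:
        req = stage(req)
    return req


default_pipeline = [search_fields_stage([]), facet_stage([])]
fixed_pipeline = [search_fields_stage(["name_t"]), facet_stage([])]
print(run_pipeline(default_pipeline, {"q": "ipad"}))
print(run_pipeline(fixed_pipeline, {"q": "ipad"}))
```

The first call shows the out-of-the-box situation, where the query would be run against a “text” field that “bb_catalog” documents don’t have. The second adds “name_t”, the change this walkthrough makes to the Search Fields stage.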
In order to get text-based search over the collection “bb_catalog” to work as expected, the Search Fields query stage must be configured to specify the set of fields that contain relevant text. For the majority of the 1.2M products in the product catalog, the item name, found in field “name_t”, is the only field amenable to free text search. The following screenshot shows how to add this field to the Search Fields stage by editing the query pipeline via the Fusion 2 UI:
The search panel on the right displays the results of a search for “ipad”. There were 1,359 hits for this query, which far exceeds the number of items that are an Apple iPad. The best scoring items contain “iPad” in the title, sometimes twice, but these are all iPad accessories, not the device itself.
Recommendation Boosting query stage
A Recommendation Boosting stage uses aggregated signals to selectively boost items in the set of search results. The following screenshot shows the results of the same search after adding a Recommendations Boosting stage to the query pipeline:
The edit pipeline panel on the left shows the updated query pipeline “bb_catalog-default” after adding a “Recommendations Boosting” stage. All parameter settings for this stage have been left at their default values. In particular, the recommendation boosts are applied to field “id”. The search panel on the right shows the updated results for the search query “ipad”. Now the three most relevant items are for Apple iPads. They are iPad 2 models because the click dataset used here is based on logfile data from 2011, and at that time, the iPad 2 was the most recent iPad on the market. There were more clicks on the 16GB iPads than on the more expensive 32GB model, and more on black models than on white ones.
Peeking Under the Hood
Of course, under the hood, Fusion is leveraging the awesome power of Solr. To see how this works, I show both the Fusion query and the JSON of the Solr response. To display the Fusion query, I open the Search UI Configuration, go to the “General” settings, and check the “Show Query URL” option. To see the Solr response in JSON format, I change the display control from “Results” to “JSON”.
The following screenshot shows the Fusion UI search display for “ipad”:
The query “ipad” entered via the Fusion UI search box is transformed into the following request sent to the Fusion REST API:
/api/apollo/query-pipelines/bb_catalog-default/collections/bb_catalog/select?fl=*,score&echoParams=all&wt=json&json.nl=arrarr&sort&start=0&q=ipad&debug=true&rows=10
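To see how that request is put together, this short Python sketch rebuilds the path and query string from its parts using the standard library’s urllib.parse.urlencode. The helper name is hypothetical, and the sketch leaves out the Fusion host and port, which the post doesn’t show. The parameters are copied from the URL above, with one small difference: urlencode writes the empty sort parameter as “sort=” rather than a bare “sort”.

```python
from urllib.parse import urlencode


def fusion_query_path(pipeline: str, collection: str, query: str,
                      rows: int = 10, debug: bool = True) -> str:
    """Build the path and query string for a Query Pipelines API select request."""
    path = f"/api/apollo/query-pipelines/{pipeline}/collections/{collection}/select"
    params = [
        ("fl", "*,score"),
        ("echoParams", "all"),
        ("wt", "json"),
        ("json.nl", "arrarr"),
        ("sort", ""),  # empty, as in the original request
        ("start", "0"),
        ("q", query),
        ("debug", "true" if debug else "false"),
        ("rows", str(rows)),
    ]
    # Keep "*" and "," readable instead of percent-encoding them.
    return path + "?" + urlencode(params, safe="*,")


print(fusion_query_path("bb_catalog-default", "bb_catalog", "ipad"))
```

Because urlencode uses quote_plus by default, a multi-word query such as “netgear router” is sent as “q=netgear+router”.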
This request to the Query Pipelines API sends a query through the query pipeline “bb_catalog-default” for the collection “bb_catalog” using the Solr “select” request handler, where the search query parameter “q” has value “ipad”. Because the parameter “debug” has value “true”, the Solr response contains debug information, outlined by the yellow rectangle. The “bb_catalog-default” query pipeline transforms the query “ipad” into the following Solr query:
"parsedquery": "(+DisjunctionMaxQuery((name_t:ipad)) id:1945531^4.0904393 id:2339322^1.5108471 id:1945595^1.0636971 id:1945674^0.4065684 id:2842056^0.3342921 id:2408224^0.4388061 id:2339386^0.39254773 id:2319133^0.32736558 id:9924603^0.1956079 id:1432551^0.18906432)/no_coord"
The outer part of this expression, “( … )/no_coord”, is a reporting detail indicating that Solr's “coord scoring” feature wasn't used.
The enclosed expression consists of:
- The search: “+DisjunctionMaxQuery(name_t:ipad)”.
- A set of selective boosts to be applied to the search results
The field name “name_t” is supplied by the set of search fields specified by the Search Fields query stage. (Note: if no search fields are specified, the default search field name “text” is used. Since the documents in collection “bb_catalog” don't contain a field named “text”, this stage must be configured with the appropriate set of search fields.)
The Recommendations Boosting stage was configured with the default parameters:
- Number of Recommendations: 10
- Number of Signals: 100
There are 10 documents boosted, with ids (1945531, 2339322, 1945595, 1945674, 2842056, 2408224, 2339386, 2319133, 9924603, 1432551). This set of 10 documents represents documents which had at least 100 clicks where “ipad” occurred in the user search query. The boost factor is a number derived from the aggregated signals by the Recommendations Boosting stage. If those documents contain the term “name_t:ipad”, then they will be boosted. If those documents don't contain the term, then they won't be returned by the Solr query.
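The shape of that parsed query can be mimicked in a few lines of Python. This is a hypothetical sketch, not the Recommendations Boosting stage's code: `boosted_query` assembles a string of the same form (a required `+` clause followed by optional `id:<id>^<weight>` clauses), and `toy_rank` is a deliberately simplified stand-in for Lucene scoring that only illustrates the consequence described above. Documents missing the required term are dropped, and matching documents are reordered by their boosts. The flat base score of 1.0 and the `doc-*` ids and titles are assumptions made up for the illustration.

```python
def boosted_query(field: str, term: str, boosts: dict[str, float]) -> str:
    """Build '+field:term id:A^wA id:B^wB ...' in the same shape as the parsed query."""
    clauses = [f"+{field}:{term}"]  # '+' makes the search clause required
    clauses += [f"id:{doc_id}^{weight}" for doc_id, weight in boosts.items()]  # optional boosts
    return " ".join(clauses)


def toy_rank(titles: dict[str, str], term: str, boosts: dict[str, float]) -> list[str]:
    """Toy model only (NOT Lucene scoring): drop documents whose title lacks the
    required term, then score the rest as 1.0 plus any boost and sort descending."""
    scores = {
        doc_id: 1.0 + boosts.get(doc_id, 0.0)
        for doc_id, title in titles.items()
        if term in title.lower().split()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(boosted_query("name_t", "ipad", {"1945531": 4.0904393, "2339322": 1.5108471}))

# Hypothetical catalog: "doc-c" is boosted but lacks the required term, so it is dropped.
toy_titles = {
    "doc-a": "Apple iPad 2 16GB",
    "doc-b": "Apple iPad 2 32GB",
    "doc-c": "Laptop Sleeve",
    "doc-d": "iPad Stand",
}
toy_boosts = {"doc-a": 4.0, "doc-b": 1.5, "doc-c": 0.2}
print(toy_rank(toy_titles, "ipad", toy_boosts))
```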
To summarize: adding in the Recommendations Boosting stage results in a Solr query where selective boosts will be applied to 10 documents, based on clickstream information from an undifferentiated set of previous searches. The improvement in the quality of the search results is dramatic.
Even Better Search
Adding more processing to the query pipeline allows for user-specific and search-specific refinements. Like the Recommendations Boosting stage, these more complex query pipelines leverage Solr's expressive query language, flexible scoring, and lightning-fast search and indexing. Fusion query pipelines plus aggregated signals give you the tools you need to rapidly improve the user search experience.
We are pleased to announce the publication of 10 new exhibitions created by DPLA Hubs and public librarian participants in our Public Library Partnerships Project (PLPP), funded by the Bill and Melinda Gates Foundation. Over the course of the last six months, curators from Digital Commonwealth, Digital Library of Georgia, Minnesota Digital Library, the Montana Memory Project, and Mountain West Digital Library researched and built these exhibitions to showcase content digitized through PLPP. Through this final phase of the project, public librarians had the opportunity to share their new content, learn exhibition curation skills, explore Omeka for future projects, and contribute to an open peer review process for exhibition drafts.
Congratulations to all of our curators and, in particular, our exhibition organizers: Greta Bahnemann, Jennifer Birnel, Hillary Brady, Anna Fahey-Flynn, Greer Martin, Mandy Mastrovita, Anna Neatrour, Carla Urban, Della Yeager, and Franky Abbott.
Thanks to the following reviewers who participated in our open peer review process: Dale Alger, Cody Allen, Greta Bahnemann, Alexandra Beswick, Jennifer Birnel, Hillary Brady, Wanda Brown, Anne Dalton, Carly Delsigne, Liz Dube, Ted Hathaway, Sarah Hawkins, Jenny Herring, Tammi Jalowiec, Stef Johnson, Greer Martin, Sheila McAlister, Lisa Mecklenberg-Jackson, Tina Monaco, Mary Moore, Anna Neatrour, Michele Poor, Amy Rudersdorf, Beth Safford, Angela Stanley, Kathy Turton, and Carla Urban.
As you may have read here, school libraries are well represented in S. 1177, the Every Child Achieves Act. In fact, we were more successful with this bill than we have been in recent history and this is largely due to your efforts in contacting Congress.
Currently, the House Committee on Education and the Workforce (H.R. 5, the Student Success Act) and the Senate Committee on Health, Education, Labor and Pensions are preparing to go to “conference” in an attempt to work out differences between the two versions of the legislation and reach agreement on reauthorization of ESEA. ALA is encouraged that provisions included under S. 1177 would support effective school library programs. In particular, ALA is pleased that effective school library program provisions were adopted unanimously during HELP Committee consideration of an amendment offered by Senator Whitehouse (D-RI) and on the Senate floor with an amendment offered by Senators Reed (D-RI) and Cochran (R-MS).
ALA is asking (with your help!) that any conference agreement to reauthorize ESEA maintain the following provisions that were overwhelmingly adopted by the HELP Committee and the full Senate under S. 1177, the Every Child Achieves Act:
- Title V, Part H – Literacy and Arts Education – Authorizes activities to promote literacy programs that support the development of literacy skills in low-income communities (similar to the Innovative Approaches to Literacy program that has been funded through appropriations) as well as activities to promote arts education for disadvantaged students.
- Title I – Improving Basic Programs Operated by State and Local Educational Agencies – Under Title I of ESEA, State Educational Agencies (SEAs) and local educational agencies (LEAs) must develop plans on how they will implement activities funded under the Act.
- Title V, Part G – Innovative Technology Expands Children’s Horizons (I-TECH) – Authorizes activities to ensure all students have access to personalized, rigorous learning experiences that are supported through technology and to ensure that educators have the knowledge and skills to use technology to personalize learning.
Now is the time to keep the momentum going! Contact your Senators and Representative to let them know that you support the effective school library provisions found in the Senate bill and they should too!
Do you want to be part of the magic of AccessYYZ? Well, aren’t you lucky? Turns out we’re looking for some convenors!
Convening isn’t much work (not that we think you’re a slacker or anything)–all you have to do is introduce the name of the session, read the bio of the speaker(s), and thank any sponsors. Oh, and facilitate any question and answer segments. Which doesn’t actually mean you’re on the hook to come up with questions (that’d be rather unpleasant of us) so much as you’ll repeat questions from the crowd into the microphone. Yup, that’s it. We’ll give you a script and everything!
In return, you’ll get eternal gratitude from the AccessYYZ Organizing Committee. And also a high five! If you’re into that sort of thing. Even if you’re not, you’ll get to enjoy the bright lights and the glory that comes with standing up in front of some of libraryland’s most talented humans for 60 seconds. Sound good? We thought so.
You can dibs a session by filling out the Doodle poll.
The world needs a good, sane in-browser editing component, one that edits document structure (headings, lists, quotes etc) rather than format (font, size etc). I’ve been thinking for a while that an editing component based around Markdown (or CommonMark) would be just the thing. Markdown/CommonMark is effectively a spec for the minimal sensible markup set for documents; it’s more than adequate for articles, theses, reports etc. And it can be extended with document semantics.
Anyway, there’s a crowdfunding campaign going on for an editor called ProseMirror that does just that, and promises collaborative editing as well. It’s beta quality but looks promising; I chipped in 50 Euros to try to get it over the line to be released as open source.
The author says:
Who I am
There’s a lot to like with this editor: it has a nice floating toolbar that pops up at the right of the paragraph, with a couple of not-quite-standard behaviours that just might catch on. It mostly works, but has some really obvious usability bugs, like when I try to make a nested list it produces CommonMark like this:
* List item
* List item
* * List item
And it even renders the two bullets side by side in the HTML view. Even though that is apparently supported by CommonMark, for a prose editor it’s just wrong. Nobody means two bullets unless they’re up to no good, typographically speaking.
The editor should do the thing you almost certainly mean. Something like:
* List item
* List item
  * List item
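To make “the thing you almost certainly mean” concrete: in CommonMark a sub-list is nested by indenting it to the content column of its parent item, which for a `* ` bullet is two spaces. The Python sketch below is a hypothetical serializer (not ProseMirror's code; `Item` and `to_commonmark` are illustrative names) that walks a list tree and emits exactly that indentation, so a nested item comes out indented rather than as `* *`.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One list item; children form a nested sub-list."""
    text: str
    children: list["Item"] = field(default_factory=list)


def to_commonmark(items: list[Item], depth: int = 0) -> str:
    """Serialize a list tree, indenting each level to its parent's content column."""
    lines = []
    for item in items:
        lines.append("  " * depth + "* " + item.text)  # "* " puts content at column 2
        if item.children:
            lines.append(to_commonmark(item.children, depth + 1))
    return "\n".join(lines)


doc = [Item("List item"), Item("List item", [Item("List item")])]
print(to_commonmark(doc))
```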
But if that stuff gets cleaned up, then this will be perfect for producing Scholarly Markdown and Scholarly HTML. The $84 AUD means I’ll get priority on reporting a bug, assuming it reaches its funding goal.
- Entity-oriented search – Searching not by keyword, but by entities (concepts in a certain domain)
- Knowledge graphs – Leveraging relationships amongst entities: Linked Data datasets (Freebase, DbPedia, Custom …)
- Search assistance – Autocomplete and Spellchecking are now common features, but using semantic data makes it possible to offer smarter features, driving the users to build queries in a natural way.
The post Apache Solr for Multi-language Content Discovery Through Entity Driven Search appeared first on Lucidworks.
Figure 1. “Les Miserables” Co-occurrence Matrix by Mike Bostock.
In Fusion we generate a similar type of matrix, where each of the items is one of the types specified when configuring the system. The value in each cell will then be the frequency of co-occurrence for any two given items e.g. a (query, document) pair, a (query, query) pair, a (user, query) pair etc.
For example, if the query “Les Mis” and a click on the web page for the musical appear together in the same user session, then they will be treated as having co-occurred. The frequency of co-occurrence is then the number of times this has happened in the raw event logs being processed.
3. Generating a Graph from the Matrix
The co-occurrence matrix from the previous step can also be treated as an “adjacency matrix”, which encodes whether two vertices (nodes) in a graph are “adjacent” to each other, i.e. have a link or “co-occur”. This matrix can then be used to generate a graph, as shown in Figure 2:
Figure 2. Generating a Graph from a Matrix.
Here the values in the matrix are the frequency of co-occurrence for those two vertices. We can see that in the graph representation these are stored as “weights” on the edge (link) between the nodes e.g. nodes V2 and V3 co-occurred 5 times together.
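The matrix-to-graph idea can be sketched in Python. This is an illustrative sketch under simplifying assumptions, not Fusion's implementation: each session is a list of item labels (queries, documents, users), two items co-occur when they appear in the same session, each pair is counted at most once per session, and edges are treated as undirected. The resulting dict-of-dicts is the weighted graph, where `graph[a][b]` is the co-occurrence count, i.e. the edge weight. The session data is made up.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(sessions: list[list[str]]) -> Counter:
    """Count how many sessions each unordered pair of items appears in together."""
    counts: Counter = Counter()
    for session in sessions:
        # set() removes repeats within a session; sorting gives each pair one canonical key
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def to_weighted_graph(counts: Counter) -> dict[str, dict[str, int]]:
    """Read the co-occurrence counts as a symmetric adjacency matrix of edge weights."""
    graph: dict[str, dict[str, int]] = defaultdict(dict)
    for (a, b), weight in counts.items():
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)


sessions = [  # hypothetical sessions
    ["query:les mis", "doc:musical"],
    ["query:les mis", "doc:musical", "doc:film"],
    ["query:les mis", "doc:film"],
]
graph = to_weighted_graph(cooccurrence_counts(sessions))
print(graph["query:les mis"])
```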
We encode the graph structure in a collection in Solr using a simple JSON record for each node. Each record contains fields that list the IDs of other nodes that point “in” at this record, or which this node points “out” to.
Fusion provides an abstraction layer which hides the details of constructing queries to Solr to navigate the graph. Because we know the IDs of the records we are interested in, we can generate a single boolean query where the individual IDs we are looking for are separated by OR operators, e.g. (id:3677 OR id:9762 OR id:1459). This means we only make a single request to Solr to get the details we need.
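Here is a hypothetical Python sketch of that storage and lookup pattern. The field names `in_ids` and `out_ids` are assumptions (the post only says each record lists the IDs pointing in and out), the sample IDs are illustrative, and the query string follows the OR pattern given above; Fusion's own abstraction layer is not shown.

```python
import json


def node_record(node_id: str, in_ids: list[str], out_ids: list[str]) -> str:
    """One JSON document per graph node, listing its incoming and outgoing neighbours."""
    # "in_ids" / "out_ids" are hypothetical field names chosen for this sketch
    return json.dumps({"id": node_id, "in_ids": in_ids, "out_ids": out_ids})


def ids_query(ids: list[str]) -> str:
    """Fetch many known records in one request: (id:A OR id:B OR ...)."""
    return "(" + " OR ".join(f"id:{i}" for i in ids) + ")"


record = json.loads(node_record("1459", in_ids=["3677"], out_ids=["9762"]))
# Neighbours of node 1459 in both directions, fetched with a single Solr query
print(ids_query(record["in_ids"] + record["out_ids"]))
```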
In addition, the fact that we are only interested in the neighborhood graph around a start point means the system does not have to store the entire graph (which is potentially very large) in memory.
4. Powering Recommendations from the Graph
At query/recommendation time we can use the graph to make suggestions on which other items in that graph are most related to the input item, using the following approach:
- Navigate the co-occurrence graph out from the seed item to harvest additional entities (documents, users, queries).
- Merge the list of entities harvested from different nodes in the graph so that the more lists an entity appears in the more weight it receives and the higher it rises in the final output list.
- Weights are based on the reciprocal rank of the overall rank of the entity. The overall rank is calculated as the sum of the rank of the result the entity came from and the rank of the entity within its own list.
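The merge rule in the last step can be sketched in Python as follows. Assumptions, since the post does not pin them down: ranks start at 1, the outer list is ordered by the rank of the result each inner list came from, and an entity's weight is the sum of 1 / (source rank + rank within its list) over every list it appears in. Summing is what lets entities that appear in several lists rise to the top. The function name and sample lists are hypothetical.

```python
from collections import defaultdict


def merge_by_reciprocal_rank(source_lists: list[list[str]]) -> list[str]:
    """Merge ranked entity lists; source_lists[0] came from the best-ranked result."""
    weights: dict[str, float] = defaultdict(float)
    for source_rank, entities in enumerate(source_lists, start=1):
        for entity_rank, entity in enumerate(entities, start=1):
            overall_rank = source_rank + entity_rank  # overall rank, as described above
            weights[entity] += 1.0 / overall_rank     # reciprocal rank, summed across lists
    # Entities found in several lists accumulate more weight and rise in the output
    return sorted(weights, key=lambda e: weights[e], reverse=True)


lists = [["a", "b"], ["b", "c"], ["b", "a"]]
print(merge_by_reciprocal_rank(lists))
```

In this example `b` ends up first even though it was never the top item of the best-ranked list, because it appears in all three lists.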
The following image shows the graph surrounding the document “Midnight Club: Los Angeles” from a sample data set:
Figure 3. An Example Neighborhood Graph.
Here the relative size of the nodes shows how frequently they occurred in the raw event data, and the size of the arrows is a visual indicator of the weight or frequency of co-occurrence between two elements.
For example, we can see that the query “midnight club” (blue node at the bottom right) most frequently resulted in a click on the “Midnight Club: Los Angeles Complete Edition Platinum Hits” product (as opposed to the original version above it). This is the type of information that would be useful to a business analyst trying to understand user behavior on a site.
Diversity in Recommendations
For a given item, we may only have a small number of items that co-occur with it (based on the co-occurrence matrix). By adding in the data from navigating the graph (which comes from the matrix), we increase the diversity of suggestions. Items that appear in multiple source lists then rise to the top. We believe this helps improve the quality of the recommendations and reduce bias. For example, in Figure 4 we show some sample recommendations for the query “Call of Duty”, where the recommendations come from a “popularity-based” recommender, i.e. one that gives a large weight to items with the most clicks. We can see that the suggestions are all from the “Call of Duty” video game franchise:
Figure 4. Recommendations from a “popularity-based” recommender system.
In contrast, in Figure 5 we show the recommendations from EventMiner for the same query:
Figure 5. Recommendations from navigating the graph.
Here we can see that the suggestions are now more diverse, with the first two being games from the same genre (“First Person Shooter” games) as the original query.
In the case of an e-commerce site, diversity in recommendations can be an important factor in suggesting items to a user that are related to their original query, but which they may not be aware of. This in turn can help increase the overall CTR (Click-Through Rate) and conversion rate on the site, which can have a direct positive impact on revenue and customer retention.
Evaluating Recommendation Quality
To evaluate the quality of the recommendations produced by this approach, we used CrowdFlower to get user judgements on the relevance of the suggestions produced by EventMiner. Figure 6 shows an example of how a sample recommendation was presented to a human judge:
Figure 6. Example relevance judgment screen (CrowdFlower).
Here the original user query (“resident evil”) is shown, along with an example recommendation (another video game called “Dead Island”). We can see that the judge is asked to select one of four options, which is used to give the item a numeric relevance score:
- Off Topic
Here we can see that the average relevance score across all judgements was 3.27, i.e. “good” to “excellent”.
Conclusion
If you want an “out-of-the-box” recommender system that generates high-quality recommendations from your data, please consider downloading and trying out Lucidworks Fusion.
We are delighted to announce that the University of Michigan has become the latest formal Hydra Partner. Maurice York, their Associate University Librarian for Library Information Technology, writes:
“The strength, vibrancy and richness of the Hydra community is compelling to us. We are motivated by partnership and collaboration with this community, more than simply use of the technology and tools. The interest in and commitment to the community is organization-wide; last fall we sent over twenty participants to Hydra Connect from across five technology and service divisions; our showing this year will be equally strong, our enthusiasm tempered only by the registration limits.”
Welcome Michigan! We look forward to a long collaboration with you.
Now, any library, organization or company that signs the pledge will have 6 months to implement HTTPS from the effective date of their signature. This should give everyone plenty of margin to do a good job on the implementation.
We pushed back our launch date to the first week of November. That's when we'll announce the list of "charter signatories". If you want your library, company or organization to be included in the charter signatory list, please send an e-mail to email@example.com.
The Let's Encrypt project will be launching soon. They are just one certificate authority that can help with HTTPS implementation.
I think this is a very important step for the library information community to take, together. Let's make it happen.
Here's the finalized pledge:
The Library Freedom Project is inviting the library community - libraries, vendors that serve libraries, and membership organizations - to sign the "Library Digital Privacy Pledge of 2015". For this first pledge, we're focusing on the use of HTTPS to deliver library services and the information resources offered by libraries. It’s just a first step: HTTPS is a privacy prerequisite, not a privacy solution. Building a culture of library digital privacy will not end with this 2015 pledge, but committing to this first modest step together will begin a process that won't turn back. We aim to gather momentum and raise awareness with this pledge; and will develop similar pledges in the future as appropriate to advance digital privacy practices for library patrons.
We focus on HTTPS as a first step because of its timeliness. The Let's Encrypt initiative of the Electronic Frontier Foundation will soon launch a new certificate infrastructure that will remove much of the cost and technical difficulty involved in the implementation of HTTPS, with general availability scheduled for September. Due to a heightened concern about digital surveillance, many prominent internet companies, such as Google, Twitter, and Facebook, have moved their services exclusively to HTTPS rather than relying on unencrypted HTTP connections. The White House has issued a directive that all government websites must move their services to HTTPS by the end of 2016. We believe that libraries must also make this change, lest they be viewed as technology and privacy laggards, and dishonor their proud history of protecting reader privacy.
The 3rd article of the American Library Association Code of Ethics sets a broad objective:
We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.
It's not always clear how to interpret this broad mandate, especially when everything is done on the internet. However, one principle of implementation should be clear and uncontroversial: Library services and resources should be delivered, whenever practical, over channels that are immune to eavesdropping.
The current best practice dictated by this principle is as follows: libraries, and vendors that serve libraries and library patrons, should require HTTPS for all services and resources delivered via the web.
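As a sketch of what “require HTTPS” can mean for a small in-house web service, the Python WSGI middleware below answers any plain-HTTP request with a permanent redirect to the HTTPS version of the same URL. It is one possible approach, not something the pledge prescribes, and `require_https` is a hypothetical name. It assumes the server already has a certificate (for example from Let's Encrypt) and serves HTTPS on the default port; if TLS is terminated at a proxy, `wsgi.url_scheme` may report `http` even for encrypted requests, so the check would need to change.

```python
from urllib.parse import quote


def require_https(app):
    """Wrap a WSGI app so that plain-HTTP requests get a 301 redirect to HTTPS."""
    def middleware(environ, start_response):
        if environ.get("wsgi.url_scheme") == "https":
            return app(environ, start_response)  # already encrypted: pass through
        host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "localhost")
        host = host.split(":")[0]  # drop any :80 port; assumes HTTPS on 443 (IPv6 literals not handled)
        path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")) or "/"
        query = environ.get("QUERY_STRING", "")
        location = f"https://{host}{path}" + (f"?{query}" if query else "")
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]
    return middleware
```

For a quick local trial, the wrapped app can be served with the standard library's `wsgiref.simple_server.make_server`.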
The Pledge for Libraries:
1. We will make every effort to ensure that web services and information resources under direct control of our library will use HTTPS within six months. [ dated______ ]
2. Starting in 2016, our library will assure that any new or renewed contracts for web services or information resources will require support for HTTPS by the end of 2016.
The Pledge for Service Providers (Publishers and Vendors):
1. We will make every effort to ensure that all web services that we (the signatories) offer to libraries will enable HTTPS within six months. [ dated______ ]
2. All web services that we (the signatories) offer to libraries will default to HTTPS by the end of 2016.
The Pledge for Membership Organizations:
1. We will make every effort to ensure that all web services that our organization directly controls will use HTTPS within six months. [ dated______ ]
2. We encourage our members to support and sign the appropriate version of the pledge.
There's a FAQ available, too. All this will soon be posted on the Library Freedom Project website.
This is the good stuff.
When employees negotiate, they negotiate for improved compensation, since nothing else is on the table.
“Rather than placing tech leaders on a pedestal, we should put their successes”
Synced lamps as part of a band’s performance
Jail time for a brown lawn? A wonderfully weird dive into the moral implications of lawncare
Swim through the air