Feed aggregator

DuraSpace News: VIVO Updates for June 26–VIVO 1.9 beta On the Way, Persistent Identifiers

planet code4lib - Tue, 2016-06-28 00:00

Conference early bird deadline extended to Tuesday, June 28.  The conference program is available online!  And we have a new keynote, Dario Taraborelli of Wikidata! Register today at http://vivoconference.org

Shelley Gullikson: UXLibs II: Conference Notes

planet code4lib - Mon, 2016-06-27 22:54

As always with my conference notes, this isn’t a faithful summing up, but rather a few of the points that stuck out most for me. I’ll follow this up with a more reflective piece.

I haven’t added in anything about my own presentation, but have uploaded the pdf version of it: “From user-testing to user research: Collaborating to improve library websites.” I’ve also uploaded the pdf version of my poster: “Cram it all in! Exploring delight in the research process. And Summon. Oh, and subject guides too” in case you’re interested.

Andy Priestner: Opening address

Andy told us a couple of stories about his recent experiences on trains in Hong Kong and Melbourne. Despite the language barrier, he found the Hong Kong trains much easier to use; in fact, they made the experience so enjoyable that he and his family sought out opportunities to take the train: “Hey, if we go to that restaurant across town instead of the one down the street we could take the train!” (this isn’t a direct quote)

My notes on this read:

How can we help students not feel like they’re in a foreign place in the library?

How can we help the library feel desirable?

But now that I think about it, that first point is totally unnecessary. Feeling like you’re in a foreign place isn’t the problem; it can actually be quite wonderful and exciting. Being made to feel unwelcome is the problem, regardless of whether the place is foreign or familiar. So I quite like the idea of trying to make the library feel desirable. I think my own library does this reasonably well with our physical space (we’re often full to bursting with students) but it’s a nice challenge for our virtual spaces.

Andy also talked about Ellen Isaacs’ idea of “the hidden obvious” when describing library staff reaction to his team’s user research findings. He also mentioned Dan North on uncertainty: “We would rather be wrong than be uncertain.” These two ideas returned at other times during the next two days.

Donna Lanclos: Keynote

Donna also told us stories. She told us stories about gardens and her mother’s advice that if you plant something new and it dies, you plant something else. With “Failed” as one of the conference streams, this key next step of “plant something else” is important to keep in mind. Failing and then learning from failure is great. But we must go on to try again. We must plant something else. Not just say “well, that didn’t work, let’s figure out what we learned and not do that again.” Plant something else.

Donna’s mother also said, though, that “sometimes the plant dies because of you.” So that maybe, sometimes, it’s not that you need to plant something else. You just need to plant the same thing and be more careful with it. Or maybe someone else should plant it or look after it.

Another point from this garden story was that there are always people in the library who take particular pains to keep lists of all the dead plants. People who say “we tried that before and it didn’t work.” Or who make it clear they think you shouldn’t try to plant anything at all. Or who cling too strongly to some of those dead plants; who never intend to plant again because of it. Don’t keep a list of the dead plants. Or maybe keep a list but not at the forefront of your mind.

Donna told us another story about her fieldwork in Northern Ireland. How she found it difficult to be gathering folklore when there were bigger issues; problems that needed fixing. Advice she got then and passed on to us was that just because you can’t fix problems with your ethnographic work doesn’t mean that you can’t do anything, that you aren’t doing anything. Gathering understanding – a new and different understanding – is valid and valuable work and it’s different work than solving problems.

She argued that ethnographic work is not about finding and solving problems but about meaning. Finding out what something means, or if you don’t know what it means, figuring out what you think it means. The work can help with small wins but is really about much more. This is a theme Donna and Andrew discussed further in the wrap-up panel on Friday.

Finally, I have this note that I can’t at all remember the context for, but boy do I like it anyway:

Not risk, but possibility

Jenny Morgan: UX – Small project/ high value?

Jenny’s was the first of the Nailed, Failed, Derailed sessions I attended and she was a wonderfully calm presenter – something I always admire since I often feel like a flailing goon. She spoke about a project she led, focusing on international students at her library at Leeds Beckett University. A couple of my take-aways:

  • They asked students how they felt about the library. I like this affective aspect and think it ties in with what Andy was talking about with making the library desirable.
  • Students don’t think of the whole building; despite the library making printers available in the same place on every floor, students didn’t realize there were printers on any floor other than the one they were on. As a consequence, students would stand in line to use printers on one floor instead of going to another floor where printers were available. Of course this makes sense, but library staff often think of the whole building and forget that our users only use, see, and know about a tiny portion.
  • The international students they spoke to found the library too noisy and were hesitant to ask the “home” students to be quiet. They didn’t like the silent study areas or the study carrels; they wanted quiet, but not silence.
  • International students are often on campus at times when “home” students are not (e.g. holidays, break times). They like going to the library for the community that they can’t find elsewhere, often because everywhere else is closed. This hit home for me because our campus really shuts down at the Christmas break, and even the library is closed. It made me wonder where our international students go for that feeling of community.
Carl Barrow: Getting back on the rails by spreading the load

One of the first things that struck me about Carl’s presentation was his job title – Student Engagement Manager – and that Web is included under his purview. I think I would love that job.

Carl was really open and honest in his presentation. He talked about being excited about what he learned at UXLibs and wanting to start doing user research with those methods, but feeling hesitant. And then he looked deeper into why he was feeling hesitant, and realized part of it was his own fear of failure. Hearing him be so honest about how his initial enthusiasm was almost sidetracked by fear was really refreshing. Conference presenters usually (and understandably) want to come off as polished and professional, and talking about feelings tends not to enter into it. But it makes so much sense at a UX conference – where we spend a fair bit of time talking about our users’ feelings – to talk about our own feelings as well. I really appreciated this about Carl’s talk. A few other points I noted down:

  • He trained staff on the ethnographic methods he wanted to use and then (this is the really good bit) he had them practice those methods on students who work in the library. This seemed to me to be a great way for staff to ease in: unfamiliar methods made less scary by using them with familiar people.
  • Something that made me think of Andy’s point about “the hidden obvious”: they realized through their user research that the silent reading room had services located in the space (e.g. printers, laptop loans) that made it rather useless for silent study. I personally love how user research can make us see these things, turning “the hidden obvious” to “the blindingly obvious.”
  • I just like this note of mine: “Found that signage was bad. (Signage is always bad.)”
  • They found that because people were not sure what they could do from the library’s information points (computer kiosk-type things), they simply stayed away from them. At my own library, trying to make our kiosks suck less is one of my next projects, so this was absolutely relevant to me.
Deirdre Costello: Sponsor presentation from EBSCO

Last year, Deirdre rocked her sponsor presentation and this year was no different. I was still a bit loopy from having done my own presentation and then gone right to my poster, so honestly, this was the only sponsor presentation I took notes on. My brain went on strike for a bit after this.

Deirdre talked about how to handle hard questions when you’re either presenting user research results, or trying to convince someone to let you do user research in the first place. One of those was “Are you sure about your sample?” and she said the hidden question behind this was “Are you credible?” It reminded me of a presentation I did where I (in part) read out a particularly insightful love letter from a user, and someone’s notes on that part of the presentation read “n=1”: surely meant to be a withering slam.

Other points I took away from Deirdre:

  • Sometimes you need to find ways for stakeholders to hear the message from someone who is not you (her analogy was that you can become like a teenager’s mom: once you’ve said something, they can’t stand to hear the same thing from you again).
  • One great way of doing the above is through videos with student voices. She said students like being on video and cracking jokes, and this can create a valuable and entertaining artifact to show your stakeholders.
  • Again related to all this, Deirdre talked about the importance of finding champions who can do things you can’t. She said that advocacy requires a mix of swagger and diplomacy, and if you’re too much on the swagger side then you need a champion who can do the diplomacy part for you.
Andrea Gasparini: A successful introduction of User Experience as a strategic tool for service and user centric organizations

Apologies to Andrea: I know I liked his session but the notes I took make almost no sense at all. I got a bit distracted when he was talking about his co-author being a product designer at his library at the University of Oslo. The day before I came to UXLibs II, I met with Jenn Phillips-Bacher who was one of my team-mates at the first UXLibs. Jenn does fabulously cool things at the Wellcome Library and is getting a new job title that includes either “product designer” or “product manager” and we had talked a bit about what that means and how it changes things for her and for the library. That discussion came back to me during Andrea’s session and took me away from the presentation at hand for a while.

The only semi-coherent note I do have is:

  • Openness to design methods implies testing and learning
Ingela Wahlgren: What happens when you let a non-user loose in the library?

Ingela described how a whole range of methods were used at Lund University library to get a bigger picture of their user experience. She then went into depth about a project that she and her colleague Åsa Forsberg undertook, trying to get the non-user’s perspective.

One UX method that was taught at last year’s UXLibs was “touchstone tours,” where a user takes the researcher on a tour of a space (physical or virtual). This lets the researcher experience the space from the user’s point of view and see the bits that are most useful or meaningful to them. Ingela and  Åsa wanted to have a non-user of the library take them on a touchstone tour. They might see useful and meaningful parts of the library, but more importantly would see what was confusing and awful for a new user. I thought this was a brilliant idea!

Most of the presentation, then, was Ingela taking the audience along for the touchstone tour she had with a non-user. With lots of pictures of what they had seen and experienced, Ingela clearly demonstrated how utterly frustrating the experience had been. And yet, after this long and frustrating experience, the student proclaimed that it had all gone well and she was very satisfied. ACK! What a stunningly clear reminder that what users say is not at all as important as what they do, and also how satisfaction surveys do not tell us the true story of our users’ experience.

Ingela won the “best paper” prize for this presentation at the gala dinner on Thursday night. Well-deserved!

Team Challenge

The team challenge this year focused on advocacy. There were three categories:

  • Marketing Up (advocating to senior management)
  • Collaboration (advocating to colleagues in other areas)
  • Recruitment (advocating to student groups)

Attendees were in groups of about 8 and there were 5 groups per category. We had less than 2 hours on Thursday and an additional 45 minutes on Friday to prepare our 7-minute pitches to our respective audiences. I was in team M1, so Marketing Up to senior management. I’m going to reflect on this in my Conference Thoughts post, but there are a few notes below from the other teams’ presentations.

Andy Priestner: Welcome to Day 2

Friday was a sombre day, with the results of the Brexit vote. Andy has written a lovely post about writing and delivering his Welcome to Day 2 speech. I will have my own reflections in my upcoming Conference Thoughts post. But suffice it to say, Andy’s speech was spot-on, clearly appreciated by the audience, and left me rather teary.

Lawrie Phipps: Keynote

I got a bit lost at some of the UK-specific vocabulary and content of Lawrie’s keynote, but he made some really rather wonderful points:

  • Don’t compromise the vision you have before you share it. He talked about how we often anticipate responses to our ideas before we have a chance to share them, and that this can lead to internally deciding on compromises. His point was that if you make those compromises before you’ve articulated your vision to others, you’re more likely to give ground than to stick to your guns. Don’t compromise before it’s actually necessary.
  • Incremental changes, when you make enough of them, can be transformative. You don’t have to make a huge change in order to make a difference. This was nice to hear because it’s absolutely how I approach things, particularly on the library website.
  • Use your external network of people to tell your internal stakeholders things because often external experts are more likely to be listened to or believed. (Deirdre Costello had said pretty much the same thing in her presentation. It can be hard on the ego, but is very often true.)
  • “Leadership is often stealthy.” Yes, I would say that if/when I show leadership, it is pretty much always stealthy.
  • Finally, Lawrie talked about the importance of documenting your failures. It’s not enough to fail and learn from your failures, you have to document them so that other people learn from them too, otherwise the failure is likely to be repeated again and again.
Team Challenge Presentations

I didn’t take as many notes as I should have during the team presentations. The other teams in my group certainly raised a lot of good points, but the only one I made special note of was from Team M5:

  • There are benefits to students seeing our UX work, even when they aren’t directly involved. It demonstrates that we care. Students are often impressed that the library is talking to students or observing student behaviour – that we are seeking to understand them. This can go a long way to generating goodwill and have students believe that we are genuinely trying to help them.

My team (M1) ended up winning the “Marketing Upwards” challenge, which was rather nice although I don’t think any of us were keen to repeat our pitch to the whole conference! We thought the fire alarm might get us out of it, but no luck. (Donna Lanclos – one of our judges – later said that including the student voice and being very specific about what we wanted were definitely contributing factors in our win. This feels very “real world” to me and was nice feedback to hear.)

There were a couple of points from the winning Collaboration team (C4) that I took note of:

  • Your networks are made up of people who are your friends, and people who may owe you favours. Don’t be afraid to make use of that.
  • Even if a collaborative project fails, the collaboration itself can still be a success. Don’t give up on a collaborative relationship just because the outcome wasn’t what you’d hoped.

Again, my brain checked out a bit during team R2’s winning Recruitment pitch. (I was ravenous and lunch was about to begin.) There was definitely uproarious laughter for Bethany Sherwood’s embodiment of the student voice.

Andrew Asher: Process Interviews

I chose the interviews workshop with Andrew Asher because when I was transcribing interviews I did this year, I was cringing from time to time and knew I needed to beef up my interview skills. I was also keen to get some help with coding because huge chunks of those interviews are still sitting there, waiting to be analyzed. Some good bits:

  • You will generally spend 3-4 hours analyzing each 1-hour interview
  • Different kinds of interviews: descriptive (“tell me about”), demonstration (“show me”), and elicitation (using prompts such as cognitive maps, photos)
  • Nice to start with a throwaway question to act as an icebreaker. (I know this and still usually forget to include it. Maybe now it will stick.)

We practiced doing interviews and reflected on that experience. I was an interviewee and felt bad that I’d chosen a situation that didn’t match the questions very well. It was interesting to feel like a participant who wanted to please the interviewer, and to reflect on what the interviewer could have said to lessen the feeling that I wasn’t being a good interviewee. (I really don’t know the answer to that one.)

We looked at an example of a coded interview and practiced coding ourselves. There wasn’t a lot of time for this part of the workshop, but it’s nice to have the example in-hand, and also to know that there is really no big trick to it. Like so much, it really just takes doing it and refining your own approach.

Andy Priestner: Cultural Probes

I had never heard of cultural probes before this, and Andy started with a description and history of their use. Essentially, cultural probes are kits of things like maps, postcards, cameras, and diaries that are given to groups of people to use to document their thoughts, feelings, behaviour, etc.

Andy used cultural probes earlier this year in Cambridge to explore the lives of postdocs. His team’s kit included things like a diary pre-loaded with handwritten questions for the participants to answer, task envelopes that they would open and complete at specific times, pieces of foam to write key words on, and other bits and pieces. They found that the participants were really engaged with the project and gave very full answers. (Perhaps too full; they’re a bit overwhelmed with the amount of data the project has given them.)

After this, we were asked to create a cultural probe within our table groups. Again, there wasn’t a lot of time for the exercise but all the groups managed to come up with something really interesting.

I loved this. In part it was just fun to create (postcards, stickers, foam!) but it was also interesting to try to think about what would make it fun for participants to participate.  When I was doing cognitive maps and love letters/break-up letters with students last summer, one of them was really excited by how much fun it had been – so much better than filling out a survey. It’s easier to convince someone to participate in user research if they’re having a good time while doing it.

Panel Discussion (Ange, Andrew, Lawrie, Donna, Matthew)

The next-to-last thing on the agenda was a panel discussion. We’d been asked to write down any questions we had for the panelists ahead of time and Ned Potter chose a few from the pile. A few notes:

  • In response to a question about how to stop collecting data (which is fun) and start analyzing it (which is hard), Matthew Reidsma recommended the book Just Enough Research by Erika Hall. Other suggestions were: finding an external deadline by committing to a conference presentation or writing an article or report, working with a colleague who will keep you to a deadline, or having a project that relies on analyzing data before the project can move forward
  • Responding to a question about any fears about the direction UX in libraries is taking, Donna spoke about the need to keep thinking long-term; not to simply use UX research for quick wins and problem-solving, but to really try to create some solid and in-depth understanding. I think it was Donna again who said that we can’t just keep striking out on our own with small projects; we must bring our champions along with us so that we can develop larger visions. Andrew and Donna are working on an article on this very theme for an upcoming issue of Weave.
  • I don’t remember what question prompted this, but Ange Fitzpatrick talked about how she and a colleague were able to get more expansive responses from students when they didn’t identify themselves as librarians. However, as team M5 had already mentioned and I believe it was Donna who reiterated at this point: students like to know that the library wants to know about them and cares about knowing them.
  • Finally, to a question about how to choose the most useful method for a given project, there were two really good responses. Andrew said to figure out what information you need and what you need to do with that information, and then pick a method that will help you with those two things. He recommended the ERIAL toolkit (well, Donna recommended it really, but Andrew wrote the toolkit, so I’ll credit him). And Matthew responded that you don’t have to choose the most useful method, you just have to choose a useful method.
Andy Priestner: Conference Review

Andy ended the day with a nice wrap-up and call-out to the positive collaborations that had happened and would continue to happen in the UXLibs community. He also got much applause ending his review with “I am a European.”

Like last year, I left exhausted and exhilarated, anxious to put some of these new ideas into practice, and hoping to attend another UXLibs conference. Next year?

William Denton: Collaboration is not causation

planet code4lib - Mon, 2016-06-27 22:03

Good to remember when you embark on a project with someone, both of you full of good intentions that it will be completed soon but a bit vague on who will do what work, and how: “Collaboration is not causation.”

Jeremy Frumkin: Libraries and the state of the Internet

planet code4lib - Mon, 2016-06-27 19:04

Mary Meeker presented her 2016 Internet Trends report earlier this month. If you want a better understanding of how tech and the tech industry are evolving, you should watch her talk and read her slides.

This year’s talk was fairly time-constrained, and she did not go into as much detail as she has in years past. That being said, there is still an enormous amount of value in the data she presents and the trends she identifies via that data.

Some interesting takeaways:

  • The growth in total number of internet users worldwide is slowing (the year-to-year growth rate is flat; overall growth is around 7% new users per year)
  • However, growth in India is still accelerating, and India is now the #2 global user market (behind China; USA is 3rd)
  • Similarly, there is a slowdown in the growth of the number of smartphone users and number of smartphones being shipped worldwide (still growing, but at a slower rate)
  • Android continues to demonstrate growth in marketshare; Android devices continue to cost significantly less than Apple devices.
  • Overall, there are opportunities for businesses that innovate / increase efficiency / lower prices / create jobs
  • Advertising continues to demonstrate strong growth; advertising efficacy still has a ways to go (internet advertising is effective and can be even more so)
  • Internet as distribution channel continues to grow in use and importance
  • Brand recognition is increasingly important
  • Visual communication channel usage is increasing – Generation Z relies more on communicating with images than with text
  • Messaging is becoming a core communication channel for business interactions in addition to social interactions
  • Voice on mobile rapidly rising as important user interface – lots of activity around this
  • Data as platform – important!

So, what kind of take-aways might be most useful to consider in the library context? Some top-of-head thoughts:

  • In the larger context of the Internet, libraries need to be more aggressive in marketing their brand and brand value. We are, by nature, fairly passive, especially compared to our commercial competition, and a failure to better leverage the opportunity for brand exposure leaves the door open to commercial competitors.
  • Integration of library services and content through messaging channels will become more important, especially with younger users. (Integration may actually be too weak a term; understanding how to use messaging inherently within the digital lifestyles of our users is critical)
  • Voice – are any libraries doing anything with voice? Integration with Amazon’s Alexa voice search? How do we fit into the voice as platform paradigm?

One parting thought that I’ll try to tease out in a follow-up post: Libraries need to look very seriously at the importance of personalized, customized curation of collections for users, something that might actually be antithetical to the way we currently approach collection development. Think Apple Music, but for books, articles, and other content provided by libraries. It feels like we are doing this in slices and pieces, but that we have not yet established a unifying platform that integrates with the larger Internet ecosystem.

Max Planck Digital Library: HTTPS enabled for MPG/SFX

planet code4lib - Mon, 2016-06-27 16:54

The MPG/SFX link resolver is now also accessible via the https protocol. The secure base URL of the production MPG/SFX instance is: https://sfx.mpg.de/sfx_local.

HTTPS support enables secure third-party sites to load or embed content from MPG/SFX without causing mixed-content errors. Please feel free to update your applications or your links to the MPG/SFX server.
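
As a quick illustration (not part of the original announcement), here is a minimal sketch of the kind of request an application might now make against the https base URL; the OpenURL fields are example values added for illustration, not parameters taken from MPG/SFX documentation:

// Minimal sketch: resolving an illustrative OpenURL against the new https base URL.
// The rft.* values below are example metadata, not anything specified by MPG/SFX.
import scala.io.Source

val base  = "https://sfx.mpg.de/sfx_local"
val query = "url_ver=Z39.88-2004&rft.issn=0028-0836&rft.volume=533&rft.spage=452"

// Because the response is served over TLS, a page loaded via https can link to or
// embed the result without triggering mixed-content warnings.
val menu = Source.fromURL(s"$base?$query").mkString
println(menu.take(200))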

SearchHub: Search Hub 2.0 Public Beta

planet code4lib - Mon, 2016-06-27 13:52
Introduction

For quite some time now, Lucidworks has been hosting a community site named Search Hub (aka LucidFind) that consists of a searchable archive of a number of Apache Software Foundation mailing lists, source code repositories and wiki pages, as well as related content that we’ve deemed beneficial. Previously, we’ve had three goals in building and maintaining such a site:

  1. Provide the community a focused resource for finding answers to questions on our favorite projects like Apache Solr and Lucene
  2. Dogfood our product
  3. Associate the Lucidworks brand with the projects we support

As we’ve grown and evolved, the site has done a good job on #1 and #3. However, we have fallen a bit behind on goal number two, since the site, 22 months after the launch of Lucidworks Fusion, was still running on our legacy product, Lucidworks Search. While it’s easy to fall back on the “if it ain’t broke, don’t fix it” mentality (the site has had almost no downtime all these years, even while running on very basic hardware with a very basic setup and serving decent, albeit not huge, query volume), it has always bothered me that we haven’t put more effort into porting Search Hub to run on Fusion. This post intends to remedy that situation, while also significantly expanding our set of goals and the number of projects we cover. Those goals, in addition to the original ones above, are:

  1. Show others how it’s done by open sourcing the code base under an Apache license. (Note: you will need Lucidworks Fusion to run it.)
  2. Fully instrument the User Interface with the Snowplow Javascript Tracker to capture user interaction data.
  3. Leverage Fusion’s built in Apache Spark capabilities for offline, background enhancement of the index to improve relevance and our analytics.
  4. Deploy machine learning experiments.
  5. Build on Lucidworks View.

While we aren’t done yet, we are far enough along that I am happy to announce we are making Search Hub 2.0 available as a public beta. If you want to cut to the chase and try it out, follow the links I just provided; if you want all the gory details on how it all works, keep reading.

Rebooting Search Hub

When Jake Mannix joined Lucidworks back in January, we knew we wanted to significantly expand the machine learning and recommendation story here at Lucidworks, but we kept coming back to the fundamental problem that plagues all such approaches: where to get real data and real user feedback. Sure, we work with customers all the time on these types of problems, but that only goes so far in enabling our team to control its own destiny. After all, we can’t run experiments on the customer’s website (at least not in any reasonable time frame for our goals), nor can we always get the data that we want, due to compliance and security reasons. As we looked around, we kept coming back to, and finally settled on, rebooting Search Hub to run on Fusion, but this time striving for the goals outlined above.

We have also been working with the academic IR research community on ways to share our user data, while hoping to avoid another AOL query log fiasco. It is too early to announce anything on that front just yet, but I am quite excited about what we have in store and hope we can do our part at Lucidworks to help close the “data gap” in academic research by providing the community with a significantly large corpus with real user interaction data. If you are an academic researcher interested in helping out and are up on differential privacy and other data sharing techniques, please contact me via the Lucidworks Contact Us form and mention this blog post and my name. Otherwise, stay tuned.

In the remainder of this post, I’ll cover what’s in Search Hub, highlight how it leverages key Fusion features and finish up with where we are headed next.

Basics

The Search Hub beta currently consists of:

  • 26 ASF projects (e.g. Lucene, Solr, Hadoop, Mahout) and all public Lucidworks content, including our website, knowledge base and documentation, with more content added automatically via scheduled crawls.
  • 90+ datasources (soon to be 120+) spanning email, Github, Websites and Wikis, each with a corresponding schedule defining its update rate.
  • Nine index pipelines and two query pipelines for processing incoming content and requests.
  • Five different signal capture mechanisms in the UI, including: Page View, Page Ping (heartbeat), Searches, Document clicks, Typeahead search clicks. See below for the gory details on signals.

The application consists of:

If you wish to run Search Hub, see the README, as I am not going to cover that in this blog post.

Next Generation Relevance

While other search engines are touting their recent adoption of search ranking functions (BM25) that have been around for 20+ years, Fusion is focused on bringing next generation relevance to the forefront. Don’t get me wrong, BM25 is a good core ranking algorithm and it should be the default in Lucene, but if that’s your answer to better relevance in the age of Google, Amazon and Facebook, then good luck to you. (As an aside, I once sat next to Illya Segalovich from Yandex at a SIGIR workshop where he claimed that at Yandex, BM25 only got relevance about 52% of the way to the answer. Others in the room disputed this, saying their experience was more like 60-70%. In either case, it’s got a ways to go.)

If BM25 (and other core similarity approaches) only get you 70% (at best) of the way, where does the rest come from? We like to define Next Generation Relevance as being founded on three key ideas (which Internet search vendors have been deploying for many years now), which I like to call the “3 C’s”:

  1. Content — This is where BM25 comes in, as well as things like how you index your content, what fields you search, editorial rules, language analysis and update frequency. In other words, the stuff Lucene and Solr have been doing for a long time now. If you were building a house, this would be the basement and first floor.
  2. Collaboration — What signals can you capture about how users and other systems interact with your content? Clicks are the beginning, not the end of this story. Extending the house analogy, this is the second floor.
  3. Context — Who are you? Where are you? What are you doing right now? What have you done in the past? What roles do you have in the system? A user in Minnesota searching for “shovel” in December is almost always looking for something different than a user in California in July with the same query. Again, with the house analogy: this is the attic and roof.

In Search Hub, we’ve done a lot of work on the content already, but it’s the latter two we are most keen to showcase in the coming months, as they highlight how we can create a virtuous cycle between our users and our data by leveraging user feedback and machine learning to learn relevance. To achieve that goal, we’ve hooked a number of signal capture mechanisms into our UI, all of which you can see in the code. (See snowplow.js, SnowplowService.js and their usages in places like here.)
These captured signals include:

  1. Page visits.
  2. Time on page (approximated by the page ping heartbeat in Snowplow).
  3. Queries executed, including the capture of all documents and facets displayed.
  4. What documents were clicked on, including unique query id, doc id, position in the SERP, facets chosen, and score.
  5. Typeahead click information, including what characters were typed, the suggestions offered and which suggestion was chosen.

With each of these signals, Snowplow sends a myriad of information, including things like User IDs, Session IDs, browser details and timing data. All of these signals are captured in Fusion. Over the coming weeks and months, as we gather enough signal data, we will be rolling out a number of new features highlighting how to use this data for better relevance, as well as other capabilities like recommendations.

Getting Started with Spark on Search Hub

The core of Fusion consists of two key open source technologies: Apache Solr and Apache Spark. If you know Lucidworks, then you already likely know Solr. Spark, however, is something that we added to our stack in Fusion 2.0, and it opens up a host of possibilities that previously required our customers to work outside of Fusion, in what was almost always a significantly more complex application. At its core, Spark is a scalable, distributed compute engine. It ships with machine learning and graph analytics libraries out of the box. We’ve been using Spark for a number of releases now to do background, large scale processing of things like logs and system metrics. As of Fusion 2.3, we have been exposing Spark (and Spark-shell) to our users. This means that Fusion users can now write and submit their own Spark jobs as well as explore our Spark Solr integration on the command line simply by typing $FUSION_HOME/bin/spark-shell. This includes the ability to take advantage of all Lucene analyzers in Spark, which Steve Rowe covered in this blog post.

To highlight these fresh new capabilities, we’ve put together examples of doing tokenization, clustering using the venerable k-Means algorithm, word2vec and Random Forest-based classification (or as we like to call it, 20 newsgroups on steroids).

All of these demos are showcased in the SparkShellHelpers.scala file. As the name implies, this file contains commands that can be cut and pasted into the Fusion spark shell (bin/spark-shell). I’m going to save the details of running this for a future post, as there are some very interesting data engineering discussions that fall out of working with this data set in this manner.
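
To give a flavour of what those spark-shell sessions look like, below is a minimal, self-contained sketch of k-Means clustering over raw text with Spark MLlib. It is not taken from SparkShellHelpers.scala: the input path, feature size and cluster count are placeholder assumptions, and it uses naive tokenization where the real demos can lean on the Lucene analyzers and the Spark/Solr integration mentioned above.

// Minimal sketch of a Fusion spark-shell session (sc is provided by the shell).
// Input path, numFeatures and k are illustrative placeholders.
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.feature.HashingTF

val docs    = sc.textFile("/tmp/sample_docs.txt")           // one document per line
val tokens  = docs.map(_.toLowerCase.split("\\W+").toSeq)   // naive tokenization
val tf      = new HashingTF(10000)                          // hashed term-frequency vectors
val vectors = tokens.map(t => tf.transform(t)).cache()

val model = KMeans.train(vectors, 20, 10)                   // k = 20 clusters, 10 iterations
vectors.take(5).foreach(v => println(model.predict(v)))     // cluster ids for a few documents

The word2vec and Random Forest demos follow the same spark-shell workflow, swapping in the appropriate MLlib components.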

Contributing

Our long-term intent as we move out of beta is to support all Apache projects. Currently, the project specifications are located in the project_config folder. If you would like your project supported, please issue a Pull Request and we will take a look and try to schedule it. If you would like to see some other feature supported, we are open to suggestions. Please open an issue or a pull request and we will consider it.

If your project is already supported and you would like to add a search box similar to the one on Lucene’s home page, have it submit to http://searchhub.lucidworks.com/?p:PROJECT_NAME, passing in your project name (not label) for PROJECT_NAME, as specified in the project_config. For example, for Hadoop, it would be http://searchhub.lucidworks.com/?p:hadoop.

Next Steps

In the coming months, we will be rolling out:

  1. Word2Vec for query and index time synonym expansion. See the Github Issue for the details.
  2. Classification of content to indicate what mailing list we think the message belongs to, as opposed to what mailing list it was actually sent to. Think of it as a “Did you mean to send this to this list?” classifier.
  3. User registration and personalized recommendations, with alerting. For a preview, check out our webinar on June 30th.
  4. Content and collaborative filtering recommendation.
  5. Community analytics, powered by Spark. Find out who in the community you should be listening to for answers!
  6. User Interface improvements.

If you would like to participate in any of these, we welcome Pull Requests on the Github project, otherwise please reach out to us.

Resources


Islandora: Islandora CLAW is moving to Drupal 8

planet code4lib - Mon, 2016-06-27 13:35

The initial phases of Islandora CLAW development worked with Drupal 7 as a front-end, but Islandora CLAW has been architected with a pivot to Drupal 8 in mind from its very inception. Drupal 8 has been officially released and development has begun on Drupal 9. Drupal policy will see Drupal 7 become unsupported when Drupal 9 is released, putting it in the same end-of-life territory as Fedora 3. As of this month, Islandora CLAW development has pivoted fully to Drupal 8, ensuring that when the Islandora Community is ready to make the move, there will be a version of Islandora that functions with the latest and best-supported versions of both our front-end and repository layers by pairing Drupal 8 with Fedora 4. This pivot was approved by the Islandora Roadmap Committee, based on a Drupal 8 Prospectus put forth by the CLAW development team.

DuraSpace News: Comparing DuraSpace Repository and Cloud Services Options

planet code4lib - Mon, 2016-06-27 00:00

Austin, TX  Are you in the process of researching different types of repository platforms? Or have you lately been trying to understand hosted cloud service options? DuraSpace just made it easier to compare apples to apples and oranges to oranges when it comes to sorting out which DuraSpace-supported repository or cloud service is right for you with two comparison tables. These tables are designed to help you match your use case to a repository or hosted cloud service that meets your needs.

Open Library: Towards better EPUBs at Open Library and the Internet Archive

planet code4lib - Thu, 2016-06-23 21:28

You may have read about our recent downtime. We thought it might be a good opportunity to let you know about some of the other behind-the-scenes things going on here. We continue to answer email, keep the FAQ updated and improve our metadata. Many of you have written about the quality of some of our EPUBs. As you may know, all of our OCR (optical character recognition) is done automatically without manual corrections and while it’s pretty good, it could be better. Specifically, we had a pernicious bug where some books’ formatting caused the first page of chapters to be left out of the OCRed EPUB. I personally had this happen to me with a series of books I was reading on Open Library and I know it’s beyond frustrating.

To address this and other scanning quality issues, we’re changing the way EPUBs work. We’ve improved our OCR algorithm and we’re shifting from stored EPUB files to on-the-fly generation. This means that further developments and improvements in our OCR capabilities will be available immediately. This is good news and has the side benefit of radically decreasing our EPUB storage needs. It also means that we have to

  • remove all of our old EPUBs (approximately eight million items for EPUBs generated by the Archive)
  • put the new on-the-fly EPUB generation in place (now active)
  • do some testing to make sure it’s working as expected (in process)

We hope that this addresses some of the EPUB errors people have been finding. Please continue to give us feedback on how this is working for you. Coming soon: improvements to Open Library’s search features!

Jonathan Rochkind: How to see if current version of a gem is greater than X

planet code4lib - Thu, 2016-06-23 19:56

I sometimes need to do this, and always forget how. I want to see the currently loaded version of a gem, and check whether it’s greater than a certain version X.

Mainly because I’ve monkey-patched that gem, and want to either automatically stop monkey patching it if a future version is installed, or more likely output a warning message “Hey, you probably don’t need to monkey patch this anymore.”

I usually forget the right rubygems API, so I’m leaving this partially as a note to myself.

Here’s how you do it.

# If some_gem_name is at 2.0 or higher, warn that this patch may
# not be needed. Here's a URL to the PR we're back-porting: <URL>
if Gem.loaded_specs["some_gem_name"].version >= Gem::Version.new('2.0')
  msg = " Please check and make sure this patch is still needed\
    at #{__FILE__}:#{__LINE__}\n\n"
  $stderr.puts msg
  Rails.logger.warn msg
end

Whenever I do this, I always include the URL to the github PR that implements the fix we’re monkey-patch back-porting, in a comment right by here.

The `$stderr.puts` is there to make sure the warning shows up in the console when running tests.

Unfortunately:

Gem::Version.new("1.4.0.rc1") >= Gem::Version.new("1.4") # => false

I really want the warning to trigger if I’m using a pre-release too. Hmm.

Aha! Perusing the docs, this seems like it’ll work:

if Gem.loaded_specs["some_gem_name"].version.release >= Gem::Version.new('2.0')

`Gem::Version#release` trims off the prerelease tags.



DPLA: Historypin wins Knight News Challenge award for “Our Story” project in partnership with DPLA

planet code4lib - Thu, 2016-06-23 13:00
Historypin wins Knight News Challenge award to gather, preserve, and measure the impact of public library-led history, storytelling, and local cultural heritage in rural US communities in partnership with Digital Public Library of America

BOSTON & SAN FRANCISCO —Historypin announced today that they have been awarded $222,000 from the John S. and James L. Knight Foundation as part of its Knight News Challenge on Libraries, an open call for ideas to help libraries serve 21st century information needs. Selected from more than 615 submissions, Historypin’s “Our Story” project, a partnership with the Digital Public Library of America (DPLA), will collaborate with more than a dozen rural libraries in New Mexico, North Carolina and Louisiana to host lively events to gather and preserve community memory, and to measure the impact of these events on local communities.

“Local historical collections are some of the most viewed content in DPLA, and express the deep interest in our shared community history,” according to Emily Gore, Director for Content at DPLA. “Making cultural heritage collections from rural communities accessible to the world is extremely important to us, and this project will help us further share this rich history and the diverse stories to be found.”

“This award gives us the ability to work with small libraries to provide a toolkit–a physical box with posters, materials and guidance–to make it easy for librarians and volunteers to engage their community in memory sharing events,” said Jon Voss, Strategic Partnerships Director for Historypin. “We know through research that getting people across generations and cultures to sit together and share experiences strengthens communities, and this project will help local libraries to better measure their social impact.”

Led by national partners Historypin and DPLA, together with state and local library networks, Our Story aims to expand the national network and projects of thousands of cultural heritage collaborations that both DPLA and Historypin have established and increase the capabilities of small, rural libraries. Participating libraries in Our Story will be supplied with kits and training to guide them through a number of steps, including recruiting staff and volunteers for the project, planning for digitization and preservation, running community events and collecting stories, and measuring engagement and impact, among other important steps. The library kits and training will be based on four key areas — training, programming, preservation, and evaluation — and will pull in methodology and curriculum developed by both DPLA and Historypin in their work with cultural heritage partners throughout the US and around the world.

“The project will help promote civic engagement, while providing libraries with meaningful data, so they can better understand their impact on communities and meet new information needs,” said John Bracken, Knight Foundation vice president for media innovation.

The Knight News Challenge, an open call for ideas launched in late February 2014, asked applicants to answer the question, “How might libraries serve 21st century information needs?” Our Story aims to advance the library field in three key areas: measuring the social impact of public libraries, strengthening a national network of digital preservation and content discovery, and demonstrating the potential of open library data. The outputs of the project will be published and openly licensed for reuse in other rural libraries worldwide.

To learn more about DPLA and Historypin’s Our Story project, visit our News Challenge application page: https://www.newschallenge.org/challenge/how-might-libraries-serve-21st-century-information-needs/refinement/our-story-content-collections-and-impact-in-rural-america


About DPLA

The Digital Public Library of America (https://dp.la) strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science. Since launching in April 2013, it has aggregated over 13 million items from 2,000 institutions. The DPLA is a registered 501(c)(3) non-profit.

About Historypin

Historypin.org is a global non-profit project that builds community through local history. Over 3,000 cultural heritage organizations and 75,000 individuals have used the site to discover, share and enrich community memory since 2010.


About Knight Foundation

Knight Foundation supports transformational ideas that promote quality journalism, advance media innovation, engage communities and foster the arts. The foundation believes that democracy thrives when people and communities are informed and engaged. For more, visit KnightFoundation.org.

DuraSpace News: Are You Interested in Using Fedora, Hydra or Islandora?

planet code4lib - Thu, 2016-06-23 00:00

Oxford, England  Neil Jeffries, Oxford University, and Tom Cramer, Stanford University, from the Fedora team will hold an informal gathering joined by other Fedora community members, at The King's Arms Pub (just next to Wadham College and the Weston Library in Oxford, England) on July 5 from 5-7 PM, prior to the Jisc and CNI Conference welcome reception.

DuraSpace News: AVAILABLE: New Edition of the Digital Preservation Handbook

planet code4lib - Thu, 2016-06-23 00:00

From Neil Beagrie, Charles Beagrie Limited, Digital Preservation Coalition

Glasgow, Scotland  A new edition of the Digital Preservation Handbook was officially launched at the Guildhall in York yesterday, comprehensively updating the original version first published in 2001: http://handbook.dpconline.org/

OCLC Dev Network: Upcoming Changes to WMS Acquisitions API

planet code4lib - Wed, 2016-06-22 21:00

The WMS Acquisitions API is undergoing backwards incompatible changes in the upcoming July install tentatively scheduled for 7/24/2016.

Brown University Library Digital Technologies Projects: ORCID: Unique IDs for Brown Researchers

planet code4lib - Wed, 2016-06-22 17:42

The Library is coordinating an effort to introduce ORCID identifiers to the campus. ORCID (orcid.org) is an open, non-profit initiative founded by academic institutions, professional bodies, funding agencies, and publishers to resolve authorship confusion in scholarly work. The ORCID repository of unique scholar identification numbers aims to reliably identify and link scholars in all disciplines with their work, analogous to the way ISBN and DOI identify books and articles.

Brown is an institutional member of ORCID, which allows the University to create ORCID records on behalf of faculty and to integrate ORCID identifiers into the Brown Identity Management System, Researchers@Brown profiles, grant application processes, and other systems that facilitate identification of faculty and their works.

Please go to https://library.brown.edu/orcid to obtain an ORCID identifier OR, if you already have an ORCID, to link it to your Brown identity.

Please contact researchers@brown.edu if you have questions or feedback.

Equinox Software: See You In Orlando!

planet code4lib - Wed, 2016-06-22 16:30

We’re packing up and preparing to head to Orlando for ALA Annual this week!  Equinox will be in Booth 1175. Throughout the conference, you’ll find Mike, Grace, Mary, Galen, Shae, and Dale in the booth ready to answer your questions.  We’d love for you to come visit and do a little crafting with us.  Crafting?  Yes–CRAFTING.  We’ll have some supplies ready for you to make a little DIY swag.  Quantities are limited, so make sure to see us early.

As usual, the Equinox team will be available in the booth to discuss Evergreen, Koha, and FulfILLment.  We’ll also be attending a few programs and, of course, the Evergreen Meet-Up.  Directly following the Evergreen Meet-Up, Equinox is hosting a Happy Hour for the Evergreen aficionados in attendance.  Come chat with us at the Equinox booth to get more information!

The Equinox team is so proud of the proactive approach ALA has taken in response to the recent senseless tragedy in Orlando.  We will be participating in some of the relief events.  We will be attending the Pulse Victim’s Memorial on Saturday to pay our respects and you’ll also find some of the team donating blood throughout the weekend.

We’re looking forward to the conference but most of all, we’re looking forward to seeing YOU.  Stop by and say hello at Booth 1175!

Library of Congress: The Signal: Library of Congress Advisory Team Kicks off New Digitization Effort at Eckerd College

planet code4lib - Wed, 2016-06-22 15:56

Participants in the Eckerd Digitization Advisory meeting include (l-r) Nancy Schuler, Lisa Johnston, Alexis Ramsey-Tobienne, Alyssa Koclanes, Mary Molinaro (Digital Preservation Network), George Coulbourne (Library of Congress), David Gliem, Arthur Skinner, Justine Sanford, Emily Ayers-Rideout, Nicole Finzer (Northwestern University), Kristen Regina (Philadelphia Museum of Art), Anna Ruth, and Brittney Sherley.

This is a guest post by Eckerd College faculty David Gliem, associate professor of Art History, and Nancy Schuler, librarian and assistant professor of Electronic Resources, Collection Development and Instructional Services.

On June 3rd, a meeting at Eckerd College in St. Petersburg, Florida, brought key experts and College departments together to begin plans for the digitization of the College’s art collection. George Coulbourne of the Library of Congress assembled a team of advisers that included DPOE trainers and NDSR program providers from the Library of Congress, Northwestern University, the Digital Preservation Network, the Philadelphia Museum of Art and Yale University.

Advisers provided guidance on project elements including institutional repositories, collection design, metadata and cataloging standards, funding and partnership opportunities and digitization strategies. Suggestions will be used to design a digitization and preservation strategy that could be used as a model for small academic institutions.

Eckerd College is an innovative undergraduate liberal arts institution known for its small classes and values-oriented curriculum that stresses personal and social responsibility, cross-cultural understanding and respect for diversity in a global society. A charter member of Loren Pope’s 40 Colleges That Change Lives, Eckerd has a unique approach to mentoring that reflects its commitment to students. As a tuition-dependent institution of 1,770 students, Eckerd is seeking ways to design the project to be cost-effective, while also ensuring long-term sustainability.

The initial goal of the project is to digitize the College’s large collection of more than 3000 prints, paintings, drawings and sculptures made by the founding faculty in the visual arts: Robert O. Hodgell (1922-2000), Jim Crane (1927-2015) and Margaret (Pegg) Rigg (1928-2011). Along with Crane (cartoonist, painter and collage artist) and Rigg (art editor of motive (always spelled with a lowercase “m”) magazine, as well as graphic designer, assemblage artist and calligrapher), Hodgell (printmaker, painter, sculptor, and illustrator) contributed regularly to motive, a progressive monthly magazine published by the Methodist Student Movement.

In print from 1941 to 1972, motive was hailed for its vanguard editorial and artistic vision and for its aggressive stance on civil rights, Vietnam, and gender issues. In 1965 the publication was runner-up to Life for Magazine of the Year and in 1966, Time magazine quipped that among church publications it stood out “like a miniskirt at a church social.” An entire generation of activists was shaped by its vision with Hodgell, Crane and Rigg playing an important role in forming and communicating that vision.

Eckerd’s unique position as a liberal arts college influenced by the tenets of the Presbyterian Church made it possible for these artists to converge and produce art that reflected society and promoted the emergence of activism that shaped the identity of the Methodist church at the time. Preserving these materials and making them available for broader scholarship will provide significant insight into the factors surrounding the development of the Methodist Church as it is today. Implementing the infrastructure to preserve, digitize and house the collection provides additional opportunities to add other College collections to the repository in the future.

The gathering also brought together relevant departments within Eckerd College, including representatives from the Library, Visual Arts and Rhetoric faculty, Information Technology Services, Marketing & Communications, Advancement and the Dean of Faculty. Having these key players in the room provided an opportunity to involve the broader campus community so efforts can begin to ensure the long-term sustainability of the project, while also highlighting key challenges unique to the College as seen by the external board of advisors.

Eckerd will now move forward with grant applications for the project, with hopes to integrate DPOE’s Train-the-trainer and an NDSR program to jump start and sustain the project through implementation. Potential partnerships and training opportunities with area institutions and local groups will be explored, as well as teaching opportunities to educate students about the importance of digital stewardship.
