Code4Lib Journal: Open Journal Systems and Dataverse Integration – Helping Journals to Upgrade Data Publication for Reusable Research
Code4Lib Journal: Collecting and Describing University-Generated Patents in an Institutional Repository: A Case Study from Rice University
Winchester, MA. On October 12, 2015, Fedora 4.4.0 was released by the Fedora team. Full release notes are included in this message and are also available on the wiki: https://wiki.duraspace.org/display/FF/Fedora+4.4.0+Release+Notes. This new version furthers several major objectives, including:
Last week was another great Stump the Chump session at Lucene/Solr Revolution in Austin. After a nice weekend of playing tourist and eating great BBQ, today I’m back at my computer and happy to announce last week’s winners:
- Barani Bikshandi ($100 Amazon gift certificate)
- Carlos Eduardo Sponchiado (Sponch) ($50 Amazon gift certificate)
- Aditya Varun Chadha ($25 Amazon gift certificate)
I want to thank everyone who participated — either by sending in your questions, or by being there in person to heckle me. But I would especially like to thank the judges and our moderator Cassandra Targett, who had to do all the hard work preparing the questions.
Keep an eye on the Lucidworks YouTube page to see the video once it’s available. And if you can make it to Cambridge, MA next week, make sure to sign up for the October 28th Boston Lucene/Solr MeetUp and hear all about the winning questions, and how I think they stacked up over the past 5 years.
Today I found the following resources and bookmarked them on Delicious.
- Discourse: Discourse is the 100% open source discussion platform built for the next decade of the Internet. It works as a mailing list, a discussion forum, and a long-form chat room.
I just got back from Lucene/Solr Revolution 2015 in Austin on a big high. There were a lot of exciting talks at the conference this year, but one thing that was particularly exciting to me was the focus that I saw on search quality (accuracy and relevance), on the problem of inferring user intent from the queries, and of tracking user behavior and using that to improve relevancy and so on. There were also plenty of great talks on technology issues this week that attack the other ‘Q’ problem – we keep pushing the envelope of what is possible with SolrCloud at scale and under load, are indexing data faster and faster with streaming technologies such as Spark and are deploying Solr to more and more interesting domains. Big data integrations with SolrCloud continue to be a hot topic – as they should since search is probably the most (only?) effective answer to dealing with the explosion of digital information. But without quality results, all the technology improvements in speed, scalability, reliability and the like will be of little real value. Quantity and quality are two sides of the same coin. Quantity is more of a technology or engineering problem (authors like myself that tend to “eschew brevity” being a possible exception) and quality is a language and user experience problem. Both are critical to success where “success” is defined by happy users. What was really cool to me was the different ways people are using to solve the same basic problem – what does the user want to find? And, how do we measure how well we are doing?
Our Lucidworks CTO Grant Ingersoll started the ball rolling in his opening keynote address by reminding us of the way that we typically test search applications by using a small set of what he called “pet peeve queries” that attack the quality problem in piecemeal fashion but don’t come near to solving it. We pat ourselves on the back when we go to production and are feeling pretty smug about it until real users start to interact with our system and the tweets and/or tech support calls start pouring in – and not with the sentiments we were expecting. We need better ways of developing and measuring search quality. Yes, the business unit is footing the bill and has certain standards (which tend to be their pet peeve queries, as Grant pointed out), so we give them knobs and dials that they can twist to calm their nerves and to get them off our backs, but when the business rules become so pervasive that they start to take over from what the search engine is designed to do, we have another problem. To be clear, there are some situations where we know that the search engine is not going to get it right so we have to do a manual override. We can either go straight to a destination (using a technique that we call “Landing Pages”) or force what we know to be the best answer to the top – the so-called “Best Bets”, which is implemented in Solr using the QueryElevationComponent. However, this is clearly a case where moderation is needed! We should use these tools to tweak our results – i.e. fix the intractable edge cases – not to fix the core problems.
This ad-hoc or subjective way of measuring search quality that Grant was talking about is pervasive. The reason is that quality – unlike quantity – is hard to measure. What do you mean by “best”? And we know from our own experience and from our armchair data science-esque cogitations on this, that what is best for one user may not be best for another, and this can in fact change over time for a given user. So quality, i.e. relevance, is “fuzzy”. But what can we do? We’re engineers not psychics dammit! Paul Nelson, the Chief Scientist at Search Technologies, then proceeded to show us what we can do to measure search quality (precision and recall) in an objective (i.e. scientific!) way. Paul gave a fascinating talk showing the types of graphs that you typically see in a nuts-and-bolts talk that tracked the gradual improvement in accuracy over time during the course of search application development. The magic behind all of this is query logs and predictive analytics. So given that you have this data (even if from your previous search engine app) and want to know if you are making “improvements” or not, Paul and his team at Search Technologies have developed a way to use this information to essentially regression test for search quality – pretty cool huh? Check out Paul’s talk if you didn’t get a chance to see it.
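For concreteness, the two measures Paul was graphing reduce to simple set arithmetic per query. This is a minimal sketch only; the real evaluation aggregates these numbers over thousands of logged queries with relevance judgments derived from user behavior:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a single query.

    retrieved: document ids the engine returned
    relevant:  document ids judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: what fraction of what we returned was relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant documents did we find?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

Run per query over a logged query set and the averages become the trend lines you can regression test against.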
But look, let’s face it, getting computers to understand language is a hard problem. But rather than throwing up our hands, in my humble opinion, we are really starting to dig into solving this one! The rubber is hitting the road folks. One of the more gnarly problems in this domain is name recognition. Chris Mack of Basis Technologies gave a very good presentation of how Basis is using their suite of language technologies to help solve this. Name matching is hard because there are many ambiguities and alternate ways of representing names, and there are many people that share the same name, etc. etc. etc. Chris’s family name is an example of this problem – is it a truck, a cheeseburger (spelled Mac) or a last name? For those of you out there that are migrating from Fast ESP to Solr (a shoutout here to that company in Redmond Washington for sunsetting enterprise support for Fast ESP – especially on Linux – thanks for all of the sales leads guys! Much appreciated!) – you should know that Basis Technologies (and Search Technologies as well, I believe) have a solution for Lemmatization that you can plug into Solr (a more comprehensive way to do stemming). I was actually over at the Basis Tech booth to see about getting a dev copy of their lemmatizer for myself so that we could demonstrate this to potential Fast ESP customers when I met Chris. Besides name recognition, Basis Tech has a lot of other cool things. Their flagship product is Rosette – a world class ontology / rules-based classification engine, among other things. Check it out.
Next up on my list was Trey Grainger of CareerBuilder. Trey is leading a team there that is doing some truly outstanding work on user intent recognition and using that to craft more precise queries. When I first saw the title of Trey’s talk “Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine”, I thought that he and his team had scooped me since my own title is very similar – great minds think alike, I guess (certainly true in Trey’s case, a little self-aggrandizement on my part here but hey, it’s my blog post so cut me some slack!). What they are basically doing is using classification approaches such as machine learning to build a Knowledge Graph in Solr and then using that at query time to determine what the user is asking for and then to craft a query that brings back those things and other closely related things. The “related to” thing is very important, especially in the buzz-word salad that characterizes most of our resumes these days. The query rewrite that you can do if you get this right can slice through noise hits like a hot knife through butter.
Trey is also the co-author of Solr in Action with our own Tim Potter – I am already on record about this wonderful book – but it was cool what Trey did – he offered a free signed copy to the person who had the best tweet about his talk. Nifty idea – wish I had thought of it but, oh yeah, I’d have to write a book first – whoever won, don’t just put this book on your shelf when you get home – read it!
Not to be outdone, Simon Hughes of Dice.com, Trey’s competitor in the job search sector gave a very interesting talk about how they are using machine learning techniques such as Latent Semantic Analysis (LSA) and Google’s Word2Vec software to do similar things. They are using Lucene payloads in very interesting ways and building Lucene Similarity implementations to re-rank queries – heavy duty stuff that the nuts-and-bolts guys would appreciate too (the code that Simon talked about is open sourced). The title of the talk was “Implementing Conceptual Search in Solr using LSA and Word2Vec”. The keyword here is “implementing” – as I said earlier in this post, we are implementing this stuff now, not just talking about it as we have been doing for too long in my opinion. Simon also stressed the importance of phrase recognition and I was excited to realize that the techniques that Dice is using can feed into some of my own work, specifically to build autophrasing dictionaries that can then be ingested by the AutoPhraseTokenFilter. In the audience with me were Chris Morley of Wayfair.com and Koorosh Vakhshoori of Synopsys.com who have made some improvements to my autophrasing code that we hope to submit to Solr and github soon.
Nitin Sharma and Li Ding of BloomReach introduced us to a tool that they are working on called NLP4L – a natural language processing tool for Lucene. In the talk, they emphasized important things like precision and recall and how to use NLP techniques in the context of a Lucene search. It was a very good talk but I was standing too near the door – because getting a seat was hard – and some noisy people in the hallway were making it difficult to hear well. That’s a good problem to have, as this talk, like the others, was very well attended. I’ll follow up with Nitin and Li because what they are doing is very important and I want to understand it better. Domo Arigato!
Another fascinating talk was by Rama Yannam and Viju Kothuvatiparambil (“Viju”) of Bank of America. I had met Viju earlier in the week as he attended our Solr and Big Data course ably taught by my friend and colleague Scott Shearer. I had been tapped to be a Teaching Assistant for Scott. Cool, a TA, hadn’t done that since Grad School, made me feel younger … Anyway, Rama and Viju gave a really great talk on how they are using open-source natural language processing tools such as UIMA, Open NLP, Jena/SPARQL and others to solve the Q&A problem for users coming to the BofA web site. They are also building/using an Ontology (that’s where Jena and SPARQL come in) which as you may know is a subject near and dear to my heart, as well as NLP techniques like Parts Of Speech (POS) detection.
They have done some interesting customizations on Solr but unfortunately this is proprietary. They were also not allowed to publish this talk by having their slides shared online or the talk recorded. People were taking pictures of the slides with their cell phones (not me, I promise) but were asked not to upload them to Facebook, LinkedIn, Instagram or such. There was also a disclaimer bullet on one of their slides like you see on DVDs – the opinions expressed are the authors’ own and not necessarily shared by BofA – ta da ta dum – lawyerese drivel for “we are not liable for ANYTHING these guys say, but they’ll be sorry if they don’t stick to the approved script!” So you will have to take my word for it, it was a great talk, but I have to be careful here – I may be on thin ice already with BofA legal and at the end of the day, Bank Of America already has all of my money! That said, I was grateful for this work because it will benefit me personally as a BofA customer even if I can’t see the source code. Their smart search knows the difference between when I need to “check my balance” vs when I need to “order checks”. As they would say in Boston – “Wicked Awesome”! One interesting side note here: Rama and Viju mentioned that the POS tagger that they are using works really well for full sentences (on which the models were trained) but less well on sentence fragments (noun phrases) – still not too bad though – about 80%. More on this in a bit. But hey Banks – gotta love it – don’t get me started on ATM fees.
Last but not least (hopefully?) – as my boss Grant Ingersoll is fond of saying – was my own talk where I tried to stay competitive with all of this cool stuff. I had to be careful not to call it a Ted talk because that is a registered trademark and I didn’t want to get caught by the “Ted Police”. Notice that I didn’t use all caps to spell my own name here – they registered that so it probably would have been flagged by the Ted autobots. But enough about me. First I introduced my own pet peeve – why we should think of precision and recall before we worry about relevance tuning, because technically speaking that is exactly what the Lucene engine does. If we don’t get precision and recall right we have created a garbage in – garbage out problem for the ranking engine. I then talked about autophrasing a bit, bringing out my New York – Big Apple demo yet again. I admitted that this is a toy problem but it does show that you can absolutely nail the phrase recognition and synonym problem, which brings precision and recall to 100%. Although this is not a real world problem, I have gotten feedback that autophrasing is currently solving production problems, which is why Chris and Koorosh (mentioned above) needed to improve the code over my initial hack, for their respective dot-coms.
The focus of my talk then shifted to the work I have been doing on Query Autofiltering where you get the noun phrases from the Lucene index itself courtesy of the Field Cache (and yes Hoss, uh Chump, it works great, is less filling than some other NLP techniques – and there is a JIRA: SOLR-7539, take a look). This is more useful in a structured data situation where you have string fields with noun phrases in them. Autophrasing is appropriate for Solr text fields (i.e. tokenized / analyzed fields) so the techniques are entirely complementary. I’m not going to bore you with the details here since I have already written three blog posts on this, but I will tell you that the improvements I have made recently will impel me to write a fourth installment – (hey, maybe I can get a movie deal like the guy who wrote The Martian which started out as a blog … naaaah, his was techy but mine is way too techy and it doesn’t have any NASA tie ins … )
Anyway, what I am doing now is adding verb/adjective resolution to the mix. The Query Autofiltering stuff is starting to resemble real NLP now so I am calling it NLP-Lite. “Pseudo NLP”, “Quasi-NLP” and “query time NLP” are also contenders. I tried to do a demo on this (which was partially successful) using a Music Ontology I am developing where I could get the questions “Who’s in The Who” and “Beatles songs covered by Joe Cocker” right, but Murphy was heavily on my case so I had to move on because the “time’s up” enforcers were looming and I had a plane to catch. I should say that the techniques that I was talking about do not replace classical NLP – rather we (collectively speaking) are using classic NLP to build knowledge bases that we can use on the query side with techniques such as query autofiltering. That’s very important and I have said this repeatedly – the more tools we have, the better chance we have of finding the right one for a given situation. POS tagging works well on full sentences and less well on sentence fragments, where the Query Autofilter excels. So it’s “front-end NLP” – you use classic NLP techniques to mine the data at index time and to build your knowledge base, and you use this type of technique to harvest the gold at query time. Again, the “knowledge base”, as Trey’s talk and my own stressed, can be the Solr/Lucene index itself!
Finally, I talked about some soon-to-be-published work I am doing on auto suggest. I was looking for a way to generate more precise typeahead queries that span multiple fields which the Query Autofilter could then process. I discovered a way to use Solr facets, especially pivot facets, to generate multi-field phrases, and regular facets to pull context, so that I could build a dedicated suggester collection derived from a content collection. (whew!!) The pivot facets allow me to turn a pattern like “genre,musician_type” into “Jazz Drummers”, “Hard Rock Guitarists”, “Classical Pianists”, “Country Singers” and so on. The facets enable me to then grab information related to the subject, so if I do a pivot pattern like “name,composition_type” to generate suggestions like “Bob Dylan Songs”, I can pull back other things related to Bob Dylan such as “The Band” and “Folk Rock” that I can then use to create user context for the suggester. Now, if you are searching for Bob Dylan songs, the suggester can start to boost them so that song titles that would normally be down the list will come to the top.
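The phrase-building step can be sketched in a few lines, assuming the standard nested shape of a Solr facet.pivot response (the bucket data below is illustrative sample data, not output from my actual collection):

```python
def pivot_phrases(pivot_buckets):
    """Walk one level of a Solr facet.pivot response (e.g. the result of
    facet.pivot=genre,musician_type) and combine each parent/child value
    pair into a multi-field suggestion phrase."""
    phrases = []
    for bucket in pivot_buckets:
        parent = bucket["value"]
        # Each parent bucket nests its child facet counts under "pivot".
        for child in bucket.get("pivot", []):
            phrases.append(f"{parent} {child['value']}")
    return phrases

# Illustrative sample of the nested bucket structure Solr returns:
sample = [
    {"field": "genre", "value": "Jazz", "count": 10,
     "pivot": [{"field": "musician_type", "value": "Drummers", "count": 3},
               {"field": "musician_type", "value": "Singers", "count": 2}]},
]
```

Each resulting phrase ("Jazz Drummers", "Jazz Singers", …) then becomes a document in the dedicated suggester collection.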
This matches a spooky thing that Google was doing while I was building the music ontology – after a while, it would start to suggest long song titles with just two words entered if my “agenda” for that moment was consistent. So if I am searching for Beatles songs for example, after a few searches, typing “ba” brings back (in the typeahead) “Baby’s In Black” and “Baby I’m a Rich Man” above the myriad of songs that start with Baby, as well as everything else in their typeahead dictionary starting with “ba”. WOW – that’s cool – and we should be able to do that too! (i.e., be more “Google-esque” as one of my clients put it in their Business Requirements Document) I call it “On-The-Fly Predictive Analytics” – as we say in the search quality biz – it’s ALL about context!
I say “last but not least” above, because for me, that was the last session that I attended due to my impending flight reservation. There were a few talks that I missed for various other reasons (there was a scheduling conflict, my company made me do some pre-sales work, I was wool gathering or schmoozing/networking, etc) where the authors seem to be on the same quest for search quality. Talks like “Nice Docs Finish First” by Fiona Condon at Etsy, “Where Search Meets Machine Learning” by folks at Verizon, “When You Have To Be Relevant” by Tom Burgmans of Wolters-Kluwer and “Learning to Rank” by those awesome Solr guys at Bloomberg – who have got both ‘Qs’ working big time!
Since I wasn’t able to attend these talks and don’t want to write about them from a position of ignorance, I invite the authors (or someone who feels inspired to talk about it) to add comments to this post so we can get a post-meeting discussion going here. Also, any author that I did mention who feels that I botched my reporting of their work should feel free to correct me. And finally, anybody who submitted on the “Tweet about Trey’s Talk and Win an Autographed Book” contest is encouraged to re-tweet – uh post, your gems here.
So, thanks for all the great work on this very important search topic. Maybe next year we can get Watson to give a talk so we can see what the computers think about all of this. After all, Watson has read all of Bob Dylan’s song lyrics so he (she?) must be a pretty cool dude/gal by now. I wonder what it thinks about “Stuck Inside of Mobile with the Memphis Blues Again”? To paraphrase the song, yes Mama, this is really the end. So, until we meet again at next year’s Revolution, Happy searching!
The post Focusing on Search Quality at Lucene/Solr Revolution 2015 appeared first on Lucidworks.com.
Last week I was on an NDFNZ wikipedia panel with Courtney Johnston, Sara Barham and Mike Dickison. Having reflected a little and watched the YouTube video at https://www.youtube.com/watch?v=3b8X2SQO1UA I've got some comments to make (or to repeat, as the case may be).
Many people, apparently including Courtney, seemed to get the most enjoyment out of writing the ‘body text’ of articles. This is fine, because the body text (the core textual content of the article) is the core of what the encyclopaedia is about. If you can’t be bothered with wikiprojects, categories, infoboxes, common names and wikidata, you’re not alone and there’s no reason you need to delve into them to any extent. If you start an article with body text and references that’s fine; other people will to a greater or lesser extent do that work for you over time. If you’re starting a non-trivial number of similar articles, get yourself a prototype which does most of the stuff for you (I still use https://en.wikipedia.org/wiki/User:Stuartyeates/sandbox/academicbio which I wrote for doing New Zealand women academics). If you need a prototype like this, feel free to ask me.
If you have a list of things (people, public art works, exhibitions) in some machine readable format (Excel, CSV, etc) it’s pretty straightforward to turn them into a table like https://en.wikipedia.org/wiki/Wikipedia:WikiProject_New_Zealand/Requested_articles/Craft#Proposed_artists or https://en.wikipedia.org/wiki/Enjoy_Public_Art_Gallery Send me your data and what kind of direction you want to take it.
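The spreadsheet-to-table conversion is only a few lines of Python. This is a minimal sketch of CSV to MediaWiki table markup; a real list would add wikilinks, image thumbnails and sort keys:

```python
import csv
import io

def csv_to_wikitable(csv_text):
    """Turn CSV text (header row first) into MediaWiki table markup."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    lines = ['{| class="wikitable"']
    # Header row uses '!' cells in wiki markup.
    lines.append("! " + " !! ".join(rows[0]))
    for row in rows[1:]:
        lines.append("|-")            # row separator
        lines.append("| " + " || ".join(row))
    lines.append("|}")
    return "\n".join(lines)
```

Paste the output straight into a project subpage and other editors can start turning red links blue.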
If you have a random thing that you think needs a Wikipedia article, add to https://en.wikipedia.org/wiki/Wikipedia:WikiProject_New_Zealand/Requested_articles if you have a hundred things that you think need articles, start a subpage, a la https://en.wikipedia.org/wiki/Wikipedia:WikiProject_New_Zealand/Requested_articles/Craft and https://en.wikipedia.org/wiki/Wikipedia:WikiProject_New_Zealand/Requested_articles/New_Zealand_academic_biographies both completed projects of mine.
Sara mentioned that they were thinking of getting subject matter experts to contribute to relevant wikipedia articles. In theory this is a great idea and some famous subject matter experts contributed to Britannica, so this is well-established ground. However, there have been some recent wikipedia failures particularly in the sciences. People used to ground-breaking writing may have difficulty switching to a genre where no original ideas are permitted and everything needs to be balanced and referenced.
Preparing for the event, I created a list of things the awesome Dowse team could do as follow-ups to their craft artists work, but we never got to that in the session, so I've listed them here:
- [[List of public art in Lower Hutt]] Since public art is out of copyright, someone could spend a couple of weeks taking photos of all the public art and creating a table with clickable thumbnail, name, artist, date, notes and GPS coordinates. Could probably steal some logic from somewhere to make the table convertible to a set of points inside a GPS for a tour.
- Publish from their archives a complete list of every exhibition ever held at the Dowse since founding. Each exhibition is a shout-out to the artists involved and the list can be used to check for potentially missing wikipedia articles.
- Digitise and release photos taken at exhibition openings, capturing the people, fashion and feeling of those eras. The hard part of this, of course, is labelling the people.
- Reach out to their broader community to use the Dowse blog to publish community-written obituaries and similar content (i.e. encourage the generation of quality secondary sources).
- Engage with your local artists and politicians by taking pictures at Dowse events, uploading them to commons and adding them to the subjects’ wikipedia articles—making attending a Dowse exhibition opening the easiest way for locals to get a new wikipedia image.
This past week I was clearing out a bunch of software feature request tickets to prepare for a feature push for our digital library system. We are getting ready to do a redesign of The Portal to Texas History and the UNT Digital Library interfaces.
Buried deep in our ticketing system were some tickets made during the past five years that included notes about future implementations that we could create for the system. One of these notes caught my eye because it had the phrase “since date data is so poor in the system”. At first I had dismissed this phrase and ticket altogether because our ideas related to the feature request had changed, but later that phrase stuck with me a bit.
I began to wonder, “what is the quality of our date data in our digital library” and more specifically “what does the date resolution look like across the UNT Libraries’ Digital Collections”.

Getting the Data
The first thing to do was to grab all of the date data for each record in the system. At the time of writing there were 1,310,415 items in the UNT Libraries Digital Collections. I decided the easiest way to grab the date information for these records was to pull it from our Solr index.
I constructed a solr query that would return the value of our dc_date field, the ark identifier we use to uniquely identify each item in the repository, and finally which of the systems (Portal, Digital Library, or Gateway) a record belongs to.
I pulled these as JSON files with 10,000 records per request, did 132 requests and I was in business.
I wrote a short little Python script that takes those Solr responses and converts them into a tab separated format that looks like this:

ark:/67531/metapth2355	1844-01-01	PTH
ark:/67531/metapth2356	1845-01-01	PTH
ark:/67531/metapth2357	1845-01-01	PTH
ark:/67531/metapth2358	1844-01-01	PTH
ark:/67531/metapth2359	1844-01-01	PTH
ark:/67531/metapth2360	1844	PTH
ark:/67531/metapth2361	1845-01-01	PTH
ark:/67531/metapth2362	1883-01-01	PTH
ark:/67531/metapth2363	1844	PTH
ark:/67531/metapth2365	1845	PTH
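The conversion itself is trivial. Here is a sketch of the idea; the field names (dc_date, ark, system) are my guesses at the schema, not the actual UNT field names:

```python
import json

def solr_docs_to_tsv(response_text):
    """Flatten a Solr JSON response into 'ark<TAB>date<TAB>system' lines.

    Field names here (dc_date, ark, system) are assumptions about the
    underlying schema, used for illustration only.
    """
    response = json.loads(response_text)
    lines = []
    for doc in response["response"]["docs"]:
        date = doc.get("dc_date", "")
        # Solr multiValued fields come back as lists; take the first value.
        if isinstance(date, list):
            date = date[0] if date else ""
        lines.append("\t".join([doc.get("ark", ""), date, doc.get("system", "")]))
    return "\n".join(lines)
```

Run once per 10,000-record page of results and concatenate the output.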
Next I wrote another Python script that classifies a date into the following categories:
Day, Month, and Year are the three units that I’m really curious about, I identified these with simple regular expressions for yyyy-mm-dd, yyyy-mm, and yyyy respectively. For records that had date strings that weren’t day, month, or year, I checked if the string was an Extended Date Time Format string. If it was valid EDTF I marked it as Other-EDTF, if it wasn’t a valid EDTF and wasn’t a day, month, year I marked it as Unknown. Finally if there wasn’t a date present for a metadata record at all, it is marked as “None”.
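The classification logic can be sketched roughly like this. Note that the EDTF check below is a crude stand-in I wrote for illustration; the real script would use a proper EDTF validator:

```python
import re

# Simple resolution patterns: yyyy-mm-dd, yyyy-mm, yyyy.
DAY_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
MONTH_RE = re.compile(r"^\d{4}-\d{2}$")
YEAR_RE = re.compile(r"^\d{4}$")

def looks_like_edtf(value):
    # Crude stand-in for a real EDTF validator: accepts a few common
    # patterns such as intervals (1844/1845) and uncertain dates (1844~).
    return bool(re.match(
        r"^\d{4}(-\d{2}){0,2}[~?]?(/\d{4}(-\d{2}){0,2}[~?]?)?$", value))

def classify_date(value):
    """Bucket a date string into Day / Month / Year / Other-EDTF /
    Unknown / None, mirroring the categories described above."""
    if not value:
        return "None"
    if DAY_RE.match(value):
        return "Day"
    if MONTH_RE.match(value):
        return "Month"
    if YEAR_RE.match(value):
        return "Year"
    if looks_like_edtf(value):
        return "Other-EDTF"
    return "Unknown"
```

The order of the checks matters: a plain yyyy string is also valid EDTF, so the resolution buckets are tested first.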
One thing to note about the way I’m doing the categories: I am probably missing quite a few values that have days, months or years somewhere in the string by not parsing the EDTF and Unknown strings a little more liberally for days, months and years. This is true, but for what I’m trying to accomplish here, I think we will let that slide.

What does the data look like?
The first thing for me to do was to see how many of the records had date strings compared to the number of records that do not have date strings present.
Looking at the numbers shows 1,222,750 (93%) of records having date strings and 87,665 (7%) are missing date strings. Just with those numbers I think that we negate the statement that “date data is poor in the system”. But maybe just the presence of dates isn’t what the ticket author meant. So we investigate further.
The next thing I did was to see how many of the dates overall were able to be classified as a day, month, or year. The reasoning for looking at these values is that you can imagine building user interfaces that make use of date values to let users refine their searching activities or browse a collection by date.
This chart shows that the overwhelming majority of objects in our digital library 1,202,625 (92%) had date values that were either day, month, or year and only 107,790 (8%) were classified as “Other”. Now this I think does blow the statement about poor date data quality away.
The last thing I think there is to look at is how each of the categories stack up against each other. Once again, a pie chart.
Here is a table view of the same data.

Date Classification	Instances	Percentage
Day	967,257	73.8%
Month	43,952	3.4%
Year	191,416	14.6%
Other-EDTF	15,866	1.2%
Unknown	4,259	0.3%
None	87,665	6.7%
So looking at this data it is clear that the majority of our digital objects have resolution at the “day” level, with 967,257 records or 73.8% of all records being in the format yyyy-mm-dd. Year resolution is the second highest occurrence with 191,416 or 14.6%. Finally, month resolution came in with 43,952 records or 3.4%. There were 15,866 records that had valid EDTF values, 4,259 with other date values, and finally the 87,665 records that did not contain a date at all.

Conclusion
I think that I can safely say that we do in fact have a large amount of date data in our digital libraries. This date data can be parsed easily into day, month and year buckets for use in discovery interfaces, and by doing very basic work with the date strings we are able to account for 92% of all records in the system.
I’d be interested to see how other digital libraries stand on date data to see if we are similar or different as far as this goes. I might hit up my colleagues at the University of Florida because their University of Florida Digital Collections is of similar scale with similar content. If you would like to work to compare your digital libraries’ date data let me know.
Hope you enjoyed my musings here, if you have thoughts, suggestions, or if I missed something in my thoughts, please let me know via Twitter.
Welcome to issue 74 of Ariadne! This is the first issue of the magazine that we have hosted here at Loughborough University, with an editorial team spread over a number of institutions, after we took over the reins (and the software and database) from Bath University back in April. You might have noticed a few changes since the move that I’ll hopefully explain in this editorial. Read more about Editorial: Ariadne: the neverending story (issue 74, published Mon, 10/19/2015): http://www.ariadne.ac.uk/issue74/editorial
Wireless — this term evokes an array of feelings in technologists today. Even though the definition of the term is relatively simple, there are numerous protocols, standards, and methods that have been developed to perform wireless interactions. For example, by now, many of you have heard of mobile applications such as Apple Pay or Google Wallet; similarly, you might have a transit pass or a badge for your gym or workplace. With a wave of your device or pass, a scanner processes a “contactless transaction”. The tap-and-go experience of these technologies often utilizes Near Field Communication, or NFC.
NFC is a set of standards that allows devices to establish radio communication with each other by touching them together or bringing them into close proximity, an effective distance of 4 cm. It is a direct transmission of specific information, separate from the open-ended Wi-Fi access and the seemingly limitless information resources it provides.
NFC tags can be used to send a resource, or a specific set of data, directly to a patron’s mobile device to improve their information-seeking experience. By utilizing this technology, libraries can perform data exchanges with patron mobile devices without scanning a QR code or pairing devices (as required by Bluetooth), providing a less complex experience.
There are many useful tasks you can program these tags to perform. One example would be to set a tag to update a patron’s mobile calendar with an event your library is having. These tags can be programmed with date, time, location, and alarm information to remind the patron of the event, which is substantially more effective than a QR code’s ability to simply connect a patron with a destination. Another useful application would be to program a set of NFC keychains for the library staff to have on hand, preset to grant Wi-Fi access: no more password requests or questions about access, just a simple tap of the NFC keychain. The ability to execute preset instructions on the mobile device, beyond just opening a URL, differentiates NFC tags from QR codes. Many NFC tag users also find them more appealing visually, because they can be embedded in posters or other advertisement materials without altering the design.
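To make the “preset instructions” idea concrete, here is a minimal sketch of what actually gets written to a tag: NFC tags store NDEF (NFC Data Exchange Format) records, and the simplest library use case — a tap that opens a web page — is a single “URI” record. The function below builds the raw bytes for such a record by hand, purely for illustration; the example URL is hypothetical, and in practice you would use a tag-writing app or an NFC library rather than assembling bytes yourself.

```python
# Illustrative sketch: encode one short NDEF "URI" record, the payload a
# library might write to an NFC tag so that a tap opens a web page.

# Abbreviation codes from the NFC Forum URI Record Type Definition.
# Longer prefixes are listed first so they are matched before shorter ones.
URI_PREFIXES = {
    0x01: "http://www.",
    0x02: "https://www.",
    0x03: "http://",
    0x04: "https://",
}

def ndef_uri_record(uri: str) -> bytes:
    """Encode a URI as a single short NDEF record of well-known type 'U'."""
    code, rest = 0x00, uri  # 0x00 means "no abbreviation"
    for byte, prefix in URI_PREFIXES.items():
        if uri.startswith(prefix):
            code, rest = byte, uri[len(prefix):]
            break
    payload = bytes([code]) + rest.encode("utf-8")
    if len(payload) > 255:
        raise ValueError("short-record form requires payload < 256 bytes")
    # Header flags: MB (message begin) + ME (message end) + SR (short record),
    # TNF = 0x01 (NFC Forum well-known type).
    header = 0xD1
    return bytes([header, 0x01, len(payload), ord("U")]) + payload

# Hypothetical library URL for demonstration.
record = ndef_uri_record("https://example.org/library/hours")
print(record.hex())
```

A phone that reads this record reconstructs the full URL from the one-byte prefix code plus the remaining text, which is why NFC tags can carry usefully long URLs in very little tag memory.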
The use of this technology has been anticipated in libraries for several years now. There is, however, one notable issue with implementing NFC tags: Apple only supports the use of NFC for Apple Pay. Apple devices do not currently support NFC for any other transaction, even though the hardware is present on those devices. Hopefully Apple will eventually open up NFC on its devices, and the technology will become more widely utilized.
This week we focused on information visualization with Niklas Elmqvist from the UMD iSchool. Niklas studies information visualization and human-computer interaction. He joined UMD in the last year, after arriving from Purdue University.
For this week we also read Elzen & Wijk (2014), Heer, Bostock, & Ogievetsky (2010) and Heer & Shneiderman (2012). I enjoyed the two Heer articles because of their accessibility (they were written for the more general readership of ACM Queue), but also for their breadth. The 2012 paper in particular does a really nice job of summarizing a large number of visualization techniques by breaking them down into a taxonomy of data/view specification, view manipulation, and analysis process / provenance.
The surprise for me (since I’ve just been a dabbler in dataviz) is that the iterative feedback loop of the analysis/provenance piece is deemed an important part of the visualization itself. Niklas stressed this as he described Visual Analytics, which studies not only how to visualize data, but how the interaction between data processing, data visualization, computer interfaces and the human can enable new forms of reasoning that were previously impossible, or at least very difficult.
The 2010 article was also very interesting to me because I recognized the name of Mike Bostock, who is a legend in the developer community for having played a part in the creation of Data-Driven Documents (D3). D3 is a Web-standards-compliant data visualization toolkit. I have also used Bostock’s Protovis library, but learned from Niklas that Heer (his PhD advisor) also played a role in the creation of both Protovis and D3, as well as the Flare and Prefuse visualization libraries. It seems like there is a lesson here about persistence, or at least not staying still. Bostock was at the NYTimes until recently, helping bootstrap their data visualization capabilities.
We did spend a little bit of time talking about how essential it is to be able to share visualizations. We talked briefly about Bostock’s D3 publishing framework at bl.ocks.org, which allows GitHub repositories containing data and D3 visualizations to be easily published on the Web. I’ve heard from friends at the NYTimes that Bostock created a very similar in-house system for reporters and editors there.
I left this meeting more excited than I thought I was going to be about the prospects of learning more about data visualization. I hadn’t considered before how much of an HCI and data visualization problem lies in the web archiving domain. My immediate interest centers on the appraisal process itself: how do curators and archivists sift through social media to identify salient Web documents to preserve? But also the very act of exploring Web archives is quite under-developed. The Wayback experience of diving into the archive with a known URL and then wandering around in links is the de facto standard for Web archives. But how would search be presented: what are the new and useful ways to search through time as well as text? It feels like there is a big piece of work that could be done in this area. At any rate, I definitely would like to take Niklas’ class when it is next available.

References
Elzen, S. van den, & Wijk, J. J. van. (2014). Multivariate network exploration and presentation: From detail to overview via selections and aggregations. IEEE Transactions on Visualization and Computer Graphics, 20(12), 2310–2319. Retrieved from http://www.win.tue.nl/~selzen/paper/InfoVis2014.pdf
Heer, J., & Shneiderman, B. (2012). Interactive dynamics for visual analysis. Queue, 10(2), 30. Retrieved from https://queue.acm.org/detail.cfm?id=2146416
Heer, J., Bostock, M., & Ogievetsky, V. (2010). A tour through the visualization zoo. Communications of the ACM, 53(6), 59–67. Retrieved from https://queue.acm.org/detail.cfm?id=1805128
Hi there, Michael and Amanda here. We help push libraries forward by teaching teams to write and maintain front-ends that intend to grow, designs that work toward goals, and systems that prepare content to go anywhere — because, well, it’s going to go everywhere.
LibUX provides design and development consultancy for user experience departments and library web teams: from rationalizing the design process, working together as a team, and sustainable workflow or content strategies, to writing maintainable and performant code. We are super friendly, super competent, and available for work. Get in touch.

About this site
A couple years ago, we started a podcast called LibUX — as in library user experience — and discovered we had quite a bit to say. We want to push the #libweb forward and this starts by pushing the conversation forward.
The writeups, the podcasts, and the newsletters are made with love on free time under a generous copyleft license. We — Amanda and Michael — work really hard to meet the following deadlines:
- A new podcast every Monday
- A new Web for Libraries every Wednesday
- A solid article every other week
We have taught four courses (so far), spoken a ton, written a half-dozen articles (real ones), fired up a whole community around lower-case #libux, and made some really good (and one award-winning) websites.