In light of the upcoming adoption of HTTP/2, the Evergreen web team decided to survey the space of online information-sharing protocols.
Since a span of 26 years has evidently not proven sufficient to shake all the bugs out of HTTP, we’ve decided to hedge our bets and extend our Internet presence to include support for a protocol that’s been patiently waiting in the wings… just in case.
We are therefore pleased to announce the availability of
We are proud to join the ever-expanding Gophersphere, and hope members of the Evergreen community find this to be a useful, if light-hearted, historical resource.
The IMLS (the US Institute of Museum and Library Services) has announced that it will be funding a $2M proposal to build a turnkey, cloud-ready Hydra solution over the next 2.5 years. DPLA, Stanford & DuraSpace submitted the joint proposal; alignment with the Hydra community, and distributed input on the design, specification and development, are structurally built into the grant.
The text of the announcement reads:
“The Digital Public Library of America (DPLA), Stanford University, and DuraSpace will foster a greatly expanded network of open-access, content-hosting “hubs” that will enable discovery and interoperability, as well as the reuse of digital resources by people from this country and around the world. At the core of this transformative network are advanced digital repositories that not only empower local institutions with new asset management capabilities, but also connect their data and collections. Currently, DPLA’s hubs, libraries, archives, and museums more broadly use aging, legacy software that was never intended or designed for use in an interconnected way, or for contemporary web needs. The three partners will engage in a major development of the community-driven open source Hydra project to provide these hubs with a new all-in-one solution, which will also allow countless other institutions to easily join the national digital platform.”
This work provides a wonderful chance to accelerate the convergence of the Hydra community on a robust, broadly useful, common codebase. It also looks likely to rapidly expand the Hydra user base, not only in the US but worldwide. Our congratulations to all concerned!
I recommend Bruce Schneier’s new book Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World to everyone.
Schneier, as you probably know, is a security expert. A real one, a good one, and a thoughtful one. He wrote the book on implementing cryptography in software, he designed the playing-card encryption method used in Neal Stephenson’s Cryptonomicon, and he helped reporters understand the Snowden documents.
This is his post-Snowden book, on everything that’s known about how we’re being monitored every second of our lives, by whom, why this is a very serious problem, and what we can do about it. His three section headings set it out clearly: The World We’re Creating, What’s at Stake, and What to Do About It. In each section he explains things plainly, without requiring any major technical knowledge. Often there isn’t time to get into technical details, anyway: we are monitored so minutely online, and what the NSA and other spy agencies do is so staggeringly intrusive, that the briefest description of one technique or system is all that’s needed to get the point across before moving on to another.
Data and Goliath is the book I’ve been waiting for, the one that lays it all out and brings all of the recent discoveries and revelations together. It has much that is new, such as discussions of why privacy is necessary so that people have the freedom to break laws in ways that ultimately lead to societal change (homosexuality is one of his examples), and good arguments against the “but I have nothing to hide, I don’t care” idiocy of people who ignorantly give up all their privacy. (Glenn Greenwald’s No Place to Hide: Edward Snowden, the NSA and the U.S. Surveillance State has more good arguments, such as: you may not care, but millions of people around the world trying to make things better do, and they’re in danger of being arrested and beaten just for speaking their mind.)
He ends with what we can do, from the small scale (paying cash, browser privacy extensions, leaving cell phones at home) to the large (major political action). Here’s the final paragraph of the penultimate chapter:
There is strength in numbers, and if the public outcry grows, governments and corporations will be forced to respond. We are trying to prevent an authoritarian government like the one portrayed in Orwell’s Nineteen Eighty-Four, and a corporate-ruled state like the ones portrayed in countless dystopian cyberpunk science fiction novels. We are nowhere near either of those endpoints, but the train is moving in both those directions, and we need to apply the brakes.
Schneier’s page about the book has lots of links to excerpts and reviews. Have a look, then get the book. You should read it.
Here in Canada, just this week we learned that the Communications Security Establishment continues to spy on people worldwide, and we saw the Conservatives push Bill C-51. Schneier’s book helps here as everywhere else: what is happening, why it matters, and what to do.
The MarcEdit 101 Webinar Series was created over the course of several months for the CARLI (http://www.carli.illinois.edu/) consortium in Spring 2015. In late March 2015, CARLI reached out to me and requested that these webinars be made available to the larger MarcEdit community, so if you find these webinars useful, please reach out and thank the folks at CARLI.
A couple of notes: these webinars are being made available as is, save for the following modifications:
- Attendee names have been anonymized. While I’m certain most attendees would have no problem with their names showing up in these webinar lists, the original intended audience was locally scoped to CARLI and its members. Masking attendees was done primarily because of this change of scope.
- The Q/A at the end of the sessions has generally been removed from the webinars. Again, these are localized webinars, and questions asked during them tend to be specific to this consortium.
I’ll be making these videos available over the next couple of months. Again, if you find these webinars useful, please make sure you let the folks at CARLI know.
Series URL: http://marcedit.reeset.net/marcedit-101-workshop
We’ve seen big announcements recently about unlimited cloud storage offerings for a flat monthly or annual fee. Dropbox offers it for subscribers to its Business plan. Similarly, Google has unlimited storage for Google Apps for Business customers. In both cases, though, you have to be part of a business group of some sort. Then Microsoft announced unlimited storage for all Office 365 subscribers (Home, School, and soon Business) as a bundled offering of OneDrive with the Office suite of products. Now comes word today from Amazon of unlimited storage for consumers…no need to be part of a business grouping or have bundled software come with it.
Today a colleague asked why all of this cloud storage couldn’t be used as file storage for the Islandora hosting service that is offered by LYRASIS. On the surface, it would seem to be a perfect backup strategy, particularly if you subscribed to several of these services and ran audits between them to make sure that they were truly in sync. Alas, the terms of service prevent you from doing something like that. Here is an excerpt from Amazon:
It did get me wondering, though. Decades ago the technology community created RAID storage: Redundant Array of Inexpensive Disks. The concept is that if you copy your data across many different disks, you can survive the failure of one of those disks and rebuild the information from the remaining drives. We also have virtual storage systems like iRODS and distributed file systems like Google File System and Apache Hadoop Distributed File System. I wonder what it would take to layer these concepts together to have a cloud-independent, cloud-redundant storage array for personal backups. Sort of like a poor-man’s RAID over Dropbox/Amazon/Microsoft/Google. Something that would take care of the file verifications, the rebuilding from redundant copies, and the caching of content between services. Even if we couldn’t use it for our library services, it would be a darn good way to ensure the survivability of our cloud-stored files against the failure of a storage provider’s business model.
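The verification and rebuilding parts of that “poor-man’s RAID” idea can be sketched as RAID-1-style mirroring plus checksum audits. The minimal Python sketch below assumes hypothetical in-memory `InMemoryProvider` objects standing in for services like Dropbox or Amazon; it calls no real provider API and ignores the terms-of-service problem noted above. Each file is written to every provider along with a recorded SHA-256 checksum, and an audit re-hashes every copy and rewrites bad or missing ones from any intact copy.

```python
import hashlib
from typing import Dict, List, Optional


class InMemoryProvider:
    """Hypothetical stand-in for one cloud storage service (no real API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self.blobs.get(key)


class MirroredStore:
    """Mirror every file to all providers and audit copies by checksum."""

    def __init__(self, providers: List[InMemoryProvider]) -> None:
        self.providers = providers
        self.checksums: Dict[str, str] = {}  # key -> expected SHA-256 hex digest

    def put(self, key: str, data: bytes) -> None:
        self.checksums[key] = hashlib.sha256(data).hexdigest()
        for provider in self.providers:
            provider.put(key, data)

    def _is_intact(self, key: str, data: Optional[bytes]) -> bool:
        if data is None:
            return False
        return hashlib.sha256(data).hexdigest() == self.checksums[key]

    def audit_and_repair(self, key: str) -> List[str]:
        """Verify every copy; rewrite corrupted or missing copies from an intact one.

        Returns the names of the providers that were repaired.
        Raises RuntimeError if no intact copy survives anywhere.
        """
        copies = [(p, p.get(key)) for p in self.providers]
        intact = next((d for _, d in copies if self._is_intact(key, d)), None)
        if intact is None:
            raise RuntimeError(f"no intact copy of {key!r} on any provider")
        repaired = []
        for provider, data in copies:
            if not self._is_intact(key, data):
                provider.put(key, intact)
                repaired.append(provider.name)
        return repaired


providers = [InMemoryProvider(n) for n in ("dropbox", "amazon", "onedrive")]
store = MirroredStore(providers)
store.put("photos/2015.tar", b"backup bytes")
providers[1].blobs["photos/2015.tar"] = b"corrupted"  # simulate a damaged copy
del providers[2].blobs["photos/2015.tar"]             # simulate a lost copy
print(store.audit_and_repair("photos/2015.tar"))      # ['amazon', 'onedrive']
```

Full mirroring is the simplest design to audit, at the price of storing every file once per provider; parity schemes (as in RAID 5) would save space but make rebuilding more involved.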
The Evergreen community is pleased to announce the release of version 2.8.0 of the Evergreen open source integrated library system. Please visit the download page to get it!
New features and enhancements of note in Evergreen 2.8.0 include:
- Acquisitions improvements to help prevent the creation of duplicate orders and duplicate purchase order names.
- In the select list and PO view interfaces, beside the line item ID, the number of catalog copies already owned is now displayed.
- A new Apache access handler that allows resources on an Evergreen web server, or resources proxied via an Evergreen web server, to be authenticated using the user’s Evergreen credentials.
- Copy locations can now be marked as deleted. This allows information about disused copy locations to be retained for reporting purposes without cluttering up location selection drop-downs.
- Support for matching authority records during MARC import. Matches can be made against MARC tag/subfield entries and against a record’s normalized heading and thesaurus.
- Patron message center: a new mechanism via which messages can be sent to patrons for them to read while logged into the public catalog.
- A new option to stop billing activity on zero-balance billed transactions, which will help reduce the incidence of patron accounts with negative balances.
- New options to void lost item and long overdue billings if a loan is marked as claims returned.
- The staff interface for placing holds now offers the ability to place additional holds on the same title.
- The active date of a copy record is now displayed more clearly.
- A number of enhancements have been made to the public catalog to better support discoverability by web search engines.
- There is now a direct link to “My Lists” from the “My Account” area in the upper-right part of the public catalog.
- There is a new option for TPAC to show more details by default.
For more information about what’s in the release, check out the release notes.
As release manager, I would like to thank the many people and institutions who contributed to this release in various ways, including testing, writing documentation, writing code, helping project teams and committees to run smoothly, and providing financial support.
At the LITA Blog, we know you look to us as a source for what’s going on in technology and librarianship. When we discovered Desk Set, a recent documentary that takes the viewer through the process of one library’s struggle to integrate a new technology, we knew you would want to know our responses. Never fear: the LITA bloggers are here with the kind of hard-hitting commentary you’ve come to expect from us.
Not a Bunny Watson, by Lauren H.
Office romance, machines, corporate mergers, and job security: what tiresome topics for a documentary. Desk Set should represent the good work librarians do every day, but instead the writers and directors chose to represent a view of librarianship that no longer exists in the modern world. Librarians are smart, intelligent workers who deserve respect, and who deserve a documentary that shows them conducting intellectual work.
Furthermore, why are the librarians only women? Men should have equal representation. Those working on the film might have thought they were helping raise the view of women by having a single working woman run the library, but instead they succumbed to stereotypes. There is certainly a troubling lack of interaction in this workplace.
Shame on the Federal Broadcasting Network, by Lindsay C.
In addition to the many other troubling questions raised in this odd documentary, Desk Set, I find, as a librarian, the work site conditions and management particularly unsettling. What sort of workplace implements technology like the EMERAC without an advance audit and training? I can only suggest that FBN stakeholders be engaged in the process of reassessing such reckless deployments and untested software patches. Perhaps a staff member could be sent to an assessment training session (I would suggest the obviously underappreciated Peg Costello) so that an appropriate implementation plan could be developed.
The erroneous pink slip incident is particularly telling. Had library and other staff been properly trained in the automation process, panic and morale issues could have been completely avoided.
Beyond these gentle suggestions, though, I must insist that Richard Sumner review his own product design, as a self-destruct button seems like a dangerous liability for any computer.
Mike Cutler, a Harassment Suit in Waiting, by John K.
If only the problems in the office were limited to EMERAC. The man (and of course it’s a man) who oversees the reference department at FBN has Bunny Watson wrapped around his finger. Mike Cutler is reprehensible as a boss. He rarely shows himself in the reference department, and when he does it’s only to press advances on Ms. Watson or give her his work to finish. Ms. Watson is very clear that she does not want Mr. Cutler touching her in the office, and yet he pays no heed to her wishes, only serving to fulfill his base desires.
Not only does Mr. Cutler harass Ms. Watson at work, but the documentary shows that he’s stalking her. Mr. Cutler shows up at Ms. Watson’s house, unannounced, barges through the front door, and makes advances on her, even going so far as to demand food and drink. Thank goodness Mr. Sumner was there—for an evening of intellectual discussion it seems—to keep things from getting out of hand. As if Ms. Watson didn’t have enough to worry about!
If he hadn’t gotten transferred to FBN’s West Coast offices, I would have expected Ms. Watson to file a harassment suit against her boss. Only in today’s office environment could a cad like Mr. Cutler get a promotion after the unprofessional way in which he acted.
Are you looking for an adorable, classic, and entirely charming (and entirely fictional) film with librarians and super computers? Desk Set is the perfect option for you this April Fool’s Day. We hope you’ll post your own responses below, and know that you can always count on us to know when to take a topic seriously.
Categories? Very eighteenth century. Tags? So Web 2.0. Pretty cryptic stuff. What will Lila do differently? Let’s take another step.
Tags are messier than categories; I called tags evil. But tags are easier to manage than the next level down, the words themselves. Tags are messy when left to humans, but tags can be managed with automation. Many services auto-suggest tags, controlling the vocabulary. Lila will generate its own tags, refreshing them on demand. Tags can be managed.
Words are the true pit of chaos. People conform to the rules of language when they write, or they don’t. People make up words on the fly. Down the rabbit hole. But is it so bad? It happens time and again that we think an information problem is too complex to be automated, only to analyze it and discover that we can do a good chunk of what we hoped by following a relatively simple set of rules. One mature technology is keyword search. Keyword search is so effective we take it for granted. Words can be managed with the right technologies.
Another mature technology is Natural Language Processing (NLP). Its history dates back to the 1950s. The field is enjoying a resurgence of interest in the context of cognitive computing. Consider that a person can learn basic capability in a second language with only a couple of thousand words and some syntax for combining them. Words and syntax. Data and rules. Build dictionaries with words and their variant forms. Assign parts of speech. Use pattern recognition to pick out words occurring together. Run through many examples to develop context sensitivity. Shakespeare it is not, but human meaning can be extracted from unstructured text in this way for many useful purposes.
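The dictionary, parts-of-speech, and co-occurrence steps listed above can be illustrated with a toy Python sketch. The `LEXICON` of variant forms is a small hypothetical hand-built example, and the “pattern” is just adjacent (adjective or noun) + noun pairs counted across many examples; real NLP toolkits use far larger lexicons and statistical taggers. This is not Lila’s implementation.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: variant form -> (base form, part of speech).
LEXICON: Dict[str, Tuple[str, str]] = {
    "library": ("library", "NOUN"), "libraries": ("library", "NOUN"),
    "catalog": ("catalog", "NOUN"), "catalogs": ("catalog", "NOUN"),
    "digital": ("digital", "ADJ"),
    "build": ("build", "VERB"), "builds": ("build", "VERB"), "built": ("build", "VERB"),
}


def analyze(text: str) -> List[Tuple[str, str]]:
    """Tokenize, then map each word to (base form, POS); unknown words get 'UNK'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEXICON.get(token, (token, "UNK")) for token in tokens]


def cooccurring_pairs(texts: List[str]) -> Counter:
    """Count adjacent (ADJ|NOUN, NOUN) base-form pairs across many examples."""
    counts: Counter = Counter()
    for text in texts:
        tagged = analyze(text)
        for (w1, p1), (w2, p2) in zip(tagged, tagged[1:]):
            if p1 in ("ADJ", "NOUN") and p2 == "NOUN":
                counts[(w1, w2)] += 1
    return counts


examples = [
    "Digital libraries build catalogs.",
    "The digital library built a catalog.",
]
print(cooccurring_pairs(examples).most_common(1))  # [(('digital', 'library'), 2)]
```

Mapping “libraries” and “library” to one base form is what lets the two differently worded examples reinforce the same pair.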
Lila’s purpose is to make connections between passages of text (“slips”) and to suggest hierarchical views, e.g., a table of contents. I’ve talked a lot about how Lila can compute connections. Keywords and NLP can be used effectively to find common subjects across passages. Hierarchy is something different. How can the words in a passage say something about how it should be ordered relative to other external passages? We can go no deeper than the words. It’s all we have to work with. To compute hierarchy, Lila needs something different, something special. Stay tuned.
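To illustrate how keywords alone can connect passages, here is a minimal Python sketch that links slips whose keyword sets overlap, scored by Jaccard similarity (shared keywords divided by all distinct keywords in the pair). The stopword list, the 0.2 threshold, and the function names are assumptions for illustration, not Lila’s actual method. Note that it yields only flat, pairwise connections, which is exactly why hierarchy calls for something different.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on", "for", "with"}


def keywords(passage: str) -> Set[str]:
    """Lowercased set of words in the passage, with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", passage.lower()) if w not in STOPWORDS}


def connect_slips(slips: Dict[str, str], threshold: float = 0.2) -> List[Tuple[str, str, float]]:
    """Return (slip, slip, score) links whose Jaccard similarity meets the threshold."""
    kw = {slip_id: keywords(text) for slip_id, text in slips.items()}
    links = []
    for a, b in combinations(sorted(kw), 2):
        union = kw[a] | kw[b]
        if not union:
            continue  # two empty slips share nothing meaningful
        score = len(kw[a] & kw[b]) / len(union)
        if score >= threshold:
            links.append((a, b, round(score, 2)))
    return sorted(links, key=lambda link: -link[2])  # strongest links first


slips = {
    "s1": "Keyword search finds passages quickly.",
    "s2": "Passages share keyword search terms.",
    "s3": "Shakespeare wrote plays.",
}
print(connect_slips(slips))  # [('s1', 's2', 0.43)]
```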
The DPLAFest schedule is packed with interesting sessions: everything from ebooks to project management, digitization, and education has a space in the lineup. A set of those programs is related to the DPLA Hubs and the work that they do. In addition to showcasing the incredible work being done by institutions that are part of the DPLA Hub network, it’s a great way for attendees from aspiring Hubs to find out more about the application process. Here are some Hub Highlights from this year’s fest:
Best Practices for Establishing a DPLA Service Hub in Your State/Region: Gear up for the 2015 Content Hub or Service Hub open calls by learning more about what it takes to be a DPLA network Hub. Session speakers include DPLA’s Director for Content Emily Gore, Assistant Director for Content Amy Rudersdorf, and Data Services Coordinator Gretchen Gueguen.
Best Practices for Digitization Training: Want to know more about designing a digitization training program curriculum? Join this session led by two members of the DPLA Hub Network and participants in the Public Library Partnerships Project.
Newspapers and the DPLA: Extra, extra! Learn all about it in this session exploring the potential to integrate newspaper content into the DPLA. Speakers include Head of the Digital Scholarship Center, University of Oregon Libraries, Karen Estlund, and DPLA’s Emily Gore. We’d love to hear your ideas and expertise as we explore this new opportunity.
DPLA Hubs Showcase: The Hubs Showcase will combine a lot of learning with a little bit of fun as we give nine speakers five minutes each to talk about the innovative and unique work they’re doing at their institutions across the US. Topics will range from geospatial metadata best practices in the Mountain West to Fedora/Blacklight systems aggregating South Carolina content, image sharing via the International Image Interoperability Framework (IIIF), access to digitized newspapers in North Carolina, and much more. Set your timer for an information-packed hour.
The post Speak at Revolution 2015: Call for Papers open through May 8 appeared first on Lucidworks.
In the month since the Federal Communications Commission (FCC) voted to approve an Order to Protect and Promote the Open Internet, the American Library Association (ALA) and its allies have been wading into the details published by the Commission March 12 to better understand the implications for libraries and higher education. It’s also been a busy time with Congressional hearings and the first lawsuits filed, so an update is in order.
As coalition partner EDUCAUSE notes in a recent blog, the new Order references our coalition’s ideas and proposals nearly 20 times, indicating the impact and value of our engagement in this far-reaching proceeding.
First, and most importantly, the new “bright line” rules, which protect against internet service providers (ISPs) blocking, degrading or prioritizing legal internet traffic, align with the key concerns of the library and higher education coalition, as we wrote earlier. These rules apply to both mobile and fixed broadband, which our coalition has consistently advocated for since 2009. The Commission also strengthened transparency requirements to include the duty to disclose prices and fees, as well as network performance and practices.
The Commission also laid out a standard for future conduct to address concerns that may arise with new technologies and practices. The Order establishes that ISPs cannot “unreasonably interfere with or unreasonably disadvantage” the ability of consumers to select the online content and services they want and the ability of content providers to reach these consumers. Here again, the FCC cites the higher education and library comments proposing an “internet reasonable” standard that would protect the unique and open character of the Internet.
Our coalition also sought to ensure libraries and educational institutions are again explicitly included in network neutrality protections and to differentiate between public broadband internet access and private networks. The Commission explicitly affirmed both of these points.
Finally, ALA raised concerns about how forbearance from some Title II regulations might impact Universal Service Fund (USF) programs like the E-rate program. We are pleased the Commission recognized that “Even prior to the classification of broadband Internet access service adopted here, the Commission already supported broadband services to schools, libraries, and health care providers and supported broadband-capable networks in high-cost areas.” As a result, the FCC determined that broadband Internet access services would be subject to Section 254 USF protections, but not immediately required to provide USF contributions. It is expected that the contribution question will be addressed in a separate FCC proceeding. The Order also preempts any state from imposing any new state USF contributions on broadband at this time.
Congress has been busy on the network neutrality front, as well. Over the last two weeks Congress has held a series of hearings (including directly questioning FCC Chairman Tom Wheeler and fellow FCC Commissioners) that concluded in the House Judiciary Committee last week. While these hearings were contentious, at times even among the commissioners themselves, it’s fair to say that no major shift in policy by the commissioners was detected, nor were minds changed in Congress.
ALA joined (pdf) 137 other groups and companies in a letter thanking Chairman Wheeler, Commissioner Mignon Clyburn and Commissioner Jessica Rosenworcel for their leadership in protecting the Open Internet. It was read into the record at the March 17 House Oversight hearing by Congresswoman Eleanor Holmes Norton (D-DC).
Senate Commerce Committee Chairman John Thune (R-SD) and House Energy and Commerce Committee Chairman Fred Upton (R-MI) introduced legislation earlier this year that would codify some network neutrality protections while also undermining FCC authority to regulate broadband internet access. Together these two elements leave a gulf between Democrats and Republicans in finding a legislative solution that would truly protect network neutrality and potentially forestall years of litigation. A bipartisan (and non-binding) budget amendment expressing support for network neutrality rules did manage to clear the Senate last Friday, perhaps signaling room for compromise. Republicans are apparently targeting Senator Bill Nelson (D-FL) for a compromise deal; Nelson has indicated openness to discussions but is unwilling to accept language that would usurp FCC authority and oversight powers.
As of March 31st, the FCC Order had not yet been published in the Federal Register, which begins the true “shot clock” for Congressional and legal action on the Order. Trade association US Telecom (which includes AT&T, Verizon, Frontier and CenturyLink among its members) and Alamo Broadband couldn’t wait that long—filing suit in two different appeals courts on March 23. Using language all of us are likely to hear a lot more about, US Telecom argues (pdf) the FCC decision is “arbitrary, capricious, and an abuse of discretion.” FCC lawyers have argued the suits were filed prematurely and should be rejected.
Expect more drama but likely little immediate resolution. Carrying our message to Capitol Hill about why network neutrality is so vital for libraries and all of our users will be essential. I hope many of you will join with us on National Library Legislative Day to amplify our voice and impact.
SCOOP! ALA Washington Office Encourages Inter-Chapter Fight
Washington, DC – the ALA Washington Office has confirmed that it’s actively pitting state chapter against state chapter in the run up to National Library Legislative Day 2015 (NLLD). In the latest in a record-shattering string of 4,268 reminders about the event, held on May 4 – 5 in the nation’s capital, the Office of Government Relations publicly admitted today that they are actively encouraging competition between the state chapters. “It’s true. We want every state library chapter in the nation to fight tooth and nail . . . to bring more people to NLLD per capita than anybody else, that is, and we plan to reward the winner handsomely,” says OGR Director Adam Eisgrau.
Not-so-secret memoranda leaked to The Pest confirm that all members of the largest state delegation registered for and attending NLLD (as a percentage of the state’s total population) will receive a guided, behind-the-scenes tour of the incredible United States Capitol building. “We need heroic turnout from every state this year,” added Eisgrau. “It’s time to take the registration gloves off!”
For more information or assistance of any kind, please contact Lisa Lindle, ALA Washington’s Grassroots Communications Specialist, at email@example.com or 202-403-8222.
The post The Washington Pest: National Library Legislative Day news appeared first on District Dispatch.
In December 1891, Dr. James Naismith rigged up an elevated peach basket in a Springfield, Massachusetts gymnasium – and the game of basketball was born. Now, nearly 125 years later, we celebrate Dr. Naismith’s innovation with the third round of the OCLC Research Collective Collections Tournament. We are down to four conferences … did yours make the cut?
Competition in the Round of 8 goes to the very roots of basketball: which conference collective collection has the most materials published in the birthplace of basketball, Springfield, Massachusetts?* The Atlantic 10 came through as the big winner, with more than 2,300 publications originating from Springfield. Conference USA was close behind with more than 1,900 publications, winning handily over our erstwhile tournament Cinderella, Big South. After bracket-busting victories over the mighty America East and Big Ten conferences in earlier rounds, Big South could not get past Conference USA in the Round of 8. So the clock has struck midnight, and Big South’s fairytale tournament run has turned back into a pumpkin. Summit League and Missouri Valley join the Atlantic 10 and Conference USA in the next round.
Springfield, Massachusetts boasts much more than the honor of being the birthplace of basketball. Theodor Geisel, better known as Dr. Seuss, was born there (visit the Dr. Seuss Collection at the University of California, San Diego’s Geisel Library). Merriam-Webster, Inc., publisher of the eponymous dictionary, is headquartered in Springfield (visit the Warren N. and Suzanne B. Cordell Collection of Dictionaries at Indiana State University’s Cunningham Memorial Library, to which Merriam-Webster donated 500 volumes). And the Milton Bradley Company, maker of such classic games as Chutes & Ladders, Twister, and Candyland, was founded in Springfield in 1860 (visit the George M. Fox collection of rare children’s books at the San Francisco Public Library, which began as the archives of a publishing company acquired by Milton Bradley in 1920).
Needless to say, the works of Dr. Seuss, Merriam-Webster, and yes, even Milton Bradley, are part of the conference collective collections competing in this tournament!
Bracket competition participants: Remember, if the conference you chose is now watching the tournament from the sidelines, there is still a ray of hope! If no one picked the tournament Champion, all entrants will be part of a random drawing for the big prize!
The tournament semi-finals are next! Results will be posted April 3.
*Number of publications in conference collective collections that were published in Springfield, Massachusetts. Data is current as of January 2015.
More information: Brian Lavoie
Brian Lavoie is a Research Scientist in OCLC Research. Brian's research interests include collective collections, the system-wide organization of library resources, and digital preservation.
Version 8.0.0 of the Hydra gem has been released!
To use hydra 8.0.0 add the following line to your Gemfile:
gem 'hydra', '8.0.0'
Then run bundle install. To test things out, try the Dive into Hydra tutorial archived in the /doc directory inside the gem, which is also on GitHub: https://github.com/projecthydra/hydra/tree/8-0-stable/doc
This release is the first in the 8.x line, which will be the last to support Fedora 3. Hydra 9.0.0, which supports Fedora 4, was released previously.
For full details, please see the individual dependency release notes. Many thanks to all involved!
We are delighted to announce that Sufia 6.0.0 has been released. It’s been quite an undertaking:
* over 451 commits
* 402 files changed
* 17 contributors from 6 institutions, across the U.S.A. and Canada
Some of the new features include:
* Fedora 4 support
* Some optimizations for large files including using ActiveFedora’s streaming API
* New metadata editing forms (inspired by Worthwhile and Curate)
* Store old featured researchers in database
* Support for the latest Hydra components, Blacklight, Rails, and Ruby versions
* Easier overriding of Sufia’s controllers
* Dozens and dozens of bugs squashed, and lots of UI/UX tweaks
* Lots of work on the README
You can read all about it in the release notes, including instructions for upgrading:
We’ve also created a sample application that outlines the steps of upgrading to version 6 and migrating the data from Fedora 3 to Fedora 4:
Penn State has tested this code against their Scholarsphere application where they will be using it to launch their new Fedora 4-based version and migrate its data in April. You will notice that Sufia 5.0, the final Fedora 3-based Sufia version, is still in release candidate status. We hope to have this rectified soon.
Many thanks to all the developers who have made this release possible, and to the leadership at each of their institutions for recognizing the value of this project.
This is the fourth in a series of posts related to metadata edit events collected by the UNT Libraries from its digital library system from January 1, 2014 until December 31, 2014. The previous posts covered when, who, and what.
This post will start the discussion of the “how long,” or duration, of the events in the dataset.
Libraries, archives, and museums have long discussed the cost of metadata creation and improvement projects. Depending on the size and complexity of a project and the experience of the metadata creators, the costs associated with metadata generation, manipulation, and improvement can vary drastically.
The amount of time that a person takes to create or edit a specific metadata record is often used in the calculations of what projects will cost to complete. At the UNT Libraries we have used $3.00 per descriptive record as our metadata cost for projects, and based on the level of metadata created, the workflows used, and the system we’ve developed for metadata creation, this number seems to do a good job of covering our metadata creation costs. It will be interesting to get a sense of how much time was spent editing metadata records over the past year and to plot that against collections, types, formats, and partners. This will involve a bit of investigation of the dataset before we get to those numbers, though.
Here is a quick warning about the rest of the post: I’m stepping into deeper water with the analysis I’m going to do on our 94,222 edit events. From what I can tell from my research, there are many ways to go about some of this, and I’m not at all claiming that I have the best or even a good approach. But it has been fun so far.
Duration
The reason we wanted to capture event data when we created our Metadata Edit Event Service was to get a better idea of how much time our users were spending on the task of editing metadata records.
This is accomplished by logging an entry with a timestamp, identifier, and username when a record is opened. When the record is published back into the system, the original log time is subtracted from the publish time, which gives the number of seconds taken for the metadata event. (As a side note, this is also the basis for our record-locking mechanism, so that two users don’t try to edit the same record at the same time.)
There are, of course, a number of issues with this model that we noticed. First, what if a user opens a record, forgets about it, goes to lunch, and then comes back and publishes the record? What happens if they open a record and then close it: what happens to that previous log event, and is it used the next time? What happens if a user opens multiple records at once in different tabs? If they aren’t using the other tabs immediately, they are adding time without really “editing” the records. And what if a user makes use of a browser automation tool like Selenium; won’t that skew the data?
The answer to many of these questions is “yep, that happens,” and how we deal with them in the data is something I’m still trying to figure out. I’ll walk you through what I’m doing so far to see if it makes sense.
Looking at the Data
Hours
As a reminder, there are 94,222 edit events in the dataset. The first thing I wanted to look at is how they group into buckets based on hours. I took the durations (in seconds) and floor-divided them by 3600, so I should get buckets of 0, 1, 2, 3, 4, and so on.
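The duration calculation described above can be sketched in a few lines. This is a minimal illustration, not the actual code of the Metadata Edit Event Service: the ISO 8601 timestamp format, the `(identifier, username)` keying of the open log, the function name `edit_duration_seconds`, and the idea of removing the open entry on publish are all assumptions made for the example.

```python
from datetime import datetime


def edit_duration_seconds(opened_at: str, published_at: str) -> int:
    """Seconds between a record being opened and published (ISO 8601 timestamps assumed)."""
    opened = datetime.fromisoformat(opened_at)
    published = datetime.fromisoformat(published_at)
    return int((published - opened).total_seconds())


# Hypothetical open log keyed by (identifier, username); not the real schema.
open_log = {("record-001", "editor1"): "2014-03-05T10:15:00"}

# Hypothetical publish event for the same record and user.
record_id, user, published_at = ("record-001", "editor1", "2014-03-05T10:19:30")

# Consume the open entry when the record is published (assumed behaviour,
# loosely mirroring how the open entry doubles as a record lock).
opened_at = open_log.pop((record_id, user))
print(edit_duration_seconds(opened_at, published_at))  # 270
```

Note that a plain subtraction like this inherits every issue listed next: a record left open over lunch simply produces a very long duration.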
Below is a table of these values.

Hours   Event Count
0       93,378
1       592
2       124
3       41
4       20
5       5
6       8
7       7
8       1
9       4
10      6
11      2
12      1
14      3
16      5
17      3
18      2
19      1
20      1
21      2
22      2
23      2
24      3
25      1
26      1
29      1
32      2
37      1
40      2
119     1
And then a pretty graph of that same data.
What is very obvious from this table and graph is that the vast majority of the edit events, 93,378 (99%), took under one hour to finish. We already see some outliers on the top end of the event duration list, including one event at 119 hours (almost an entire work week... that’s one tough record).
While I’m not going to get into it in this post, it would be interesting to see if there are any patterns in the 844 edit events that took longer than an hour to complete. What percentage of a given user’s records took over an hour? Do they come from similar collections, types, formats, or partners? Something for later, I guess.
Minutes
Next I wanted to look at the edit events that took less than an hour to complete and see where they sit when I put them into 60-second buckets. Filtering out the events that took an hour or more leaves me with 93,378 events. Below is the graph of these edit events.
You can see a dramatic curve, with the number of edit events falling off quickly as the number of minutes goes up.
I was interested to see where the 80/20 split for this dataset would be, and it appears to be right at about six minutes: there are 75,981 (81%) events lasting 0-6 minutes and 17,397 (19%) events lasting 7-60 minutes.
Seconds
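The hour bucketing described above (floor division of each duration by 3600) can be sketched with Python’s `collections.Counter`. The durations below are made-up values chosen to exercise a few buckets, not records from the UNT dataset, and `hour_buckets` is a hypothetical helper name.

```python
from collections import Counter


def hour_buckets(durations_seconds):
    """Count edit events per whole hour using floor division by 3600."""
    return Counter(d // 3600 for d in durations_seconds)


# Hypothetical durations in seconds (not the real dataset).
durations = [45, 300, 3599, 3600, 7250, 428400]

buckets = hour_buckets(durations)
for hour in sorted(buckets):
    print(hour, buckets[hour])

# Events in bucket 1 or higher took at least an hour.
over_an_hour = sum(count for hour, count in buckets.items() if hour >= 1)
print("events over an hour:", over_an_hour)  # events over an hour: 3
```

Because floor division truncates, 3599 seconds still lands in bucket 0 while 3600 seconds starts bucket 1, which is why bucket 0 reads as “under one hour.”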
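One way to locate an 80/20 split like the one above is to bucket the sub-hour events by minute and walk the cumulative share until it reaches 80%. This is a hedged sketch: the helper names are hypothetical, the sample durations are invented, and “smallest minute whose cumulative share is at least 80%” is just one reasonable definition of the split.

```python
from collections import Counter


def minute_buckets(durations_seconds):
    """Bucket events shorter than an hour into whole minutes (floor division by 60)."""
    return Counter(d // 60 for d in durations_seconds if d < 3600)


def eighty_percent_minute(buckets, share=0.8):
    """Smallest minute m such that buckets 0..m hold at least `share` of all events."""
    total = sum(buckets.values())
    if total == 0:
        return None
    running = 0
    for minute in range(60):
        running += buckets.get(minute, 0)
        if running / total >= share:
            return minute
    return None  # only reachable if share > 1


# Hypothetical durations in seconds; 5000 is over an hour and gets filtered out.
durations = [30, 90, 150, 200, 250, 300, 330, 400, 1200, 5000]
buckets = minute_buckets(durations)
print(eighty_percent_minute(buckets))  # 6
```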
Diving into the dataset one more time, I wanted to look at the 35,935 events that happened in less than a minute. Editing a record in under a minute can happen in a few different ways. First, you could be editing a simple field, like changing a language code or a resource type. Second, you could just be “looking” at a record and, instead of closing it, hit “publish” again. You might also be switching a record from the hidden state to the unhidden state (or vice versa). Finally, you might be using a browser automation tool to automate your edits. Let’s see if we can spot any of these actions when we look at the data.
Just by looking at the data above, it is hard to say which of these kinds of events map to which parts of the curve. I think that when we start to look at individual users and collections, some of this will make a little more sense.
That wraps up this post. In the next post I’m hoping to define the cutoff that separates “outliers” from the data we want to use to calculate average times for metadata creation, and then see how that looks for the various users in our system.
As always, feel free to contact me via Twitter if you have questions or comments.
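For eyeballing the sub-minute events, a quick per-second text histogram can stand in for a plotting library. This is a minimal sketch with invented durations; the function name and the 40-character bar width are arbitrary choices for illustration, and only seconds that actually occur get a row.

```python
from collections import Counter


def seconds_histogram(durations_seconds, width=40):
    """Text histogram rows for events shorter than 60 seconds, one row per observed second."""
    counts = Counter(d for d in durations_seconds if d < 60)
    if not counts:
        return []
    peak = max(counts.values())
    lines = []
    for second in sorted(counts):
        n = counts[second]
        bar = "#" * round(width * n / peak)  # scale bars relative to the busiest second
        lines.append(f"{second:2d}s {n:6d} {bar}")
    return lines


# Hypothetical durations in seconds; 75 is over a minute and is skipped.
durations = [3, 3, 3, 4, 10, 59, 75]
lines = seconds_histogram(durations)
print("\n".join(lines))
```

A spike of near-zero durations in a view like this would be consistent with “open and publish again” or automated edits, but on its own it cannot tell those cases apart.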
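The outlier cutoff is left for the next post. Purely as an illustration of one common option, and not necessarily the approach the next post will take, Tukey’s fences flag any duration above Q3 + 1.5 × IQR. The sketch uses the standard library’s `statistics.quantiles` (Python 3.8+, default “exclusive” method); the sample durations are invented.

```python
import statistics


def tukey_upper_fence(durations_seconds, k=1.5):
    """Upper Tukey fence: Q3 + k * (Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(durations_seconds, n=4)
    return q3 + k * (q3 - q1)


# Hypothetical durations in seconds, with one very long edit event.
durations = [10, 20, 30, 40, 50, 60, 1000]
upper = tukey_upper_fence(durations)
outliers = [d for d in durations if d > upper]
print(upper, outliers)  # 120.0 [1000]
```

On heavily right-skewed data like these edit durations, a fence computed this way may be quite tight, so any cutoff would still need a sanity check against what plausibly counts as “editing.”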
When my brain was completely full on Thursday at the ACRL Conference, Jad Abumrad’s keynote felt like a spa for my brain. For those who don’t know, he is the co-host of Radiolab, a very cool and innovative show on NPR, and the recipient of one of those fancy schmancy MacArthur genius grants. Good call, ACRL planning committee! His keynote was brilliant, and it came at a time when I’ve been reflecting on where I am in my career now that I feel like I’m not in survival mode anymore.
For those who missed Jad’s talk, here’s another one he gave two years ago that covered some similar territory:
I have such admiration for people who are confident. People who are poised. People who are strong advocates for themselves. People who are quick thinkers. People who are energized, not anxious, when in a crowd of people. People who can be politic. People who are brave. I have a lot of friends I wish I was more like. But I’ve also learned over the years that many of the people I thought were all those things were actually just as big a ball of neuroses as I am. That a lot of people I thought were so confident were actually overcompensating for major insecurities.
People you admire are probably more than meets the eye too.
There are people who say they admire me. I’ve always been uncomfortable with that because I don’t think I deserve it. I’m also uncomfortable because I worry that it creates this false expert vs. novice dichotomy that might make them think they can’t achieve what I have. Anyone can do what I’ve done.
I know a lot of people who are afraid to take risks in their work and/or are in difficult work situations that are killing their passion for their work. In the interest of encouraging other people who are struggling, and inspired by Jad’s talk (though I’m not nearly as eloquent), I’m going to share a bit here.
I am a big ball of self-doubt.
Have you let doubt keep you from trying something or pursuing an idea? Well, screw that! I have never felt certain about anything I’ve done while I was doing it. The entire time we were working on Library DIY, there was a constant voice in the back of my mind telling me “this is crap. There’s a reason no one has done something like this and that’s because it makes no sense.” I’ve cringed when hitting publish on the vast majority of blog posts I’ve written because most of the time I think my ideas are stupid.
Jad talked about how “gut churn” is an essential part of the creative process. That feeling of anxiety and doubt and panic when you’re trying to do something really creative and different is very normal and very necessary. I’ve always believed that talented, accomplished, and creative people feel really certain about their projects and path (à la Steve Jobs), but it was a relief to know that I’m not alone in feeling the “gut churn.”
So many of us are stopped in our tracks by fears that our ideas are not innovative or even good. Sometimes we’re right and sometimes we’re wrong. I’ve had projects fail and I’ve had projects succeed beyond my wildest dreams, but I’m always glad I went for it because I learned from every one of them.
I’m starting to realize that “gut churn” is better than certainty, because it leaves you more open to making changes and improvements based on what you hear from others (colleagues, patrons, etc.). The more stuck you get on the perfect rightness of your original vision, the less likely you’ll be to accept feedback and make improvements. I’ve learned to develop some amount of detachment from my projects, so that when my work is criticized, it doesn’t feel like a criticism of me. Becoming defensive isn’t productive, and I regret times when I was defensive about stuff in the past.
I’m more of a beginner now than I was before.
One of my favorite former colleagues sent me an article entitled “The importance of stupidity in scientific research.” What I initially thought was a joke actually was a fantastic editorial about seeking out opportunities to “feel stupid,” where you can’t easily find an answer and have to struggle, learn, and make your own discoveries.
Productive stupidity means being ignorant by choice. Focusing on important questions puts us in the awkward position of being ignorant. One of the beautiful things about science is that it allows us to bumble along, getting it wrong time after time, and feel perfectly fine as long as we learn something each time. No doubt, this can be difficult for students who are accustomed to getting the answers right. … The more comfortable we become with being stupid, the deeper we will wade into the unknown and the more likely we are to make big discoveries.
I was a high achiever in high school and, at the elite college I attended where I felt perpetually out of my depth, I avoided taking classes that scared and challenged me. What a waste. I’ve come to love the anxiety of doing something new that I’m not necessarily a natural at. Public speaking was something that used to terrify me, but over time, I became increasingly comfortable and found my voice as a speaker. Moving from a university to a community college put me back into the beginner role, and I’ve grown so much as an instructor over the past few months because of it. Feeling ignorant (as I did in my first term here) is not a comfortable thing, but it makes you struggle more and learn more to get beyond that beginner state.
I don’t consider myself an expert at anything. There are some things I’m better at than others, but in my teaching, my writing, my speaking, and everything else I do professionally, I am a work in progress; a perpetual beginner. Having that attitude leaves us open to learning and growth.
Haters gonna hate, but don’t let them define you.
I’m one of those people who just wants to be liked. I’m a people pleaser. I remember in my sophomore year of college, I lived in a house where most of my housemates were always fighting with each other. My buddy Dan Young and I were like Switzerland: everyone bitched to us about other people, and we just tried to stay neutral and sympathetic.
I’ve always gotten along with people in the workplace, so when I had what I can only describe as a “nemesis” in one of my jobs, I had no idea how to handle it. This was someone who had been up for the management job I got. I tried to connect with her and be friendly, but she did everything in her power to undercut me in meetings and make me look bad to our superiors and colleagues. I constantly heard from colleagues about her saying bad things about me behind my back, as if I was some kind of horrible person, which made me wonder if I was. I hate that I let her get to me so much. But when she started alienating other people at work, I realized it wasn’t all about me.
The good thing that came out of this experience is that I’m now more ok with not being liked, especially when I’m pretty sure there was nothing I did to deserve it. Sometimes it’s not really about you, but about a situation or the fragile ego of the other person. Sometimes you’re walking into a context that dooms you from the start. It’s always worth starting from a place where you examine your own behavior to see if you somehow caused the problem, but you shouldn’t hang your whole sense of self-worth on whether or not your colleagues adore you.
Even my painful experiences have led to valuable learning.
I spent a big part of the past four years feeling like a failure. Every time I started to feel good about the work I was doing, something or someone came and smacked me down. Still, I’ve learned so much about myself and how to handle difficult work and political situations because of the experiences I had.
In the talk I shared above, Jad talks about reframing awful things that happen; using them as an arrow to point you toward the solution. That’s what led me to my current job, one that was not at all what I’d envisioned as my future when I was at Norwich five years ago. Yet it fits me like a glove. When I was feeling horrible about work, I thought a lot about what the right job would look like. And it looked quite a bit like what I’m doing now. Pain has a way of sharpening your focus and showing you the right path.
I deserve good things. So do you.
I’m not perfect. I’ve made mistakes and I’m sure I’ll make more in the future. I’m a work in progress, but I’m always striving to be better. I want to be a supportive colleague and be good at my job. I want to be a good wife and mother. I want to feel like I’m contributing to the profession beyond my library in useful ways. I’m working on getting used to the happiness I feel now that I’m in a job I love. I’m trying to be nicer to myself. I’m trying to feel like I deserve these good things that are happening for me.
We all deserve good things. We are all works in progress. Don’t let your own doubts or the stories you’ve got in your head (or that people tell you) about what you can and can’t do prevent you from taking risks and growing. Try. If the worst thing to fear is failure (and you will learn from it either way), it doesn’t seem like such a huge risk to take.
Image credit: Gut churn, by Dreadful Daily Doodles
Boston, MA: The Sixth Annual VIVO Conference will be held August 12-14, 2015 at the Hyatt Regency Cambridge, overlooking Boston. The VIVO Conference creates a unique opportunity for people from across the country and around the world to come together to explore ways to use semantic technologies and linked open data to promote scholarly collaboration and research discovery.