John Miedema: Lila “tears down” old categories and suggests new ways of looking at content. Word concreteness is a good candidate.
Many of the good things we love about language are essentially hierarchical. Narrative is linear: a beginning, middle, and end. Order shapes the story. Hierarchy gives a bird’s eye view, a table of contents, a summary that allows a reader to consider a work as a whole.
Lila will compute hierarchy by comparing passages on word qualities that suggest order. Concreteness is considered a good candidate: passages with more abstract words express ideas and concepts, whereas passages with more concrete words express examples. Of the views that Lila can suggest, it is useful to have one that presents abstract concepts first and concrete examples second. I have listed four candidate qualities here, but in the posts that follow I will focus on concreteness.

1. Abstract: intangible qualities, ideas and concepts. Different from frequency of word usage; both academic terms and colorful prose can have low word frequency. Examples: freedom (227*), justice (307), love (311). Versus Concrete: tangible examples, illustrations and sensory experience. Examples: grasshopper (660*), tomato (662), milk (670).
2. General: categories and groupings. Similar to 1, but 1 is more dichotomous and this one is more of a range. Example: furniture. Versus Specific: particular instances. Example: La-Z-Boy rocker-recliner.
3. Logical: analytical thinking, understatement and fact. Note the conflict with 1 and 2: facts are both logical and concrete. Example: "The fastest land-dwelling creature is the cheetah." Versus Emotional/Sentimental: feeling, emphasis, opinion. Can take advantage of the vast number of sentiment measures available. Example: "The ugliest sea creature is the manatee."
4. Static: constancy and passivity. Example: "It was earlier demonstrated that heart attacks can be caused by high stress." Versus Dynamic: change and activity; energy. Example: "Researchers earlier showed that high stress can cause heart attacks."
* Concreteness index. MRC Psycholinguistic database. Grasshopper is a more concrete word than freedom. Indexes like the MRC can be used to compute concreteness for passages.
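A minimal sketch of how such an index could score passages (the toy CONCRETENESS dictionary below simply reuses the example scores from the table and is not part of Lila itself; a real implementation would load the full MRC database):

```python
# Toy concreteness index reusing the MRC-style example scores above;
# a real implementation would load the full MRC database.
CONCRETENESS = {
    "freedom": 227, "justice": 307, "love": 311,
    "grasshopper": 660, "tomato": 662, "milk": 670,
}

def passage_concreteness(text):
    """Mean concreteness of the indexed words in a passage.

    Words missing from the index are skipped; returns None if the
    index covers no word in the passage.
    """
    scores = [CONCRETENESS[w] for w in text.lower().split() if w in CONCRETENESS]
    return sum(scores) / len(scores) if scores else None

abstract = passage_concreteness("freedom and justice and love")
concrete = passage_concreteness("grasshopper tomato milk")
# The passage with the higher score is the more concrete one, so it
# would sit below the abstract passage in a suggested hierarchy.
```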
Lila can compute hierarchy for passages, and for groups of passages. Together, it builds a hierarchy, a view of how the content can be organized. Think of what this offers a writer. A writer stuck in his or her manually produced categories and view can ask Lila for alternate views. Lila “tears down” the old categories and suggests a new way of looking at the content. It is unlikely that the writer will stick exactly to Lila’s view, but it could provide a fresh start or give new insight. And Lila can compute new views dynamically, on demand, as the content changes.
In my last post, I discussed effort estimation and scheduling, which leads into the beginning of actual development. But first, you need to decide how you’re going to track progress. Here are some commonly used methods:
The Big Board
In keeping with Agile philosophy, you should choose the simplest tool that gives you the functionality you need. If your team does all of its development work in the same physical space, you could get by with post-it notes on a big white board. There’s a lot to be said for a tangible object: it communicates the independent nature of each task or story in a way that software may not. It provides the team with a ready-made meeting point: if you want to see how the project is going, you have to go stand in front of the big board. A board can also help to keep projects lean and simple, because there’s only so much available space on it. There are no multiple screens or pages to hide complexity.
Sticky notes, however, are ephemeral in nature. You can lose your entire project plan to an overzealous janitor; more importantly, unless you periodically take pictures of your board, there’s no way to trace user story evolution. Personally, I like to use this method in the initial stages of planning; the board is a very useful anchor for user story definition and prioritization. Once we move into the development process, I find that moving into the virtual realm adds crucial flexibility and tracking functionality.
If the scope of the project is limited, it may be possible to track it using a basic office productivity suite like MS Office. MS Excel and similar spreadsheet tools are fairly easy to use, and they’re ubiquitous, which means your team will likely face a lower learning curve. Remember that in Agile the business side of the organization is an integral part of the development effort, and it may not make sense to spend time and effort to train sales and management staff on a complex tracking tool.
If you choose to go the spreadsheet route, however, you are giving up some functionality: it’s easy enough to create and maintain spreadsheets that give you project snapshots and track current progress, but this type of software is not designed to accurately measure long term progress and productivity, which helps you upgrade your processes and increase your team’s efficiency. There are ways to track Agile metrics using Excel, but if you find that you need to do that you may just want to switch to dedicated software anyway.
There are several tracking tools out there that can help manage Agile projects, although my personal experience so far has been limited to JIRA and its companion GreenHopper. JIRA is a fairly simple issue-tracking tool: you can create issues (manually or directly from a reporting form), add a description, estimate effort, prioritize, and assign each one to a team member for completion. You can also track an issue through the various stages of development, adding comments at each step of the way and preserving meaningful conversations about its progress and evolution. As you can see in this article comparing similar tools, JIRA’s main advantage is the lack of unnecessary UI complexity, which makes it easier to master. Its main shortcoming is the lack of sprint management functionality, which is what GreenHopper provides. With the add-on, users can create sprints, assign tickets to them, and track sprint progress.
Can all of this functionality be replicated using spreadsheets? Yes, although maintenance and authentication can become problematic as the complexity of the project increases. At some point a tool like JIRA starts to pay for itself in terms of increased efficiency, and most if not all of these products are web-based and offer some sort of free trial or small-enterprise pricing. My advice is to first analyze your operations to determine whether you need to go the tracking-tool route, and then do some basic research to identify popular options and their pros and cons. Once you’ve identified one or two options that seem to fit your needs, give them a try to see if they’re what you’re looking for.
Again, which method you go with will depend on how much effort you will need to spend up front (in training and adapting new software) versus later on (added maintenance and decreased efficiency).
How do you track user story progress? What are the big advantages/disadvantages of your chosen method? JIRA in particular seems to elicit strong feelings in users, positive or negative; what are your thoughts on it?
DuraSpace News: OR2015 Conference Stands Behind Commitment to Ensure All Participants are Treated With Respect
Indianapolis, IN. The Open Repositories 2015 conference will take place June 8-11 in Indianapolis and is wholly committed to creating an open and inclusive conference environment. As expressed in its Code of Conduct, OR is dedicated to providing a welcoming and positive experience for everyone and to having an environment in which all colleagues are treated with dignity and respect.
Friday, June 26, 2015, 8:30am – 4:00pm
In this hackathon attendees will learn to use the Bootstrap front-end framework and the Git version control system to create, modify and share code for a new library website. Expect a friendly atmosphere and a creative hands-on experience that will introduce you to web literacy for the 21st century librarian. The morning will consist of in-depth introductions to the tools, while the afternoon will see participants split into working groups to build a single collaborative library website.
Bootstrap is an open-source, responsive front-end web framework that can be used for everything from complete website redesigns to rapid prototyping. It is useful for many library web applications, such as customizing LibGuides (version 2) or creating responsive sites. This workshop will give attendees a crash course in the basics of what Bootstrap can do and how to code it. Attendees can work individually or in teams.
Git is an open-source software tool that allows you to manage drafts and collaboratively work on projects – whether you’re building a library app, writing a paper, or organizing a talk. We will also talk about GitHub, a massively popular website that hosts git projects and has built-in features like issue tracking and simple web page hosting.
Bootstrap, LibGuides, & Potential Web Domination – Discussion of the use of Bootstrap at the Van Library, University of St. Francis
Example of a library using Bootstrap:
Bradford County Public Library
Library Code Year Interest Group
Kate Bronstad, Web Developer, Tisch Library, Tufts University
Kate is a librarian-turned-web developer for Tufts University’s Tisch Library. She works with git on a daily basis and teaches classes on git for the Boston chapter of Girl Develop It. Kate is originally from Austin, TX and has a MSIS from UT-Austin.
Junior Tidal, New York City College of Technology
Junior is the Multimedia and Web Services Librarian and Assistant Professor for the Ursula C. Schwerin Library at the New York City College of Technology, City University of New York. His research interests include mobile web development, usability, web metrics, and information architecture. He has published in the Journal of Web Librarianship, OCLC Systems & Services, Computers in Libraries, and code4Lib Journal. He has written a LITA guide entitled Usability and the Mobile Web published by ALA TechSource. Originally from Whitesburg, Kentucky, he has earned a MLS and a Master’s in Information Science from Indiana University.
- LITA Member $235 (coupon code: LITA2015)
- ALA Member $350
- Non-Member $380
To register for any of these events, you can include them with your initial conference registration or add them later using the unique link in your email confirmation. If you don’t have your registration confirmation handy, you can request a copy by emailing firstname.lastname@example.org. You also have the option of registering for a preconference only. To receive the LITA member pricing during the registration process on the Personal Information page enter the discount promotional code: LITA2015
Register online for the ALA Annual Conference and add a LITA Preconference
Call ALA Registration at 1-800-974-3084
Onsite registration will also be accepted in San Francisco.
Questions or Comments?
For all other questions or comments related to the course, contact LITA at (312) 280-4269 or Mark Beatty, email@example.com
Journal of Web Librarianship: BUILDING COMMUNITIES: SOCIAL NETWORKING FOR ACADEMIC LIBRARIES. Garofalo, Denise A. Oxford, UK: Chandos Publishing, 2013, 242 pp., $80.00, ISBN-13: 978-1-84334-735-4.
Journal of Web Librarianship: DIGITAL HUMANITIES IN PRACTICE. Warwick, Claire, Melissa Terras, and Julianne Nyhan, Eds. London: Facet Publishing, 2012, 233 pp., $97.42, ISBN: 978-1-85604-766-1.
Bradford Lee Eden
Journal of Web Librarianship: GUIDE TO REFERENCE IN MEDICINE AND HEALTH. Modschiedler, Christa, and Denise Beaubien Bennett. Chicago: ALA Editions, 2014, 480 pp., $75.00, ISBN-13: 978-0-83891-221-8.
Kristen L. Young
Journal of Web Librarianship: THE METADATA MANUAL: A PRACTICAL WORKBOOK. Lubas, Rebecca, Amy Jackson, and Ingrid Schneider. Oxford, UK: Chandos Publishing, 2013, 240 pp., $80.00, ISBN: 978-1-84334-729-3.
Journal of Web Librarianship: ANNUAL REVIEW OF CULTURAL HERITAGE INFORMATICS: 2012–2013. Hastings, Samantha K., ed. New York: Rowman & Littlefield, 2014, 290 pp., $84.99, ISBN-13: 978-0-75912-333-5.
Dena L. Luce
Journal of Web Librarianship: PRIVATIZING LIBRARIES. Jerrard, Jane, Nancy Bolt, and Karen Strege. Chicago, IL: ALA Editions, 2012, 72 pp., $46.00, ISBN-13: 978-0-83891-154-9.
Journal of Web Librarianship: DIGITAL LIBRARIES AND INFORMATION ACCESS: RESEARCH PERSPECTIVES. Chowdhury, G. G., and Schubert Foo, Eds. Chicago: Neal Schuman, 2012, 256 pp., $99.95, ISBN-13: 978-1-55570-914-3.
The Andrew W. Mellon Foundation is aggressively funding efforts to support new forms of academic publishing, which researchers say could further legitimize digital scholarship.
The foundation in May sent university press directors a request for proposals for a new grant-making initiative for long-form digital publishing for the humanities. In the e-mail, the foundation noted the growing popularity of digital scholarship, which presented an “urgent and compelling” need for university presses to publish and make digital work available to readers. Note in particular:
The foundation’s proposed solution is for groups of university presses to ... tackle any of the moving parts that task comprises, including “...(g) distribution; and (h) maintenance and preservation of digital content.”

Below the fold, some thoughts on this based on experience from the LOCKSS Program.
Since a Mellon-funded meeting with humanities librarians at the NYPL more than a decade ago, the LOCKSS team has been involved in discussions of, and attempts to preserve, the "long tail" of smaller journal publishers, especially in the humanities. Our observations:
- The cost of negotiating individually with publishers for permission to preserve their content, and the fact that they need to take action to express that permission, is a major problem. Creative Commons licenses and their standard electronic representation greatly reduce the cost of preservation. If for-pay access is essential for sustainability, some standard electronic representation of permission and standard way of allowing archives access is necessary.
- Push preservation models, in which the publisher sends content for preservation, are not viable in the long tail. Pull preservation, in which the archive(s) harvest content from the publisher, is essential.
- Further, the more the "new digital work flows and publication models" diverge from the e-book/PDF model, the less push models will work. They require the archive to replicate the original publishing platform, which is easy enough if it is delivering static files, but not so easy once the content gets dynamic.
- The cost of pull preservation is dominated by the cost of the first publisher on a given platform. Subsequent publishers have much lower cost. Thus driving publishing to a few, widely-used platforms is very important.
- Once a platform has critical mass, archives can work with the platform to reduce the cost of preservation. We have worked with the Open Journal System (OJS) to (a) make it easy for publishers to give LOCKSS permission by checking a box, and (b) provide LOCKSS with a way of getting the content without all the highly variable (and thus impossibly expensive) customization. See, for example, work by the Public Knowledge Project.
- The problem with OJS has been selection - much of the content is too low quality to justify the effort of preserving it. Finding the good stuff is difficult for archives because the signal-to-noise ratio is low.
There are significant differences between the University Press market for long-form digital humanities and the long tail of humanities journals. The journals are mostly open-access and many are low-quality. The content that Mellon is addressing is mostly paid access and uniformly high-quality; the selection process has been done by the Presses. But these observations are still relevant, especially the cost implications of a lack of standards.
It is possible that no viable cost-sharing model can be found for archiving the long tail in general. In the University Press case, a less satisfactory alternative is a "preserve in place" strategy in which a condition of funding would be that the University commit to permanent access to the output of its press, with an identified succession plan. At least this would make the cost of preservation visible, and eliminate the assumption that it was someone else's problem.
John Miedema: Hierarchy has a bad rap but language is infused with it. We must find ways to tear down hierarchy almost as quickly as we build it up.
Hierarchy has a bad rap. Hierarchy is a one-sided relation, one thing set higher than another. In society, hierarchy is the stage for abuse of power. The rich on the poor, white on black, men on women, straight on gay. In language too, hierarchy is problematic. Static labels are laden with power and stereotypes, favoring some over others. Aggressive language, too, can overshadow small worthy ideas.
I read Lila the year it was published, 1991. I have a special fondness for this book because my girlfriend bought it for me; she is now my wife. Lila is not a romantic book, and I don’t mean in the classic-romantic sense of Pirsig’s first famous book. I re-read Lila this year. Philosophy aside, I cringe at Pirsig’s portrayal of his central female character, Lila. She is a stereotype, a dumb blonde, operating only on the level of biology and sexuality, the subject of men’s debates about quality. Pirsig is more philosopher than storyteller.
We cannot escape that many of the good things we love about language are essentially hierarchical. Narrative is linear: a beginning, middle, and end. Order shapes the story. Hierarchy gives a bird’s eye view, a table of contents, a summary that allows a reader to consider a work as a whole. For the reader’s evaluation of a book, or for choosing to only enter a work at a particular door, the table provides a map. Hierarchy is a tree, a trunk on which the reader can climb, and branches on which the reader can swing.
Granted, a hierarchy is just one view, an author’s take on how the work should be understood. There is merit in deconstructing the author’s take and analyzing the work in other ways. It is static hierarchy that is the problem.
Many writers are inspired to start a project with a vision of the whole, a view of how all the pieces hang together, as if only keystrokes were needed to fill in the details. The writer gets busy, happily tossing content into categories. Inevitably new material is acquired and new thinking takes place. Sooner or later a crisis occurs — the new ideas do not fit the original view. Either the writer does the necessary work to uproot the original categories and build a new better view, or the work will flounder. Again, it is static hierarchy that is the problem.
We must find ways to tear down hierarchy almost as quickly as we build it up. Pirsig’s metaphysics is all about the tension between static and dynamic quality. My writing technology, Lila, named after Pirsig’s book, uses word qualities to compute hierarchy. What word qualities measure hierarchy? I have several ideas. I propose that passages with abstract words are higher order than those with more concrete words. Closer to Pirsig’s view, passages that are dynamic — measured by agency, activity, and heat — are higher order than those that are static. Or does cool clear static logic trump heated emotion? There are several ways to measure it, and plenty of issues to work out. It will take more posts.
Join us for a two-day hackathon during DPLAfest 2015 (Indianapolis, April 17-18) to collaborate with members of the DPLA community and build something awesome with our API. A hackathon is a concentrated period of time for creative people to come together and make something new. In their excellent hackathon planning guide, DPLA community reps Chad Nelson and Nabil Kashyap described a hackathon as “an alternative space–outside of day-to-day assignments, project management procedures, and decision-making processes–to think differently about a problem, a tool, a dataset, or even an institution.”
The hackathon at DPLAfest 2015 will provide a space for people to build off the DPLA API, which provides access to almost 9 million (and counting!) CC0 licensed metadata records from America’s libraries, archives, and museums in a common metadata format. We support this open API so that the world can access our common cultural heritage, and use it to build something transformative. Our ever-growing app library has examples of innovative projects that have been built using the API. Many people have also contributed ideas for apps and tools – perhaps someone at the hackathon will take one on!
Coders of all levels – from beginning to advanced – are welcome at the hackathon. During the first hour on Friday, we will cover API basics, the capabilities of the DPLA API, available toolsets, and tips for using records from the API effectively. After that, there will be ample opportunity to teach and learn from one another as we build our apps. As always, you can find helpful documentation on our website, such as the API codex and the glossary of terms.
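As a rough illustration of what a first API call might look like, here is a minimal sketch of building a search query; it assumes the v2 items endpoint and its q/api_key parameters, and YOUR_API_KEY is a placeholder for a key requested from DPLA:

```python
# Sketch of a DPLA API search, assuming the v2 "items" endpoint and its
# "q" / "api_key" parameters; YOUR_API_KEY is a placeholder for a key
# requested from DPLA.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

DPLA_KEY = "YOUR_API_KEY"  # placeholder

def dpla_search_url(query, page_size=10):
    """Build a search URL against the DPLA items endpoint."""
    params = urlencode({"q": query, "page_size": page_size, "api_key": DPLA_KEY})
    return "https://api.dp.la/v2/items?" + params

url = dpla_search_url("cooking")
# With a real key, fetch and decode the matching records:
# docs = json.load(urlopen(url))["docs"]
```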
Non-programmers are also welcome. Whatever your expertise – design, metadata, business development – you can help generate ideas and create prototypes. The only requirements for participation are curiosity and a desire to collaborate.
The hackathon is Friday, April 17, 1:30pm-4:00pm, and Saturday, April 18, 10:30am-3:00pm (with a break for lunch). It culminates with a Developer Showcase on Saturday at 3:15pm. Visit the full schedule to find out more about what’s happening at DPLAfest 2015. Registration is still open!
Last updated: April 1, 2015
- What Information Do We Collect?
- How Is This Information Used?
- What Security Measures Are Used?
- User-Generated Content Forums
- Other Information
WHAT INFORMATION DO WE COLLECT?

Information You Provide to Us

We will request information from you if you establish a personal profile to gain access to certain content or services, if you ask to be notified by e-mail about online content, or if you participate in surveys we conduct. This requires the input of personal information and preferences that may include, but is not limited to, details such as your name, address (postal and e-mail), telephone number, or demographic information. You can't use secure communications to give us this information, so you should consider anything you tell us to be public information. If you request paid content from NEJM.org, including subscriptions, we will also ask for payment information such as credit card type and number. Our payment providers won't actually let us see your credit card number, because there are federal regulations and such.
We may also use clear gifs (tiny graphics with unique identifiers that function similarly to cookies) to help us track site activity. We do not use these to collect personally identifying information, because that's impossible. We also do not use clear gifs to shovel snow, even though we've had a whole mess of it. Oh and by the way, some of our partners have used "flash cookies", which you can't delete. And maybe even "canvas fingerprints". But they pay us money or give us services, so we don't want to interfere.
HOW IS THIS INFORMATION USED?
Information that you provide to us will be used to process, fulfill, and deliver your requests for content and services. We may send you information about our products and services, unless you have indicated you do not wish to receive further information.
We may report aggregate information about usage to third parties, including our service vendors and advertisers. These advertisers may include your competitors, so be careful. For additional information, please also see our Internet Advertising Policy. We may also disclose personal and demographic information about your use of NEJM.org and our digital applications to the countless companies and individuals we engage to perform functions on our behalf. Examples may include hosting our Web servers, analyzing data, and providing marketing assistance. These companies and individuals are obligated to maintain your personal information as confidential and may have access to your personal information only as necessary to perform their requested function on our behalf, which is usually to earn us more money, except as detailed in their respective privacy policies. So of course, these companies may sell the data collected in the course of your interaction with us.
WHAT SECURITY MEASURES ARE USED?
When you submit personal information via NEJM.org or our digital applications, your information is protected both online and offline with what we believe to be appropriate physical, electronic, and managerial procedures to safeguard and secure the information we collect. For information submitted via NEJM.org, we use the latest Secure Socket Layer (SSL) technology to encrypt your credit card and personal information. But other information is totally up for grabs.
USER-GENERATED CONTENT FORUMS
Any data or personal information that you submit to us as user-generated content becomes public and may be used by MMS in connection with NEJM.org, our digital applications, and other MMS publications in any and all media. For more information, see our User-Generated Content Guidelines. We'll have the right to publish your name and location worldwide forever if you do so, and we can sue you if you try to use a pseudonym.
Do Not Track Signals

Like most web services, at this time we do not alter our behavior or change our services in response to do not track signals. In other words, our website tracks you, even if you use technical means to tell us you do not want us to track you.
Compliance with Legal Process

We may disclose personally identifying information if we are required to do so by law or we in good faith believe that such action is necessary to (1) comply with the law or legal process; (2) protect our rights and property; (3) protect against misuse or the unauthorized use of our Web site; or (4) protect the personal safety or property of our users or the public. So, for example, if you are involved in a divorce proceeding, we can help your spouse verify that you weren't staying late at your office reading up on the latest research like you said you were.
Children

NEJM.org is not intended for children under 13 years of age. We do not knowingly collect or store any personal information from children under 13. If we did not have this disclaimer, our lawyer would not let us do things we want to do. If you are under 13, we're really impressed; you should spend more time outside getting fresh air.
This is the fifth post in a series of posts related to metadata edit events for the UNT Libraries’ Digital Collections from January 1, 2014 to December 31, 2014. If you are interested in the previous posts in this series, they talked about the when, what, who, and first steps of duration.
In this post we are going to try to come up with the “average” amount of time spent on metadata edits in the dataset.
The first thing I wanted to do was to figure out which of the values mentioned in the previous post about duration buckets I could ignore as noise in the dataset.
As a reminder, the duration of a metadata edit event starts when a user opens a metadata record in the edit system and ends when they submit the record back to the system with a publish event. The duration is the difference in seconds between those two timestamps.
There are a number of factors that can cause the duration data to vary wildly: a user can have a number of tabs open at the same time while only working in one of them; they may open a record and then walk away without editing it; or they could be using a browser automation tool like Selenium that automates the metadata edits and therefore pushes the edit time down considerably.
In some tests of my own editing skills, it isn’t unreasonable to have edits of four or five seconds if you are going in to change a known value in a simple dropdown. For example, adding a language code to a photograph that you know should be “no-language” doesn’t take much time at all.
My gut feeling based on the data in the previous post was to say that edits that have a duration of over one hour should be considered outliers. This would remove 844 events from the total 94,222 edit events leaving me 93,378 (99%) of the events. This seemed like a logical first step but I was curious if there were other ways of approaching this.
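A candidate ceiling like this can be sanity-checked by computing the share of events it retains. A small sketch (the toy durations below are illustrative only; the real dataset holds 94,222 events):

```python
# Fraction of edit events that survive a candidate duration ceiling.
# Toy durations stand in for the 94,222 real edit events.

def share_retained(durations, ceiling):
    """Fraction of edit events at or below a candidate duration ceiling."""
    kept = sum(1 for d in durations if d <= ceiling)
    return kept / len(durations)

durations = [5, 12, 29, 45, 97, 150, 300, 707, 1300, 431644]
print(share_retained(durations, 3600))  # the gut-feeling one-hour ceiling
```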
I had a chat with the UNT Libraries’ Director of Research & Assessment Jesse Hamner and he suggested a few methods for me to look at.
IQR for calculating outliers
I took a stab at using the Interquartile Range of the dataset as the basis for identifying the outliers. With a little bit of R I was able to find the following information about the duration dataset:

Min.    :      2.0
1st Qu. :     29.0
Median  :     97.0
Mean    :    363.8
3rd Qu. :    300.0
Max.    : 431644.0
With that I have a Q1 of 29 and a Q3 of 300, which gives me an IQR of 271. The range for outliers is Q1 - 1.5 × IQR on the low end and Q3 + 1.5 × IQR on the high end. With these numbers, values under -377.5 or over 706.5 should be considered outliers.
Note: I’m pretty sure there are some different ways of dealing with IQR for datasets that are bounded at zero, so that’s something to investigate.
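The fence arithmetic can be sketched in a few lines of Python, using the quartile values from the R summary above:

```python
# Tukey's rule, using the quartiles reported in the R summary above
# (Q1 = 29, Q3 = 300); the low fence is moot here, since durations
# are bounded at zero.

def iqr_fences(q1, q3, k=1.5):
    """Return the (low, high) outlier fences for Tukey's rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

low, high = iqr_fences(29, 300)
print(low, high)  # -377.5 706.5
```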
For me the key here is that I’ve come up with 706.5 seconds as the ceiling for a valid event duration based on this method. That’s 11 minutes and 47 seconds. If I limit the dataset to edit events under 707 seconds, I am left with 83,239 records. That is now just 88% of the dataset, with 12% considered outliers. This seemed like too many records to ignore, so after talking with my resident expert in the library I had a new method.

Two Standard Deviations
I took a look at what the timings would look like if I based my outliers on standard deviations. Edit events under 1,300 seconds (21 min 40 sec) in duration amount to 89,547, which is 95% of the values in the dataset. I also wanted to see what trimming just 2.5% of the dataset would look like: edit durations under 2,100 seconds (35 minutes) result in 91,916 usable edit events, right at 97.6%.

Comparing the Methods
The following table takes the four duration ceilings that I tried (IQR, 95%, 97.5%, and the gut-feeling one hour) and makes them a bit more readable. The total number of duration events in the dataset before limiting is 94,222.

Duration Ceiling  Events Remaining  Events Removed  % Remaining
707               83,239            10,983          88%
1,300             89,547            4,675           95%
2,100             91,916            2,306           97.6%
3,600             93,378            844             99%
Just for kicks I calculated the average time spent editing records across the datasets that remained for the various cutoffs, to get an idea of how the ceilings changed things.

Duration Ceiling  Events Included  Events Ignored  Mean    Stddev   Sum         Average Edit Duration  Total Edit Hours
707               83,239           10,983          140.03  160.31   11,656,340  2:20                   3,238
1,300             89,547           4,675           196.47  260.44   17,593,387  3:16                   4,887
2,100             91,916           2,306           233.54  345.48   21,466,240  3:54                   5,963
3,600             93,378           844             272.44  464.25   25,440,348  4:32                   7,067
431,644           94,222           0               363.76  2311.13  34,274,434  6:04                   9,521
In the table above you can see what the different duration ceilings do to the data analyzed. I calculated the mean of the various datasets and their standard deviations (really, Solr's StatsComponent did that). I converted those means into minutes and seconds in the “Average Edit Duration” column, and the final column is the number of person-hours spent editing metadata in 2014 based on the various datasets.
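The per-ceiling rows above can be reproduced by filtering the raw durations and summarizing what remains. A sketch with illustrative toy durations (the real dataset holds 94,222 events):

```python
# Reproducing a row of the per-ceiling table: filter the raw durations
# by a ceiling, then summarize what remains. Toy durations stand in
# for the 94,222 real edit events.
import statistics

def summarize(durations, ceiling):
    """Summary statistics for edit events at or below the ceiling."""
    kept = [d for d in durations if d <= ceiling]
    return {
        "events_included": len(kept),
        "events_ignored": len(durations) - len(kept),
        "mean_seconds": statistics.mean(kept),
        "total_hours": sum(kept) / 3600,
    }

durations = [30, 90, 120, 300, 650, 1800, 5000]
row = summarize(durations, ceiling=2100)
# With this toy data, 6 of the 7 events survive the 2,100-second ceiling.
```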
Going forward I will be using 2,100 seconds as my duration ceiling and ignoring the edit events that took longer than that. Next I’m going to do a little work on figuring out the costs associated with metadata creation in our collections for the last year, so check back for the next post in this series.
As always feel free to contact me via Twitter if you have questions or comments.
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
CrossRef International Workshop, April 29, Shanghai, China - Ed Pentz and Pippa Smart presenting.
CSE 2015 Annual Meeting, May 15-18, Philadelphia, PA - Rachael Lammey and Chuck Koscher presenting.
MLA '15 "Librarians Without Limits", May 15-20, Austin, TX. Exhibiting at booth number 234.
2015 SSP 37th Annual Meeting, May 27-29, Arlington, VA. Exhibiting at table 6.
CrossRef International Workshop, June 11, Vilnius, Lithuania - Ed Pentz and Pippa Smart presenting.
PKP Scholarly Publishing Conference 2015, August 11-14, Vancouver, BC - Karl Ward attending.
ISMTE 8th Annual North American Conference, August 20-21, Baltimore, MD - Rachael Lammey presenting.
ALPSP Conference, September 9-11, London, UK - CrossRef staff attending.