New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
Metadata is the basis of DPLA's work. We rely on a growing network of Content Hubs, large repositories of digital content, and Service Hubs that aggregate metadata from partners. We, in turn, aggregate the Hubs' metadata into the DPLA datastore.
With new Hubs, we often work together to identify organizational and governance structures that make the most sense for their local situation. Once an administrative model is established, the practical matters of how to aggregate their partners' metadata and how to handle quality control over the resulting aggregated set play a larger role.
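In rough terms, the aggregate-then-quality-check workflow described above can be sketched as follows. This is a hypothetical illustration only: the field names, the required-field rule, and the hub structure are assumptions made for the example, not DPLA's actual schema or tooling.

```javascript
// Hypothetical sketch: pool records from several hubs into one set,
// rejecting records that fail a simple required-field quality check.
// Field names (title, rights, provider) are illustrative assumptions.
function aggregateHubRecords(hubs, requiredFields) {
  const accepted = [];
  const rejected = [];
  for (const hub of hubs) {
    for (const record of hub.records) {
      const missing = requiredFields.filter((f) => !record[f]);
      if (missing.length === 0) {
        // Tag each record with its source hub before adding it to the store.
        accepted.push({ ...record, provider: hub.name });
      } else {
        rejected.push({ record, hub: hub.name, missing });
      }
    }
  }
  return { accepted, rejected };
}

const hubs = [
  {
    name: "Example Service Hub",
    records: [
      { title: "Old Map", rights: "Public domain" },
      { title: "Untitled Photo" }, // missing rights, so it fails QC
    ],
  },
];
const result = aggregateHubRecords(hubs, ["title", "rights"]);
```

In a real aggregation pipeline the quality-control step would be far richer (vocabulary normalization, rights-statement validation, deduplication), but the shape — ingest per hub, check, accept or flag — is the same.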
DPLA's Hub network does not rely on a single metadata aggregation workflow or tool, and our own aggregation practices are quite different from our partners'. While diversity in approaches is good in that each Hub can create a process that works best for them, it also means that our community hasn't decided on a set of standard practices or tools.
We’ve recently implemented an application process for new Hubs, so it seems timely to start a conversation about metadata aggregation practices among our current and potential Hubs, their partners, and really, anyone else interested in sharing and enhancing metadata. It seems that there’s always something to learn about metadata aggregation, and we’re hopeful that DPLA can be a conduit for a discussion about some of the fundamental concepts and requirements for local practice and aggregation at scale.
To that end, on January 22, at 2 pm eastern, we will be hosting a webinar about metadata aggregation. We'll be taking an inside look at aggregation best practices at two of our DPLA Service Hubs in North Carolina and South Carolina. In addition, DPLA has been working on improving our existing tools as well as creating some new ones for metadata aggregation and quality control. We'd like to share what's in place, preview some of our plans, and get feedback on future directions. Our presenters:
- Lisa Gregory and Stephanie Williams of the North Carolina Digital Heritage Center
- Heather Gilbert and Tyler Mobley of the South Carolina Digital Library
- Gretchen Gueguen of DPLA
This webinar will be offered to the public. Since we’ll be limited to 100 seats, please limit registration to no more than two seats per organization. Please get in touch with Gretchen with any questions.
Hydra Partners Indiana University and WGBH Boston have jointly been awarded nearly $400,000 by the National Endowment for the Humanities to develop HydraDAM2, a Hydra-based software tool that will assist in the long-term preservation of audio and video collections.
HydraDAM2 will primarily address challenges posed by long-term preservation of digital audio and video files. Because these “time-based media” files are significantly larger than many other digital files managed by libraries and archives, they potentially require special solutions and workflows.
An important feature of HydraDAM2 is that it will be open source and can be used and shared freely among cultural institutions, including libraries, archives, universities and public broadcasters.
HydraDAM2 is also scalable to both small and large organizations, having the ability to interact with massive digital storage systems as well as with smaller digital tape storage systems.
The full press release can be found at http://news.indiana.edu/releases/iu/2014/12/neh-grants-digital-preservation.shtml
Congratulations to Jon Dunn at IU, Karen Cariani at WGBH and to their teams.
In her inaugural column for American Libraries, titled “Advocate. Today.,” American Library Association (ALA) President Courtney Young challenged librarians of all types, and friends of libraries, to commit to spending just an hour a week advocating for libraries. To take the mystery out of just what “advocacy” means, how to do it and how to have fun along the way, ALA’s Offices of Intellectual Freedom (OIF), Library Advocacy (OLA), Public Information and the Washington Office will partner with all ALA divisions to present “An Hour a Week: Library Advocacy is Easy!!!” during the 2015 ALA Midwinter Meeting in Chicago. The program is being cosponsored and will be co-promoted by all of ALA’s twelve divisions.
The session, which will be held on Saturday, January 31, 2015, from 10:30–11:30 a.m., will be led by the ever-popular "Advocacy Guru," Stephanie Vance, who will walk "newbies" and "old pros" alike through just what advocacy means today, from engaging with local PTAs and library boards to lobbying the White House. With the help of panelists from OIF and OLA, Vance will share easy advocacy strategies and lead a lightning tour of the many terrific ALA advocacy resources available to give "ALAdvocates" everything they need to answer Courtney's call.
For fully half the program, Vance also will “dive” into the audience to “get the story” of what works in the real world from real librarians and library supporters from all parts of the profession. These everyday advocates will share their own hands-on experiences at bringing libraries’ messages to allies, policy-makers and elected officials at every level and explain why every librarian’s (and library lover’s) involvement in advocacy is so critical to the future of libraries of every kind everywhere.
Additional speakers include Marci Merola, director of the ALA Office for Library Advocacy and Barbara Jones, director of the ALA Office for Intellectual Freedom.
The post Key ALA Offices to team up with “Advocacy Guru” at 2015 Midwinter Meeting appeared first on District Dispatch.
Conferences, meetings and meet-ups are important networking and collaboration events that allow librarians and archivists to share digital stewardship experiences. While national conferences and meetings offer strong professional development opportunities, regional and local meetings offer opportunities for practitioners to connect and network with a local community of practice. In a previous blog post, Kim Schroeder, a lecturer at the Wayne State University School of Library and Information Science, shared her experiences planning and holding Regional Digital Preservation Practitioners (RDPP) meetings in Detroit. In this post, part of our Insights Interview series, I'm excited to talk to Ed Busch, Electronic Records Archivist at Michigan State University, about his experiences spearheading the Mid-Michigan Digital Practitioners Group.
Erin: Please tell us a little bit about yourself and what you do at Michigan State University.
Ed: I come from what I suspect is a unique background for an archivist. I have a B.S. in Fisheries from Humboldt State University in California, took coursework in programming (BASIC, FORTRAN, APL), worked as a computer operator (loading punch cards and hanging tapes), performed software testing as well as requirements writing, and was a stay-at-home dad for a period of time.
It was during this period that I looked into librarianship; I thought I could bring an IT background along with my love of history and genealogy to the field. After I completed my MLIS and Archives Administration certificate at Wayne State in 2007, I began a processing archivist position at the MSU Archives that led to my current position as the Electronic Records Archivist.
As an archivist here, I work on a lot of different projects. This includes “digital projects” such as web crawling (via Archive-It), adding content to our website, managing our Archivists’ Toolkit installation, managing a couple of listservs (Michigan Archival Association and Big10 Archivists), working on our Trusted Digital Repository workflows and identifying useful tools to aid processing digital records. I also continue to do some paper processing, manage our Civil War Letters and Diaries Digitization project and the development of an AV Digitization Lab at the archives. I’m also the first person staff consults for PC or network issues at the archives.
Erin: How are you involved in digital preservation at your organization?
Ed: I supported my fellow electronic records archivist Lisa Schmidt on a NHPRC grant to create the Spartan Archive, a repository for Michigan State University’s born-digital institutional administrative records. For the grant, we focused on MSU’s Office of the Registrar digital records.
As a follow-on to the grant we are working on creating a Trusted Digital Repository for MSU. We are currently ingesting digital records using Archivematica into a preservation environment. Lisa and an intern do most of the actual ingesting while I provide technical advice, create workflows for unique items and identify useful tools. We are also evaluating applications that can help manage our digital assets and to provide access to them.
One area that has been on the "To Do list" is processing the digital assets from our university photographers and videographers. The challenges include selecting what to keep, how to provide access, and how to fund the storage for this large amount of data. I've also explored some facial recognition applications but haven't found a good way to integrate them into our TDR yet.
I'm also the person doing all the web archiving for the University and testing out migrating to ArchivesSpace so that we can schedule a transition to it. Besides the Mid-Michigan Digital Practitioners (MMDP) meeting planning, I also attend meetings of Web Developers here at MSU (WebDev CAFE) and am a volunteer on the ArchivesSpace Technical Advisory Council.
Erin: Could you talk about the Mid-Michigan Digital Practitioners Group? You have had some very successful regional meetings over the past couple of years. Can you tell us more about them?
Ed: In February of 2013, I heard about a new group for Digital Preservation Practitioners in the Detroit/Ann Arbor/Toledo/Windsor area. I recall thinking that this sounded neat and wanting to explore whether there was interest in holding a session for Mid-Michigan digital preservation practitioners, with the purpose of getting together to talk about what the various institutions are doing: projects, technologies, partners, etc.
After contacting some of my colleagues about this, the answer was a resounding yes! Portia Vescio (Assistant Director of the Archives) and I contacted Digital Curation Librarian Aaron Collie and we created Mid-Michigan Digital Practitioners. Systems Librarian Ranti Junus joined the three of us to form the Mid-Michigan Digital Practitioners planning group. We've had great support from the MSU Archives and MSU Libraries leadership for this effort.
We held our first meeting at MSU in August of 2013. From the beginning, we’ve been big on using email and surveys to get ideas and help from the Mid-Michigan professionals working with digital materials. For this first meeting, we came up with a rough agenda and started soliciting presenters to talk about what they were working on. We also communicated with Kim and Lance [Stuchell]’s group to keep them in the loop. There was some concern that there were two groups but we really wanted to serve the needs of the Mid-Michigan area. Many smaller shops don’t have the resources to go far. At that first meeting, we had over 50 attendees from around 15 different institutions. What most people kept saying they liked best was the chance to talk to other people trying to solve the same problems.
We held the second meeting in March 2014 at Grand Valley State University with over 50 attendees from 24 different institutions. We repeated the process and held the third meeting at Central Michigan University this past September with 50 attendees from over 20 institutions.
We're now just starting the planning for the 4th meeting for March 27, 2015 at the University of Michigan in Ann Arbor. We have high hopes for a great meeting and hopefully some student involvement from the U of Michigan School of Information and Wayne State University School of Library Science. We've also set up a listserv (email@example.com) to aid communication.
Erin: What did you feel was most successful about your meetings?
Ed: I think what’s been most successful is creating a chance for archivists, librarians and museum curators from all types and sizes of institutions to share experiences, what’s worked, what hasn’t, nifty tools, cool projects, etc. about their digital materials. Feedback from the meetings has this as the thing most people liked best. We also really do try to use the feedback we get to improve each meeting, try out new things and talk about what people are interested in learning more about.
Erin: What kind of impact do you think these meetings have had on your community and the organizations in your region?
Ed: I think our greatest contribution to the region has been creating a place for professionals from large and small institutions to see what’s happening in the area of digital materials and to share experiences. Digital materials have the same issues/problems/situations for all of us; the main difference being what resources we can use to deal with them. By providing a forum for people to meet, hopefully everyone can get ideas to take back with them and to have information they can share with their leadership on the importance of this work.
Erin: What one piece of advice would you offer others who may be interested in starting up a regional practitioners group?
Ed: One thing that I believe has made our group able to keep going is that the core planning group is all located at MSU. We can meet every few weeks to work on the next meeting, assign tasks and share information with the host institution. That said, for the next MMDP meeting, we are expanding our planning group to include a few other people who will call in to the planning meetings. We'll see how that works and regroup if needed or possibly add some more. Flexibility is important.
I do sincerely believe though that what really makes a difference is the interest and commitment of the planning team and its leadership at the Archives and Libraries to keep this going even though we each have a lot on our plates. We feel this is vital to the community of archivists, librarians and curators in the area.
Today VCU Libraries launched a couple of new web tools that should make it easier for people to find or discover our library’s databases and research guides.
This project's goal was to help connect "hunters" to known databases and help "gatherers" explore new topic areas in databases and research guides.[1] Our web redesign task force identified these issues in 2012 user research.

1. New look for the databases list
Since the dawn of library-web time, visitors to our databases landing page were presented with an A to Z list of hundreds of databases with a list of subject categories tucked away in the sidebar.
For the hunters:
- Search by title with autocomplete (new functionality)
- A to Z links
For the gatherers:
- Popular databases (new functionality)
- Databases by subject
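The title-search-with-autocomplete feature could be sketched roughly like this. The function and database titles below are placeholders for illustration, not VCU's actual implementation, which sits on top of the live database list.

```javascript
// Hypothetical autocomplete: suggest database titles matching the typed text,
// ranking titles that start with the query above mid-title matches.
// The titles here are placeholder examples, not VCU's actual list.
function suggestTitles(query, titles, limit = 5) {
  const q = query.trim().toLowerCase();
  if (q.length === 0) return [];
  return titles
    .filter((t) => t.toLowerCase().includes(q))
    .sort((a, b) => {
      const aStarts = a.toLowerCase().startsWith(q) ? 0 : 1;
      const bStarts = b.toLowerCase().startsWith(q) ? 0 : 1;
      return aStarts - bStarts || a.localeCompare(b);
    })
    .slice(0, limit);
}

const titles = ["JSTOR", "PsycINFO", "PubMed", "Project MUSE"];
const suggestions = suggestTitles("p", titles);
```

In practice the filtering would run server-side or against a prebuilt index, with the results wired into the search box as the user types.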
Building on the search feature in the new database list, we created an AJAX, Google AdWords-esque add-on to our search engine (Ex Libris' Primo) that recommends database or research guide results based on the search query. For longer, more complex queries, no suggestions are shown.
Try these queries:
Included in the suggested results:
- Database titles and descriptions, which are being indexed in the VCU Libraries search engine
- Subject guide and How do I… guide titles using the LibGuides 1.0 API
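The "no suggestions for longer, more complex queries" rule could be gated along these lines. This is a sketch under assumptions: the word and character thresholds, index shape, and matching logic are invented for illustration, not VCU's actual tuning.

```javascript
// Hypothetical gate for the suggested-results add-on: short, simple queries
// get database/guide suggestions; long or complex ones get none.
// Thresholds and index fields are illustrative assumptions.
function shouldSuggest(query, maxWords = 4, maxChars = 40) {
  const words = query.trim().split(/\s+/).filter(Boolean);
  return words.length > 0 && words.length <= maxWords && query.length <= maxChars;
}

function suggestedResults(query, index) {
  if (!shouldSuggest(query)) return [];
  const q = query.toLowerCase();
  // Each index entry carries a title, description, and type ("database" or "guide").
  return index.filter(
    (e) =>
      e.title.toLowerCase().includes(q) ||
      e.description.toLowerCase().includes(q)
  );
}

const index = [
  { title: "PsycINFO", description: "psychology literature", type: "database" },
  { title: "Citing Sources", description: "how do I cite?", type: "guide" },
];
const hits = suggestedResults("psychology", index);
const none = suggestedResults(
  "a very long natural language question about citation styles",
  index
);
```

The design rationale matches the post: a short query like a database name is probably a "hunter" who wants a known resource, while a long natural-language question is better served by the search engine's own results.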
To highlight the changes to the databases page, we also made some changes to how we are linking to it. Previously, our homepage search box linked to popular databases, the alphabet characters A through Z, our subject list, and “all”.
The intent of the new design is to surface the new databases list landing page and wean users off the A-Z interaction pattern in favor of search.
The top three databases are still on the list both for easy access and to provide “information scent” to clue beginner researchers in on what a database might be.
Dropping the A-Z links will require advanced researchers to change their interaction patterns, but it could also mean that they're able to get to their favorite databases more easily (and possibly unearth new databases they didn't know about).

Remaining questions/issues
- Research guides search is just okay. The results are helpful a majority of the time and wildly nonsensical the rest of the time. And, this search is slowing down the overall load time for suggested results. The jury is still out on whether we’ll keep this search around.
- Our database subject categories need work, and we need to figure out how research guides and database categories should relate to each other. They don’t connect right now.
- We don’t know if people will actually use the suggested search results and are not sure how to define success. We are tracking the number of clicks on these links using Google Analytics event tracking – but what’s good? How do we know to keep this system around?
- The change away from the A-Z link list will be disruptive for many and was not universally popular among our librarians. Ultimately it should be faster for “hunters”, but we will likely hear groans.
- The database title search doesn't yet account for common and understandable misspellings[2] of database names, which we hope to rectify in the future with alternate titles in the metadata.
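The alternate-titles fix mentioned in the last point might look something like this. The records and misspellings below are made-up examples, a sketch of the idea rather than the planned implementation.

```javascript
// Hypothetical alternate-title lookup so common misspellings still resolve
// to the right database. Records and alternate spellings are invented examples.
const databases = [
  { title: "PsycINFO", altTitles: ["PsychINFO", "Psych Info"] },
  { title: "Web of Science", altTitles: ["Web of Knowledge"] },
];

function findDatabase(query, records) {
  const q = query.trim().toLowerCase();
  return (
    records.find(
      (r) =>
        r.title.toLowerCase() === q ||
        r.altTitles.some((alt) => alt.toLowerCase() === q)
    ) || null
  );
}

const hit = findDatabase("PsychINFO", databases); // misspelling still resolves
```

Storing alternate titles in the metadata, as the post proposes, keeps the matching logic simple: the search just checks one extra field instead of needing a full fuzzy-matching layer.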
Shariq Torres, our web engineer, provided the programming brawn behind this project, completely rearchitecting the database list in Slim/Ember and writing an AJAX frontend for the suggested results. Shariq worked with systems librarians Emily Owens and Tom McNulty to get a Dublin Core XML file of the databases indexed and searchable in Primo. Web designer Alison Tinker consulted on look and feel and responsified the design for smaller-screen devices. A slew of VCU librarians provided valuable feedback and QA testing.
- I believe this hunter-gatherer analogy for information-seeking behaviors came from Sandstrom’s An Optimal Foraging Approach to Information Seeking and Use (1994) and have heard it in multiple forms from smart librarians over the years.
- Great info from Ken Varnum’s Database Names are Hard to Learn (2014)
The Digital Public Library of America launched on April 18, 2013, less than two years ago. And what a couple of years it has been. From a staff of three people, a starting slate of two million items, and 500 contributing institutions, we are now an organization of 12, with over eight million items from 1,300 contributing institutions. We have materials from all 50 states—and from around the world—in a remarkable 400 languages. Within our collection are millions of books and photographs, maps of all shapes and sizes, material culture and works of art, the products of science and medicine, and rare documents, postcards, and media.
But focusing on these numbers and their growth, while gratifying and a signal that DPLA is thriving, is perhaps less important than what the numbers represent. DPLA has always been a community effort, and that community, which became active in the planning phase to support the idea of a noncommercial effort to bring together American libraries, archives, and museums and to make their content freely available to the world, has only strengthened since 2013. A truly national network and digital platform is emerging, although we still have much to do. A strong commitment to providing open access to our shared cultural heritage, and a deeply collaborative spirit, is what drives us every day.
Looking back, 2013 was characterized by a start-up mode: hiring staff, getting the site and infrastructure live, and bringing on a first wave of states and collections. 2014 was a year in which we juggled so much: many new hubs, partners, and content, lining up additional future contributors, and beginning to restructure our technology behind the scenes to prepare for an even more expansive collection and network.
Beginning this year, and with the release of our strategic plan for the next three years, the Digital Public Library of America will hit its stride. We encourage you to read the plan to see what’s in store, but also to know that it will require your help and support; so much in the plan is community-driven, and will be done with that same emphasis on widespread and productive collaboration.
We will be systematically putting in place what will be needed to ensure that there’s an on-ramp to the DPLA for every collection in the United States, in every state. We call this “completing the map,” or making sure that we have a DPLA service hub available to every library, archive, museum, and cultural heritage site that wishes to get their materials online and described in such a way as to be broadly discoverable. We also plan to make special efforts around certain content types—areas where there are gaps in our collection, or where we feel DPLA can make a difference as an agent of discovery and serendipity.
We have already begun to make some major technical improvements that will make ingesting content and managing metadata even better. This initiative will accelerate and be shared with our network. Moreover, we will make a major effort in the coming years to make sure that our valuable unified collection reaches every classroom, civic institution, and audience, to educate, inform, and delight.
There’s a lot to do. We just put a big pot of coffee on. Please join us for this next phase of rapid growth and communal effort!
A question came up on ALA Think Tank:
What do you prefer: to click a link and it open in a new tab or for it to open in the same page? Is there a best practice?
There is. The best practice is to leave the default link behavior alone. Usually, this means that a link on a website will open in the same window or tab. Ideas about what links should do are taken for granted, and "best practices" that favor links opening new windows aren't exactly best practices.
It’s worth taking a look at the original thread because I really hesitate to misrepresent it. I’m not bashing. Well-meaning Think Tankers were in favor of links opening new tabs. Below, I cherry-picked a few comments to communicate the gist:
- “Most marketing folks will tell you that If it is a link outside your website open in a new tab, that way they don’t lose your site. Within your own site then stay with the default.”
- “New tab because it’s likely that I want to keep the original page open. And, as [name redacted] mentions, you want to keep them on your site.”
- “External links open in new tabs.”
- “I choose to open in a new tab, so the user can easily return to the website in the original tab.”
- “I was taught in Web design to always go with a new tab. You don’t want to navigate people away from your site.”
- “I prefer a new tab.”
- “I prefer a new tab” – not a duplicate.
- “Marketers usually tell you new tab so people don’t move away from your page as fast.”
- “I like new tabs because then I don’t lose the original page.”
- “I prefer new tabs.”
- “I think best practice is to open links on a new tab.”
There were three themes that kept recurring:
I linked these up to a little tongue-in-cheek section at the bottom, but before we get squirrelly I want to make the case for linking within the same window.

Links should open in the same window
Best-in-show user experience researchers Nielsen Norman Group write that "links that don't behave as expected undermine users' understanding of their own system," and unexpected external linking is particularly hostile. See, one of the benefits of the browser itself is that it frees users "from the whims of particular web page or content designers." For as varied and unique as sites can be, browsers bake in consistency. Consistency is crucial.
Jakob’s Law of the Web User Experience
Users spend most of their time on other websites.
Design conventions are useful. The menu bar isn’t at the top of the website because that’s the most natural place for it; it’s at the top because that is where every other website put it. The conventions set by the sites that users spend the most time on–Facebook, Google, Amazon, Yahoo, and so on–are conventions users expect to be adopted everywhere.
[A] user-friendly and effective user interface places users in control of the application they are using. Users need to be able to rely on the consistency of the user interface and know that they won’t be distracted or disrupted during the interaction.
Users … may be search-navigators or link-clickers, but they all have additional mental systems in place that keep them aware of where they are on the site map. That is, if you put the proper markers in place. Without proper beacons to home in on, users will quickly become disoriented.
This is all to stress the point that violating conventions, such as the default behaviors of web browsers, is a big no-no. The default behavior of hyperlinks is that they open within the same page.
While not addressing this question directly, Kara Pernice, the managing director at Nielsen Norman Group, wrote last month about the importance of confirming the person's expectation of what a link is and where the link goes. Breaking that promise actually endangers the trust and credibility of the brand – in this case, the library.

Accessibility Concerns
Newer screen readers alert the user when a link opens a new window, though only after the user clicks on the link. Older screen readers do not alert the user at all. Sighted users can see the new window open, but users with cognitive disabilities may have difficulty interpreting what just happened.
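One low-tech way to address this, in the spirit of the WCAG "provide a warning" technique discussed below, is to put the warning in the link text itself so every screen reader announces it before the click. The sketch below models links as plain objects rather than live DOM nodes for brevity; a real version would walk `document.querySelectorAll('a[target="_blank"]')` instead.

```javascript
// Hypothetical helper: append a textual warning to any link forced to open
// a new window. Links are modeled as plain objects instead of DOM nodes
// so the logic is easy to see; the shape is an assumption for illustration.
function labelNewWindowLinks(links, warning = " (opens in a new window)") {
  return links.map((link) =>
    link.target === "_blank" && !link.text.endsWith(warning)
      ? { ...link, text: link.text + warning }
      : link
  );
}

const links = [
  { text: "Terms of service", href: "/tos", target: "_blank" },
  { text: "Catalog", href: "/catalog" },
];
const labeled = labelNewWindowLinks(links);
```

Because the warning lives in the link text, it benefits older screen readers that never announce the target attribute, as well as sighted users with cognitive disabilities who might otherwise lose track of what happened.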
Compatibility with WCAG 2.0 involves an "Understanding Guideline" which suggests that the website should "provide a warning before automatically opening a new window or tab." Here is the technique.

Exceptions
On Twitter, I said:
— Michael Schofield (@schoeyfield) January 6, 2015
Hugh Rundle, who you might know, pointed out a totally legit use case:
@schoeyfield I don’t disbelieve you, but I do find it difficult to comprehend. If I’m reading something I want to look at the refs later.
— Hugh Rundle (@HughRundle) January 6, 2015
Say you're reading In the Library with the Lead Pipe, where the articles can get pretty long, and you are interested in a bunch of links peppered throughout the content. You don't want to be midway through the text and then jump to another site before you're ready. Sometimes, having a link open in a new tab or window makes sense.
But hijacking default behavior isn’t a light decision. Chris Coyier shows how to use target attributes in hyperlinks to force link behavior, but gives you no less than six reasons why you shouldn’t. Consider this: deciding that such-and-such link should open in a new window actually eliminates navigation options.
If a link is just marked up without any frills, like <a href=http://link.com>, users' assumed behavior of that link is that it will open in the same tab/window, but by right-clicking, using a keyboard command, or a lingering touch on a mobile device, the user can optionally open it in a new window. When you add target=_blank to the mix, those alternate options are mostly unavailable.
I think opening reference links in new windows midway through long content is a compelling use case, but it's worth considering whether the inconvenience of default link behavior is greater than the interaction cost and otherwise downward drag on overall user experience.

Uh, you said "exceptions" …
In my mind, it is a good idea to use target=_blank when opening the link would interrupt an ongoing process:
- the user is filling out a form and needs to click on a link to review, say, terms of service
- the user is watching video or listening to audio
So, yeah, there are exceptions.

So, is there a best practice?
The best practice is to leave the default link behavior alone. It is only appropriate to open a link in a new tab or window in the rarest use cases.

Frequent Comments, or, Librarians Aren't Their Users
Marketing folks say this sort of thing. They are the same people who demand carousels, require long forms, and make ads that look like regular content. Using this reasoning, opening a link in a new window isn’t just an antipattern, it is a dark pattern – a user interface designed to trick people.
Plus, poor user experiences negatively impact conversion rate and the bottom line. Tricks like the above are self-defeating.
No they don’t.
You are not your user.
to: The boss
cc: Another senior person, HR
As I discussed with you last week, I have accepted a position with UTS, starting Feb 9th 2015, and I resign my position with UWS. My last day will be Feb 6th 2015.
Dr PETER SEFTON Manager, eResearch, Office of the Deputy Vice-Chancellor (Research & Development) University of Western Sydney
What? eResearch Support Manager – more or less the same gig as I’ve had at UWS, in a tech-focussed uni with a bigger team, dedicated visualisation service and HPC staff, an actual budget and mostly within a few city blocks.
Why UTS? A few reasons.
There was a job going, I thought I’d see if they liked me. They did. I already knew some of the eResearch team there. I’m confident we will be good together.
It’s a continuing position, rather than the five-year, more-than-half-over contract I was on, not that I’m planning to settle down for the rest of my working life as an eResearch manager or anything.
But what about the travel!!!!? It will be 90 minutes laptop time each way on a comfy, reasonably cheap and fairly reliable train service with almost total mobile internet coverage, with a few hundred metres walking on either end. That’s a change from 35-90 minutes each way depending on what campus(es) I was heading for that day and the mode of transport, which unfortunately was mostly motor vehicle. I do not like adding yet another car to Sydney’s M4, M7 or M5, despite what I said in my UWS staff snapshot. I think I’ll be fine with the train. If not, oops. Anyway, there are inner-Sydney family members and mates I’ll see more of if only for lunch.
When the internets stop working the view is at its best. OK, apart from the tunnel and the cuttings.
What’s the dirt on UWS? It’s not about UWS, I’ve had a great time there, learned how to be an eResearch manager, worked on some exciting projects, made some friends, and I’ll be leaving behind an established, experienced eResearch team to continue the work. I’m sorry to be going. I’d work there again.
Why did you use this mode of announcement? I was inspired by Titus Brown, a few weeks ago.
Whoa! A batch of links in one day.
The sofa provides a space for a range of social interactions.
Librarian career spotlight. “Customer service is always my number one goal.”
Hilarious provisional additions to the Dewey Decimal System
A 34-foot tower of books about Abraham Lincoln lives at the Ford's Theatre Center for Education and Leadership
Use your 3D printer to make an all-in-one paper airplane folder and launcher
Library of Congress: The Signal: Report Available for the 2014 DPOE Training Needs Assessment Survey
The following is a guest post by Barrie Howard, IT Project Manager at the Library of Congress.
In September, the Digital Preservation Outreach and Education (DPOE) program wrapped up the “2014 DPOE Training Needs Assessment Survey” in an effort to get a sense of current digital preservation practice, a better understanding about what capacity exists for organizations and professionals to effectively preserve digital content and some insight into their training needs. An executive summary (PDF) and full report (PDF) about the survey results are now available.
The respondents expressed an overwhelming concern for making their content accessible for at least a ten-year horizon, and showed strong support for educational opportunities, like the DPOE Train-the-Trainer Workshop, which provides training to working professionals, increasing organizational capacity to provide long-term access to digital content.
As mentioned in a previous blog post announcing the survey results, this survey was a follow-up to an earlier survey conducted in the summer and fall of 2010. The questions addressed issues such as the primary function of an organization (library, archive, museum, etc.), staff size and responsibilities, collection items, preferred training content and delivery options and financial support for professional development and training. There was good geographic coverage in the responses from organizations in 48 states, Washington D.C. and Puerto Rico, and none of the survey questions were skipped by any of the respondents. Overall, the distribution of responses was about the same from libraries, archives, museums and historical societies between 2010 and 2014, although there was a notable increase in participation from state governments.
The most significant takeaways from the 2014 survey are:
1) an overwhelming expression of concern that respondents ensure their digital content is accessible for ten or more years (84%);
2) evidence of a strong commitment to support employee training opportunities (83%, which is an increase from 66% reported in 2010), and;
3) similar results between 2010 and 2014. This trend will be of particular interest when the survey is conducted again in 2016.
Other important discoveries reveal changes in staff size and configuration over the last four years. There was a marked 6% decrease in staff size at smaller organizations (those with 1-50 employees), and a slight 2% drop in staff size at large organizations with over 500 employees. In comparison, medium-size organizations reported a 4% uptick in the staff range of 51-200 employees, and a 3% uptick for the 201-500 tier. There was a substantial 13% increase across all organizations in paid full-time or part-time professional staff with practitioner experience, and a 5% drop in organizations reporting no staff at all. These findings suggest positive trends across the digital preservation community, which bodes well for the long-term preservation of our collective cultural heritage. Born-digital content was not offered as a choice in the 2010 survey question about the content held by respondents, yet in 2014 it was a close second to reformatted materials. This will be another closely-monitored data point in 2016.
Regarding training needs, online delivery is trending upward across many sectors to meet the constraints of reduced travel and professional development budgets. However, results of the 2014 survey reveal respondents still value intimate, in-person workshops as one of their most preferred delivery options with webinars and self-paced, online courses as the next two choices. Respondents demonstrated a preference for training focused on applicable skills, rather than introductory material on basic concepts, and show a preference to travel off-site within a 100-mile radius for half- to full-day workshops over other options.
DPOE currently offers an in-person, train-the-trainer workshop, and is exploring options for extending the workshop Curriculum to include online delivery options for the training modules. These advancements will address some of the issues raised in the survey, and may include regularly scheduled webinars, on-demand videos, and pre- and post-workshop videos. Keep a watchful eye on the DPOE website and The Signal for subsequent DPOE training materials as they become available.
District Dispatch: Grab E-rate Order CliffsNotes and join PLA webinar to get a jumpstart on “New Year, New E-rate”
For those of us who did not take Marijke Visser’s advice for light holiday reading, ALA Office for Information Technology Policy Fellow Bob Bocher has a belated gift. Click here to get the library “CliffsNotes” of the E-rate Order adopted by the Federal Communications Commission (FCC) in December 2014.
This summary provides a high-level overview of the 76-page Order, focusing on four key changes:
1) Ensuring all libraries and schools have access to high-speed broadband connectivity.
2) Increasing the E-rate fund by $1.5 billion annually.
3) Taking actions to be reasonably certain all applications will be funded.
4) Correcting language in the July Order that defined many rural libraries and schools as “urban,” thus reducing their discounts.
This document book-ends nearly two years of ALA and library advocacy and joins a similar summary of the July 14 FCC E-rate Order. In addition to the summaries, we encourage you to go to the USAC website, where there is a dedicated page for the most up-to-date information concerning the E-rate program.
Bob, Marijke and I also invite you to usher in a “New Year, New E-rate” with a free webinar from the Public Library Association, Thursday, January 8. We’ll review important changes in the program and discuss how libraries can take advantage of new opportunities in 2015 and 2016. The webinar is free, but registration is required, and space is limited! The archive will be made available online following the session.
Stay tuned. The #libraryerate conversation will continue at the ALA Midwinter Meeting in Chicago.
The post Grab E-rate Order CliffsNotes and join PLA webinar to get a jumpstart on “New Year, New E-rate” appeared first on District Dispatch.
The 7.x-1.5 Release Team will be working on the next release very soon, and you could be our very next release team member!
We are looking for members for all three release team roles.
Release Team Roles
Documentation: Documentation will need to be updated for the next release. Any new components will also need to be documented. This is your chance to help the community improve the Islandora documentation while updating it for the new release! Volunteers will be provided with a login to the Islandora Confluence wiki and will work alongside the Islandora Documentation Interest Group to update the wiki in time for the new release.
Testers: All components with Jira issues set to 'Ready for Test' will need to be tested and verified. Testers will also test basic functionality of their components, and audit README and LICENSE files.
Component Managers: Responsible for the code base of their components.
Timelines
- Code Freeze: Friday, February 27, 2015
- Release Candidate: Friday, March 6, 2015
- Release: Thursday April 30, 2015
If you are interested in being a member of the release team, please let me know which role you are interested in, and which components you'd like to volunteer for. A preliminary list of components can be found here. If you have any questions about being a member of the release team, please reply here.
3D printers can do incredible things – from creating food, to rendering human organs, to building spare parts for the International Space Station. A small but growing number of libraries make 3D printers available as a library service. Library 3D printers may not be able to make you a pizza (yes, that’s possible) or operate in zero gravity, but they are being used to do some pretty amazing things in their own right. Library users are building functioning prosthetic limbs, creating product prototypes and making educational models for use in classwork.
While 3D printing technology is advancing at a meteoric pace, policymakers are just beginning to develop frameworks for its use. This presents the library community with an exciting opportunity—as providers of 3D printing services to the public, we can begin to shape the policy that coalesces around this technology in the years to come.
To advance this work, ALA’s Office for Information Technology Policy (OITP) today released “Progress in the Making: 3D Printing Policy Considerations through the Library Lens,” a report that examines numerous policy implications of 3D printing, including those related to intellectual property, intellectual freedom and product liability. The report seeks to provide library professionals with the knowledge they need to craft 3D printer user policies that minimize liability risks while encouraging users to innovate, learn and have fun.
The report states:
“As this technology continues to take off, library staff should continue to encourage patrons to harness it to provide innovative health care solutions, launch business ventures and engage in creative learning. In order to do so, library staff must have a clear understanding of basic 3D printer mechanics; the current and potential future uses of 3D printers inside and outside of library walls; and the economic and public policy considerations regarding 3D printing.”
ALA’s Office for Intellectual Freedom contributed a piece to the report entitled, “Intellectual Freedom and Library Values,” which offers guidance to library professionals seeking to craft a 3D printer acceptable use policy that accords with the fundamental library value of free expression. Additionally, Tomas A. Lipinski, dean and professor at University of Wisconsin—Milwaukee’s School of Information, provides a sample warning notice that libraries may use with patrons to demonstrate awareness of the legal issues involved in the use of 3D printing technologies in libraries.
The report was released as part of the OITP Perspectives series of short publications that discuss and analyze specialized policy topics. It is the second publication in ALA’s “Progress in the Making” series, an effort to elucidate the policy implications of 3D printing in the library context. The first document was a tip sheet jointly released by OITP, the Public Library Association and United for Libraries.
The post OITP releases report exploring policy implications of 3D printing appeared first on District Dispatch.
They suggest that "the quality of peer review may be declining" with "a growing tendency to rely on secondary measures", "difficult[y] for reviewers in standard fields to judge submissions from compound disciplines", "difficulty in finding reviewers who are qualified, neutral and objective in a fairly closed academic community", "increasing reliance ... placed on the prestige of publication rather than ... actual content", and that "the proliferation of journals has resulted in the possibility of getting almost anything published somewhere" thus diluting "peer-reviewed" as a brand.
My prediction was:
The big problem will be a more advanced version of the problems currently plaguing blogs, such as spam, abusive behavior, and deliberate subversion. Since then, I've returned to the theme at intervals, pointing out that reviewers for top-ranked journals fail to perform even basic checks, that the peer-reviewed research on peer review shows that the value even top-ranked journals add is barely detectable, even before allowing for the value subtracted by their higher rate of retraction, and that any ranking system for journals is fundamentally counter-productive. As recently as 2013 Nature published a special issue on scientific publishing that refused to face these issues by failing to cite the relevant research. Ensuring relevant citation is supposed to be part of the value top-ranked journals add.
Recently, a series of incidents has made it harder for journals to ignore these problems. Below the fold, I look at some of them.
In November, Ivan Oransky at Retraction Watch reported that BioMed Central (owned by Springer) recently found about 50 papers in their editorial process whose reviewers were sock-puppets, part of a trend:
Journals have retracted more than 100 papers in the past two years for fake peer reviews, many of which were written by the authors themselves. Many of the sock-puppets were suggested by the authors themselves, via reviewer-suggestion functionality in the submission process that clearly indicates the publisher's lack of value-add. Nature published an overview of this vulnerability of peer review by Cat Ferguson, Adam Marcus and Oransky entitled Publishing: The peer-review scam, which described jaw-dropping security lapses in major publishers' systems:
[Elsevier's] Editorial Manager's main issue is the way it manages passwords. When users forget their password, the system sends it to them by e-mail, in plain text. For PLOS ONE, it actually sends out a password, without prompting, whenever it asks a user to sign in, for example to review a new manuscript.
In December, Oransky pointed to a study published in PNAS by Kyle Siler, Kirby Lee and Lisa Bero entitled Measuring the effectiveness of scientific gatekeeping. They tracked 1008 manuscripts submitted to three elite medical journals:
Of the 808 eventually published articles in our dataset, our three focal journals rejected many highly cited manuscripts, including the 14 most popular; roughly the top 2 percent. Of those 14 articles, 12 were desk-rejected. This finding raises concerns regarding whether peer review is ill-suited to recognize and gestate the most impactful ideas and research.
Desk-rejected papers never even made it to review by peers. It's fair to say that Siler et al conclude:
Despite this finding, results show that in our case studies, on the whole, there was value added in peer review.
These were elite journals, so a small net positive value add matches earlier research. But again, the fact that it was difficult to impossible for important, ground-breaking results to receive timely publication in elite journals is actually subtracting value. And, as Oransky says:
Perhaps next up, the authors will look at why so many “breakthrough” papers are still published in top journals — only to be retracted. As Retraction Watch readers may recall, high-impact journals tend to have more retractions.
Also in December, via Yves Smith, I found Scholarly Mad Libs and Peer-less Reviews in which Marjorie Lazoff comments on the important article For Sale: “Your Name Here” in a Prestigious Science Journal from December's Scientific American (owned by Nature Publishing). In it Charles Seife investigates sites such as:
MedChina, which offers dozens of scientific "topics for sale" and scientific journal "article transfer" agreements. Among other services, these sites offer "authorship for pay" on articles already accepted by journals. He also found suspicious similarities in wording among papers, including:
"Begger's funnel plot" gets dozens of hits, all from China.
The “Beggers funnel plot” is particularly revealing. There is no such thing as a Beggers funnel plot. ... "It's difficult to imagine that 28 people independently would invent the name of a statistical test."
Some of the similarities may be due to authors with limited English using earlier papers as templates when reporting valid research, but some, such as the Begger's funnel plot papers, are likely the result of "mad libs" style fraud. And Lazoff points out they likely used sockpuppet reviewers:
Last month, Retraction Watch published an article describing a known and partially-related problem: fake peer reviews, in this case involving 50 BioMed Central papers. In the above-described article, Seife referred to this BioMed Central discovery; he was able to examine 6 of these titles and found that all were from Chinese authors, and shared style and subject matter with other “paper mill-written” meta-analyses. Lazoff concludes:
Research fraud is particularly destructive given traditional publishing’s ongoing struggle to survive the transformational Electronic Age; the pervasive if not perverse marketing of pharma, medical device companies, and self-promoting individuals and institutions using “unbiased” research; and today’s bizarrely anti-science culture.
She goes on to say:
Without ongoing attention and support from the entire medical and science communities, we risk the progressive erosion of our essential, venerable research database, until it finally becomes too contaminated for even our most talented editors to heal.
I'm much less optimistic. These recent examples, while egregious, are merely a continuation of a trend publishers themselves started many years ago of stretching the "peer reviewed" brand by proliferating journals. If your role is to act as a gatekeeper for the literature database, you had better be good at being a gatekeeper. Opening the gate so wide that anything can get published somewhere is not being a good gatekeeper.
The fact that even major publishers like Nature Publishing are finally facing up to problems with their method of publishing that the scholars who research such methods have been pointing out for more than seven years might be seen as hopeful. But even if their elite journals could improve their ability to gatekeep, the fundamental problem remains. An environment where anything will get published, the only question is where (and the answer is often in lower-ranked journals from the same publishers), renders even good gatekeeping futile. What is needed is better mechanisms for sorting the sheep from the goats after the animals are published. Two key parts of such mechanisms will be annotations, and reputation systems.
It’s the start of the new year, which, as many of my readers know, marks another Public Domain Day, when a year’s worth of creative work becomes free for anyone to use in many countries.
In countries where copyrights have been extended to life plus 70 years, works by people like Piet Mondrian, Edith Durham, Glenn Miller, and Ethel Lina White enter the public domain. In countries that have resisted ongoing efforts to extend copyrights past life + 50 years, 2015 sees works by people like Flannery O’Connor, E. J. Pratt, Ian Fleming, Rachel Carson, and T. H. White enter the public domain. And in the US, once again no published works enter the public domain due to an ongoing freeze in copyright expirations (though some well-known works might have if we still had the copyright laws in effect when they were created.)
But we’re actually getting something new worth noting this year. Today we’re seeing scholarship-quality transcriptions of tens of thousands of early English books — the EEBO Text Creation Partnership Phase I texts – become available free of charge to the general public for the first time. (As I write this, the books aren’t accessible yet, but I expect they will be once the folks in the project come back to work from the holiday.) (Update: It looks like files and links are now on Github; hopefully more user-friendly access points are in the works as well.)
This isn’t a new addition to the public domain; the books being transcribed have been in the public domain for some time. But it’s the first time many of them are generally available in a form that’s easily searchable and isn’t riddled with OCR errors. For the rarer works, it’s the first time they’re available freely across the world in any form. It’s important to recognize this milestone as well, because taking advantage of the public domain requires not just copyrights expiring or being waived, but also people dedicated to making the public domain available to the public.
And that is where we who work in institutions dedicated to learning, knowledge, and memory have unique opportunities and responsibilities. Libraries, galleries, archives, and museums have collected and preserved much of the cultural heritage that is now in the public domain, and that is often not findable– and generally not shareable– anywhere else. That heritage becomes much more useful and valuable when we share it freely with the whole world online than when we only give access to people who can get to our physical collections, or who can pay the fees and tolerate the usage restrictions of restricted digitized collections.
So whether or not we’re getting new works in the public domain this year, we have a lot of work to do this year, and the years to follow, in making that work available to the world. Wherever and whenever possible, those of us whose mission focuses more on knowledge than commerce should commit to having that work be as openly accessible as possible, as soon as possible.
That doesn’t mean we shouldn’t work with the commercial sector, or respect their interests as well. After all, we wouldn’t have seen nearly so many books become readable online in the early years of this century if it weren’t for companies like Google, Microsoft, and ProQuest digitizing them at much larger scale than libraries had previously done on their own. As commercial firms, they’re naturally looking to make some money by doing so. But they need us as much as we need them to digitize the materials we hold, so we have the power and duty to ensure that when we work with them, our agreements fulfill our missions to spread knowledge widely as well as their missions to earn a profit.
We’ve done better at this in some cases than in others. I’m happy that many of the libraries who partnered with Google in their book scanning program retained the rights to preserve those scans themselves and make them available to the world in HathiTrust. (Though it’d be nice if the Google-imposed restrictions on full-book downloads from there eventually expired.) I’m happy that libraries who made deals with ProQuest in the 1990s to digitize old English books that no one else was then digitizing had the foresight to secure the right to make transcriptions of those books freely available to the world today. I’m less happy that there’s no definite release date yet for some of the other books in the collection (the ones in Phase II, where the 5-year timer for public release doesn’t count down until that phase’s as-yet-unclear completion date), and that there appears to be no plan to make the page images freely available.
Working together, we in knowledge institutions can get around the more onerous commercial restrictions put on the public domain. I have no issue with firms that make a reasonable profit by adding value– if, for instance, Melville House can quickly sell lots of printed and digitally transcribed copies of the US Senate Torture report for under $20, more power to them. People who want to pay for the convenience of those editions can do so, and free public domain copies from the Senate remain available for those who want to read and repurpose them.
But when I hear about firms like Taylor and Francis charging as much as $48 to nonsubscribers to download a 19th century public domain article from their website for the Philosophical Magazine, I’m going to be much more inclined to take the time to promote free alternatives scanned by others. And we can make similar bypasses of not-for-profit gatekeepers when necessary. I sympathize with Canadian institutions having to deal with drastic funding cuts, which seem to have prompted Early Canadiana Online to put many of their previously freely available digitized books behind paywalls– but I still switched my links as soon as I could to free copies of most of the same books posted at the Internet Archive. (I expect that increasing numbers of free page scans of the titles represented in Early English Books Online will show up there and elsewhere over time as well, from independent scanning projects if not from ProQuest.)
Assuming we can hold off further extensions to copyright (which, as I noted last year, is a battle we need to show up for now), four years from now we’ll finally have more publication copyrights expiring into the public domain in the US. But there’s a lot of work we in learning and memory institutions can do now in making our public domain works available to the world. For that matter, there’s a lot we can do in making the many copyrighted works we create available to the world in free and open forms. We saw a lot of progress in that respect in 2014: Scholars and funders are increasingly shifting from closed-access to open-access publication strategies. A coalition of libraries has successfully crowdfunded open-access academic monographs for less cost to them than for similar closed-access print books. And a growing number of academic authors and nonprofit publishers are making open access versions of their works, particularly older works, freely available to the world while still sustaining themselves. Today, for instance, I’ll be starting to list on The Online Books Page free copies of books that Ohio State University Press published in 2009, now that a 5-year-limited paywall has expired on those titles. And, as usual, I’m also dedicating a year’s worth of 15-year-old copyrights I control (in this case, for work I made public in 2000) to the public domain today, since the 14-year initial copyright term that the founders of the United States first established is plenty long for most of what I do.
As we celebrate Public Domain Day today, let’s look to the works that we ourselves oversee, and resolve to bring down enclosures and provide access to as much of that work as we can.
In my last post, I talked about some of the advantages of and potential problems with using Agile as your development philosophy. Today I’d like to build on that topic by talking about the fundamental principles that guide Agile development. There are four core values, each seemingly framed as a choice between two competing priorities:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
In reality, the core values should not be taken as “do this, NOT that” statements, but rather as reminders that help the team prioritize the activities and attitudes that create the most value.
1. Individuals and interactions over processes and tools
The first core value is my favorite one: start with the right people, then build your processes and select your tools to best fit them, rather than the other way around. A good development team will build good software; how they build it is a secondary concern, albeit still a valid one: just because your star engineer likes to code in original Fortran doesn’t mean you should fill a couple of rooms with IBM-704s. Choosing the right tools is important, and will improve your team’s ability to produce quality software, as well as your ability to recruit.
Still, it’s the people that matter, and in particular their interactions with each other and with other parts of the organization. The key to building great software is teamwork. Individual skill plays a role, but without open communication and commitment to the team’s goals, the end product may look great, but it will likely not fulfill the original customer need, or it will do so in an inefficient manner. Agile’s daily standup meetings and end-of-iteration evaluations are a way to encourage the team to communicate freely and check egos at the door.
2. Working software over comprehensive documentation
This is the one that often makes developers jump for joy! An Agile team’s focus should be on finding the most efficient way to build software that solves an identified need, and therefore should not spend a lot of time on paperwork. Agile documentation should answer two basic questions: what are we going to build (project requirements and user stories) and how did we build it (technical specifications). The former is crucial for keeping the team focused on the ultimate goal during the fast and furious development sprints, and the latter is needed later on for the purpose of revisiting a certain project, be it to make enhancements or corrections or to reuse a particular feature. Anything else is typically overkill.
3. Customer collaboration over contract negotiation
The best way I can think of to explain this core value is: the development team needs to think of the customer as another member of the team. The customer-team relationship should not be managed by a signed piece of paper, but rather by the ongoing needs of the project. Contract negotiations (you can calm your legal department down at this point; yes, there will be a contract) should be focused on identifying the problem that needs to be solved and a clear set of success criteria that will tell us we’ve solved it, rather than the tool or process to be delivered. Provisions should be made for regular customer-team interactions (say, by involving customer representatives in sprint planning and review meetings) and a clearly defined change management process: software development is a journey, and the team should have the flexibility to change course midstream if doing so will make the end product a better fit for the customer’s need.
4. Responding to change over following a plan
I talked about requirements documentation earlier, so there is, in fact, an overall plan. What this core value means is that those requirements are a suggested path to solving a customer need, and they can be modified throughout the project if prior development work uncovers a different, better path to the solution, or even a better solution altogether. And in this case, better means more efficient. In fact, everything I’ve described can be summarized in one, overarching principle: identify the problem to be solved or that needs to be fulfilled, and find the least costly way to get to that end point; do this at the beginning of the project, and keep doing it over, and over, and over again until everyone agrees that a solution has been reached. Everything else (processes, tools, plans, documentation) either makes it easier for the team to find that solution, or is superfluous and should be eliminated.
While our Net Archive search performs satisfactorily, we would like to determine how well-balanced the machine is. To recap, it has 16 cores (32 with Hyper Threading), 256GB RAM and 25 Solr shards @ 900GB. When running it uses about 150GB for Solr itself, leaving 100GB memory for disk cache.
Test searches are for 1-3 random words from a Danish dictionary, with faceting on 1 very large field (billions of unique values, billions of references), 2 large fields (millions of unique values, billions of references) and 3 smaller fields (thousands of unique values, billions of references). Unless otherwise noted, searches were issued one request at a time.
Scaling cores
Under Linux it is quite easy to control which cores a process utilizes, using the taskset command. We tested core scaling by doing the following with different sets of cores:
- Shut down all Solr instances
- Clear the disk cache
- Start up all Solr instances, limited to specific cores
- Run the standard performance test
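The four steps above can be sketched as a shell script. This version only prints the commands it would run, so it can be reviewed before touching a live system; the ports, service invocations and test-script name are assumptions to be adapted to the actual installation.

```shell
#!/bin/sh
# Dry run of the core-scaling procedure: print the commands instead of
# executing them. CORES uses taskset's list syntax; e.g. "0-7" is 8
# physical cores, and "0-7,16-23" adds their Hyper-Threaded siblings.
CORES="0-7"
SOLR_PORTS="8983 8984"   # one port per Solr instance (assumption)

plan() {
  for port in $SOLR_PORTS; do
    echo "solr stop -p $port"                     # 1) shut down all instances
  done
  echo "sync; echo 3 > /proc/sys/vm/drop_caches"  # 2) clear the disk cache (as root)
  for port in $SOLR_PORTS; do
    echo "taskset -c $CORES solr start -p $port"  # 3) restart pinned to the chosen cores
  done
  echo "./run-performance-test.sh"                # 4) run the standard performance test
}

plan
```

Once the printed plan looks right, the same commands can be run for each core set under test, clearing the cache between runs so earlier runs do not skew the results.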
In the chart below, ht means that there is the stated number of cores + their Hyper Threaded counterpart. In other words, 8ht means 8 physical cores but 16 virtual ones.
- Hyper Threading does provide a substantial speed boost.
- The differences between 8ht cores and 16 or 16ht cores are not very big.
Conclusion: For standard single searches, which is the design scenario, 16 cores seems to be overkill. More complex queries would likely raise the need for CPU though.
Scaling shards
Changing the number of shards on the SolrCloud setup was simulated by restricting queries to run on specific shards, using the argument shards. This was not the best test as it measured the combined effect of the shard-limitation and the percentage of the index held in disk cache; e.g. limiting the query to shard 1 & 2 meant that about 50GB of memory would be used for disk cache per shard, while limiting to shard 1, 2, 3, & 4 meant only 25GB of disk cache per shard.
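The shard restriction can be expressed directly on the request. A minimal sketch, assuming a SolrCloud setup where logical shard names are accepted by the shards parameter (host, collection and shard names are placeholders):

```shell
#!/bin/sh
# Build a query URL restricted to a subset of shards via Solr's "shards"
# parameter. The real setup has 25 shards; listing only shard1 and shard2
# simulates a 2-shard index (with correspondingly more disk cache per shard).
SOLR="http://localhost:8983/solr/collection1"
SHARDS="shard1,shard2"

URL="$SOLR/select?q=hestevogn&shards=$SHARDS"
echo "$URL"
# The query would then be issued with e.g.: curl -s "$URL"
```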
Note: These tests were done on performance degraded drives, so the actual response times are too high. The relative difference should be representative enough.
- Performance for 1-8 shards is remarkably similar.
- Going from 8 to 16 shards is 100% more data at half performance.
- Going from 16 to 24 shards is only 50% more data, but also halves performance.
Conclusion: Raising the number of shards further on an otherwise unchanged machine would likely degrade performance fast. A new machine seems like the best way to increase capacity, the less guaranteed alternative being more RAM.
Scaling disk cache
A Java program was used to reserve part of the free memory, by allocating a given amount of memory as long[] arrays and randomly changing their content. This effectively controlled the amount of memory available for disk cache for the Solr instances. The Solr instances were restarted and the disk cache cleared between each test.
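A comparable effect can be had from the shell with stress-ng instead of a custom Java program; this is a hedged stand-in, assuming stress-ng is installed. The script only prints the command so the reservation size can be checked first.

```shell
#!/bin/sh
# Reserve memory to shrink the page cache available to Solr. A stress-ng
# --vm worker keeps writing to its mapping (--vm-keep reuses the same
# memory), so the pages cannot be reclaimed for disk cache.
RESERVE_BYTES="100G"   # memory to take away from the disk cache

CMD="stress-ng --vm 1 --vm-bytes $RESERVE_BYTES --vm-keep"
echo "$CMD"
# Run the printed command in a separate terminal during the test;
# `free -h` then shows the corresponding drop in cached memory.
```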
- A maximum running time of 10 minutes was far too little for this test, leaving very few measuring points for 54GB, 27GB and 7GB disk cache.
- Performance degrades exponentially when the amount of disk cache drops below 100GB.
Conclusion: While 110GB (0.51% of the index size) of memory for disk cache delivers performance well within our requirements, it seems that we cannot use much of the free memory for other purposes. It would be interesting to see how much performance would increase with even more free memory, for example by temporarily reducing the number of shards.
Scaling concurrent requests
Due to limited access, we only need acceptable performance for one search at a time. Because of the high-cardinality URL field (~6 billion unique values), the memory requirement for a facet call is approximately 10GB, severely limiting the maximum number of concurrent requests. Nevertheless, it is interesting to see how much performance changes when the number of concurrent requests rises. To avoid reducing the disk cache, we only tested with 1 and 2 concurrent requests.
Observations (over multiple runs; only one run is shown in the graph):
- For searches with small to medium result sets (aka “normal” searches), performance for 2 concurrent requests was nearly twice as bad as for 1 request.
- For searches with large result sets, performance for 2 concurrent requests was more than twice as bad as for 1 request. This is surprising, as a slightly better than linear performance drop was expected.
Conclusion: Further tests seem to be in order due to the surprisingly poor scaling. One possible explanation is that memory speed is the bottleneck. Due to the memory overhead of faceting, limiting the number of concurrent requests to 1 or 2 and queuing further requests is the best option for maintaining a stable system.
Scaling requirements
Admittedly, the whole facet-on-URL thing might not be essential for the user experience. If we avoid faceting on that field and only facet on the saner fields, such as host, domain and 3 smaller fields, we can turn up the number of concurrent requests without negative impact on disk cache.
- Mean performance with 1 concurrent request is 10 times better, compared to the full faceting scenario.
- From 1-4 threads, latency rises but throughput improves.
- From 4-32 threads, latency rises and throughput does not improve.
Conclusion: As throughput does not improve for more than 4 concurrent threads, using a limit of 4 seems beneficial. However, as we are planning to add faceting on links_domains and links_hosts, as well as grouping on URL, the measured performance is not fully representative of future use of search in the Net Archive.
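Enforcing such a limit could look like the following sketch, where requests beyond 4 concurrent simply queue inside a fixed-size executor. The Callable queries are hypothetical stand-ins for real Solr calls:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FacetPool {
    // Run facet queries on a fixed pool of 4 worker threads -- the point
    // past which throughput stopped improving in the tests above.
    static List<String> runAll(List<Callable<String>> queries) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every query has finished; excess
            // queries wait in the executor's internal queue.
            for (Future<String> f : pool.invokeAll(queries)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```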