Christmas Eve in Gainesville added to photodatabase
Rumour has it that the 25 lovely 1TB Samsung 840 EVO drives in our Net Archive search machine do not perform well when data are left untouched for months. Rumour, in this case, being solid confirmation with a firmware fix from Samsung. In our setup, index shards are written once, then left untouched for months or even years: exactly the circumstances that trigger the performance degradation.

Measurements, please!
Our 25 shards were built over half a year, giving us a unique opportunity to measure drives in different states of decay. The first experiment was very simple: read all the data from each drive sequentially by issuing cat index/* > /dev/null and plot the measured time spent, with the age of the files on the x-axis. That shows the impact on bulk read speed. The second experiment was to issue Solr searches to each shard in isolation, testing search speed one drive at a time. That shows the impact on small random reads.
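The bulk-read experiment can be sketched as a small script. The shard layout (index/shard*) and the MB/s reporting are my own assumptions for illustration, not the exact commands used for the measurements.

```shell
# Sketch of the bulk-read experiment: time a sequential read of every
# file in a shard directory and report approximate throughput in MB/s.
time_shard() {
  dir="$1"
  bytes=$(du -sb "$dir" | cut -f1)
  start=$(date +%s)
  cat "$dir"/* > /dev/null       # the actual measurement from the post
  end=$(date +%s)
  secs=$((end - start))
  [ "$secs" -eq 0 ] && secs=1    # avoid division by zero on tiny dirs
  echo "$dir: $((bytes / secs / 1048576)) MB/s"
}

# The index/shard* layout is an assumption; skip if nothing matches.
for shard in index/shard*; do
  [ -d "$shard" ] && time_shard "$shard"
done
```

Plotting one point per shard, with file age on the x-axis, then gives the bulk-read decay curve described above.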
Inspecting the graph, it seems that search performance quickly gets worse until the data are about 4 months old. After that it stabilizes. Bulk reading, on the other hand, continues to worsen during all 7 months, but that has little relevance for search.
The Net Archive search uses SolrCloud for querying the 25 shards simultaneously and merging the results. We only had 24 shards at the previous measurement 6 weeks ago, but the results should still be comparable. Keep in mind that our goal is to have median response times below 2 seconds for all standard searches; searches matching the full corpus and similar are allowed to take longer.
The distinct hill is present both for the old and the new measurements: See Even sparse faceting is limited for details. But the hill has grown for the latest measurements; response times have nearly doubled for the slowest searches. How come it got that much worse during just 6 weeks?
Theory: In a distributed setup, the speed is dictated by the slowest shard. As the data get older on the un-patched Samsung drives, the chance of a slow read rises. Although the median response time for a search on a shard with 3-month-old data is about the same as for one with 7-month-old data, the chance of an occasional very slow search rises. As the whole collection of drives got 6 weeks older, the chance of getting through a SolrCloud search without poor performance from at least one of the drives fell.
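The slowest-shard effect compounds quickly. As a back-of-the-envelope illustration (the per-drive probabilities here are made-up assumptions, not measured values): if each drive independently serves a slow response with probability p, a fan-out over n shards avoids all slow responses only with probability (1-p)^n.

```shell
# Chance that a fan-out over n shards hits at least one slow drive,
# given a per-drive probability p of a slow read: 1 - (1 - p)^n.
# The example probabilities are illustrative, not measured.
p_slow_cloud() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 - (1 - p)^n }'
}

p_slow_cloud 0.05 25   # a modest 5% per-drive risk -> 0.72 overall
p_slow_cloud 0.10 25   # at 10% per drive, a slow search is near-certain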
Note how our overall median response time actually got better with the latest measurement, although the mean (average) got markedly worse. This is due to the random distribution of result set sizes. The chart paints a much clearer picture.

Well, fix it then!
The good news is that there is a fix from Samsung. The bad news is that we cannot upgrade the drives using the controller on the server. Someone has to go through the process of removing them from the machine and performing the necessary steps on a workstation. We plan on doing this in January, and besides the hassle and the downtime, we foresee no problems with it.
However, as the drive bug affects old data, rewriting all of the 25*900GB files should refresh the cell charges and temporarily bring the drives back up to speed. Mads Villadsen suggested using dd if=somefile of=somefile conv=notrunc, so let’s try that. For science!
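Applied to a whole shard, the suggestion becomes a loop like the following (a sketch; the index/* path is an assumption). The point of conv=notrunc is that dd does not truncate the output file on open, so each block is read and then written straight back to the same offset, rewriting the file in place.

```shell
# Rewrite every index file in place to refresh the flash cells.
# conv=notrunc keeps dd from truncating the output file, so the
# file's own blocks are read and written back at the same offsets.
for f in index/*; do
  [ -f "$f" ] || continue
  dd if="$f" of="$f" conv=notrunc bs=1M 2>/dev/null
done
```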
It took nearly 11 hours to process drive 1, which had the oldest data. That fits well with the old bulk-speed measurement for that drive, which was 10½ hours for 900GB. After the rewrite, bulk reading took 1 hour for 900GB. Reviving the 24 other drives was done in parallel with a mean speed of 17MB/s, presumably limited by the controller. Bulk read times for the revived drives were 1 hour for 900GB, except for drive 3, which took 1 hour and 17 minutes. Let’s file that under pixies.
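For reference, those times translate into rough throughput figures as follows (a quick conversion of the numbers above, not an additional measurement):

```shell
# Convert "GB read in H hours" into an approximate MB/s figure.
gb_in_hours_to_mbs() {
  awk -v gb="$1" -v h="$2" 'BEGIN { printf "%.0f\n", gb * 1024 / (h * 3600) }'
}

gb_in_hours_to_mbs 900 10.5   # degraded drive 1: ~24 MB/s
gb_in_hours_to_mbs 900 1      # after the rewrite: ~256 MB/s
```

A roughly tenfold difference in sequential throughput, which makes the degradation hard to miss at the bulk level.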
Repeating the individual shard search performance test from before, we get the following results:
Note that the x-axis is now drive number instead of data age. As can be seen, the drives are remarkably similar in performance. Compared to the old test, they are at the same speed as the drive with 1-month-old data, indicating that the degradation sets in after more than 1 month and not immediately. The raw numbers for the same 7 drives as listed in the first table are:
Running the full distributed search test and plotting the results together with the 1½ month old measurements as well as the latest measurements with the degraded drives gives us the following.
Performance is back to the same level as 1½ months ago, but how come it is not better than that? A quick inspection of the machine revealed that 2 backup jobs had started and were running during the last test; it is unknown how heavy their impact is on the drives, so the test will be re-run when the backups have finished.

Conclusion
The performance degradation of non-upgraded Samsung 840 EVO drives is very real and the degradation is serious after a couple of months. Should you own such drives, it is highly advisable to apply the fixes from Samsung.
The deadline for the Heritage Health Information (HHI) 2014: A National Collections Care Survey has been extended to February 13, 2015! The HHI 2014 is a national survey on the condition of collections held by archives, libraries, historical societies, museums, scientific research collections, and archaeological repositories. It is the only comprehensive survey to collect data on the condition and preservation needs of our nation’s collections.
Invitations to participate were sent to institution directors in October. These invitations included personalized login information, which may be entered at www.hhi2014.com. Questions about the survey may be directed to hhi2014survey [at] heritagepreservation [dot] org or 202-233-0824.
Heritage Health Information 2014 is sponsored by the Institute of Museum and Library Services and the National Endowments for the Humanities & Arts, and is conducted by Heritage Preservation. Please do all you can to ensure that your institution is represented in this important survey. Your responses are critical in garnering future support for collections care.
The Library of Congress and the Institute of Museum and Library Services (IMLS) recently announced the official open call for applications for the 2015 National Digital Stewardship Residency, to be held in the Washington, D.C. area. Applications will close on January 30, 2015. To apply, go to the official USAJobs application website.
For the 2015–16 class, five residents will be chosen for a year-long residency at a prominent institution in the Washington, D.C. area. The residency will begin in June, 2015, with an intensive week-long digital stewardship workshop at the Library of Congress. Thereafter, each resident will move to his or her designated host institution to work on a significant digital stewardship project. These projects will allow them to acquire hands-on knowledge and skills involving the collection, selection, management, long-term preservation, and accessibility of digital assets.
The five institutions, and the projects they will offer to NDSR residents, are:
- American Institute of Architects: Building Curation into Records Creation: Developing a Digital Repository Program at the American Institute of Architects
- U.S. Senate Historical Office: Improving Digital Stewardship in the U.S. Senate
- National Library of Medicine: NLM-Developed Software as Cultural Heritage
- District of Columbia Public Library: Personal Digital Preservation Access and Education through the Public Library
- Government Publishing Office: Preparation for Audit and Certification of GPO’s FDsys as a Trustworthy Digital Repository
The inaugural class of the NDSR was held in Washington, D.C. in 2013-14. Host institutions for that class included Association of Research Libraries, the Dumbarton Oaks Research Library, the Folger Shakespeare Library, the Library of Congress, the University of Maryland, the National Library of Medicine, the National Security Archive, the Public Broadcasting Service, the Smithsonian Institution Archives and the World Bank.
“We are excited to be collaborating with such dynamic host institutions for the second NDSR residency class in Washington, D.C.,” said Library of Congress Supervisory Program Specialist George Coulborne. “In collaboration with the hosts, we look forward to developing the most engaging experience possible for our residents. Last year’s residents all found employment in fields related to digital stewardship or went on to pursue higher degrees. We hope to replicate that outcome with this class of residents, as well as build bridges between the host institutions and the Library of Congress to advance digital stewardship.”
“At IMLS, we are delighted to continue our work on and funding support for the second round of the NDSR,” said Maura Marx, IMLS Deputy Director for Library Services. “We welcome the new hosts and look forward to welcoming the new residents to all the opportunities this program presents.”
To qualify, applicants must have a master’s degree or higher academic credential, earned between spring 2013 and spring 2015, and a strong interest in digital stewardship. Currently enrolled doctoral students also are encouraged to apply. Applicants must submit a detailed resume and cover letter, their undergraduate and graduate transcripts, three letters of recommendation, and a creative video that explains the applicant’s interest in the program. Visit the NDSR application website.
The residents chosen for NDSR 2015 will be announced by early April 2015. For additional information and updates regarding the National Digital Stewardship Residency, please see the program website.
The Office of Strategic Initiatives, part of the Library of Congress, oversees the NDSR for the Library. It also directs the Library’s overall digital strategic planning and the national program for long-term preservation of digital cultural assets, leading a collaborative institution-wide effort to develop consolidated digital future plans and integrating the delivery of information technology services.
The post Apply for 2015 National Digital Stewardship Residency Program appeared first on District Dispatch.
Bitcoin was the worst investment of 2014, as its value halved. Bitcoin's hash rate had been growing exponentially since the start of 2013 but has been approximately flat for the last quarter, indicating that investment in new mining hardware has dried up. The reason for investment drying up is likely that the revenue from mining is less than a third of what it was. The Bitcoin market capitalization dropped from $11B to $4.4B. Even if you don't accept my economies of scale arguments, these numbers should temper your enthusiasm for basing peer-to-peer storage on a crypto-currency.
Technological developments in 3D printing are empowering people to learn new skills, launch business ventures and solve complex health problems. As this cutting-edge technology becomes more common in libraries, what do librarians need to know? Join a panel of information professionals for the session “Library 3D Printing—Unlocking the Opportunities, Understanding the Challenges” which takes place during the 2015 American Library Association’s (ALA) Midwinter Meeting in Chicago. The session will be held from 10:30–11:30 a.m. on Sunday, February 1, 2015, in the McCormick Convention Center room W470A.
The panel will tackle the policy implications of 3D printing from all angles, with a view to helping the library community establish smart user policies. Topics of discussion will include intellectual property and intellectual freedom issues, product liability questions, the educational and entrepreneurial applications of library 3D printing and more.
Speakers include Barbara Jones, director of the ALA Office for Intellectual Freedom; Tom Lipinski, dean and professor at the University of Wisconsin-Milwaukee School of Information Studies; and Charlie Wapner, information policy analyst at the ALA Office for Information Technology Policy.
The post Library experts to talk 3D printing at 2015 ALA Midwinter Meeting appeared first on District Dispatch.
Happy Christmas! It is my sincere wish that everyone reading this has had a wonderful time with family and good friends over the holiday season. This year marks the second year that I’ve been away from my family – my parents, my brothers, my in-laws – all still in Oregon. It’s the hardest time to be away from family; we have always been close, and while my wife and kids have our own Christmas traditions, we’ve always found time around the holidays to be together as an extended family. But this second year in Ohio has been very different from the first. Last year, we were still trying to settle into our new community, make new friends and absorb the Midwest culture (which is very different from the west coast). Ohio had become home, and yet, it wasn’t.
This year has been different. We’ve made good friends, our kids have found a place to fit in; we’ve bought a house and are putting down roots. Ohio State University continues to be a place with challenges and opportunities to learn and grow – but more importantly, it has become a place not just with colleagues that I respect and continue to learn from, but a place where friendships have been made. When my older son had a bit of a health scare, it was my community at Ohio State and the friends we’ve made in our neighborhood that helped to provide immediate support, and continue to support us. As I look back on 2014 and all the wonderful friends and adventures that we’ve had in our new adopted state, I realize just how fortunate and blessed my family has been to find a community, job, and friends that have just fit.
This last year also saw the continued growth of both MarcEdit and its user community. On the application side, this year saw the release of MarcEdit 6, the MARCNext tool kit, integration with OCLC’s WorldCat, new language tools, automation tools, etc. The user community…well, I’m consistently amazed by the large and diverse user community that has grown up around something that I really made available with the hope that maybe just one other person might find it useful. This is a great community, and I’m always humbled by the kindness and helpfulness displayed. I’m told often how much people appreciate this work. Well, I appreciate you as well. I have always appreciated the opportunity to work with so many interesting people on projects and problems that potentially can have lasting impacts. It has been, and always will be, one of my great pleasures.
On to the update….in what has become a tradition, I’m releasing the MarcEdit Christmas update. I’d already provided a little bit of information related to what was changing in a previous blog post: http://blog.reeset.net/archives/1632 – but I’m including the full list below.

Changes:
- Enhancement: MARCCompare: Added options to allow users to define colors for added and deleted content.
- Enhancement: MARCCompare: Added options to support automatic sorting of data prior to comparison. Users can define the field for sorting (default is the 001)
- Enhancement: MARCEngine: Improved support for automated conversion of NRC notation in UTF8 data to ensure proper representation of UTF8 characters.
- Modified Behavior: Automated Update: Previously, MarcEdit would check for an update every time the application was run. If an update had occurred, the program would prompt the user for action. If the user cancelled the action, the program would re-prompt the user each time the program was started. Because many users work in environments where their updates are managed by central IT staff, this constant re-prompting was problematic. Often, it would lead to users simply disabling update notification. To make this more user friendly, the new behavior works as follows: When the program determines an update has been made, the program will prompt the user. If the user takes no action, the program will no longer prompt for action, but instead will provide an icon denoting the presence of an update in the lower right corner, next to the Preferences shortcut.
- Enhancement: Link Identifiers Tool: I’ve added support for MESH headings through the use of their beta SPARQL end-point. Records run through the linking tool with identified MESH headings will automatically be resolved against the NLM database.
- Enhancement: SPARQL Browser: This was described in blog post: http://blog.reeset.net/archives/1632, but this is a new tool added to the MARCNext toolkit.
- Enhancement: RDF Toolkit: In building the SPARQL Browser, I integrated a new RDF toolkit into MarcEdit. At this point, the SPARQL Browser is the only resource making use of its functionality – but I anticipate that this new functionality will be utilized for a variety of other functions as the cataloging community continues to explore new metadata models.
- Bug Fix: Diacritic insertion via intellisense: When typing a diacritic, selecting it by double-clicking on the value would cause the file to scroll to the top; the program now returns to the cursor position. When the user pressed enter to select the value, a new line was inserted behind the diacritic mnemonic. Both of these have been fixed.
- Bug Fix: Mac Threading issues: One of the things that came to my attention on the last update is that Mono has some issues when generating system dialog boxes within separate threads. It appears that the new garbage collector in Mono may be sweeping object pointers prematurely. The easy solution is to remove the need to generate these system messages or move them when necessary. This has been done. Immediately, this corrects the issue related to MarcEdit crashing when the update message was generated.
- Bug Fix: Mac Fonts: I was having some trouble with Mac systems not failing gracefully when a requested font was not found on the system. In the Windows and Linux implementations of Mono, the default behavior is to fall through the requested font family until an appropriate font is found. Under OSX, Mono behaves differently: the lookup returns no value and text defaults to undefined blocks. I’ve reworked the font selection class to ensure that a fallback font is always selected on all systems – which has corrected this problem on the Mac.
The MarcEdit update is available for download from: http://marcedit.reeset.net/downloads for all systems. You may also download and update the application via the automatic update utility from within MarcEdit itself.
Again, Happy Christmas!
Open Library will be down for 3 hours starting from Dec 24, 10:00PM SF Time (PST, UTC/GMT -8 hours) due to scheduled hardware maintenance.
We’ll post updates here and on @openlibrary twitter.
Thank you for your cooperation.
We’re fans of lists here at the Library of Congress and there is no better way to close out the year on The Signal than taking a look back at our popular blog posts of the year.
Our most viewed post of the year, and our second most viewed post of all time since our blog launched in 2011, was the post about the discovery of unreleased Duke Nukem video game code. It generated quite a lot of buzz and was picked up by the gaming and technical news sites, including: Polygon, Engadget, Eurogamer, The Verge, Gamasutra, and CNET.
Here’s the entire list of top 10 posts of 2014 (out of 189 total posts), ranked by page views based on data as of December 22:
- Duke’s Legacy: Video Game Source Disc Preservation at the Library of Congress
- Personal Digital Archiving: The Basics of Scanning
- What Do you Mean by Archive? Genres of Usage for Digital Preservers
- Research is Magic: An Interview with Ethnographers Jason Nguyen & Kurt Baer
- Exhibiting .gifs: An Interview with curator Jason Eppink
- New NDSA Report: The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions
- We’re All Digital Archivists Now: An Interview with Sibyl Schaefer
- The PDF’s Place in a History of Paper Knowledge: An Interview with Lisa Gitelman
- What Does it Take to Be a Well-rounded Digital Archivist?
- Digital Archiving: Making It Personal at the Public Library
And here are the top 10 posts with the most comments, based on data as of December 22:
- Personal Digital Archiving: The Basics of Scanning
- Duke’s Legacy: Video Game Source Disc Preservation at the Library of Congress
- What Do you Mean by Archive? Genres of Usage for Digital Preservers
- When it Comes to Keepsakes, What’s the Difference Between Physical and Digital?
- Where are the Born-Digital Archives Test Data Sets?
- Research and Development for Digital Cultural Heritage Preservation: A (Virtual and In-Person) Open Forum
- What Does it Take to Be a Well-rounded Digital Archivist?
- Comparing Formats for Still Image Digitizing: Part One
- Data: A Love Story in the Making
- Tag and Release: Acquiring & Making Available Infinitely Reproducible Digital Objects
It is heartening to see that most of the top posts of the year talk about jobs and skills in our profession, along with posts of interviews with practitioners working on stewarding various types of digital content. Looking specifically at the blog posts that generated the most comments, we were really excited to see excellent engagement and conversation occurring between commenters.
Thank you to all of our readers and commenters for making 2014 a memorable one on The Signal!
Christmas in Denton added to photodatabase
Within the library community, we understand the value of public programming—at least from an experiential perspective, seeing how our users benefit. But how can we understand the benefits and challenges of public programming systematically across libraries, and ultimately at a national level?
The National Impact of Library Public Programs Assessment (NILPPA), a project of the American Library Association’s (ALA) Public Programs Office, is addressing these questions. Research work during the past year has yielded initial findings. You may find these findings of interest, and your comments will help to move this work forward.
The ALA Office for Information Technology Policy (OITP) thinks about the public policy implications of public programming. For many in the library community, the focus is on the substantive programming itself and the direct benefits to communities. For our orientation, public programming provides libraries with visibility (think marketing and advertising) in communities as important cultural and educational institutions. Public programming may also advance specific policy objectives such as improving literacy (including digital literacy), understanding challenges of privacy and surveillance in society, or the importance of widespread access to advanced technology (e.g., high-speed broadband).
Forget the New York Times best seller list when deciding what to read on any days off you might have in front of you. The Federal Communications Commission (FCC) released the second E-rate Modernization Order in plenty of time for you to print it out and stuff it in your carry-on (if you’re lucky enough to be traveling somewhere sunny) or keep it on your bedside table if you’re like me and not so lucky.
Among the major changes adopted in this order are those geared to close the broadband capacity gap for libraries and schools, particularly for those in rural areas. These include:
- Suspending the amortization rules for special construction;
- Allowing applicants to pay their non-discounted portion for construction costs over multiple years;
- Equalizing the treatment of dark and lit fiber;
- Permitting self-construction of high-speed broadband networks; and
- Adding discounts when states match funds for broadband construction projects.
These are the changes directly related to addressing the lack of affordable access to high-capacity broadband. The Commission also increased the funding available by $1.5 billion, bringing the program up to $3.9 billion. And as always, there are a number of other important program changes that provide new opportunities for libraries. We are preparing a summary of the order, but in the meantime, the FCC has one of their own which explains the major changes, some of which take effect in the 2015 funding year.
As we alluded to earlier, there is a lot of work ahead to make sure that libraries have the supports that they need to take advantage of the new funding and program changes. To that end Susan Hildreth, director of the Institute of Museum and Library Services (IMLS), and FCC Chairman Tom Wheeler held a conference call which was both a recognition of the hard work of the American Library Association (ALA) and our library partners and a call to action. We are heeding the call to action and planning ongoing outreach and education to provide as much information to applicants and library leaders as we can. As a first step we are working with the Public Library Association to hold a webinar, January 8, to go into detail on the second E-rate order. And there will be more to come in the weeks ahead.
Read the FCC’s summary and you can get back to the book you put aside, but if you take on the full 106 pages of the E-rate order, try our official E-rate cocktail:

“The Exparte”
- 2 ounces Campari
- 1 ounce Gin (your choice)
- 3 drops Bitters (try Angustura or orange)
- Topped with club soda and garnished with an orange twist
For those of you who have been closely following the E-rate proceeding for the last 18 months or for those intimately aware of the intricacies of the E-rate application cycle, here’s an ode to help you ring in the new E-rate year.

An E-rate Holiday Ode

The end of 2014 is now very near
and we have E-rate reform, so we hear.
Santa Wheeler has a very full sack
and the FCC/USAC elves have much on their backs.
All the new and confusing program regs
for some needed clarity we humbly beg.
With so many questions now still pending
we fear that 2015 may be never ending.
So much is new, so much has changed
even E-rate veterans’ minds go insane!
Like C2 reforms – yes we’ve waited so long
that useless 2-in-5 is finally gone!
C2 budgets, a bit hard to understand
but the FCC says your C2 funding’s in hand.
New rules for fiber, which is good news
changes like this we can certainly use.
A large increase in funding with money that’s new
to be spread among many and not just a few.
No need to amortize those big requests
get all funds in one year, likely the best.
The state match will stretch our limited funds
may not be too much but it’s more than just some.
How to do CEP, when will we hear?
the time to do this is very near.
The bad urban/rural change last summer
the new order fixes, it’s no longer a bummer.
To complete the new 471, I can hardly wait
though when it’s done I’ll be in a catatonic state.
But we’ve finally reached the end of program reform
so in the New Year let’s celebrate – the E-rate’s reborn.

–Poem by Bob Bocher, OITP Fellow
Happy New Year
Today I found the following resources and bookmarked them:
- Contiki: an open source operating system for the Internet of Things. Contiki connects tiny low-cost, low-power microcontrollers to the Internet.
Digest powered by RSS Digest
The 113th Congress concluded its work in time to leave town for the holidays. While not the most productive Congress in terms of bills passed, the 113th was able to finish one of the mandatory “must do” items: funding the Federal government for Fiscal Year 2015.
One might notice that the Fiscal Year actually began October 1; for Congress, though, a three-month delay is not uncommon in the highly partisan and dysfunctional climate. The Federal government has been operating under a Continuing Resolution, a Congressionally-enacted measure to provide short term funding to keep the doors of government open while Appropriators hammer out details of longer term funding levels.
What exactly is a Cromnibus? It’s not a Nightmare Before Christmas, but rather a massive funding bill that provides funding to keep the Federal government open for a short period of time (a Continuing Resolution) and also provides long term funding for eleven Federal agencies in one bill (an Omnibus)…thus the marvelously named CR-Omnibus!
How did libraries fare in the Cromnibus funding package? Mostly, programs supported by the libraries received level funding, which is good news in the austere atmosphere on Capitol Hill. For example, the Library Services and Technology Act, Head Start, Innovative Approaches to Literacy, and Career and Technical Education State Grants all received the same level of funding as FY 2014.
A few programs received slight increases or decreases. Small increases were granted to the Institute of Museum and Library Services, Striving Readers, Library of Congress, and the Government Publishing Office (formerly known as the Government Printing Office). Slight decreases were dealt to Assessment programs, National Archives, and Electronic Government initiatives.
You can view an expanded chart displaying the funding levels of top ALA priority programs by clicking here.
Now that the FY 15 budget is done and the 113th Congress has concluded, the 114th Congress will arrive in a few weeks and work on the FY 16 budget will begin.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.