Last updated September 1, 2016. Created by Peter Murray on September 1, 2016.
The Islandora Foundation is thrilled to announce the second Islandoracon, to be held at the lovely LIUNA Station in Hamilton, Ontario from May 15-19, 2017. The conference will take place over five days, including a day of post-conference sessions and a full-day Hackfest.
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
Open Knowledge Foundation: Freedom to control MyData: Access to personal data as a step towards solving wider social issues.
This piece is part of a series of posts from MyData 2016 – an international conference that focuses on human-centric personal information management. The conference is co-hosted by the Open Knowledge Finland chapter of the Open Knowledge International Network.
Song lyrics: Pharrell Williams, “Freedom”; image: Pixabay CC0
Indeed, the theme of MyData so far is freedom. The freedom to own our data. Freedom, however, is a very complicated subject that has been subjected to many arguments, interpretations, and even wars. I will avoid the more complicated philosophy and dive instead into an everyday example. In the pop song quoted above, freedom can be understood as being carefree – “Who cares what they see and what they know?” Taking it to the MyData context, are we granting freedom to others to do whatever they want with our data and information because we trust them, or just because we don’t care?
MyData speakers have looked at the issue of freedom from a different angle. Taavi Kotka, Estonia’s CIO, claims that the fifth freedom of the EU should be the freedom of data. People, explains Kotka, should have the choice of what can be done with their data. They should know and understand the possibilities that sharing the data can bring (for example, better and easier services across the EU countries) and the threats that this can entail, like misuse of their data. For that we need pioneer regulators. For that we need the private sector and civil society to apply pressure, showcase what we can do with data, and drive change accordingly.
This shift in regulations and thinking should also be accepted by government. It was refreshing to hear the Finnish Minister of Transport and Communications, Anne Berner, say that government should not be afraid of disruption, but accept disruption and be disruptive itself. MyData is disruptive in the sense that it challenges the norms of current data storage and use, and thinking outside of the box can help governments to move forward and, at the end of the day, to supply better services for citizens.
Another topic that has been raised repeatedly is the digital self and the idea that data is a stepping stone to a better society. The question, then, is whether we need to understand our private data in order to build a good society. Maybe understanding data is not a good enough end goal? Maybe a better framing would be to create information and knowledge from the data? I was excited to see a project that can help consumers evaluate and decide whom to trust: Ranking Digital Rights. Ranking Digital Rights looks at big tech corporations and ranks their public commitments and disclosed policies affecting users’ freedom of expression and privacy. This is a very good tool for discussion and advocacy on these topics.
To return to the question of openness: does freedom of data mean open data? Closed systems do not allow us to access our own data, and we can’t get insights from it. How do we create different models to get there?
And I think this is where I enjoy this conference the most – the variety of people. In the last two years I have been to many open data conferences, but the business community side of these events has been very limited, or at least for me, not appealing. Here at MyData, there are tracks for many different stakeholders – from insurance firms to banks, from health to education. I have met people who see the MyData initiative not only as a moral thing to do, but also as an opportunity to innovate and create trust with users. Trust, as I am rediscovering, is key for growth. Ignoring the mistrust of users can lead to a broken market. More than trust, I was happy to see people who are trying to influence their companies not only to go the MyData way, but also to open relevant data from their companies to the public, so we can work on and maybe solve social issues. Seeing the two go hand-in-hand is great, and I am looking forward to more conversations like these.
Tomorrow, Rufus Pollock, our president and Open Knowledge International’s co-founder, is going to speak about how we can collaborate with others for a better future. You can catch him at 9.30 Helsinki time on screen.io/mydata. Here is a preview for his talk tomorrow: http://blog.okfn.org/files/2016/09/MyDataRufus.mp3
“We want openness for public data – public information that could be made available to anyone. And we want access for every person to their own personal data…both are about empowering people to access information.”
Posted Sept. 1, this update resolves a couple of issues. In particular:
* Bug Fix: Custom Field Sorting: Fields without the sort field may drop the LDR. This has been corrected.
* Bug Fix: OCLC Integration: Regression introduced with the engine changes when dealing with diacritics. This has been corrected.
* Bug Fix: MSI Installer: AUTOUPDATE switch wasn’t being respected. This has been corrected.
* Enhancement: MARCEngine: Tweaked the transformation code to provide better support for older processing statements.
* Bug Fix: MARCEngine: Regression introduced with the last update that caused one of the streaming functions to lose encoding information. This has been corrected.
I’ll be adding a knowledge-base article, but I’ve updated the Windows MSI to fix the admin command-line switch that allows administrators to turn off the auto-update feature. Here’s an example of how this works:
MarcEdit_Setup64.msi /qn AUTOUPDATE=no
I don’t believe the AUTOUPDATE key is case sensitive – but the documented use pattern is upper-case, and that is what I’ll test against going forward.
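For unattended deployments, the same switch can be passed through msiexec, the standard Windows installer tool. A minimal sketch, assuming the MSI filename above (msiexec and its /i and /qn flags are standard Windows Installer usage; only the filename and the AUTOUPDATE property come from this post):

```
:: Silent install with the auto-update feature disabled (run as administrator).
:: /i installs the package, /qn suppresses all UI, and AUTOUPDATE=no sets the
:: public property described above.
msiexec /i MarcEdit_Setup64.msi /qn AUTOUPDATE=no
```

Public properties like AUTOUPDATE must be written in upper-case on the msiexec command line, which matches the documented use pattern.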
Downloads are available via the downloads page: http://marcedit.reeset.net/downloads/
When I hear the term “Evergreen,” it immediately evokes images of nature’s symbiotic relationships – bald eagles nesting in coniferous trees, lady slipper orchids thriving in soil nutrients typically found beneath conifers and hemlocks, pollinators and mammals relying on evergreens for food and, in return, helping to redistribute seeds. There is also a complex network of dialogues being exchanged throughout these evergreen forests.
During the past decade, I have been very blessed to hold multiple discussions with people about Evergreen, and it’s not surprising that the continued theme from my fellow coworkers’ blog posts is the emphasis on community. Community grants opportunities and a feeling of personal ownership (how awesome is it that non-proprietary software helps to promote a sense of ownership?). Community also helps to foster symbiotic and sustainable relationships – relationships that are rooted in dialog.
In February 2007, as a reference and genealogy librarian at a rural public library, I held my first conversations with both librarians and patrons about their Evergreen user experiences. Fast-forwarding to August 2016, I still treasure every conversation that I have with librarians about their needs, expectations, and experiences. With each library migration, it is an honor, and humbling, to hear about the librarians’ current workflows and needs. These user needs are constantly being met with each passing version of Evergreen.
For some, those needs may appear simple – I was so excited by the Update Expire Date button! For others, they are more complex, like the intricate gears that make meta-record-level holds possible. One of the strongest examples of community dialog and symbiosis is the continued refinement of the Acquisitions module.
I couldn’t possibly describe all of the awesomeness that I have observed over the past 10 years or narrow it down to a single special moment; there’s just too much. Each patron, library staff member, consortium member, volunteer, contributor, developer, support specialist, and data analyst (did I forget anyone?) contributes to Evergreen’s complex web of communication and overall sustainability. I can say that I know how fortunate I am, as a Project Manager, to see the forest for the trees and to know that the Evergreen Community’s roots are growing stronger with each passing year.
This is the eleventh in our series of posts leading up to Evergreen’s Tenth birthday.
Yesterday, just one day before the anniversary of the 1.1.2 release, I published the 1.1.3 release of the PEAR File_MARC library. The only change is the addition of a convenience method for fields called getContents() that simply concatenates all of the subfields together in order, with an optional separator string. Many thanks to Carsten Klee for contributing the code!
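To illustrate the idea – in Python rather than File_MARC’s PHP, with invented names – getContents() amounts to joining a field’s subfield values in order. This is a conceptual sketch, not the library’s API:

```python
# Conceptual sketch of what a getContents()-style convenience method does:
# concatenate a field's subfield values in order, with an optional separator.
# The function name and the (code, value) data shape are invented for
# illustration; they are not File_MARC's PHP API.
def get_contents(subfields, separator=""):
    """subfields: ordered (code, value) pairs from a MARC field."""
    return separator.join(value for _code, value in subfields)

# A hypothetical 245 (title statement) field:
field_245 = [("a", "Moby Dick :"), ("b", "or, The whale /"), ("c", "Herman Melville.")]
print(get_contents(field_245, " "))  # → Moby Dick : or, The whale / Herman Melville.
```

The optional separator is the only knob: with the default empty string, the subfield values are run together exactly as stored.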
You can install File_MARC through the usual channels: PEAR or composer. Have fun!
Today I was privileged to present to the 6th International Congress of Technological Innovation, Innovatics 2016, organized by Duoc UC Libraries, the Library of Santiago, and University of Chile Libraries. The conference was simultaneously translated into English and Spanish. To aid the translators, I wrote out the text of my presentation for them to review. Below is the text as it was intended to be presented; I did diverge in a few places, mostly based on what others said earlier in the conference.

Evolution of Open Source in Libraries
Thank you for the opportunity to talk with you today. My name is Peter Murray, and I’ve been involved in open source projects in libraries for at least 20 years. After receiving my undergraduate degree in computer science, I went to work for my university library as they were bringing up their first library automation system. This was early in the days of information retrieval on the internet, and I adapted the Gopher program to offer a front end to the library’s resources. Gopher came from the University of Minnesota in the United States, and they released their code on the internet for free. There wasn’t a notion of organized open source at the time – it was just free software that anyone could download and adapt. There wasn’t a sense of community around the software, and the tools to share changes with each other were very rudimentary. Stretching back to the 1950s and 1960s, this was the era of “free software”.
During the mid-1990s I worked for Case Western Reserve University in Cleveland, Ohio. I was part of a team that saw the early possibilities of the World Wide Web and aggressively pursued them. We worked to reorganize the library services onto the web and to try to add personalization to the library’s website using a new programming language called Personal Home Page. We know it today as PHP. It was also at this time that the phrase “open source” was coined. “Open source” meant more than just having the source code available. It was also a recognition that organizations had a business case for structuring a community around the source code. In this case, it was the release of the Netscape browser code that was the spark that ignited the use of the “open source” phrase.
In the early 2000s I worked for the University of Connecticut. During this time, we saw the formation of open source projects in libraries. Two projects that are still successful today are the DSpace and the Fedora repository projects. These two projects started with grants from foundations to create software that allowed academic libraries to efficiently and durably store the growing amount of digital files being produced by scholars. Both projects followed paths where the software was created for the needs of their parent organizations. It was seen as valuable by other organizations, and new developers were added as contributors to the project.
Also in the early 2000s the Koha integrated library system started to build its community. The core code was written by a small team of developers for a public library in New Zealand in the last few months of 1999 to solve a year-2000 issue with their existing integrated library system. Within a year, Koha publicly released its code and created a community of users on SourceForge – a website popular in the 2000s for hosting code, mailing lists, documentation, and bug reports. The tools for managing the activity of open source communities were just starting to be formed. There is a direct line between SourceForge – the most popular open source community of its time – and GitHub – arguably the most important source code hosting community today.
In the late 2000s I was working for a consortium of academic libraries in Ohio called OhioLINK. In this part of my career, I was with an organization that began actively using library open source software to deliver services to our users. Up until this point, I had – like many organizations – made use of open source tools – the HTTPd web server from Apache, MySQL as a database, PHP and Perl as programming languages, and so forth. Now we saw library-specific open source make headway into front-line library services. And now that our libraries were relying on this software for services to our patrons, we began looking for supporting organizations. DSpace and Fedora each created foundations to hold the source code intellectual property and hire staff to help guide the software. The DSpace and Fedora foundations then merged to become DuraSpace. Foundations are important because they become a focal point of governance around the software and a place where money can be sent to ensure the ongoing health of the open source project.
In the early 2010s I went to work for a larger library consortium called LYRASIS. LYRASIS has about 1,400 member libraries across much of the United States. I went to work at LYRASIS on a project funded by the Andrew W. Mellon Foundation on helping libraries make decisions about using open source software. The most visible part of the project was the FOSS4LIB.org website. It hosts decision support tools, case studies, and a repository of library-oriented open source software. We proposed the project to the Mellon Foundation because LYRASIS member libraries were asking questions about how they could make use of open source software themselves. Throughout the 2000s it was the libraries with developers that were creating and contributing to open source projects. Libraries without developers were using open source through service provider companies, and now they wanted to know more about what it meant to get involved in open source communities. FOSS4LIB.org was one place where library professionals could learn about open source.
The early 2010s also saw growth in service providers for open source software in libraries. The best example of that is this list of service providers for the Koha integrated library system. As we let this scroll through the continents, I hope this gives you a sense that large, well-supported, multinational projects are alive and well in the library field. Koha is the most impressive example of a large service provider community. Other communities, such as DSpace, also have worldwide support for the software. What is not represented here is the number of library consortia that have started supporting open source software for their members. Where it makes sense for libraries to pool their resources to support a shared installation of an open source system, those libraries can reap the benefits of open source.
Now here in the mid-2010s I’m working for a small, privately-held software development company called Index Data. Index Data got its start 20 years ago when its two founders left the National Library of Denmark to create software tools that they saw libraries needed. Index Data’s open source Z39.50 toolkit is widely used in commercial and open source library systems, as is its MasterKey metasearch framework. The project I’m working on now is called FOLIO, an acronym for the English phrase “Future of Libraries is Open”. I’ll be talking more about FOLIO this afternoon, but by way of introduction I want to say now that the FOLIO project is a community of library professionals and an open source project that is rethinking the role of the integrated library system in library services.

Revisit the Theme
With that brief review of the evolution of open source software in libraries, let’s return to the topic of this talk – Free Software in Libraries: Success Stories and Their Impact on Today’s Libraries. As you might have guessed, open source software can have a significant impact on how services are delivered to our patrons. In fact, open source software – in its best form – is significantly different from buying software from a supplier. On the one hand, when you buy software from a supplier you are at the mercy of that supplier for implementing new features and for fixing bugs in the software. You also have an organization that you can hold accountable. On the other hand, open source software is as much about the community surrounding the software as it is the code itself. And to be a part of the community means that you have rights and responsibilities. I’d like to start first with rights.

Rights
The rights that come along with open source are, I think, somewhat well understood. These are encoded in the open source licenses that projects adopt. You have the right to view, adapt, and redistribute the source code. You can use the software for any purpose you desire, even for purposes not originally intended by the author. And the one that comes most to mind, you have the right to run the software without making payment to someone else. Let’s look at these rights.

Use
In the open source license, the creator of the software is giving you permission to use the software without needing to contact the author. This right cannot be revoked, so you have the assurance that the creator cannot suddenly interrupt your use, even if you decide to use the software for something the creator didn’t intend. This also means that you can bring together software from different sources and create a system that meets the needs of your users.

Copy
You have the right to make a copy of the software. You can copy the software for your own use or to give to a friend or colleague. You can create backup copies and run the software in as many places as you need. Most importantly, you have this right without having to pay a royalty or fee to the creator. One example of this is Fenway Libraries Online in the Boston, Massachusetts area. Ten libraries within the Fenway consortium needed a system to manage their electronic resource licenses and entitlements. After an exhaustive search for a system that met their requirements, they selected the CORAL project originally built at the University of Notre Dame. There is a case study on Fenway’s adoption of CORAL on the FOSS4LIB.org website.

Inspect
A key aspect of open source is the openness of the code itself. You can look at the source code and figure out how it works. This is especially crucial if you want to move away from the system to something else; you can figure out, for instance, how the data is stored in a database and write programs that will translate that data from one system to another. Have you ever needed to migrate from one system to another? Even if you didn’t have to do the data migration yourself, can you see where it would be helpful to have a view of the data structures?

Modify
Hand-in-hand with the right to inspect open source code is the right to modify it to suit your needs. In almost all cases, the modifications to the code use the same open source license as the original work. What is interesting about modifications, though, is that sometimes the open source license may specify conditions for sharing modifications.

Fork
Ultimately, if the open source project you are using is moving in a different direction, you have the right to take the source code and start in your own direction. Much like a fork in the road, users of the open source project decide which branch of the fork to take. Forks can sometimes remain relatively close together, which makes it somewhat easy to move back and forth between them. Over time, though, forks usually diverge and go separate ways, or one will turn out not to be the one chosen by many, and it will die off. With this right to fork the code, it is ultimately the community that decides the best direction for the software. There was an example a few years ago within the Koha community where one service provider wanted to control the process of creating and releasing the Koha software in ways that a broader cross-section of the community didn’t appreciate. That segment of the community took the code and re-formed new community structures around it.

Responsibilities
These rights – Use, Copy, Inspect, Modify, and Fork – form the basis of the open source license statement that is a part of the package. Some form of each of these rights is spelled out in the statement. What is left unsaid are the responsibilities of users of the open source software. These responsibilities are not specified in the open source license that accompanies the software, but they do form the core values of the community that grows around the development and enhancement of the project. Each community is different, just like each software package has its own features and quirks, but they generally have some or all of the following characteristics. And depending on each adopter's needs and capacity, there will be varying levels of commitment each organization can make to these responsibilities. As you work with open source code, I encourage you to keep these responsibilities in mind.

Participate
The first responsibility is to participate in the community. This can be as simple as joining web forums or mailing lists to get plugged into what the community is doing and how it goes about doing it. By joining and lurking for a while, you can get familiar with the community norms and find out what roles others are playing. The larger automation companies in libraries typically have users groups, and joining the community of an open source project is usually no different from joining that of a proprietary company. One of the key aspects of open source projects is how welcoming they are to new participants. The Evergreen community has what I think is a great web page for encouraging libraries to try out the software, read the documentation, and get involved in the community.

Report Bugs
Library software has bugs – the systems we use are just too complicated to account for every possible variation as the software is developed and tested. If you find something that doesn't work, report it! Find the bug reporting process and create a good, comprehensive description of your problem. Chances are you are not the only one seeing the problem, and your report can help others triangulate the issue. This is an area where open source is, I think, distinctly different from proprietary software. With proprietary systems, the list of bugs is hidden from the customers. You may not know if you are the only one seeing a problem or if the issue is common to many other libraries. In open source projects, the bug list is open, and you can see if you are one among many people seeing the issue. In most open source projects, bug reports also include a running description of how the issue is being addressed – so you can see when to anticipate a fix coming in the software. As an aside for open source projects in the audience: make it easy for new community members to find and use your bug reporting process. This is typically a low barrier of entry into the community, and a positive experience in reporting a bug and getting an answer will encourage that new community member to stick around and help more. We'll talk about triaging bug reports in a minute.

Financially Support
Thinking back to our open source software rights, we know we can use the software without paying a license fee. That doesn't mean, though, that open source is free to write and maintain. Some of the most mature software projects are backed by foundations or community homes, and these organizations hire developers and outreach staff, fund user group meetings, and do other things that grow the software and the community surrounding it. If your organization's operations rely on a piece of open source, use some of the budget savings from not paying for a vendor's proprietary system to contribute to the organization that is supporting the software. DuraSpace is a success story here. Since its founding, it has attracted memberships from libraries all around the world. Libraries don’t have to pay to use the DSpace or Fedora software. Those that do pay recognize that their membership dues go to fund community and technical managers as well as server infrastructure that everyone counts on to keep the projects running smoothly.

Help
As staff become more familiar with the open source system, they can share that expertise with others around them. This is not only personally rewarding, but it also improves the reputation of your organization. A healthy and growing open source community will have new adopters coming in all the time, and your experience can help someone else get started too. EIFL, an acronym for the English “Electronic Information for Libraries”, is a not-for-profit organization that works to enable access to knowledge in 47 countries in Africa, Asia, and Europe. One of their programs is to help libraries in developing countries adopt and use open source software for their own institutions. They gather groups of new users and match them with experienced users so they can all learn about a new open source software package at about the same pace. Through this mentoring program, these libraries now have capabilities that they previously didn’t have or couldn’t afford.

Triage
A few slides earlier, I encouraged new adopters to report bugs. That can quickly overwhelm a bug tracking system with issues that are not really issues, issues that have been solved and the code is waiting for the next release, issues where there is a workaround in a frequently asked questions document, and issues that are real bugs where more detail is needed for the developers to solve the problem. As a community member triaging bugs, you look for new reports that match your experience and where you can add more detail or point the submitter to a known solution or workaround. Sometimes this points to a need for better documentation (discussed in the next slide); other times it needs a fix or an enhancement to the software, and the report moves on to the development group. Another note for projects: provide a clear way for reported issues to move through the system. This can be as informal as a shared understanding in the community, or as formal as a state diagram published as part of the project's documentation that describes how an issue is tagged and moved through various queues until it reaches some resolution.

Documentation
Open source software is often criticized — rightly so — for poor documentation. It is often the last thing created as part of the development process, and is sometimes created by developers who feel more comfortable using a writing voice full of jargon rather than a voice that is clear, concise, and targeted to end-users. Contributing to documentation is a perfect place for expert users who are inexperienced coders to make the deepest impact on a project. You don't need to understand how the code was written; you just need to describe clearly how the software is used. Documentation can come in the form of user manuals, frequently-asked-questions lists, and requests to the developers to add help language or change the display of a feature to make its use clearer.

Translate
Translation is also something that an experienced user can do to support a project. One sign of a mature open source project is that the developers have taken the time to extract the strings of field labels, help text, and user messages into a language file so that these strings can be translated into another language. Translating doesn't mean being able to write code; it just means being able to take these strings and convert them into another language. If you find an open source package that has all of the functionality you need but the native language of the system is not the native language of your users, providing translations can be a great way to make the software more useful to you while also opening up a whole new group of organizations that can use the project as well.

Test
This is getting a little more complicated, but if you are running the software locally and can set up a test environment, try out release candidates for new software as they come from the developers. Run a copy of your data and your workflows through the release candidate to make sure that nothing breaks and new features work as advertised. Some projects will create downloadable "virtual machines" or put the release candidate version of the software on a sandbox for everyone to test, and that lowers the barrier for testing to just about anyone.

Request
How feature requests are made is another distinguishing characteristic between open source and proprietary systems. All proprietary systems have some way of registering feature requests and various processes for working through them. In an open source community, there is a lot more transparency about what is being worked on. All of the bug reports and feature requests are listed for everyone to see and comment on. There might even be a formal voting mechanism that guides developers on what to work on next. Volunteer developers from different organizations with similar needs can more easily find each other and tackle a problem together. Developers hired by the software's foundation or community home have a better understanding of what activity will have the biggest impact for the users. This all starts, though, with you making your requests and commenting on the requests of others.

Challenge
Healthy projects need forward tension to keep moving, and one way to do that is with eyes-wide-open constructive criticism. It is easy and common for communities to get bogged down in doing things the same way when it might make sense to try a different technique or adopt a different tool. It is also, unfortunately, common for communities to become insular and unwelcoming to new people or people unlike themselves. Open source works best when a wide variety of people are all driving towards the same goal. Be aware, though, that the good will within communities can be destroyed by unkind and insulting behavior. Just as meetings have adopted codes of conduct, I think it is appropriate for project communities to develop codes of conduct and to have action and enforcement mechanisms in place to step in when needed. This can be tough — in the cultural heritage field, participants in open source are typically volunteers, and it can be difficult to confront or offend a popular or prolific volunteer. The long-term health of the community requires it, though.

Code
Only at the very last do we get to coding. The software can't exist without developers, but it is too easy to put the developers first and forget the other pillars that the community relies on — bug reporters and triage volunteers, documentation writers and translators, software testers and community helpers. If you are a developer, try fixing a bug. Pick something that is small but annoying — something that scratches your own itch but that the more mainstream developers don't have time to tackle. A heads-up to open source project leaders: create a mentorship pathway for new developers to join the project, and provide a mechanism to list "easy" bugs that would be useful for developers new to a project to work on.

Public Libraries
Throughout the presentation I’ve mentioned academic libraries and library organizations that are making use of open source now. Open source adoption is not limited to academic libraries, though, and I wanted to mention the work of the Meadville Public Library in rural Pennsylvania in the United States. There is a case study on the FOSS4LIB.org website where they describe their library and how they came to choose the Koha integrated library system. The Meadville Public Library has a small staff and an even smaller technology budget. When they decided to migrate to a new system in the mid-2000s, they realized they had a choice: pay the commercial software licensing fees to a traditional library vendor, or put that money towards building skills in the staff to host a system locally. The case study describes their decision-making process, including site visits, fiscal analysis, and even joining a “hackfest” developer event in France to help build new functionality that their installation would need. I invite you to read through the case study to learn about their path to open source software. This library uses open source almost exclusively throughout their systems – from their desktop computers and word processing software to their servers.

Conclusion
Making use of open source software is more often about the journey than the destination. In the end, our libraries need systems that enable patrons to find the information they are seeking and to solve the problems that they face. If nothing else, open source software in libraries is a different path for meeting those needs. Open source software, though, can be more. It can be about engaging the library and its staff in the process of designing, building, and maintaining those systems. It can be about the peer-to-peer exchange of ideas with colleagues from other institutions and from service providers on ways to address our patrons’ needs. And sometimes open source software can be about reducing the total cost of ownership as compared with solutions from proprietary software providers. Libraries across the world have successfully adopted open source software. There have been a few unsuccessful projects as well. From each of these successful and unsuccessful projects, we learn a little more about the process and grow a little more as a profession. I encourage you, if you haven’t done so already, to learn about how open source software can help in reaching your library’s goals.
Thank you for your attention, and I am happy to take questions, observations, or to hear about your stories of using open source software.
From A. Soroka, the University of Virginia
DuraSpace News: NOW AVAILABLE–TRAC Certified Long-term Digital Preservation: DuraCloud and Chronopolis for Institutional Treasures
Austin, TX An institution’s identity is often formed by what it saves for current and future access. Digital collections curated by the academy can include research data, images, texts, reports, artworks, books, and historic documents that help define an academic institution’s identity.
Sixteen years is long enough, surely, to get to know a cat.
Amelia had always been her mother’s child. She had father and sister too, but LaZorra was the one Mellie always cuddled up to and followed around. Humans were of dubious purpose, save for our feet: from the scent we trod back home Mellie seemed to learn all she needed of the outside world.
Her father, Erasmus, left us several years ago; while Mellie’s sister mourned, I’m not sure Rasi’s absence made much of an impression on our clown princess — after all, LaZorra remained, to provide orders and guidance and a mattress.
Where Zorri went, Mellie followed — and thus a cat who had little use for humans slept on our bed anyway.
Recently, we lost both LaZorra and Sophia, and we were afraid: afraid that Amelia’s world would close in on her. We were afraid that she would become a lost cat, waiting alone for comfort that would never return.
The first couple days after LaZorra’s passing seemed to bear our fears out. Amelia kept to her routine and food, but was isolated. Then, some things became evident.
Our bed was, in fact, hers. Hers to stretch out in, space for my legs be damned.
Our feet turned out not to suffice; our hands were required too. For that matter, for the first time in her life, she started letting us brush her.
And she enjoyed it!
Then she decided that we needed correction — so she began vocalizing, loudly and often.
And now we have a cat anew: talkative and demanding of our time and attention, confident in our love.
Sixteen years is not long enough to get to know a cat.
- Identifying the subject of the search.
- Locating this subject in a guide which refers the searcher to one or more documents.
- Locating the documents.
- Locating the required information in the documents.
These overlap somewhat with FRBR's user tasks (find, identify, select, obtain), but the first step in Vickery's group is my focus here: identifying the subject of the search. It is a step that I do not perceive as implied in the FRBR "find", and it is all too often missing from library/user interactions today.
A person walks into a library... Presumably, libraries are an organized knowledge space. If they weren't, the books would just be thrown onto the nearest shelf, and subject cataloging would not exist. However, if this organization isn't both visible and comprehensible to users, then, firstly, we are not getting the return on our cataloging investment, and secondly, users are not getting the full benefit of the library.
In Part V of my series on Catalogs and Context, I had two salient quotes. One by Bill Katz: "Be skeptical of the information the patron presents"; the other by Pauline Cochrane: "Why should a user ever enter a search term that does not provide a link to the syndetic apparatus and a suggestion about how to proceed?". Both of these address the obvious, yet often overlooked, primary point of failure for library users, which is the disconnect between how the user expresses his information need vis-a-vis the terms assigned by the library to the items that may satisfy that need.
Vickery's Three Issues for Stage 1
Issue 1: Formulating the topic

Vickery talks about three issues that must be addressed in his first stage, identifying the subject on which to search in a library catalog or indexing database. The first one is "...the inability even of specialist enquirers always to state their requirements exactly..." [1 p.1] That's the "reference interview" problem that Katz writes about: the user comes to the library with an ill-formed expression of what they need. We generally consider this to be outside the boundaries of the catalog, which means that it only exists for users who have an interaction with reference staff. Given that most users of the library today are not in the physical library, and that online services (from Google to Amazon to automated courseware) have trained users that successful finding does not require human interaction, these encounters with reference staff are a minority of the user-library sessions.
In online catalogs, we take what the user types into the search box as an appropriate entry point for a search, even though another branch of our profession is based on the premise that users do not enter the library with a perfectly formulated question, and need an intelligent intervention to have a successful interaction with the library. Formulating a precise question may not be easy, even for experienced researchers. For example, in a search about serving persons who have been infected with HIV, you may need to decide whether the research requires you to consider whether the person who is HIV positive has moved along the spectrum to be medically diagnosed as having AIDS. This decision is directly related to the search that will need to be done:
HIV-positive persons--Counseling of
AIDS (Disease)--Patients--Counseling of
Issue 2: From topic to query

The second of Vickery's caveats is that "[The researcher] may have chosen the correct concepts to express the subject, but may not have used the standard words of the index."[1 p.4] This is the "entry vocabulary" issue. What user would guess that the question "Where all did Dickens live?" would be answered with a search using "Dickens, Charles -- Homes and haunts"? And that all of the terms listed as "use for" below would translate to the term "HIV (Viruses)" in the catalog? (h/t Netanel Ganin):
As Pauline Cochrane points out, beginning in the latter part of the 20th century, libraries found themselves unable to include the necessary cross-reference information in their card catalogs, due to the cost of producing the cards. Instead, they asked users to look up terms in the subject heading reference books used by catalog librarians to create the headings. These books are not available to users of online catalogs, and although some current online catalogs include authorized alternate entry points in their searches, many do not.* This means that we have multiple generations of users who have not encountered "term switching" in their library catalog usage, and who probably do not understand its utility.
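The "term switching" that Cochrane describes can be sketched as a lookup from a user's entry vocabulary to the authorized heading. The toy sketch below is purely illustrative: it is not any real catalog's code, and the variant terms shown are examples rather than complete authority records.

```python
# Toy illustration of "term switching": mapping a user's entry vocabulary
# to the authorized subject heading, as the cross-references in a subject
# authority file would. Headings and variants are illustrative examples.

AUTHORITY = {
    "HIV (Viruses)": ["AIDS virus", "HTLV-III", "LAV (Viruses)"],
    "Cancer": ["Carcinoma", "Malignancy", "Tumors, Malignant"],
}

# Invert the "use for" references into a lookup from variant to heading.
USE_FOR = {
    variant.lower(): heading
    for heading, variants in AUTHORITY.items()
    for variant in variants
}

def authorized_heading(user_term):
    """Return the authorized heading for a user's term, whether they typed
    the heading itself or one of its cross-referenced variants."""
    term = user_term.lower()
    for heading in AUTHORITY:
        if heading.lower() == term:
            return heading
    return USE_FOR.get(term)  # None if the term isn't in the authority file

print(authorized_heading("carcinoma"))  # prints "Cancer"
```

A real system would consult a full authority file (for example, the "see from" tracings in LCSH records) rather than a hard-coded dictionary, and would ideally surface the syndetic structure — broader, narrower, and related terms — instead of silently substituting one term for another.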
Even with such a terminology-switching mechanism, finding the proper entry in the catalog is not at all simple. The article by Thomas Mann (of Library of Congress, not the German author) on “The Peloponnesian War and the Future of Reference, Cataloging, and Scholarship in Research Libraries” shows not only how complex that process might be but also indicates that the translation can only be accomplished by a library-trained expert. This presents us with a great difficulty because there are not enough such experts available to guide users, and not all users are willing to avail themselves of those services. How would a user discover that literature is French, but performing arts are in France?
Performing arts -- France -- History
Or, using the example in Mann's piece, the searcher looking for information on tribute payments in the Peloponnesian War needed to look under "Finance, public–Greece–Athens". This type of search failure fuels the argument that full text search is a better solution, and a search of Google Books on "tribute payments Peloponnesian war" does yield some results. The other side of the argument is that full text searches fail to retrieve documents not in the search language, while library subject headings apply to all materials in all languages. Somehow, this latter argument, in my experience, doesn't convince.
Issue 3: Term order

The third point by Vickery is one that keyword indexing has solved, which is "...the searcher may use the correct words to express the subject, but may not choose the correct combination order."[1 p.4] In 1959, when Vickery was writing this particular piece, having the wrong order of terms resulted in a failed search. Mann, however, would say that with keyword searching the user does not encounter the context that the pre-coordinated headings provide; thus keyword searching is not a solution at all. I'm with him part way, because I think keyword searching as an entry to a vocabulary can be useful if the syndetic structure is visible with such a beginning. Keyword searching directly against bibliographic records, less so.
Comparison to FRBR "find"

FRBR's "find" is described as "to find entities that correspond to the user’s stated search criteria". [6 p.79] We could presume that in FRBR the "user's stated search criteria" has either been modified through a prior process (although I hardly know what that would be, other than a reference interview), or that the library system has the capability to interact with the user in such a way that the user's search is optimized to meet the terminology of the library's knowledge organization system. This latter would require some kind of artificial intelligence and seems unlikely. The former simply does not happen often today, with most users being at a computer rather than a reference desk. FRBR's find seems to carry the same assumption as has been made functional in online catalogs, which is that the appropriateness of the search string is not questioned.
Summary

There are two take-aways from this set of observations:
- We are failing to help users refine their query, which means that they may actually be basing their searches on concepts that will not fulfill their information need in the library catalog.
- We are failing to help users translate their query into the language of the catalog(s).
I would add that the language of the catalog should show users how the catalog is organized and how the knowledge universe is addressed by the library. This is implied in the second take-away, but I wanted to bring it out specifically, because it is a failure that particularly bothers me.
Notes

* I did a search in various catalogs on "cancer" and "carcinoma". Cancer is the form used in LCSH-cataloged bibliographic records, and carcinoma is a cross reference. I found a local public library whose Bibliocommons catalog did retrieve all of the records with "cancer" in them when the search was on "carcinoma", and that the same search in the Harvard Hollis system did not (carcinoma: 1,889 retrievals; cancer: 21,311). These are just two catalogs, and not a representative sample, to say the least, but they illustrate the inconsistency.
References

[1] Vickery, B. C. Classification and Indexing in Science. New York: Academic Press, 1959.
[2] Katz, Bill. Introduction to Reference Work: Reference Services and Reference Processes. New York: McGraw-Hill, 1992. p. 82. http://www.worldcat.org/oclc/928951754. Cited in: Brown, Stephanie Willen. "The Reference Interview: Theories and Practice." Library Philosophy and Practice, 2008. ISSN 1522-0222.
[3] Cochrane, Pauline A., Marcia J. Bates, Margaret Beckman, Hans H. Wellisch, Sanford Berman, Toni Petersen, and Stephen E. Wiberley, Jr. "Modern Subject Access in the Online Age: Lesson 3." American Libraries, Vol. 15, No. 4 (Apr. 1984), pp. 250-252, 254-255. http://www.jstor.org/stable/25626708
[4] Cochrane, Pauline A. "Modern Subject Access in the Online Age: Lesson 2." American Libraries, Vol. 15, No. 3 (Mar. 1984), pp. 145-148, 150. http://www.jstor.org/stable/25626647
[5] Mann, Thomas. "The Peloponnesian War and the Future of Reference, Cataloging, and Scholarship in Research Libraries" (June 13, 2007). PDF, 41 pp. http://guild2910.org/Pelopponesian%20War%20June%2013%202007.pdf
[6] IFLA Study Group on the Functional Requirements for Bibliographic Records. Functional Requirements for Bibliographic Records, 2009. http://archive.ifla.org/VII/s13/frbr/frbr_2008.pdf
Library of Congress: The Signal: Nominations Sought for the U.S. Federal Government End of Term Web Archive
This is a guest post by Abbie Grotke, lead information technology specialist of the Library of Congress Web Archiving Team
Readers of The Signal may recall prior efforts to archive United States Federal Government websites during the end of presidential terms. I last wrote about this in 2012 when we were working on preserving the government domain during the end of President Obama’s first term. To see the results of our 2008 and 2012 efforts, visit the End of Term Archive.
As the Obama administration comes to a close, the End of Term project team has formed again and we need help from you.
For the End of Term 2016 archive, the Library of Congress, California Digital Library, University of North Texas Libraries, Internet Archive, George Washington University Libraries, Stanford University Libraries and the U.S. Government Publishing Office have joined together for a collaborative project to preserve public United States Government websites at the end of the current presidential administration on January 20, 2017. Partners are joining together to select, collect, preserve, and make the web archives available for research use.
This web harvest — like its predecessors in 2008 and 2012 — is intended to document the federal government’s presence on the web during the transition of Presidential administrations and to enhance the existing collections of the partner institutions. This broad comprehensive crawl of the .gov domain will include as many federal .gov sites as we can find, plus federal content in other domains (such as .mil, .com and social media content).
And that’s where you come in. You can help the project immensely by nominating your favorite .gov website, other federal government websites or governmental social media account with the End of Term Nomination Tool. Please nominate as many sites as you want. Nominate early and often. Tell your friends, family and colleagues to do the same. Help us preserve the .gov domain for posterity, public access and long-term preservation.
I’ve never actually read Fred Brooks’ Mythical Man-Month, but I have picked up many of its ideas by cultural osmosis. I think I’m not alone: it’s a book that’s very popular by reputation, but perhaps not actually very influential in terms of its ideas being internalized by project managers and architects.
Or as Brooks himself said:
Some people have called the book the “bible of software engineering.” I would agree with that in one respect: that is, everybody quotes it, some people read it, and a few people go by it.
Ha. I should really get around to reading it; I routinely run into things that remind me of the ideas I understand from it, ideas that I’ve just sort of absorbed (perhaps inaccurately).
In the meantime, here’s another good quote from Brooks to stew upon:
The ratio of function to conceptual complexity is the ultimate test of system design.
Quite profound, really. Software packages that are terribly frustrating to work with can, I think, almost always be described in those terms: the ratio of function to conceptual complexity is far, far too low. That is nearly(?) the definition of a frustrating-to-work-with software package.
Filed under: General
Open Knowledge Foundation: What does personal data have to do with open data? Initial thoughts from #MyData2016
This piece is part of a series of posts from MyData 2016 – an international conference that focuses on human centric personal information management. The conference is co-hosted by the Open Knowledge Finland chapter of the Open Knowledge International Network.
What does personal data have to do with open data? We usually preach NOT to open personal data, and to be responsible about it. So why should an open knowledge organisation devote a whole conference to topics related to personal data management? I will explore these questions in a series of blog posts written straight from the MyData16 conference in Helsinki, Finland.
MyData is a very abstract concept that is still in the process of refinement. In essence, MyData is about giving users control of the personal data trail that we leave on the internet. Under the MyData framework, users decide where to store their data and can control and guide how this data is used. In most applications today, our data is closed off and owned by big corporations, where it is primarily used to make money. The MyData concept looks to bring control back to the user, but also tries to develop the commercial use of the data, making everyone happy.
Here is Mika Honkanen, vice chairman of the OK Finland board, explaining MyData: http://blog.okfn.org/files/2016/08/MyDataMika.mp3
For those of you who missed the Open Knowledge Festival in 2012 (like me), Open Knowledge Finland knows how to produce events. Besides the conference program (and super exciting evening program!), you can also find the Ultrahack, a 72-hour hackathon that will try to answer my questions above and will be involved in creating practical applications of the MyData concept. I am excited to see how it will turn out and what uses, social and fiscal, people can find.
For the following three days, keep following us on the OKI Twitter account for updates from the conference. Check the MyData website, and let us know if you want us to go to a session for you!
In 2015, Evergreen saw two major releases, 2.8.0 and 2.9.0, and a number of maintenance releases.
Two major releases in 2015, just as there were two in 2014, and in each of the three years before that — just as there will be two major releases in 2016.
In 2015, the seventh Evergreen Conference was held in Hood River, Oregon — continuing an unbroken string of annual conferences that was started in 2009.
In 2015, Evergreen’s development team gained a new core committer, Kathy Lussier.
New folks started writing documentation; more libraries started using Evergreen; more bug reports were filed.
In 2015, in particular with the release of 2.9.0, a number of components of Evergreen that had served their purpose were removed. Gone was JSPac. Goodbye, old selfcheck page! Auf Nimmerwiedersehen, script-based circulation policies!
In 2015, work continued on the web-based staff client.
In 2015, the Evergreen web team took steps to ensure backwards compatibility.
To sum up: 2015 was not the most exciting year in the project’s history, but it was a solid one: a year continuing rhythms that had been established and strengthened as the project grew.
Rhythms matter to libraries, of course. There is the staccato of each visit to the library, each checkout, each reference question, each person finding refuge or hope or a few minutes’ distraction. Themes arise and repeat each year: summer reading; the onslaught of undergraduates; conferences; board meetings and budgetary cycles. Sometimes a crescendo surprises us: the adoption of MARC; the disquiet and discussions of seeking to replace MARC; libraries deciding to reclaim their tools and embrace free software.
And the music does not stop: libraries must simultaneously embrace the now, do their part to keep the past alive, and look to the future.
— Galen Charlton, Infrastructure and Added Services Manager
This is the tenth in our series of posts leading up to Evergreen’s Tenth birthday.
From Abigail Grotke, Digital Library Project Manager, Library of Congress
Washington, DC How would YOU like to help preserve the United States federal government web domain for future generations? But that's too huge a swath of Internet real estate for any one person to preserve, right?!
Wrong! The volunteers working on the End of Term Web Archiving Project are doing just that. But we need your help.
From Mike Conlon, VIVO project director
New sites! August was a big month for new VIVO implementations. Eight new implementations are underway:
When The Signal debuted in 2011, its focus was exclusively on the challenge of digital preservation, which is why its URL was http://blogs.loc.gov/digitalpreservation. The Signal was a forum for news and information about digital preservation — unique problems and solutions, standards, collaborations and achievements. The Signal’s authors interviewed leaders in the field, profiled colleagues and drew attention to exemplary projects.
In time, The Signal became a leading source of information about digital preservation. The success of The Signal’s community engagement was evident in the volume of responses we got to our blog posts and the dialogs they sparked; some posts still attract readers and get comments years after the posts’ original publications.
The scope of The Signal has grown organically beyond digital preservation and we are reflecting that growth by changing The Signal’s URL to http://blogs.loc.gov/thesignal. Old links will still work but will redirect to the new URL. If you subscribe to an RSS feed, please change that URL to http://blogs.loc.gov/thesignal/feed.
We will continue to share information about Library of Congress digital initiatives and cover broad topics such as digital humanities, digital stewardship, crowd sourcing, computational research, scholar labs, data visualization, digital preservation and access, eBooks, rights issues, metadata, APIs, data hosting and technology sharing and innovative trends.
As we count down to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting George Bailey and Cameron Baker’s talk, “Rackspace Email’s solution for indexing 50k documents per second”.
Customers of Rackspace Email have always needed the ability to search and review login, routing, and delivery information for their emails. In 2008, a solution was created using Hadoop MapReduce to process all of the logs from hundreds of servers and create Solr 1.4 indexes that would provide the search functionality. Over the next several years, the number of servers generating the log data grew from hundreds to several thousand, which required the cluster of Hadoop and Solr 1.4 servers to grow to ~100 servers. This growth caused the MapReduce jobs for indexing the data to take anywhere from 20 minutes to several hours.
In 2015, Rackspace Email set out to solve this ever-growing need to index and search billions of events from thousands of servers, and decided to leverage SolrCloud 5.1. This talk covers how Rackspace replaced ~100 physical servers with 10 and improved functionality to allow documents to be indexed and searchable within 5 seconds.
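The kind of near-real-time indexing described above can be sketched with Solr's JSON update API: documents are POSTed in batches, and the commitWithin parameter asks Solr to make them searchable within a given window (here, 5 seconds). This is a minimal sketch, not Rackspace's actual code; the host, collection name, and field names are hypothetical.

```python
# Minimal sketch of pushing event documents to a SolrCloud collection via
# the JSON update handler, with commitWithin so documents become searchable
# within ~5 seconds. Host, collection, and fields are hypothetical.
import json
import urllib.request

SOLR_HOST = "http://localhost:8983"  # hypothetical Solr node
COLLECTION = "email_events"          # hypothetical collection name

def build_update_url(host, collection, commit_within_ms=5000):
    """Build the update URL; commitWithin (milliseconds) tells Solr how
    soon the posted documents must become visible to searches."""
    return f"{host}/solr/{collection}/update?commitWithin={commit_within_ms}"

def index_events(events):
    """POST a batch of event documents as a JSON array to the update handler."""
    url = build_update_url(SOLR_HOST, COLLECTION)
    body = json.dumps(events).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200 on success

# Example batch: one delivery event per document.
events = [{"id": "evt-1", "type": "delivery", "user": "a@example.com"}]
# index_events(events)  # requires a running SolrCloud instance
```

Batching documents per request and letting commitWithin coalesce commits is what keeps per-document overhead low at high indexing rates; issuing an explicit commit per document would throttle throughput far below the 50k documents per second the talk describes.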
George Bailey is a Software Developer for Rackspace Email Infrastructure.
Cameron Baker is a Linux Systems Engineer for Rackspace Email Infrastructure.
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented by George Bailey & Cameron Baker, Rackspace from Lucidworks
Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…