
SearchHub: Rule-Based Replica Assignment for SolrCloud

planet code4lib - Tue, 2015-05-12 17:43

When Solr needs to assign nodes to collections, it can either assign them automatically and randomly, or the user can specify a set of nodes where the replicas should be created. With very large clusters, it is hard to specify exact node names, and doing so still does not give fine-grained control over how nodes are chosen for a shard. The user should be in complete control of where the nodes are allocated for each collection, shard, and replica; this helps to allocate hardware resources optimally across the cluster.

Rule-based replica assignment is a new feature coming to Solr 5.2 that allows the creation of rules to determine the placement of replicas in the cluster. In the future, this feature will help to automatically add/remove replicas when systems go down or when higher throughput is required. This enables a more hands-off approach to administration of the cluster.

This feature can be used in the following instances:

  • Collection creation
  • Shard creation
  • Shard splitting
  • Replica creation
Common use cases
  • Don’t assign more than 1 replica of this collection to any host
  • Assign all replicas to nodes with more than 100GB of free disk space, or assign replicas to the nodes with the most free disk space
  • Do not assign any replica to a given host, because I want to run an overseer there
  • Assign only one replica of a shard per rack
  • Assign replicas only to nodes hosting fewer than 5 cores, or to the nodes hosting the fewest cores
What is a rule?

A rule is a set of conditions that a node must satisfy before a replica core can be created there. A rule consists of three conditions:
  • shard – the name of a shard, or a wildcard (* means each shard). If shard is not specified, the rule applies to the entire collection.
  • replica – a number, or a wildcard (* means any number from zero to infinity).
  • tag – an attribute of a node in the cluster that can be used in a rule, e.g. “freedisk”, “cores”, “rack”, “dc”. The tag name can be a custom string. If a custom tag is used, a Solr plugin called a snitch is responsible for providing its tags and values.
Operators

A condition can have one of four operators:
  • equals (no operator required): tag:x means the tag value must be equal to ‘x’
  • greater than (>): tag:>x means the tag value must be greater than ‘x’; x must be a number
  • less than (<): tag:<x means the tag value must be less than ‘x’; x must be a number
  • not equal (!): tag:!x means the tag value MUST NOT be equal to ‘x’; the equals check is performed on the String value
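To make the operator semantics concrete, here is a small illustrative sketch in Python. This is not Solr's actual implementation; the matches helper is hypothetical, showing only how a condition value like >100 or !x could be evaluated against a node's tag value:

```python
def matches(condition, actual):
    """Evaluate a rule condition value against a node's tag value.

    '>x' and '<x' compare numerically; '!x' is a string
    inequality check; anything else is a string equality check.
    """
    if condition.startswith(">"):
        return float(actual) > float(condition[1:])
    if condition.startswith("<"):
        return float(actual) < float(condition[1:])
    if condition.startswith("!"):
        return str(actual) != condition[1:]
    return str(actual) == condition

print(matches(">100", 250))        # freedisk:>100 with 250GB free -> True
print(matches("<5", 7))            # cores:<5 on a node with 7 cores -> False
print(matches("!host1", "host2"))  # host:!host1 -> True
```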
Examples

Example 1: keep fewer than 2 replicas (i.e., at most 1 replica) of this collection on any node:

replica:<2,node:*

Example 2: for a given shard, keep fewer than 2 replicas on any node:

shard:*,replica:<2,node:*

Example 3: assign all replicas in shard1 to rack 730:

shard:shard1,replica:*,rack:730

The default value of replica is *, so it can be omitted and the rule reduced to:

shard:shard1,rack:730

Note: this requires a snitch that provides values for the tag ‘rack’.

Example 4: create replicas only on nodes with fewer than 5 cores:

replica:*,cores:<5

or, simplified:

cores:<5

Example 5: do not create any replicas on a given host (the host name follows the ! operator):

host:!

Fuzzy operator (~)

This can be used as a suffix to any condition. Solr first tries to satisfy the rule strictly; if it cannot find enough nodes that match the criterion, it looks for the next best match, which may not satisfy the criterion. For example:

freedisk:>200~

Try to assign replicas of this collection to nodes with more than 200GB of free disk space; if that is not possible, choose the nodes with the most free disk space.

Choosing among equals

The rules are also used to sort the candidate nodes, ensuring that even when many nodes match a rule, the best nodes are picked first. For example, with the rule “freedisk:>20”, nodes are sorted by free disk space descending, and the node with the most free disk space is picked first. With the rule “cores:<5”, nodes are sorted by number of cores ascending, and the node with the fewest cores is picked first.

Snitch

Tag values come from a plugin called a snitch. If a rule uses a tag called ‘rack’, there must be a snitch that provides the value of ‘rack’ for each node in the cluster. A snitch implements the Snitch interface. By default, Solr provides a snitch that supplies the following tags:
  • cores : number of cores on the node
  • freedisk : disk space available on the node
  • host : host name of the node
  • node : node name
  • sysprop.{PROPERTY_NAME} : values available from system properties. sysprop.key means a value passed to the node as -Dkey=keyValue at node startup. This makes rules like sysprop.key:expectedVal,shard:* possible.
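The "choosing among equals" sorting described above can be sketched in a few lines. This is an illustrative model only; the node dicts and their freedisk/cores values are made up, standing in for what snitches would report:

```python
# Hypothetical node descriptions; in Solr these values come from snitches.
nodes = [
    {"name": "nodeA", "freedisk": 50, "cores": 3},
    {"name": "nodeB", "freedisk": 120, "cores": 7},
    {"name": "nodeC", "freedisk": 80, "cores": 1},
]

# For a 'freedisk:>20' rule: sort by free disk descending, best first.
by_disk = sorted(nodes, key=lambda n: n["freedisk"], reverse=True)
print([n["name"] for n in by_disk])  # ['nodeB', 'nodeC', 'nodeA']

# For a 'cores:<5' rule: sort by core count ascending, fewest first.
by_cores = sorted(nodes, key=lambda n: n["cores"])
print([n["name"] for n in by_cores])  # ['nodeC', 'nodeA', 'nodeB']
```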
How are snitches configured?

One or more snitches can be used for a set of rules. If the rules only need tags from the default snitch, no snitch needs to be configured explicitly. Example:

snitch=class:fqn.ClassName,key1:val1,key2:val2,key3:val3

How does the system collect tag values?
  1. Identify the set of tags used in the rules
  2. Create instances of the specified snitches. The default snitch is always created
  3. Ask each snitch whether it can provide values for any of the tags. If even one tag has no snitch, the assignment fails
  4. After identifying the snitches, ask them to provide the tag values for each node in the cluster
  5. If the value of a tag cannot be obtained for a given node, that node cannot participate in the assignment
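The five steps above can be sketched roughly as follows. This is a simplified model, not Solr's code: each snitch here is just a dict mapping tag names to per-node values, and the node names are placeholders:

```python
def collect_tag_values(rule_tags, snitches, cluster_nodes):
    """Return {node: {tag: value}} for nodes usable in assignment.

    Fails if any tag has no snitch providing it; drops nodes for
    which a tag value could not be obtained.
    """
    # Steps 1-3: every tag in the rules must be covered by some snitch.
    for tag in rule_tags:
        if not any(tag in s for s in snitches):
            raise ValueError("no snitch provides tag: " + tag)

    # Steps 4-5: gather values; nodes missing any value are excluded.
    result = {}
    for node in cluster_nodes:
        values = {}
        for tag in rule_tags:
            for s in snitches:
                if tag in s and node in s[tag]:
                    values[tag] = s[tag][node]
        if len(values) == len(rule_tags):
            result[node] = values
    return result

default_snitch = {"cores": {"n1": 2, "n2": 6}, "freedisk": {"n1": 300}}
print(collect_tag_values(["cores", "freedisk"], [default_snitch], ["n1", "n2"]))
# {'n1': {'cores': 2, 'freedisk': 300}}  (n2 lacks a freedisk value)
```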
How to configure rules?

Rules are specified per collection, as request parameters during collection creation. It is possible to specify multiple ‘rule’ and ‘snitch’ params, as in this example:

snitch=class:EC2Snitch&rule=shard:*,replica:1,dc:dc1&rule=shard:*,replica:<2,dc:dc3

These rules are persisted in the cluster state in ZooKeeper and remain available throughout the lifetime of the collection. This enables the system to perform any future node allocation without direct user interaction.
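Since rules are plain request parameters, a collection-creation call can be assembled as a query string. Here is an illustrative sketch using Python's standard library; the host, collection name, and shard/replica counts are placeholder assumptions, while the rule and snitch values are the ones from the example above:

```python
from urllib.parse import urlencode

# Multiple 'rule' params are allowed, so build a list of pairs
# rather than a dict (a dict would collapse the duplicate keys).
params = [
    ("action", "CREATE"),
    ("name", "mycollection"),   # placeholder collection name
    ("numShards", "2"),
    ("replicationFactor", "2"),
    ("snitch", "class:EC2Snitch"),
    ("rule", "shard:*,replica:1,dc:dc1"),
    ("rule", "shard:*,replica:<2,dc:dc3"),
]
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url.count("rule="))  # 2 -- both rules survive in the query string
```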

The post Rule-Based Replica Assignment for SolrCloud appeared first on Lucidworks.

District Dispatch: Some debates don’t have E-asy winners

planet code4lib - Tue, 2015-05-12 15:24

A major focus of ALA’s Office for Information Technology Policy (OITP) of late has been recasting the common perception of libraries among many decision makers and influencers to reflect current reality. To this end, OITP has started a Policy Revolution!—the punchy name of our most recent grant project—through which we aim to increase our community’s visibility and capacity for engagement in national policymaking. To bring the “Revolution” to fruition, we are producing a national public policy agenda for libraries, building the capacity of library advocates to communicate effectively with beltway decision makers, and devising new strategies for library advocacy at the national level. As part of these coordinated efforts, OITP (with Senior Counsel Alan Fishel) coined The E’s of Libraries trademark—a pithy shorthand for what today’s libraries do for people: Education, Employment, Entrepreneurship, Empowerment, and Engagement.

Since we began the Policy Revolution! initiative, we’ve been soliciting a wide range of perspectives on the services modern libraries provide—so ALA was eager to help moderate a “tri-bate” on “The E’s” between local area high school students last Tuesday at the Washington, D.C.-based law firm Arent Fox. Participants in the tri-bate were assigned to one of three teams, each of which represented a particular component of the E’s. Each team was asked to argue that their component represented the area in which libraries provide the most benefit to the public: Side 1—Employment and Entrepreneurship; Side 2—Education; Side 3—Engagement and Empowerment.

Tri-baters included Penelope Blackwell, Crystal Carter, Diamond Green, Taylor McDonald, Zinquarn Wright and Devin Wingfield of McKinley Technology High School; and Amari Hemphill, Lauren Terry, Layonna Mathis, Jacques Doby, Davon Thomas, Malik Morris and David Johnson of Eastern Senior High School. OITP’s Larra Clark and Marijke Visser, and ALA Executive Director Keith Michael Fiels made up the panel of judges.

The discussion was spirited, with each team demonstrating clear strengths. The Employment and Entrepreneurship teams had a strong command of library statistics, citing data from the Institute of Museum and Library Services (IMLS) and the ALA/University of Maryland Digital Inclusion Survey (e.g., 75% of libraries provide assistance with preparing resumes, and 96% offer online job and employment resources). They made a strong case for libraries not just as employment hubs, but also as trusted workforce development centers, where people of all ages can build the skills and competencies needed to be competitive in the digital age. Their arguments were particularly apropos, given ALA’s ongoing efforts to ensure libraries are recognized as eligible participants in the skills training activities recently authorized by the Workforce Investment and Opportunity Act (WIOA).

The Education teams did a strong job of describing libraries as safe spaces that support learning within and beyond school walls. They shared a clear understanding that libraries not only provide opportunities for K-12 students, but also to non-traditional students seeking to gain skills and credentials critical for participation in today’s economy. Perhaps the most impressive aspect of one team’s performance was their decision to describe education as the foundation that undergirds every other aspect of life in which libraries provide assistance: “How can you create a resume if you can’t read or write,” the team asked in their rebuttal, providing one of the lines of the day.

The Engagement and Empowerment teams also turned in impressive performances. Despite their formidable task of describing library involvement in two hard-to-define areas, the team met the challenge by depicting libraries as places where people have the freedom and the resources to pursue individual passions and interests. They also displayed a strong understanding of the modern library as a one-stop community hub, explaining that libraries of all kinds are secure spaces that keep young people on the path to productivity, and provide all people the opportunity to participate in society.

As impressed as we were by the students’ firm grasp of the resources and services today’s libraries provide, the day was fundamentally not about gauging their ability to articulate what our community does on a national scale. It was rather about gaining their personal perspectives on the strengths and challenges of library service, and their expectations for what libraries should do to meet the needs of communities of all kinds.

The discussions the judges had with the students following the conclusion of the tri-bate were particularly informative in this regard. Several students suggested that libraries should find new ways to engage young people, which we at OITP particularly appreciate, given our ongoing work to build a new program on children and youth. Students called for librarians to connect with them at recreation centers and other non-library spaces to raise awareness and connect library services in new (and fun!) ways. Others suggested that library professionals should continue and even enhance their focus on providing instruction in digital technologies and basic computer and internet skills.

We found the students’ perspectives and input invaluable, and we look forward to using it to inform our continued work to raise awareness of all today’s libraries do for the public, and to increase the library community’s profile in national public policy debates. We want to thank Arent Fox for hosting the day’s session, and—most importantly—all of the students who participated for an invigorating and informative discussion. Great job to all!

The post Some debates don’t have E-asy winners appeared first on District Dispatch.

David Rosenthal: Potemkin Open Access Policies

planet code4lib - Tue, 2015-05-12 15:00
Last September Cameron Neylon had an important post entitled Policy Design and Implementation Monitoring for Open Access that started:
We know that those Open Access policies that work are the ones that have teeth. Both institutional and funder policies work better when tied to reporting requirements. The success of the University of Liege in filling its repository is in large part due to the fact that works not in the repository do not count for annual reviews. Both the NIH and Wellcome policies have seen substantial jumps in the proportion of articles reaching the repository when grantees’ final payments or ability to apply for new grants were withheld until issues were corrected.

He points out that:
Monitoring Open Access policy implementation requires three main steps. The steps are:
  1. Identify the set of outputs that are to be audited for compliance
  2. Identify accessible copies of the outputs at publisher and/or repository sites
  3. Check whether the accessible copies are compliant with the policy
Each of these steps is difficult or impossible in our current data environment. Each of them could be radically improved with some small steps in policy design and metadata provision, alongside the wider release of data on funded outputs.

He makes three important recommendations:
  • Identification of Relevant Outputs: Policy design should include mechanisms for identifying and publicly listing outputs that are subject to the policy. The use of community standard persistable and unique identifiers should be strongly recommended. Further work is needed on creating community mechanisms that identify author affiliations and funding sources across the scholarly literature.
  • Discovery of Accessible Versions: Policy design should express compliance requirements for repositories and journals in terms of metadata standards that enable aggregation and consistent harvesting. The infrastructure to enable this harvesting should be seen as a core part of the public investment in scholarly communications.
  • Auditing Policy Implementation: Policy requirements should be expressed in terms of metadata requirements that allow for automated implementation monitoring. RIOXX and ALI proposals represent a step towards enabling automated auditing but further work, testing and refinement will be required to make this work at scale.
What he is saying is that defining policies that mandate certain aspects of Web-published materials without mandating that they conform to standards that make them enforceable over the Web is futile. This should be a no-brainer. The idea that, at scale, without funding, conformance will be enforced manually is laughable. The idea that researchers will voluntarily comply when they know that there is no effective enforcement is equally laughable.

LITA: LITA Presents Two Webinars on Kids, Technology and Libraries

planet code4lib - Tue, 2015-05-12 13:00

Technology and Youth Services Programs: Early Literacy Apps and More
Wednesday May 20, 2015
1:00 pm – 2:00 pm Central Time


After Hours: Circulating Technology to Improve Kids’ Access
Wednesday May 27, 2015
1:00 pm – 2:00 pm Central Time

Register now for either or both of these exciting and lively webinars

For Technology and Youth Services Programs, join Claire Moore, Head of Children’s Service at Darien Library (CT). In this digital age it has become increasingly important for libraries to infuse technology into their programs and services. Youth services librarians are faced with many technology routes to consider and many app options to evaluate and explore. Claire will discuss innovative and effective ways the library can create opportunities for children, parents, and caregivers to explore new technologies.

For After Hours: Circulating Technology to Improve Kids’ Access, join Megan Egbert, Youth Services Manager for the Meridian Library District (ID). For years libraries have been providing access and training for technology through their services and programs. Kids can learn to code, build a robot, and make a movie with an iPad at the library. But what can they do when they get home? The Meridian Library District has chosen to start circulating new types of technology, such as Arduinos, Raspberry Pis, robots, iPads, and apps. Join Megan to discover the benefits, opportunities, and best practices.

Register for either one or both of the webinars

Full details
Can’t make the date but still want to join in? Registered participants will have access to the recorded webinar.


LITA Member: $45
Non-Member: $105
Group: $196

Registration Information

Register Online page arranged by session date (login required)
Mail or fax form to ALA Registration
Call 1-800-545-2433 and press 5

Questions or Comments?

For all other questions or comments related to the course, contact LITA at (312) 280-4269 or Mark Beatty,

DuraSpace News: Overview of DuraSpace-Related Presentations at OR2015

planet code4lib - Tue, 2015-05-12 00:00

Winchester, MA  We are looking forward to seeing friends, colleagues, partners and collaborators at next month’s Tenth International Conference on Open Repositories (OR2015) in Indianapolis. The full program is available here. The breakdown below highlights those sessions related to DuraCloud, DSpace, Fedora and Hydra where DuraSpace staff and/or community members will be presenting or co-presenting.

DuraSpace News: Traveling to Indy for OR2015? Please come to the Developer/Ideas Challenge Reception

planet code4lib - Tue, 2015-05-12 00:00

Winchester, MA  Please join co-hosts DuraSpaceDPN (Digital Preservation Network) and ORCID as we celebrate developers and creative ideas for moving repositories forward. The informal reception will be held at the Tomlinson Tap Room located in the Indianapolis City Market, home of the original Farmers' Market.

Galen Charlton: Forth to Hood River!

planet code4lib - Mon, 2015-05-11 23:51

Tomorrow I’m flying out to Hood River, Oregon, for the 2015 Evergreen International Conference.

I’ve learned my lesson from last year — too many presentations at one conference make Galen a dull boy — but I will be speaking a few times:

Hiding Deep in the Woods: Reader Privacy and Evergreen (Thursday at 4:45)

Protecting the privacy of our patrons and their reading and information seeking is a core library value – but one that can be achieved only through constant vigilance. We’ll discuss techniques for keeping an Evergreen system secure from leaks of patron data; policies on how much personally identifying information to keep, and for how long; and how to integrate Evergreen with other software securely.

Angling for a new Staff Interface (Friday at 2:30)

The forthcoming web-based staff interface for Evergreen uses a JavaScript framework called AngularJS. AngularJS offers a number of ways to ease putting new interfaces together quickly, such as tight integration of promises/deferred objects, extending HTML via local directives, and an integrated test framework, and can help make Evergreen UI development (even more) fun. During this presentation, which will include some hands-on exercises, Bill, Mike and Galen will give an introduction to AngularJS with a focus on how it is used in Evergreen. By the end of the session, attendees will have gained knowledge that they can immediately apply to working on Evergreen’s web staff interface. To perform the exercises, attendees are expected to be familiar with JavaScript.

Jane in the Forest: Starting to do Linked Data with Evergreen (Saturday at 10:30)

Linked Data has been on the radar of librarians for years, but unless one is already working with RDF triple stores and the like, it can be hard to see what the Linked Data future will look like for ILSs. Adapting some of the ideas of the original Jane-athon session at ALA Midwinter 2015 in Chicago, we will go through an exercise of putting together small sets of RDA metadata as RDF… then see how that data can be used in Evergreen. By the end, attendees will have learned a bit not just about the theory of Linked Data, but about how working with it can look in practice.

I’m looking forward to hearing other presentations and the keynote by Joseph Janes, but more than that, I’m looking forward to having a chance to catch up with friends and colleagues in the Evergreen community.

Karen Coyle: Catalogers and Coders

planet code4lib - Mon, 2015-05-11 22:30
Mandy Brown has a blog post highlighting The Real World of Technology by Ursula Franklin. As Brown states it, Franklin describes
holistic technologies and prescriptive technologies. In the former, a practitioner has control over an entire process, and frequently employs several skills along the way... By contrast, a prescriptive technology breaks a process down into steps, each of which can be undertaken by a different person, often with different expertise.

It's the artisan vs. Henry Ford's dis-empowered worker. As we know, there has been some recognition, especially in Japanese factory models, that dis-empowered workers produce poorer-quality goods with less efficiency. Brown has a certain riff on this, but what came immediately to my mind was the library catalog.

The library catalog is not a classic case of the assembly line, but it has the element of different workers being tasked with different aspects of an outcome, but no one responsible for the whole. We have (illogically, I say) separated the creation of the catalog data from the creation of the catalog.

In the era of card catalogs (and the book catalogs that preceded them), catalogers created the catalog. What they produced was what people used, directly. Catalogers decided the headings that would be the entry points to the catalog, and thus determined how access would take place. Catalogers wrote the actual display that the catalog user would see. Whether or not people would find things in the catalog was directly in the hands of the catalogers, who could decide what would bring related entries within card-flipping distance of each other, and whether cross-references were needed.

The technology of the card catalog was the card. The technologist was the cataloger.

This is no longer the case. The technology of the catalog is now a selection of computer systems. Not only are catalogers not designing these systems, in most cases no one in libraries is doing so. This has created a strange and uncomfortable situation in the library profession. Cataloging is still based on rules created by a small number of professional bodies, mostly IFLA and some national libraries. IFLA is asking for comments on its latest edition of the International Cataloging Principles but those principles are not directly related to catalog technology. Some Western libraries are making use of or moving toward the rules created by the Joint Steering Committee for Resource Description and Access (RDA), which boasts of being "technology neutral." These two new-ish standards have nothing to say about the catalog itself, as if cataloging existed in some technological limbo.

Meanwhile, work goes on in the bibliographic data arena with the development of the BIBFRAMEs, variations on a new data carrier for cataloging data. This latter work has nothing to say about how resources should be cataloged, and also nothing to say about what services catalogs should perform, nor how they should make the data useful. Its philosophy is "whatever in, whatever out."

Meanwhile #2, library vendors create the systems that will use the machine-readable data that is created following cataloging rules that very carefully avoid any mention of functionality or technology. Are catalog users to be expected to perform left-anchored searches on headings? Keyword searches on whole records? Will the system provide facets that can be secondary limits on search results? What will be displayed to the user? What navigation will be possible? Who decides?

The code4lib community talks about getting catalogers and coders together, and wonders if catalogers should be taught to code. The problem, however, is not between coders and catalogers but is systemic in our profession. We have kept cataloging and computer technology separate, as if they aren't both absolutely necessary. One is the chassis, the other the engine, but nothing useful can come off the assembly line unless both are present in the design and the creation of the product.

It seems silly to have to say this, but you simply cannot design data and the systems that will use the data each in their own separate silo. This situation is so patently absurd that I am embarrassed to have to bring it up. Getting catalogers and coders together is not going to make a difference as long as they are trying to fit one group's round peg into the others' square hole. (Don't think about that metaphor too much.) We have to have a unified design, that's all there is to it.

What are the odds? *sigh*

Nicole Engard: Bookmarks for May 11, 2015

planet code4lib - Mon, 2015-05-11 20:30

Today I found the following resources and bookmarked them on Delicious.

  • Which Flight Will Get You There Fastest? FiveThirtyEight analyzed 6 million flights to figure out which airports, airlines and routes are most likely to get you there on time and which ones will leave you waiting.
  • SemanticScuttle SemanticScuttle is a social bookmarking tool experimenting with new features like structured tags and collaborative descriptions of tags.

Digest powered by RSS Digest

The post Bookmarks for May 11, 2015 appeared first on What I Learned Today....

Related posts:

  1. Delicious Tag Bundles
  2. A Wiki goes Social
  3. Flight info from Google

Evergreen ILS: Mutual Respect in the Evergreen Community

planet code4lib - Mon, 2015-05-11 19:43

This week’s Evergreen International Conference marks the first event in our community where attendees were asked during registration to adhere to an Event Code of Conduct and a Photography Policy.

The Executive Oversight Board approved the Code of Conduct (CoC) during the last Evergreen conference in Cambridge and approved the photography policy shortly thereafter. Since that time, several of us in the community have been working on the procedures for addressing CoC complaints if they arise. Our hope is that we will never need to use these procedures, but we want to make sure that we respond appropriately if we do need to use them.

Attendees will be able to convey their photography preferences through their lanyard color.
Photo by Grace Dunbar

I want to use this blog post to talk about why I contributed my time over the past year to work on the photography policy and to help put the CoC procedures in place.

As a community, I think it’s important that we ensure the CoC isn’t just a statement we put on our web site and then assume we have done our job to establish a safe environment at our events. Instead, it’s the first step towards setting a tone for the behavior we expect of our community members, not just when attending Evergreen events, but also in their day-to-day interactions on the list, in the #evergreen IRC channel, or in other project communication channels.

The Evergreen community is one in which we treat everyone as a professional, regardless of their gender, sexual orientation, disability, physical appearance, body size, race or religion. We might not always agree with other people and may even dislike others in the community. However, we should always treat them with the respect that every professional deserves.

We are a community of mutual respect, where people don’t need to worry that they will be harassed or joked about because of perceived differences; where people know that they will be treated as a colleague, not as a potential date; and where people know they will be judged on the merit of their contributions.

Why is this important? By establishing this kind of environment, people will see the Evergreen community as a welcoming one, not one that should be avoided. They’ll be more likely to contribute. However, the most compelling reason is that it’s the right thing to do.

As everyone gets ready to attend the conference, I encourage you to read over the Code of Conduct as a reminder of what kind of behavior may be considered a violation of the code. If you are uncertain whether something you might say or incorporate in a presentation is inappropriate, err on the side of caution. If you see or experience something that is out of line, report it to an incident responder.

Also, please remember to abide by the new photography policy. The lanyards are a great way for people to let you know what their photography preferences are, and we expect you to respect their wishes. If you want to record somebody, you must always ask first.

Most of all, enjoy the conference. If everyone follows these guidelines, it should be an informative and enjoyable event for all.

Federated Search Blog: Federated Search – The Killer iWatch App?

planet code4lib - Mon, 2015-05-11 14:30

This post was re-published with permission from the Deep Web Technologies Blog. Please view the original post here.

The Beagle Research Group Blog posted “Apple iWatch: What’s the Killer App” on March 10, including this line: “An alert might also come from a pre-formed search that could include a constant federated search and data analysis to inform the wearer of a change in the environment that the wearer and only a few others might care about, such as a buy or sell signal for a complex derivative.” While this enticing suggestion is just a snippet in a full post, we thought we’d consider the possibilities this one-liner presents. Could federated search become the next killer app?

Well no, not really. Federated search in and of itself isn’t an application; it’s more of a supporting technology. It supports real-time searching, rather than indexing, and provides current information on fluctuating data such as weather, stocks, and flights. And that is exactly why it’s killer: federated search finds new information of any kind, anywhere, singles out the most precise data to display, and notifies the user to take a look.

In other words, it’s a great technology for mobile apps to use. Federated search connects directly to the source of the information, whether medical, energy, academic journals, social media, or weather, and finds information as soon as it’s available. Rather than storing information away, federated search links a person to the data circulating that minute, passing on the newest details as soon as they are available, which makes a huge difference with need-to-know information. In addition, alerts can be set up to notify the person, researcher, or iWatch wearer of critical data such as a buy or sell signal, as The Beagle Research Group suggests.

Of course, there’s also the issue of real estate to keep in mind – the iWatch wraps less than 2 inches of display around a wrist. That’s not much room for a hefty list of information, much less junky results. What’s important is that the single most accurate piece of information, hand-picked (so to speak) just for you, pops up on the screen. Again, federated search can make that happen, since it has direct connections to the sources.

There is a world of possibility when it comes to using federated search technology to build applications, whether mobile or desktop. Our on-demand lifestyles require federating, analyzing, and applying all sorts of data, from health, to environment, to social networking. Federated search is not just for librarians finding subscription content anymore. The next-generation federated search is for everyone in need of information on the fly. Don’t worry about missing information (you won’t). Don’t worry whether information is current (it is). In fact, don’t worry at all. Relax, sit back, and get alert notifications to buy that stock, watch the weather driving home, or check out an obscure tweet mentioning one of your hobbies. Your world reports to you what you need to know. And that, really, is simply killer.

Thom Hickey: In defense of MARC

planet code4lib - Mon, 2015-05-11 14:16

My colleague Roy Tennant has eloquently argued that "MARC Must Die".  Certainly much of what he says is true, but it's been almost 13 years since that column and MARC, while weakening, still seems to be hanging on.  I think there is actually more going on here than simply legacy software that won't go away, so I'd like to offer my less eloquent defense of MARC.

First, what I'm defending here is the underlying MARC communications format (ANSI Z39.2/ISO 2709), or at least its XML/JSON equivalents, not full MARC-21 or other variants with their baggage of rules. Those rules and their encoding could also use some defense (after all, they were all established after extensive debate), but that is a different essay.

What I like about MARC is its relative lack of predetermined structure: beyond indicators, fields and subfields, there are only conventions about using 3-digit numeric tags and a-z, 0-9 for subfield codes. Admittedly, you are restricted to the two-level field/subfield hierarchy, but an amazing amount of bibliographic stuff fits (and anything can be shoved) into such a structure. The 5-digit length field is a serious limitation, but one overcome by the translation to XML/JSON. I watched a video by Donald Knuth recently about "Constraint Based Composition" where he talks about constraints on form fostering creativity. Maybe the MARC format's constraints are just the sort of structure that makes metadata somewhat manageable, while still retaining some flexibility.
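To make the two-level shape concrete, here is a rough sketch of one record in a MARC-in-JSON style. The key names follow a common convention but are illustrative assumptions, not a fixed standard:

```python
# Illustrative MARC-in-JSON shape: tags are 3-digit strings, each data
# field carries two indicators and an ordered list of subfields.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "ocm12345678"},                       # control field, no subfields
        {"245": {"ind1": "1", "ind2": "0",
                 "subfields": [{"a": "A defense of MARC :"},
                               {"b": "notes on a format."}]}},
    ],
}

def subfield(rec, tag, code):
    """Return the first value for a tag/subfield-code pair, or None."""
    for field in rec["fields"]:
        if tag in field and isinstance(field[tag], dict):
            for sf in field[tag]["subfields"]:
                if code in sf:
                    return sf[code]
    return None

print(subfield(record, "245", "a"))   # A defense of MARC :
```

Note how the JSON carrier keeps the field/subfield structure while dropping ISO 2709's 5-digit record-length limit entirely.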

Contrast this to a MODS/MADS record. Here everything is laid out, with rules to make sure you don't modify the structure. When we started xA, a small authority file to feed into and control VIAF, MADS seemed a better fit for the rather simple authority records we expected to create than a full-blown MARC-21 authorities record. What I didn't realize was that bringing up a simple editor for MARC is a day or two's work (see mS, also known as the MARC Sandbox). A full MADS editor would be substantially more work. In fact, xA's simple take on MADS was substantially more work. Which was OK, until we decided to change something. No longer was it just a matter of sticking in a new text field (and possibly modifying a table to allow it); instead, interface issues needed to be faced and changes made in the record display and editing forms.

All of a sudden, the slightly greater effort for a possibly friendlier form turned very unfriendly for someone (me) trying not to spend any more time on the infrastructure than absolutely necessary. In mS you can actually cut and paste from a MARC display in other MARC tools and things 'just work'. Plus there is the familiarity with MARC, which makes many people (at least where I work!) very comfortable. The ability to easily add fields and subfields to accommodate new needs appears to be much more important than a pretty interface. In fact, we attempted to extend xA to handle the needs of some Syriac scholars, but failed because of the amount of effort involved. What did they use instead? A very complex Excel spreadsheet! I've seen any number of attempts to fit bibliographic information into spreadsheets, and they do not work nearly as well as MARC.

The first version of the mS form was actually MARC, but with tags, indicators, subfields neatly separated into their own boxes.  It turned out that the simple text field we are now using was both easier to work with (e.g. cut/paste) and substantially easier to get working smoothly.
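Part of why a plain text box works is that a line of a MARC text display parses with a couple of string splits. The exact display convention assumed here (spacing, the "$" subfield delimiter) is an illustration; mS's real format may differ:

```python
def parse_line(line):
    # "TAG II $acontent$bcontent" -> tag, indicators, subfield pairs
    tag, indicators, rest = line[:3], line[4:6], line[7:]
    subfields = [(sf[0], sf[1:].strip())
                 for sf in rest.split("$") if sf]
    return {"tag": tag, "ind": indicators, "subfields": subfields}

field = parse_line("245 10 $aA defense of MARC :$bnotes on a format.")
print(field["tag"], field["subfields"])
```

A cut-and-pasted display line round-trips through a parser like this with no form widgets at all, which is hard to match with a field-per-box editor.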

I admit that using MARC seems quite retro. MARC is an old format that has held up surprisingly well, but it is showing its age. Using it is a bit like using Emacs or Vi to create software. They are very 'close to the metal', and that's just what lots of people want.


LITA: Don’t Go Chasin’ Waterfalls

planet code4lib - Mon, 2015-05-11 14:00

Fellow LITA blogger Leo Stezano has been knocking it out of the park lately with his insightful posts about Agile development. Agile is a word that gets thrown around a lot in the tech world, and while many people in the library world may have a kinda-sorta understanding of what it is, far fewer have a solid understanding of why it is. Agile seems to make a lot of sense on the surface, but one can only appreciate Agile when one knows where it came from and what it was rebelling against.

In the beginning, there was the dreaded Waterfall model:

By Peter Kemp / Paul Smith (adapted from Paul Smith's work at Wikipedia), CC BY 3.0, via Wikimedia Commons

As you can see from the diagram, you start with requirements and work on them until they are done. Then you move on to design and work on it until it is done, then implementation, and so on; the fun rolls downhill from there. The basic assumption of the Waterfall model is that we can truly be "done" with any of these discrete phases of the development process. This old-school way of handling projects is sometimes derogatorily referred to as "Big Bang Development" (no relation to the TV show) because you get the requirements for the project (which you understand completely, right?) and then you go off into your cave to meditate for months and re-emerge with a shining, beautiful and perfect deliverable. That's the dream, anyways.

What really happens if you try to run a project like this is that you re-emerge with a product that does exactly what it was supposed to do (according to what you were told), but not what it actually needs to do. Let's use an example: say your boss tells you, "I want you to make a pizza." Awesome! What kind of pizza, boss? "Pepperoni, and make it big enough for everyone in the office to have some." Alrighty, cool. You have your requirements, you go to the store to get the ingredients, and you build a big honkin' pepperoni pizza. You deliver it to your boss, who promptly asks you "Is it organic? Gluten-free? Can it have feta cheese instead of mozzarella, and maybe capicola and olives instead of pepperoni? Also, I'm not extending your deadline or giving you more money to buy more groceries." Bummer. You thought you knew what your boss wanted, and they probably did, too, but it turns out no one really knew what a successful project should look like at the beginning. The client (your boss) only knew what they wanted in response to what you gave them.

This fact of life, namely that clients don't really know exactly what they want at the beginning of a project (or aren't capable of accurately conveying that information to you, which is just as destructive), is a truth that the Waterfall model can't accommodate. Fortunately, it's a truth that is foundational to Agile development. Agile assumes that clients only have a vague idea of what they want, and the entire development process is based around iterations where you build a little bit of something, show it to the client to get feedback, and then build a little more based on that feedback. It may not be as clean or discrete as the Waterfall model, but it works incredibly well by recognizing the psychological phenomenon that people are better at describing what changes they want to an existing thing than they are at describing what they want from scratch. In fact, this is why the process is called Agile; it's a workflow that can stop on a dime and change direction based on the client's changing needs. The Waterfall model of development is like a freight train barreling down the tracks defined by the requirements at the start of the project; once it leaves the station, there's no stopping it.

Let’s try the pizza project again, but from an Agile perspective. Your boss wants a pepperoni pizza. You go back to your office and draft up an ingredients list with links to the ingredients you’ll use on Amazon. You email it to your boss who looks at the list and emails you back to say that it looks good, but let’s make a few changes. They email you back a word document with the new ingredients they want. You go to the store and get them, you make a small test pizza and let your boss sample it. “I like it extra crispy. Cook it longer.” You make another test pizza cooked longer. “Perfect!” You make a giant pizza, everyone in the office loves it and they throw a parade in your honor. Iteration has saved the day!

While this is a heavily simplified view of both the Waterfall and Agile models of development, I feel that this iterative perspective on projects is one of the most important takeaways of learning to be Agile, and it’s a principle that applies to all projects, not just software development. So long as you remember that humans aren’t perfect and aren’t always great at explaining what they need, you can use this fundamental Agile principle for every kind of project (even making a pizza for your coworkers). When you take on a new task, start small and iterate small. It’s the fastest way to correctly get to the big picture.

CrossRef: New CrossRef Members

planet code4lib - Mon, 2015-05-11 13:51

Updated May 2, 2015

Voting Members
Association for Promotion and Development of Maritime Industries
Centro de Estudios Politicos y Constitucionales
Corporation Universidad de la Costa, CUC
Council for Children with Behavioral Disorders
Faculty of Agriculture in Osijek
IUPUI University Library
Medinform LTD
Sociedad Espanola de Patologia Dual (SEPD)
State University of Malang (UM)
Universa Medicina
Universidad de la Rioja
University of Rijeka, Faculty of Economics
Whioce Publishing Pte Ltd

Represented Members
Associacao Sergipana de Ciencia
Association of Test Publishers (ATP)
Ege Egitim Dergisi
Federacao Brasileira de Psicodrama
Institute for EU Studies
Institute of Global Affairs
Korean Academy of International Commerce
Korean Society for Engineering Education
Religious Studies
Sociedade Unificada de Ensino Augusto Motta -UNISUAM
The Center for Asia and Diaspora
The Korean Linguistic Society
The Society for History of Education
The Society of Korean Photography
Universidade Federal de Pelotas
Zaporozhye State Medical University

Last updated April 27, 2015

Voting Members
APO Society of Specialists in Heart Failure
French Sciences Publishing Group
International Academy of Science and Higher Education
International Solar Energy Society (ISES)
Journal of Contemporary Medicine and Dentistry
Laeknabladid/The Icelandic Medical Journal
SciPress Ltd
Telecommunications Association Inc.
The Association of Baccalaureate Social Work Program Directors, Inc.
University of Leon

Represented Members
Buryat State University
Institute of Biochemistry
International Journal of Applied Mathematics, Electronics and Computers
International Journal of Economics and Administrative Studies
Scientific and Practical Reviewed Journal Pulmonology
Selcuk Iletisim
SMS Institute of Technology
Society of Pansori
The Modern English Society

CrossRef: CrossRef Indicators

planet code4lib - Mon, 2015-05-11 13:45

Updated May 4, 2015

Total no. participating publishers & societies 4751
Total no. voting members 3424
% of non-profit publishers 57%
Total no. participating libraries 1947
No. journals covered 39,015
No. DOIs registered to date 73,608,961
No. DOIs deposited in previous month 483,190
No. DOIs retrieved (matched references) in previous month 59,784,568
DOI resolutions (end-user clicks) in previous month N/A

Evergreen ILS: Evergreen Staff Procedures for Handling a Code of Conduct Violation

planet code4lib - Mon, 2015-05-11 13:21

This procedure has been adapted from the PyCon Staff Procedure for incident handling, from the Readercon Procedures for Addressing Reported Violations of the Code of Conduct, and from the Ada Initiative's guide titled "Conference anti-harassment/Responding to Reports".

Be sure to have a good understanding of our Code of Conduct policy, which can be found here:

Quiet Space: For the 2015 conference, there will be designated quiet space for talking to reporters. The designated quiet space on Wednesday will be the conference office space/suite. The Mt. Hood room should be used for the rest of the conference.

Procedures for Conference Staff

If the initial report is received by conference staff, the conference staff should:

  1. Ask whether the person making the report needs medical care and call 911 if they do.
  2. Ask the person what can be done to make them feel safe in the moment.
  3. Ask the person to relocate to a designated quiet space or a quiet space of the person's choosing (such as a hotel room or their friend's room).
  4. Invite the person to call a partner, friend, or other supporter if they don’t already have someone with them.
  5. Tell the person that they are going to make a call to summon a designated responder to talk to them further.
  6. Conference staff should immediately connect the attendee with one of the trained Evergreen incident responders to handle the report. All conference staff should receive cell phone numbers for each of the incident responders prior to the start of the conference, along with times they will be unavailable to respond (e.g. when they are presenting a program).
Procedures for Responders

If the initial report is received by an incident responder, the responder should follow steps 1-4 above. If the report is made by phone, the responder should stay on the phone with the reporter until they are in the same place.

Incident responders should ask the reporter to describe the incident and write down the information in a written report. Report forms are available online and in paper form to help you gather the following information:

  • Identifying information (name) of the participant doing the harassing
  • The behavior that was in violation
  • The approximate time of the behavior (if different than the time the report was made)
  • The circumstances surrounding the incident
  • Other people involved in the incident

Prepare an initial response to the incident. This initial response is very important and will set the tone for the Evergreen Conference. Depending on the severity/details of the incident, please follow these guidelines:

  • If there is any general threat to attendees or the safety of anyone including conference staff is in doubt, summon security or police
  • Offer the victim a private place to sit
  • Ask “is there a friend or trusted person who you would like to be with you?” (if so, arrange for someone to fetch this person)
  • Ask them “how can I help?”
  • Provide them with your list of emergency contacts if they need help later
  • If everyone is presently physically safe, involve law enforcement or security only at a victim’s request

There are also some guidelines as to what not to do as an initial response:

  • Do not overtly invite them to withdraw the complaint or mention that withdrawal is OK. This suggests that you want them to do so, and is therefore coercive. “If you’re OK with it [pursuing the complaint]” suggests that you are by default pursuing it and is not coercive.
  • Do not ask for their advice on how to deal with the complaint. This is a staff responsibility.
  • Do not offer them input into penalties. This is the staff’s responsibility.
Addressing the Complaint

Once something is reported to an incident responder, the responder should immediately meet with the Safety Committee. If the reporter of the incident or the alleged harasser is a member of the Safety Committee, they should recuse themselves from this meeting and any subsequent discussion on how to address the complaint. The main objectives of this meeting are to find out the following:

  • What happened?
  • Are we doing anything about it?
  • Who is doing those things?
  • When are they doing them?
  • What do we want to communicate with the alleged harasser?

After the staff meeting and discussion, have an incident responder communicate with the alleged harasser. Make sure to inform them of what has been reported about them.

Allow the alleged harasser to give their side of the story to the staff. After this point, if the report stands, let the alleged harasser know what actions will be taken against them. An additional meeting of the Safety Committee may be required before a final decision is made.

Some things for the staff to consider when dealing with Code of Conduct offenders:

  • Warning the harasser to cease their behavior and that any further reports will result in sanctions
  • Requiring that the harasser avoid any interaction with, and physical proximity to, their victim for the remainder of the event
  • Ending a talk that violates the policy early
  • Not publishing the video or slides of a talk that violated the policy
  • Not allowing a speaker who violated the policy to give (further) talks at the event now or in the future
  • Immediately ending any event volunteer responsibilities and privileges the harasser holds
  • Requiring that the harasser not volunteer for future events your organization runs (either indefinitely or for a certain time period)
  • Requiring that the harasser refund any community-funded travel grants and similar they received (this would need to be a condition of the grant at the time of being awarded)
  • Requiring that the harasser immediately leave the event and not return
  • Banning the harasser from future events (either indefinitely or for a certain time period)
  • Providing a report to the harasser’s employer in cases where the harassment occurred in an official employee capacity, such as working while paid event staff, while giving a talk about their employer’s product, while staffing an exhibit booth, while wearing their employers’ branded merchandise, while attempting to recruit someone for a job, or while claiming to represent their employer’s views.
  • Removing a harasser from membership in relevant organizations
  • Recommendation from the Safety Committee to remove the harasser from a leadership position in the community. This recommendation would be made to the group with the authority to make this decision (e.g. the EOB in the case of an EOB member, the developer community in the case of a core committer/release manager, etc.)
  • Publishing an account of the harassment and calling for the resignation of the harasser from their responsibilities (usually pursued by people without formal authority: may be called for if the harasser is the event leader, or refuses to stand aside from the conflict of interest, or similar; typically event staff have sufficient governing rights over their space that this isn’t as useful)

Give accused attendees a place to appeal to if there is one, but in the meantime the report stands.

Keep in mind that it is not a good idea to encourage an apology from the harasser. Forcing a victim of harassment to acknowledge an apology from their harasser forces further contact with their harasser. It also creates a social expectation that they will accept the apology, forgive their harasser, and return their social connection to its previous status.

If the harasser offers to apologize to the victim (especially in person), we suggest strongly discouraging it. If a staff member relays an apology to the victim, it should be brief and not require a response. (“X apologizes and agrees to have no further contact with you” is brief. “X is very sorry that their attempts to woo you were not received in the manner that was intended and will try to do better next time, they’re really really sorry and hope that you can find it in your heart to forgive them” is emphatically not.)

If the harasser attempts to press an apology on someone who would clearly prefer to avoid them, or attempts to recruit others to relay messages on their behalf, this may constitute continued harassment.

Communicating with the Community

It is very important how we deal with the incident publicly. Our policy is to make sure that everyone aware of the initial incident is also made aware that it is not according to policy and that official action has been taken – while still respecting the privacy of individual attendees. When speaking to individuals who are aware of the incident but were not involved with it, it is a good idea to keep the details out.

In most cases, the conference chair, or designate, should make one or more public announcements describing the behavior involved and the repercussions. If necessary, this will be done with a short announcement either during the plenary and/or through other channels. No one other than the conference chair or someone delegated authority from the conference chair should make any announcements. No personal information about either party will be disclosed as part of this process. A sample statement might be:

“<thing> happened. This was a violation of our policy. We apologize for this. We have taken <action>. This is a good time for all attendees to review our policy at <location>. If anyone would like to discuss this further they can <contact us somehow>.”

If some attendees were angered by the incident, it is best to apologize that the incident occurred in the first place. If there are residual hard feelings, suggest that they write an email to the conference chair or to the event coordinator. It will be dealt with accordingly.


Post-event procedures

Following the Evergreen conference, the Safety Committee will:

  • Solicit post-conference feedback to evaluate whether the conference provided a safe place for attendees and to determine whether any Code of Conduct violations went unreported.
  • Review Code of Conduct incidents that occurred (if any) to determine if there are any ways to improve the handling of such incidents. The intent of this review is not to reconsider action taken for the specific incident. Instead, it is an opportunity to identify ways to improve the procedures for responders and the Safety Committee.
  • Conduct an annual review of the Code of Conduct and procedures to identify any changes that are required.

Islandora: The Future of Forms in Islandora

planet code4lib - Mon, 2015-05-11 13:10

If you have been following events in the Islandora community lately, you probably know that we are in the midst of an ambitious project to build an Islandora that works on top of Fedora 4. The project is going great, but it has raised some questions about how to proceed with one of the most crucial yet under-appreciated tools in the Islandora stack: XML Form Builder.

Largely the work of a single developer (discoverygarden's awesome Nigel Banks), XML Form Builder is a powerful (and sometimes difficult-to-master) tool that allows you to leverage the ease of Drupal forms to edit metadata for your Fedora objects. All Islandora Solution Packs come with standard MODS forms, but XML Form Builder lets you go far beyond those basic building blocks to meet almost any use case. A simple form for students to add objects with just a few fields; a brand new form to manipulate esoteric metadata standards; a tiny tweak to an existing Solution Pack form that brings it in line with your institution's specific needs. 
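The core idea, mapping named form fields onto paths in an XML metadata document, can be sketched briefly. The bindings syntax below is a simplification invented for illustration; XML Form Builder's actual form-association model is considerably richer:

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

def build_mods(form_values, bindings):
    """Create a MODS-like document from form input and field-to-path bindings."""
    root = ET.Element("{%s}mods" % MODS_NS)
    for field, path in bindings.items():
        parent = root
        for step in path.split("/"):          # create each nested element
            parent = ET.SubElement(parent, "{%s}%s" % (MODS_NS, step))
        parent.text = form_values.get(field, "")
    return root

bindings = {"title": "titleInfo/title", "author": "name/namePart"}
mods = build_mods({"title": "Islandora and Fedora 4",
                   "author": "Banks, Nigel"}, bindings)
print(ET.tostring(mods, encoding="unicode"))
```

The open question in the post is essentially whether this mapping should keep targeting XML trees like the one above, or the RDF properties that Fedora 4 natively stores.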

The Fedora 4 project team recognizes that this functionality needs to exist in the future version of Islandora. The question now is: How? Do we port over XML Forms and untangle the legacy of compromises that allows it to work in the current environment? Do we sidestep the issue and build a new tool that more closely leverages the underlying structure of Fedora 4? Do we build a new tool and make it look like XML Form Builder so it's comfortable to use, but works completely differently? XML, RDF, XPath, and more are all in play. It's a big task and it comes with a lot of big questions.

So, we turn to you, the Islandora Community, to get some answers. What tools do you need to work with metadata for your collections? How are you using XML Form Builder now? What parts of it don't you use? What parts are critical? To that end, Nick Ruest has put together a template to collect use cases. Please add yours to the list and help us shape Islandora's future.

DPLA: The principles for establishing international & interoperable rights statements

planet code4lib - Mon, 2015-05-11 12:59

Cameramen mount ladders at tournament in Rhode Island, 1931. Copyright Jones, Leslie, 1886-1967, CC BY-NC-ND.

Over the past twelve months representatives from Europeana, DPLA and Creative Commons have been exploring the possibilities for a collaborative approach to rights statements that can be used to communicate the copyright status of cultural objects published via our platforms. This work is close to the heart of both Europeana and the DPLA as we both seek to share clear and accurate information about copyright status with users.

While we operate under different copyright laws on either side of the Atlantic, we share many common issues. For some cultural organizations there are restrictions placed by the donor on the reuse of an object; for some, national or state laws create legislative restrictions; and some organizations may only have permission to allow educational uses of objects in their collections. These are just some of the copyright issues faced by organizations that contribute data to Europeana and the DPLA. In this cooperative effort, we have set out to build a flexible system of rights statements that allows our contributing partners to clearly communicate what users can or cannot do with the objects that are published via our platforms. This system further develops the approach taken by Europeana with the Europeana Licensing Framework, making it more flexible and adapting it to the needs of institutions outside of the EU.

On behalf of the Working Group we are pleased to share with you today two White Papers that describe our recommendations for establishing a group of rights statements and the enabling technical infrastructure. These recommendations include a proposal for a list of shared rights statements that both the DPLA and Europeana can use, depending on the needs of our respective organizations:

Recommendations for standardized international rights statements

This paper describes the need for a common standardized approach. Based on the experience of both of our organizations, we have described the principles we think any international approach to providing standardized rights statements needs to meet. Together with this white paper, we propose a list of twelve new rights statements that reflect the year-long discussions of the Working Group.

In the coming weeks Europeana and the DPLA will reach out to our core stakeholders and ask for feedback directly. In the meantime, all interested parties are encouraged to please follow this link to share your thoughts and leave comments in the Rights Statement White Paper. After the end of this public feedback period on Friday 26th June 2015, we will incorporate the feedback and publish the results as a green paper that will be proposed for adoption by the DPLA, Europeana and other interested parties.

A proposal for a technical infrastructure

The existing Europeana Licensing Framework offers rights statements provided by third parties (the Creative Commons licenses and Public Domain Tools) and rights statements that are hosted on the Europeana portal. In order to ensure that the new rights statements can be used by institutions around the world we are proposing to host the new rights statements in their own namespace:
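One way to picture the namespace idea: each statement becomes a stable, dereferenceable URI that records point to. The base URI and statement codes below are hypothetical placeholders, not the Working Group's actual proposal:

```python
# Hypothetical rights-statement namespace; real identifiers would be
# minted and hosted by the initiative itself.
BASE = "http://example.org/rightsstatements/"

STATEMENTS = {
    "InC": "In Copyright",
    "NoC-NC": "No Copyright - Non-Commercial Use Only",
    "UND": "Copyright Undetermined",
}

def rights_uri(code):
    if code not in STATEMENTS:
        raise ValueError("unknown rights statement: %s" % code)
    return BASE + code

# A contributed record carries the URI, not free-text rights prose.
record = {"title": "Cameramen mount ladders", "rights": rights_uri("InC")}
print(record["rights"])
```

Because every platform resolves the same URI to the same human- and machine-readable definition, aggregators on both sides of the Atlantic can interpret the statement identically.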

In the coming weeks Mark Matienzo and Antoine Isaac, our leads for the technical work, will write more about the approach described in this paper.  In the meantime you can follow this link to share your thoughts and leave comments in the Technical Infrastructure White Paper. After the end of this public feedback period on Friday 26th June 2015 we will incorporate the feedback and publish the results as a green paper that will be proposed for adoption by the DPLA, Europeana and other interested parties.

Still to come: a proposal for governance

In the summer we will share our final white paper which proposes an approach to managing these statements so that they are sustainable and managed by all stakeholders.  As with the other two papers, we will incorporate the feedback that we receive into a green paper. Based on all three green papers, we will begin implementation of the new rights statements after the summer.

Terry Reese: 2015 OSU Commencement

planet code4lib - Mon, 2015-05-11 05:36

This weekend, I got to partake in one of my favorite parts of being a faculty member — walking as part of the faculty processional for the 2015 Spring Commencement.  For the past two years, I've been lucky enough to participate in the Spring Commencement at The Ohio State University — and while the weather can be a bit on the warm side, the event always leaves me refreshed and excited to see how the Libraries can find new ways to partner with our faculty, students, and new alumni; and this year was no different (plus, any event where you get to wear robes and funny hats is one I want to be a part of).  Under the beating sun and unseasonable humidity, President Drake reminded all in attendance that while OSU is a world-class research and educational institution, our roots are and continue to be strengthened by our commitment as a land grant institution — to be an institution whose core mission is to educate the sons and daughters of our great state, and beyond.  And I believe that.  I believe in the land grant mission, and in the special role and bond that land grant institutions have with their states and their citizens.  So it was with great joy that I found myself in Ohio Stadium to celebrate the end of one journey for the ~11,000 OSU graduates, and the beginning of another as these graduates look to make their own way into the future.

In my two years at OSU, one of the things you hear a lot at this institution is a commitment to "Pay it Forward".  I've found among the faculty, the staff, and the alumni that number close to half a million that these aren't just words but a way of life for so many who are a part of Buckeye Nation.  Is this unique to Ohio State — no, but with so many alumni, the influence is much easier to see.  You see it in the generosity of time, the long-term mentorship, the continued engagement with this institution — when you join Buckeye Nation, you are joining one big extended family.

I find that it is sometimes easy to forget the role that you get to play as part of the faculty in helping our students be successful.  It's easy to get bogged down in the committee work, the tenure requirements, your own research, or the job of being a faculty member at a research institution.  It's easy to take for granted the unique privilege that we have as faculty to serve and pay it forward to the next generation.  Sitting at Ohio Stadium, with so many graduates and their parents and friends… yes, it's a privilege.  Congratulations, class of 2015.

I had to catch Carol Diedrichs' last Spring Commencement before her retirement.


Looking into the Stands at the 2015 Commencement

Looking into the Stands at the 2015 Commencement…look at all those National Championship banners.

Looking into the Stands at the 2015 Commencement… there were supposed to be 60-70,000 in attendance. We may love football, but we love our graduates more.




Subscribe to code4lib aggregator