
Feed aggregator

LITA: Librarians in the Wild: Selling Librarian Skills Outside of Libraries

planet code4lib - Fri, 2016-08-19 15:00

 

When I decided to pursue the MLIS seven years ago, it wasn’t so that I could tap into a vast supply of readily available Librarian positions. I did it because I was drawn to the profession and intrigued by how technology was changing it. My first job search as a degree-toting Librarian was a lucky one: I happened to find a place that needed someone with a strong foundation in project management and web development as well as an interest in librarianship, and that is as good a description of my professional self as you will find. That’s how I ended up at Avery Library.

Then life happened. As the project I was working on was ending, my wife and I decided to leave New York and return to San Diego. I’m not going to go into our reasons for moving, because they’re not relevant here; suffice it to say, my professional goals took a backseat to other considerations. The point is, I was back on the job market. This time, however, I wasn’t so fortunate: it took a while for me to find work, and when I did it was related to my pre-MLIS career. I’m still working in academia, but my current position is as a business analyst/project manager type person. Hey, sometimes you just have to pay the bills.

This recent experience got me thinking (again) about what being a librarian means in terms of everyday work and marketable skills. I’m not going to link to a bunch of depressing articles about the library job market, or the tenuous nature of the profession’s future; suffice it to say, there are a lot of people with MLIS degrees out there and, depending on where you live, not too many jobs in traditional Librarian positions.

Over the course of the next few months, I plan to highlight several different types of opportunities that provide a good fit for librarians looking to branch out, and in doing so answer three basic questions:

  1. What makes you (a librarian-type person) a good candidate for that job/field/organization type?
  2. How do you market your skills and experience to compete in that market?
  3. Why would you want to work there? What makes it a good fit not just for a librarian’s skills, but also her professional needs?

This is not a critique of the MLIS degree or the reasons people choose to pursue one. My goal is to provide information that is useful to you if you are trying to broaden your job search when finding a job in a traditional library is proving difficult, or if you just want to take your hard-earned skills and use them to try something new. I hope to highlight opportunities where librarians can not only get hired, but that also make meaningful use of librarianship skills and fulfill the needs of someone with a librarianship background. If you have suggestions for topics or any other feedback, please share them in the comments below.

Public Domain image courtesy of user Shyamal, Wikimedia Commons. Originally published in The Literary Digest.

In the Library, With the Lead Pipe: Pre-ILS Migration Catalog Cleanup Project

planet code4lib - Fri, 2016-08-19 13:00

Image by flickr user ashokboghani (CC BY-NC 2.0)

In Brief: This article describes the catalog cleanup process at the University of New Mexico’s Health Sciences Library and Informatics Center (HSLIC) prior to migrating to a new integrated library system (ILS).  Catalogers knew that existing catalog records would need to be cleaned up before the migration, but weren’t sure where to start.  Rather than give a general overview of the project, this article provides specific examples from HSLIC’s catalog cleanup process and discusses the steps taken to clean up records for a smooth transition to a new system.

by Robyn Gleasner

Introduction

In February 2014, the Health Sciences Library and Informatics Center (HSLIC) at the University of New Mexico (UNM) made the decision to migrate to OCLC’s WorldShare Management Services (WMS).  WMS is an integrated library system that includes acquisitions, cataloging, circulation, and analytics, as well as a license manager. The public interface/discovery tool, called Discovery, is an open system that searches beyond items held by your library and extends to items available worldwide that can be requested via interlibrary loan.  We believed that Discovery would meet current user expectations with a one-stop searching experience by offering a place where users could find both electronic resources and print resources rather than having to search two separate systems.  In addition to the user experience, we liked that WMS and Discovery are not static systems: OCLC makes enhancements to the system and offers streamlined workflows for staff. These functionalities, along with a lower price point, drew us to WMS. This article will discuss HSLIC’s catalog cleanup process before migrating to OCLC’s WMS.

Before the decision was made, the library formed an ILS Migration Committee consisting of members from technical services, circulation, and information technology (IT) that met weekly. This group interviewed libraries that were already using WMS as well as conducted literature searches and viewed recorded presentations from libraries using the system.  This research solidified the decision to migrate.

HSLIC began the migration and implementation process in June 2014 and went live with WMS and WorldCat Discovery in January 2015.  Four months elapsed from the time the decision was made to the time the actual migration process began, due to internal security reviews and contract negotiation.  Catalogers knew that existing catalog records would need to be cleaned up before the migration, but weren’t sure where to start. Because of this, the cleanup process was not started until the OCLC cohort sessions began in June 2014.  These cohort sessions, led by an OCLC implementation manager, were designed to assist in the migration process with carefully thought-out steps and directions; they provided specific training in how to prepare and clean up records for extraction, as well as showing which fields from the records would migrate.

In addition to providing information about the migration, the OCLC cohort sessions also provided information on the specific modules within WMS including Metadata/Cataloging, Acquisitions, Circulation, Interlibrary Loan, Analytics and Reports, License Manager, and Discovery.  While the sessions were helpful, the cleanup of catalog records is a time-intensive process that could have been started during the waiting period. Luckily, we were one of the last institutions in the cohort to migrate bibliographic records.  This allowed more time to consider OCLC’s suggestions, make decisions, and then clean up records in our previous ILS, Innovative’s Millennium, before sending them to OCLC.

Literature Review

While there is extensive information in the professional literature regarding how to choose an ILS and how to decide whether or not to move to a cloud-based system, there is little information about the steps needed to clean up catalog records in order to prepare for the actual migration process. Dula, Jacobsen, Ferguson, and Ross (2012) recommend thinking “of migration as spring-cleaning: it’s an opportunity to take stock, clear out the old, and prepare for what’s next.” They “used whiteboards to review and discuss issues that required staff action” and “made decisions on how to handle call number and volume entry in WMS;” however, catalog record cleanup pre-migration was not discussed in detail.

Similarly, Dula and Ye (2013) stated that “[a] few key decisions helped to streamline the process.”  They “elected not to migrate historical circulation data or acquisitions data” and were well aware that they “could end up spending a lot of time trying to perfect the migration of a large amount of imperfect data” that the library no longer needed.  They planned on keeping reports of historical data to avoid this problem. Hartman (2013) mentioned a number of questions and concerns about migrating to WMS, including whether or not to migrate historical data or to “start with a clean slate.” They decided that they “preferred the simpler two-tiered format of the OCLC records” to their previous three-tiered hierarchy, but found some challenges, including the fact that multi-volume sets did not appear in the system as expected. The cataloger chose to view this as “an opportunity to clean up the records” and methodically modify records prior to migration.  Hartman (2013) also discussed that the “missing” status listed in their previous ILS system did not exist in WMS and that they had to decide how or if they should migrate these records.

While the questions and concerns that these authors mentioned helped us focus on changes to make in the catalog prior to migration, we found no literature that discussed the actual process of cleaning up the records.  From the research, it was obvious that a number of decisions would have to be made in the current ILS before the migration would be possible.

Process

In order to make those decisions, the ILS Migration Committee met every other week to discuss what had been learned in the OCLC cohort sessions as well as any questions and concerns.  It was important for catalogers to understand why certain cataloging decisions had been made over the years to determine how items should be cataloged in the new system.  Our library’s cataloging manual and procedure documentation were read, and questions were asked of committee members who had historical institutional knowledge. Topics included copy numbers, shelving locations, and local subject headings.  Notes and historical purchasing information were closely examined and their importance questioned.  Material formats and statuses were also examined before determining what should be changed to meet the new system’s specifications.

Copy Numbers

OCLC recommended taking a close look at copy numbers.  A few years ago, a major weed of the media and book collections was conducted.  Unfortunately, when items were withdrawn, the copy numbers were not updated in the system.  In some cases, copies 4 and 5 were kept while copies 1-3 were withdrawn and deleted from the system.  In the new system, it would appear that the library had five copies of a title when it really owned two.  We decided that the actual copy number of an item wasn’t important to our library users because we could rely on the barcode; however, it was important to determine the number of copies so that WMS could accurately identify when multiple copies of an item existed.

In order to make these corrections, a list was run in Millennium for items with copies greater than 1 and then item records were examined to discover how many copies existed in the catalog.  Corrections were then made as needed.  This was a bigger job than anticipated, but it was a necessary step to avoid post-migration cleanup of the copy numbers in order to prevent errors in WMS.
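
For a library scripting a similar audit instead of reviewing a Millennium list by hand, a minimal sketch of the check might look like the following, assuming item data has been exported to a CSV file; the file name and column names here are hypothetical.

import csv
from collections import defaultdict

# Collect the copy numbers attached to each bibliographic record.
copies_by_bib = defaultdict(list)

with open("millennium_items.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        copies_by_bib[row["bib_number"]].append(int(row["copy_number"]))

# Flag records where the highest copy number is larger than the number of
# items actually attached -- e.g. copies 4 and 5 remain after 1-3 were withdrawn.
for bib, copies in sorted(copies_by_bib.items()):
    if max(copies) > len(copies):
        print(f"{bib}: {len(copies)} item(s), copy numbers {sorted(copies)}")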

Shelving Locations

One of the first things we learned in the OCLC cohort sessions was that many of the statuses that we used in Millennium did not exist in WMS.  Some examples were:

MISSING

STOLEN

BILLED

CATALOGING

REPAIR

ON SEARCH

Because these statuses were no longer an option, we decided to create shelving locations that would reflect these statuses in WMS.  Some of these shelving locations aren’t necessarily physical locations in the library, but rather designations for staff to know where the item can be found. For example, items with a previous status of “repair” in Millennium now have a shelving location of “repair” in WMS. This alerts staff that the item is not available for checkout and is in repair in our processing room. We decided to delete items that had statuses of “stolen” and “missing” prior to migration to better reflect the holdings of our library.
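
The status decisions themselves boil down to a simple mapping, sketched below; the status codes and location names are illustrative stand-ins, not HSLIC's exact values.

# Millennium status codes on the left, migration decisions on the right.
STATUS_TO_LOCATION = {
    "REPAIR": "Repair",
    "CATALOGING": "Cataloging",
    "BILLED": "Billed",
    "ON SEARCH": "On Search",
}
DELETE_STATUSES = {"MISSING", "STOLEN"}

def migration_action(status: str) -> str:
    """Return what to do with an item carrying the given Millennium status."""
    status = status.strip().upper()
    if status in DELETE_STATUSES:
        return "delete before migration"
    if status in STATUS_TO_LOCATION:
        return "migrate with shelving location '" + STATUS_TO_LOCATION[status] + "'"
    return "migrate unchanged"

print(migration_action("missing"))  # delete before migration
print(migration_action("repair"))   # migrate with shelving location 'Repair'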

We also decided to delete a number of shelving locations as they were no longer being used or no longer needed. For example, some locations were merged and others were renamed to better reflect and clarify where the physical shelving locations were in the library as well as the type of material the locations held.

Local Bibliographic Data and Subject Headings

WMS uses OCLC’s WorldCat master records for its bibliographic records.  This means that WMS libraries all use the same records and must record information that is specific to their own library in a separate section called Local Bibliographic Data (LBD).  After much discussion, we decided to keep the following fields: 590, 600, 610, 651, 655, 690, and 691.   We felt that keeping these fields would create a better record and provide multiple access points for our users.
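
As a rough illustration of that decision, the sketch below uses the pymarc library to report which of the retained local fields appear in each exported bibliographic record. The file name is hypothetical, and the actual OCLC local bibliographic data load has formatting requirements not shown here.

from pymarc import MARCReader

# Fields we chose to keep as Local Bibliographic Data.
LBD_FIELDS = ["590", "600", "610", "651", "655", "690", "691"]

with open("millennium_bibs.mrc", "rb") as fh:
    for record in MARCReader(fh):
        local_fields = record.get_fields(*LBD_FIELDS)
        if local_fields:
            title_fields = record.get_fields("245")
            print(title_fields[0].value() if title_fields else "(no title)")
            for field in local_fields:
                print("   ", field)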

A number of records for Special Collections had local topical terms in the 690 field and local geographic names in the 691 and 651 fields.  For the most part, master records did not exist for these records as they were created locally for HSLIC’s use.  When these bibliographic records were sent to OCLC for the migration, the WorldCat master record was automatically created by OCLC as part of the migration process.  It was important that these subject headings were migrated as part of the project, so that they were included with the record and not lost as an access point. We also decided that the local genre information in the 655 field was important to retain as it provided an access point on a local collection level.  For example, we wanted to make sure that “New Mexico Southwest Collection” was not lost to our researchers who are familiar with that particular collection.  Generally, a genre heading contained in the 655 field would be considered part of the WorldCat master record that other libraries could use.  Because our local information would not be useful to other libraries, we decided to transfer this information to a 590 local note so that it would only be visible to our library users.

Notes

Decisions had to be made regarding local notes that were specific to our institution, such as general notes in the 500 field and textual holdings notes in the 850 field.   We requested that Innovative make the information in the 945 field visible to our catalogers.  This is the field that contains all of the local data, including item information, and it is instrumental in the migration process.

500 General Notes

During the migration process, libraries have the option to load local bibliographic data to supplement the OCLC master records.  This means that when OCLC receives the library’s bibliographic records, as part of an automatic process the records are compared with OCLC’s master records according to a translation table submitted by the library.

The 500 field was closely examined to ensure that information wasn’t duplicated or deleted.   OCLC master records usually contain a 500 note field, a general note that would be relevant to any library that holds the item. For example, some records contain “Includes index” in the 500 note field. Because this field already exists within the master record and is relevant for anyone holding the item, we wanted to keep the information in the master record.  However, we had a number of notes in this field that were relevant only to our library, and we could not simply keep those notes in this field.  If we had migrated the 500 field, it would have resulted in two note fields containing the same information, because the migrated note would “supplement” the master record.  Because of this, we chose not to migrate information in the 500 field in order to prevent duplicate information.  Instead, a list was created in Millennium, mainly of Special Collections records that were created locally and not previously loaded into WorldCat.  Catalogers then examined the information in the 500 field of these records to determine whether it was local or general, and changed it manually one record at a time.  If the information in this field was considered local and only important to HSLIC, it was moved to a 590 field, so that it would be visible to our users in Discovery and staff in WMS, but not to any other libraries who might want to use the record.
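
A sketch of that kind of move, using pymarc, appears below. The keyword test is a hypothetical stand-in for the manual, record-by-record review the catalogers actually performed, and the file names are invented; retagging the field in place is one way to turn a local 500 note into a 590.

from pymarc import MARCReader, MARCWriter

# Hypothetical keywords standing in for the manual review of each note.
LOCAL_HINTS = ("hslic", "special collections", "gift of", "donor")

with open("special_collections.mrc", "rb") as infile, \
        open("special_collections_fixed.mrc", "wb") as outfile:
    writer = MARCWriter(outfile)
    for record in MARCReader(infile):
        for field in record.get_fields("500"):
            note = field.value().lower()
            # Institution-specific notes become 590 local notes; general notes
            # such as "Includes index" stay in the 500 field of the master record.
            if any(hint in note for hint in LOCAL_HINTS):
                field.tag = "590"
        writer.write(record)
    writer.close()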

Local Holding Records

WMS’s local holding record (LHR) incorporates information from Millennium’s item record with the holding information from the bibliographic record. It includes information like the call number, chronology and enumeration, location, and price.  The LHR in WMS was created using the information found in the 945 field and was included in the extracted bibliographic records we sent to OCLC.  For the most part, migrating this information was simple except for a few unique cases for our library.

850 Holding Institution Field

The 850 holding institution field is part of the bibliographic record and was labeled in our instance of Millennium as “HSLIC Owns”.  This field was used to list coverage ranges or the dates and issues held by our library for journals, special collections material, and continuing resources. This information is usually cataloged in the 863 field within an item or local holdings record; however, HSLIC did not use this in Millennium.  WMS reserves the 850 field for OCLC institution symbols with holdings on a particular title, which meant that we could not continue to use the 850 field as we had previously.  Because WMS coverage dates are generated from the enumeration listed in the LHR, we explored the possibility of migrating the 850 field from the bibliographic record to the 863 field in the local holding record. Unfortunately, it was not possible to do a global update to cross from bibliographic record to an item record within Millennium during the migration process.

There were two options for creating coverage statements in the migration process: (1) allow the statements to be newly generated in WMS through the holdings statement generating tool, or (2) move the current coverage statements to a 590 note. Because there were so many notes that needed to be moved to the 590 field, a decision was made to delete the 850 holding institution fields from almost all of our records and use the automated summaries generated in WMS. This left all serial records in Millennium without coverage dates during the migration project; however, we believed it would make the migration process to WMS easier.

Special Collection records did not include item-level date and enumeration in the item records and were instead cataloged at a box or series level.  This eliminated the possibility of using WMS automated summaries. Because of this, coverage statements were moved to a 590 public note for all special collections records.  This way the information was retained in the system, while still creating an opportunity to change the formatting at a later date if needed.

After the migration, it was discovered that the system generated coverage dates were not as complete or as easy to read in WMS as they had been in Millennium. It is an ongoing project to clean up and keep these summaries current in the new system.  Below is a screenshot of how the coverage dates appeared on the staff side of Millennium:

This is how the coverage dates appear in WMS:

In hindsight, we should have migrated the 850 field to a 590 field to keep the information as local bibliographic data in addition to using the WMS automated summary statement.  The coverage dates would then have appeared in a public note, which would have given our staff and users an additional place to look for the coverage dates.  It would also have given technical services staff a point of comparison when cleaning up the records post-migration.

Info/Historical Records

In Millennium, a local practice was developed to keep notes about subscriptions as an item record under the bibliographic record.  In WMS, these could not be migrated as items because they were not real items that could be checked out, but rather purchasing notes that were only important to staff.  Because of this, it was important that these notes not be visible to the public.  These notes were a constant topic of discussion among the implementation team members and with the OCLC cohort leaders.

One idea was to migrate them from an item to a bibliographic field by attaching the note as an 850 holdings institution field.  Unfortunately, just as it was not possible to do a global update to cross from bibliographic record to item record, it was also not possible to cross from item record to bibliographic record.  OCLC tried to help with this, but could not find a solution for crossing between record types.  Even if this had been possible, the above-mentioned issues with the 850 field would have been encountered and the information would have had to be moved to a 590 field to retain it.

Because this seemed complicated, a list was created of all of the info/historical records in Millennium and then exported to Excel to create a backup file containing these notes.  Soon after this was completed, OCLC developers found a way to translate the information from the 850 field to the 852 non-public subfield x note in WMS as part of the migration. Historical purchasing information is now in a note that is only visible to staff in WMS.

Continuing Resources

We have found continuing resources to be challenging in WMS.  Previously, we had used OCLC’s Connexion to create and manage bibliographic records and used material types that the system supplied.  While “continuing resource” is a material type in Connexion, it is not a material type in WMS.  Because of this, an available material type in the new system was chosen and then records were changed in Millennium to match the new system.  To do this, another list was created in Millennium of items with “continuation” listed as the material type.  The list was then examined and a determination was made as to whether or not the materials were actually still purchased as a continuation.  Most of the titles were no longer purchased in this way, so the migration presented an opportunity to make these corrections in the system.

Not every item listed as a “continuation” in Millennium was a serial item.  In some cases the titles were part of a monographic series.  Decisions then had to be made whether to use a serial record or a monograph record for items that had previously been considered continuing resources.  For items that had only an ISBN, we chose the monograph record and for those with an ISSN, we chose the serial record; however, many items had both an ISBN and an ISSN.  The decision was more difficult in these instances and continues to be difficult for these items because the format chosen affects how patrons can find the item in Discovery.  This is addressed in more detail below.

Analytic Records

At the beginning of the migration process, OCLC inquired about specific fields and data elements in our records to identify potential errors in the migration process which could be addressed before migrating. One question was whether the data contained linked records.   At first, we had no idea what this even meant, so we answered “no” on our initial migration questionnaire.  A few short weeks before the scheduled migration date, the linked records were discovered in the form of series analytic records. A series analytic record is basically a record that is cataloged as an overarching monographic series title and then linked to the individual titles within that series.  This means that the item record is linked to the overarching bibliographic record for the series as well as to the bibliographic record for the individual title, which links the two bibliographic records together.  Unknown to those working on the migration project, previous catalogers had an ongoing project to unlink all of these analytic records when a monographic series subscription was no longer active.  Notes were found on how to unlink the records, but there were no notes on which titles were affected or where the previous catalogers had left off in the project.  Unfortunately, we had no way to identify linked records in Millennium.

We unlinked as many of the records as possible before the migration, but finally had to send the data to OCLC knowing that many linked records still remained. These records migrated as two separate instances of the same barcode, which created two LHRs in WMS, subsequently causing duplicate barcodes in WMS.  After the migration, OCLC provided a number of reports including a duplicate barcode report, so that these duplicate instances could be found. To correct these records, the item was pulled and examined to determine if the serial or the monograph record best represented it.   The local holdings record was corrected for the title and the LHR from the unchosen bibliographic record was deleted.
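
A short script can group such a report for review. The sketch below assumes the duplicate barcode report has been exported to CSV; the file name and column names are hypothetical.

import csv
from collections import defaultdict

records_by_barcode = defaultdict(list)

with open("duplicate_barcode_report.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        records_by_barcode[row["barcode"]].append(row)

# Each barcode attached to more than one bibliographic record needs review:
# pull the piece, decide whether the serial or the monograph record fits it
# best, then delete the LHR on the record that was not chosen.
for barcode, rows in records_by_barcode.items():
    if len(rows) > 1:
        print(f"Barcode {barcode} appears on {len(rows)} records:")
        for row in rows:
            print(f"  OCLC #{row['oclc_number']}: {row['title']}")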

In Millennium, the choice between representing an item with a serial or monograph record had few implications for users. However, in WMS, choosing a serial record could allow for article level holdings to be returned in Discovery, while choosing a monograph record would not. Conversely, choosing a serial record for an item which looks like a monograph might make the item more difficult to find if users narrow their search to “book.”   Because of this, careful review of items and material types was necessary to help create the best user experience.

For example, “The Handbook of Nonprescription Drugs” looks like a book with a hard cover to most library users and even staff. In Discovery,  if the format is limited to “journal,” the title is the first search result:

If the search is limited to the format “book,” the title is not found on the first page of the search results.

Serials

As was mentioned previously, OCLC relies on the 945 field to view all item information.  For the most part, serials records contained the 850 HSLIC Owns field that was discussed earlier. The 945 subfield a was used to list the following distinctions: Current Print Subscription, Current Print and Electronic Subscription, and Electronic Subscription.  Because the 945 subfield a also contained the volume dates, we chose to move this information to a 590 local note field.

Once those notes were moved, we found that enumeration and chronology were entered in various subfields within the 945 field.  The date was usually in subfield a, volume notes were found in subfield d, and the volume number was in subfield e.  The example below is taken from an extraction in Millennium and shows the enumeration and chronology for volume 53 of the journal “Diabetes,” published in 2004. The first line shows an example of a note that this volume is a supplement, while the second line shows a more typical entry with volume number and coverage.

945 |c|e53|a2004|dSupplements|

945 |c|e53|a2004:July-2004:Dec|

The enumeration and chronology were constructed from these subfields where possible; however, if this information was repeated in a different subfield, it had to be cleaned up post-migration.
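
Where the subfields were entered consistently, pulling them apart can be scripted. The sketch below parses the pipe-delimited 945 syntax shown in the two examples above; it is illustrative only, and records where data was repeated across subfields still needed manual review.

import re

def parse_945(line: str) -> dict:
    """Map 945 subfield codes to values, e.g. {'e': '53', 'a': '2004'}."""
    data = line.split(None, 1)[1] if line.startswith("945") else line
    subfields = {}
    for code, value in re.findall(r"\|(.)([^|]*)", data):
        if value:
            subfields[code] = value
    return subfields

for example in ("945 |c|e53|a2004|dSupplements|",
                "945 |c|e53|a2004:July-2004:Dec|"):
    sf = parse_945(example)
    # Subfield e held the volume number, a the chronology, d any volume note.
    print("v." + sf.get("e", "?"), "(" + sf.get("a", "?") + ")", sf.get("d", ""))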

Electronic Resources

We decided not to migrate electronic resources cataloged in Millennium to WMS.  Electronic resources are managed within Collection Manager, which is WMS’ electronic resource manager.   It was specified in the translation table that any record with a location of electronic resource not be migrated to the new system.  Unfortunately, many of the electronic resources records unintentionally migrated.  They may have been attached to a print record or perhaps did not have the location set as electronic resource.  Holdings had to be removed from these records post-migration.

Before migration, we decided to delete records for freely available e-books from Millennium.   Most of these resources were provided to the public via government websites hosted by the Centers for Disease Control and Prevention (CDC) and could easily be accessed through other means of searching.  These resources could be added to Collection Manager post-migration if deemed important.

Similarly, electronic records were not migrated directly from Serial Solutions, our previous electronic resource manager. Instead, electronic resources were manually added to Collection Manager for a cleaner migration.  All electronic resources are shared with University Libraries (UL), the main campus library, so close collaboration with UL was necessary in order to share and track these resources.  While all HSLIC resources were shared with UL and all UL resources shared with us, we decided to select only the resources that were relevant to the health sciences in Collection Manager.  This created a more health sciences focused electronic resources collection, so that titles relevant to these subjects are displayed at the top of the search.

Suppressed Records

One of OCLC’s slogans is “because what is known must be shared,” so it makes sense that WMS does not have the capability to suppress records. If an item has our holdings on it and has an LHR, then it is viewable to the public in Discovery.  For the most part this concept worked for us.  There were two record types in Millennium where this idea presented challenges: suppressed items and equipment records.

Suppressed Items

At the time of migration, there were around 1,200 books that had been removed from the general collection and stored offsite for possible future addition to Special Collections.   These records were suppressed in Millennium, so that only staff could see them in the backend. Adding these items back into the collection was considered, so that the records would not be lost, but it was finally decided that this would be far too time consuming in the middle of the migration and that many of the titles would probably be deleted later on.

Instead, another list was created in Millennium containing items in offsite storage with a status of “suppressed”.  An Excel spreadsheet was then created that contained the titles, OCLC numbers, and even the call numbers of all of the formerly suppressed titles, allowing for easy reference to the items in storage.  We instructed OCLC not to migrate any records with a status of suppressed.

Equipment Records

Similarly, there were a number of equipment records that were only viewable and useful to staff at the circulation desk.  These records were for laptops, iPads, a variety of cables and adaptors, even some highlighters and keys.  These items all had barcodes and could be checked out, but patrons had to know that they existed in order to ask for them.  While this never seemed to be a problem for users, and it did seem strange to create bibliographic records for equipment, it was decided to create brief records and migrate them anyway in the hope of promoting use.

Now users have the ability to see if a laptop is available for checkout before even asking.  While these records are a bit unorthodox from a traditional cataloging perspective, creating them ultimately added to the service the library was already providing, in addition to providing a way to circulate the equipment using WMS.

Conclusion

Although there were a number of steps, a number of surprises, and a number of decisions that had to be made, the pre-migration cleanup process was definitely worth the work.  Many errors were discovered post-migration, but without doing the initial clean up, there would have been even more problems.

At HSLIC, we have one full time cataloger/ILS manager and one full time electronic resources/serials librarian.  It took nearly 6 months to clean up catalog records before migrating to WMS. Starting the cleanup process earlier would have saved us a lot of work and resulted in cleaner records to migrate.

We should have started looking for the linked series analytic records immediately.  This would have given us more time to identify the records, unlink them, and decide which record best represented each item before sending the records to OCLC.  It would have prevented post-migration cleanup of duplicate barcodes and spared circulation staff confusion when trying to check these items out to users.

Five out of eight members of HSLIC’s ILS migration committee had worked at HSLIC less than a year before we began the migration process. This provided a balance of historical institutional knowledge and new perspectives.  It helped us look at the catalog with fresh eyes and allowed us to ask “why” whenever the answer was, “that is the way we have always done things.” If “why” couldn’t be answered or no longer seemed relevant, we considered making a change.

The catalog should reflect what is on the shelf and what is accessible electronically.  The online catalog is the window to the library itself and should accurately represent what the library holds. Because of electronic access to ebooks and ejournals, some of our users won’t ever step into the physical library, which makes the accuracy of the online catalog or discovery layer even more important. Even if your library isn’t moving to a new ILS, it is important for catalogers and technical services staff to ask, “What is in the library’s catalog?” and then ask “Why?”  As we discovered at HSLIC, keeping notes and shelving locations just because “that is what had always been done” in some cases was no longer compatible with the new system and in other cases was no longer efficient or comprehensible. Sometimes change is exactly what is needed to keep the catalog relevant to library users.

Acknowledgements

Thank you to the peer reviewers, Violet Fox and Annie Pho, for helping me focus and clarify my ideas and experiences in this article.  You both made the peer review process an interesting and enjoyable experience.  Thank you to Sofia Leung, publishing editor, for guiding me through the process.  I would also like to thank all of the members on the HSLIC ILS Migration Committee who made the migration possible.  I would especially like to thank Victoria Rodrigues for her hard work on cleaning up the serial records and adding our electronic resources to the new system.

Works Cited

Dula, M., Jacobsen, L., Ferguson, T., and Ross, R. (2012). Implementing a new cloud computing library management service. Computers in Libraries, 32(1), 6-40.

Dula, M., and Ye, G. (2013). Case study: Pepperdine University Libraries’ migration to OCLC’s WorldShare. Journal of Web Librarianship, 6(2), 125–132. doi: 10.1080/19322909.2012.677296

Hartman, R. (2013). Life in the cloud: A WorldShare Management Services case study. Journal of Web Librarianship, 6(3), 176–185. doi: 10.1080/19322909.2012.702612

OCLC. (2015). Retrieved January 14, 2016, from https://www.oclc.org/en-US/share/home.html

 

 

Open Knowledge Foundation: Why the Open Knowledge International should join ICANN

planet code4lib - Fri, 2016-08-19 11:19

First, let me quickly tell you what kind of an organisation ICANN (Internet Corporation for Assigned Names and Numbers) is, for I feel that despite its burgeoning global importance, most people are still not at all familiar with it.

In a nutshell, ICANN governs the domain names and addresses of the internet. Today, it does so with a multi-stakeholder model that involves all the interested parties in deciding on the rules and protocols needed to keep the internet free and safe for its users. These interest groups include the technical community, registries and registrars of internet domains, civil society, commerce, and governments.

If you want to know more about the fascinating history of ICANN, you’re in luck, for just a few days ago the Washington Post took an excellent look at that history, in an informative and entertaining form.

So, why do I want Open Knowledge International to join ICANN? I find these reasons sufficient:

  1. First and foremost, the timing couldn’t be better. ICANN is right now in the process of updating and reviewing its internal bylaws, a process called Workstream 2, or WS2. The bylaws control the decision-making process within the quasi-private oversight group that is ICANN. I probably don’t need to elaborate on the speed at which the Internet is growing to this audience, but I’ll do it anyway: apparently, we created more data in 2013-2014 than in all the previous years put together, and it seems we’re still very much in the accelerating growth phase of the Internet.
  2. As it is, access to the data that the governance of the Internet creates is almost non-existent. I think there could be a treasure trove of information to be used for the improvement of the global Internet community, as well as for scientific research. ICANN holds the keys to a central point of communications like no other entity in the world. Likewise, the culture of transparency could be massively improved, and that is the name of an actual subgroup within the WS2 process, which was kicked off at ICANN56 in Helsinki this June. My suggestion is for OKI to join ICANN’s Non-Commercial Stakeholder Group and contribute to formulating the transparency bylaws with other members of the NCSG.
  3. The size and breadth of OKI make us a valuable member to ICANN. Its conferences, held three times a year, take place all over the world, rotating in turns to different continents. This would give a global organisation like ours the chance to participate live almost every time with minimal expense. The conferences themselves are free, with food and drinks provided.
  4. The networking possibilities are simply too impressive to ignore. If we can contribute to the work of ICANN, I am sure we can grow our own network of member countries and individual participants as well.
  5. As a large organisation (over 500 members) we would get two organisational votes instead of one.

WS2 sub-issues on transparency, which is the area I feel we should focus on:

  1. Increased Transparency at ICANN
  2. Reform of Document Information Disclosure Policy (DIDP)
  3. Board deliberations
  4. Culture of transparency at ICANN
  5. Discussions with governments and lobbying
  6. Improvements to ICANN’s “whistle-blower” policy

These are the other headings discussed in WS2, some of them less intuitive than others, but I won’t go into more detail on them now:

1.   Create a Framework of Interpretation for ICANN’s New Commitment to Respect Human Rights
2.  Influence of ICANN’s jurisdiction on operational policies and accountability mechanisms
3.  Staff Accountability
4.  SO / AC Accountability
5.  Reform of Ombudsman’s Office
6.  “Diversity” at ICANN
7.  Reviewing the Cooperative Engagement Process (CEP), 1st step to filing an Independent Review
– Panel matter
8.  Guidelines for ICANN Board “standard of conduct”
– RE: removal of board members

I am convinced of the mutual benefits of being a contributing member of ICANN, and I hope I’ve managed to pass my enthusiasm on to you. Please do not hesitate to ask for elaboration on specifics, and I promise to, at the very least, look for the answer or point you in the right direction, to the best of my ability.

Sincerely yours,

Raoul Plommer
Board member, OKFI

David Rosenthal: The 120K BTC Heist

planet code4lib - Thu, 2016-08-18 15:00
Based on my experience of P2P systems in the LOCKSS Program, I've been writing skeptically about Bitcoin and the application of blockchain technology to other applications for nearly three years. In that time there have been a number of major incidents warning that skepticism is essential, including:
Despite these warnings, enthusiasm for the future of blockchain technology is still rampant. Below the fold, the latest hype and some recent responses from less credulous sources.

Last Friday Nathaniel Popper at the New York Times wrote in Envisioning Bitcoin’s Technology at the Heart of Global Finance:
A new report from the World Economic Forum predicts that the underlying technology introduced by the virtual currency Bitcoin will come to occupy a central place in the global financial system.

A report released Friday morning by the forum, a convening organization for the global elite, is one of the strongest endorsements yet for a new technology — the blockchain — that has become the talk of the financial industry, despite the shadowy origins of Bitcoin.

Apparently:
The 130-page report from the forum is the product of a year of research and five gatherings of executives from several major institutions, including JPMorgan Chase, Visa, MasterCard and BlackRock.

The report estimates that 80 percent of banks around the world could start distributed ledger projects by next year. Large central banks are also studying how the blockchain will alter the way money moves around the globe.

What could possibly go wrong? The idea that institutions like these would take insane risks, crash the world economy, and blackmail governments into bailing them out is ridiculous. But Popper notes:
But few real-world uses of the blockchain have come to fruition, other than Bitcoin itself. That has led to some questions about whether the blockchain is the proverbial solution looking for a problem, rather than an innovation that will be used widely.

Existing virtual currencies have continued to struggle with security problems. One of the largest Bitcoin exchanges, Bitfinex, recently lost more than $60 million worth of Bitcoin in a hacking — the latest of several such incidents.

The World Economic Forum report suggests that it will take some time for such problems to be worked out. In addition to the technology issues, the report says that the industry will have to work with governments to create standard rules and laws to govern transactions.

So that's OK then. They aren't going to rush into deploying new technology without understanding all the implications, or at least making sure that they aren't left holding the bag when something does go wrong.

Does anyone remember the last time the banks tried to replace old, shopworn record-keeping technology with spiffy new computerized systems? It was a company called MERS, a shell company owned by the banks. Replacing the paper system for recording mortgages with an electronic system saved the banks billions, led to rampant fraud by the banks, cost innocent people their homes, and enabled the derivatives that crashed the economy in 2008.

Let’s look at some of the implications of global distributed ledger systems. The day before the nearly $70M theft from Bitfinex, Izabella Kaminska at the Financial Times, whose work in the area has been consistently and appropriately skeptical, posted Bitcoin’s panopticon problem, pointing out that because the average Bitcoin user needs intermediary services such as Coinbase:
the average customer needs to give up as much if not more personal data, often by more dubious means (online upload mechanisms, email or the post) to much less experienced organisations. Once in the system, meanwhile, customer transactions can be linked on a much broader and more publicly intrusive level than anything in the standing banking system. Moreover, there are no associated par value or liquidity guarantees for the customer if and when things go wrong.

So cryptocurrencies are mostly:
a giant privacy bait and switch. There are simply no money transmitter institutions of Coinbase’s size that can afford to operate in defiance of the law of the land, unless they care to be based in the sort of jurisdictions most other banking institutions won’t care to do business with.

Meanwhile, if the cost in the banking system is indeed mostly related to the cost of credit checking, due diligence and policing non-compliance, it’s worth considering how exactly the likes of Coinbase improve on these processes vis-a-vis traditional institutions?

Once the Bitfinex theft hit the headlines, Kaminska was off and running, first with Time to reevaluate blockchain hype:
The mark-to-market value of the stolen coins is roughly $70m, but again who can really tell their true worth. Bitcoin is an asset class where the liquidation of 119,756 (approximately 0.8 per cent of the total bitcoin circulation) can move the market more than 20 per cent, suggesting a certain fantastical element to the valuation.

and:
We probably won’t know what really happened at Bitfinex for a while. But what is clear is that thus far the technology which was supposed to be revolutionising finance and making it more secure (oddly, by skirting regulations) is looking awfully like the old technology which ran the system into the ground.

Either way it’s unlikely to be good news for Bitfinex. If the failing was down to a problem with the multi-signature mechanism, then the affair potentially stands to undermine many of the blockchain systems and companies which have come to rely on the system for security. On the same basis it also stands to undermine the side-chain and escrow-based solutions bitcoin developers are working on to overcome the bitcoin network’s scaling constraint.

If the failing was down to an internal security breach or poor risk management on the other hand (say due to naivety or inexperience), this creates an argument for additional capital provisioning, regulatory scrutiny and macroprudential oversight — taking away much of the cost advantage associated with the network.

Two days later Kaminska was back with Day three post Bitfinex hack: Bitcoin bailouts, liabilities and hard forks, among other interesting observations returning to the panopticon issue:
The first relates to the ongoing legal recourse rights of Bitfinex victims. Even though they may have lost their right to pursue Bitfinex for compensation, they are still going to be entitled to track the funds across the blockchain to seek recourse from whomsoever receives the bitcoins in their accounts. That’s good news for victims, but mostly likely very bad news for bitcoin’s fungible state and thus its status as a medium of exchange.

Just one successful claim by a victim who tracks his funds to an identifiable third party, and the precedent is set. Any exchanges dealing with bitcoin in a legitimate capacity would from then on be inclined to do much stronger due diligence on whether the bitcoins being deposited in their system were connected to ill-gotten gains. This in turn would open the door to the black-listing of funds that can not prove they were originated honestly via legitimate earnings.

This got Tim Worstall at Forbes going with Bitcoin's Latest Economic Problem - Market Ouvert Or Squatters' Rights.
Of course, people should not steal things. And yet for a currency to work it has to be possible to take the currency at its face value. Thus it may well be that the bank robber paid you for his beer with stolen money but you got it fair and square and thus the bank doesn’t get it back as and when they find out. Another way to put this is that the crime dies with the criminal. And yet the blockchain upends all of that. Because every transaction which any one bitcoin has been involved in is traceable.

Three days later, Kaminska returned with Bitfinex and a 36 percent charge from the school of life:
Publicly, the Hong Kong-based bitcoin exchange Bitfinex has lumped its users with a 36 per cent haircut on all balances to cover the $70m hack which it experienced last week.

The haircut applies to all customers irrespective of whether they were holding bitcoin balances or dollar balances or other altcoin balances. ...

Privately and anecdotally, however, customers are reporting some variance with regard to the way the haircut is being imposed. Some US customers, for example, who only had dollar balances are reporting they’ve been able to get all their money back.

Another three days and Kaminska posted How I learned to stop blockchain obsessing and love the Barry Manilow, a sustained analogy between the hype cycle of music and fashion, and the hype cycle of blockchain technology which argues:
there is some commentary emerging to suggest we are indeed in a phase transition and what’s cool isn’t the blockchain anymore but rather the defiant acknowledgement that the old operating system — for all its flaws — is built on the right regulatory, legal and trusted foundations after all and just needs some basic tweaking.

and goes on to point to a number of very interesting such commentaries, starting with Credit Suisse:
The buzz surrounding blockchain is comparable to that surrounding the internet in the late 1980s – some go as far as to suggest that blockchain has the potential to reimagine and reinvent key institutions – for example, the corporation. We are less sanguine, and note eight key challenges that have the potential to limit the utility, and therefore reduce adoption, of blockchain systems.

Every one of the eight is apposite, especially:
8. A forked road, the lesson of the DAO attack… The DAO attack exposed flaws in smart contracts on Ethereum which should act as a reminder that nascent code is susceptible to bugs before it is truly tire-kicked, and even then, complete surety is never guaranteed. The ‘hard fork’ undertaken by the Ethereum community also shows that blockchains are only immutable when consensus wants them to be.

So in practice blockchains are decentralized (not), anonymous (not and not), immutable (not), secure (not), fast (not) and cheap (not). What's (not) to like?

Equinox Software: Evergreen 2006

planet code4lib - Thu, 2016-08-18 14:48

I have trouble pinpointing the exact moment when Evergreen was conceived, even though I was one of the principal agents.  Let me start by covering a few acronyms: PINES is the name of a statewide inter-lending library consortium project in Georgia, one of the largest of its kind.  GPLS is short for the Georgia Public Library Service, a state agency that administers PINES and many other library projects.  In 2002, I was hired by GPLS for PINES as a contractor to develop an add-on reporting system to address the limitations of their then library automation system.  With the success of that project, I was hired on as a full-time employee in 2003 to maintain and further develop that system, as well as create other needed software solutions to prop up their existing system.  It was during that time that we lobbied for and eventually received the go-ahead to develop Evergreen, and in 2004 we hired Mike Rylander and Bill Erickson to help develop the software.  In 2006, PINES went live on Evergreen and the rest is history.

But there’s a part of the story that doesn’t get told often enough, and that’s the influence of the free/libre and open-source software movements, with the likes of Richard M. Stallman, who wrote the GNU General Public License and started the Free Software Foundation; Eric S. Raymond, who wrote The Cathedral and the Bazaar; Larry Wall, the creator of the Perl programming language; and Linus Torvalds, the creator of the Linux kernel and, more recently, the Git version control system.  I was (and am) a huge open source and free software advocate; I cut my teeth on Linux during college and followed the battles between open source and proprietary software very closely.  There were huge forces arrayed against us: Microsoft was abusing their monopoly power to prevent OEMs from installing Linux while calling open source a cancer, the SCO lawsuit was happening in 2003, and most governments and governmental agencies were very skeptical of open source software, GPLS included.  It’s funny how some of those same battles were later mirrored in our efforts.

There are some very philosophical reasons why open source software meshes well with libraries (software developers even collect code into “libraries”), and while we did use those as arguments in our appeal for Evergreen, it was really the pragmatic aspects that made it all possible.

1) The building blocks in the software world were a lot bigger than they used to be (and this trend continues to be true), and increasingly open source themselves.  We didn’t have to constantly reinvent the wheel, and could use software like GCC, Linux, Apache, PostgreSQL, Ejabberd, Mozilla, CVS (and then Subversion, Bazaar, and Git), MARC::Record, Simple2ZOOM, etc.  We could be informed by and share code with other open source efforts like Koha.

2) Each of these open source applications had (and continues to have) development communities and ecosystems that we could participate in (including the wider open source community as a whole).  We could (and do) leverage volunteers and domain experts who just want to help out.  Or pay people if we needed to (we did that too, for example, with some enhancements to PostgreSQL).

And all of this reuse (not starting from scratch) is what actually allowed us to start from scratch with more modern design paradigms. For example, we made real use (not mere buzzword compliance) of relational databases and a service-oriented architecture.

Most importantly, we were already using these things prior to Evergreen in our daily work, and demonstrated what just a single developer could do with modern open source tools and software.  Now, with almost a dozen active committers and many more contributors of domain expertise, documentation, testing, etc.–in other words, a community of our own–we’re pretty much unstoppable.  Happy Birthday Evergreen!

— Jason Etheridge, Community and Migration Manager

 

DuraSpace News: DSpace, Fedora and VIVO Hold Project Steering Group Elections

planet code4lib - Thu, 2016-08-18 00:00

Austin, TX  Elections were held in July to choose members who will serve on DuraSpace community-supported open source project steering groups for three year terms. The DSpace, Fedora and VIVO Projects are pleased to announce the following results and extend a warm welcome to new Steering Group members.

DSPACE

DSpace Project leaders look forward to the efforts of the following new Steering Group members in helping to guide the project.

Karen Coyle: Classification, RDF, and promiscuous vowels

planet code4lib - Wed, 2016-08-17 23:23
"[He] provided (i) a classified schedule of things and concepts, (ii) a series of conjunctions or 'particles' whereby the elementary terms can be combined to express composite subjects, (iii) various kinds of notational devices ... as a means of displaying relationships between terms." [1]

"By reducing the complexity of natural language to manageable sets of nouns and verbs that are well-defined and unambiguous, sentence-like statements can be interpreted...."[2]

The "he" in the first quote is John Wilkins, and the date is 1668.[3] His goal was to create a scientifically correct language that would have one and only one term for each thing, and then would have a set of particles that would connect those things to make meaning. His one and only one term is essentially an identifier. His particles are linking elements.

The second quote is from a publication about OCLC's linked data experiments, and is about linked data, or RDF. The goals are so obviously similar that the parallel can't be overlooked. Of course there are huge differences, not the least of which is the technology of the time.*

What I find particularly interesting about Wilkins is that he did not distinguish between classification of knowledge and language. In fact, he was creating a language, a vocabulary, that would be used to talk about the world as classified knowledge. Here we are at a distance of about 350 years, and the language basis of both his work and the abstract grammar of the semantic web share a lot of their DNA. They are probably proof of some Chomskian theory of our brain and language, but I'm really not up to reading Chomsky at this point.

The other interesting note is how similar Wilkins is to Melvil Dewey. He wanted to reform language and spelling. Here's the section where he decries alphabetization because the consonants and vowels are "promiscuously huddled together without distinction." This was a fault of language that I have not yet found noted in Dewey's work. Could he have missed some imperfection?!





*Also, Wilkins was a Bishop in the Anglican church, and so his description of the history of language is based literally on the Bible, which makes for some odd conclusions.



[1] Schulte-Albert, Hans G. Classificatory Thinking from Kinner to Wilkins: Classification and Thesaurus Construction, 1645-1668. Quoting from Vickery, B. C. "The Significance of John Wilkins in the History of Bibliographical Classification." Libri 2 (1953): 326-43.
[2] Godby, Carol J., Shenghui Wang, and Jeffrey Mixter. Library Linked Data in the Cloud: OCLC's Experiments with New Models of Resource Description. 2015.
[3] Wilkins, John. Essay Towards a Real Character, and a Philosophical Language. S.l.: Printed for Sa. Gellibrand, and for John Martyn, 1668.

SearchHub: Learning to Rank in Solr

planet code4lib - Wed, 2016-08-17 20:08

As we countdown to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Bloomberg’s Michael Nilsson and Diego Ceccarelli’s talk, “Learning to Rank in Solr”.

In information retrieval systems, learning to rank is used to re-rank the top X retrieved documents using trained machine learning models. The hope is that sophisticated models can make more nuanced ranking decisions than a standard Solr query. Bloomberg has integrated a reranking component directly into Solr, enabling others to easily build their own learning to rank systems and access the rich matching features readily available in Solr. In this session, Michael and Diego review the internals of how Solr and Lucene score documents and present Bloomberg’s additions to Solr that enable feature engineering, feature extraction, and reranking.
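As a concrete illustration, here is a minimal, hedged Python sketch of what a rerank request can look like. It follows the conventions of the learning-to-rank (LTR) rerank query parser that grew out of this work and later shipped as a Solr module; the core name, model name, and external feature value are illustrative only, and the snippet assumes the third-party requests package plus feature and model definitions already uploaded to Solr.

import requests  # third-party HTTP client, assumed installed

SOLR = "http://localhost:8983/solr/techproducts"  # hypothetical core name

params = {
    "q": "ipod",          # the user query, scored by the normal Solr ranking first
    "rows": 10,
    "wt": "json",
    # Re-rank the top 100 hits with a previously uploaded LTR model.
    # "myModel" and the efi.* value below are illustrative names only.
    "rq": "{!ltr model=myModel reRankDocs=100 efi.user_query=ipod}",
    "fl": "id,score,[features]",  # optionally return each document's extracted feature vector
}

resp = requests.get(SOLR + "/select", params=params)
resp.raise_for_status()
for doc in resp.json()["response"]["docs"]:
    print(doc["id"], doc["score"])

The point of the two-stage setup is that the cheap, standard query selects a candidate set, and the trained model only has to score the top reRankDocs candidates.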

Michael Nilsson is a software engineer working at Bloomberg LP, and has been a part of the company’s Search and Discoverability team for four years. He’s used Solr to build the company’s terminal cross domain search application, searching though millions of people, companies, securities, articles, and more.

Diego Ceccarelli is a software engineer at Bloomberg LP, working in the News R&D team. His work focuses on improving search relevance in the news search functions. Before joining Bloomberg, Diego was a researcher in Information Retrieval at the National Council of Research in Italy, whilst completing his Ph.D. in the same field at the University of Pisa. His experience with Lucene and Solr dates back to his work on the Europeana project in 2010, and he has enjoyed diving into these technologies ever since.

Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bloomberg LP from Lucidworks

Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Learning to Rank in Solr appeared first on Lucidworks.com.

LITA: Jobs in Information Technology: August 17, 2016

planet code4lib - Wed, 2016-08-17 19:35

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Oak Park Public Library, Web Services Specialist, Oak Park, IL

Denver Public Library, Digital Project Manager, Denver, CO

Denver Public Library, Technology Access and Training Manager, Denver, CO

The Folger Shakespeare Library, Digital Strategist, Washington, DC

Darien Library, Senior Technology Assistant, Norwalk, CT

Champlain College, Technology Librarian, Burlington, VT

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Equinox Software: 4,208 Days, 22,653 Commits, 1,883,352 Lines of Code

planet code4lib - Wed, 2016-08-17 16:09

Ten years ago, something remarkable happened. A brand new open source ILS went live in over 200 libraries across the state of Georgia. While the migration happened in a matter of days, it was the culmination of two years’ worth of work by a small team.

Today, that same open source ILS is utilized by more than 1,500 libraries all over the United States, Canada, and across the world. The small team has grown into an active community, supporting and improving the software each year. That software is Evergreen and Equinox is beyond proud to be the leading provider of support and development services for it.

As we approach Evergreen’s tenth birthday–Labor Day weekend–we’ll look at each year of Evergreen’s life. Equinox Team Members will be posting a blog post each day leading up to Labor Day, beginning on Thursday, August 18 (That’s tomorrow! Yay!).  Join us as we take a closer look at the software that has brought so many people together.

[Image: Evergreen's "baby picture," handwritten by Mike Rylander]

Open Knowledge Foundation: An interview with Rufus Pollock – Why I am Excited about MyData 2016 in Finland

planet code4lib - Wed, 2016-08-17 10:50

A few weeks ago I sat down for a virtual interview with Molly Schwartz from Open Knowledge Finland about my thoughts on open data and mydata and why I am so excited about the MyData 2016 conference. The three-day conference is taking place from August 31 to September 2 in Helsinki and is being organized by Open Knowledge Finland in partnership with Aalto University and Fing.

You can register for MyData 2016 here. The discounted price for members of the Open Knowledge Network is 220 EUR for the three-day conference. Ask for the discount code (jogi@okf.fi) before registering at the MyData 2016 Holvi store. You can also still apply to be a volunteer for the conference.

This event shares many of the same organizers as the 2012 Open Knowledge Festival in Helsinki so you can expect the same spirit of fun, creativity and quality that made that such an incredible experience.

Transcript:

Molly Schwartz: So hi everybody, this is Molly Schwartz here, one of the team members helping to put on the MyData conference in Helsinki from August 31 to September 2. And I’m sitting here with one of our plenary speakers, Dr. Rufus Pollock, who is one of the founders and the president of Open Knowledge, a worldwide network working to provide access to more open and broad datasets. So we’re very excited to have him here. So, Rufus, something that not a lot of people know is that MyData is actually an initiative that was born out of the Finnish chapter of Open Knowledge (OKFFI), how do you feel about things that were kind of started by your idea springing up of their own accord?

Rufus Pollock: Well, it’s inspirational and obviously really satisfying. And not just in a personal way: it’s just wonderful to see how things flourish. Open Knowledge Finland have been an incredibly active chapter. I first went to Finland in I think it was 2010, and I was inspired then. Finland is just a place where you have a feeling you are in a wise society. The way they approach things, they’re very engaged but they have that non-attachment, a rigor of looking at things, and also trying things out. Somehow there’s not a lot of ego, people are very curious to learn and also to try things out, and I think deep down are incredibly innovative.

And I think this event is really in that tradition. I think the area of personal data and MyData is a huge issue, and one with a lot of connections to open data, even if it’s distinct. So I think it’s a very natural thing for a chapter from Open Knowledge to be taking on and looking at because it’s central to how we look at the information society, the knowledge society, of the 21st Century.

MS: Definitely. I totally agree. I like that you brought up that this concept of personal data is somewhat distinct, but it’s inevitably tied to this concept of opening data. Oftentimes opening datasets, you’re dealing with personal datasets as well. So, what are the kind of things you’re planning to speak about, loosely, at the conference, and what do you look forward to hearing from other people who will be at the MyData?

RP: Yes, that’s a great question. So, what am I looking to talk about and engage with and what am I looking forward to hearing about? Well, maybe I’ll take the second first.

What I am looking forward to

I think one of the reasons I’m really excited to participate and come is that, even though I obviously know a lot about data and open data, this area of personal data is one where I am not nearly as much of an expert. So I’m really curious to hear about it, and especially about things like: what is the policy landscape? What do people think are the big things that are coming up? I’m really interested to see what the business sector is looking at.

There’s been quite a lot of discussion about how one could innovate in this space in a way that is both a wider opportunity for people to use data, personal data in usable ways, maybe in health care, maybe in giving people credit, I mean in all kinds of areas. But how do you do that in a way that respects and preserves people’s privacy, and so on. So, I think that’s really interesting as well, and again I’m not so up on that space. I’m looking forward to meeting and hearing from some of the people in that area.

And similarly on the policy, on the business side, and also on the civil society side and on the research side. I’ve heard about things like differential privacy and some of the breakthroughs we’ve had over the last few years about how one might be able to allow people like researchers to analyse information, like genetics, like healthcare, without getting direct access to the individual data and creating privacy issues. And there’s clearly a lot of value one could have from researchers being able to look at, for example, genomic data from individuals across a bunch of them. But it’s also crucial to be able to preserve privacy there, and what are the kind of things going on there? And the research side I think would also touch on the policy side of matters as well.

What I would like to contribute

That brings me to what, for my part, I would like to contribute. I think Open Knowledge and we generally are on a journey at a policy level. We’ve got this incredible information revolution, this digital revolution, which means we’re living in a world of bits, and we need to make sure that world works for everyone. And that it works, in the sense that, rather than delivering more inequality – which it could easily do – and more exploitation, it gives us fairness and empowerment, it brings freedom rather than manipulation or oppression. And I think openness is just key there.

And this vision of openness isn’t limited to just government – we can do it for all public datasets. By public datasets I don’t just mean government datasets, I mean datasets that you can legitimately share with anyone.

Now private, personal data you can’t legitimately give to anyone, or share with anyone — or you shouldn’t be able to!

So I think an interesting question is how those two things go together — the public datasets and the private, personal data. How they go together both in overall policy, but also in the mind of the public and of citizens and so on — how are they linked?

And this issue of how we manage information in the 21st century doesn’t just stop at some line where it’s like, oh, it’s public data, you know, and therefore we can look at it this way. Those of us working to make a world of open information have to look at private data too.

At Open Knowledge we have always had this metaphor of a coin. And one side of this coin is public data, e.g. government data. Now that you can open to everyone, everyone is empowered to have access. Now the flip side of that coin is YOUR data, your personal data. And your data is yours: you should get to choose how it’s shared and how it’s used.

Now while Open Knowledge is generally focused on, if you like, the public side, and will continue to be, overall I think across the network this issue of personal data is just huge, and the two are closely linked. And I think the same principles can be applied. Just as with open data we say that people have the freedom to access, share, and use government or whatever other data is being opened, so with YOUR data, YOU should be empowered to access, share, and use it as YOU see fit. And right now that is just not the case. And that’s what leads to the abuses we get concerned about, but it’s also what stops some of the innovation and stops people from being empowered and able to understand and take action on their own lives — what might you learn from having your last five years of, say, shopping receipts or mobile phone location data?

Ultimately what happens to public data and what happens to personal data, they’re interconnected, both in people’s minds and, in a sense, they don’t just care about one thing or another, they care about, how is digital information going to work, how’s my data going to be managed, how’s the world’s data going to be managed.

I also think MyData raises some of the most relevant issues for ordinary people. For example, just recently I had to check if someone paid me and it was just a nightmare. I had to scroll back through endless screens on my online banking account to find ways to download different files to piece it all together. Why didn’t they let me download all the data in a convenient way, rather than having to dig forever and then only get the last three months? They’ve got that data on their servers, why can’t I have it? And, you know, maybe not only do I want it, but maybe there’s some part I would share anonymized; it could be aggregated and we could discover patterns that might be important — just as one example, we might be able to estimate inflation better. Or take energy use: I would happily share my house’s energy use data, even if it does tell you when I go to bed, if that lets us discover how to make things environmentally better.

The word I think at the heart of it is empowerment. We at Open Knowledge want to see people empowered in the information age to understand, to make choices, to hold power to account, and one of the fundamental things is you being empowered with the information about you that companies or governments have, and we think you should be given access to that, and you should be choosing who else has access to it, and not the company, and not the government, per se.

MS: Yes. And that’s exactly why MyData came out of Open Knowledge, as you mentioned earlier: the idea of, why can’t these principles of Open Knowledge, about the datasets we want to receive, also apply to our own data, which we would like opened back to us in the same way?

RP: Absolutely correct Molly, I mean just yes, absolutely.

MS: And that’s why it’s also so interesting, so many people have been talking about this kind of inherent tension between openness and privacy, and kind of, changing how we’re thinking about that, and seeing it actually as the same principles just being applied to individual people.

RP: Exactly, back in 2013 I wrote a post with my co-CEO Laura James about this idea and even used the term MyData. There’s an underlying unity that you’re pointing out that actually is a deep principle.

Remember openness isn’t an end in itself, right, it’s a means to an end – like money! And the purpose of having information and opening it up is to empower human beings to do something, to understand, to innovate, to learn, to discover, to earn a living, whatever it is. And that idea of empowerment, fundamentally, is common in both threads, both to MyData and personal data and access to that, and the access to public data for everyone. So I think you are totally right.

MS: Yes. So, thank you so much Rufus for joining us today, we are so looking forward to having you at the conference. You mention that you’ve been to Finland before. How long ago was that?

RP: I was there in 2012 for Open Knowledge Festival which was amazing. And then in 2010. Finland is an amazing place, Helsinki is an amazing place, and it will be an amazing event, so I really invite you to come along to the conference.

MS: I second that, and it’s many of the same people who are involved in organizing the Open Knowledge Festival who are involved in organizing MyData, so we can expect much of the same.

RP: A brilliant programme, high quality people. An incredible kind of combination of kind of joy and reliability, so you’ll have an amazing time, come join us.

MS: Yes. Ok, so thank you Rufus, and we will see you in August!

RP: See you in August!

LibUX: A practical security guide for web developers

planet code4lib - Wed, 2016-08-17 03:50

Lisa Haitz in slack pointed out this gem of a repo, intended to be a practical security guide for web developers.

Security issues happen for two reasons –

1. Developers who have just started and cannot really tell a difference between using MD5 or bcrypt.
2. Developers who know stuff but forget/ignore them.

Our detailed explanations should help the first type while we hope our checklist helps the second one create more secure systems. This is by no means a comprehensive guide, it just covers stuff based on the most common issues we have discovered in the past.

Their security checklist demonstrates, I think, just how involved web security can be: first, it's no wonder so many mega-sites have been hacked in the last year, and second, libraries probably aren't ready for anticipatory design.
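To make the first point in the quoted list concrete, here is a minimal Python sketch, not taken from the guide itself, contrasting a fast digest like MD5 with a purpose-built password hash like bcrypt. It assumes the third-party bcrypt package is installed; the example password is obviously made up.

import hashlib
import bcrypt  # third-party package, assumed installed (pip install bcrypt)

password = b"correct horse battery staple"

# MD5 is a fast, unsalted digest: identical passwords always hash to the same
# value, and attackers can test enormous numbers of guesses per second offline.
weak = hashlib.md5(password).hexdigest()

# bcrypt generates a random salt and is deliberately slow, which makes
# offline guessing far more expensive and hides identical passwords.
strong = bcrypt.hashpw(password, bcrypt.gensalt())

# Verification re-derives the hash using the salt embedded in the stored value.
assert bcrypt.checkpw(password, strong)

print("MD5:   ", weak)
print("bcrypt:", strong.decode())

The difference is exactly the checklist's point: a general-purpose digest is the wrong tool for storing passwords, while bcrypt (or an equivalent such as scrypt or Argon2) is designed for it.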

A practical security guide for developers

LibUX: UI Content Resources

planet code4lib - Wed, 2016-08-17 03:38

In our slack, Penelope Singer shared this mega-list of articles, books, and examples for making good content, establishing a content strategy, and the like.

You’ll find this list useful if:
* You’re a writer working directly with an interface
* You’re a designer that is often tasked with writing user interface copy
* You’re a content strategist working on a product and want to learn more about the words used in an interface
* You’re a copywriter and want to learn more about user experience

UI Content Resources

LibUX: Circulating Ideas #99: Cecily Walker

planet code4lib - Wed, 2016-08-17 02:22

We — Amanda and Michael — were honored to guest-host an episode of Circulating Ideas, interviewing Cecily Walker about design thinking and project management. Steve Thomas was nice enough to let us re-broadcast our interview.

Cecily Walker is a librarian at Vancouver Public Library, where she focuses on user experience, community digital projects, digital collections, and the intersection of social justice, technology, and public librarianship. It was her frustration with the way that software was designed to meet the needs of highly technical users rather than the general public that led her to user experience, but it was her love of information, intellectual freedom, and commitment to social justice that led her back to librarianship. Cecily can be found on Twitter (@skeskali) where she frequently holds court on any number of subjects, but especially lipstick.

Show notes

This Vancouver
“UX, consideration, and a CMMI-based model” [Coral Sheldon-Hess]
“Mindspring’s 14 Deadly Sins”
Cecily on Twitter

Subscribe

If you like, you can download the MP3 or subscribe to LibUX on Stitcher, iTunes, YouTube, SoundCloud, Google Play Music, or just plug our feed straight into your podcatcher of choice.

William Denton: Bad-Ass Librarians

planet code4lib - Wed, 2016-08-17 01:14

I saw this at the bookstore today and bought it immediately: The Bad-Ass Librarians of Timbuktu and Their Race to Save the World’s Most Precious Manuscripts, by Joshua Hammer.

I’ll try to do a review when I’ve read it, but in the meantime, anything about bad-ass librarians needs to be shared with all the other bad-ass librarians out there.

Karen Coyle: The case of the disappearing classification

planet code4lib - Wed, 2016-08-17 00:14
I'm starting some research into classification in libraries (now that I have more time due to having had to drop social media from my life; see previous post). The main question I want to answer is: why did research into classification drop off at around the same time that library catalogs computerized? This timing may just be coincidence, but I'm suspecting that it isn't.

 I was in library school in 1971-72, and then again in 1978-80. In 1971 I took the required classes of cataloging (two semesters), reference, children's librarianship, library management, and an elective in law librarianship. Those are the ones I remember. There was not a computer in the place, nor do I remember anyone mentioning them in relation to libraries. I was interested in classification theory, but not much was happening around that topic in the US. In England, the Classification Research Group was very active, with folks like D.J. Foskett and Brian Vickery as mainstays of thinking about faceted classification. I wrote my first published article about a faceted classification being used by a UN agency.[1]

 In 1978 the same school had only a few traditional classes. I'd been out of the country, so the change to me was abrupt. Students learned to catalog on OCLC. (We had typed cards!) I was hired as a TA to teach people how to use DIALOG for article searching, even though I'd never seen it used, myself. (I'd already had a job as a computer programmer, so it was easy to learn the rules of DIALOG searching.) The school was now teaching "information science". Here's what that consisted of at the time: research into term frequency of texts; recall and precision; relevance ranking; database development.

I didn't appreciate it at the time, but the school had some of the bigger names in these areas, including William Cooper and M. E. "Bill" Maron. (I only just today discovered why he called himself Bill - the M. E., which is what he wrote under in academia, stands for "Melvin Earl". Even for a nerdy computer scientist, that was too much nerdity.) 1978 was still the early days of computing, at least unless you were on a military project grant or worked for the US Census Bureau. The University of California, Berkeley, did not have visible Internet access. Access to OCLC or DIALOG was via dial-up to their proprietary networks. (I hope someone has or will write that early history of the OCLC network. For its time it must have been amazing.)

The idea that one could search actual text was exciting, but how best to do it was (and still is, to a large extent) unclear. There was one paper, although I so far have not found it, that was about relevance ranking, and was filled with mathematical formulas for calculating relevance. I was determined to understand it, and so I spent countless hours on that paper with a cheat sheet beside me so I could remember what uppercase italic R was as opposed to lower case script r. I made it through the paper to the very end, where the last paragraph read (as I recall): "Of course, there is no way to obtain a value for R[elevance], so this theory cannot be tested." I could have strangled the author (one of my profs) with my bare hands.

Looking at the articles, now, though, I see that they were prescient; or at least that they were working on the beginnings of things we now take for granted. One statement by Maron especially strikes me today:
A second objective of this paper is to show that about is, in fact, not the central concept in a theory of document retrieval. A document retrieval system ought to provide a ranked output (in response to a search query) not according to the degree that they are about the topic sought by the inquiring patron, but rather according to the probability that they will satisfy that person's information need. This paper shows how aboutness is related to probability of satisfaction.[2]

This is from 1977, and it essentially describes the basic theory behind Google ranking. It doesn't anticipate hyperlinking, of course, but it does anticipate that "about" is not the main measure of what will satisfy a searcher's need. Classification, in the traditional sense, is the quintessence of about. Is this the crux of the issue? As yet, I don't know. More to come.

[1] Coyle, Karen (1975). "A Faceted Classification for Occupational Safety and Health". Special Libraries. 66 (5-6): 256–9.
[2] Maron, M. E. (1977). "On Indexing, Retrieval, and the Meaning of About". Journal of the American Society for Information Science, January 1977, pp. 38-43.

DuraSpace News: VIVO Updates for August 14–VIVO 1.9 Cheat Sheet, Conference, Survey

planet code4lib - Wed, 2016-08-17 00:00

From Mike Conlon, VIVO project director

Karen Coyle: This is what sexism looks like: Wikipedia

planet code4lib - Tue, 2016-08-16 21:00
We've all heard that there are gender problems on Wikipedia. Honestly there are a lot of problems on Wikipedia, but gender disparity is one of them. Like other areas of online life, on Wikipedia there are thinly disguised and not-so thinly disguised attacks on women. I am at the moment the victim of one of those attacks.

Wikipedia runs on a set of policies that are used to help make decisions about content and to govern behavior. In a sense, this is already a very male approach, as we know from studies of boys and girls at play: boys like a sturdy set of rules, and will spend considerable time arguing whether or not rules are being followed; girls begin play without establishing a set of rules, develop agreed rules as play goes on if needed, but spend little time on discussion of rules.

If you've been on Wikipedia and have read discussions around various articles, you know that there are members of the community that like to "wiki-lawyer" - who will spend hours arguing whether something is or is not within the rules. Clearly, coming to a conclusion is not what matters; this is blunt force, nearly content-less arguing. It eats up hours of time, and yet that is how some folks choose to spend their time. There are huge screaming fights that have virtually no real meaning; it's a kind of fantasy sport.

Wiki-lawyering is frequently used to harass. It is currently going on to an amazing extent in harassment of me, although since I'm not participating, it's even emptier. The trigger was that I sent back for editing two articles about men that two wikipedians thought should not have been sent back. Given that I have reviewed nearly 4000 articles, sending back 75% of those for more work, these two are obviously not significant. What is significant, of course, is that a woman has looked at an article about a man and said: "this doesn't cut it". And that is the crux of the matter, although the only person to see that is me. It is all being discussed as violations of policy, although there are none. But sexism, as with racism, homophobia, transphobia, etc., is almost never direct (and even when it is, it is often denied). Regulating what bathrooms a person can use, or denying same-sex couples marriage, is a kind of lawyering around what the real problem is. The haters don't say "I hate transsexuals"; they just try to make them as miserable as possible by denying them basic comforts. In the past, and even the present, no one said "I don't want to hire women because I consider them inferior"; they said "I can't hire women because they just get pregnant and leave."

Because wiki-lawyering is allowed, this kind of harassment is allowed. It's now gone on for two days and the level of discourse has gotten increasingly hysterical. Other than one statement in which I said I would not engage because the issue is not policy but sexism (which no one can engage with), it has all been between the wiki-lawyers, who are working up to a lynch mob. This is gamer-gate, in action, on Wikipedia.

It's too bad. I had hopes for Wikipedia. I may have to leave. But that means one less woman editing, and we were starting to gain some ground.

The best read on this topic, mainly about how hard it is to get information that is threatening to men (aka about women) into Wikipedia: WP:THREATENING2MEN: Misogynist Infopolitics and the Hegemony of the Asshole Consensus on English Wikipedia
I have left Wikipedia, and I also had to delete my Twitter account because they started up there. I may not be very responsive on other media for a while. Thanks to everyone who has shown support, but if by any chance you come across a kinder, gentler planet available for habitation, do let me know. This one's desirability quotient is dropping fast.

SearchHub: Lessons from Sharding Solr at Etsy

planet code4lib - Tue, 2016-08-16 20:57

As we countdown to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Gregg Donovan’s session, “Lessons from Sharding Solr at Etsy”.

Gregg covers the following lessons learned at Etsy while sharding Solr:

* How to enable SolrJ to handle distributed search fanout and merge
* How to instrument Solr for distributed tracing so that distributed searches may be better understood, analyzed, and debugged
* Strategies for managing latency in distributed search, including tolerating partial results and issuing backup requests in the presence of lagging shards (the backup-request idea is sketched below)
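The backup-request tactic from the last point can be illustrated independently of Solr. The sketch below is a generic Python illustration of the pattern, not Etsy's actual code: send the query to a primary replica, and if it has not answered within a short hedging delay, fire the same query at a second replica and take whichever response arrives first. The replica URLs, hedge delay, and query parameters are all hypothetical, and the third-party requests package is assumed to be installed.

import concurrent.futures as cf
import requests  # third-party HTTP client, assumed installed

# Hypothetical replica URLs for a single shard; not Etsy's real topology.
REPLICAS = ["http://shard1-a:8983/solr/core/select",
            "http://shard1-b:8983/solr/core/select"]
HEDGE_DELAY = 0.075  # seconds to wait before firing the backup request

def query(url, params):
    return requests.get(url, params=params, timeout=2.0).json()

def hedged_query(params):
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        primary = pool.submit(query, REPLICAS[0], params)
        done, _ = cf.wait([primary], timeout=HEDGE_DELAY)
        if not done:
            # The primary is lagging: issue a backup request to another
            # replica and take whichever response arrives first.
            backup = pool.submit(query, REPLICAS[1], params)
            done, _ = cf.wait([primary, backup], return_when=cf.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        pool.shutdown(wait=False)  # don't block on the slower replica

# Example call (parameters are illustrative):
# results = hedged_query({"q": "*:*", "rows": 10, "wt": "json"})

In a real deployment the hedge delay would typically be tuned to a high percentile of normal shard latency so that backup requests stay rare and the extra load is modest.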

Gregg Donovan is a Senior Software Engineer at Etsy.com in Brooklyn, NY, working on the Solr and Lucene infrastructure that powers more than 120 million queries per day. Gregg spoke at Lucene/Solr Revolution 2015 in Austin, Lucene Revolution 2011 in San Francisco, Lucene Revolution 2013 in San Diego, and previously worked with Solr and Lucene at TheLadders.com.

Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy from Lucidworks

Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

The post Lessons from Sharding Solr at Etsy appeared first on Lucidworks.com.
