FOSS4Lib Recent Releases: Evergreen - 2.10.4

planet code4lib - Thu, 2016-05-26 00:37

Last updated May 25, 2016. Created by gmcharlt on May 25, 2016.

Package: Evergreen
Release Date: Wednesday, May 25, 2016

William Denton: CC-BY

planet code4lib - Thu, 2016-05-26 00:08

I’ve changed the license on my content to CC-BY: Creative Commons Attribution 4.0.

UPDATE 25 May 2016: The feed metadata is now updated too. “We copy documents based on metadata.”

Evergreen ILS: Evergreen 2.10.4 released

planet code4lib - Wed, 2016-05-25 21:55

We are pleased to announce the release of Evergreen 2.10.4, a bug fix release.

Evergreen 2.10.4 fixes the following issues:

  • Fixes the responsive view of the My Account Items Out screen so that Title and
    Author are now in separate columns.
  • Fixes an incorrect link for the MVF field definition and adds a new link to
    BRE in fm_IDL.xml.
  • Fixes a bug where the MARC stream authority cleanup deleted a bib
    record instead of an authority record from the authority queue.
  • Fixes a bug where Action Triggers could select an inactive event
    definition when running.
  • Eliminates the output of a null byte after a spool file is processed
    in the MARC stream importer.
  • Fixes an issue where previously-checked-out items did not display in
    metarecord searches when the Tag Circulated Items Library Setting is
    enabled.
  • Fixes an issue in the 0951 upgrade script where the script was not
    inserting the version into config.upgrade_log because the line to do so
    was still commented out.

Please visit the downloads page to retrieve the server software and staff clients.

John Mark Ockerbloom: Sharing journals freely online

planet code4lib - Wed, 2016-05-25 19:19

What are all the research journals that anyone can read freely online?  The answer is harder to determine than you might think.  Most research library catalogs can be searched for online serials (here’s what Penn Libraries gives access to, for instance), but it’s often hard for unaffiliated readers to determine what they can get access to, and what will throw up a paywall when they try following a link.

Current research

The best-known listing of current free research journals has been the Directory of Open Access Journals (DOAJ), a comprehensive listing of free-to-read research journals in all areas of scholarship. Given the ease with which anyone can throw up a web site and call it a “journal” regardless of its quality or its viability, some have worried that the directory might be a little too comprehensive to be useful.  A couple of years ago, though, DOAJ instituted more stringent criteria for what it accepts, and it recently weeded its listings of journals that did not reapply under its new criteria, or did not meet its requirements.   This week I am pleased to welcome over 8,000 of its journals to the extended-shelves listings of The Online Books Page.  The catalog entries are automatically derived from the data DOAJ provides; I’m also happy to create curated entries with more detailed cataloging on readers’ request.

Historic research

Scholarly journals go back centuries.  Many of these journals (and other periodicals) remain of interest to current scholars, whether they’re interested in the history of science and culture, the state of the natural world prior to recent environmental changes, or analyses and source documents that remain directly relevant to current scholarship.  Many older serials are also included in The Online Books Page’s extended shelves courtesy of HathiTrust, which currently offers over 130,000 serial records with at least some free-to-read content.  Many of these records are not for research journals, of course, and those that are can sometimes be fragmentary or hard to navigate.  I’m also happy to create organized, curated records for journals offered by HathiTrust and others at readers’ request.

It’s important work to organize and publicize these records, because many of these journals that go back a long way don’t make their content freely available in the first place one might look.  Recently I indexed five journals founded over a century ago that are still used enough to be included in Harvard’s 250 most popular works: Isis, The Journal of Comparative Neurology, The Journal of Infectious Diseases, The Journal of Roman Studies, and The Philosophical Review.  All five had public domain content, offered behind paywalls at their official journal site or JSTOR (with fees for access ranging from $10 to $42 per article), that was available for free elsewhere online.  I’d much rather have readers find the free content than be stymied by a paywall.  So I’m compiling free links for these and other journals with public domain runs, whether they can be found at HathiTrust, JSTOR (which does make some early journal content, including from some of these journals, freely available), or other sites.

For many of these journals, the public domain extends as late as the 1960s due to non-renewal of copyright, so I’m also tracking when copyright renewals actually start for these journals.  I’ve done a complete inventory of serials published until 1950 that renewed their own copyrights up to 1977.  Some scholarly journals are in this list, but most are not, and many that are did not renew copyrights for many years beyond 1922.  (For the five journals mentioned above, for instance, the first copyright-renewed issues were published in 1941, 1964, 1959, 1964, and 1964 respectively– 1964 being the first year for which renewals were automatic.)
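To make that rule of thumb concrete, here is a minimal sketch (my own illustration, not part of the inventory itself): issues published before a journal’s first copyright-renewed issue, and before 1964 when renewal became automatic, were presumably never renewed and so are in the public domain; the separate question of renewed contributions is taken up below.

```python
# A rough sketch of the rule of thumb described above (illustrative, not legal advice).
# First-renewed-issue years are the five examples cited in the post; 1964 is the
# first year for which copyright renewals were automatic.

FIRST_RENEWED_ISSUE = {
    "Isis": 1941,
    "The Journal of Comparative Neurology": 1964,
    "The Journal of Infectious Diseases": 1959,
    "The Journal of Roman Studies": 1964,
    "The Philosophical Review": 1964,
}

AUTOMATIC_RENEWAL_START = 1964


def presumably_public_domain(journal: str, issue_year: int) -> bool:
    """True if the issue predates both the journal's first renewed issue and
    automatic renewal (contribution-level renewals are not considered here)."""
    return issue_year < min(FIRST_RENEWED_ISSUE[journal], AUTOMATIC_RENEWAL_START)


print(presumably_public_domain("Isis", 1935))                          # True
print(presumably_public_domain("The Journal of Roman Studies", 1950))  # True
print(presumably_public_domain("Isis", 1945))                          # False
```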

Even so, major projects like HathiTrust and JSTOR have generally stopped opening journal content at 1922, partly out of a concern for the complexity of serial copyright research.  In particular, contributions to serials could have their own copyright renewals separate from renewals for the serials themselves.  Could this keep some unrenewed serials out of the public domain?  To answer this question, I’ve also started surveying information on contribution renewals, and adding information on those renewals to my inventory.  Having recently completed this survey for all 1920s serials, I can report that so far individual contributions to scholarly journals were almost never copyright-renewed on their own.  (Individual short stories, and articles for general-interest popular magazines, often were, but not articles intended for scientific or scholarly audiences.)  I’ll post an update if the situation changes in the 1930s or later. So far, though, it’s looking like, at least for research journals, serial digitization projects can start opening issues past 1922 with little risk.  There are some review requirements, but they’re comparable in complexity to the Copyright Review Management System that HathiTrust has used to successfully open access to hundreds of thousands of post-1922 public domain book volumes.

Recent research

Let’s not forget that a lot more recent research is also available freely online, often from journal publishers themselves.  DOAJ only tracks journals that make their content open access immediately, but there are also many journals that make their content freely readable online a few months or years after initial publication.  This content can then be found in repositories like PubMedCentral (see the journals noted as “Full” in the “participation” column), publishing platforms like Highwire Press (see the journals with entries in the “free back issues” column), or individual publishers’ programs such as Elsevier’s Open Archives.

Why are publishers leaving money on the table by making old but copyrighted content freely available instead of charging for it?  Often it’s because it’s what makes their supporters– scholars and their funders– happy.  NIH, which runs PubMedCentral, already mandates open access to research it funds, and many of the journals that fully participate in PubMedCentral’s free issue program are largely filled with NIH-backed research.  Similarly, I suspect that the high proportion of math journals in Elsevier’s Open Archives selection has something to do with the high proportion of mathematicians in the Cost of Knowledge protest against Elsevier.  When researchers, and their affiliated organizations, make their voices heard, publishers listen.

I’m happy to include listings for  significant free runs of significant research journals on The Online Books Page as well, whether they’re open access from the get-go or after a delay.  I won’t list journals that only make the occasional paid-for article available through a “hybrid” program, or those that only have sporadic “free sample” issues.  But if a journal you value has at least a continuous year’s worth of full-sized, complete issues permanently freely available, please let me know about it and I’ll be glad to check it out.

Sharing journal information

I’m not simply trying to build up my own website, though– I want to spread this information around, so that people can easily find free research journal content wherever they go.  Right now, I have a Dublin Core OAI feed for all curated Online Books Page listings as well as a monthly dump of my raw data file, both CC0-licensed.  But I think I could do more to get free journal information to libraries and other interested parties.  I don’t have MARC records for my listings at the moment, but I suspect that holdings information– what issues of which journals are freely available, and from whom– is more useful for me to provide than bibliographic descriptions of the journals (which can already be obtained from various other sources).  Would a KBART file, published online or made available to initiatives like the Global Open Knowledgebase, be useful?  Or would something else work better to get this free journal information more widely known and used?
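To sketch what such a holdings file might look like, here is a small illustrative example (mine, not an existing Online Books Page export) that writes free-journal holdings as a KBART-style tab-separated file; the column subset and the sample row are invented:

```python
import csv

# A simplified subset of KBART-style columns (a real KBART file has a fuller,
# fixed set of fields defined by the recommended practice).
FIELDS = [
    "publication_title", "online_identifier", "date_first_issue_online",
    "date_last_issue_online", "title_url", "coverage_depth", "notes",
]

# Invented sample holdings: which issues of which journals are freely available,
# and where.
holdings = [
    {
        "publication_title": "Isis",
        "online_identifier": "0021-1753",
        "date_first_issue_online": "1913",
        "date_last_issue_online": "1940",
        "title_url": "https://onlinebooks.library.upenn.edu/",
        "coverage_depth": "fulltext",
        "notes": "public domain run, free via HathiTrust",
    },
]

with open("free_journal_holdings.txt", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=FIELDS, delimiter="\t")
    writer.writeheader()
    writer.writerows(holdings)
```

A file like this could be published alongside the OAI feed and raw data dump, or passed to knowledge bases such as the Global Open Knowledgebase.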

Issues and volumes vs. articles

Of course, many articles are made available online individually as well, as many journal publishers allow.  I don’t have the resources at this point to track articles at an individual level, but there are a growing number of other efforts that do, whether they’re proprietary but comprehensive search platforms like Google Scholar and Web of Science, disciplinary repositories like ArXiV and SSRN, institutional repositories and their aggregators like SHARE and BASE, or outright bootleg sites like Sci-Hub.  We know from them that it’s possible to index and provide access to the scholarly knowledge exchange at a global scale, but doing it accurately, openly, comprehensively, sustainably, and ethically is a bigger challenge.   I think it’s a challenge that the academic community can solve if we make it a priority.  We created the research; let’s also make it easy for the world to access it, learn from it, and put it to work.  Let’s make open access to research articles the norm, not the exception.

And as part of that, if you’d like to help me highlight and share information on free, authorized sources for online journal content, please alert me to relevant journals, make suggestions in the comments here, or get in touch with me offline.


Library of Congress: The Signal: The Radcliffe Workshop on Technology & Archival Processing

planet code4lib - Wed, 2016-05-25 19:18

This is a guest post from Julia Kim, archivist in the American Folklife Center at the Library of Congress.

Professor Matthew Connelly delivering the keynote. Photo by Radcliffe Workshop on Technology and Archival Processing.

The annual meeting of the Radcliffe Technology Workshop (April 4th – April 5th, #radtech16) brought together historians, (digital) humanists and archivists for an intensive discussion of the “digital turn” and its effect on our work. The result was a focused and highly participatory meeting among professionals working across disciplinary lines with regard to our respective methodologies and codes of conduct. The talks and panels served as springboards for rich conversations addressing many of the big picture questions in our fields. Added to this was the use of round-table small group discussions after panel presentations, something I wish were more of a norm at professional events. This post covers only a small portion of the two days.

Matthew Connelly (Columbia University) asked “Will the coming of Big Data mean the end of history as we know it?” The answer was a resounding “yes.” Based on his years as a researcher at the National Archives and Records Administration (NARA), Connelly surveyed the history of government secrets, its inefficiencies, and the minuscule sample rate determining record retention and the resultant losses to the historical record of major world events. Part of his work as a researcher involved making use of these efforts to initiate the largest searchable collection of now de-classified government records with “The Declassification Engine” and the History Lab. In amassing and analyzing the largest data collection of declassified and unredacted records, their work uncovers secrets via systematic omission, for example. (Read more at Wired magazine.)

The next panel, “Connections and Context: A Moderated Conversation about Archival Processing for the Digital Humanities Generation,” was organized around archival processing challenges and included Meredith Evans (Jimmy Carter Presidential Library and Museum), Cristina Pattuelli (Pratt Institute), and Dorothy Waugh (Emory University).

  • Meredith Evans (Jimmy Carter Presidential Library and Museum) of “Documenting Ferguson,” discussed her work “Documenting the Now” and her efforts to push archivists outside of their comfort zone and into the community to collect documentation as events unfolded.
  • Cristina Pattuelli (Pratt Institute) presented on the Linked Jazz linked data pilot project, which pulls together tools into a single platform to create connections with jazz-musician data. The initial data, digitized oral history transcripts, is further enriched and mashed with other types of data sets, like discography information from Carnegie Hall. (Read the overview published on EDUCAUSE.)
  • Dorothy Waugh (Emory University) spoke to the researcher aspect — or more aptly, the lack of researchers — of born-digital collections. (I wrote a related story titled “Researcher Interactions with Born-Digital”.) Her work underlines the need to cultivate not only donors but also the researchers we hope will one day want to investigate time-date stamps and disk images, for example. While few collections are available for research, the lack of researchers using born-digital collections is also a problem. Researchers are unaware of collections and do not, in a sense, know how to approach using these collections. She is in the process of developing a pilot project with undergraduate students to remedy this.
  • Benjamin Moser, the authorized biographer of Susan Sontag, spoke of his own discomfort, at times, with a researcher’s abilities to exploit privileged knowledge in email. To Moser, email increased the responsibilities of both the archive and the researcher to work in a manner that is “tasteful” and underlined the need to define and educate others in what that may mean. (Read his story published in The New Yorker.)

Mary O’Connell Murphy introducing “Collections and Context” panel. Photo by Radcliffe Workshop on Technology and Archival Processing.

There were a number of questions and concerns that we discussed, such as: What course of action is necessary or right when community activists feel discomfort with their submissions? How can we make sure that these collections aren’t misused? How can we protect individuals from legal prosecution? What are our duties to donors, to the law, and to our professions, and how do individuals navigate the conflicts among their competing claims? How can we, across disciplines, develop a way of discussing these issues? If the archives are defined as an associated set of values and practices, how can we address the lack of consensus on how to (re)interpret them, in light of the challenges of digital collections?

Claire Potter (the New School) delivered a keynote entitled “Fibber McGee’s Closet: How Digital Research Transformed the Archive– But Not the History Department,” which underlined these new challenges and the need for history methodologies to shift alongside shifts in archival methodologies. “The Archive, of course, has always represented systems of cognition,” as Potter put it, “but when either the nature of the archive or the way the archive is used changes, we must agree to change with it.” Historians must learn to triage in the face of the increased volume, despite the slow pace at which educational and research models have moved. Potter called for archivists and historians to work together to support our complementary roles in deriving meaning and use from collections. “The long game will be, historians, I hope, will begin to see archives and information technology as an intellectual and scholarly choice.” The Archives can be a teaching space and research space. (Read the text of her full talk.)

“Why Can’t We Stand Archival Practice on Its Head?” included three case studies experimenting with forms of “digitization as processing”: Larisa Miller (Hoover Institution, Stanford University), Jamie Roth and Erica Boudreau (John F. Kennedy Presidential Library and Museum), and Elizabeth Kelly (Loyola University, New Orleans).

  • Larisa Miller (Hoover Institution, Stanford University) reviewed the evolution of optical character recognition (OCR) and its use as a processing substitute. In comparing finding aids to these capabilities, she noted that “any access method will produce some winners and some losers.” Miller underscored the resource decisions that every archive must account for: Is this about finding aids or the best way to provide access? By eliminating archival processing, many more materials are digitized and made available to users. Ultimately, what methods maximize resources to get the most materials out to end users? In addition to functional reasons, Miller was critical of some core processing tasks: “The more arrangement we do, the more we violate original order.” (Read her related article published in The American Archivist.)
  • Jamie Roth and Erica Boudreau (John F. Kennedy Presidential Library and Museum) implemented multiple modes to test against one another: systematic digitization, digitization “on-demand” and simultaneous digitization while processing. Their talks emphasized impediments to digitization for access, such as their need to comply with legal requirements with restricted material and the lack of reliability with OCR. Roth emphasized that poor description still leads to lack of access or “access in name only.” They also cited researchers’ strong preferences for the analog original, even when given the option to use the digitized version.
  • Elizabeth Kelly (Loyola University, New Orleans) also experimented with folder-level metadata in digitizing university photographs. The scanning resulted in significant resource savings but surveyed users found the experimentally scanned collection “difficult to search and browse, but acceptable to some degree.” (Her slides are on Figshare.)

A great point from some audience members was that these types of item-level online displays are not viable for data researchers. Item-level organization seems to be a carryover from the analog world that, once again, serves some users and not others.

“Going Beyond the Click: A Moderated Conversation on the Future of Archival Description” included Jarrett Drake (Princeton), Ann Wooton (PopUp Archive) and Kari Smith (Massachusetts Institute of Technology), but I’ll focus on Drake’s work. Drake, Smith, and Wooton all addressed the major insufficiencies in existing descriptive and access practices in different ways. Smith will publish a blog post with more information on MIT’s Engineering the Future of the Past this Friday, May 27.

  • Jarrett Drake (Princeton) spoke from his experiences at Princeton, as well as with “A People’s Archive for Police Violence in Cleveland.” He delivered an impassioned attack on foundational principles — such as provenance, appraisal and respect des fonds — as not only technically insufficient in a landscape of corporatized ownership in the cloud, university ownership of academic work and collaborative work, but also as unethical carryovers of our colonialist and imperialistic past. With this technological shift, however, he emphasized the greater possibility for change: “First, we occupy a moment in history in which the largest percentage of the world’s population ever possesses the power and potential to author and create documentation about their lived experiences.” (Read the full text of his talk.)

While I haven’t done justice to the talks and the ensuing conversation and debate, the Radcliffe Technology Workshop helped me to expand my own thinking by framing problems to include invested practitioners and theorists outside of the digital preservation sphere. To my knowledge it is also the only event of its kind.

LITA: Jobs in Information Technology: May 25, 2016

planet code4lib - Wed, 2016-05-25 18:36

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Pacific States University (PSU), Librarian, Los Angeles, CA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.


District Dispatch: Presidential campaigns weigh in on education & libraries

planet code4lib - Wed, 2016-05-25 15:22

Representatives from all three major Presidential campaigns are expected to participate in this week’s CEF Presidential Forum to be held May 26 in Washington. ALA will be participating in the half-day forum and encourages members to view and participate online.


ALA members are invited to follow the Forum online as the event will be live streamed starting at 10:00 AM and running through 12:00 PM EST. ALA has submitted library-themed questions for the Presidential representatives, but you can participate in the event by submitting your questions to SubmitQ@cef.org or tweeting your questions on Twitter using #CEFpresForum.

The Committee for Education Funding (CEF) is hosting the 2016 Presidential Forum, which will emphasize education as a critical domestic policy and the need for continuing investments in education. At the forum, the high-level surrogates will discuss in depth the education policy agendas of the remaining candidates. A second panel of education experts from think tanks will discuss the educational landscape that awaits the next administration.  CEF has hosted Presidential Forums during previous elections.

Candy Crowley, award-winning journalist and former Chief Political Correspondent for CNN, will moderate both panels.

The post Presidential campaigns weigh in on education & libraries appeared first on District Dispatch.

David Rosenthal: Randall Munroe on Digital Preservation

planet code4lib - Wed, 2016-05-25 15:00
Randall Munroe succinctly illustrates a point I made at length in my report on emulation:
And here, for comparison, is one of the Internet Archive's captures of the XKCD post. Check the mouse-over text.

Open Knowledge Foundation: Introducing: MyData

planet code4lib - Wed, 2016-05-25 14:01

This post was written by the OK Finland team.

What is MyData?

MyData is both an alternative vision and guiding technical principles for how we, as individuals, can have more control over the data trails we leave behind us in our everyday actions.

The core idea is that we, you and I, should have an easy way to see where data about us goes, specify who can use it, and alter these decisions over time. To do this, we are developing a standardized, open, and mediated approach to personal data management by creating “MyData operators.”

Standardised operator model

A MyData operator account would act like an email account for your different data streams. As with email, different parties can host an operator account, with different sets of functionalities. For example, some MyData operators could also provide personal data storage solutions, while others could perform data analytics or work as an identity provider. The one requirement for a MyData operator is that it lets individuals receive and send data streams according to one interoperable set of standards.

What “MyData” can do?

The “MyData” model does a few things that the current data ecosystem does not.

It will let you re-use your data with a third party – For example, you could take data collected about your purchasing habits from a loyalty card of your favourite grocery store and re-use it in a personal finance application to see how you are spending your money on groceries.

It will let you see and change how you consent to your data use – Currently, different service providers and applications use complicated terms of service where most users just check ‘yes’ or ‘no’ once, without being entirely sure what they agree to.

It will let you change services – With MyData you will be able to take your data from one operator to another if you decide to change services.
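Purely as an illustration (the field names below are invented, not taken from any MyData specification), the capabilities above boil down to consent being an explicit, machine-readable record that the individual can inspect, change, and revoke over time:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Consent:
    """One individual's decision about one data flow, stored by an operator."""
    data_source: str              # e.g. a grocery store loyalty-card programme
    data_user: str                # e.g. a personal finance application
    purpose: str
    granted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    revoked_at: Optional[datetime] = None

    @property
    def active(self) -> bool:
        return self.revoked_at is None

    def revoke(self) -> None:
        self.revoked_at = datetime.now(timezone.utc)


# The individual can see where data about them goes and change that decision later.
consent = Consent("grocery-loyalty-card", "finance-app", "analyse grocery spending")
print(consent.active)   # True
consent.revoke()
print(consent.active)   # False
```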

Make it happen, make it right

The MyData 2016 conference will be held Aug 31st – Sep 2nd at the Helsinki Hall of Culture.

Right now, the technical solutions for managing your data according to the MyData approach exist. There are many initiatives, emerging out of both the public and private sectors around the world, paving the way for human-centered personal data management. We believe strongly in the need to collaborate with other initiatives to develop an infrastructure in a way that works with all the complicated systems at work in the current data landscape. Buy your tickets before May 31st for the early bird discount.

Follow MyData on social media for updates:

Twitter: https://twitter.com/mydata2016
Facebook: https://www.facebook.com/mydata2016/

Chris Beer: Autoscaling AWS Elastic Beanstalk worker tier based on SQS queue length

planet code4lib - Wed, 2016-05-25 00:00
We are deploying a Rails application (for the [Hydra-in-a-Box](https://github.com/projecthydra-labs/hybox) project) to [AWS Elastic Beanstalk](https://aws.amazon.com/elasticbeanstalk/). Elastic Beanstalk offers us easy deployment, monitoring, and simple auto-scaling with a built-in dashboard and management interface. Our application uses several potentially long-running background jobs to characterize, checksum, and create derivatives for uploaded content. Since we're deploying this application within AWS, we're also taking advantage of the [Simple Queue Service](https://aws.amazon.com/sqs/) (SQS), using the [`active-elastic-job`](https://github.com/tawan/active-elastic-job) gem to queue and run `ActiveJob` tasks.

Elastic Beanstalk provides settings for "Web server" and "Worker" tiers. Web servers are provisioned behind a load balancer and handle end-user requests, while Workers automatically handle background tasks (via SQS + active-elastic-job). Elastic Beanstalk provides basic autoscaling based on a variety of metrics collected from the underlying instances (CPU, network, I/O, etc.). While that is sufficient for our "Web server" tier, we'd like to scale our "Worker" tier based on the number of tasks waiting to be run. Currently, though, the ability to auto-scale the worker tier based on the underlying queue depth isn't enabled through the Elastic Beanstalk interface.

However, as Beanstalk merely manages and aggregates other AWS resources, we have access to the underlying resources, including the autoscaling group for our environment. We should be able to attach a custom auto-scaling policy to that auto scaling group to scale based on additional alarms. For example, let's say we want to add additional worker nodes if there are more than 10 tasks waiting for more than 5 minutes (and, to save money and resources, also remove worker nodes when there are no tasks available). To create the new policy, we'll need to:

- find the appropriate auto-scaling group, i.e. the one whose `elasticbeanstalk:environment-id` matches the worker tier environment id;
- find the appropriate SQS queue for the worker tier;
- add auto-scaling policies that add (and remove) instances in that autoscaling group;
- create a new CloudWatch alarm that fires when the SQS queue exceeds our configured depth, triggering the auto-scaling policy to add worker instances;
- and, conversely, create a new CloudWatch alarm that fires when the SQS queue depth hits 0, triggering the auto-scaling action to remove worker instances.

(Screenshots of the resulting alarm and policy configuration are omitted here; the scale-down side is configured similarly.)
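As a rough illustration of those manual steps, the following boto3 sketch shows one way to wire them together (my own example, not from the original post: the environment name, alarm and policy names, and thresholds are assumptions, and the matching scale-in policy and error handling are omitted):

```python
import boto3

eb = boto3.client("elasticbeanstalk")
autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Discover the worker tier's auto-scaling group and SQS queue.
resources = eb.describe_environment_resources(
    EnvironmentName="hybox-workers"          # assumed worker environment name
)["EnvironmentResources"]
asg_name = resources["AutoScalingGroups"][0]["Name"]
queue_name = resources["Queues"][0]["Name"]

# Scale-out policy: add one instance each time the alarm fires.
scale_out = autoscaling.put_scaling_policy(
    AutoScalingGroupName=asg_name,
    PolicyName="worker-scale-out",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
    Cooldown=60,
)

# Alarm: the queue has averaged 10 or more visible messages for 5 minutes.
cloudwatch.put_metric_alarm(
    AlarmName="worker-queue-depth",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": queue_name}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=5,
    Threshold=10,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[scale_out["PolicyARN"]],
)
# A "queue empty" alarm attached to a scale-in policy would be defined the same way.
```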
Even though there are several manual steps, they aren't too difficult (other than discovering the various resources we're trying to orchestrate), and using Elastic Beanstalk is still valuable for the rest of its functionality. But we're in the cloud, and really want to automate everything. With a little CloudFormation trickery, we can even automate creating the worker tier with the appropriate autoscaling policies.

First, knowing that the CloudFormation API allows us to pass in an existing SQS queue for the worker tier, let's create an explicit SQS queue resource for the workers:

```json
"DefaultQueue" : {
  "Type" : "AWS::SQS::Queue"
}
```

And wire it up to the Beanstalk application by setting the `aws:elasticbeanstalk:sqsd:WorkerQueueURL` option (not shown: sending the worker queue to the web server tier):

```json
"WorkersConfigurationTemplate" : {
  "Type" : "AWS::ElasticBeanstalk::ConfigurationTemplate",
  "Properties" : {
    "ApplicationName" : { "Ref" : "AWS::StackName" },
    "OptionSettings" : [
      ...,
      {
        "Namespace": "aws:elasticbeanstalk:sqsd",
        "OptionName": "WorkerQueueURL",
        "Value": { "Ref" : "DefaultQueue" }
      }
    ]
  }
},
"WorkerEnvironment": {
  "Type": "AWS::ElasticBeanstalk::Environment",
  "Properties": {
    "ApplicationName": { "Ref" : "AWS::StackName" },
    "Description": "Worker Environment",
    "EnvironmentName": { "Fn::Join": ["-", [{ "Ref" : "AWS::StackName" }, "workers"]] },
    "TemplateName": { "Ref": "WorkersConfigurationTemplate" },
    "Tier": { "Name": "Worker", "Type": "SQS/HTTP" },
    "SolutionStackName" : "64bit Amazon Linux 2016.03 v2.1.2 running Ruby 2.3 (Puma)",
    ...
  }
}
```

Using our queue we can describe one of the `CloudWatch::Alarm` resources and start describing a scaling policy:

```json
"ScaleOutAlarm" : {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "MetricName": "ApproximateNumberOfMessagesVisible",
    "Namespace": "AWS/SQS",
    "Statistic": "Average",
    "Period": "60",
    "Threshold": "10",
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "Dimensions": [
      { "Name": "QueueName", "Value": { "Fn::GetAtt" : ["DefaultQueue", "QueueName"] } }
    ],
    "EvaluationPeriods": "5",
    "AlarmActions": [{ "Ref" : "ScaleOutPolicy" }]
  }
},
"ScaleOutPolicy" : {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "AutoScalingGroupName": ????,
    "ScalingAdjustment": "1",
    "Cooldown": "60"
  }
},
```

However, to connect the policy to the auto-scaling group, we need to know the name of the autoscaling group. Unfortunately, the autoscaling group is abstracted behind the Beanstalk environment.
To gain access to it, we'll need to create a custom resource backed by a Lambda function to extract the information from the AWS APIs:

```json
"BeanstalkStack": {
  "Type": "Custom::BeanstalkStack",
  "Properties": {
    "ServiceToken": { "Fn::GetAtt" : ["BeanstalkStackOutputs", "Arn"] },
    "EnvironmentName": { "Ref": "WorkerEnvironment" }
  }
},
"BeanstalkStackOutputs": {
  "Type": "AWS::Lambda::Function",
  "Properties": {
    "Code": {
      "ZipFile": { "Fn::Join": ["\n", [
        "var response = require('cfn-response');",
        "exports.handler = function(event, context) {",
        "  console.log('REQUEST RECEIVED:\\n', JSON.stringify(event));",
        "  if (event.RequestType == 'Delete') {",
        "    response.send(event, context, response.SUCCESS);",
        "    return;",
        "  }",
        "  var environmentName = event.ResourceProperties.EnvironmentName;",
        "  var responseData = {};",
        "  if (environmentName) {",
        "    var aws = require('aws-sdk');",
        "    var eb = new aws.ElasticBeanstalk();",
        "    eb.describeEnvironmentResources({EnvironmentName: environmentName}, function(err, data) {",
        "      if (err) {",
        "        responseData = { Error: 'describeEnvironmentResources call failed' };",
        "        console.log(responseData.Error + ':\\n', err);",
        "        response.send(event, context, response.FAILED, responseData);",
        "      } else {",
        "        responseData = { AutoScalingGroupName: data.EnvironmentResources.AutoScalingGroups[0].Name };",
        "        response.send(event, context, response.SUCCESS, responseData);",
        "      }",
        "    });",
        "  } else {",
        "    responseData = {Error: 'Environment name not specified'};",
        "    console.log(responseData.Error);",
        "    response.send(event, context, response.FAILED, responseData);",
        "  }",
        "};"
      ]]}
    },
    "Handler": "index.handler",
    "Runtime": "nodejs",
    "Timeout": "10",
    "Role": { "Fn::GetAtt" : ["LambdaExecutionRole", "Arn"] }
  }
}
```

With the custom resource, we can finally get access to the autoscaling group name and complete the scaling policy:

```json
"ScaleOutPolicy" : {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "AutoScalingGroupName": { "Fn::GetAtt": [ "BeanstalkStack", "AutoScalingGroupName" ] },
    "ScalingAdjustment": "1",
    "Cooldown": "60"
  }
},
```

The complete worker tier is part of our CloudFormation stack: https://github.com/hybox/aws/blob/master/templates/worker.json

DuraSpace News: Luso-Brazilian Digital Library Launched

planet code4lib - Wed, 2016-05-25 00:00

From Tiago Ferreira, Neki IT

 

District Dispatch: Last week in appropriations

planet code4lib - Tue, 2016-05-24 19:41

The Appropriations process in Congress is a year-long cycle with fits and starts, and includes plenty of lobbying, grassroots appeals, lobby days, speeches, hearings and markups, and even creative promotions designed to draw attention to the importance of one program or another. ALA members and the Office of Government Relations continue to play a significant role in this process. Recently, for example, we’ve worked to support funding for major library programs like LSTA and IAL, as well as to address policy issues that arise in Congressional deliberations. Your grassroots voice helps amplify my message in meetings with Congressional staff.

The House and Senate Appropriations Committees have begun to move their FY2017 funding bills through the subcommittee and full committee process, on the way to sending the various spending measures to the Floor and then to the President’s desk. Last week was a big week for appropriations on Capitol Hill and I was back-and-forth to various Congressional hearings, meetings, and events. Here are a few of last week’s highlights:


Tuesday – There’s another word for that    

The full House Appropriations Committee convened (in a type of meeting called a “markup”) to discuss, amend and vote on two spending bills: those for the Department of Defense and the Legislative Branch. A recent proposed change to Library of Congress (LC) cataloging terminology, one having nothing to do with funding at all, was the focus of action on the Legislative Branch bill. Earlier in April, Subcommittee Chair Tom Graves (R-GA14) had successfully included instructions to the Library in a report accompanying the bill that would prohibit the LC from modernizing the outdated, and derogatory, terms “illegal aliens” and “aliens.”

An amendment was offered during Tuesday’s full Committee meeting by Congresswoman Debbie Wasserman Schultz (D-FL23) that would have removed this language from the report (a position strongly and actively supported by ALA and highlighted during National Library Legislative Day). The amendment generated extensive discussion, including vague references by one Republican to “outside groups” (presumably ALA) that were attempting to influence the process (influence the process? in Washington? shocking!).

The final roll call vote turned out to be a nail biter as ultimately four Committee Republicans broke with the Subcommittee chairman to support the amendment. Many in the room, myself included, thought the amendment might have passed and an audible gasp from the audience was heard upon announcement that it had failed by just one vote (24 – 25). Unfortunately, two Committee Democrats whose votes could have carried the amendment were not able to attend. The Legislative Branch spending bill now heads to the Floor and another possible attempt to pass the Wasserman Schultz amendment …. or potentially to keep the bill from coming up at all.

Wednesday – Can you hear me now? Good.

In Congress, sometimes the action occurs outside the Committee rooms. It’s not uncommon, therefore, for advocates and their congressional supporters to mount a public event to ratchet up the pressure on the House and Senate. ALA has been an active partner in a coalition seeking full funding for Title IV, Part A of the Every Student Succeeds Act. On Wednesday, I participated in one such creative endeavor: a rally on the lawn of the US Capitol complete with high school choir, comments from supportive Members of Congress, and “testimonials” from individuals benefited by Title IV funding.

This program gives school districts the flexibility to invest in student health and safety, academic enrichment, and education technology programs. With intimate knowledge of the entire school campus, libraries are uniquely positioned to assist in determining local needs for block grants, and for identifying needs within departments, grade levels, and divisions within a school or district. Congress authorized Title IV in the ESSA at $1.65 billion for FY17; however, the President’s budget requests only about one third of that necessary level.

The cloudy weather threatened — but happily did not deliver — rain and the event came off successfully. Did Congress hear us? Well, our permit allowed the use of amplified speakers, so I’d say definitely yes!

Thursday – A quick vote before lunch

On Thursday, just two days after House Appropriators’ nail biter of a vote over Legislative Branch Appropriations, the full Senate Appropriations Committee took up their version of that spending bill in addition to Agriculture Appropriations. For a Washington wonk, a Senate Appropriations Committee hearing is a relatively epic thing to behold. Each Senator enters the room trailed by two to four staffers carrying reams of paper. Throughout the hearing, staffers busily whisper amongst each other, and into the ears of their Senators (late breaking news that will net an extra $10 million for some pet project, perhaps?)

While a repeat of Tuesday’s House fracas wasn’t at all anticipated (ALA had worked ahead of time to blunt any effort to adopt the House’s controversial Library of Congress provision in the Senate), I did wonder whether there had been a last minute script change when the Chairman took up the Agriculture bill first and out of order based on the printed agenda for the meeting. After listening to numerous amendments addressing such important issues as Alaska salmon, horse slaughter for human consumption (yuck?), and medicine measurement, I was definitely ready for the Legislative Branch Appropriations bill to make its appearance. As I intently scanned the room for any telltale signs of soon-to-be-volcanic controversy, the Committee Chairman brought up the bill, quickly determined that no Senator had any amendment to offer, said a few congratulatory words, successfully called for a voice vote and gaveled the bill closed.

Elapsed time, about 3 minutes! I was unexpectedly free for lunch…and, for some reason, craving Alaska salmon.

Epilogue – The train keeps a rollin’

This week’s activity by the Appropriations Committees of both chambers demonstrates that the leaders of Congress’ Republican majority are deliberately moving the Appropriations process forward. Indeed, in the House and Senate they have promised to bring all twelve funding bills to the floor of both chambers on time…something not done since 1994. Sadly, however, staffers on both sides of the aisle tell me that they expect the process to stall at some point. If that happens, once again Congress will need to pass one or more “Continuing Resolutions” (or CRs) after October 1 to keep the government operating. One thing is certain; there is lots of work to be done this summer to defend library funding and policies.

The post Last week in appropriations appeared first on District Dispatch.

District Dispatch: Judiciary Committee Senators face historic “E-Privacy” protection vote

planet code4lib - Tue, 2016-05-24 17:55

More good news could be in the offing for reform of ECPA, the Electronic Communications Privacy Act. Senate Judiciary Committee Chairman Charles Grassley (R-IA) recently (and pleasantly) surprised reform proponents by calendaring a Committee vote on the issue now likely to take place this coming Thursday morning, May 26th.  The Committee, it is hoped, will take up and pass H.R. 699, the Email Privacy Act, which was unanimously approved by the House of Representatives, as reported in District Dispatch, barely three weeks ago.  (A similar but not identical Senate bill co-authored by Judiciary Committee Ranking Member Patrick Leahy [D-VT], S. 356, also could be called up and acted upon.)


Either bill finally would update ECPA in the way most glaringly needed: to virtually always require the government to get a standard, judicially-approved search warrant based upon probable cause to acquire the full content of an individual’s emails, texts, tweets, cloud-based files or other electronic communications. No matter which is considered, however, there remains a significant risk that, on Thursday, the bill’s opponents will try to dramatically weaken that core reform by exempting certain agencies (like the IRS and SEC) from the new warrant requirement, and/or by providing dangerous exceptions to law enforcement and security agencies acting in overbroadly defined “emergency” circumstances.

Earlier today, ALA joined a new joint letter signed by nearly 65 of its public and private sector coalition partners calling on Senators Grassley and Leahy to take up and pass H.R. 699 as approved by the House: in other words “without any [such] amendments that would weaken the protections afforded by the bill” ultimately approved by 419 of the 435 House Members.

Now is the time to tell the Members of the Senate Judiciary Committee that almost 30 years has been much too long to wait for real ECPA reform. Please go to ALA’s Legislative Action Center to email your Senators on the Judiciary Committee now!

The post Judiciary Committee Senators face historic “E-Privacy” protection vote appeared first on District Dispatch.

SearchHub: Welcome Jeff Depa!

planet code4lib - Tue, 2016-05-24 17:30

We’re happy to announce another new addition to the Lucidworks team! Please welcome Jeff Depa, our new Senior Vice President of Worldwide Field Operations (full press release: Lucidworks Appoints Search Veterans to Senior Team).

Jeff will lead the company’s day-to-day field operations, including its rapidly growing sales, alliances and channels, systems engineering and professional services business. Prior to Lucidworks, Jeff spent over 17 years in leadership positions across sales, consulting, and systems engineering at companies such as Oracle, Sun, and, most recently, DataStax.

Jeff earned a B.S. in Biomedical Engineering from Case Western Reserve University and also holds a Masters in Management. Aside from a passion for enabling clients to unleash the power of their data, Jeff is an avid pilot and enjoys spending time with his family in Austin, TX.

We sat down with Jeff to learn more about his passion for search:

What attracted you to Lucidworks?

Lucidworks is at the forefront of unleashing the value hidden in the massive amount of data companies have collected across disparate systems. They have done a phenomenal job in driving the adoption of Apache Solr, but more importantly, building a platform in Fusion that allows enterprises from high volume ecommerce shops to healthcare to easily adopt and deploy a search solution that goes beyond the industry standard, and really focuses on providing the right information at the right time with unique relevancy and machine learning technologies.

What will you be working on at Lucidworks?

I’ll be focused on building on top of a solid foundation as we continue to drive the adoption of Fusion in the market and expand our team to capture the market opportunity with our customers and partners. I’m excited to be part of this journey.

Where do you think the greatest opportunities lie for companies like Lucidworks?

In today’s economy, value is driven from creating a unique, personalized and real time experience for customers and employees. Lucidworks sits squarely in the middle of an enterprise’s disparate and rapidly evolving data sources and enables the transformation of data to information that can be used to improve the user experience. The ability to tie that information to a high impact customer result is a huge opportunity for Lucidworks.

Welcome to the team Jeff!

The post Welcome Jeff Depa! appeared first on Lucidworks.com.

LITA: Mindful Tech, a 2 part webinar series with David Levy

planet code4lib - Tue, 2016-05-24 15:09

Mindful Tech: Establishing a Healthier and More Effective Relationship with Our Digital Devices and Apps
Tuesdays, June 7 and 14, 2016, 1:00 – 2:30 pm Central Time
David Levy, Information School, University of Washington

Register Now for this 2 part webinar

“There is a long history of people worrying and complaining about new technologies and also putting them up on a pedestal as the answer. When the telegraph and telephone came along you had people arguing both sides—that’s not new. And you had people worrying about the explosion of books after the rise of the printing press.

What is different is for the last 100-plus years the industrialization of Western society has been devoted to a more, faster, better philosophy that has accelerated our entire economic system and squeezed out anything that is not essential.

As a society, I think we’re beginning to recognize this imbalance, and we’re in a position to ask questions like “How do we live a more balanced life in the fast world? How do we achieve adequate forms of slow practice?”

David Levy – See more at: http://tricycle.org/trikedaily/mindful-tech/

Don’t miss the opportunity to participate in this well-known program by David Levy, based on his recent, widely reviewed and well-regarded book “Mindful Tech”. The popular interactive program, now re-packaged into a two-part webinar format, will include exercises and participation. Both parts will be fully recorded so participants can return to them or accommodate varying schedules.

Register Now for the 2 part Mindful Tech webinar series

This two-part webinar series (90 minutes each) will introduce participants to some of the central insights of the work Levy has been doing over the past decade and more. By learning to pay attention to their immediate experience (what’s going on in their minds and bodies) while they’re online, people are able to see more clearly what’s working well for them and what isn’t, and based on these observations to develop personal guidelines that allow them to operate more effectively and healthfully. Levy will demonstrate this work by giving participants exercises they can do, both during the online program and between the sessions.

Presenter

David Levy

David M. Levy is a professor at the Information School of the University of Washington. For more than a decade, he has been exploring, via research and teaching, how we can establish a more balanced relationship with our digital devices and apps. He has given many lectures and workshops on this topic, and in January 2016 published a book on the subject, “Mindful Tech: How to Bring Balance to Our Digital Lives” (Yale). Levy is also the author of “Scrolling Forward: Making Sense of Documents in the Digital Age” (rev. ed. 2016).

Additional information is available on his website at: http://dmlevy.ischool.uw.edu/

Then register for the webinar and get full details

Can’t make the dates but still want to join in? Registered participants will have access to both parts of the recorded webinars.

Cost:

  • LITA Member: $68
  • Non-Member: $155
  • Group: $300

Registration Information

Register Online page arranged by session date (login required)
OR
Mail or fax form to ALA Registration
OR
Call 1-800-545-2433 and press 5
OR
email registration@ala.org

Questions or Comments?

For all other questions or comments related to this webinar series, contact LITA at (312) 280-4269 or Mark Beatty, mbeatty@ala.org.

Islandora: iCampBC - Instructors Announced!

planet code4lib - Tue, 2016-05-24 13:59

Islandora Camp is going back to Vancouver from July 18 - 20, courtesy of our wonderful hosts at the British Columbia Electronic Library Network. Camp will (as usual) consist of three days: One day of sessions taking a big-picture view of the project and where it's headed, one day of hands-on workshops for developers and front-end administrators, and one day of community presentations and deeper dives into Islandora tools and sites. The instructors for that second day have been selected and we are pleased to introduce them:

Developers

Mark Jordan has taught at two other Islandora Camps and at the Islandora Conference. He is the developer of Islandora Context, Islandora Themekey, Islandora Datastream CRUD, and the XML Solution Pack, and is one of the co-developers of the Move to Islandora Kit. He is also an Islandora committer and is currently serving as Chair of the Islandora Foundation Board. His day job is as Head of Library Systems at Simon Fraser University.

Rosie Le Faive started with Islandora in 2012 while creating a trilingual digital library for the Commission for Environmental Cooperation. With experience and - dare she say - wisdom gained from creating highly customized sites, she's now interested in improving the core Islandora code so that everyone can use it. Her interests are in mapping relationships between objects, and intuitive UI design. She is the Digital Infrastructure and Discovery librarian at UPEI, and develops for Agile Humanities.

Admins

Melissa Anez has been working with Islandora since 2012 and has been the Community and Project Manager of the Islandora Foundation since it was founded in 2013. She has been a frequent instructor in the Admin Track and developed much of the curriculum, refining it with each new Camp.

Janice Banser is the Systems Librarian at Simon Fraser University.  She has been working with Islandora, specifically the admin interface, for over a year now. She is a member of the Islandora Documentation Interest Group and has contributed to the last two Islandora releases. She has been working with Drupal for about 6 years and has been a librarian since 2005.

ZBW German National Library of Economics: Three new blog posts dealing with current ZBW developments

planet code4lib - Tue, 2016-05-24 10:36

At ZBW and within the Department of Innovative Information Systems and Publishing Technologies, we conduct applied research and software development for introducing and enhancing research infrastructures for Economics. In a series of three blog posts, we report on current developments in different fields: in a first post, we depict the integration of a central research data repository with already existing research workflows and practices. A second post is about connecting our EconBiz metadata to a framework for automatic recommendation of scientific resources in common web environments. Last but not least, a third post introduces several smaller applications we developed in the context of our EconBiz beta section. As is customary, our readers and users are invited to test our applications and give feedback, e.g. by commenting on this article or by sending us an email to labs@zbw.eu.

  • The EEXCESS Recommender
  • EconBiz Beta Services
  • Integrating a Research Data Repository with established research practices

Tags: Data management, Recommender system, EconBiz beta

Patrick Hochstenbach: Crosshatching with my fountain pen

planet code4lib - Tue, 2016-05-24 04:34
Filed under: portraits Tagged: crosshatch, fountain pen, ink, paper, portrait, sktchy, twsbi
