You are here

Feed aggregator

LibUX: 039 – Jean Felisme

planet code4lib - Mon, 2016-05-30 04:50

I’ve known Jean Felisme for a while through WordCamp Miami. We see each other quite a bit at meetups and he’s a ton of fun – he’s also been pretty hardcore about evangelizing freelancing. Recently he made the switch from freelance into the very special niche that is the higher-ed web, so when he was just six weeks into his new position at the School of Computing and Information Sciences at Florida International University, I took the opportunity to pick his brain.

Hope you enjoy.

If you like, you can download the MP3 or subscribe to LibUX on Stitcher, iTunes, Google Play Music, or just plug our feed straight into your podcatcher of choice. Help us out and say something nice. Your sharing and positive reviews are the best marketing we could ask for.

Here’s what we talked about
  • 1:08 – WP Campus is coming up!
  • 2:45 – All about Jean
  • 4:28 – How the trend toward building-out in-house teams will impact freelance
  • 9:38 – What is the day-to-day like just six weeks in?
  • 12:03 – Student-hosted applications and content – scary
  • 13:09 – The makeup of Jean’s team
  • 17:43 – Are you playing with any web technology you haven’t before?
  • 19:37 – The tight relationship with the students
  • 20:31 – On web design curriculum
  • 28:00 – We fail to wrap up and keep talking about freelance for a few more minutes.

The post 039 – Jean Felisme appeared first on LibUX.

District Dispatch: Next CopyTalk on Government Overreach

planet code4lib - Fri, 2016-05-27 16:04

Please join us on June 2 for a free webinar on another form of copyright creep, this one on recent efforts to copyright state government works.

Issues behind State governments copyrighting government works

The purpose of copyright is to provide incentives for creativity in exchange for a time-limited, government-provided monopoly. When drafting the federal copyright law, Congress explicitly denied the federal government, as well as employees of the federal government, the authority to claim copyright in government-created works. However, the federal law is silent on state government power to create, hold, and enforce copyrights. This has resulted in a patchwork of varying state copyright laws across all fifty states.

Currently, California favors an approach in which the vast majority of works created by state and local government are in the public domain by default. An ongoing debate is happening now as to whether California should end the public domain status of most state and local government works. The state legislature is contemplating a bill (AB 2880) that would grant copyright authority to all state agencies, local governments, and political subdivisions. In recent years, entities of state government have attempted to rely on copyright to suppress the dissemination of taxpayer-funded research and to chill criticism, but failed in the courts due to a lack of copyright authority. Ernesto Falcon, legislative counsel with the Electronic Frontier Foundation, will review the status of the legislation, the court decisions that led to its creation, and the debate that now faces the California legislature.

Day/Time: Thursday, June 2, at 2pm Eastern/11am Pacific for our hour-long free webinar.

Go to http://ala.adobeconnect.com/copytalk/ and sign in as a guest. You’re in.

This program is brought to you by OITP’s copyright education subcommittee.

The post Next CopyTalk on Government Overreach appeared first on District Dispatch.

FOSS4Lib Recent Releases: VuFind - 3.0.1

planet code4lib - Fri, 2016-05-27 15:44
Package: VuFind
Release Date: Friday, May 27, 2016

Last updated May 27, 2016. Created by Demian Katz on May 27, 2016.
Log in to edit this page.

Bug fix release.

Eric Lease Morgan: VIAF Finder

planet code4lib - Fri, 2016-05-27 13:34

This posting describes VIAF Finder. In short, given the values from MARC fields 1xx$a, VIAF Finder will try to find and record a VIAF identifier. [0] This identifier, in turn, can be used to facilitate linked data services against authority and bibliographic data.

Quick start

Here is the way to quickly get started:

  1. download and uncompress the distribution to your Unix-ish (Linux or Macintosh) computer [1]
  2. put a file of MARC records named authority.mrc in the ./etc directory, and the file name is VERY important
  3. from the root of the distribution, run ./bin/build.sh

VIAF Finder will then commence to:

  1. create a “database” from the MARC records, and save the result in ./etc/authority.db
  2. use the VIAF API (specifically the AutoSuggest interface) to identify VIAF numbers for each record in your database, and if numbers are identified, then the database will be updated accordingly [3]
  3. repeat Step #2 but through the use of the SRU interface
  4. repeat Step #3 but limiting searches to authority records from the Vatican
  5. repeat Step #3 but limiting searches to the authority named ICCU
  6. done

Once done, the reader is expected to programmatically loop through ./etc/authority.db to update the 024 fields of their MARC authority data.
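By way of illustration, here is a minimal Python sketch of the kind of AutoSuggest lookup the Perl scripts perform. It is not part of the distribution (which is written in Perl); the endpoint and the shape of the JSON response ("result" entries carrying "term" and "viafid" keys) are my reading of the public VIAF API and may need adjusting.

#!/usr/bin/env python3
# Minimal sketch: query the VIAF AutoSuggest interface for a heading and
# print candidate identifiers. The endpoint and JSON layout are assumptions
# based on the public VIAF API documentation, not code from VIAF Finder.

import json
import urllib.parse
import urllib.request

def suggest(heading):
    url = 'https://viaf.org/viaf/AutoSuggest?query=' + urllib.parse.quote(heading)
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode('utf-8'))
    return data.get('result') or []

if __name__ == '__main__':
    for candidate in suggest('Twain, Mark, 1835-1910'):
        print(candidate.get('viafid'), candidate.get('term'), sep='\t')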

Manifest

Here is a listing of the VIAF Finder distribution:

  • 00-readme.txt – this file
  • bin/build.sh – “One script to rule them all”
  • bin/initialize.pl – reads MARC records and creates a simple “database”
  • bin/make-dist.sh – used to create a distribution of this system
  • bin/search-simple.pl – rudimentary use of the SRU interface to query VIAF
  • bin/search-suggest.pl – rudimentary use of the AutoSuggest interface to query VIAF
  • bin/subfield0to240.pl – sort of demonstrates how to update MARC records with 024 fields
  • bin/truncate.pl – extracts the first n number of MARC records from a set of MARC records, and useful for creating smaller, sample-sized datasets
  • etc – the place where the reader is expected to save their MARC files, and where the database will (eventually) reside
  • lib/subroutines.pl – a tiny set of… subroutines used to read and write against the database

Usage

If the reader hasn’t figured it out already, in order to use VIAF Finder, the Unix-ish computer needs to have Perl and various Perl modules — most notably, MARC::Batch — installed.

If the reader puts a file named authority.mrc in the ./etc directory, and then runs ./bin/build.sh, then the system ought to run as expected. A set of 100,000 records over a wireless network connection will finish processing in a matter of many hours, if not the better part of a day. Speed will be increased over a wired network, obviously.

But in reality, most people will not want to run the system out of the box. Instead, each of the individual tools will need to be run individually. Here’s how:

  1. save a file of MARC (authority) records anywhere on your file system
  2. not recommended, but optionally edit the value of DB in bin/initialize.pl
  3. run ./bin/initialize.pl feeding it the name of your MARC file, as per Step #1
  4. if you edited the value of DB (Step #2), then edit the value of DB in bin/search-suggest.pl, and then run ./bin/search-suggest.pl
  5. if you want to possibly find more VIAF identifiers, then repeat Step #4 but with ./bin/search-simple.pl and with the “simple” command-line option
  6. optionally repeat Step #5, but this time use the “named” command-line option, and the possible named values are documented as a part of the VIAF API (i.e., “bav” denotes the Vatican)
  7. optionally repeat Step #6, but with other “named” values
  8. optionally repeat Step #7 until you get tired
  9. once you get this far, the reader may want to edit bin/build.sh, specifically configuring the value of MARC, and running the whole thing again — “one script to rule them all”
  10. done

A word of caution is now in order. VIAF Finder reads & writes to its local database. To do so it slurps up the whole thing into RAM, updates things as processing continues, and periodically dumps the whole thing just in case things go awry. Consequently, if you want to terminate the program prematurely, try to do so a few steps after the value of “count” has reached the maximum (500 by default). A few times I have prematurely quit the application at the wrong time and blew my whole database away. This is the cost of having a “simple” database implementation.
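If you roll your own dump routine, one general safeguard (a generic technique, not something VIAF Finder itself does, as far as I can tell) is to write the dump to a temporary file and rename it into place, so that an interrupted dump can never clobber the previous copy. A Python sketch of the idea:

import os
import tempfile

def atomic_dump(path, rows):
    """Write tab-delimited rows to path without ever truncating the old copy."""
    # Write to a temporary file in the same directory, then rename it over the
    # original; os.replace() is atomic on POSIX, so a crash mid-dump leaves the
    # previous copy of the database intact.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as handle:
            for row in rows:
                handle.write('\t'.join(row) + '\n')
        os.replace(tmp_path, path)
    except Exception:
        os.remove(tmp_path)
        raise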

To do

Alas, search-simple.pl contains a memory leak. Search-simple.pl makes use of the SRU interface to VIAF, and my SRU queries return XML results. Search-simple.pl then uses the venerable XML::XPath Perl module to read the results. Well, after a few hundred queries the totality of my computer’s RAM is taken up, and the script fails. One work-around would be to request that the SRU interface return a different data structure. Another solution is to figure out how to destroy the XML::XPath object. Incidentally, because of this memory leak, the integer fed to search-simple.pl was implemented to allow the reader to restart the process at a different point in the dataset. Hacky.

Database

The use of the database is key to the implementation of this system, and the database is really a simple tab-delimited table with the following columns:

  1. id (MARC 001)
  2. tag (MARC field name)
  3. _1xx (MARC 1xx)
  4. a (MARC 1xx$a)
  5. b (MARC 1xx$b and usually empty)
  6. c (MARC 1xx$c and usually empty)
  7. d (MARC 1xx$d and usually empty)
  8. l (MARC 1xx$l and usually empty)
  9. n (MARC 1xx$n and usually empty)
  10. p (MARC 1xx$p and usually empty)
  11. t (MARC 1xx$t and usually empty)
  12. x (MARC 1xx$x and usually empty)
  13. suggestions (a possible sublist of names, Levenshtein scores, and VIAF identifiers)
  14. viafid (selected VIAF identifier)
  15. name (authorized name from the VIAF record)

Most of the fields will be empty, especially fields b through x. The intention is/was to use these fields to enhance or limit SRU queries. Field #13 (suggestions) is for future, possible use. Field #14 is key, literally. Field #15 is a possible replacement for MARC 1xx$a. Field #15 can also be used as a sort of sanity check against the search results. “Did VIAF Finder really identify the correct record?”

Consider pouring the database into your favorite text editor, spreadsheet, database, or statistical analysis application for further investigation. For example, write a report against the database allowing the reader to see the details of the local authority record as well as the authority data in VIAF. Alternatively, open the database in OpenRefine in order to count & tabulate variations of the data it contains. [4] Your eyes will widen, I assure you.
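For readers who would rather script such a report than eyeball it, here is a small Python sketch. It assumes only the tab-delimited layout described above (fifteen columns, in the order listed) and prints the records whose local heading and VIAF-supplied name disagree; the file name and column positions come from this posting, everything else is illustrative.

import csv

# Column positions, 0-based, per the table layout described above.
ID, A, VIAFID, NAME = 0, 3, 13, 14

def report(path='./etc/authority.db'):
    """Print records whose local 1xx$a and VIAF-supplied name disagree."""
    with open(path, newline='', encoding='utf-8') as handle:
        for row in csv.reader(handle, delimiter='\t'):
            if len(row) < 15 or not row[VIAFID]:
                continue  # no VIAF identifier was found for this record
            local, remote = row[A].strip(), row[NAME].strip()
            if local.lower() != remote.lower():
                print(row[ID], row[VIAFID], local, '<->', remote, sep='\t')

if __name__ == '__main__':
    report()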

Commentary

First, this system was written during my “artist’s education adventure” which included a three-month stint in Rome. More specifically, this system was written for the good folks at Pontificia Università della Santa Croce. “Thank you, Stefano Bargioni, for the opportunity, and we did some very good collaborative work.”

Second, I first wrote search-simple.pl (SRU interface) and I was able to find VIAF identifiers for about 20% of my given authority records. I then enhanced search-simple.pl to include limitations to specific authority sets. I then wrote search-suggest.pl (AutoSuggest interface), and not only was the result many times faster, but the result was just as good, if not better, than the previous result. This felt like two steps forward and one step back. Consequently, the reader may not ever need nor want to run search-simple.pl.

Third, while the AutoSuggest interface was much faster, I was not able to determine how suggestions were made. This makes the AutoSuggest interface seem a bit like a “black box”. One of my next steps, during the copious spare time I still have here in Rome, is to investigate how to make my scripts smarter. Specifically, I hope to exploit the use of the Levenshtein distance algorithm. [5]
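The algorithm itself is small enough to sketch. The following Python is my own illustration of the general technique (the suggestion data is made up), showing how each candidate term could be scored against the local heading so that only the closest match, or no match at all, is accepted:

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]

# Dummy (term, viafid) suggestions purely for illustration.
suggestions = [('Twain, Mark, 1835-1910', 'viafid-1'), ('Twain, Shania', 'viafid-2')]
local_heading = 'Twain, Mark, 1835-1910'
best = min(suggestions, key=lambda pair: levenshtein(local_heading.lower(), pair[0].lower()))
print(best)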

Finally, I would not have been able to do this work without the “shoulders of giants”. Specifically, Stefano and I took long & hard looks at the code of people who have done similar things. For example, the source code of Jeff Chiu’s OpenRefine Reconciliation service demonstrates how to use the Levenshtein distance algorithm. [6] And we found Jakob Voß’s viaflookup.pl useful for pointing out AutoSuggest as well as elegant ways of submitting URL’s to remote HTTP servers. [7] “Thanks, guys!”

Fun with MARC-based authority data!

Links

[0] VIAF – http://viaf.org

[1] VIAF Finder distribution – http://infomotions.com/sandbox/pusc/etc/viaf-finder.tar.gz

[2] VIAF API – http://www.oclc.org/developer/develop/web-services/viaf.en.html

[4] OpenRefine – http://openrefine.org

[5] Levenshtein distance – https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance

[6] Chiu’s reconciliation service – https://github.com/codeforkjeff/refine_viaf

[7] Voß’s viaflookup.pl – https://gist.github.com/nichtich/832052/3274497bfc4ae6612d0c49671ae636960aaa40d2

District Dispatch: Full STEAM ahead: Creating Tomorrowland today

planet code4lib - Fri, 2016-05-27 01:50

Sascha Paladino, Series Creator and Executive Producer

STEAM programming (which includes science, technology, engineering, arts and mathematics) is fast becoming a core service in libraries across the country. From intermixing STEAM activities into family story hour to teen maker spaces and coding camps, public and school libraries provide engaging opportunities for kids of all ages to develop a passion for science, technology, engineering, the arts, and math. Curious to learn who else is experimenting with STEAM programs for kids? The article below comes from Sascha Paladino, who is the creator and executive producer of “Miles from Tomorrowland”, a Disney Junior animated series that weaves science, technology, engineering and mathematics concepts geared towards kids ages 2-7 into its storylines. Paladino will delve deeper into the topic at the 2016 American Library Association Annual Conference joined by others instrumental in getting Miles off the ground and kids into STEAM.

Six years ago, I came up with an idea for an animated series about a family on an adventure in outer space – from the kid’s perspective. I wanted to explore the universe through the eyes of a seven-year-old. I remembered how I saw outer space when I was young – as the greatest imaginable place for adventure – and I wanted to capture that feeling.

I pitched the idea to Disney, who liked it, and we began developing what would become MILES FROM TOMORROWLAND. Through the ups and downs that are part of any TV show’s journey to the screen, I tried to stay focused on my goals: tell entertaining stories, encourage kids to dream big and inspire viewers to explore STEAM (Science, Technology, Engineering, Arts and Math).

Luckily, with the support of Disney, I was able to surround the MILES creative team with a group of genius scientists. Dr. Randii Wessen of NASA’s Jet Propulsion Laboratory came onboard as an advisor, as did NASA astronaut Dr. Yvonne Cagle, and Space Tourism Society founder John Spencer. They shared their deep knowledge and experience with us, and gave our show some serious scientific street cred.

Along the way, I got a crash course in outer space. I was able to immerse myself in the science of our universe, and learned all about exoplanets, tardigrades, and electromagnetic pulses, for starters. Then, I could sit down with my writing and design teams and figure out ways to work these science facts into engaging stories to share with our audience.

I realized that I was making the show I wished I had as a kid: An exciting adventure that incorporates real science in a way that appeals to viewers whether or not they gravitate towards science. I always loved science, but my career path took me into the arts. In making this show, I learned that the arts can be a route into the sciences – which is why I’m really glad that STEM has expanded to STEAM, to include the “A” for “arts.”

My hope is that by exposing all sorts of kids to concepts such as black holes, coronal mass ejections, and spaghettification (best word ever), they’re inspired to explore further and deeper once the television is turned off.

When we were researching the series, we met with scientists, techies, and space professionals from amazing places such as NASA, SpaceX, Virgin Galactic, and Google. Over and over, we heard that they were inspired to go into their field because of science-fiction TV shows and movies that they saw as kids. Real-life innovations such as the first flip-phone were directly influenced by fantastical creations imagined on STAR TREK. Science fiction becomes science fact. It’s the circle of (sci-fi) life.

Now that MILES FROM TOMORROWLAND is on the air, I’ve been hearing from parents and kids that our vision of the future is giving the scientists of tomorrow some ideas. Nothing could make me happier. We’ve seen kids make their own creative versions of Miles’ tech and gear, such as cardboard spaceships and gadgets made from dried macaroni. As NASA’s Dr. Cagle told me recently, one of our goals should be to encourage kids to “engineer their dreams.” That sums it up perfectly.

I even heard from a kid who loves Miles’ Blastboard – his flying hoverboard – so much that he decided to sit down and design a real one. Whether it works or not is beside the point (although I’m quite sure that it does). What matters to me is that MILES FROM TOMORROWLAND set off a spark that, I hope, will continue to grow, multiply, and eventually inspire a future generation of scientists and innovators.

But mostly, I can’t wait to ride that Blastboard.

Join the “Coding in Tomorrowland: Inspiring Girls in STEM” session at the 2016 American Library Association Annual Conference in Orlando, which takes place on Sunday, June 26, 2016, from 1:00-2:30 p.m. (in the Orange County Convention Center, in room OCCC W303). Session speakers include “Miles from Tomorrowland” creator and executive producer, Sascha Paladino; series consultant and NASA astronaut, Dr. Yvonne Cagle; and Disney Junior executive, Diane Ikemiyashiro. This session will be moderated by Roger Rosen, who is the chief executive officer of Rosen Publishing and a senior advisor for national policy advocacy to ALA’s Office for Information Technology Policy.

Want to attend other policy sessions at the 2016 ALA Annual Conference? View all ALA Washington Office sessions

The post Full STEAM ahead: Creating Tomorrowland today appeared first on District Dispatch.

David Rosenthal: Abby Smith Rumsey's "When We Are No More"

planet code4lib - Thu, 2016-05-26 15:00
Back in March I attended the launch of Abby Smith Rumsey's book When We Are No More. I finally found time to read it from cover to cover, and can recommend it. Below the fold are some notes.

There are four main areas where I have comments on Rumsey's text. On page 144, in the midst of a paragraph about the risks to our personal digital information she writes:
"The documents on our hard disks will be indecipherable in a decade."

The word "indecipherable" implies not data loss but format obsolescence. As I've written many times, Jeff Rothenberg was correct to identify format obsolescence as a major problem for documents published before the advent of the Web in the mid-90s. But the Web caused documents to evolve from being the private property of a particular application to being published. On the Web, published documents don't know what application will render them, and are thus largely immune to format obsolescence.

It is true that we're currently facing a future in which most current browsers will not render preserved Flash, not because they don't know how to but because it isn't safe to do so. But oldweb.today shows that the technological fix for this problem is already in place. Format obsolescence, were it to occur, would be hard for individuals to mitigate. Especially since it isn't likely to happen, it isn't helpful to lump it in with threats they can do something about by, for example, keeping local copies of their cloud data.

On page 148 Rumsey discusses the problem of the scale of the preservation effort needed and the resulting cost:
"We need to keep as much as we can as cheaply as possible. ... we will have to invent ways to essentially freeze-dry data, to store data at some inexpensive low level of curation, and at some unknown time in the future be able to restore it. ... Until such a long-term strategy is worked out, preservation experts focus on keeping digital files readable by migrating data to new hardware and software systems periodically. Even though this looks like a short-term strategy, it has been working well ... for three decades and more."

Yes, it has been working well and will continue to do so provided the low level of curation manages to find enough money to keep the bits safe. Emulation will ensure that if the bits survive we will be able to render them, and it does not impose significant curation costs along the way.

The aggressive (and therefore necessarily lossy) compression Rumsey enviasges would reduce storage costs, and I've been warning for some time that Storage Will Be Much Less Free Than It Used To Be. But it is important not to lose sight of the fact that ingest, not storage, is the major cost in digital preservation. We can't keep it all; deciding what to keep and putting it some place safe is the most expensive part of the process.

On page 163 Rumsey switches to ignoring the cost and assuming that, magically, storage supply will expand to meet the demand:
"Our appetite for more and more data is like a child's appetite for chocolate milk: ... So rather than less, we are certain to collect more. The more we create, paradoxically, the less we can afford to lose."

Alas, we can't store everything we create now, and the situation isn't going to get better.

On page 166 Rumsey writes:
"Other than the fact that preservation yields long-term rewards, and most technology funding goes to creating applications that yield short-term rewards, it is hard to see why there is so little investment, either public or private, in preserving data. The culprit is our myopic focus on short-term rewards, abetted by financial incentives that reward short-term thinking. Financial incentives are matters of public policy, and can be changed to encourage more investment in digital infrastructure."

I completely agree that the culprit is short-term thinking, but the idea that "incentives ... can be changed" is highly optimistic. The work of, among others, Andrew Haldane at the Bank of England shows that short-termism is a fundamental problem in our global society. Inadequate investment in infrastructure, both physical and digital, is just a symptom, and is far less of a problem than society's inability to curb carbon emissions.

Finally, some nits to pick. On page 7 Rumsey writes of the Square Kilometer Array:
"up to one exabyte (10^18 bytes) of data per day"

I've already had to debunk another "exabyte a day" claim. It may be true that the SKA generates an exabyte a day but it could not store that much data. An exabyte a day is most of the world's production of storage. Like the Large Hadron Collider, which throws away all but one byte in a million before it is stored, the SKA actually stores only(!) a petabyte a day (according to Ian Emsley, who is responsible for planning its storage). A book about preserving information for the long term should be careful to maintain the distinction between the amounts of data generated, and stored. Only the stored data is relevant.

On page 46 Rumsey writes:
"our recording medium of choice, the silicon chip, is vulnerable to decay, accidental deletion and overwriting"

Our recording medium of choice is not, and in the foreseeable future will not be, the silicon chip. It will be the hard disk, which is of course equally vulnerable, as any read-write digital medium would be. Write-once media would be somewhat less vulnerable, and they definitely have a role to play, but they don't change the argument.

FOSS4Lib Recent Releases: Evergreen - 2.10.4

planet code4lib - Thu, 2016-05-26 00:37

Last updated May 25, 2016. Created by gmcharlt on May 25, 2016.
Log in to edit this page.

Package: Evergreen
Release Date: Wednesday, May 25, 2016

William Denton: CC-BY

planet code4lib - Thu, 2016-05-26 00:08

I’ve changed the license on my content to CC-BY: Creative Commons Attribution 4.0.

UPDATE 25 May 2016: The feed metadata is now updated too. “We copy documents based on metadata.”

Evergreen ILS: Evergreen 2.10.4 released

planet code4lib - Wed, 2016-05-25 21:55

We are pleased to announce the release of Evergreen 2.10.4, a bug fix release.

Evergreen 2.10.4 fixes the following issues:

  • Fixes the responsive view of the My Account Items Out screen so that Title and
    Author are now in separate columns.
  • Fixes an incorrect link for the MVF field definition and adds a new link to
    BRE in fm_IDL.xml.
  • Fixes a bug where the MARC stream authority cleanup deleted a bib
    record instead of an authority record from the authority queue.
  • Fixes a bug where Action Triggers could select an inactive event
    definition when running.
  • Eliminates the output of a null byte after a spool file is processed
    in the MARC stream importer.
  • Fixes an issue where previously-checked-out items did not display in
    metarecord searches when the Tag Circulated Items Library Setting is
    enabled.
  • Fixes an issue in the 0951 upgrade script where the script was not
    inserting the version into config.upgrade_log because the line to do so
    was still commented out.

Please visit the downloads page to retrieve the server software and staff clients.

John Mark Ockerbloom: Sharing journals freely online

planet code4lib - Wed, 2016-05-25 19:19

What are all the research journals that anyone can read freely online?  The answer is harder to determine than you might think.  Most research library catalogs can be searched for online serials (here’s what Penn Libraries gives access to, for instance), but it’s often hard for unaffiliated readers to determine what they can get access to, and what will throw up a paywall when they try following a link.

Current research

The best-known listing of current free research journals has been the Directory of Open Access Journals (DOAJ), a comprehensive listing of free-to-read research journals in all areas of scholarship. Given the ease with which anyone can throw up a web site and call it a “journal” regardless of its quality or its viability, some have worried that the directory might be a little too comprehensive to be useful.  A couple of years ago, though, DOAJ instituted more stringent criteria for what it accepts, and it recently weeded its listings of journals that did not reapply under its new criteria, or did not meet its requirements.   This week I am pleased to welcome over 8,000 of its journals to the extended-shelves listings of The Online Books Page.  The catalog entries are automatically derived from the data DOAJ provides; I’m also happy to create curated entries with more detailed cataloging on readers’ request.

Historic research

Scholarly journals go back centuries.  Many of these journals (and other periodicals) remain of interest to current scholars, whether they’re interested in the history of science and culture, the state of the natural world prior to recent environmental changes, or analyses and source documents that remain directly relevant to current scholarship.  Many older serials are also included in The Online Books Page’s extended shelves courtesy of HathiTrust, which currently offers over 130,000 serial records with at least some free-to-read content.  Many of these records are not for research journals, of course, and those that are can sometimes be fragmentary or hard to navigate.  I’m also happy to create organized, curated records for journals offered by HathiTrust and others at readers’ request.

It’s important work to organize and publicize these records, because many of these journals that go back a long way don’t make their content freely available in the first place one might look.  Recently I indexed five journals founded over a century ago that are still used enough to be included in Harvard’s 250 most popular works: Isis, The Journal of Comparative Neurology, The Journal of Infectious Diseases, The Journal of Roman Studies, and The Philosophical Review.  All five had public domain content offered at their official journal site, or JSTOR, behind paywalls (with fees for access ranging from $10 to $42 per article) that was available for free elsewhere online.  I’d much rather have readers find the free content than be stymied by a paywall.  So I’m compiling free links for these and other journals with public domain runs, whether they can be found at Hathitrust, JSTOR (which does make some early journal content, including from some of these journals, freely available), or other sites.

For many of these journals, the public domain extends as late as the 1960s due to non-renewal of copyright, so I’m also tracking when copyright renewals actually start for these journals.  I’ve done a complete inventory of serials published until 1950 that renewed their own copyrights up to 1977.  Some scholarly journals are in this list, but most are not, and many that are did not renew copyrights for many years beyond 1922.  (For the five journals mentioned above, for instance, the first copyright-renewed issues were published in 1941, 1964, 1959, 1964, and 1964 respectively– 1964 being the first year for which renewals were automatic.)

Even so, major projects like HathiTrust and JSTOR have generally stopped opening journal content at 1922, partly out of a concern for the complexity of serial copyright research.  In particular, contributions to serials could have their own copyright renewals separate from renewals for the serials themselves.  Could this keep some unrenewed serials out of the public domain?  To answer this question, I’ve also started surveying information on contribution renewals, and adding information on those renewals to my inventory.  Having recently completed this survey for all 1920s serials, I can report that so far individual contributions to scholarly journals were almost never copyright-renewed on their own.  (Individual short stories, and articles for general-interest popular magazines, often were, but not articles intended for scientific or scholarly audiences.)  I’ll post an update if the situation changes in the 1930s or later. So far, though, it’s looking like, at least for research journals, serial digitization projects can start opening issues past 1922 with little risk.  There are some review requirements, but they’re comparable in complexity to the Copyright Review Management System that HathiTrust has used to successfully open access to hundreds of thousands of post-1922 public domain book volumes.

Recent research

Let’s not forget that a lot more recent research is also available freely online, often from journal publishers themselves.  DOAJ only tracks journals that make their content open access immediately, but there are also many journals that make their content freely readable online a few months or years after initial publication.  This content can then be found in repositories like PubMedCentral (see the journals noted as “Full” in the “participation” column), publishing platforms like Highwire Press (see the journals with entries in the “free back issues” column), or individual publishers’ programs such as Elsevier’s Open Archives.

Why are publishers leaving money on the table by making old but copyrighted content freely available instead of charging for it?  Often it’s because it’s what makes their supporters– scholars and their funders– happy.  NIH, which runs PubMedCentral, already mandates open access to research it funds, and many of the journals that fully participate in PubMedCentral’s free issue program are largely filled with NIH-backed research.  Similarly, I suspect that the high proportion of math journals in Elsevier’s Open Archives selection has something to do with the high proportion of mathematicians in the Cost of Knowledge protest against Elsevier.  When researchers, and their affiliated organizations, make their voices heard, publishers listen.

I’m happy to include listings for  significant free runs of significant research journals on The Online Books Page as well, whether they’re open access from the get-go or after a delay.  I won’t list journals that only make the occasional paid-for article available through a “hybrid” program, or those that only have sporadic “free sample” issues.  But if a journal you value has at least a continuous year’s worth of full-sized, complete issues permanently freely available, please let me know about it and I’ll be glad to check it out.

Sharing journal information

I’m not simply trying to build up my own website, though– I want to spread this information around, so that people can easily find free research journal content wherever they go.  Right now, I have a Dublin Core OAI feed for all curated Online Books Page listings as well as a monthly dump of my raw data file, both CC0-licensed.  But I think I could do more to get free journal information to libraries and other interested parties.  I don’t have MARC records for my listings at the moment, but I suspect that holdings information– what issues of which journals are freely available, and from whom– is more useful for me to provide than bibliographic descriptions of the journals (which can already be obtained from various other sources).  Would a KBART file, published online or made available to initiatives like the Global Open Knowledgebase, be useful?  Or would something else work better to get this free journal information more widely known and used?
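As a rough illustration of how a library might consume such a feed, here is a short Python sketch using the Sickle OAI-PMH client; the endpoint URL is a placeholder (the post does not give the feed's address), and the Dublin Core field names shown are just the usual ones.

# Hypothetical harvest of a Dublin Core OAI-PMH feed with the Sickle client
# (pip install sickle). BASE_URL is a placeholder, not the actual address of
# the Online Books Page feed.
from sickle import Sickle

BASE_URL = 'https://example.org/onlinebooks/oai'  # placeholder endpoint

def harvest(base_url=BASE_URL):
    sickle = Sickle(base_url)
    # ListRecords streams records, following resumption tokens automatically.
    for record in sickle.ListRecords(metadataPrefix='oai_dc', ignore_deleted=True):
        metadata = record.metadata  # dict of Dublin Core fields -> list of values
        titles = metadata.get('title', [])
        identifiers = metadata.get('identifier', [])
        print(titles[0] if titles else '(untitled)', identifiers)

if __name__ == '__main__':
    harvest()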

Issues and volumes vs. articles

Of course, many articles are made available online individually as well, as many journal publishers allow.  I don’t have the resources at this point to track articles at an individual level, but there are a growing number of other efforts that do, whether they’re proprietary but comprehensive search platforms like Google Scholar and Web of Science, disciplinary repositories like ArXiV and SSRN, institutional repositories and their aggregators like SHARE and BASE, or outright bootleg sites like Sci-Hub.  We know from them that it’s possible to index and provide access to the scholarly knowledge exchange at a global scale, but doing it accurately, openly, comprehensively, sustainably, and ethically is a bigger challenge.   I think it’s a challenge that the academic community can solve if we make it a priority.  We created the research; let’s also make it easy for the world to access it, learn from it, and put it to work.  Let’s make open access to research articles the norm, not the exception.

And as part of that, if you’d like to help me highlight and share information on free, authorized sources for online journal content, please alert me to relevant journals, make suggestions in the comments here, or get in touch with me offline.


Library of Congress: The Signal: The Radcliffe Workshop on Technology & Archival Processing

planet code4lib - Wed, 2016-05-25 19:18

This is a guest post from Julia Kim, archivist in the American Folklife Center at the Library of Congress.

Professor Michael Connelly delivering keynote. Photo by Radcliffe Workshop on Technology and Archival Processing.

The annual meeting of the Radcliffe Technology Workshop (April 4th – April 5th, #radtech16) brought together historians, (digital) humanists and archivists for an intensive discussion of the “digital turn” and its effect on our work. The result was a focused and highly participatory meeting among professionals working across disciplinary lines with regard to our respective methodologies and codes of conduct. The talks and panels served as springboards for rich conversations addressing many of the big-picture questions in our fields. Added to this was the use of round-table small-group discussions after panel presentations, something that I wish were more of a norm at professional events. This post covers only a small portion of the two days.

Matthew Connelly (Columbia University) asked “Will the coming of Big Data mean the end of history as we know it?” The answer was a resounding “yes.” Based on his years as a researcher at the National Archives and Records Administration (NARA), Connelly surveyed the history of government secrets, its inefficiencies, and the minuscule sample rate determining record retention and the resultant losses to the historical record of major world events. Part of his work as a researcher involved making use of these efforts to initiate the largest searchable collection of now de-classified government records with “The Declassification Engine” and the History Lab. In amassing and analyzing the largest data collection of declassified and unredacted records, their work uncovers secrets via systematic omission, for example. (Read more at Wired magazine.)

The next panel, “Connections and Context: A Moderated Conversation about Archival Processing for the Digital Humanities Generation,” was organized around archival processing challenges and included Meredith Evans (Jimmy Carter Presidential Library and Museum), Cristina Pattuelli (Pratt Institute), and Dorothy Waugh (Emory University).

  • Meredith Evans (Jimmy Carter Presidential Library and Museum) of “Documenting Ferguson,” discussed her work “Documenting the Now” and her efforts to push archivists outside of their comfort zone and into the community to collect documentation as events unfolded.
  • Cristina Pattuelli (Pratt Institute) presented on the Linked Jazz linked data pilot project, which pulls together tools into a single platform to create connections with jazz-musician data. The initial data, digitized oral history transcripts, is further enriched and mashed with other types of data sets, like discography information from Carnegie Hall. (Read the overview published on EDUCAUSE.)
  • Dorothy Waugh (Emory University) spoke to the researcher aspect — or more aptly, the lack of researchers — of born-digital collections. (I wrote a related story titled “Researcher Interactions with Born-Digital”.) Her work underlines the need to cultivate not only donors but also the researchers we hope will one day want to investigate time-date stamps and disk images, for example. While few collections are available for research, the lack of researchers using born-digital collections is also a problem. Researchers are unaware of collections and do not, in a sense, know how to approach using these collections. She is in the process of developing a pilot project with undergraduate students to remedy this.
  • Benjamin Moser, the authorized biographer of Susan Sontag, spoke of his own discomfort, at times, with a researcher’s abilities to exploit privileged knowledge in email. To Moser, email increased the responsibilities of both the archive and the researcher to work in a manner that is “tasteful” and underlined the need to define and educate others in what that may mean. (Read his story published in The New Yorker.)

Mary O’Connell Murphy introducing “Collections and Context” panel. Photo by Radcliffe Workshop on Technology and Archival Processing.

There were a number of questions and concerns that we discussed, such as: What course of action is necessary or right when community activists feel discomfort with their submissions? How can we make sure that these collections aren’t misused? How can we protect individuals from legal prosecution? What are our duties to donors, to the law, and to our professions, and how do individuals navigate the conflicts among their competing claims? How can we, across disciplines, develop a way of discussing these issues? If the archives are defined as an associated set of values and practices, how can we address the lack of consensus on how to (re)interpret them, in light of the challenges of digital collections?

Claire Potter (the New School) delivered a keynote entitled “Fibber McGee’s Closet: How Digital Research Transformed the Archive– But Not the History Department,” which underlined these new challenges and the need for history methodologies to shift alongside shifts in archival methodologies. “The Archive, of course, has always represented systems of cognition,” as Potter put it, “but when either the nature of the archive or the way the archive is used changes, we must agree to change with it.” Historians must learn to triage in the face of the increased volume, despite the slow pace at which educational and research models have moved. Potter called for archivists and historians to work together to support our complimentary roles in deriving meaning and use from collections. “The long game will be, historians, I hope, will begin to see archives and information technology as an intellectual and scholarly choice.” The Archives can be a teaching space and research space. (Read the text of her full talk.)

“Why Can’t We Stand Archival Practice on Its Head?” included three case studies experimenting with forms of “digitization as processing”- Larisa Miller (Hoover Institution, Stanford University), Jamie Roth and Erica Boudreau (John F. Kennedy Center Presidential Library and Museum), and Elizabeth Kelly (Loyola University, New Orleans).

  • ­Larisa Miller (Hoover Institution, Stanford University) reviewed the evolution of optical character recognition (OCR) and its use as a processing substitute. In comparing finding aids to these capabilities, she noted that “any access method will produce some winners and some losers.” Miller underscored the resource decisions that every archive must account for: Is this about finding aids or the best way to provide access? By eliminating archival processing, many more materials are digitized and made available to users. Ultimately, what methods maximize resources to get the most materials out to end users? In addition to functional reasons, Miller was critical of some core processing tasks: “The more arrangement we do, the more we violate original order.” (Read her related article published in The American Archivist.)
  • Jamie Roth and Erica Boudreau (John F. Kennedy Center Presidential Library and Museum) implemented multiple modes to test against one another: systematic digitization, digitization “on-demand” and simultaneous digitization while processing. Their talks emphasized impediments to digitization for access, such as their need to comply with legal requirements with restricted material and the lack of reliability with OCR. Roth emphasized that poor description still leads to lack of access or “access in name only.” They also cited researchers’ strong preferences for the analog original, even when given the option to use the digitized version.
  • Elizabeth Kelly (Loyola University, New Orleans) also experimented with folder-level metadata in digitizing university photographs. The scanning resulted in significant resource savings but surveyed users found the experimentally scanned collection “difficult to search and browse, but acceptable to some degree.” (Her slides are on Figshare.)

A great point from some audience members was that these types of item-level online displays are not viable information for data researchers. Item-level organization seems to be a carryover from the analog world that, once again, serves some and not others with their evaluations.

“Going Beyond the Click: A Moderated Conversation on the Future of Archival Description” included Jarrett Drake (Princeton), Ann Wooton (PopUp Archive) and Kari Smith (Massachusetts Institute of Technology), but I’ll focus on Drake’s work. Drake, Smith, and Wooten all addressed the major insufficiencies in existing descriptive and access practices in different ways. Smith will publish a blog post with more information on MIT’s Engineering the Future of the Past this Friday, May 27.

  • Jarrett Drake (Princeton) spoke from his experiences at Princeton, as well as with “A People’s Archive for Police Violence in Cleveland.” He delivered an impassioned attack of foundational principles — such as provenance, appraisal and respect des fonds — as not only technically insufficient in a landscape of corporatized ownership in the cloud, university ownership of academic work and collaborative work, but also as unethical carryovers of our colonialist and imperialistic past. With this technological shift, however, he emphasized the greater possibility for change: “First, we occupy a moment in history in which the largest percentage of the world’s population ever possesses the power and potential to author and create documentation about their lived experiences.” (Read the full text of his talk.)

While I haven’t done justice to the talks and the ensuing conversation and debate, the Radcliffe Technology Workshop helped me to expand my own thinking by framing problems to include invested practitioners and theorists outside of the digital preservation sphere. To my knowledge it is also the only event of its kind.

LITA: Jobs in Information Technology: May 25, 2016

planet code4lib - Wed, 2016-05-25 18:36

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Pacific States University(PSU), Librarian, Los Angeles, CA

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

District Dispatch: Presidential campaigns weigh in on education & libraries

planet code4lib - Wed, 2016-05-25 15:22

Representatives from all three major Presidential campaigns are expected to participate in this week’s CEF Presidential Forum to be held May 26 in Washington. ALA will be participating in the half-day forum and encourages members to view and participate online.

Source: www.thisisamericanrugby.com

ALA members are invited to follow the Forum online as the event will be live streamed starting at 10:00 AM and running through 12:00 PM EST. ALA has submitted library-themed questions for the Presidential representatives, but you can participate in the event by submitting your questions at SubmitQ@cef.org or tweeting your questions via Twitter using #CEFpresForum.

The Committee for Education Funding (CEF) is hosting the 2016 Presidential Forum, which will emphasize education as a critical domestic policy and the need for continuing investments in education. At the forum, the high-level surrogates will discuss in depth the education policy agendas of the remaining candidates. A second panel of education experts from think tanks will discuss the educational landscape that awaits the next administration.  CEF has hosted Presidential Forums during previous elections.

Candy Crowley, award-winning journalist and former Chief Political Correspondent for CNN, will moderate both panels.

The post Presidential campaigns weigh in on education & libraries appeared first on District Dispatch.

David Rosenthal: Randall Munroe on Digital Preservation

planet code4lib - Wed, 2016-05-25 15:00
Randall Munroe succinctly illustrates a point I made at length in my report on emulation:
And here, for comparison, is one of the Internet Archive's captures of the XKCD post. Check the mouse-over text.

Open Knowledge Foundation: Introducing: MyData

planet code4lib - Wed, 2016-05-25 14:01

This post was written by the OK Finland team.

What is MyData?

MyData is both an alternative vision and a set of guiding technical principles for how we, as individuals, can have more control over the data trails we leave behind in our everyday actions.

The core idea is that we, you and I, should have an easy way to see where data about us goes, specify who can use it, and alter these decisions over time. To do this, we are developing a standardized, open, and mediated approach to personal data management by creating “MyData operators.”

Standardised operator model

A MyData operator account would act like an email account for your different data streams. Like email accounts, operator accounts can be hosted by different parties and offer different sets of functionality. For example, some MyData operators could also provide personal data storage solutions, while others could perform data analytics or work as identity providers. The one requirement for a MyData operator is that it lets individuals receive and send data streams according to one interoperable set of standards.

What can “MyData” do?

The “MyData” model does a few things that the current data ecosystem does not.

It will let you re-use your data with a third party – For example, you could take data collected about your purchasing habits from your favourite grocery store’s loyalty card and re-use it in a financing application to see how you are spending your money on groceries.

It will let you see and change how you consent to your data use – Currently, different service providers and applications use complicated terms of service where most users just check ‘yes’ or ‘no’ once, without being entirely sure what they agree to.

It will let you change services – With MyData you will be able to take your data from one operator to another if you decide to change services.

Make it happen, make it right

The MyData 2016 conference will be held from Aug 31st to Sep 2nd at the Helsinki Hall of Culture.

Right now, the technical solutions for managing your data according to the MyData approach already exist. There are many initiatives, emerging out of both the public and private sectors around the world, paving the way for human-centered personal data management. We believe strongly in the need to collaborate with these other initiatives to develop an infrastructure that works with all the complicated systems at work in the current data landscape. Buy your tickets before May 31st to get the early bird discount.

Follow MyData on social media for updates:

Twitter: https://twitter.com/mydata2016
Facebook: https://www.facebook.com/mydata2016/

William Denton: CC-BY

planet code4lib - Wed, 2016-05-25 02:16

I’ve changed the license on my content to CC-BY: Creative Commons Attribution 4.0.

Chris Beer: Autoscaling AWS Elastic Beanstalk worker tier based on SQS queue length

planet code4lib - Wed, 2016-05-25 00:00
We are deploying a Rails application (for the [Hydra-in-a-Box](https://github.com/projecthydra-labs/hybox) project) to [AWS Elastic Beanstalk](https://aws.amazon.com/elasticbeanstalk/). Elastic Beanstalk offers us easy deployment, monitoring, and simple auto-scaling with a built-in dashboard and management interface. Our application uses several potentially long-running background jobs to characterize, checksum, and create derivatives for uploaded content. Since we're deploying this application within AWS, we're also taking advantage of the [Simple Queue Service](https://aws.amazon.com/sqs/) (SQS), using the [`active-elastic-job`](https://github.com/tawan/active-elastic-job) gem to queue and run `ActiveJob` tasks.

Elastic Beanstalk provides settings for "Web server" and "Worker" tiers. Web servers are provisioned behind a load balancer and handle end-user requests, while Workers automatically handle background tasks (via SQS + active-elastic-job). Elastic Beanstalk provides basic autoscaling based on a variety of metrics collected from the underlying instances (CPU, network, I/O, etc). While that is sufficient for our "Web server" tier, we'd like to scale our "Worker" tier based on the number of tasks waiting to be run. Currently, though, auto-scaling the worker tier based on the underlying queue depth isn't enabled through the Elastic Beanstalk interface.

However, since Beanstalk merely manages and aggregates other AWS resources, we have access to the underlying resources, including the autoscaling group for our environment. We should be able to attach a custom auto-scaling policy to that auto-scaling group to scale based on additional alarms. For example, let's say we want to add additional worker nodes if there are more than 10 tasks waiting for more than 5 minutes (and, to save money and resources, also remove worker nodes when there are no tasks available). To create the new policy, we'll need to:

- find the appropriate auto-scaling group, i.e. the one whose `elasticbeanstalk:environment-id` matches the worker tier environment id;
- find the appropriate SQS queue for the worker tier;
- add auto-scaling policies that add (and remove) instances in that autoscaling group;
- create a new CloudWatch alarm that fires when the SQS queue depth exceeds our configured threshold (10 messages) and triggers the scale-out policy to add worker instances;
- and, conversely, create a new CloudWatch alarm that fires when the SQS queue depth drops to 0 and triggers the scale-in policy to remove worker instances.

[Console screenshots of the scale-out alarm and scaling policy configuration omitted; the scale-in configuration is similar.]

Even though there are several manual steps, they aren't too difficult (other than discovering the various resources we're trying to orchestrate), and using Elastic Beanstalk is still valuable for the rest of its functionality. But we're in the cloud, and we really want to automate everything. With a little CloudFormation trickery, we can even automate creating the worker tier with the appropriate autoscaling policies.
First, knowing that the CloudFormation API allows us to pass in an existing SQS queue for the worker tier, let's create an explicit SQS queue resource for the workers:

```json
"DefaultQueue" : {
  "Type" : "AWS::SQS::Queue"
}
```

And wire it up to the Beanstalk application by setting the `WorkerQueueURL` option in the `aws:elasticbeanstalk:sqsd` namespace (not shown: sending the worker queue to the web server tier):

```json
"WorkersConfigurationTemplate" : {
  "Type" : "AWS::ElasticBeanstalk::ConfigurationTemplate",
  "Properties" : {
    "ApplicationName" : { "Ref" : "AWS::StackName" },
    "OptionSettings" : [
      ...,
      {
        "Namespace": "aws:elasticbeanstalk:sqsd",
        "OptionName": "WorkerQueueURL",
        "Value": { "Ref" : "DefaultQueue" }
      }
    ]
  }
},
"WorkerEnvironment": {
  "Type": "AWS::ElasticBeanstalk::Environment",
  "Properties": {
    "ApplicationName": { "Ref" : "AWS::StackName" },
    "Description": "Worker Environment",
    "EnvironmentName": { "Fn::Join": ["-", [{ "Ref" : "AWS::StackName" }, "workers"]] },
    "TemplateName": { "Ref": "WorkersConfigurationTemplate" },
    "Tier": { "Name": "Worker", "Type": "SQS/HTTP" },
    "SolutionStackName" : "64bit Amazon Linux 2016.03 v2.1.2 running Ruby 2.3 (Puma)",
    ...
  }
}
```

Using our queue, we can describe one of the `CloudWatch::Alarm` resources and start describing a scaling policy:

```json
"ScaleOutAlarm" : {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "MetricName": "ApproximateNumberOfMessagesVisible",
    "Namespace": "AWS/SQS",
    "Statistic": "Average",
    "Period": "60",
    "Threshold": "10",
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "Dimensions": [
      {
        "Name": "QueueName",
        "Value": { "Fn::GetAtt" : ["DefaultQueue", "QueueName"] }
      }
    ],
    "EvaluationPeriods": "5",
    "AlarmActions": [{ "Ref" : "ScaleOutPolicy" }]
  }
},
"ScaleOutPolicy" : {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "AutoScalingGroupName": ????,
    "ScalingAdjustment": "1",
    "Cooldown": "60"
  }
},
```

However, to connect the policy to the auto-scaling group, we need to know the name of the autoscaling group. Unfortunately, the autoscaling group is abstracted behind the Beanstalk environment.
To gain access to it, we'll need to create a custom resource, backed by a Lambda function, to extract the information from the AWS APIs:

```json
"BeanstalkStack": {
  "Type": "Custom::BeanstalkStack",
  "Properties": {
    "ServiceToken": { "Fn::GetAtt" : ["BeanstalkStackOutputs", "Arn"] },
    "EnvironmentName": { "Ref": "WorkerEnvironment" }
  }
},
"BeanstalkStackOutputs": {
  "Type": "AWS::Lambda::Function",
  "Properties": {
    "Code": {
      "ZipFile": { "Fn::Join": ["\n", [
        "var response = require('cfn-response');",
        "exports.handler = function(event, context) {",
        "  console.log('REQUEST RECEIVED:\\n', JSON.stringify(event));",
        "  if (event.RequestType == 'Delete') {",
        "    response.send(event, context, response.SUCCESS);",
        "    return;",
        "  }",
        "  var environmentName = event.ResourceProperties.EnvironmentName;",
        "  var responseData = {};",
        "  if (environmentName) {",
        "    var aws = require('aws-sdk');",
        "    var eb = new aws.ElasticBeanstalk();",
        "    eb.describeEnvironmentResources({EnvironmentName: environmentName}, function(err, data) {",
        "      if (err) {",
        "        responseData = { Error: 'describeEnvironmentResources call failed' };",
        "        console.log(responseData.Error + ':\\n', err);",
        "        response.send(event, context, response.FAILED, responseData);",
        "      } else {",
        "        responseData = { AutoScalingGroupName: data.EnvironmentResources.AutoScalingGroups[0].Name };",
        "        response.send(event, context, response.SUCCESS, responseData);",
        "      }",
        "    });",
        "  } else {",
        "    responseData = { Error: 'Environment name not specified' };",
        "    console.log(responseData.Error);",
        "    response.send(event, context, response.FAILED, responseData);",
        "  }",
        "};"
      ]]}
    },
    "Handler": "index.handler",
    "Runtime": "nodejs",
    "Timeout": "10",
    "Role": { "Fn::GetAtt" : ["LambdaExecutionRole", "Arn"] }
  }
}
```

With the custom resource, we can finally get access to the autoscaling group name and complete the scaling policy:

```json
"ScaleOutPolicy" : {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "AutoScalingGroupName": { "Fn::GetAtt": [ "BeanstalkStack", "AutoScalingGroupName" ] },
    "ScalingAdjustment": "1",
    "Cooldown": "60"
  }
},
```

The complete worker tier is part of our CloudFormation stack: https://github.com/hybox/aws/blob/master/templates/worker.json
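
The scale-in side ("similarly for scaling back down") isn't shown above. A rough sketch of what it might look like, mirroring the scale-out resources with the comparison inverted and a negative adjustment; the resource names `ScaleInAlarm` and `ScaleInPolicy` are placeholders of our own rather than anything taken from the hybox template, and the thresholds are just the example values from earlier:

```json
"ScaleInAlarm" : {
  "Type": "AWS::CloudWatch::Alarm",
  "Properties": {
    "MetricName": "ApproximateNumberOfMessagesVisible",
    "Namespace": "AWS/SQS",
    "Statistic": "Average",
    "Period": "60",
    "Threshold": "0",
    "ComparisonOperator": "LessThanOrEqualToThreshold",
    "Dimensions": [
      {
        "Name": "QueueName",
        "Value": { "Fn::GetAtt" : ["DefaultQueue", "QueueName"] }
      }
    ],
    "EvaluationPeriods": "5",
    "AlarmActions": [{ "Ref" : "ScaleInPolicy" }]
  }
},
"ScaleInPolicy" : {
  "Type": "AWS::AutoScaling::ScalingPolicy",
  "Properties": {
    "AdjustmentType": "ChangeInCapacity",
    "AutoScalingGroupName": { "Fn::GetAtt": [ "BeanstalkStack", "AutoScalingGroupName" ] },
    "ScalingAdjustment": "-1",
    "Cooldown": "60"
  }
}
```

Whatever the exact values, the worker environment's minimum and maximum instance counts (set, for example, via the `MinSize` and `MaxSize` options in the `aws:autoscaling:asg` namespace) would still bound how far these policies can scale in either direction. A `LambdaExecutionRole` IAM role granting the function permission to call `elasticbeanstalk:DescribeEnvironmentResources` (and to write its logs) also needs to be defined in the stack.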

DuraSpace News: Luso-Brazilian Digital Library Launched

planet code4lib - Wed, 2016-05-25 00:00

From Tiago Ferreira, Neki IT

 

District Dispatch: Last week in appropriations

planet code4lib - Tue, 2016-05-24 19:41

The Appropriations process in Congress is a year-long cycle with fits and starts, and includes plenty of lobbying, grassroots appeals, lobby days, speeches, hearings and markups, and even creative promotions designed to draw attention to the importance of one program or another. ALA members and the Office of Government Relations continue to play a significant role in this process. Recently, for example, we’ve worked to support funding for major library programs like LSTA and IAL, as well as to address policy issues that arise in Congressional deliberations. Your grassroots voice helps amplify my message in meetings with Congressional staff.

The House and Senate Appropriations Committees have begun to move their FY2017 funding bills through the subcommittee and full committee process, the first steps toward sending the various spending measures to the Floor and then to the President’s desk. Last week was a big week for appropriations on Capitol Hill, and I was back and forth among various Congressional hearings, meetings, and events. Here are a few of last week’s highlights:

Source: csp_iqoncept

Tuesday – There’s another word for that    

The full House Appropriations Committee convened (in a type of meeting called a “markup”) to discuss, amend and vote on two spending bills: those for the Department of Defense and the Legislative Branch. A recently proposed change to Library of Congress (LC) cataloging terminology, one having nothing to do with funding at all, was the focus of action on the Legislative Branch bill. Earlier in April, Subcommittee Chair Tom Graves (R-GA14) successfully included instructions to the Library in a report accompanying the bill that would prohibit the LC from implementing its planned modernization of the outdated, and derogatory, terms “illegal aliens” and “aliens.”

An amendment was offered during Tuesday’s full Committee meeting by Congresswoman Debbie Wasserman Schultz (D-FL23) that would have removed this language from the report (a position strongly and actively supported by ALA and highlighted during National Library Legislative Day). The amendment generated extensive discussion, including vague references by one Republican to “outside groups” (presumably ALA) that were attempting to influence the process (influence the process? in Washington? shocking!).

The final roll call vote turned out to be a nail biter as ultimately four Committee Republicans broke with the Subcommittee chairman to support the amendment. Many in the room, myself included, thought the amendment might have passed and an audible gasp from the audience was heard upon announcement that it had failed by just one vote (24 – 25). Unfortunately, two Committee Democrats whose votes could have carried the amendment were not able to attend. The Legislative Branch spending bill now heads to the Floor and another possible attempt to pass the Wasserman Schultz amendment …. or potentially to keep the bill from coming up at all.

Wednesday – Can you hear me now? Good.

In Congress, sometimes the action occurs outside the Committee rooms. It’s not uncommon, therefore, for advocates and their congressional supporters to mount a public event to ratchet up the pressure on the House and Senate. ALA has been an active partner in a coalition seeking full funding for Title IV, Part A of the Every Student Succeeds Act. On Wednesday, I participated in one such creative endeavor: a rally on the lawn of the US Capitol complete with high school choir, comments from supportive Members of Congress, and “testimonials” from individuals benefited by Title IV funding.

This program gives school districts the flexibility to invest in student health and safety, academic enrichment, and education technology programs. With intimate knowledge of the entire school campus, libraries are uniquely positioned to assist in determining local needs for block grants, and in identifying needs within departments, grade levels, and divisions within a school or district. Congress authorized Title IV in the ESSA at $1.65 billion for FY17; however, the President’s budget requests only about one-third of that necessary level.

The cloudy weather threatened — but happily did not deliver — rain and the event came off successfully. Did Congress hear us? Well, our permit allowed the use of amplified speakers, so I’d say definitely yes!

Thursday – A quick vote before lunch

On Thursday, just two days after House Appropriators’ nail biter of a vote over Legislative Branch Appropriations, the full Senate Appropriations Committee took up their version of that spending bill in addition to Agriculture Appropriations. For a Washington wonk, a Senate Appropriations Committee hearing is a relatively epic thing to behold. Each Senator enters the room trailed by two to four staffers carrying reams of paper. Throughout the hearing, staffers busily whisper amongst each other, and into the ears of their Senators (late breaking news that will net an extra $10 million for some pet project, perhaps?)

While a repeat of Tuesday’s House fracas wasn’t at all anticipated (ALA had worked ahead of time to blunt any effort to adopt the House’s controversial Library of Congress provision in the Senate), I did wonder whether there had been a last minute script change when the Chairman took up the Agriculture bill first and out of order based on the printed agenda for the meeting. After listening to numerous amendments addressing such important issues as Alaska salmon, horse slaughter for human consumption (yuck?), and medicine measurement, I was definitely ready for the Legislative Branch Appropriations bill to make its appearance. As I intently scanned the room for any telltale signs of soon-to-be-volcanic controversy, the Committee Chairman brought up the bill, quickly determined that no Senator had any amendment to offer, said a few congratulatory words, successfully called for a voice vote and gaveled the bill closed.

Elapsed time, about 3 minutes! I was unexpectedly free for lunch…and, for some reason, craving Alaska salmon.

Epilogue – The train keeps a rollin’

This week’s activity by the Appropriations Committees of both chambers demonstrates that the leaders of Congress’ Republican majority are deliberately moving the Appropriations process forward. Indeed, they have promised to bring all twelve funding bills to the floor of both chambers on time…something not done since 1994. Sadly, however, staffers on both sides of the aisle tell me that they expect the process to stall at some point. If that happens, once again Congress will need to pass one or more “Continuing Resolutions” (or CRs) after October 1 to keep the government operating. One thing is certain: there is lots of work to be done this summer to defend library funding and policies.

The post Last week in appropriations appeared first on District Dispatch.


Subscribe to code4lib aggregator