- Terrified robots will take middle class jobs? Look in a mirror, in which Andrew argues that jobs are automated only when they have been drained of all human functions such as judgement.
- Meet the original Big Data, TED Talk, Thought Shower Futurist, in which Andrew discusses the analogies between the work of William Playfair (1759-1823) and today's Big Data enthusiasts.
Playfair was, among other things, an embezzler and a blackmailer, with some unscrupulous data-gathering methods. He would kidnap farmers until they told him how many sheep they had. Today he’s remembered as the father of data visualisation. He was the first to use the pie chart, the line chart, the bar chart.
Playfair stressed the confusion of the moment, its historical discontinuity, and advanced himself as a guru with new methods who was able to make sense of it. Both extracts are worth your time.
- “Hells Bells” (Back In Black, 1980)
- “Thunderstruck” (The Razors Edge, 1990)
- “Back in Black” (Back In Black, 1980)
- “Shake a Leg” (Back In Black, 1980)
- “Jailbreak” (‘74 Jailbreak, 1984)
- “It’s a Long Way to the Top (If You Wanna Rock ’N’ Roll)” (High Voltage, 1975)
- “T.N.T.” (High Voltage, 1975)
- “Dirty Deeds Done Dirt Cheap” (Dirty Deeds Done Dirt Cheap, 1976)
- “Bad Boy Boogie” (Let There Be Rock, 1977)
- “Whole Lotta Rosie” (Let There Be Rock, 1977)
- “The Jack (Live)” (If You Want Blood You’ve Got It, 1978)
- “Let There Be Rock (Live)” (If You Want Blood You’ve Got It, 1978)
- “Rocker (Live)” (If You Want Blood You’ve Got It, 1978)
- “Highway to Hell” (Highway to Hell, 1979)
- “Who Made Who” (Who Made Who, 1986)
- “For Those About to Rock (We Salute You)” (For Those About to Rock (We Salute You), 1981)
I’ve decided to post a few playlists. Sixteen songs is a nice number to keep things contained and focused, so I call my best-of-a-band playlists “My 0x10” (0x10 == 10 hexadecimal == 10 base 16 == 16 base 10) so they sort together. If there were more room I’d include, among others, “Down Payment Blues” and “Kicked In the Teeth” from Powerage (1978), and the rest of If You Want Blood You’ve Got It, which is a masterpiece.
As release maintainer of the 2.9 series of the Evergreen ILS software, I am proud to announce that patch version 2.9.2 was released today.
This release has a number of bug fixes in Acquisitions, Cataloging, Circulation, Administration, and the OPAC.
I would like to highlight one fix that is very important to our international users. Beginning with the 2.9.0 release, the browser-based staff client translations were integrated into the mainline Evergreen releases. In the process, an oversight caused the browser staff client translations to replace the OPAC translations during installation. The Evergreen 2.9.2 release fixes this so that the translations now sit side by side. You will need to change your web server configuration for the translation fix to take effect. Please see Setting a default language and adding optional languages in the documentation for updated instructions.
For more information about the bug fixes in this release, please see the release notes.
Library of Congress: The Signal: Avoid Jitter! Measuring the Performance of Audio Analog-to-Digital Converters
The following is a guest post by Carl Fleischhauer, a Project Manager in the National Digital Initiatives unit at the Library of Congress.
It’s not for everyone, but I enjoy trying to figure out specialized technical terminology, even at a superficial level. For the last month or two, I have been helping assemble a revision of a FADGI guideline (PDF) and an accompanying explanatory report (PDF). Both of these documents pertain to measuring the performance of analog-to-digital converters (ADC) for sound recordings; you can find them and other relevant documents at a dedicated FADGI Web page. This experience has let me peer into the world of audio engineering. (FADGI stands for the Federal Agencies Digitization Guidelines Initiative; there are two Working Groups, one for still images, one for audio-visual.)
If you are familiar with audio digitization, some of the terms will come easily and are even self-explanatory: frequency response (what are the highest and lowest sound pitches that can be reproduced without having the level or volume fall out of range?), total harmonic distortion + noise (has to do with, well, distortion and noise, stuff you don’t want in your audio stream), and two types of crosstalk (for stereo, you want to keep left separate from right). Among the terms that were a little more mysterious to me was common-mode rejection ratio. I learned that this has to do with the ability of a device to reject noise and interference that typically results from picking up unwanted electromagnetic interference in the wiring between the audio source and the ADC’s input.
My favorite mystery terms consist of this pair: sync input jitter susceptibility and jitter transfer gain. For a layperson, the word jitter is irresistible. Alas, even with audio at stake, this has nothing to do with music or dance. Rather, the term reflects the importance of a “clock.” As discussed in the sidebar that follows, digitization entails sampling the audio, producing a large number of numbers (the digits, natch) at very precise intervals in time, aka “event instants.” One Audio Engineering Society standard (AES-12id-2006, r2011) defines jitter as “the dynamic deviation of event instants in a stream or signal from their ideal positions in time.” Accuracy and precision are at stake. Continued below sidebar.
SIDEBAR: Audio digitization and what an ADC does
One way or another sound starts as waves, first in the air and (if being recorded) subsequently transformed into changes in voltage that march in step with the waves in the air: “electrical waves.” In the not-so-old days, these waves were recorded on tape, where magnetic particles made a record of the voltage changes, more or less a magnetic “analog” of the electrical waves.
Those not-so-old tapes deteriorate as time passes, and several FADGI-member agencies are actively copying them in order to save their recorded content. The content is transferred to digital files, i.e., digitized. The ADC is the central device in the digitizing system, where it transforms the (electrical) waves it receives into a stream of digits that represent points on the wave. (If you are imagining a game of connect the dots, you are not too far off.)
Most preservation specialists recommend that the waveform be sampled 96,000 times per second, i.e., 96 kilohertz (kHz) in tech-speak. This high number will do a dandy job capturing the “horizontal” frequency of the wave movement. And in order to capture the amplitude, the “vertical” movement, each sample should be represented by a number (to position it on the graph, as it were) that is 24 bits long. For comparison, an audio compact disc carries only 16 bits per sample, with a sampling frequency of 44,100 samples per second. Audio CDs are less “hi-fi” than the files made to serve preservation goals.
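A back-of-the-envelope sketch of the raw data rates those numbers imply (uncompressed PCM, single channel; stereo doubles both figures):

```ruby
# Bits per second implied by the sampling numbers above
preservation_bps = 96_000 * 24    # 96 kHz at 24 bits/sample
cd_bps           = 44_100 * 16    # 44.1 kHz at 16 bits/sample

puts preservation_bps                            # 2304000
puts cd_bps                                      # 705600
puts (preservation_bps.to_f / cd_bps).round(2)   # 3.27 -- over 3x the data
```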
The point of the preceding is this: an ADC must produce 96,000 reliable data points (samples) each second, each sample accurately represented by a 24-bit-long number. That is quite a trick. And when you add things like having to cope with stereo (two-channel) audio, and other vagaries of the overall setup, the trick becomes even more challenging.
Continuation of text from above sidebar.
The FADGI report, released this month, is one product from a multiyear project that saw an initial guideline published in 2012. The new report describes activities carried out during 2015, some of which have led to proposed adjustments to the 2012 guideline. Meanwhile a fresh round of work begins in 2016. The main author is the Working Group’s expert consultant Chris Lacinak of Audiovisual Preservation Solutions, supported by the audio maintenance technician Phillip Sztenderowicz. Field tests were carried out at three federal agencies: the Voice of America, the National Archives and Records Administration, and the Packard Campus of the Library of Congress. The FADGI team is eager to hear from specialists in the field, and encourages review and comments, with a due date of May 30, 2016.
Determining the metrics and methods for ADC performance testing has proven to be challenging, and this progress report takes its place in a series of documents, all linked to the web page cited above. Why challenging? There are two broad dimensions to the effort, each difficult to properly define and, as it happens, the dimensions can intersect in unanticipated ways.
One dimension entails some expected parameters:
- What features or capabilities of the ADC ought to be measured?
- How should those features or capabilities be measured?
- What ought to be the pass-fail points for preservation work?
The second dimension reflects the Working Group’s interest in allowing for levels of performance. The 2012 guideline pertains to ADC performance at the highest level. From the start, however, the Working Group also sought to develop guidelines for moderate or minimum levels of performance. Such levels might be selected by organizations with modest resources that nonetheless wish to proceed with a digitization project, e.g., a federal agency with a historical collection of recordings of lecture-like presentations by staff, originally recorded on audiocassettes. The agency may determine that copies made with a “very good” ADC will meet every conceivable future need. Or an archive may have certain classes of material, e.g., radio station logging tapes, for which “acceptable” digitized copies produced with a minimum performance system will be sufficient.
This second dimension also plays off another factor: the cost of testing equipment, and the skill level required to use it. The high-level ADC performance guideline contains 12 metrics, several with very exacting measurements. In order to evaluate performance against all 12 metrics at the desired levels of precision, an organization will need an audio analyzer from a category that costs upwards of $20,000. Some large federal agencies possess such devices (two participated in a 2015 FADGI field test), but for many others the cost is prohibitive.
As described in the new report, sorting out the metrics and methods for the high-level performance guideline is largely a matter of refinement. (A set of minor adjustments is proposed in the new documents posted this month.) But it was difficult to chart a course toward the goal of a lower-cost, lower-skill-requiring guideline.
The difficulty has to do with the intricacies of measuring audio performance, and the ways in which the specific measurement tools (high cost and low cost) execute the measurements, also known as test methods. A low-cost system may provide a reasonable assessment of, say, Total Harmonic Distortion + Noise (THD+N), but it does not do so in precisely the same manner as the high-cost measurement system. Thus, although both systems offer a measurement of THD+N, and although the low-cost system’s ability to measure that performance “tops out” at a lower quality level than the high-cost system’s, its reading does not have a simple “lower performance number” relationship to the reading from the high-cost system. The low-cost system also has limited capabilities of its own: the ADC being tested might perform better than what the low-cost system reports, due to the testing system’s limitations. However, and this is important, a low-cost test system will still ferret out clear performance failures from an ADC, even if it is a high-performance unit.
One outcome of the 2015 project is the conceptual framework represented in the table that follows. This framework uses terms to name the test systems that reflect both of the elements in play: ADC performance and measurement system cost.
Earlier this week I had the opportunity to head to Alberta to talk governance, Librarian style.
You may think this is a rare problem, since Capybara/JS is such a popular tool suite, and you didn’t find too much on it on Google, and what you did find mostly suggested you’d only have problems like this if you were “doing it wrong”, so maybe you just weren’t an experienced enough coder to figure it out.
After researching these problems, my belief is that intermittent test failures are actually fairly endemic for those using Capybara JS drivers, at least (and especially) with a complex JS front-end environment (Angular, React/Flux, Ember, etc). It’s not just you; I believe these problems plague many experienced developers.
This blog post summarizes what I learned trying to make my own JS feature tests reliable — I think there is no magic bullet, but to begin with you can understand the basic architecture and nature of race condition problems in this context; there are a bucket of configuration issues you can double-check to reduce your chances of problems somewhat; turning off rspec random ordering may be surprisingly effective at decreasing intermittent failures; but ultimately, building reliable JS feature tests with Capybara is a huge challenge.
My situation: I had no previous experience with new-generation front-end JS frameworks. Relatedly, I had previously avoided Capybara JS features, being scared of them (in retrospect my intuition was somewhat justified), and mostly not testing my (limited) JS. But at the new gig, I was confronted with supporting a project which: had relatively intensive front-end JS, for ‘legacy’ reasons using a combination of React and Angular; was somewhat under-tested, with JS feature tests somewhat over-represented in the test suite (things that maybe could have been tested with functional/controller or unit tests were instead being tested with UI feature tests); and had such intermittent unreliability in the test suite that it made the suite difficult to use for its intended purpose.
I have not ultimately solved the intermittent failures, but I have significantly decreased their frequency, making the test suite more usable.
I also learned a whole lot in the process. If you are a “tldr” type, this post might not be for you, it has become large. My goal is to provide the post I wish I had found before embarking on many, many hours of research and debugging; it may take you a while to read and assimilate, but if you’re as frustrated as I was, hopefully it will save you many more hours of independent research and experimentation to put it all together.
The relevant testing stack in the app I was investigating is: Rails 4.1.x, RSpec 2.x (with rspec-rails), Capybara, DatabaseCleaner, Poltergeist. So that’s what I focused on. Changing any of these components (say, MiniTest for RSpec) could make things come out differently, although the general picture is probably the same with any Capybara JS driver.

No blame
To get it out of the way, I’m not blaming Capybara as ‘bad software’ either.
The inherent concurrency involved in the way JS feature tests are done makes things very challenging.
Making things more challenging is that the ‘platform’ that gives us JS feature testing is composed of a variety of components with separate maintainers, all of which are intended to work “mix and match” with different choices of platform components: Rails itself in multiple versions; Capybara, with RSpec or even MiniTest; with DatabaseCleaner (probably, though not necessarily); with your choice of JS browser simulator driver.
All of these components need to work together to try and avoid race conditions; all of these components keep changing and releasing new versions relatively independently and unsynchronized; and all of these components are maintained by people who are deeply committed to making sure their part does its job and meets its contract adequately, but there’s not necessarily anyone with the big-picture understanding, authority, and self-assigned responsibility to make the whole integration work.
Such is often the ruby/rails open source environment. It can make it confusing to figure out what’s really going on.

Of Concurrency and Race Conditions in Capybara JS Feature Tests
“Concurrency” means a situation where two or more threads or processes are operating “at once”. A “race condition” is when a different outcome can happen each time the same code involving concurrency is run, depending on exactly the order or timing of each concurrent actor (depending on how the OS ends up scheduling the threads/processes, which will not be exactly the same each time).
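A tiny pure-Ruby illustration of the idea (no Capybara involved; the thread scheduling, and hence the interleaving, is up to the OS):

```ruby
# Two threads append to shared state concurrently. The total count is
# deterministic, but the ORDER of entries can differ from run to run;
# that order-dependence is the essence of a race condition.
results = []
threads = 2.times.map do |i|
  Thread.new { 5.times { |n| results << [i, n] } }
end
threads.each(&:join)

puts results.length    # always 10
puts results.inspect   # but the interleaving can vary between runs
```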
In a Capybara JS feature test, there are (at least) three concurrent actors:

- The main thread in the main process that is executing your tests in order.
- The Rails app itself, which Capybara boots in a separate thread of the same process to serve HTTP requests during the test.
- The actual or simulated browser (for Poltergeist, a headless WebKit process; for selenium-webdriver, an actual Firefox) that Capybara (via a driver) is controlling, loading pages from the Rails app (2 above) and interacting with them.
There are two main categories of race condition that arise in the Capybara JS feature test stack. Your unreliable tests are probably because of one or both of these. To understand why your tests are failing unreliably and what you can do about it, you need to understand the concurrent architecture of a Capybara JS feature test as above, and these areas of potential race conditions.
1. Race condition WITHIN a test example: The page not done changing when the test checks it

An acceptance/feature/integration test (I will use those terms interchangeably; we’re talking about tests of the UI) for a web app consists of: simulate a click (or other interaction) on a page, see if what results is what you expect. Likely a series of those.
So Capybara ends up waiting just some amount of time while periodically checking the expectation to see if it’s met, up to a maximum amount of time.
This will exhibit straightforwardly as a specific test that sometimes passes and other times doesn’t. To fix it, you need to make sure Capybara is waiting for results, and willing to wait long enough.

Use the right Capybara API to ensure waits
In older Capybara, developers would often explicitly tell Capybara exactly when to wait and what to wait for with the `wait_until` method.
In Capybara 2.0, author @jnicklas removed the `wait_until` method, explaining that Capybara has sophisticated waiting built into many of its methods, and `wait_until` was not necessary — if you use the Capybara API properly: “For the most part, this behaviour is completely transparent, and you don’t even really have to think about it, because Capybara just does it for you.”
In practice, I think this can end up less transparent than @jnicklas would like, and it can be easier than he hopes to do it wrong. In addition to the post linked above, additional discussions of using Capybara ‘correctly’ to ensure its auto-waiting is in play are here, here and here, plus the Capybara docs.
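To make the auto-waiting less magical: the heart of it is a retry loop. Here is a simplified, illustrative sketch of that pattern (this is not Capybara’s actual implementation; the real `synchronize` rescues only specific error classes and re-synchronizes nodes):

```ruby
# Simplified Capybara-style waiting: keep retrying the block until it
# stops raising, or until the time budget is exhausted.
def synchronize(seconds: 2, interval: 0.05)
  deadline = Time.now + seconds
  begin
    yield
  rescue StandardError
    raise if Time.now >= deadline
    sleep interval
    retry
  end
end

# Simulate a condition that only becomes true on the third check,
# the way an AJAX-rendered element appears "eventually":
checks = 0
value = synchronize(seconds: 1) do
  checks += 1
  raise "not rendered yet" if checks < 3
  :found
end
# value is :found, after roughly two 0.05-second sleeps
```

This is why matchers like `have_content` are “waiting” and a bare string comparison against `page.html` is not: only the former runs inside a loop like this.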
I ran into just a few feature examples in the app I was working on that had obvious problems in this area.

Making Capybara wait long enough
When Capybara is waiting, how long is it willing to wait before giving up? `Capybara.default_wait_time`, by default 2 seconds. If there are actions that sometimes or always take longer than this, you can increase the `Capybara.default_wait_time` — but do it in a suite-wide `before(:each)` hook, because I think Capybara may reset this value on every run, in at least some versions.
You can also run specific examples or sections of code with a longer wait value by wrapping in a `using_wait_time N do` block.
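In RSpec terms, the suite-wide version might look like this (a sketch; note that `default_wait_time` was renamed `default_max_wait_time` in later Capybara 2.x releases, so use whichever your version supports):

```ruby
# spec/rails_helper.rb (sketch; assumes rspec and capybara are set up)
RSpec.configure do |config|
  config.before(:each) do
    # set in a before hook, since some Capybara versions reset it per run
    Capybara.default_wait_time = 5
  end
end

# Or scoped to one known-slow interaction:
# using_wait_time(15) do
#   expect(page).to have_content("Report finished")
# end
```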
At first I spent quite a bit of time playing with this, because it’s fairly understandable and seemed like it could be causing problems. But I don’t think I ended up finding any examples in my app-at-hand that actually needed a longer wait time, that was not the problem.
I do not recommend trying to patch `wait_until` back in, or to patch in the various `wait_for_jquery_ajax`, `wait_for_angular`, etc., methods you can find by googling. You introduce another component that could have bugs (or could become buggy with a future version of jQuery/Angular/Capybara/Poltergeist/whatever), you’re fighting against the intention of Capybara, you’re making things even more complicated and harder to debug, and even if it works you’re tying yourself even further to your existing implementation, as there is no reliable way to wait on an AJAX request with the underlying actual browser API. My app-in-hand had some attempts in these directions, but even figuring out if they were working (especially for Angular) was non-trivial. Better to just fix your test to wait properly on the expected UI, if you at all can.
In fact, while this stuff is confusing at first, it’s a lot less confusing — and has a lot more written about it on the web — than the other category of Capybara race condition…

2. Race condition BETWEEN test examples: Feature test leaving unfinished business, controller actions still not done processing when the test ends
At some point the test example gets to the end, and has tested everything it’s going to test.
What if, at this point, there is still code running in the Rails app? Maybe an AJAX request was made and the Capybara test didn’t bother waiting for the response.

Anatomy of a Race Condition
RSpec will go on to the next test, but the Rails code is still running. It will run DatabaseCleaner.clean, and clear out the database — and the Rails code that was still in progress finds the database cleaned out from under it. Depending on the Rails config, maybe the Rails app now even tries to reload all the classes for dev-mode class reloading, and the code that was still in progress finds class constants undefined and redefined from under it. These things are all likely to cause exceptions to be raised by that code.
Or maybe the code unintentionally in progress in the background isn’t interrupted, but it continues to make changes to the database that mess with the new test example that rspec has moved on to, causing that example to fail.
It’s a mess. RSpec assumes each test is run in isolation; when there’s something else running and potentially making changes to the test database concurrently, all bets are off. The presence and nature of the problem depend on exactly how long the unintentional ‘background’ processing takes to complete, and how it lines up on the timeline against the new test, which will vary from run to run; that variation is what makes this a race condition.
This does happen. I’m pretty sure it’s what was happening to the app I was working on — and still is, I wasn’t able to fully resolve it, although I ameliorated the symptoms with the config I’ll describe below.
The presence and nature of the problem also can depend on which test is ‘next’, which will be different from run to run under rspec random ordering. But I found that even re-running the suite with the same seed, the presence and nature of the failures would vary.

What does it look like?
Tests that fail only when run as part of the entire test suite, but not when run individually. Which sure makes them hard to debug.
One thing you’ll see when this is happening is different tests failing each time. The test that shows up as failing or erroring isn’t actually the one with the problematic implementation — it’s the previously run JS feature test (or maybe even a JS feature test before that?) that sinned by ending while stuff was still going on in the Rails app. Which test was the previously run test will vary every run with a different seed, using rspec random ordering. RSpec’s default output doesn’t tell you which test was the previous one on a given run; and the RSpec ‘text’ formatter doesn’t really give the info in the format we want either (you have to translate from the human-readable label to a test file and line number yourself, which is sometimes infeasible). I’ve thought about writing an RSpec custom formatter that just prints out file/line information for each example as it runs, to give me some hope of figuring out which test is really leaving its business unfinished, but haven’t done so.
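That formatter idea can be sketched in a few lines of plain Ruby. This is hypothetical: the `example_started` hook and `Example#location` are from the RSpec 2.x formatter API, and a real formatter would subclass `RSpec::Core::Formatters::BaseTextFormatter` to inherit no-op defaults for the other callbacks.

```ruby
require 'stringio'

# Hypothetical minimal formatter: print each example's file:line as it
# starts, so the last line printed before a crash identifies the test
# that was running (or the one just before the real culprit).
class FileLineFormatter
  def initialize(output)
    @output = output
  end

  def example_started(example)
    @output.puts example.location   # e.g. ./spec/features/foo_spec.rb:12
  end

  # Swallow the other formatter callbacks RSpec invokes
  def method_missing(*); end
  def respond_to_missing?(*); true; end
end

# Stand-in for an RSpec example object, just to show the output:
FakeExample = Struct.new(:location)
out = StringIO.new
formatter = FileLineFormatter.new(out)
formatter.example_started(FakeExample.new("./spec/features/foo_spec.rb:12"))
puts out.string   # ./spec/features/foo_spec.rb:12
```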
It can be very hard to recognize when you are suffering from this problem, although when you can’t figure out what the heck else could possibly be going on, that’s a clue. It took me a bunch of hours to realize this was a possible thing, and the thing that was happening to me. Hopefully this very long blog post will save you more time than it costs you to read.
Different tests, especially but not exclusively feature tests, failing/erroring each time you run the very same codebase is a clue.
Another clue is when you see errors reported by RSpec as:

`Failure/Error: Unable to find matching line from backtrace`
I think that one is always an exception raised as a result of `Capybara.raise_server_errors = true` (the default) in the context of a feature test that left unfinished business. You might make those go away with `Capybara.raise_server_errors = false`, but I really didn’t want to go there, the last thing I want is even less information about what’s going on.
With Postgres, I also believe that `PG::TRDeadlockDetected: ERROR: deadlock detected` exceptions are symptomatic of this problem, although I can’t completely explain it and they may be unrelated (may be DatabaseCleaner-related, more on that later).
And I also still get my phantomjs processes sometimes dying unexpectedly; related? I dunno.
But I think it can also show up as an ordinary unreliable test failure, especially in feature tests.

So just don’t do that?
If I understand right, current Capybara maintainer Thomas Walpole understands this risk, and thinks the answer is: Just don’t do that. You need to understand what your app is doing under-the-hood, and make sure the Capybara test waits for everything to really be done before completing. Fair enough, it’s true that there’s no way to have reliable tests when the ‘unfinished business’ is going on. But it’s easier said than done, especially with complicated front-end JS (Angular, React/Flux, etc), which often actually try to abstract away whether/when an AJAX request is happening, whereas following this advice means we need to know exactly whether, when, and what AJAX requests are happening in an integration test, and deal with them accordingly.
I couldn’t completely get rid of problems that I now strongly suspect are caused by this kind of race condition between test examples, couldn’t completely get rid of the “unfinished business”.
But I managed to make the test suite a lot more reliable — and almost completely reliable once I turned off rspec random test order (doh), by dotting all my i’s in configuration…

Get your configuration right
There are a lot of interacting components in a Capybara JS Feature test, including: Rails itself, rspec, Capybara, DatabaseCleaner, Poltergeist. (Or equivalents or swap-outs for many of these).
They each need to be set up and configured right to avoid edge case concurrency bugs. You’d think this would maybe just happen by installing the gems, but you’d be wrong. There are a number of mis-configurations that can hypothetically result in concurrency race conditions in edge cases (even with all your tests being perfect).
They probably aren’t affecting you; they’re edge cases. But when faced with terribly confusing, hard-to-reproduce, race-condition-driven unreliable tests, don’t you want to eliminate any known issues? And when I did all of these things, I did improve my test reliability, even in the presumed continued presence of feature tests that don’t wait on everything (race condition category #2 above).

Update your dependencies
When googling, I found many concurrency-related issues filed for the various dependencies. I’m afraid I didn’t keep a record of them. But RSpec, Capybara, DatabaseCleaner, and Poltergeist have all had at least some known concurrency issues (generally with how they all relate to each other) in the past.
Update to the latest versions of all of them, to at least not be using a version with a known concurrency-related bug that’s been fixed.
I’m still on RSpec 2.x, but at least I updated to the last RSpec 2.x (2.14.1). And updated DatabaseCleaner, Capybara, and Poltergeist to the latest I could.

Be careful configuring DatabaseCleaner — do not use the shared connection monkey-patch
DatabaseCleaner is used to give all your tests a fresh-clean database to reduce unintentional dependencies.
For non-JS feature tests, you probably have DatabaseCleaner configured with the :transaction strategy — this is pretty cool: it makes each test example happen in an uncommitted transaction, and then just rolls back the transaction after every example. Very fast, very isolated!
But this doesn’t work with feature tests, because of the concurrency. Since JS feature tests boot a Rails app in another thread from your actual tests, using a different database connection, the running app wouldn’t be able to see any of the fixture/factory setup done in your main test thread in an uncommitted transaction.
So you probably have some config in spec/spec_helper.rb or spec/rails_helper.rb to try and do your JS feature tests using a different DatabaseCleaner mode.
Go back and look at the DatabaseCleaner docs and see if you are set up as currently recommended. Recently the DatabaseCleaner README made a couple of improvements to the recommended setup, making it more complicated but more reliable. Do what it says.
My previous setup wasn’t always properly identifying the tests that really needed a non-:transaction strategy; the improved suggestion does it with a `Capybara.current_driver == :rack_test` test, which should always work.
Do make sure to set `config.use_transactional_fixtures = false`, as the current suggestion will warn you about if you don’t.
Do use `append_after` instead of `after` to add your `DatabaseCleaner.clean` hook, to make sure database cleaning happens after Capybara is fully finished with its own cleanup. (It probably doesn’t matter, but why take the risk?)
It shouldn’t matter if you use :truncation or :deletion strategy; everyone uses “:truncation” because “it’s faster”, but the DatabaseCleaner documentation actually says: “So what is fastest out of :deletion and :truncation? Well, it depends on your table structure and what percentage of tables you populate in an average test.” I don’t believe the choice matters for the concurrency-related problems we’re talking about.
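Putting the above together, a setup along the lines the README currently recommends looks roughly like this (a sketch, not copied verbatim from the README; adjust to your versions):

```ruby
# spec/rails_helper.rb (sketch; assumes rspec-rails, capybara, database_cleaner)
RSpec.configure do |config|
  config.use_transactional_fixtures = false

  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation)
  end

  config.before(:each) do
    # rack_test runs in-process with no JS, so a rolled-back transaction
    # is safe; any real JS driver needs truncation (or deletion) instead
    DatabaseCleaner.strategy =
      Capybara.current_driver == :rack_test ? :transaction : :truncation
    DatabaseCleaner.start
  end

  config.append_after(:each) do
    # append_after: runs after Capybara's own per-example cleanup
    DatabaseCleaner.clean
  end
end
```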
Googling, you’ll find various places on the web advising (or copying advice from other places) monkey-patching the Rails ConnectionPool with a “shared_connection” implementation originated by José Valim, to make the :transaction strategy work even with Capybara JS feature tests. Do not do this. ActiveRecord has had a difficult enough time with concurrency without intentionally breaking it or violating its contract — the ActiveRecord ConnectionPool intends to give each thread its own database connection, and this hack intentionally breaks that. If you have any tests that exhibit “race conditions between examples” (a spec ending while activity is still going on in the Rails app), this hack WILL make things a lot WORSE. Hacking the tricky concurrency-related parts of ActiveRecord ConnectionPool is not the answer. Not even if lots of blog posts from years ago tell you to do it, not even if the README or wiki page for one of the components tells you to (I know one does, but now I can’t find it to cite it on a hall of shame); they are wrong. (This guy agrees with me, and so do others if you google.) It was a clever idea José had, but it did not work out, and it should not still be passed around the web.

Configure Rails under test to reduce concurrency and reduce concurrency-related problems
In a newly generated Rails 4.x app, if you look in `./config/environments/test.rb`, you’ll find this little hint, which you probably haven’t noticed before:

```ruby
# Do not eager load code on boot. This avoids loading your whole application
# just for the purpose of running a single test. If you are using a tool that
# preloads Rails for running tests, you may have to set it to true.
config.eager_load = false
```
If that sounds suggestive, it’s because by saying “a tool that preloads Rails for running tests”, this comment is indeed trying to talk about Capybara with a JS driver, which loads a Rails app in an extra thread. It’s telling you to set eager_load to true if you’re doing that.
Except in at least some (maybe all?) versions of Rails 4.x, setting `config.eager_load = true` will change the default value of `config.allow_concurrency` from false to true. So by changing that, you may now have `config.allow_concurrency` turned on without realizing it.
You don’t want that, at least not if you’re already dealing with a horrible, horrible race-condition test suite. Why not, you may ask, when our whole problem is concurrency? Shouldn’t we be better off telling Rails to allow it? Well, what this config actually does (in Rails 4.x; in 5.x I dunno) is control whether the Rails app will force every request to wait in line and be served one at a time (allow_concurrency false), or create multiple threads (more threads, even more concurrency!) to handle multiple overlapping requests.
This configuration might make your JS feature tests even slower, but when I’m already dealing with a nightmare of unreliable race condition feature tests, the last thing I want is even more concurrency.
I’d set:

```ruby
config.allow_concurrency = false
config.eager_load = true
```
Here in this Rails issue you can find a very confusing back and forth about whether `config.allow_concurrency = false` is really necessary for Capybara-style JS feature tests, or whether maybe only the allow_concurrency setting is necessary and you don’t really need to change `eager_load` at all, or whether the reason you need to set one or the other is actually a bug in Rails that was fixed in a Rails patch release, so that what you need to do may depend on what version you are using. At the end of it I still wasn’t sure what the Rails experts were recommending or what was going on, so I just set them both. Slower tests are better than terribly, terribly unreliable tests, and I’m positive this is the safest configuration.
All this stuff has been seriously refactored in Rails 5.0. In the best case, it will make it all just work; they’re doing some very clever stuff in Rails 5 to try and allow class-reloading even in the presence of concurrency. In the worst case, it’ll just be a new set of weirdness, bugs, and mis-documentation for us to figure out. I haven’t looked at it seriously yet. (As I write this, 5.0.0.beta2 has just been released.)

Why not make sure Warden test helpers are set up right
It’s quite unlikely to be related to this sort of problem, but if you’re using the Warden test helpers for devise, as recommended on the devise wiki for use with Capybara, you may not have noticed the part about cleaning up with `Warden.test_reset!` in an `after` hook.
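For reference, the setup recommended on the devise wiki looks roughly like this (a sketch; check the wiki for the exact recommendation for your devise and Warden versions):

```ruby
# spec/rails_helper.rb (or a support file) -- a sketch of the Warden
# test-helper setup for feature specs; details vary by version.
RSpec.configure do |config|
  config.include Warden::Test::Helpers, type: :feature

  config.before(:suite) { Warden.test_mode! }

  # The easy-to-miss part: reset Warden's test state between examples.
  config.after(:each, type: :feature) { Warden.test_reset! }
end
```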
This app had the Warden test helpers in it, but wasn’t doing the clean-up properly. When scouring the web for anything related to Capybara, I found this, and fixed it up to do as recommended. It’s really probably not related to the failures you’re having, but you might as well set things up as documented while you’re at it.

I wouldn’t bother with custom Capybara cleanup
While trying to get things working, I tried various custom `after` hooks with Capybara cleanup: various combinations of `Capybara.reset_session!`, `driver.reset!`, and others. I went down a rabbit hole trying to figure out exactly what these methods do (which varies from driver to driver), what they should do, and whether there is a bug in a driver’s implementation.
None of it helped ultimately. Capybara does its own cleanup for itself, and it’s probably good enough (especially if `DatabaseCleaner.clean` is properly set up with `append_after` to run after Capybara’s cleanup, as it should be). Spending a bunch of hours trying to debug or customize this didn’t get me much enlightenment or test-reliability improvement.

The Nuclear Option: Rack Request Blocker
Joel Turkel noticed the “unfinished business race condition” problem (his blog post helped me realize I was on the right track), and came up with some fairly tricky rack middleware attempting to deal with it by preventing the Rails app from accepting more requests if an outstanding thing is still going on from a feature test that didn’t wait on it.
I experimented with this, and it seemed both to make my tests much slower (not unexpected) and to not cure my problem: I was still getting race-condition failures for some reason. So I abandoned it.
But you could try it, I include it for completeness — it is theoretically the only path to actually guaranteeing against feature test “unfinished business”.
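The core idea can be sketched as middleware that counts in-flight requests and lets a test hook block until the count drops to zero. This is a simplified illustration of the approach, not Joel Turkel’s actual rack_request_blocker code:

```ruby
require "thread"

# Simplified sketch of the "request blocker" idea: count in-flight
# requests, and let a test hook block until the app is quiet. An
# illustration of the approach, not the real rack_request_blocker.
class RequestCounter
  def initialize(app)
    @app = app
    @count = 0
    @mutex = Mutex.new
    @idle = ConditionVariable.new
  end

  def call(env)
    @mutex.synchronize { @count += 1 }
    @app.call(env)
  ensure
    @mutex.synchronize do
      @count -= 1
      @idle.broadcast if @count.zero?
    end
  end

  # Call from a global before-example hook: waits for any "unfinished
  # business" left over from the previous feature test to complete.
  def block_until_idle
    @mutex.synchronize do
      @idle.wait(@mutex) until @count.zero?
    end
  end
end
```

In a real setup you would insert an instance into the test environment’s middleware stack and call `block_until_idle` from a global `before` hook; a real implementation also has to handle edge cases (such as requests that trigger further requests) that this sketch ignores.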
At first I thought it was really doing nothing different than what `config.allow_concurrency = false`, already built into Rails, was doing (allow_concurrency false puts in the Rack::Lock middleware already included with Rails).
But it actually is a bit more powerful — it will allow a unit test (or any test, including those not using a Capybara JS driver) to wait, at the beginning of its example, on the absolute completion of any unfinished business left by a feature test. Theoretically. I’m not sure why it didn’t work for me; it’s something you could try.

Sadly, maybe disable RSpec `config.order = "random"`
I did all of these things. Things did get better. (I think? The trick with non-reproducible failures is you never know if you are just having a run of luck, but I’m pretty sure I improved it). But they weren’t fixed. I still had unreliable tests.
Somewhere towards the end of this, after many hours, I realized my problem was really about the feature tests not waiting on ‘unfinished business’ (I didn’t discover these things in the same order this post is written!), and it would obviously be best to fix that. But I had some pretty complex semi-‘legacy’ front-end JS using a combination of Angular and React (neither of which I had experience with); it just wasn’t feasible, and I just wanted it to be over.
You know what did it?
Commenting out `config.order = "random"` from the RSpec configuration.
At first I had no idea why this would matter — sure, some random orders might be more likely to trigger race conditions than others, but this isn’t just a magic seed; it’s turning off random test ordering altogether.
Aha. Because when a JS feature test follows another JS feature test, `config.allow_concurrency = false` is decent (although far from perfect) at holding up the second feature test until the ‘unfinished business’ is complete — it won’t eliminate overlap, but it’ll reduce it.
But when one (or several or a dozen) ordinary tests follow the JS feature test with ‘unfinished business’, they don’t have `allow_concurrency = false` to protect them, since they aren’t using the full Rails stack with the middleware affected by this setting.
If you turn off random test ordering, all your feature tests end up running in sequence together, and all your other tests end up running in sequence together, without intermingling.
That was the magic that got me to, if not 100% reliable without race conditions, pretty darn reliable, enough that I only occasionally see a race-condition failure now.
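Before giving up randomization entirely, there is an option I did not end up trying: RSpec’s custom-ordering API looks like it can shuffle within each spec type while keeping all the feature specs together in one block. An untested sketch (it assumes `metadata[:type]` is set, as rails_helper does by directory):

```ruby
# spec/spec_helper.rb -- sketch: shuffle the non-feature groups, then
# run all feature groups together (also shuffled) at the end, so
# feature tests never intermingle with ordinary tests.
RSpec.configure do |config|
  config.register_ordering(:global) do |groups|
    features, others = groups.partition { |g| g.metadata[:type] == :feature }
    others.shuffle + features.shuffle
  end
end
```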
I don’t feel great about turning off test order randomization, but I also remember when we all wrote tests before RSpec even invented the feature, and we did fine. There’s probably also a way to get RSpec to randomize order _within_ types/directories, but still run all feature tests in a block, which should be just as good.

Postscript: Aggressively Minimize JS Feature Tests
I have come to the conclusion that it is extremely challenging and time-consuming to get Capybara JS feature tests to work reliably, and that this is a necessary consequence of the architecture involved. As a result, your best bet is to avoid or ruthlessly minimize the number of feature tests you write.
The problem is that what is necessary to avoid feature test “unfinished business” is counter to the very reason I write tests.
I want and need my tests to test interface (in the case of a feature test this really is user interface; in other cases, API), independent of implementation. If I refactor or rewrite the internals of an implementation, but intend for the interface to remain the same — I need to count on my tests passing if and only if the interface indeed remains the same. That’s one of the main reasons I have tests. That’s the assumption behind the “red-green-refactor” cycle of TDD (not that I myself really practice TDD strictly, but I think that workflow does capture the point of tests).
@twalpole, the current maintainer of Capybara, is aware of the “unfinished business” problem, and says that you basically just need to write your tests to make sure they wait:
So you either need to have an expectation in your flaky tests that checks for a change that occurs when all ajax requests are completed or with enough knowledge about your app it may be possible to write code to make sure there are no ajax requests ongoing (if ALL ajax requests are made via jquery then you could have code that keeps checking until the request count is 0 for instance) and run that in an after hook that you need to define so it runs before the capybara added after hook that resets sessions….
….you still need to understand exactly what your app is doing on a given page you’re testing.
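If all of your AJAX really does go through jQuery, the “keep checking until the request count is 0” idea from that advice looks roughly like this common pattern (a sketch with assumptions: it only sees jQuery-initiated requests, and the hook ordering depends on it being defined after capybara/rspec is loaded):

```ruby
# spec/support/wait_for_ajax.rb -- sketch of "wait until jQuery's
# request count is 0". Only sees requests made through jQuery.
require "timeout"

module WaitForAjax
  def wait_for_ajax(timeout = Capybara.default_max_wait_time)
    Timeout.timeout(timeout) do
      sleep 0.1 until page.evaluate_script("jQuery.active").zero?
    end
  end
end

RSpec.configure do |config|
  config.include WaitForAjax, type: :feature

  # Plain `after` hooks run in reverse definition order, so defining
  # this after capybara/rspec is loaded makes it run before Capybara's
  # own session-resetting hook, as the maintainer's advice requires.
  config.after(:each, type: :feature) { wait_for_ajax }
end
```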
The problem with this advice is it means the way a test is written is tightly coupled to the implementation, and may need to be changed every time the implementation (especially JS code) is changed. Which kind of ruins the purpose of tests for me.
It’s also very challenging to do if you have a complex JS front-end (Angular, React, Ember, etc.), which often intentionally abstracts away exactly when AJAX requests are occurring. You’ve got to go spelunking through abstraction layers to write the test right in the first place, and again every time there’s any implementation change which might affect things.
Maybe even worse, fancy new JS front-end techniques often produce AJAX requests with no visible UI change (transparently syncing state on the back end, perhaps only producing UI change in error cases, “optimistic update” style), which means that to write a test that properly waits for “unfinished business” you’d need to violate another piece of Capybara advice. As original Capybara author @jnicklas wrote, “I am firmly convinced that asserting on the state of the interface is in every way superior to asserting on the state of your model objects in a full-stack test” — Capybara is written to best support use cases where you only test the UI, not the back-end.
It’s unfortunate, because there are lots of things that make UI-level integration/feature tests attractive:
- You’re testing what ultimately matters, what the user actually experiences. Lower-level tests can pass with the actual user-facing app still broken if based on wrong assumptions, but feature tests can’t.
- You haven’t figured out a great way to test your JS front-end in pure JS and integrate it into your CI, but you already know how to write Ruby/Rails feature tests.
- You are confronting an under-tested “legacy” app, whose internals you don’t fully understand, and you need better testing to be confident in your refactoring — it makes a lot of sense to start with UI feature tests, and is sometimes even recommended for approaching an under-tested legacy codebase.
There are two big reasons to try and avoid feature tests with a JS-heavy front-end though: 1) They’re slow (inconvenient), and 2) They are nearly infeasible to make work reliably (damning, especially on a legacy codebase).
Until/unless there’s a robust, well-maintained (ideally by Capybara itself, to avoid yet another has-to-coordinate component) lower-level solution along the lines of rack_request_blocker, I think all we can do is avoid Capybara JS feature tests as much as possible: stick to the bare minimum of ‘happy path’ scenarios you can get away with (also common feature-test advice). It’s less painful than the alternative.
If you’re looking for consulting or product development with Rails or iOS, I work at Friends of the Web, a small company that does that.
Guest post by advocacy expert Stephanie Vance.
February marks the start of the budget and appropriations busy season in Washington, DC – and by March we’re into the full swing of funding requests, Committee markups and, as always, potential cuts. Last year, the question of whether the Library Services and Technology Act (LSTA) would continue to be funded was a nail-biter. Library advocates spoke out, though, and we were able to save the program, receiving just under $183 million.
But our work’s not done. We need to continue to keep LSTA off the chopping block! Members of Congress are less likely to reduce or eliminate funding for programs that directly and profoundly benefit their communities – and the only way they’ll know about that is through you.
To be most effective, you’ll want to know about a few fundamental differences between three types of money-related bills: “Authorization,” “Budget,” and “Appropriations.” Each of these goes through a different procedure in different committees at different times with different people — you get the drift. Here’s a basic breakdown:
- Authorizations: These are bills that, when passed into law, allow programs to exist. LSTA, for example, was originally authorized in 1996. In general, money can’t be spent on a program if it’s not authorized. In addition, legislators use the authorization process to propose changes to programs, some of which may be helpful, and others not-so-much. Programs are often up for reauthorization, which is one reason why we always need to make sure legislators know how critical LSTA funds are to their communities.
- Budget: This is a blueprint that outlines how Budget Committee policymakers THINK we should spend money, including on LSTA. It’s not binding and doesn’t become law, but it’s an important way to set priorities. The President just proposed his budget on February 9th. Congress will propose one (or several) over the next few months. Then they fight it out in the…
- Appropriations: This is where we get the cold, hard cash, specifically the approximately $183 million we’re so interested in for LSTA. Legislators and their staff need to know about the fabulous things these grants funded in the past back home in their communities. That’s the only way they’ll support them in the future.
Don’t get us started on “Continuing Resolutions” and “Omnibus” bills which you might hear more about this Fall. We’ll bore your ears off. Suffice to say that not all of this goes as smoothly as we’d like and the rules get changed. All. The. Time.
We need to show support for LSTA at each step of the process, which takes place year ‘round. That’s why it feels like we’re ALWAYS asking you to send an e-mail, make a phone call, post on social media, schedule a visit, attend a town hall, or send a carrier pigeon. If we keep the pressure on throughout the year, we may just get that $183 million – or more!
The post $183 million reasons you should keep track of the U.S. budget process appeared first on District Dispatch.
Serving Veterans @ your Library: Learning about California’s Veteran Resource Centers and How to Develop your Community Based Service Model for Veterans
Thursday, February 25, 2016
2:00-3:00 PM EST (11:00 AM-12:00 PM PST)
The needs of veterans are often in the news and libraries are well positioned to provide services to veterans. Are you looking for ideas for serving military veterans in your community?
California public libraries have a model for you!
This webinar will introduce you to the network of Veteran Resource Centers in California public libraries, http://calibrariesforveterans.org/
The network is an LSTA project funded through the California State Library which is working to connect veterans and their families to benefits and services for which they may be eligible. There are 38 “Veteran Resource Center” sites in public libraries around the state.
The project is in partnership with the California Department of Veterans Affairs (CalVet), https://www.calvet.ca.gov/Pages/Staff-Bios.aspx
Jennifer E. Manning, Co-Chair, Subcommittee on E-Government Services, ALA Committee on Legislation
- Karen Bosch Cobb, Library Consultant, Infopeople, Co-manager of Veterans Connect @ the Library program in California
- Jacquie Brinkley— Library Consultant, Infopeople, Co-manager of Veterans Connect @ the Library program in California
This webinar will be recorded and available for future viewing on the ALA Washington Office website at http://www.districtdispatch.org/category/webinars/
The post Upcoming Free Webinar on Serving Veterans @ Your Library appeared first on District Dispatch.
The VIAF Auto Suggest API is currently experiencing an unplanned outage
D-Lib: Desktop Batch Import Workflow for Ingesting Heterogeneous Collections: A Case Study with DSpace 5
D-Lib: Leveraging Heritrix and the Wayback Machine on a Corporate Intranet: A Case Study on Improving Corporate Archives
Harvard's Berkman Center recently published a study entitled Don’t Panic: Making Progress on the ‘Going Dark’ Debate. The study group was convened by Matt Olsen, Bruce Schneier, and Jonathan Zittrain, and the New York Times reports:
Among the chief authors of the report is Matthew G. Olsen, who was a director of the National Counterterrorism Center under Mr. Obama and a general counsel of the National Security Agency.
Two current senior officials of the N.S.A. — John DeLong, the head of the agency’s Commercial Solutions Center, and Anne Neuberger, the agency’s chief risk officer — are described in the report as “core members” of the group, but did not sign the report because they could not act on behalf of the agency or the United States government in endorsing its conclusions, government officials said.

Two of the report's conclusions are:
- Networked sensors and the Internet of Things are projected to grow substantially, and this has the potential to drastically change surveillance. The still images, video, and audio captured by these devices may enable real-time intercept and recording with after-the-fact access. Thus an inability to monitor an encrypted channel could be mitigated by the ability to monitor from afar a person through a different channel.
- Metadata is not encrypted, and the vast majority is likely to remain so. This is data that needs to stay unencrypted in order for the systems to operate: location data from cell phones and other devices, telephone calling records, header information in e-mail, and so on. This information provides an enormous amount of surveillance data that was unavailable before these systems became widespread.
James Clapper, the US director of national intelligence, was more direct in testimony submitted to the Senate on Tuesday as part of an assessment of threats facing the United States.
“In the future, intelligence services might use the [internet of things] for identification, surveillance, monitoring, location tracking, and targeting for recruitment, or to gain access to networks or user credentials,” Clapper said.

Xeni Jardin at BoingBoing points out that:
The war on encryption waged by the F.B.I. and other intelligence agencies is unnecessary, because the data trails we voluntarily leak allow “Internet of Things” devices and social media networks to track us in ways the government can access.

Not to mention that yesterday's revelation of a buffer overflow in glibc means that much of the IoT is vulnerable and is unlikely to be patched. This bug won't be the last of its kind, so the agencies are likely to retain their ability to spy even if IoT vendors clean up their act.
The intelligence agencies aren't the only ones enjoying their new-found surveillance capabilities. J.M. Porup at Ars Technica reports:
Shodan, a search engine for the Internet of Things (IoT), recently launched a new section that lets users easily browse vulnerable webcams.
The feed includes images of marijuana plantations, back rooms of banks, children, kitchens, living rooms, garages, front gardens, back gardens, ski slopes, swimming pools, colleges and schools, laboratories, and cash register cameras in retail stores, according to Dan Tentler, a security researcher who has spent several years investigating webcam security. And sleeping babies.
Cory Doctorow at BoingBoing starts a recent book review:
Nitesh Dhanjani's 2015 O'Reilly book Abusing the Internet of Things: Blackouts, Freakouts, and Stakeouts is a very practical existence-proof of the inadequacy and urgency of Internet of Things security.
Abusing the Internet of Things is structured just like one of those cookbooks, only the recipes explain the (relatively simple) steps you need take to compromise everything from a smart lightbulb -- one recipe explains how to plunge a smart lighting system into permanent, irrevocable darkness -- to a smart baby-monitor (this was published months before a family in San Francisco woke to discover a griefer terrorizing their toddler through his bedside monitor) to a smart TV to -- what else? -- a smart car.

Of course, companies can avoid any liability for harm caused by their shoddy security practices by copying VTech and disclaiming responsibility in their terms and conditions. Commissioner Julie Brill of the FTC addressed these problems in her closing plenary of the Fall 2015 CNI meeting. But Cory Doctorow points out the pressure on the FTC not to do anything:
Companies that use and trade in personal information rely on the people involved not discovering what's going on. For example, one data-broker sells its services to retailers as a means of secretly getting the home addresses and other information of their customers, avoiding "losing customers who feel that you’re invading their privacy." Of course, many of these information leaks are accidental, but the sad truth is that the pervasive surveillance enabled by the Internet, and especially by the Things in it, is not just the favored business model of government agencies but also of pretty much any company that can figure out how to collect and sell information. Conor Friedersdorf at The Atlantic has an example of one such company, Vigilant Solutions:
Throughout the United States—outside private houses, apartment complexes, shopping centers, and businesses with large employee parking lots—a private corporation, Vigilant Solutions, is taking photos of cars and trucks with its vast network of unobtrusive cameras. It retains location data on each of those pictures, and sells it.
The company has taken roughly 2.2 billion license-plate photos to date. Each month, it captures and permanently stores about 80 million additional geotagged images. They may well have photographed your license plate. As a result, your whereabouts at given moments in the past are permanently stored. Vigilant Solutions profits by selling access to this data (and tries to safeguard it against hackers). Your diminished privacy is their product.
Supreme Court jurisprudence on GPS tracking suggests that repeatedly collecting data “at a moment in time” until you’ve built a police database of 2.2 billion such moments is akin to building a mosaic of information so complete and intrusive that it may violate the Constitutional rights of those subject to it.
Many powerful interests are aligned in wanting to know where the cars of individuals are parked. Unable to legally install tracking devices themselves, they pay for the next best alternative—and it’s gradually becoming a functional equivalent. More laws might be passed to stymie this trend if more Americans knew that private corporations and police agencies conspire to keep records of their whereabouts.

You are paying for this technology through your taxes:
During the past five years, the U.S. Department of Homeland Security has distributed more than $50 million in federal grants to law-enforcement agencies—ranging from sprawling Los Angeles to little Crisp County, Georgia, population 23,000—for automated license-plate recognition systems.

And also because police forces are either paying Vigilant for access to its data or giving them a piece of the action:
Vigilant Solutions, one of the country’s largest brokers of vehicle surveillance technology, is offering a hell of a deal to law enforcement agencies in Texas: a whole suite of automated license plate reader (ALPR) equipment and access to the company’s massive databases and analytical tools—and it won’t cost the agency a dime.
Vigilant is leveraging H.B. 121, a new Texas law passed in 2015 that allows officers to install credit and debit card readers in their patrol vehicles to take payment on the spot for unpaid court fines, also known as capias warrants. When the law passed, Texas legislators argued that not only would it help local government with their budgets, it would also benefit the public and police.
The “warrant redemption” program works like this. The agency is given no-cost license plate readers as well as free access to LEARN-NVLS, the ALPR data system Vigilant says contains more than 2.8 billion plate scans and is growing by more than 70 million scans a month. This also includes a wide variety of analytical and predictive software tools. Also, the agency is merely licensing the technology; Vigilant can take it back at any time.
The government agency in turn gives Vigilant access to information about all its outstanding court fees, which the company then turns into a hot list to feed into the free ALPR systems. As police cars patrol the city, they ping on license plates associated with the fees. The officer then pulls the driver over and offers them a devil’s bargain: get arrested, or pay the original fine with an extra 25% processing fee tacked on, all of which goes to Vigilant.

Alex Campbell and Kendall Taggart's The Ticket Machine is a must-read, in-depth look at how this works. Yves Smith at naked capitalism points out how the private equity (PE) and security worlds have become intertwined:
And a contact who knows the private equity world pointed out (emphasis mine):
Morgan Stanley’s PE arm lists [Vigilant Solutions parent, Digital Recognition Network] as an investment, which is a good example of how enmeshed PE has become in the security/intelligence state. The spooks, I am sure, love the secrecy of PE compared to public markets. And the love goes both ways, as it is my experience that PE people love the spooks because returns are able to be generated by influence peddling behind closed doors, and also because they just find them intellectually interesting.

In other words, proprietary opposition research, which is often hard to distinguish from blackmail. No-one is in a position to make educated trade-offs between the societal and personal costs and benefits of the Things they connect to their own Internet. How much worse is it when they had no say in what the Things were and where they were deployed?
Notice the similarities between this rush to exploit technical capabilities without regard for the possible downsides and the escalation of cyberwar. James Ball at Buzzfeed reports that Alex Gibney's new documentary Zero Days reveals that:
The United States hacked into critical civilian and military infrastructure in Iran to allow its operatives to disable the country with a devastating series of cyberattacks at a moment’s notice,
The targets of the U.S. hacking operations, covered by the code name “NITRO ZEUS,” include power plants, transport infrastructure, and air defenses, the film will state, with agents entering these protected systems nightly to make sure the attacks were still deployable.
the U.S.-Israel “Stuxnet” worm — which destroyed around 1 in 5 of the centrifuges used in Iran’s nuclear program — was just a small part of a much larger set of offensive capabilities developed against the nation.

The cyberwarriors don't seem to have paid much attention to the possible downsides:
The State Department was seen by those in other agencies as a “wet blanket” when it came to operations for expressing concerns about violating the sovereignty of third-party nations’ cyberspace, or about operations that could have significant impact on civilians.
one confidential source expressed concerns to Gibney about the extent of NITRO ZEUS, saying some planners had “no fucking clue” as to the consequences of some of the proposed attacks.
“You take down part of a grid,” they told him, “you can accidentally take down electricity in the entire country.”

At least Michael Hayden seems to have a "f**king clue":
Michael Hayden, a former director of both the CIA and the NSA, told Gibney the U.S. action risks creating new international norms of cyber warfare.
“I know no operational details and don’t know what anyone did or didn’t do before someone decided to use the weapon, all right,” he said. “I do know this: If we go out and do something, most of the rest of the world now thinks that’s a new standard, and it’s something they now feel legitimated to do as well.”

Exactly. And why would anyone think that the US was less vulnerable than Iran to these kinds of attacks? But USA Inc's terms and conditions mean that the cyberwarriors bear no liability for the foreseeable consequences of their actions.
David Sanger and Mark Mazzetti at the New York Times have other details of the NITRO ZEUS program.
Similarly, the NSA has a program called SKYNET that uses machine-learning techniques on data collected from Pakistan's phone network to target, for assassination by drone or death squad, people whose behavior is deemed characteristic of terrorists. Machine-learning experts estimate that it may have killed thousands of innocent people. Cory Doctorow's response on Dave Farber's IP list is a must-read analysis of the problem:
At root, this is a story about the problems that occur in the absence [of] adversarial peer review. NSA and GCHQ cut corners in their machine-learning approach, and no one called them on it, and they deployed it, and it kills people.
But is also a microcosm of the spy services' culture of secrecy and the way that the lack of peer review turns into missteps.
You could ask for no better proof that the NSA believed its actions would never be subjected to public scrutiny than the fact that they called the program SKYNET. We all remember how that turned out, right?
Skynet launched nuclear missiles under its command at Russia, which responded with a nuclear counter-attack against the U.S. and its allies. Consequent to the nuclear exchange, over three billion people were killed in an event that came to be known as Judgment Day.