Space Vatican

Ramblings of a curious coder

Threading the Rat - on Capybara and Database Connections

At Dressipi we use rspec request specs for our integration testing and these days we use capybara rather than webrat. A lot of our specs use the built in rack-test driver but a fair few use capybara-webkit (I’d like to look at poltergeist but haven’t quite got round to it yet).

Database connection management has long been an issue with tools such as capybara or selenium. You want to wrap your tests in transactions because that makes them much faster: you don’t have to cleanup the database after each spec and you can load content shared by a bunch of specs once and once only. But transactions being problems: capybara runs your app in a separate thread to your spec, so your app a separate database connection. Transactions are a per-connection thing so the transaction wrapping your spec have no effect on the connection used by the app thread: changes made by the app aren’t rolled back. On top of that by default transactions can’t see uncommitted changes from other transactions, so your app thread can’t see seed data your spec code has created. If you change the transaction isolation level then connections can see uncommitted transactions, however you’ll very soon end up with deadlocks in your code due to the locks acquired by the separate transactions.

Typically you do very little database access from your request specs, just insert a record here or check some state there, so for a while I used a bit of a hack to allow the spec code to run code inside the app thread. This was pretty messy.

Sharing is good but only if you behave

A little while ago (I don’t know who claims credit for this), the following bit of code started doing the rounds

1
2
3
4
5
6
7
8
class ActiveRecord::Base
  class_attribute :shared_connection

  def self.connection
    self.shared_connection || retrieve_connection
  end
end
ActiveRecord::Base.shared_connection = ActiveRecord::Base.connection

It’s included as part of cucumber-rails as the SharedConnectionStrategy. What this does it fairly simple. Normally the connection method locates a fresh connection for your thread via Rails’ connection pool. This monkeypatch allows you to force it to always return the same connection, so both your spec code and the capybara app use the same connection, despite being on separate threads. As a result the transaction begun at the start of spec encompasses any changes made by the app thread. Your spec code can make changes visible by the app thread and you spec code can see any changes made by the app.

If you’re keen on your threading you’ll probably smell a rat: things like database connections typically aren’t meant to be used simultaneously by multiple threads simultaneously.

In ruby 1.8 and in the early days of 1.9 you’d usually get away with this: database queries called into C code to do its querying and then returned ruby results. Calling into C code blocks the entire VM and so you’d never actually run 2 queries simultaneously.

The mysql2 gem is much smarter - it uses rb_thread_blocking_region to escape from ruby’s GVL. If you abuse this and actually run 2 queries against the same connection simultaneously then you’ll start getting errors such as Mysql2::Error: This connection is still waiting for a result. I’m not a postgres user but given that the pg gem uses the async api it’s probably subject to similar issues: you’re not supposed to call PQsendQuery until you’ve called PQgetResult enough times to get all of the results.

It’s quite easy to get away with this- we ran with that patch for a long time with no issues. I suspect that this is down to a lot of tests inherently building in serialization: your spec code spends most of its time waiting for capybara finder calls to complete. Those calls only complete when capybara finds the element on the page, which will usually mean that the page has loaded, ie the thread serving your app is now idle. As a result there won’t be much in the way of database queries from the app until you call a method like visit or click, so you can access the database from your spec without fear.

More complicated web apps widen the window of opportunity substantially: for example the main page load might trigger several ajax calls in the background. We started writing request specs against code like this and they would fail unpredictably.

Mike Perham suggested using this instead:

1
2
3
4
5
6
7
8
9
class ActiveRecord::Base
  mattr_accessor :shared_connection
  @@shared_connection = nil

  def self.connection
    @@shared_connection || ConnectionPool::Wrapper.new(:size => 1) { retrieve_connection }
  end
end
ActiveRecord::Base.shared_connection = ActiveRecord::Base.connection

When ActiveRecord::Base.connection is called the first time, @@shared_connection is nil, so what is returned is a ConnectionPool::Wrapper. This gets assigned to @@shared_connection so all subsequent calls to connection return the ConnectionPool::Wrapper. This is an object that uses method_missing to forward everything to the connection it wraps, but only allowing one thread access at a time.

We use spork, so

1
ActiveRecord::Base.shared_connection = ActiveRecord::Base.connection

line goes in our Spork.each_run block - if you do it before then the shared connection ends up being created in the spork parent process which will cause no end of trouble.

This really feels to me like it should work - I can’t fault the logic, yet we were still getting the occasional random Mysql2::Error: This connection is still waiting for a result error when running our request specs.

Reusing the wheel

Active Record already has a connection pool, so what could go wrong with just setting that pool size to 1 in database.yml? Surely that will ensure that we’re not using multiple connections simultaneously. The only issue here is that, for performance reasons, once a thread checks out a connection it can’t be used by another thread until you check it back in (unlike the connection pool wrapper that checks things in and out for you around each method call). In a normal rails app, the ActiveRecord::ConnectionAdapters::ConnectionManagement middleware checks connections back in for you after the request, but clearly that doesn’t apply to your spec code.

There are 2 connections that need to be managed (assuming you app doesn’t spawn threads of its own): the one used by your spec code and the one used by the app. Normally the app code would be handled by the aforementioned middleware but this is deactivated in specs (if request.env['rack.test'] is true) because it’s normally not desirable in tests: if you are using the standard rails integration tests then everything executes in the same thread and so checking the connection back in at the end of the request would actually mess things up. There did’nt seem to be a good way of overriding this behaviour, so I ended up just sticking this in test.rb

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
class TestConnectionManagement
  class Proxy # :nodoc:
    attr_reader :body

    def initialize(body)
      @body    = body
    end

    def method_missing(method_sym, *arguments, &block)
      @body.send(method_sym, *arguments, &block)
    end

    def respond_to?(method_sym, include_private = false)
      super || @body.respond_to?(method_sym)
    end

    def each(&block)
      body.each(&block)
    end

    def close
      body.close if body.respond_to?(:close)
      ActiveRecord::Base.clear_active_connections!
    end
  end

  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    [status, headers, Proxy.new(body)]
  rescue
    ActiveRecord::Base.clear_active_connections!
    raise
  end
end
config.middleware.insert_before ActiveRecord::ConnectionAdapters::ConnectionManagement, TestConnectionManagement

This is a straight up copy of the original middleware except that it checks connections in even if rack.test is set.

The other connection is used by your spec code. I haven’t come up with anything better than calling ActiveRecord::Base.clear_active_connections! (aliased to something shorter to type) before calling visit, click etc. This could possibly be improved by magically calling it when the appropriate capybara driver methods are called. If you forget to do this then your spec code will hog the database connection and the capybara thread will hang and eventually raise an exception saying that it couldn’t checkout a connection.

We’ve been using this setup for a little while now and (fingers crossed) random request spec failures seem to be a thing of the past.