Space Vatican

Ramblings of a curious coder

Sporking With Akephalos

Spork is a test server for saving on environment startup time when running rspec or cucumber. Akephalos is an awesome capybara driver that wraps the htmlunit library. Because htmlunit is a java library, akephalos spawns a jruby process and then uses DRb to allow non-jruby instances to get at the htmlunit goodness.

If you try and combine the two, you get a somewhat uninviting error message:

Fs-ElephantBook-2:dressipi fred$ cucumber --drb features/user_signs_up.feature 
Using the default profile...
Disabling profiles...
.F---------------------------

(::) failed steps (::)

0x00000083df8904 is not id value (RangeError)
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:375:in `_id2ref'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:375:in `to_obj'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1405:in `to_obj'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1713:in `to_obj'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:613:in `recv_request'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:908:in `recv_request'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1533:in `init_with_client'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1545:in `setup_message'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1497:in `perform'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1592:in `block (2 levels) in main_loop'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1588:in `loop'
(druby://127.0.0.1:8990) /Users/fred/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/drb/drb.rb:1588:in `block in main_loop'
(druby://127.0.0.1:60779) -e:1
./features/step_definitions/authentication_steps.rb:31:in `block (2 levels) in '
./features/step_definitions/authentication_steps.rb:30:in `each_pair'
./features/step_definitions/authentication_steps.rb:30:in `/^I sign up with$/'
features/user_signs_up.feature:10:in `When I sign up with'

Which is a shame, as spork would be especially useful here: akephalos can take quite a few seconds to launch its jruby process.

DRb

First let’s step back and remind ourselves how DRb works. DRb (short for distributed ruby) is a distributed object system for ruby. In the simplest of cases, when you call a method on a remote object, your arguments are marshalled and transmitted over the wire. In the remote process, the arguments are unpacked, the method called and then the result is marshalled and sent back to you.
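To make the marshalling step concrete, here is a minimal sketch of what effectively happens to arguments on their way across the wire. It doesn’t use DRb itself, only Marshal (which DRb relies on for dumping), and the args array is a made-up example:

```ruby
# Hypothetical arguments for some remote method call.
args = ['fred', { 'retries' => 3 }]

# What the calling side does: turn the arguments into bytes.
wire = Marshal.dump(args)

# What the receiving side does: rebuild the arguments from those bytes.
copy = Marshal.load(wire)

puts copy == args       # same contents...
puts copy.equal?(args)  # ...but a different object: the remote side works on a copy
```

The upshot is that the remote method gets its own copy of each argument, so mutating an argument over there has no effect on the caller’s object.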

Obviously this doesn’t always work: some objects (procs, IO streams etc.) can’t be dumped. A very common example is calling a method and passing it a block. So, in addition to the scheme described above, DRb can also make proxies for objects. Instead of getting or passing a marshalled dump of an object, it uses a proxy that knows which process the object came from (a uri like druby://127.0.0.1:8990) and something identifying the object within that process. By default this is the object’s object_id, which DRb can then use to find the actual object. This can require some careful coding: if the object appears unused in the process that owns it, it might get garbage collected. If that happens, then when you try to call a method on it DRb won’t be able to turn the object_id back into an actual object, and you get an exception like the one above.
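As a rough illustration of the two halves of that (an object that can’t be dumped, and an id that can be turned back into the object), here is a small sketch using DRb::DRbIdConv, the id converter DRb uses by default. The block is just a stand-in I made up; no DRb server is involved:

```ruby
require 'drb'

block = proc { |x| x * 2 }

# Procs can't be marshalled, which is why DRb falls back to a proxy for them.
dumpable = begin
  Marshal.dump(block)
  true
rescue TypeError
  false
end

# The default converter maps an object to an id and back again.
conv = DRb::DRbIdConv.new
ref  = conv.to_id(block)
same = conv.to_obj(ref).equal?(block)

puts dumpable  # false
puts same      # true: the id leads back to the very same object
```

That lookup only makes sense inside the process that owns the object; hand the same id to a different process and there is nothing sensible for it to find.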

When you run code such as

some_drb_object.do_foo { }

then there are actually DRb calls happening in both directions. The client asks the server process to run do_foo, but because a block is passed to the method, DRb wraps that block in a proxy object and sends it to the server along with the arguments. When do_foo calls yield, a DRb call happens in the reverse direction in order to invoke the block.
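Here is a small sketch of that round trip, assuming a Unix-like system where fork is available. The Greeter class and its each_name method are made up for the example; the forked child plays the server, and the parent plays the client passing a block:

```ruby
require 'drb'

# Hypothetical remote object: yields each name to the caller's block.
class Greeter
  def each_name
    %w[alice bob].each { |name| yield name }
    Process.pid # tell the caller which process ran the method
  end
end

reader, writer = IO.pipe
server_pid = fork do
  reader.close
  DRb.start_service('druby://127.0.0.1:0', Greeter.new) # port 0: pick a free port
  writer.puts DRb.uri
  writer.close
  DRb.thread.join
end
writer.close
server_uri = reader.gets.chomp
reader.close

# The client needs a DRb server of its own, so the block can be called back.
DRb.start_service('druby://127.0.0.1:0')

remote = DRbObject.new_with_uri(server_uri)
seen = []
remote_pid = remote.each_name { |name| seen << [name, Process.pid] }

Process.kill('KILL', server_pid)
Process.wait(server_pid)

puts "method ran in pid #{remote_pid}, block ran in pid #{seen.first.last}"
```

The method body runs in the child, but each yield comes back to the parent as a DRb call on the block’s proxy, which is why the pids recorded inside the block are the parent’s.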

Akephalos and Spork

Back to the original problem. When you run some specs or cucumber features with spork, spork forks off a new instance and then uses DRb to run your specs/features in that process. It’s the combination of that use of DRb with akephalos’ use of DRb that causes the problem. After digging around I traced the exception raised to this line in akephalos:

page.find(selector).each { |node| nodes << Node.new(self, node) }

page is a DRb object from akephalos’s jruby process that wraps the htmlunit representation of the page. A little more digging revealed that the invalid id was in fact referring to the block passed to each. You may have noticed that the error is being raised from druby://127.0.0.1:8990, i.e. from spork’s parent process.

There was already a DRb server running in spork’s parent process (druby://127.0.0.1:8990), and because our test-running process was forked from that process, when DRb packaged up the block into a proxy object it gave that server’s uri as the uri to use when making calls to the block. When each yielded to the block, instead of the method call going to the process that called each, it went to the spork parent process. Because the call went to the wrong process, the object id passed was entirely meaningless there.

Fixing it

I’m not entirely sure of the best way to fix this - akephalos is blissfully ignorant of this aspect of its environment. I was able to get my cucumber features running with both spork and akephalos by doing this in my env.rb:

Spork.prefork do
  # ... my usual Spork.prefork stuff
  Akephalos::RemoteClient.manager #force akephalos to be started by the parent process
                                  #so that we don't keep starting it again and again
end

Spork.each_run do
  Thread.current['DRb'] = { 'server' => DRb::DRbServer.new }
end

This ensures that when DRb is constructing a proxy object it uses the uri for the freshly constructed DRbServer as the uri for any proxy objects it passes, and so block invocations and the like go to the correct process.
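You can see the effect in isolation with a sketch like the one below, which needs neither spork nor akephalos. The first server stands in for the one inherited from spork’s parent process (an assumption for the sake of the example), and __drburi is the DRbObject method that reports which uri a proxy points at:

```ruby
require 'drb'

# Stand-in for the DRb server inherited from the spork parent process.
DRb.start_service('druby://127.0.0.1:0') # port 0: pick a free port
inherited_uri = DRb.uri

block = proc { |node| node }

# Without the fix, proxies are stamped with the inherited server's uri.
before_uri = DRbObject.new(block).__drburi

# The fix: make a fresh server the current one for this thread.
fresh = DRb::DRbServer.new('druby://127.0.0.1:0')
fresh_uri = fresh.uri
Thread.current['DRb'] = { 'server' => fresh }

# Proxies created from now on point at the fresh server instead.
after_uri = DRbObject.new(block).__drburi

puts before_uri == inherited_uri  # true
puts after_uri == fresh_uri       # true
```

Note that Thread.current['DRb'] is per thread, so as far as I can tell this only changes the uri for proxies built on the thread that runs Spork.each_run; that was enough in my case.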