Space Vatican

Ramblings of a curious coder

Threading the Rat - on Capybara and Database Connections

At Dressipi we use rspec request specs for our integration testing and these days we use capybara rather than webrat. A lot of our specs use the built-in rack-test driver but a fair few use capybara-webkit (I'd like to look at poltergeist but haven't quite got round to it yet).

Database connection management has long been an issue with tools such as capybara or selenium. You want to wrap your tests in transactions because that makes them much faster: you don't have to clean up the database after each spec and you can load content shared by a bunch of specs once and once only. But transactions bring problems: capybara runs your app in a separate thread from your spec, so your app gets a separate database connection. Transactions are a per-connection thing, so the transaction wrapping your spec has no effect on the connection used by the app thread: changes made by the app aren't rolled back. On top of that, by default transactions can't see uncommitted changes from other transactions, so your app thread can't see seed data your spec code has created. If you change the transaction isolation level then connections can see uncommitted changes, but you'll very soon end up with deadlocks due to the locks acquired by the separate transactions.
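One widely circulated workaround (shown here only as a sketch, and not necessarily the approach this post goes on to describe) is to force every thread onto a single shared ActiveRecord connection, so the app thread runs inside the transaction wrapping the spec. The class name and file placement below are illustrative:

```ruby
# spec/support/shared_db_connection.rb (illustrative path)
# Force all threads to share one connection so the app thread joins
# the transaction wrapping the spec. Note this trades the original
# problem for potential thread-safety issues on the shared connection.
class ActiveRecord::Base
  mattr_accessor :shared_connection
  @@shared_connection = nil

  def self.connection
    @@shared_connection || retrieve_connection
  end
end
ActiveRecord::Base.shared_connection = ActiveRecord::Base.connection
```

Because the connection is genuinely shared, any locking the adapter does matters: this works for the light database access typical of request specs, but heavy concurrent access through one connection can still bite.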

Typically you do very little database access from your request specs, just inserting a record here or checking some state there, so for a while I used a bit of a hack that let the spec code run code inside the app thread. This was pretty messy.

Stripping Invalid UTF-8

In Ruby 1.9 strings aren't just dumb collections of bytes - there is an associated encoding which tells ruby how to interpret those bytes. With some encodings anything goes: any byte sequence is legal (although I suppose statistical analysis could show it to be unlikely, given the language), but with encodings such as UTF-8 some sequences are invalid. If you've got a string with such a sequence then at some point you'll get the dreaded "invalid byte sequence in UTF-8" error message (you can use valid_encoding? to test the validity of a string's encoding rather than waiting for it to blow up).

Most of the time this is because the string isn't UTF-8 in the first place, so you need to tell ruby the correct encoding using force_encoding (either because you know the encoding or by using a gem such as charguess). A Stack Overflow question got me thinking about another case: you have a string that is mostly UTF-8 but which has been mangled in some way. The best case scenario is obviously working out the mangling and reversing it, but sometimes you might just want to cut your losses and salvage what is there.
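As an illustration of the cut-your-losses approach, here's one common trick (not necessarily the one the post settles on): transcoding a string to its own encoding won't replace invalid bytes in Ruby 1.9, so you round-trip through another Unicode encoding to scrub them out.

```ruby
# A string that is mostly UTF-8 but contains an invalid byte (0xE9 is
# latin-1 'é'; on its own it's an illegal sequence in UTF-8):
mangled = "caf\xE9 au lait".force_encoding("UTF-8")
mangled.valid_encoding?   # => false

# Round-trip through UTF-16LE, replacing invalid bytes with nothing,
# then come back to UTF-8:
clean = mangled.encode("UTF-16LE", invalid: :replace, replace: "").encode("UTF-8")
clean                     # => "caf au lait"
clean.valid_encoding?     # => true
```

On Ruby 2.1 and later, `String#scrub("")` does the same job directly.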

Fixing an Accidental Git Push –force

I did something stupid with git today. I was working alone on a topic branch and had just rebased it against master. I ran git push --force to push my changes to github. The output wasn't quite what I expected:

 + e9c00be...2652a00 garment-quiz -> garment-quiz (forced update)
 + d91922d...2fec250 release-2012-07-04 -> release-2012-07-04 (forced update)

What’s the second update doing there? I had a local branch tracking our latest release branch and that local branch wasn’t up to date - I hadn’t pulled changes made by other people (though all of my own commits had previously been pushed to the repo). When I force pushed, git pushed every branch it was tracking, overwriting commits colleagues had made to the release branch.

With git it’s rather hard to actually lose data and this was no exception. I was able to restore the branch to its previous state by explicitly pushing the commit ref of the branch’s previous state (helpfully given in the output from git push):

git push origin d91922d:release-2012-07-04

Since all my changes had been pushed normally, all I had done was rewind that branch so git was happy for me to add those commits back in (or at least that’s my interpretation - I’m no git wizard.)
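To guard against this kind of accident in future (assuming the old "matching" push default is what made git push --force touch both branches), you can tell git to only push the branch you're actually on, and name the branch explicitly whenever you force push:

```shell
# Older git defaults to push.default=matching: a bare `git push` (and a
# bare `git push --force`) sends every local branch whose name exists on
# the remote, stale ones included. Limit pushes to the current branch:
git config --global push.default current
git config --global push.default   # prints: current
```

This became the spirit of the default ("simple") in git 2.0, but on older installs it's worth setting yourself.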

Who’s Afraid of the Big Bad Lock?

Unless you’ve been living under a rock for the past few years you will have heard of C-ruby’s GIL/GVL, the Global Interpreter/VM Lock (I prefer to think of it as the Giant VM Lock). While your ruby threads may sit on top of native pthreads, this big bad lock stops more than one of them running ruby code at the same time. Allegedly one of the main reasons behind this was to protect non-threadsafe ruby extensions and to shield us from the horrors of threading. Personally, I feel that a lot of 3rd party gems needed updating for 1.9 anyway (particularly with encoding related issues), so it would have been a good opportunity to make that change too. The complexities of threading, locking etc. could be handled by providing higher level abstractions over them (actors etc.).

True concurrency isn’t completely dead though. Ruby can already release the GVL when a thread is blocked on IO and if you are writing a C extension you can release the lock too. The mysql2 gem does this: clearly there is no point in holding onto the GVL when you’re just waiting on mysql to return results. Similarly Eric Hodel recently submitted a patch to the zlib extension so that the lock is released while zlib is doing its thing. This obviously doesn’t make mysql queries or zlib run any faster individually but it means you can run many in parallel and that these operations don’t block other unrelated threads. When even laptops have hyperthreaded quad-core processors, this is a good thing.
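The blocking-IO case is easy to see for yourself: sleep, like a blocked read, releases the GVL, so blocked threads overlap instead of queueing. A minimal sketch:

```ruby
require "benchmark"

# sleep releases the GVL (as blocking IO does), so these ten 0.2s
# waits run concurrently rather than back to back:
elapsed = Benchmark.realtime do
  10.times.map { Thread.new { sleep 0.2 } }.each(&:join)
end
# elapsed comes out close to 0.2s, not 2s
```

A CPU-bound loop in each thread would show the opposite: wall time roughly equal to the serial time, because only one thread can hold the GVL at once.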

Goodbye Mephisto, Hello Octopress

For a long time, this blog was hosted on an ageing mephisto install. I took some time this weekend to migrate to octopress. It’s much more actively maintained and more modern in many ways (a responsive layout, nicer syntax highlighting options, and integration with things like gists and jsfiddle, to name a few). These days a lot of my blogging is done on the train, so I was writing in a text editor and then pasting into mephisto, which was just convoluted.