Space Vatican

Ramblings of a curious coder

Starting EC2 Instances With Ephemeral Storage

Amazon EC2 instances can have two forms of storage attached: EBS, which is basically a SAN, and instance local storage. The instance local storage is free (the amount you get depends on the instance type), but doesn’t persist once the instance is terminated. EBS storage is pricier (you pay per GB), but is persistent and you can do neat things like snapshot volumes or move volumes from one instance to another.
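As a sketch of what attaching the ephemeral drives at launch can look like with the fog gem: the key is the block device mapping, which names the ephemeral volumes you want exposed. The AMI id, flavour and device names below are illustrative assumptions, not values from the post - which devices exist depends on the instance type.

```ruby
require 'fog'

compute = Fog::Compute.new(
  :provider              => 'AWS',
  :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
)

server = compute.servers.create(
  :image_id  => 'ami-12345678',   # hypothetical AMI
  :flavor_id => 'm1.large',
  # Map the instance's ephemeral drives to device names; without a
  # mapping some instance types won't expose them at all.
  :block_device_mapping => [
    { 'DeviceName' => '/dev/sdb', 'VirtualName' => 'ephemeral0' },
    { 'DeviceName' => '/dev/sdc', 'VirtualName' => 'ephemeral1' }
  ]
)
server.wait_for { ready? }
```

This needs AWS credentials and makes real API calls, so treat it as a starting point rather than something to paste in verbatim.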

Using Glacier From Ruby With Fog

Glacier is Amazon’s latest storage service, designed for long term storage of data. It’s not a direct replacement for S3: retrieving data is slow and potentially expensive, so stay away if you expect to retrieve data often or if waiting a few hours for retrieval is too slow.

One good use case would be if you have to keep data for a very long time for regulatory reasons. Storage is a lot cheaper than S3: it costs $0.01/GB/month. By comparison, if you have less than 50TB S3 charges $0.11/GB, and its 1-5PB rate is $0.08/GB (more on pricing later). In case the long-termness needs hammering in, Amazon actually charges a fee for data deleted within 90 days of upload.

As of early September 2012 the AWS ruby sdk doesn’t include Glacier support. If like me you want to use Glacier from your ruby apps, one of your options is the fog gem, which supports Glacier from version 1.6 onwards.
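A hedged sketch of what an upload looks like with fog's Glacier models: you create (or look up) a vault, then create an archive in it. The vault name and file here are illustrative assumptions; the archive id fog returns is what you'll need later to retrieve or delete the data.

```ruby
require 'fog'

glacier = Fog::AWS::Glacier.new(
  :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
)

# Vault creation is idempotent-ish: creating an existing vault is fine.
vault = glacier.vaults.create(:id => 'backups')

# Upload a file as an archive. Glacier has no filenames, just opaque
# archive ids, so record the id somewhere - you can't list archives
# without requesting an inventory job (which takes hours).
archive = vault.archives.create(:body => File.new('backup.tar.gz'))
puts archive.id
```

Again this needs real AWS credentials and network access, so it's a shape to follow rather than a runnable snippet.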

Threading the Rat - on Capybara and Database Connections

At Dressipi we use rspec request specs for our integration testing and these days we use capybara rather than webrat. A lot of our specs use the built-in rack-test driver but a fair few use capybara-webkit (I’d like to look at poltergeist but haven’t quite got round to it yet).

Database connection management has long been an issue with tools such as capybara or selenium. You want to wrap your tests in transactions because that makes them much faster: you don’t have to clean up the database after each spec and you can load content shared by a bunch of specs once and once only. But transactions bring problems: capybara runs your app in a separate thread to your spec, so your app gets a separate database connection. Transactions are a per-connection thing, so the transaction wrapping your spec has no effect on the connection used by the app thread: changes made by the app aren’t rolled back. On top of that, by default transactions can’t see uncommitted changes from other transactions, so your app thread can’t see seed data your spec code has created. If you change the transaction isolation level then connections can see uncommitted transactions, however you’ll very soon end up with deadlocks in your code due to the locks acquired by the separate transactions.

Typically you do very little database access from your request specs, just insert a record here or check some state there, so for a while I used a bit of a hack to allow the spec code to run code inside the app thread. This was pretty messy.
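For context, the widely-circulated alternative to hacks like this (not the approach described in the post) is to force every thread to share one ActiveRecord connection, so the app thread runs inside the same transaction as the spec. A sketch of that workaround, with a hypothetical file location:

```ruby
# spec/support/shared_connection.rb (hypothetical location)
#
# Make every thread use the same database connection, so the app
# thread started by capybara sees - and is rolled back with - the
# transaction wrapping each spec. Caveat: sharing one connection
# across threads is itself unsafe if the app issues concurrent
# queries, which is part of why it counts as a workaround.
class ActiveRecord::Base
  mattr_accessor :shared_connection
  @@shared_connection = nil

  def self.connection
    @@shared_connection || retrieve_connection
  end
end

ActiveRecord::Base.shared_connection = ActiveRecord::Base.connection
```

This assumes a Rails app with ActiveRecord loaded; it has no effect (and won't load) outside one.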

Stripping Invalid UTF-8

In Ruby 1.9 strings aren’t just dumb collections of bytes - there is an associated encoding which tells ruby how to interpret those bytes. With some encodings, anything goes: any byte sequence is legal (although I suppose statistical analysis could show it to be unlikely, given the language), but with encodings such as UTF-8 some sequences are invalid. If you’ve got a string with such a sequence then at some point you’ll get the dreaded “invalid byte sequence in UTF-8” error message (you can use valid_encoding? to test the validity of a string’s encoding rather than waiting for it to blow up).

Most of the time this is because the string isn’t UTF-8 in the first place, so you need to tell ruby the correct encoding using force_encoding (either because you know the encoding or by using a gem such as charguess). A Stack Overflow question got me thinking about another case: you have a string that is mostly UTF-8 but which has been mangled in some way. The best case scenario is obviously working out the mangling and reversing it, but sometimes you might just want to cut your losses and salvage what is there.
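One way to cut those losses in 1.9 (a sketch, not necessarily what the post settles on; Ruby 2.1 later added String#scrub for exactly this) is to round-trip the string through another encoding, replacing invalid sequences with nothing - the transcode forces ruby to validate every byte sequence:

```ruby
# Drop invalid byte sequences from a mostly-UTF-8 string by
# transcoding to UTF-16 and back. :invalid/:undef => :replace with an
# empty replacement silently discards anything unrepresentable.
def strip_invalid_utf8(string)
  string.encode('UTF-16', :invalid => :replace, :undef => :replace, :replace => '')
        .encode('UTF-8')
end

mangled = "caf\xC3\xA9 \xFF broken".force_encoding('UTF-8')
mangled.valid_encoding?                 # => false

clean = strip_invalid_utf8(mangled)
clean.valid_encoding?                   # => true
clean                                   # => "café  broken"
```

Note the salvage is lossy: the invalid bytes are simply gone, so only use this once you've given up on recovering the original mangling.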

Fixing an Accidental Git Push --force

I did something stupid with git today. I was working alone on a topic branch and had just rebased it against master. I did git push --force to push my changes to github. The output wasn’t quite what I expected:

 + e9c00be...2652a00 garment-quiz -> garment-quiz (forced update)
 + d91922d...2fec250 release-2012-07-04 -> release-2012-07-04 (forced update)

What’s the second update doing there? I had a local branch tracking our latest release branch and that local branch wasn’t up to date - I hadn’t pulled changes made by other people (but all of my commits had been previously pushed to the repo). When I force pushed, git pushed everything it was tracking, overwriting commits colleagues had made to the release branch.

With git it’s rather hard to actually lose data and this was no exception. I was able to restore the branch to its previous state by explicitly pushing the commit ref of the branch’s previous state (helpfully given in the output from git push):

git push origin d91922d:release-2012-07-04

Since all my changes had been pushed normally, all I had done was rewind that branch, so git was happy for me to add those commits back in (or at least that’s my interpretation - I’m no git wizard).