Space Vatican

Ramblings of a curious coder

First, Foremost and [0]

This doesn’t work (the something field will not be updated):

post.comments.first.something = true
post.comments.first.save

but this does

post.comments[0].something = true
post.comments[0].save

as does

post.comments.each {|c| puts c.id}
post.comments.first.something = true
post.comments.first.save

All of these would have worked in Rails 2.0.x and earlier versions. So what changed?

Quacks like a duck but breathes fire

As you may know, post.comments looks an awful lot like an array but isn’t one. It’s an association proxy. It has its own methods for things like finding objects in the database, a count method that does an SQL count, and so on. When you ask it to do something that can only really be done with the Ruby objects in memory, it loads the objects from the database into an actual Ruby array and passes the method call on to that array (this all happens via method_missing). So far business as usual: Rails has been like this for a long, long time. In particular, were you to call first or last on an association, the whole array would be loaded and first or last would then be called on that array.
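The lazy loading described above can be imitated in plain Ruby. This is only a rough sketch of the idea, not how Rails implements it: LazyCollection, load_count and the loader block are names made up for the illustration.

```ruby
# A toy proxy (hypothetical, not Rails code) that only builds its array
# when an Array method is actually called on it.
class LazyCollection
  attr_reader :load_count

  def initialize(&loader)
    @loader = loader      # stands in for the database query
    @target = nil         # the in-memory array, not loaded yet
    @load_count = 0
  end

  def loaded?
    !@target.nil?
  end

  private

  # Runs the loader once and remembers the result.
  def load_target
    unless loaded?
      @target = @loader.call
      @load_count += 1
    end
    @target
  end

  # Anything Array understands is forwarded to the loaded array.
  def method_missing(name, *args, &block)
    if Array.method_defined?(name)
      load_target.send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    Array.method_defined?(name) || super
  end
end

comments = LazyCollection.new { [:first_comment, :second_comment] }
comments.loaded?   # nothing has been loaded yet
comments.size      # forwarded via method_missing, which loads the array
comments.loaded?   # now the array is in memory
```

The point of the sketch is just that the proxy looks like an array from the outside while deciding for itself when the real array gets built.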

This sort of depends on your problem domain, but a lot of the time loading the entire array just to look at the first or last element of it is wasteful. You’ve always been able to do some_association.find :first (and as of 2.1 some_association.find :last) but that flows a little less easily off the tongue and of course doesn’t play nicely if you pass your association to some code that thinks it’s just working with an array. So a few months ago changes were made to make first and last load just that one item from the database[1]. Of course if the target array is already loaded then it just returns the first item from that array.

At the end of the day what this means is that in the first example I gave, each call to post.comments.first returns a different object, i.e. the one we call save on is not the one we made the change to. The second and third examples are OK purely because they force the array to be loaded, which in turn means that subsequent calls to first return the element from that array rather than hitting the database[2].
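To make the identity problem concrete, here is a small plain-Ruby imitation. It is a sketch under assumptions: ToyComments, Comment and fetch are invented names, and fetch merely pretends to be a database query by building a new object each time.

```ruby
# Hypothetical record type and proxy, only meant to mimic the behaviour
# described above.
Comment = Struct.new(:id, :something)

class ToyComments
  def initialize(rows)
    @rows = rows     # pretend these are rows in the database
    @target = nil    # in-memory array, loaded on demand
  end

  def loaded?
    !@target.nil?
  end

  # Unloaded: build a fresh object for just the first row.
  # Loaded: hand back the object already sitting in the array.
  def first
    loaded? ? @target.first : fetch(@rows.first)
  end

  # Indexing needs the whole array, so it forces a load.
  def [](index)
    load_target[index]
  end

  private

  def load_target
    @target ||= @rows.map { |row| fetch(row) }
  end

  # Stand-in for a query: always returns a brand new object.
  def fetch(row)
    Comment.new(row[:id], row[:something])
  end
end

comments = ToyComments.new([{ :id => 1, :something => false }])

comments.first.something = true
comments.first.something                  # still false: a different object

comments[0]                               # forces the array to load
comments.first.equal?(comments.first)     # true: same object from now on
```

Whatever the proxy does, holding the result of first in a local variable and calling save on that same variable sidesteps the problem entirely.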

Of course if you’re doing things right your unit tests would catch this sort of thing, but it’s still likely to leave you scratching your head a little (I certainly recall spending a few minutes looking at code very similar to the first example and wondering why it no longer worked). Slightly more subtle are performance problems, for example if you were iterating over various attributes then you’d be hitting the database each time to load somethings.first.

I’m not sure what to think about this sort of thing. There is a perfectly sound rationale for doing this but it introduces little ifs, buts and maybes into the illusion that association proxies behave like Arrays. As far as performance goes the implications vary. For big associations it can be a huge win, other times loading 1 object instead of 3 will make little to no difference. In other places I do genuinely want to load the whole array but I’d rather write first than [0] if I’m accessing the first element. Maybe the example I gave is a little artificial, maybe not, but at the end of the day, first no longer being a synonym for [0] is a habit that is hard to break.


[1] I’m simplifying things quite considerably here - there are a number of edge cases which that code has to tread around quite carefully, including unsaved parent objects, unsaved children objects, custom finder sql etc…

[2] While I’ve concentrated on first, everything I’ve said here applies to last too. In a way it’s slightly worse in that (at least for me) the difference in comfort between writing last and [-1] is greater than the difference between first and [0].

Do You Know When Your Code Runs?

Few things are more head bashing inducing than code that passes all unit tests, runs perfectly on your development machine but fails on your staging/production servers. In that vein, both of these examples are wrong:

class Person < ActiveRecord::Base
  has_many :posts
  has_many :recent_posts, :class_name => 'Post', :conditions => ["created_at > ?", 1.week.ago]
  validates_inclusion_of :birth_date, :in => (20.years.ago..13.years.ago),
                            :message => "You must be a teenager to signup", :on => :create
end

class Post < ActiveRecord::Base
  named_scope :recent, :conditions => ["created_at > ?", 1.week.ago]
end

In development mode this will work absolutely fine. When you deploy this code onto the production server it will work fine too, but after a while it won’t behave quite right. For example a person’s recent_posts will start returning posts older than 1 week.

The key to this is understanding when the code runs. In particular when does “1.week.ago” get turned into an instance of Time with some fixed value such as 1st November 2008 at 20:32?

The statements has_many, validates_inclusion_of etc… are just method calls, so their arguments are evaluated when that function is called. You can look in the options hash for an association to see this (assuming you’ve just typed in the Person class given above):

Person.reflections[:recent_posts].options
=> {:conditions=>["created_at > ?", Sun Nov 02 14:27:26 +0000 2008]}

So when are these functions called? Quite simply when Ruby loads person.rb. In development mode your source code is reloaded for each request[1], providing the illusion that the “1.week.ago” is re-evaluated whenever it is used. In production mode person.rb would only be read once per Rails instance, and so once your mongrels had been running for a week recent_posts (and likewise Post.recent) would return anything written in the last 2 weeks (that is, since 1 week before the time at which your mongrels were launched). You would also notice this if you were running script/console and keeping an eye on the SQL generated: you’d see that the date in the WHERE clause didn’t change.
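You can see the same effect without Rails at all. The following is a minimal sketch, assuming a fake clock so the numbers are predictable: FakeClock, ToyModel and toy_scope are invented for the example and merely play the roles of Time, a model class and a macro such as has_many.

```ruby
# FakeClock is a hypothetical stand-in for Time.now, so the result is deterministic.
module FakeClock
  class << self
    attr_accessor :now
  end
  self.now = 100
end

class ToyModel
  def self.scope_options
    @scope_options ||= {}
  end

  # Hypothetical macro: just stores whatever options it was given.
  def self.toy_scope(name, options)
    scope_options[name] = options
  end

  # The argument is computed right here, while the class body is being read.
  toy_scope :recent, :cutoff => FakeClock.now - 7
end

ToyModel.scope_options[:recent][:cutoff]   # 93, computed at load time
FakeClock.now = 200                        # "a long time later"
ToyModel.scope_options[:recent][:cutoff]   # still 93, nothing re-evaluates it
```

Reloading the class body (which is what development mode effectively does) is the only thing that would compute a new cutoff here.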

Fixing it.

Fortunately it’s not hard to fix this. In the case of the awesome named_scope you probably already know that you can supply a Proc when you want your scope to take arguments. We can equally supply one that takes no arguments, just to ensure that the time condition is evaluated whenever the scope is accessed.

class Post
  # The lambda is called each time the scope is used, so 1.week.ago is always fresh
  named_scope :recent, lambda { {:conditions => ["created_at > ?", 1.week.ago]} }
end
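Why the lambda helps can be shown with a toy macro in plain Ruby. Again this is a sketch rather than anything Rails does internally: TickClock, LazyModel, toy_scope and options_for are all hypothetical names.

```ruby
# TickClock is a hypothetical stand-in for Time.now.
module TickClock
  class << self
    attr_accessor :now
  end
  self.now = 100
end

class LazyModel
  def self.scopes
    @scopes ||= {}
  end

  # Hypothetical macro: accepts either a plain hash or something callable.
  def self.toy_scope(name, options)
    scopes[name] = options
  end

  # Callable options are evaluated on every access,
  # plain hashes are returned exactly as they were stored.
  def self.options_for(name)
    options = scopes.fetch(name)
    options.respond_to?(:call) ? options.call : options
  end

  toy_scope :eager_recent, :cutoff => TickClock.now - 7
  toy_scope :lazy_recent, lambda { { :cutoff => TickClock.now - 7 } }
end

TickClock.now = 200
LazyModel.options_for(:eager_recent)[:cutoff]   # frozen at class load
LazyModel.options_for(:lazy_recent)[:cutoff]    # recomputed from the current clock
```

The eager version captured a value, the lazy version captured a recipe for computing the value, and that is the whole difference.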

For conditions on things like associations we can use a little trick called interpolation. As I’m sure you know, when Ruby encounters "#{ 'hello world' }" it evaluates the code inside the #{}, but if you use single quotes (or equivalents such as %q()) then it doesn’t. What you may not know is that Active Record will perform that interpolation itself, later, at the point where the SQL is generated. For example we can write the recent posts association like this:

class Person < ActiveRecord::Base
  has_many :recent_posts, :class_name => 'Post',
           :conditions => 'created_at > #{self.class.connection.quote 1.week.ago}'
end

When person.rb is loaded the stuff in the #{} will not be evaluated, however when Active Record generates the sql needed to load the association it will be[2].
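The mechanics of deferred interpolation can be sketched in a few lines of plain Ruby. This is a simplified illustration and not Active Record’s actual code: ToyRecord, interpolate and TEMPLATE are assumed names, and evaluating strings like this is only reasonable for strings you wrote yourself.

```ruby
# A hypothetical record with one attribute to interpolate.
class ToyRecord
  attr_accessor :cutoff

  def initialize(cutoff)
    @cutoff = cutoff
  end

  # Wraps the template in double quotes and evaluates it in the context
  # of this instance, so #{...} can call this object's methods.
  def interpolate(template)
    escaped = template.gsub('"') { '\"' }
    instance_eval(%("#{escaped}"))
  end
end

# Single quotes: Ruby leaves the #{} alone when this line is read.
TEMPLATE = 'created_at > #{cutoff}'

record = ToyRecord.new(93)
record.interpolate(TEMPLATE)   # the #{} is evaluated now, against record
record.cutoff = 193
record.interpolate(TEMPLATE)   # and again, picking up the new value
```

The template itself never changes; only the moment at which the #{} is evaluated does.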

Validations can’t play any of the clever little games that the other 2 examples can. You’ll just have to do something like

class Person < ActiveRecord::Base
  validate_on_create :is_a_teenager

  def is_a_teenager
    unless birth_date < 13.years.ago && birth_date > 20.years.ago
      ...
    end
  end
end
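The reason a method works where the class-level range did not is that the bounds are worked out on every call. Here is a standalone sketch of just that check, using Date from the standard library in place of 13.years.ago; teenager? and its today parameter are made up for the illustration.

```ruby
require 'date'

# Hypothetical helper: true when birth_date falls strictly between
# 20 years and 13 years before `today`. The bounds are computed on
# every call rather than once when the file is loaded.
def teenager?(birth_date, today = Date.today)
  oldest   = today << (20 * 12)   # Date#<< steps back by a number of months
  youngest = today << (13 * 12)
  birth_date > oldest && birth_date < youngest
end

born = Date.new(1993, 6, 1)
teenager?(born, Date.new(2008, 11, 2))   # within the range on this date
teenager?(born, Date.new(2018, 11, 2))   # same birth date, ten years on
```

Passing the current date in as an argument also makes this sort of check much easier to unit test.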

[1] Assuming you’ve got config.cache_classes set to false in development mode which is the default

[2] You can do a lot more with interpolation. Normally the code is interpolated in the context of the instance of the model so you can use any model methods, instance variables etc… When an association is fetched with :include it will be interpolated in the context of the class (since the whole point is to bulk load instances, it does not make sense, nor would it work, to use per-instance data in there).

Required or Not ?

One of Rails’ slightly gnarly areas is all the magic that goes into enabling the automatic reloading of source in development mode[1]. Reloading a class isn’t just as simple as reading the source again: that would just reopen the class. While this would allow you to add or change existing methods, it wouldn’t allow you to remove methods, change the class an object inherits from, stop including a module and things like that. In the particular context of Rails this would also cause validations, filters and callbacks to be added repeatedly. You also don’t want to reload absolutely everything. For example reloading standard ruby libraries would be pointless (and slow) as would be reloading Rails itself and (usually) plugins.

A related service that Rails’ dependencies system provides is autoloading of constants. Rails hooks const_missing: when an unknown constant is found Rails will try to determine the name of the file containing it (according to Rails’ conventions) and search for it in the appropriate folders. After a request (or when you call reload!) Rails unsets the constant. This means that reading the corresponding file again will create a new class rather than reopening the old one. It also means that the next use of that constant will cause const_missing to be hit again and load the class.
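The const_missing and unset cycle is easy to imitate on a small scale. What follows is a toy under stated assumptions, not Rails’ dependency system: ToyApp, SOURCES, AUTOLOADED and this reload! are invented, and a hash of lambdas stands in for source files on disk.

```ruby
module ToyApp
  # Hypothetical stand-in for source files: each entry builds a class.
  SOURCES = { :Customer => lambda { Class.new } }
  # Names of the constants this toy loader has defined.
  AUTOLOADED = []

  # Ruby calls this when ToyApp::Something is referenced but not defined.
  def self.const_missing(name)
    return super unless SOURCES.key?(name)
    AUTOLOADED << name
    const_set(name, SOURCES[name].call)
  end

  # Forget every constant we loaded, so the next reference loads it afresh.
  def self.reload!
    AUTOLOADED.each { |name| remove_const(name) }
    AUTOLOADED.clear
  end
end

before = ToyApp::Customer          # triggers const_missing, defines the constant
ToyApp::Customer.equal?(before)    # same class while the constant stays defined
ToyApp.reload!                     # constant removed
after = ToyApp::Customer           # const_missing again, a brand new class
after.equal?(before)               # a different class object
```

Note that the constant is removed rather than the class being modified in place, which is why the second lookup yields a completely new class.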

require messes with reloading

The long and short of this is that Rails needs to track what needs to be reloaded (i.e. which constants it should remove). When a file is loaded via Rails’ dependency system, all the constants are stashed away, in Dependencies.autoloaded_constants[2]. At the end of the request all of those constants are removed. But if you have bypassed the Rails dependency system then it won’t get that treatment. Here’s an example script/console session

>> Customer.object_id
=> 19116470
>> reload!
Reloading...
=> true
>> Object.constants.include?('Customer')
=> false
>> Customer.object_id
=> 18966210

The reload! function does the reloading that Rails would do at the end of a request. Here everything is happening as normal: we’ve let Rails handle the loading and after the reload the Customer constant is removed, ensuring we then get a fresh copy of the Customer class. Now let’s try something different: explicitly require customer.rb:

>> require 'customer'
=> ["Customer"]
>> Customer.object_id
=> 19121220
>> reload!
Reloading...
=> true
>> Customer.object_id
=> 19121220

Lo and behold: the Customer class isn’t being reloaded. Had you done this in a real app you would find that changes to the customer file weren’t being picked up until you restarted the server. Even more confusingly it would be fine until you loaded a file that did such a require but thereafter changes would have no effect, even on pages where previously it worked.
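In miniature, the difference between the two console sessions comes down to bookkeeping. The sketch below is a deliberately crude model: ToyDeps, load_tracked and load_untracked are hypothetical, with load_untracked playing the part of a bare require that the reloader never hears about.

```ruby
module ToyDeps
  # Names of the constants the reloader knows about.
  AUTOLOADED = []

  # Stand-in for loading through the dependency system: define and record.
  def self.load_tracked(name)
    AUTOLOADED << name
    const_set(name, Class.new)
  end

  # Stand-in for a bare require: define the constant but record nothing.
  def self.load_untracked(name)
    const_set(name, Class.new)
  end

  # Only the recorded constants are removed.
  def self.reload!
    AUTOLOADED.each { |name| remove_const(name) }
    AUTOLOADED.clear
  end
end

ToyDeps.load_tracked(:Order)
ToyDeps.load_untracked(:Customer)
ToyDeps.reload!

ToyDeps.const_defined?(:Order, false)      # removed, will be loaded afresh
ToyDeps.const_defined?(:Customer, false)   # still the old class
```

Nothing about the untracked class is wrong in itself; the reloader simply has no record that it should be thrown away.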

Fun with associations

A lot of problems happen when you have something hanging onto an old version of a class. One way that can happen in a Rails app is via associations. Suppose our Customer class has an orders association.

>> require 'customer'
=> ["Customer"]
>> Customer.find(1).orders
=> [#<Order id: 1, customer_id: 1>]
>> Order.object_id
=> 18291410
>> Customer.reflections[:orders].klass.object_id
=> 18291410
>> Customer.reflections[:orders].klass.instance_methods - ActiveRecord::Base.instance_methods
=> ["build_customer", "create_customer", "belongs_to_before_save_for_customer", "customer",
"customer=", "my_instance_method", "set_customer_target"]

Everything is as we would expect it. Customer.reflections[:orders] returns an AssociationReflection object which is something that describes an association. It holds data like what kind of association it is, any options that were supplied (eg :foreign_key, :counter_cache) and so on. In particular its klass attribute is the ActiveRecord::Base subclass for the association. Here we can see that that class is the same as Order which we would expect.

The association’s class has the methods you would expect: some methods to deal with the customer association that Order has and an instance method we added. So far so good. Let’s reload the code:

>> reload!
Reloading...
=> true
>> Customer.find(1).orders
=> [#<Order id: 1, customer_id: 1>]

Superficially things look fine, but if we dig a little deeper, everything has gone horribly wrong. The first clue is this:

>> Order.object_id
=> 18680200
>> Customer.reflections[:orders].klass.object_id
=> 18291410

This tells us that the Order class is no longer the same class as the class referenced by the association. Because Order was loaded via Rails’ dependencies system it was reloaded when we did reload!, but as we saw before Customer wasn’t. This causes quite a few problems, for example

>> Customer.find(1).orders << Order.new
ActiveRecord::AssociationTypeMismatch: Order(#18291410) expected, got Order(#18680200)

Oh noes! When you add a record to a collection Active Record checks that it is of the correct type, but the Customer class is trying to check that the object is an instance of the old Order class, which it isn’t. The fun thing about this sort of situation is that it will work fine the first time you view the page after restarting the server, but not the second or following times. Madness!
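The “Order expected, got Order” flavour of this error is plain Ruby behaviour: two distinct classes can share a name. A minimal sketch, where ToyShop and define_order are made up and define_order stands in for a reload redefining the constant:

```ruby
module ToyShop
  # Hypothetical helper: (re)defines ToyShop::Order as a brand new class
  # and returns it, much as a reload would.
  def self.define_order
    remove_const(:Order) if const_defined?(:Order, false)
    const_set(:Order, Class.new)
  end
end

old_order_class = ToyShop.define_order   # what a stale association still points at
new_order_class = ToyShop.define_order   # what the constant refers to after a reload

record = ToyShop::Order.new

old_order_class.name == new_order_class.name   # same name on both classes
record.is_a?(new_order_class)                  # an instance of the current class
record.is_a?(old_order_class)                  # but not of the stale one
```

Any type check made against the stale class will therefore reject perfectly good new instances, which is the situation the error message is describing.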

There’s more stuff too. If we repeat our earlier test to list the instance methods of the association’s class we get this:

>> Customer.reflections[:orders].klass.instance_methods - ActiveRecord::Base.instance_methods
=> []
>> Customer.find(1).orders.customer
NoMethodError: undefined method `customer' for #<Class:0x23e34a4>

They’ve all gone. This can be more than a little baffling, when a page works fine but reloading it causes methods you know exist to just disappear into thin air. The culprit here is the reset_subclasses method in Active Record, which as its name implies, clears out classes. It only does this to autoloaded classes, which normally is fine because such classes are just thrown away and never used again, but we’re hanging onto this gutted class and trying to use it[3]. Even if this gutting of classes didn’t happen you’d still have a lot of confusion: instances of Order retrieved via the association would be the old class and so wouldn’t reflect any changes you had made to the source, but instances created directly would.

Just don’t do it

By now you’ve probably got the message that using require to load your models can cause some weird stuff to happen. Loading classes behind Rails’ back just gets things confused. There are two ways to stop this happening:

  • Just don’t require stuff. If you let Rails’ automagic loading do its work none of this will happen
  • If you do need to require stuff explicitly, use require_dependency. This means that Rails is kept in the loop

Of course require is fine for requiring gems, bits of standard libraries and so on, but using require to load bits of your own application should be viewed with suspicion. It only takes one require somewhere to mess things up, so be careful.


[1] Or to be quite precise, when config.cache_classes is set to false. If it is set to true (for example in production mode) nothing in this article applies

[2] In Rails 2.2 and higher, Dependencies was moved into the ActiveSupport namespace. If you’re running that version mentally prepend ActiveSupport:: wherever you see Dependencies. There are a lot of other settings in there that control all of this, for example load_once_paths and explicitly_unloadable_constants allow you to control what is reloaded and what isn’t.

[3] As far as I can tell and according to this thread the exact reason this is necessary is rather lost in the mists of time, possibly an artefact of previous implementations of Rails’ dependencies.

Selenium and Firefox 3

I recently spent a bit of time making our Selenium tests play nicely with Firefox 3 and spent quite a lot of it staring at

Preparing Firefox profile...

Selenium would launch Firefox, and then Firefox would just sit there doing nothing. Eventually some digging around found a ticket on the Selenium issue tracker. It turns out Selenium installs a tiny little extension into the Firefox profiles it generates that basically just lets Selenium kill Firefox by telling it to go to a magic chrome URL. Firefox extensions specify which versions they are compatible with, and the one embedded in Selenium had 2.0.0.* as its maximum version. This is still the case with the latest downloadable release, although you could of course download the nightly builds.

It seems that this was the only thing keeping Selenium and Firefox 3 from playing nicely together, as changing the maximum version to 3.0.* got all our tests passing again with our existing version of Selenium (0.9.2).

All I had to do was extract the relevant files from selenium-server.jar:

jar xf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/readystate@openqa.org/install.rdf
jar xf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/{538F0036-F358-4f84-A764-89FB437166B4}/install.rdf
jar xf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/\{503A0CD4-EDC8-489b-853B-19E0BAA8F0A4\}/install.rdf
jar xf selenium-server.jar \
customProfileDirCUSTFF/extensions/readystate\@openqa.org/install.rdf
jar xf selenium-server.jar \
customProfileDirCUSTFF/extensions/\{538F0036-F358-4f84-A764-89FB437166B4\}/install.rdf

This extracts the files (and the directory structure containing them). To be honest I’m not entirely sure of the difference between all of these extensions - safest bet seems to be changing them all. Now edit all of the .rdf files (they’re just text files) and change the maximum version from 2.0.0.* to whatever you want (for example 3.0.*) and put them back in the jar:

jar uf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/readystate@openqa.org/install.rdf
jar uf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/{538F0036-F358-4f84-A764-89FB437166B4}/install.rdf
jar uf selenium-server.jar \
customProfileDirCUSTFFCHROME/extensions/\{503A0CD4-EDC8-489b-853B-19E0BAA8F0A4\}/install.rdf
jar uf selenium-server.jar \
customProfileDirCUSTFF/extensions/readystate\@openqa.org/install.rdf
jar uf selenium-server.jar \
customProfileDirCUSTFF/extensions/\{538F0036-F358-4f84-A764-89FB437166B4\}/install.rdf

Voilà! All done.
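If you would rather not edit the extracted .rdf files by hand, the change is simple enough to script. This is a sketch under an assumption: it expects the limit to be written as an <em:maxVersion> element, so check your own extracted files first. bump_max_version is a made-up helper name.

```ruby
# Hypothetical helper: rewrites every <em:maxVersion>...</em:maxVersion>
# element in the given manifest text and returns the new text.
def bump_max_version(rdf_text, new_version)
  rdf_text.gsub(%r{<em:maxVersion>[^<]*</em:maxVersion>}) do
    "<em:maxVersion>#{new_version}</em:maxVersion>"
  end
end

sample = "<em:minVersion>1.4.1</em:minVersion>\n<em:maxVersion>2.0.0.*</em:maxVersion>\n"
bump_max_version(sample, "3.0.*")   # only the maxVersion element changes
```

To apply it to a real file you would read the file into a string, pass it through the helper and write the result back before running the jar uf commands.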

Watch Out for has_one :through and :include

I’ve seen enough people confused about this that it’s probably worth broadcasting this slightly more widely. In a nutshell, :include of has_one :through associations is broken in Rails 2.1. Rails 2.1.1 and higher are fixed.

As you may recall, :include takes one of two different paths.

In the first (the default) Rails loads the parent records first, and then loads all the child records of all those parents in one go. Unfortunately in Rails 2.1 this isn’t done quite right and the net effect is that the associations are loaded normally and then preloaded. This was fixed as of fdeeeaea and is included in Rails 2.1.1.

In the second case Rails generates appropriate join statements. This is used when you have conditions or orders on the joined tables and also if you have a count or a sum which uses columns from the joined tables. This just plain wasn’t implemented, so it was being treated as a plain old has_one, which results in an angry message from the database about you referencing a nonexistent column name. This was fixed as of bff0f5fb and, like the previous fix, is in Rails 2.1.1.