environment.rb and Requiring Dependencies

Nov 21st, 2008

In the before time, the bottom of most of my apps’ environment.rb was an unholy mess. Inflector rules, requires of various libraries or gems, various bits of app-specific configuration etc… all jumbled together. Rails 2.0 introduced initializers: any file in config/initializers is run at an appropriate time during the initialisation process. You get to split that mess into a handful of well organised, single-purpose little files (and Rails 2.1 simplified the case of requiring gems with the config.gems mechanism).

You might still have a few stragglers though: that one require that you didn’t bother moving into an initializer because it hardly seemed worth creating a whole file just for that one line. With the imminent release of Rails 2.2 it’s high time you made that change.

Living Thread Dangerously

Unless you’ve been living under an internet-proof rock you’ve probably heard about Rails’ new threadsafeness. There’s a bunch of hard work across the framework that’s gone into making this possible, but one particular area is to do with loading code. Ruby’s require mechanism isn’t threadsafe (or as it has been put to me, it’s thread dangerous), nor is the automatic loading stuff Rails uses. For example, say two threads both hit a constant called Foo that has yet to be loaded. Thread 1 starts loading foo.rb and gets as far as

class Foo < ActiveRecord::Base

Now Thread 2 hits Foo. By this point the constant Foo exists, so Thread 2 doesn’t load foo.rb. However, since Thread 1 hasn’t yet processed the rest of foo.rb, the Foo class that Thread 2 sees will be missing all its instance methods, validations etc… If both threads end up loading foo.rb at the same time then weird things can happen, like validations being added twice and so on. It can also cause the dependencies system to spuriously claim it couldn’t find a constant. It’s a small world of pain you don’t want to get involved in. Making require threadsafe is fundamentally hard (and is something the ruby-core and JRuby folks have been worrying about).
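The half-loaded state is easy to see in plain Ruby, even without threads. This is only a sketch: the SNAPSHOTS constant and the greet method are made up for illustration, and the snapshot taken inside the class body stands in for what a second thread could observe part way through the load.

```ruby
# Plain Ruby, no Rails. SNAPSHOTS and greet are hypothetical names.
SNAPSHOTS = {}

class Foo
  # The constant Foo already exists as soon as the `class` line has run...
  SNAPSHOTS[:defined_mid_load] = Object.const_defined?(:Foo)
  # ...but none of the methods further down the file exist yet.
  SNAPSHOTS[:methods_mid_load] = instance_methods(false)

  def greet
    "hello"
  end
end

# Only once the whole body has been processed is the class complete.
SNAPSHOTS[:methods_after_load] = Foo.instance_methods(false)
```

Anything that looks at Foo between the first line and the final end sees a class with the right name and nothing in it.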

What Rails 2.2 does in production mode is load all of your models, controllers and so on as part of the initialization process instead of loading them as they are needed. No more loads from different threads when your app is actually running, no more pain.

The Bad Thing

So, how does this connect with the statement I made above about moving things into initializers? Your average environment.rb file looks a little like this

#set some constants like RAILS_GEM_VERSION
require File.join(File.dirname(__FILE__), 'boot')
Rails::Initializer.run do |config|
  #set some config settings
end
#if you're old school, app configuration here
require 'some_dependency'

The bulk of initialization happens when you call run. This yields to the block to allow you to set the various settings (and also reads the appropriate environment file and so on) but the key thing is that by the time that function has returned, all of the initialization has happened.

In particular, Rails will try to load all of your application classes before the stuff at the bottom of environment.rb has been executed. If a model depends on some_dependency.rb being loaded (for example if that file added a validation that it uses) then your app will die before it has even finished initialising.

If however you’re a good person and move these things into initializers (i.e. files in config/initializers) then they will run at an appropriate time in the boot process (i.e. before Rails loads up all your application classes) and you won’t get an unpleasant surprise when you try and deploy your app.
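To make the ordering concrete, here is a toy model of the boot sequence in plain Ruby. It is not Rails code: toy_boot and the lambdas are hypothetical stand-ins for the three stages (initializers, eager loading of application classes, and whatever sits at the bottom of environment.rb).

```ruby
# Toy model of the boot order. toy_boot, load_dependency and load_model
# are made-up names for illustration only.
def toy_boot(initializers, app_classes, after_run)
  log = []
  state = { :dependency_loaded => false }
  initializers.each { |step| step.call(state, log) } # config/initializers/*.rb
  app_classes.each  { |step| step.call(state, log) } # eager loading in production
  after_run.each    { |step| step.call(state, log) } # bottom of environment.rb
  log
end

load_dependency = lambda do |state, log|
  state[:dependency_loaded] = true
  log << "some_dependency loaded"
end

load_model = lambda do |state, log|
  log << (state[:dependency_loaded] ? "model loaded" : "model failed: dependency missing")
end

# The require sits at the bottom of environment.rb: too late for the model.
bottom_of_environment = toy_boot([], [load_model], [load_dependency])

# The require lives in an initializer: it runs before the model is loaded.
in_an_initializer = toy_boot([load_dependency], [load_model], [])
```

In the first run the model is loaded before its dependency; in the second the order is the one you want.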

ASAP

Nov 20th, 2008

Things most likely to make me not reply to a message on a mailing list:

This is urgent

Plz reply ASAP

First, Foremost and [0]

Nov 15th, 2008

This doesn’t work (the something field will not be updated):

post.comments.first.something = true
post.comments.first.save

but this does

post.comments[0].something = true
post.comments[0].save

as does

post.comments.each {|c| puts c.id}
post.comments.first.something = true
post.comments.first.save

Both of these would have worked in rails 2.0.x and previous versions. So what changed?

Quacks like a duck but breathes fire

As you may know, post.comments looks an awful lot like an array but isn’t an array. It’s an association proxy. It has methods defined on it for things like finding objects from the database, the count method that does an sql count and things like that. When you ask it to do something that can only really be done by having the ruby objects in memory it will load the objects from the database into an actual ruby array and pass methods onto that (this all happens via method_missing). So far business as usual, rails has been like this for a long long time. In particular were you to call the first or the last methods on an association then the array would be loaded and first would be called on that array.

This sort of depends on your problem domain, but a lot of the time loading the entire array just to look at the first or last element of it is wasteful. You’ve always been able to do some_association.find :first (and as of 2.1 some_association.find :last) but that flows a little less easily off the tongue and of course doesn’t play nicely if you pass your association to some code that thinks it’s just working with an array. So a few months ago changes were made to make first and last load just that one item from the database[1]. Of course if the target array is already loaded then it just returns the first item from that array.

At the end of the day what that ends up meaning is that in the first example I gave, each call to post.comments.first returns a different object, i.e. the one that we call save on is not the same as the one we made the change to. The second and third examples are ok purely because they force the array to be loaded, which in turn means that calls to first no longer hit the database in that way[2].
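Here is a toy version of that behaviour in plain Ruby. ToyAssociation and Comment are invented for this sketch and are far simpler than the real association proxy; the only point is that first builds a fresh object on every call until the target array has been loaded.

```ruby
# Toy stand-in for an association proxy. Not Active Record code.
Comment = Struct.new(:id, :something)

class ToyAssociation
  def initialize(rows)
    @rows = rows    # pretend these are database rows
    @target = nil   # the loaded array, once something forces a load
  end

  def load_target
    @target ||= @rows.map { |row| Comment.new(row[:id], row[:something]) }
  end

  # Loads just one "row" unless the whole target is already in memory.
  def first
    return @target.first if @target
    row = @rows.first
    row && Comment.new(row[:id], row[:something])
  end

  # Needs the real array, so it forces a load.
  def [](index)
    load_target[index]
  end
end

comments = ToyAssociation.new([{ :id => 1, :something => false }])

# Two calls to first give two distinct objects, so a change made to one
# is invisible to the other.
different_objects = !comments.first.equal?(comments.first)

# Once the target is loaded, first hands back the same object each time.
comments[0]
same_after_load = comments.first.equal?(comments.first)
```

The safe habit in real code is to hold on to one reference (comment = post.comments.first), change it, and save that same object.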

Of course if you’re doing things right your unit tests would catch this sort of thing, but it’s still likely to leave you scratching your head a little (I certainly recall spending a few minutes looking at code very similar to the first example and wondering why it no longer worked). Slightly more subtle are performance problems, for example if you were iterating over various attributes then you’d be hitting the database each time to load somethings.first.

I’m not sure what to think about this sort of thing. There is a perfectly sound rationale for doing this but it introduces little ifs, buts and maybes into the illusion that association proxies behave like Arrays. As far as performance goes the implications vary. For big associations it can be a huge win, other times loading 1 object instead of 3 will make little to no difference. In other places I do genuinely want to load the whole array but I’d rather write first than [0] if I’m accessing the first element. Maybe the example I gave is a little artificial, maybe not, but at the end of the day, first no longer being a synonym for [0] is a habit that is hard to break.

[1] I’m simplifying things quite considerably here - there are a number of edge cases which that code has to tread around quite carefully, including unsaved parent objects, unsaved children objects, custom finder sql etc…

[2] While I’ve concentrated on first, everything I’ve said here applies to last too. In a way it’s slightly worse in that (at least for me) the difference in comfort between writing last and [-1] is greater than the difference between first and [0].

Do You Know When Your Code Runs?

Nov 9th, 2008

Few things are more head bashing inducing than code that passes all unit tests, runs perfectly on your development machine but fails on your staging/production servers. In that vein, both of these examples are wrong:

class Person < ActiveRecord::Base
  has_many :posts
  has_many :recent_posts, :class_name => 'Post', :conditions => ["created_at > ?", 1.week.ago]
  validates_inclusion_of :birth_date, :in => (20.years.ago..13.years.ago),
                            :message => "You must be a teenager to signup", :on => :create
end

class Post < ActiveRecord::Base
  named_scope :recent, :conditions => ["created_at > ?", 1.week.ago]
end

In development mode this will work absolutely fine. When you deploy this code onto the production server it will work fine too, but after a while it won’t behave quite right. For example a person’s recent_posts will start returning posts older than 1 week.

The key to this is understanding when the code runs. In particular when does “1.week.ago” get turned into an instance of Time with some fixed value such as 1st November 2008 at 20:32?

The statements has_many, validates_inclusion_of etc… are just method calls, so their arguments are evaluated when that function is called. You can look in the options hash for an association to see this (assuming you’ve just typed in the Person class given above):

Person.reflections[:recent_posts].options
=> {:conditions=>["created_at > ?", Sun Nov 02 14:27:26 +0000 2008]}

So when are these functions called? Quite simply when ruby loads person.rb. In development mode your source code is reloaded for each request[1], providing the illusion that the “1.week.ago” is re-evaluated whenever it is used. In production mode person.rb would only be read once per Rails instance, and so once your mongrels had been running for a week a person’s recent_posts would return anything written in the last 2 weeks (i.e. since 1 week before the date at which your mongrels were launched). You would also notice this if you were running script/console and keeping an eye on the sql generated: you’d see that the date in the WHERE clause didn’t change.

Fixing it.

Fortunately it’s not hard to fix this. In the case of the awesome named_scope you probably already know that you can supply a Proc for when you want your scope to take arguments. We can equally make one with no arguments, just to ensure that the time condition is evaluated whenever the scope is accessed.

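class Post < ActiveRecord::Base
  # The lambda is called each time the scope is used, so 1.week.ago is always current
  named_scope :recent, lambda { {:conditions => ["created_at > ?", 1.week.ago]} }
end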
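The difference between the two forms can be reproduced without Rails. In this sketch ToyClock, ToyPost and toy_scope are all made up (toy_scope is not named_scope, and the clock is just an integer counting days), but the evaluation rules are plain Ruby: an argument is evaluated when the class body runs, a lambda body when it is called.

```ruby
# A fake clock so the example is deterministic. Hypothetical helper.
module ToyClock
  @now = 0
  def self.now
    @now
  end

  def self.advance(days)
    @now += days
  end
end

class ToyPost
  SCOPES = {}

  # Stores whatever it is given: a ready-made value or a lambda.
  def self.toy_scope(name, conditions)
    SCOPES[name] = conditions
  end

  # Lambdas are called at lookup time, plain values are returned as stored.
  def self.conditions_for(name)
    conditions = SCOPES[name]
    conditions.respond_to?(:call) ? conditions.call : conditions
  end

  # Evaluated once, while the class body is being loaded.
  toy_scope :stale_recent, [:created_after, ToyClock.now - 7]
  # Evaluated every time the scope is looked up.
  toy_scope :fresh_recent, lambda { [:created_after, ToyClock.now - 7] }
end

ToyClock.advance(14) # the "server" has now been up for two weeks

stale = ToyPost.conditions_for(:stale_recent)
fresh = ToyPost.conditions_for(:fresh_recent)
```

After the clock has moved on, stale still carries the cut-off computed at load time while fresh reflects the current time.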

For conditions on things like associations we can use a little trick called interpolation. As I’m sure you know, when ruby encounters "#{ 'hello world' }" it evaluates the things inside the #{}, but if you use single quotes (or equivalently things like %q()) then it doesn’t. What you may not know is that Active Record will perform that interpolation again at the point where sql is generated. For example we can write the recent posts association like this:

class Person < ActiveRecord::Base
  has_many :recent_posts, :class_name => 'Post',
           :conditions => 'created_at > #{self.class.connection.quote 1.week.ago}'
end

When person.rb is loaded the stuff in the #{} will not be evaluated, however when Active Record generates the sql needed to load the association it will be[2].
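The mechanics of delayed interpolation can be sketched in a few lines of plain Ruby. This is an illustration of the idea rather than Active Record’s actual code: interpolate_later and ToyPerson are hypothetical names, and the template is evaluated with instance_eval so that it runs in the context of a particular object.

```ruby
# Hypothetical helper: treat a single-quoted template as if it had been
# written in double quotes, evaluated in the context of `context`.
def interpolate_later(template, context)
  escaped = template.gsub('@') { '\@' } # keep the %@...@ delimiters intact
  context.instance_eval("%@#{escaped}@")
end

class ToyPerson
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

# Single quotes: nothing is evaluated when this line runs.
sql_template = 'person_id = #{id}'

# Evaluated later, once per object.
sql_for_42 = interpolate_later(sql_template, ToyPerson.new(42))
sql_for_7  = interpolate_later(sql_template, ToyPerson.new(7))
```

The template string itself never changes; only the result of evaluating it does, which is exactly what you want for a condition involving the current time.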

Validations can’t play any of the clever little games that the other 2 examples can. You’ll just have to do something like

class Person < ActiveRecord::Base
  validate_on_create :is_a_teenager

  def is_a_teenager
    unless birth_date < 13.years.ago && birth_date > 20.years.ago
      ...
    end
  end
end

[1] Assuming you’ve got config.cache_classes set to false in development mode which is the default

[2] You can do a lot more with interpolation. Normally the code is interpolated in the context of the instance of the model so you can use any model methods, instance variables etc… When an association is fetched with :include it will be interpolated in the context of the class: since the whole point is to bulk load instances it does not make sense (nor would it work) to use per-instance data in there.

Required or Not ?

Sep 28th, 2008

One of Rails’ slightly gnarly areas is all the magic that goes into enabling the automatic reloading of source in development mode[1]. Reloading a class isn’t just as simple as reading the source again: that would just reopen the class. While this would allow you to add or change existing methods, it wouldn’t allow you to remove methods, change the class an object inherits from, stop including a module and things like that. In the particular context of Rails this would also cause validations, filters and callbacks to be added repeatedly. You also don’t want to reload absolutely everything. For example reloading standard ruby libraries would be pointless (and slow) as would be reloading Rails itself and (usually) plugins.

A related service that Rails’ dependencies system provides is autoloading of constants. Rails hooks const_missing: when an unknown constant is found Rails will try and determine the name of the file containing it (according to Rails’ conventions) and search for it in the appropriate folders. After a request (or when you call reload!) Rails unsets the constant. This means that reading the corresponding file again will create a new class rather than reopening the old one. It also means that the next use of that constant will cause const_missing to be hit again and load the class.
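A stripped-down version of this load, track and unset cycle fits in a short plain Ruby script. ToyLoader, SOURCES and AUTOLOADED are hypothetical names, the "file" is just a string in a hash, and the real dependencies code handles far more cases (nested constants, load paths and so on); treat this as a sketch of the idea only.

```ruby
# Toy autoloader. Not Rails code.
module ToyLoader
  # Stands in for files on disk: constant name => source code.
  SOURCES = { :ToyCustomer => "class ToyCustomer; end" }
  AUTOLOADED = []

  def self.load_constant(name)
    source = SOURCES[name]
    return false unless source
    eval(source, TOPLEVEL_BINDING) # "read the file"
    AUTOLOADED << name             # remember it so it can be unset later
    true
  end

  # Unset every constant we loaded, so the next use loads a fresh copy.
  def self.reload!
    AUTOLOADED.each { |name| Object.send(:remove_const, name) }
    AUTOLOADED.clear
  end
end

class Module
  alias_method :const_missing_without_toy_loader, :const_missing

  # Try the toy loader first, fall back to the normal NameError.
  def const_missing(name)
    if ToyLoader.load_constant(name)
      Object.const_get(name)
    else
      const_missing_without_toy_loader(name)
    end
  end
end

old_class = ToyCustomer                      # const_missing fires and loads it
ToyLoader.reload!                            # the constant is unset
gone = !Object.const_defined?(:ToyCustomer)  # nothing is defined right now
new_class = ToyCustomer                      # const_missing fires again
fresh_copy = !old_class.equal?(new_class)    # a brand new class object
```

Because the constant was removed rather than reopened, the second ToyCustomer is a different class object from the first.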

require messes with reloading

The long and short of this is that Rails needs to track what needs to be reloaded (i.e. which constants it should remove). When a file is loaded via Rails’ dependency system, all the constants are stashed away, in Dependencies.autoloaded_constants[2]. At the end of the request all of those constants are removed. But if you have bypassed the Rails dependency system then it won’t get that treatment. Here’s an example script/console session

>> Customer.object_id
=> 19116470
>> reload!
Reloading...
=> true
>> Object.constants.include?('Customer')
=> false
>> Customer.object_id
=> 18966210

The reload! function does the reloading that Rails would do at the end of a request. Here everything is happening as normal: we’ve let Rails handle the loading and after the reload the Customer constant is removed, ensuring we then get a fresh copy of the Customer class. Now let’s try something different: explicitly require customer.rb:

>> require 'customer'
=> ["Customer"]
>> Customer.object_id
=> 19121220
>> reload!
Reloading...
=> true
>> Customer.object_id
=> 19121220

Lo and behold: the Customer class isn’t being reloaded. Had you done this in a real app you would find that changes to the customer file weren’t being picked up until you restarted the server. Even more confusingly it would be fine until you loaded a file that did such a require but thereafter changes would have no effect, even on pages where previously it worked.
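Part of what is going on is plain Ruby behaviour: require only ever loads a given file once per process, so later edits to that file are not seen. A small sketch, using a temporary file and a made-up RequiredCustomer class:

```ruby
require 'tmpdir'

# RequiredCustomer and the results hash are hypothetical names for this sketch.
results = {}

Dir.mktmpdir do |dir|
  path = File.join(dir, 'required_customer.rb')

  File.write(path, "class RequiredCustomer; def self.version; 1; end; end\n")
  results[:first_require] = require(path)           # loads the file
  results[:after_first_require] = RequiredCustomer.version

  # "Edit" the file, then require it again.
  File.write(path, "class RequiredCustomer; def self.version; 2; end; end\n")
  results[:second_require] = require(path)          # already loaded: does nothing
  results[:after_second_require] = RequiredCustomer.version

  # load always re-reads the file (and reopens the existing class).
  load(path)
  results[:after_load] = RequiredCustomer.version
end
```

The second require returns false and the class keeps its old behaviour; only an explicit load picks up the edit, and even then it reopens the existing class rather than replacing it.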

Fun with associations

A lot of problems happen when you have something hanging onto an old version of a class. One way that can happen in a Rails app is via associations. Suppose our Customer class has an orders association.

>> require 'customer'
=> ["Customer"]
>> Customer.find(1).orders
=> [#<Order id: 1, customer_id: 1>]
>> Order.object_id
=> 18291410
>> Customer.reflections[:orders].klass.object_id
=> 18291410
>> Customer.reflections[:orders].klass.instance_methods - ActiveRecord::Base.instance_methods
=> ["build_customer", "create_customer", "belongs_to_before_save_for_customer", "customer",
"customer=", "my_instance_method", "set_customer_target"]

Everything is as we would expect it. Customer.reflections[:orders] returns an AssociationReflection object which is something that describes an association. It holds data like what kind of association it is, any options that were supplied (eg :foreign_key, :counter_cache) and so on. In particular its klass attribute is the ActiveRecord::Base subclass for the association. Here we can see that that class is the same as Order which we would expect.

The association’s class has the methods you would expect: some methods to deal with the customer association that Order has and an instance method we added. So far so good. Let’s reload the code:

>> reload!
Reloading...
=> true
>> Customer.find(1).orders
=> [#<Order id: 1, customer_id: 1>]

Superficially things look fine, but if we dig a little deeper, everything has gone horribly wrong. The first clue is this:

>> Order.object_id
=> 18680200
>> Customer.reflections[:orders].klass.object_id
=> 18291410

This tells us that the Order class is no longer the same class as the class referenced by the association. Because Order was loaded via Rails’ dependencies system it was reloaded when we did reload!, but as we saw before Customer wasn’t. This causes quite a few problems, for example

>> Customer.find(1).orders << Order.new
ActiveRecord::AssociationTypeMismatch: Order(#18291410) expected, got Order(#18680200)

Oh noes! When you add a record to a collection Active Record checks that it is of the correct type, but the Customer class is trying to check that the object is an instance of the old Order class, which it isn’t. The fun thing about this sort of situation is that it will work fine the first time you view the page after restarting the server, but not the second or following times. Madness!
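The same mismatch can be reproduced in plain Ruby by hanging onto a class, removing its constant and defining it again. ToyOrder and cached_klass are made-up names; cached_klass plays the part of the class that the un-reloaded Customer reflection is still holding.

```ruby
class ToyOrder; end

# What the stale reflection hangs onto.
cached_klass = ToyOrder

# What reloading does to an autoloaded constant: unset it, then load it again.
Object.send(:remove_const, :ToyOrder)
class ToyOrder; end

record = ToyOrder.new

same_name         = cached_klass.name == ToyOrder.name # both are called "ToyOrder"
same_class        = cached_klass.equal?(ToyOrder)      # but they are different objects
type_check_passes = record.is_a?(cached_klass)         # so the type check fails
```

Two classes with the same name but different identities is exactly what the Order(#18291410) expected, got Order(#18680200) message is telling you.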

There’s more stuff too. If we repeat our earlier test to list the instance methods of the association’s class we get this:

>> Customer.reflections[:orders].klass.instance_methods - ActiveRecord::Base.instance_methods
=> []
>> Customer.find(1).orders.customer
NoMethodError: undefined method `customer' for #<Class:0x23e34a4>

They’ve all gone. This can be more than a little baffling, when a page works fine but reloading it causes methods you know exist to just disappear into thin air. The culprit here is the reset_subclasses method in Active Record, which as its name implies, clears out classes. It only does this to autoloaded classes, which normally is fine because such classes are just thrown away and never used again, but we’re hanging onto this gutted class and trying to use it[3]. Even if this gutting of classes didn’t happen you’d still have a lot of confusion: instances of Order retrieved via the association would be the old class and so wouldn’t reflect any changes you had made to the source, but instances created directly would.

Just don’t do it

By now you’ve probably got the message that using require to load your models can cause some weird stuff to happen. Loading classes behind Rails’ back just gets things confused. There are two ways to stop this happening:

Just don’t require stuff. If you let Rails’ automagic loading do its work none of this will happen.
If you do need to require stuff explicitly, use require_dependency. This means that Rails is kept in the loop.

Of course require is fine for requiring gems, bits of standard libraries and so on, but using require to load bits of your own application should be viewed with suspicion. It only takes one require somewhere to mess things up, so be careful.

[1] Or to be quite precise, when config.cache_classes is set to false. If it is set to true (for example in production mode) nothing in this article applies

[2] In Rails 2.2 and higher, Dependencies was moved into the ActiveSupport namespace. If you’re running that version mentally prepend ActiveSupport:: wherever you see Dependencies. There are a lot of other settings in there that control all of this, for example load_once_paths and explicitly_unloadable_constants allow you to control what is reloaded and what isn’t.

[3] As far as I can tell and according to this thread the exact reason this is necessary is rather lost in the mists of time, possibly an artefact of previous implementations of Rails’ dependencies.

Space Vatican

Ramblings of a curious coder