Space Vatican

Ramblings of a curious coder

When Cache_classes Gets You Down

As I’ve mentioned before, in Rails 2.2 if config.cache_classes is true (which it is in production) then all of your models, controllers etc are loaded up as part of application initialization. This is great if you’re firing up a server that should actually be handling requests as it avoids all sorts of nasty race conditions to do with requiring files and defining classes in ruby. It’s not so great if you’re running a small script, cron job or something like a migration.

The reasons are two-fold. First off, it makes startup slower. This is irrelevant when starting up a web server because it is just getting out of the way stuff you would do anyway but it is time wasted if you’ve just got a single threaded script that doesn’t care about dependency race conditions (and probably only touches one or two models anyway), especially if the script only takes a second or two to run once the environment is loaded. Secondly it can break stuff if models or controllers try and look at the database when the class is loaded. One example of this is ActiveScaffold. For those of you not familiar with it, ActiveScaffold helps you create adminy CRUD style interfaces. A very basic controller might look like this:

class ProductsController < ApplicationController
  active_scaffold
end

When the controller is loaded ActiveScaffold sees that it’s working on the ProductsController, infers that the corresponding model is Product and goes off to find out what columns that model has. Imagine you’ve just added that model and controller and it’s time to deploy on your production machine. At this point the products table doesn’t exist (since the migration hasn’t run yet). Since you’re running in your production environment when the rake task causes rails to be loaded it will load your application’s classes, including ProductsController which will try and introspect the products table. Fail.

There are two workarounds I can think of. One is a separate rails environment: define an environment with cache_classes set to false but that still uses the production database. Instead of running your migrations in the production environment run them in the production_without_cache_classes environment. This works but can be a bit annoying, especially for adhoc scripts and stuff like that and is even more of a pain if you have multiple productiony environments (eg a staging environment).

Another is to edit your production.rb so that it reads

if defined?(DONT_CACHE_CLASSES) && DONT_CACHE_CLASSES
  config.cache_classes = false
else
  config.cache_classes = true
end

Then for any script which you want to run without class caching turned on stick DONT_CACHE_CLASSES=true at the top (before you require config/environment). If you want to extend this to your rake files then edit the Rakefile at the top level of the app. This feels slightly neater to me as I don’t have to remember to fiddle with the environment when running the script.

Another variation upon this is to use an environment variable instead of a constant, which would allow you to do things such as rake db:migrate CACHE_CLASSES=false or similar (although of the default tasks I can’t think of one you’d run in production where you would want class caching to be on).
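
A minimal sketch of the environment variable variant. The CACHE_CLASSES name comes from the rake example above; the helper cache_classes? is a hypothetical name of my own, and treating "0" as false is an assumption rather than any Rails convention.

```ruby
# Hypothetical helper: class caching stays on unless CACHE_CLASSES is
# explicitly set to "false" (or "0") in the environment.
def cache_classes?(env = ENV)
  !%w[false 0].include?(env['CACHE_CLASSES'].to_s.downcase)
end

# In config/environments/production.rb this would be used as:
#   config.cache_classes = cache_classes?
```

Defaulting to true when the variable is absent means a forgotten variable can never accidentally switch class caching off on a real server.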

This is far from beautiful but appears to get the job done for now.

Dates, Params and You

One not particularly nice area of Rails is the date and time helpers. 3 popups just isn’t a very nice bit of user interface. It’s a lot of clicks when you want to change dates and most people can’t reason in their head about just the date. It’s far easier to pick a date from a calendar type view. Still, the helpers Rails provides are fine for that quick and dirty date input.

Based on the questions on the mailing list about this, the thing that trips people up is that, unlike other attributes you might typically have, dates and times are not representable by a single input control. Instead you have several, one for each component (year, month, day etc…). So in particular, there is no single value in your params hash with your date or time. Exactly what is in your params hash depends on whether you’re using select_date or date_select (if you’re entering a datetime, select_datetime or datetime_select).

These are to each other as text_field_tag is to text_field: date_select is expecting to hook up to an attribute of an instance variable (or if you use form_for or fields_for an attribute of the corresponding object) whereas select_date isn’t. However, unlike the other pairs of functions like text_field_tag/text_field, select_tag/select these two send very different parameters through to your controller.

select_date is perhaps the easiest to understand. It will result in a hash (by default it is named “date”, but you can override this with the :prefix option) with keys like year, month, day. You can then put those together to get an instance of Date or Time. For example the following in your view

<%= select_date Date::today, :prefix => 'start' %>

will result in a params hash like this:

{:start => {:year => '2008', :month => '11', :day => '22'}}

As I said, there is nothing in the params hash that is the actual value. You have to put it together yourself, for example

my_date = Date::civil(params[:start][:year].to_i, params[:start][:month].to_i,
                      params[:start][:day].to_i)

A bit more work than your average parameter, but there’s nothing mysterious going on here. Under the hood, select_date is also quite boring: it’s just calling select_year, select_month, select_day with appropriate options and concatenating the result. A consequence of that is that if you want some odd combination (eg just months and seconds) you can just do that concatenation work yourself. One interesting thing about those subhelpers is that the first parameter you give them can be one of two things:

- an integer, in which case the corresponding day/month/year is displayed (eg 3 for March)
- something like a Date or DateTime, in which case the relevant date component is extracted from it

date_select is where the fun is. Here the expectation is that there is a model object we will want to update and we want to be able to do

my_object.update_attributes params[:my_object]

However update_attributes just wants to set attributes. If you pass it {‘foo’ => ‘bar’} it will try and call the method foo= passing bar as a parameter. For a date input that is made up of these multiple parameters this is clearly a problem. What solves this is something called multiparameter assignment. If there are parameters whose name is in a certain format, then instead of just trying to call the appropriate accessor Rails will gather the related parameters, feed them through a transformation function (for example Time.mktime or Date::new[1]) and then set the appropriate attribute.

The format used is as follows: all the related parameters start with the name of the attribute which lets Rails know they are related. Next Rails needs to know in what order to pass them to the transformation function and whether a typecast is needed. If your view contained

<%= date_select 'product', 'release_date' %>

Then your parameters hash would look like

{:product =>
        {'release_date(1i)' => '2008', 'release_date(2i)' => '11', 'release_date(3i)' => '22'}}

Rails can look at this and see that this is to do with the release_date attribute. It’s a date column, so rails knows to use Date::civil. The suffixes tell rails that 2008 is the first parameter to Date::civil and is an integer, that 11 is the second parameter and so on. Rails constructs the value using Date::civil(2008,11,22) and assigns that to release_date.
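
To make the mechanics concrete, here is a rough plain-Ruby sketch of that gathering step. It is not Rails’ actual implementation (which also deals with times, time zones, blank values and so on); the method name extract_multiparameters is hypothetical, and the only suffixes it handles are the i (integer) and f (float) typecasts.

```ruby
require 'date'

# Illustrative only: collect "name(Nx)" keys, typecast each value according
# to the suffix x, and return the values for each attribute ordered by N.
def extract_multiparameters(params)
  grouped = Hash.new { |hash, key| hash[key] = [] }

  params.each do |key, value|
    match = /\A(\w+)\((\d+)([if]?)\)\z/.match(key.to_s)
    next unless match # an ordinary parameter, not a multiparameter one

    name, position, type = match[1], match[2].to_i, match[3]
    typed = case type
            when 'i' then value.to_i
            when 'f' then value.to_f
            else value
            end
    grouped[name] << [position, typed]
  end

  grouped.each_with_object({}) do |(name, pairs), result|
    result[name] = pairs.sort_by(&:first).map(&:last)
  end
end

# Key order in the hash does not matter; the positions decide.
params = { 'release_date(3i)' => '22', 'release_date(1i)' => '2008', 'release_date(2i)' => '11' }
values = extract_multiparameters(params)
# values['release_date'] is [2008, 11, 22]
release_date = Date.civil(*values['release_date'])
```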

If you don’t intend to pass the parameters to update_attributes (or other functions with that syntax such as the new or create methods on an ActiveRecord class) there’s not a lot of point in putting up with the scary parameter names, although you can of course construct the date yourself with

Date::civil(params[:product]['release_date(1i)'].to_i,
            params[:product]['release_date(2i)'].to_i,
            params[:product]['release_date(3i)'].to_i)

You might as well just use select_date and have readable parameter names though.

So, to sum up use date_select or datetime_select when creating/updating ActiveRecord objects but select_date or select_datetime for just a general purpose date input. As a closing tip, with select_datetime you can use the :use_hidden option in which case hidden form inputs are generated instead of select boxes.[2]


[1] There’s a bit more to this. For one the range of times representable by a Time object is limited on most platforms (since it’s commonly a 32 bit number of seconds since an epoch). Rails has some conversion code that will try and create an instance of Time but if necessary will fall back and create a DateTime object. Secondly there’s some cleverness to do with interpreting the user’s input with respect to the correct time zone.

[2] This is (I think) a slight misuse. The intent of :use_hidden is that it is the mechanism by which :discard_day and so on work.

With_options for Fun and Profit

Active Support has a nifty little helper that can cut down on repetition. A lot of things in Rails like validations, associations, named_scopes, routes etc… take a hash of options as their final parameter. There are times where you use many of these with some common options, for example

class Customer < ActiveRecord::Base
  validates_presence_of :phone_number, :if => :extended_signup
  validates_presence_of :job_title, :if => :extended_signup
  validates_presence_of :job_industry, :if => :extended_signup
end

or maybe you have a bunch of associations which all share an association proxy extension module and some settings

class Customer < ActiveRecord::Base
  has_many :foos, :extend => MyModule, :order => "updated_at desc", :conditions => {:active => true}
  has_many :bars, :extend => MyModule, :order => "updated_at desc"
  has_many :things, :extend => MyModule, :order => "updated_at desc"
end

Enter with_options!

class Customer < ActiveRecord::Base
  with_options :extend => MyModule, :order => "updated_at desc" do |options|
    options.has_many :foos, :conditions => {:active => true}
    options.has_many :bars
    options.has_many :things
  end
end

Any time you’ve got a bunch of method calls taking some common options, with_options can help.

So how does it work on the inside? All the with_options method actually does is yield a special object to its block - all the craftiness is in that object. What we want that object to do is forward method calls to the object we’re really interested in (in this case the Customer class) adding the options before it does so.

As with many such proxy objects we undefine just about every method and just implement method_missing. The implementation of method_missing inspects the arguments and merges any options present with the common set defined by the call to with_options (so individual methods can take extra options or override the common ones) before passing them onto the “real” object.
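
A stripped-down sketch of such a proxy, to show the idea rather than Active Support’s actual code. All the names here (OptionMergerSketch, WithOptionsSketch, with_options_sketch, FakeModel) are made up for illustration, and inheriting from BasicObject stands in for undefining methods by hand.

```ruby
# Simplified stand-in for the proxy object; not Active Support's real class.
# BasicObject has almost no methods, so nearly every call reaches method_missing.
class OptionMergerSketch < BasicObject
  def initialize(target, common_options)
    @target = target
    @common_options = common_options
  end

  def method_missing(name, *args, &block)
    if args.last.is_a?(::Hash)
      # Options given on the individual call win over the common ones.
      args[-1] = @common_options.merge(args.last)
    else
      args << @common_options.dup
    end
    @target.__send__(name, *args, &block)
  end
end

module WithOptionsSketch
  # Yield a proxy that forwards to self with the common options merged in.
  def with_options_sketch(common_options)
    yield OptionMergerSketch.new(self, common_options)
  end
end

# A fake model class that just records what has_many was called with.
class FakeModel
  extend WithOptionsSketch

  def self.associations
    @associations ||= {}
  end

  def self.has_many(name, options = {})
    associations[name] = options
  end

  with_options_sketch :order => 'updated_at desc' do |options|
    options.has_many :foos, :conditions => { :active => true }
    options.has_many :bars
  end
end
```

After this runs, FakeModel.associations[:foos] holds both the common :order and the call-specific :conditions, while :bars only has the common :order.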

Originally a limitation was that the last argument just had to be a hash, so for example if you had a procedural named_scope then with_options couldn’t work. Luckily a recent commit rectifies this: if the thing you’re trying to merge with is a proc, then with_options will replace it with a new proc that merges the common options with the result of the call to the original proc. If you’re not on edge you’ll have to wait for 2.3 in order to get this.
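
The proc case boils down to something like the following. This is my own sketch of the behaviour described, not the code from that commit, and merge_into_proc is a hypothetical name.

```ruby
# Hypothetical helper: wrap a lambda that returns an options hash so that
# the common options are merged into whatever the original returns.
def merge_into_proc(common_options, original)
  lambda { |*args| common_options.merge(original.call(*args)) }
end

# The kind of lambda you might hand to a procedural named_scope.
recent = lambda { |limit| { :limit => limit } }
wrapped = merge_into_proc({ :order => 'updated_at desc' }, recent)
merged = wrapped.call(5) # contains both :order and :limit
```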

While both the examples I gave showed using with_options on what is essentially model class configuration it is by no means limited to that. You could use it for that sort of configuration on your own classes or just inside a regular method - anytime you are making several method calls on the same object with a hash of options at the end.

environment.rb and Requiring Dependencies

In the before time, the bottom of most of my apps’ environment.rb was an unholy mess. Inflector rules, requiring of various libraries or gems, various bits of app specific configuration etc… all jumbled together. Rails 2.0 introduced initializers: any file in config/initializers is run at an appropriate time during the initialisation process. You get to split that mess into a handful of well organised, single purposed little files (and rails 2.1 simplified the case of requiring gems with the config.gems mechanism).

You might still have a few stragglers though, that one require that you didn’t bother moving into an initializer because it hardly seemed worth creating a whole file just for that one line. With the imminent release of Rails 2.2 it’s high time you made that change.

Living Thread Dangerously

Unless you’ve been living under an internet-proof rock you’ve probably heard about Rails’ new threadsafeness. There’s a bunch of hard work across the framework that’s gone into making this possible, but one particular area is to do with loading code. Ruby’s require mechanism isn’t threadsafe (or as it has been put to me, it’s thread dangerous) nor is the automatic loading stuff Rails uses. For example, say two threads both hit a constant called Foo that has yet to be loaded. Thread 1 starts loading foo.rb and gets as far as

class Foo < ActiveRecord::Base

At this point Thread 2 hits Foo. However at this point the constant Foo now exists and so Thread 2 doesn’t load foo.rb. However since Thread 1 hasn’t yet processed the rest of foo.rb the Foo class will be missing all its instance methods, validations etc… If both threads end up loading foo.rb at the same time then weird things can happen like validations being added twice and so on. It can also cause the dependencies system to spuriously claim it couldn’t find a constant. It’s a small world of pain you don’t want to get involved in. Making require threadsafe is fundamentally hard (and is something the ruby-core and jruby folks have been worrying about).

What Rails 2.2 does in production mode is load all of your models, controllers and so on as part of the initialization process instead of loading them as they are needed. No more loads from different threads when your app is actually running, no more pain.

The Bad Thing

So, how does this connect with the statement I made above about moving things into initializers? Your average environment.rb file looks a little like this

#set some constants like RAILS_GEM_VERSION
require File.join(File.dirname(__FILE__), 'boot')
Rails::Initializer.run do |config|
  #set some config settings
end
#if you're old school, app configuration here
require 'some_dependency'

The bulk of initialization happens when you call run. This yields to the block to allow you to set the various settings (and also reads the appropriate environment file and so on) but the key thing is that by the time that function has returned, all of the initialization has happened.

In particular, Rails will try to load all of your application classes before the stuff at the bottom of environment.rb has been executed. If a model depends on some_dependency.rb being loaded (for example if that file added a validation that it uses) then your app will die before it has even finished initialising.

If however you’re a good person and move these things into initializers (i.e. files in config/initializers) then they will run at an appropriate time in the boot process (i.e. before Rails loads up all your application classes) and you won’t get an unpleasant surprise when you try and deploy your app.

ASAP

Things most likely to make me not reply to a message on a mailing list:

This is urgent

Plz reply ASAP