Space Vatican

Ramblings of a curious coder

Modules Aren’t Just for Instance Methods

Modules are great for sharing bits of code across models, controllers etc. The very basic case, where you just want to share some instance methods, is dead easy: just write the methods and include the module. The module gets inserted in your class’ list of ancestors and so ruby will look inside it when it needs to find a method. If you’re using rails you might also want to share validations on models, filters on controllers and so on. But first something a little simpler.

Using a module to add class methods is a bit more fiddly than sharing instance methods. Of course you can just include the module inside class << self, but often we want to add both class and instance methods. Requiring two modules and two includes when one cannot exist without the other is a bit messy and repetitive (not to mention error-prone). Luckily, whenever a module is included, a callback is called with whatever did the including as an argument. Instead of relying on the programmer to do that second include we can do it ourselves from this callback. This is a fairly standard pattern:

module MyModule
  module ClassMethods
    def a_class_method
    end
  end

  def an_instance_method
  end

  def self.included(base)
    base.extend ClassMethods
  end
end

Our included method is dead simple: we just want to add our module of class methods. We need to use extend here because include would add them as instance methods, whereas we want to add them as singleton methods on the class, i.e. class methods. If you include this module then you’ll gain the methods in MyModule::ClassMethods as class methods. Naming the module ClassMethods is just for ease of reading; you could call it anything you want.
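As a quick illustration of the pattern in use, here is a minimal sketch in plain ruby (no rails needed). Greeter, Robot, greeting and default_greeting are names made up for this example.

```ruby
# Hypothetical module: Greeter, Robot and the method names are illustrative only.
module Greeter
  module ClassMethods
    # Becomes Robot.default_greeting once Greeter is included.
    def default_greeting
      'hello'
    end
  end

  # Becomes an instance method on the including class.
  def greeting
    "#{self.class.default_greeting} from #{self.class.name}"
  end

  # Called by ruby with the including class as its argument.
  def self.included(base)
    base.extend ClassMethods
  end
end

class Robot
  include Greeter
end

Robot.default_greeting  #=> 'hello'
Robot.new.greeting      #=> 'hello from Robot'
```

A single include is enough to get both halves, which is the whole point of the callback.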

We could be doing a lot more though. It’s important to understand that things like has_many, validates_format_of and so on are just methods. They don’t always look like it (which is of course one of the things that makes ruby great for building DSLs), but that’s definitely what they are. Laying to one side issues of protected or private methods,

class Person < ActiveRecord::Base
  validates_presence_of :email
end

is the same as

class Person < ActiveRecord::Base
  self.validates_presence_of :email
end

or

class Person < ActiveRecord::Base; end
Person.validates_presence_of :email

This is of course true of all those other things (named_scope, before_filter, verify etc.) that you frequently see in models or controllers: they are just methods you can call on classes. What a happy coincidence that the included callback gives you a class! This means that we can package up sets of validations or associations or anything like that into a module. For example, we could have an Auditable module. Things that are auditable have a polymorphic audit_trails association that tracks whenever the object was modified.

module Auditable
  def update_with_audit(*args)
    returning update_without_audit(*args) do |rows_changed|
      if rows_changed > 0
        audit_trails.create(...)
      end
    end
  end

  def self.included(base)
    base.has_many :audit_trails, :as => :auditable
    base.alias_method_chain :update, :audit
  end
end

In our included method we use has_many to create the association and we use alias_method_chain to hook the update method (which is used whenever you update a row). Sometimes it’s nice to use instance_eval on base to make things look a little neater, for example

module FooBarish
  def self.included(base)
    base.instance_eval do
      validates_presence_of :foo
      validates_presence_of :bar
    end
  end
end

Whenever you include this module the 2 validations will be defined.
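To see this run outside rails, here is a sketch that swaps ActiveRecord for a tiny stand-in. TinyModel, Widget and this cut-down validates_presence_of are assumptions made for the example; only the shape of FooBarish comes from the code above.

```ruby
# TinyModel and its validates_presence_of are hypothetical stand-ins for
# ActiveRecord::Base, just enough to run without rails.
class TinyModel
  def self.required_fields
    @required_fields ||= []
  end

  # A plain class method, as the rails macros are: it just records the field.
  def self.validates_presence_of(field)
    required_fields << field
  end

  def valid?
    self.class.required_fields.all? { |field| !send(field).nil? }
  end
end

module FooBarish
  def self.included(base)
    # Inside instance_eval, self is the including class.
    base.instance_eval do
      validates_presence_of :foo
      validates_presence_of :bar
    end
  end
end

class Widget < TinyModel
  attr_accessor :foo, :bar
  include FooBarish
end

Widget.required_fields  #=> [:foo, :bar]
Widget.new.valid?       #=> false, foo and bar are both nil
```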

You could play similar tricks with sets of filters that you want certain controllers to have and all sorts of things. That said, don’t go overboard, sometimes regular old subclassing is exactly the right thing, but happily modules will keep you going even when it’s not.
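Going back to the Auditable example for a moment: alias_method_chain :update, :audit boils down, roughly, to a pair of alias_method calls. The sketch below does that by hand in plain ruby. Record, its log and the pretend "rows changed" return value are assumptions standing in for a real model.

```ruby
# Plain-ruby sketch; Record, its log and the fake "rows changed" value are
# hypothetical stand-ins for an ActiveRecord model.
class Record
  attr_reader :log

  def initialize
    @log = []
  end

  # Pretend to write to the database and report one changed row.
  def update(*args)
    @log << :update
    1
  end
end

module Auditable
  def update_with_audit(*args)
    rows_changed = update_without_audit(*args)
    @log << :audit if rows_changed > 0
    rows_changed
  end

  def self.included(base)
    base.class_eval do
      # By hand, roughly what alias_method_chain :update, :audit does:
      alias_method :update_without_audit, :update
      alias_method :update, :update_with_audit
    end
  end
end

class Record
  include Auditable
end

record = Record.new
record.update
record.log  #=> [:update, :audit]
```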

Fun With Class Variables

Class variables are a slightly fiddly bit of ruby. They don’t always quite behave the way you expect. ActiveSupport bundles up a large number of helpers to deal with the various cases. They’re used extensively throughout the framework, where it is important to control how settings made at the framework level ripple down to subclasses (i.e. your models and controllers), but they can be pretty handy in your own apps too.

The Basics

If you’re at all familiar with ruby you’ll have heard of @@. @@ is a bit odd because it creates variables that are visible both from the instance and the class. The value is also shared with subclasses. ActiveSupport adds cattr_accessor which creates the obvious accessor methods and initializes the value to nil. The accessors are created on both the class and instances of it (the instance writer is optional).

class Foo
  cattr_accessor :bar
end

Foo.bar #=> nil
Foo.bar= '123'
Foo.new.bar #=> '123'
Foo.new.bar = 'abc'
Foo.bar #=> 'abc'

If we create a subclass, the value is shared:

class Derived < Foo
end
Foo.bar= '123'
Derived.bar #=> '123'
Derived.bar = 'abc'
Foo.bar #=> 'abc'
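The same sharing can be seen without ActiveSupport. Below is a minimal plain-ruby sketch with hand-written accessors around @@; Settings and AdminSettings are made-up names.

```ruby
# Hypothetical classes; the accessors are written by hand instead of
# using cattr_accessor.
class Settings
  @@theme = 'light'

  def self.theme
    @@theme
  end

  def self.theme=(value)
    @@theme = value
  end

  # Instances see the very same variable.
  def theme
    @@theme
  end
end

class AdminSettings < Settings
end

AdminSettings.theme = 'dark'
Settings.theme      #=> 'dark'
Settings.new.theme  #=> 'dark'
```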

Other slightly odd things can happen:

class Base
  def self.bar
    @@bar
  end

  def self.bar=(value)
    @@bar = value
  end
end

class Derived < Base
  cattr_accessor :bar
end
class Derived2 < Base
  cattr_accessor :bar
end

Derived.bar = '123'
Derived2.bar = 'abc'
Derived.bar #=> '123'

So now the subclasses have independent class variables named @@bar. However if you set Base.bar before the Derived and Derived2 classes are created [1] then everyone will share the base class’ value as before. To summarize, it’s rather fiddly and often unintuitive. It also does not allow the fairly common pattern of having the base class set a default value that subclasses can override. If you don’t know about this then it can lead to odd situations. You’ll have code that works fine in dev mode, but not in production (since in development classes are trashed and recreated between requests, which obviously interacts with all this) or tests that pass when run individually but fail when you run several of them at the same time.

Classes are objects

Classes are no exception in ruby, like most things (everything?) they are objects, and like objects they can have instance variables. These instance variables are just normal instance variables: they don’t have the odd scoping rules that @@ variables have. Base classes and derived classes have completely independent values.

class Base
  @bar = '123'
  class << self
    def bar
      @bar
    end

    def bar= value
      @bar = value
    end
  end
end

class Derived < Base; end
Base.bar #=> '123'
Derived.bar #=> nil
Derived.bar = 'abc'
Base.bar #=> '123'

Less prone to unwanted surprises, but not without shortfalls as you cannot make a default from a base class propagate down (without resorting to tricks with self.inherited).

Inheritable Tricks

ActiveSupport adds class_inheritable_accessor. This provides something closer in behaviour to what you might expect: base classes can provide defaults and subclasses inherit those defaults and can overwrite them without affecting other subclasses or the base class.

class Base
  class_inheritable_accessor :value
  self.value = '123'
end

class Derived < Base; end
class Derived2 < Base; end

Derived.value #=> '123'
Derived.value = 'abc'
Base.value #=> '123'
Derived2.value #=> '123'

So far so good. We overrode the value on Derived and didn’t perturb anyone else. Unfortunately there are some drawbacks:

  Base.value = 'xyz'
  Derived2.value #=> '123'

Oh dear. Even though Derived2 never overrode any values, the change we made to Base didn’t propagate. In other words the default values are baked into the subclass at the time that it is created. This is down to the way that class_inheritable_accessor is implemented: the classes have an instance variable @inheritable_attributes (like we saw above) that is a hash with all the attributes. The accessor methods just pull the values in and out of the hash. When a class is subclassed the subclass gets a copy of @inheritable_attributes. Once this has happened, nothing links the base class’ attributes with the subclass. The copy happens via the inherited callback. A consequence of this is that if you override inherited on an ActiveRecord class, a controller etc. without calling super, all hell will break loose (since class_inheritable_accessor attributes will not be propagated).
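The copy-on-subclass behaviour is easy to reproduce in plain ruby. What follows is a simplified sketch of the idea, not ActiveSupport’s actual code; SketchBase, SketchDerived and SketchLate are hypothetical names, and only a single attribute is handled.

```ruby
# A simplified sketch of the copy-on-subclass idea, not ActiveSupport's
# actual code. All class names here are hypothetical.
class SketchBase
  class << self
    def inheritable_attributes
      @inheritable_attributes ||= {}
    end

    def value
      inheritable_attributes[:value]
    end

    def value=(new_value)
      inheritable_attributes[:value] = new_value
    end

    # Each new subclass receives a copy of the parent's hash at creation time.
    def inherited(subclass)
      super
      subclass.instance_variable_set(:@inheritable_attributes, inheritable_attributes.dup)
    end
  end

  self.value = '123'
end

class SketchDerived < SketchBase; end

SketchBase.value = 'xyz'
SketchDerived.value  #=> '123', the copy was taken before the change

class SketchLate < SketchBase; end
SketchLate.value     #=> 'xyz', created after the change
```

Drop the call to super, or the inherited method altogether, and subclasses start with an empty hash, which is the breakage described above.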

Bags of tricks

ActiveSupport also provides class_inheritable_array and class_inheritable_hash. They both use class_inheritable_accessor as their underlying mechanism. When you set a class_inheritable_array or a class_inheritable_hash you are actually concatenating (or merging) with the value inherited from the super class.

class Base
  class_inheritable_hash :attrs
  self.attrs = {:name => 'Fred'}
end

class Derived < Base
  self.attrs = {:export => 'Pain'}
end
Derived.attrs #=> {:name => 'Fred', :export => 'Pain'}

These aren’t particularly magic but are a handy shortcut.
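The merging writer can be sketched in a few lines of plain ruby. This is an assumption-laden illustration rather than the ActiveSupport implementation: MergeBase and MergeDerived are hypothetical, and the hash is copied to subclasses through a hand-written inherited hook.

```ruby
# Sketch only: MergeBase/MergeDerived are hypothetical, and the real
# ActiveSupport implementation differs in its details.
class MergeBase
  class << self
    def attrs
      @attrs ||= {}
    end

    # Setting merges into whatever is already there instead of replacing it.
    def attrs=(new_attrs)
      @attrs = attrs.merge(new_attrs)
    end

    # Subclasses start from a copy of the parent's hash.
    def inherited(subclass)
      super
      subclass.instance_variable_set(:@attrs, attrs.dup)
    end
  end

  self.attrs = {:name => 'Fred'}
end

class MergeDerived < MergeBase
  self.attrs = {:export => 'Pain'}
end

MergeDerived.attrs  #=> {:name => 'Fred', :export => 'Pain'}
MergeBase.attrs     #=> {:name => 'Fred'}
```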

Delegation for the nation

ActiveSupport’s final trick is superclass_delegating_accessor, added in rails 2.0.1. At first it appears very similar to class_inheritable_accessor:

class Base
  superclass_delegating_accessor :properties
  self.properties = []
end

class Derived < Base; end

This time however we can do this:

  Base.properties = [:useless]
  Derived.properties #=> [:useless]

superclass_delegating_accessor creates regular instance variables in the class. The interesting bit is the reader method: it looks at the current class and checks if the appropriate instance variable is defined. If so, it returns it; if not it calls super (i.e. gets the superclass’ instance variable) and so on. It stops when it reaches the class that created the superclass_delegating_accessor.

Unfortunately it doesn’t always behave as you would expect: in place modifications will propagate upwards.

  Derived.properties << :derived
  Base.properties #=> [:useless, :derived]

Which is just horrible. The example is slightly less contrived when dealing with objects which you tend to modify in place. ActiveResource ran into this with its site property (an instance of URI) and rolls its own thing specially for this property that freezes things so that subclasses don’t mess with their parents.
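Both the delegating reader and the in-place modification problem can be reproduced with a short plain-ruby sketch. It follows the lookup described above but is not ActiveSupport’s implementation, and the Delegating* class names are made up.

```ruby
# Sketch of the lookup described above, not ActiveSupport's implementation.
# All class names are hypothetical.
class DelegatingBase
  class << self
    attr_writer :properties

    # Use our own instance variable if set, otherwise ask the superclass.
    def properties
      if instance_variable_defined?(:@properties)
        @properties
      elsif superclass.respond_to?(:properties)
        superclass.properties
      end
    end
  end

  self.properties = []
end

class DelegatingDerived < DelegatingBase; end

DelegatingBase.properties = [:useless]
DelegatingDerived.properties            #=> [:useless], read through the superclass

# Reassigning builds a new array that belongs to the subclass alone...
DelegatingDerived.properties += [:derived]
DelegatingBase.properties               #=> [:useless]

# ...whereas << mutates whichever array the reader happened to return.
class DelegatingOther < DelegatingBase; end
DelegatingOther.properties << :oops
DelegatingBase.properties               #=> [:useless, :oops]
```

Sticking to reassignment (or freezing the value, as ActiveResource does for site) sidesteps the surprise.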

Unfortunately this is a decidedly quirky corner of ruby: there are several different options with subtly different semantics that, when they go wrong, go wrong in subtle ways that are easy to overlook, and there is no clear winner. Be careful!

[1] The important bit is actually when Derived and Derived2 create their @@bar variable. cattr_accessor does this for you so in this case the variable is created at the same time the class is.

Counting on Both Hands

This is more of an SQL hint than anything else, but it’s something I’ve found useful a number of times. If SQL scares you, now’s the time to turn back (or crack open a beer - dutch courage).

More often than not if I’m not displaying or editing something then I’m counting. How many unpaid invoices are there, how many customers do I have who were active in the last 6 months etc.

I’m sure I’m not teaching anyone anything if I say that Rails has some helpers for this, for example

Invoice.count :all, :conditions => {:status => 'unpaid'}
Customer.count :all, :conditions => ["updated_at > ?", 6.months.ago]
Invoice.sum :total, :conditions => {:status => 'unpaid'}

From top to bottom this counts the number of unpaid invoices, the number of customers updated in the last 6 months and the total value of all unpaid invoices. Pretty much any option you can stick in a find you can also use with the calculation helpers, for example:

  Customer.sum :weight, :joins => {:orders => :products}
  Customer.sum :weight, :joins => {:orders => :products}, :group => 'customers.id'

This sums the weight of all the items ordered by our customers (ignoring for now the quantity of a product in a given order). The second example groups this by customer.

But what if you want to sum (or count) more than one thing? For example I might want to know the number and the total value of the unpaid invoices. You could of course just do one and then the other but that feels a little wasteful: we’re asking the database to scan over the corresponding invoices once and then we turn around and ask it to do it again. Luckily with some raw sql it’s not hard to do this in one go:

  connection.select_all "SELECT count(*), SUM(total) from invoices where status='unpaid'"

But what if you wanted to count more than one thing? For example I might want a count of all outstanding invoices and a count of those with non trivial value (say more than $10). Again easy enough to write as two queries, but it would be nice to get it all back in one go.

The key to this is understanding the link between summation and counting. To use a technical term, we’re interested in indicator functions. If we have:

- a set X (which in our terms corresponds to all the rows in a table or, to be more correct, all the rows your query would return if it had no WHERE clause)
- a subset A (the rows our WHERE clause would select)
- a function I such that I(x) is 1 if x is in A and 0 if not

Then the cardinality (the number of things in it) of A is the sum of I(x) for x ranging over X.

That sounds complicated, but if you think about it, all it is saying is that to count the rows in a table you could use

SELECT SUM(1) from foos

Here our indicator function is dead simple: it always returns 1.

These three queries return the same thing

  SELECT COUNT(*) from invoices where status='pending'
  SELECT SUM(1) from invoices where status='pending'
  SELECT SUM(IF(status='pending', 1, 0)) from invoices

The third form is the one we’re interested in. Again it shouldn’t be hard to spot why it works: for each row that matches we count 1, for all others we count 0. When we add these all up we’re just going to get the number of times we counted 1, i.e. the number of matching rows. We wouldn’t want to use that on its own (it won’t use an index and COUNT(*) probably takes some shortcuts) but in the context of our problem it’s just what we need:

  SELECT COUNT(*) as count, SUM(IF(total<10, 1, 0)) as small_invoices from invoices where status = 'pending'

will return the number of pending invoices and the number of those whose total is less than 10 dollars. Here our IF functions are our indicator functions: they return 1 if the condition evaluates to true and 0 if not.
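If the SQL obscures the idea, the same single-pass counting can be written out in ruby over an in-memory array. The rows below are made-up data standing in for the invoices table, and the lambdas play the role of the IF expressions.

```ruby
# Made-up rows standing in for the invoices table.
invoices = [
  {:status => 'pending', :total => 5},
  {:status => 'pending', :total => 25},
  {:status => 'paid',    :total => 8},
  {:status => 'pending', :total => 9}
]

# Indicator functions: 1 when the row is in the subset, 0 otherwise.
pending = lambda { |row| row[:status] == 'pending' ? 1 : 0 }
small   = lambda { |row| row[:total] < 10 ? 1 : 0 }

# One pass over the rows, two counts, mirroring the single SELECT.
count = 0
small_invoices = 0
invoices.each do |row|
  next if pending.call(row) == 0      # the WHERE clause
  count += 1                          # COUNT(*)
  small_invoices += small.call(row)   # SUM(IF(total<10, 1, 0))
end

count           #=> 3
small_invoices  #=> 2
```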

Is it actually useful?

It’s going to depend on your data and queries. The ones shown here are probably too simple for it to be of any use since they also had to be easy to explain. When you push some of your conditions from the WHERE clause into the IF statements you are effectively stopping the database from using any indexes to solve those conditions. This can hurt you, but obviously if you weren’t using (or don’t have) indexes that the database could have used for those conditions then you haven’t lost anything.

The other basic premise is that if the database is going over some set of data it might as well be counting more than one thing. So if in order to find rows satisfying condition A the database needs to scan some subset X and in order to satisfy condition B the database also needs to scan that same subset X then you’re onto a winner. If on the other hand the two are completely distinct (or if X is hopelessly big) then you won’t be saving much.

Typically I’ve used this the most in reporting style applications where having applied a common set of conditions I want to count the number of occurrences of a large number of features. Not something to be using willy-nilly, but a neat trick to have in your toolbox. Use it wisely. And as with all performance things, don’t do things blindly just because you read somewhere that it’s faster. Profile, measure and so on before and after and come to your own conclusions.

When Ducks Go Mutant

Ruby is a permissive language. In general we don’t care what objects are, as long as they respond in certain ways to certain methods: “If it walks like a duck and quacks like a duck, I would call it a duck”.

As you may recall, in Rails 2.1 there was a rewrite of the :include mechanism; however, the old mechanism persists for compatibility reasons. When using the new mechanism, the :included associations are not joined, and so if any part of your query references the tables that would formerly have been joined it won’t work.

To work around this Rails looks at the conditions, order, select statements and so on to see if any of them mention tables other than the main table. If they do then the fallback to the old code is triggered and everything works fine. The code that detects tables in the order clause looks something like

   return [] unless order && order.is_a?(String)
   order.scan(/([\.\w]+).?\./).flatten

The code is quite simple: if we ever have “something.” then that means we’re using the something table.

The code that eventually adds the order to the statement looks something like

  sql << " ORDER BY #{order}"

The code for the other options is similar.

So what’s the problem here? If as the api docs indicate, you pass a string containing a fragment of sql then nothing at all is wrong. However some people (I assume that the :conditions option is the origin of this habit) have taken to doing things like

  Foo.find :all, :order => ['bars']

Before 2.1, this happened to work, because the default to_s on an array just joins the strings together. However the table scanning code won’t scan an array (my guess is that the explicit check for string was because people quite legitimately write :order => :name and things like that). So if you’ve got a select or order clause specified as an array that depends on an included table it will break when you move to rails 2.1.
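The scanning behaviour is easy to check in isolation. The sketch below wraps the two lines quoted earlier in a method; tables_in_order is a name invented here, and the order strings are made-up examples.

```ruby
# tables_in_order is a hypothetical wrapper around the two lines quoted
# earlier; it is not the name rails uses.
def tables_in_order(order)
  return [] unless order && order.is_a?(String)
  order.scan(/([\.\w]+).?\./).flatten
end

tables_in_order('bars.position desc')    #=> ['bars']
tables_in_order('name desc')             #=> []
tables_in_order(['bars.position desc'])  #=> [], the array is never scanned
```

With the array form no table is detected, so nothing triggers the fallback to the joining code.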

It’s a complete accident that this ever worked (and it breaks if you try anything like :order => ['name desc', 'age desc']), but that isn’t a huge amount of comfort when code that has been working suddenly stops working. You could probably waste a lot of time before working out that the culprit was specifying the order option as an array, which obviously is not a good thing (and makes people scared of upgrading). On the other hand it’s hard to anticipate how people will use things, and explicitly checking types and so on isn’t a very rubyish thing to do and could get in the way of legitimate uses.

I’m not sure how a framework provider should handle this in the general case. It’s a delicate balance between not stifling some of the flexibility ruby offers and helping programmers not rely on things that only work by accident.

Nested Includes and Joins

Nested eager loads are a not entirely obvious bit of rails syntax. It’s not hard once you get it though. In a nutshell you have to tell ActiveRecord how it should walk through the associations (i.e. just listing them isn’t enough). Before we get going it’s worth pointing out that although I’ve written :include everywhere, everything I’ve said applies equally to :include and :joins (see my previous post on the difference between the two).

When you are nesting includes, you’re building up a data structure that is inherently recursive. There are 3 rules for nested :includes:

- Always use the association name. Not the table name, not the class name but the association name (whatever it is that you typed just after belongs_to, has_many etc.). A corollary is that if you don’t have all your associations set up, you’re dead in the water. If you’ve only got one side of a relationship Rails won’t infer the other for you.
- If you want to load multiple associations from a model, use an array.
- If you want to load an association and some of its child associations then use a hash. The key should be the parent association name, the value should be a description of the child associations.

Now just combine those 3 rules and apply recursively. As an aid here’s a quick snippet that takes an include option (such as [:comments, {:posts => :authors}]) and describes which associations are loaded. The structure is the same as the code in activerecord that handles the :include option, so it should give some insight into how things work.

def describe associations, from = 'base'
  case associations
  when Symbol, String then puts "load #{associations} from #{from}"
  when Array then associations.each {|a| describe a, from}
  when Hash then
    associations.each do |parent, child|
      raise "Invalid hash - key must be an association name" unless parent.is_a?(Symbol)
      describe parent, from
      describe child, parent
    end
  end
end

This code isn’t too hard to understand. It’s all about the class of the associations parameter:

- The easy case is if it’s a string or a symbol: what we’ve got is just the name of an association, so just go ahead and load it.
- If what we’ve got is an array, then just call ourselves recursively on the contents of that array.
- If what we’ve got is a hash, then for each key value pair:
  - Load the association specified by the key (the parent association).
  - Load the associations specified by the value (from the parent association).

It produces (ugly) output like this:

  describe [{:comments => :user}, :category]
load comments from base
load user from comments
load category from base

which is exactly what activerecord would do. If you see something like “load user from comments” but instances of Comment don’t have an association named user then you’ve screwed up.

Examples

Still confused? Here are some examples, from simple to complicated (these are purely examples of the :include syntax; don’t see this as a recommendation to actually load 10 layers of nested associations). The models are from a hypothetical book selling application and are:

- Book
- Author
- Comment
- User

Books belong to authors, and users of the site can leave a comment on any book. The obvious associations are defined. In addition user has a favourite_books association and through the friends association they can list other users whose taste in books they generally share.

Book.find :all, :include => :author

I hope I don’t have to explain that one to anyone :-)

Book.find :all, :include => [:author, :comments]

We want to include both the author and comments, so we place the two names in an array

  Book.find :all, :include => [:author, {:comments => :user}]
  Book.find :all, :include => [{:comments => :user}]
  Book.find :all, :include => {:comments => :user}
  Book.find :all, :include => {:comments => [:user]}

In the first example we still want to include author and comments, but now we want to include an association from comments. From the 3rd rule we need a hash containing the key :comments and with corresponding value a description of the associations from the Comment model that we want to load (ie just :user in this case).

In the next 3 examples we just want to include the comments and the user from each comment. These three forms are entirely equivalent (which should be fairly obvious).

  Book.find :all, :include => {:comments => {:user => :favourite_books}}
  Book.find :all, :include => {:comments => {:user => {:favourite_books => :author}}}

Here we are loading the books’ comments, the user for each comment and the favourite books for each of those users. In the second example we’re also loading the author for each of those favourite books. You can keep on nesting these as far as you want.

  Book.find :all, :include => {:comments => {:user => {:favourite_books => [:author, :comments]}}}

Now we’ve come full circle - on each favourite book we’ve loaded the comments

  Book.find :all, :include => {:comments =>
    {:user => [
      :friends,
      {:favourite_books => [:author, :comments]}
    ]}}
  Book.find :all, :include => {:comments =>
    {:user => [
      {:friends => :favourite_books},
      {:favourite_books => [:author, :comments]}
    ]}}
  Book.find :all, :include => {:comments =>
    {:user =>
      {:friends => :favourite_books,
       :favourite_books => [:author, :comments]
    }}}

Our final examples. In addition to the favourite_books association, we’re loading a user’s friends, and in the second case the favourite books of those friends. The last two examples are identical: we can either have an array with two 1 item hashes, or just one hash with 2 items. We can’t do that in the first example: because we’re not loading any associations from friends we can’t make it into a hash (what would the corresponding value be?)
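One way to convince yourself that the last two forms are equivalent is to walk both structures and compare what would be loaded. The sketch below is a variant of the describe method from earlier that collects [from, association] pairs instead of printing them; loaded_pairs is a name made up for this post, not anything in activerecord.

```ruby
# loaded_pairs is a hypothetical helper: same walk as describe, but it
# returns [from, association] pairs rather than printing.
def loaded_pairs(associations, from = :base, pairs = [])
  case associations
  when Symbol, String
    pairs << [from, associations.to_sym]
  when Array
    associations.each { |a| loaded_pairs(a, from, pairs) }
  when Hash
    associations.each do |parent, child|
      loaded_pairs(parent, from, pairs)
      loaded_pairs(child, parent, pairs)
    end
  end
  pairs
end

# An array with two 1 item hashes...
as_array = {:comments =>
  {:user => [
    {:friends => :favourite_books},
    {:favourite_books => [:author, :comments]}
  ]}}

# ...versus one hash with 2 items.
as_hash = {:comments =>
  {:user =>
    {:friends => :favourite_books,
     :favourite_books => [:author, :comments]
  }}}

loaded_pairs(as_array) == loaded_pairs(as_hash)  #=> true
loaded_pairs(as_hash).first                      #=> [:base, :comments]
```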