Space Vatican

Ramblings of a curious coder

Mixing :include and :conditions

We all love :include, it cuts down on the number of queries made to the database (especially when iterating over a collection) and generally makes the hamsters running the servers happier. It’s not all green pastures though and then eager loading code in Rails up to 2.0.x had some shortcomings (not least that some of it was mind bendingly complicated), so Rails 2.1 has a new eager loading scheme which works in a completely different way.

Consider the following snippet:

  Post.find :all, :include => [:author, :comments, :tags]

In Rails 2.0.2, this generates one big query that selects from posts and joins authors, comments and tags. Because of the way the joins work, if a post has 10 comments and 4 tags then that particular post will create 40 rows in the result set (for 16 actual distinct objects) that rails has to filter down. The more has_many associations you have at any single level, the worse it gets (and to top it off the processing of that large result set has an O(n2) element to it).

Build it up, break it down

In Rails 2.1, this is handled completely differently. Instead of building one big query, we can break it down into some easy ones. First a regular Post.find(:all) query is run, which yields a array of posts. Then Rails basically does

  Author.find :all, :conditions => ['id in (?)', posts.collect(&:author_id)]
  Comment.find :all, :conditions => ['post_id in (?)', posts.collect(&:id)]
  Tag.find :all, :conditions => ['post_id in (?)', posts.collect(&:id)]

and hooks up all of those objects to their respective owners. So you execute more queries, but those queries are easy (both for the db and for rails) and the number of queries relates to the number of associations you are getting, not the number of rows. They also play nice with things like Active Record Context. You could eager load 10 top level associations and it would be just 10 simple queries, rather than a complicated join on 10 tables. This approach also makes it a lot easier to handle associations with conditions, polymorphic belongs_to, associations with orders and so on.

Conditional confusion

So a lot of rambling and nothing very relevant to the title! Here it comes. Suppose comments have a moderated attribute, and we wish to defeat the spam bots by only showing moderated comments. With the :include in Rails 2.0.2 you can do something like this

  Post.find :all, :include => [:comments], :conditions => ['moderated = ?', true]

Easy, get all the posts but only get moderated comments. Except this doesn’t work. Sure it only gives you back moderated comments, but it also only returns posts with at least one moderated comment so that article you just posted isn’t ever going to get read because it has no comments. Oh noes! But why? Recall the giant sql statement, returning many rows. If we have some articles, one with a moderated comment and one with none, our result set (ignoring for a second the conditions) will look something like

id (for posts)body (for posts)id (for comments)body (for comments)moderated
1Welcome to the blogNULLNULLNULL
2This post is controversial1first!true
2This post is controversial2secondtrue
3My new articleNULLNULLNULL

So when our conditions are applied, they’ll knock out the rows for our first and third blog post. See, I wasn’t even making that shit up!

So mixing :conditions and :include has dubious semantics at best. It should be immediately apparent that this cannot work as is with the new :include scheme since there is no single query where both the posts and comments tables are loaded. It would however be possible to massage :conditions into only applying to the query we wanted (in this case the load from comments). For now, we can solve the original problem by adding a moderated_comments association (with a condition) to posts, and :including it. With the new eager loading scheme that will accomplish our original goal.

But I liked the old way!

If you do rely on the old behaviour, fret not: ActiveRecord will fall back to the old implementation if it notices that you are referencing tables other than the ‘main’ table. You’ve got to give ActiveRecord a hand though, by disambiguating your column names. In the example given previously, ActiveRecord wouldn’t pick up on it, and you’d get a nasty error back from your database.

  Post.find :all, :include => [:comments],
                  :conditions => ['comments.moderated = ?', true]

will make it clear to AR that it needs to fallback to the old code.

So there you have it. You now know everything about eager loading in ActiveRecord, both old and new! Use it wisely.