Fun with class variables
August 19th, 2008
Class variables are a slightly fiddly bit of ruby. They don't always quite behave the way you expect. ActiveSupport bundles up a large number of helpers to deal with the various cases. They're used extensively throughout the framework where the ability to control how things set at the framework level ripple down to subclasses (ie your models and controllers) is important to get right, but they can be pretty handy in your own apps too.
The Basics
If you're at all familiar with ruby you'll have heard of @@. @@ is a bit odd because it creates variables that are visible both from the instance and the class. The value is also shared with subclasses. ActiveSupport adds cattr_accessor which creates the obvious accessor methods and initializes the value to nil. The accessors are created on both the class and instances of it (the instance writer is optional).
    class Foo
      cattr_accessor :bar
    end

    Foo.bar #=> nil
    Foo.bar = '123'
    Foo.new.bar #=> '123'
    Foo.new.bar = 'abc'
    Foo.bar #=> 'abc'
If we create a subclass, the value is shared:
    class Derived < Foo
    end

    Foo.bar = '123'
    Derived.bar #=> '123'
    Derived.bar = 'abc'
    Foo.bar #=> 'abc'
Other slightly odd things can happen:
    class Base
      def self.bar
        @@bar
      end

      def self.bar=(value)
        @@bar = value
      end
    end

    class Derived < Base
      cattr_accessor :bar
    end

    class Derived2 < Base
      cattr_accessor :bar
    end

    Derived.bar = '123'
    Derived2.bar = 'abc'
    Derived.bar #=> '123'
So now the subclasses have independent class variables named @@bar. However if you set Base.bar before the Derived and Derived2 classes are created [1] then everyone will share the base class' value as before. To summarize, it's rather fiddly and often unintuitive. It also does not allow the fairly common pattern of having the base class set a default value that subclasses can override. If you don't know about this then it can lead to odd situations. You'll have code that works fine in dev mode, but not in production (since in development classes are trashed and recreated between requests, which obviously interacts with all this) or tests that pass when run individually but fail when you run several of them at the same time.
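The basic sharing behaviour can be reproduced without ActiveSupport. This is a minimal sketch in plain ruby: the Shape and Circle classes are made up for illustration, and the hand-written accessors only approximate what cattr_accessor generates.

```ruby
# Plain-ruby illustration of one class variable shared down a hierarchy.
# Shape and Circle are hypothetical names, not part of any library.
class Shape
  @@colour = nil            # a single variable, owned by Shape

  def self.colour
    @@colour
  end

  def self.colour=(value)
    @@colour = value
  end

  def colour                # instances see the very same variable
    @@colour
  end
end

class Circle < Shape; end

Shape.colour = 'red'
Circle.colour = 'blue'      # writes to Shape's @@colour

Shape.colour                #=> "blue"
Circle.new.colour           #=> "blue"
```

There is only ever one @@colour here, so a write through any class or subclass is visible everywhere.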
Classes are objects
Classes are no exception in ruby, like most things (everything?) they are objects, and like objects they can have instance variables. These instance variables are just normal instance variables: they don't have the odd scoping rules that @@ variables have. Base classes and derived classes have completely independent values.
    class Base
      @bar = '123'

      class << self
        def bar
          @bar
        end

        def bar=(value)
          @bar = value
        end
      end
    end

    class Derived < Base; end

    Base.bar #=> '123'
    Derived.bar #=> nil
    Derived.bar = 'abc'
    Base.bar #=> '123'
Less prone to unwanted surprises, but not without shortfalls as you cannot make a default from a base class propagate down (without resorting to tricks with self.inherited).
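For the curious, the self.inherited trick might look something like the following. It is only a sketch in plain ruby with invented class names (Setting, ChildSetting); it copies the parent's current value into each new subclass at the moment the subclass is defined.

```ruby
# Sketch: seed a subclass's class-level instance variable from its parent
# using the inherited hook. Setting and ChildSetting are hypothetical names.
class Setting
  @bar = '123'

  class << self
    attr_accessor :bar

    # Ruby calls this on the parent each time a subclass is defined.
    def inherited(subclass)
      super
      subclass.bar = bar    # hand the parent's current value down as the default
    end
  end
end

class ChildSetting < Setting; end

ChildSetting.bar            #=> "123" (default picked up from Setting)
ChildSetting.bar = 'abc'
Setting.bar                 #=> "123" (parent is unaffected)
```

Note that the default is handed down by reference, so modifying the value in place (rather than assigning a new one) would still be visible in both classes.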
Inheritable Tricks
ActiveSupport adds class_inheritable_accessor. This provides something closer in behaviour to what you might expect: base classes can provide defaults and subclasses inherit those defaults and can overwrite them without affecting other subclasses or the base class.
    class Base
      class_inheritable_accessor :value
      self.value = '123'
    end

    class Derived < Base; end
    class Derived2 < Base; end

    Derived.value #=> '123'
    Derived.value = 'abc'
    Base.value #=> '123'
    Derived2.value #=> '123'
So far so good. We overrode the value on Derived and didn't perturb anyone else. Unfortunately there are some drawbacks:
    Base.value = 'xyz'
    Derived2.value #=> '123'
Oh dear. Even though Derived2 never overrode any values the change we made to Base didn't propagate. In other words the default values are baked into the subclass at the time that it is created. This is down to the way that class_inheritable_accessor is implemented: The classes have an instance variable @inheritable_attributes (like we saw above) that is a hash with all the attributes. The accessor methods just pull the values in and out of the hash. When a class is subclassed the subclass gets a copy of @inheritable_attributes. Once this has happened, nothing links the base class' attributes with the subclass.
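To make the copy-on-subclass behaviour concrete, here is a stripped-down sketch of the mechanism just described. It is not ActiveSupport's actual code: InheritableSketch, inheritable_accessor and the Animal classes are names invented for this illustration.

```ruby
# Simplified model of "a hash of attributes that is copied when subclassing".
# All names here are hypothetical; this is not the ActiveSupport source.
module InheritableSketch
  def inheritable_attributes
    @inheritable_attributes ||= {}
  end

  # Define class-level reader and writer that go through the hash.
  def inheritable_accessor(name)
    define_singleton_method(name) { inheritable_attributes[name] }
    define_singleton_method("#{name}=") { |value| inheritable_attributes[name] = value }
  end

  # On subclassing, give the subclass its own copy of the parent's hash.
  def inherited(subclass)
    super
    subclass.instance_variable_set(:@inheritable_attributes, inheritable_attributes.dup)
  end
end

class Animal
  extend InheritableSketch
  inheritable_accessor :value
  self.value = '123'
end

class Dog < Animal; end

Animal.value = 'xyz'
Dog.value                   #=> "123" (the copy was taken when Dog was defined)
```

Because the hash is duplicated once, at subclass creation, later changes to the parent have no way of reaching the subclass.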
Bags of tricks
ActiveSupport also provides class_inheritable_array and class_inheritable_hash. They both use class_inheritable_accessor as their underlying mechanism. When you set a class_inheritable_array or a class_inheritable_hash you are actually concatenating (or merging) with the value inherited from the super class.
    class Base
      class_inheritable_hash :attrs
      self.attrs = {:name => 'Fred'}
    end

    class Derived < Base
      self.attrs = {:export => 'Pain'}
    end

    Derived.attrs #=> {:name => 'Fred', :export => 'Pain'}
These aren't particularly magic but are a handy shortcut.
Delegation for the nation
ActiveSupport's final trick is superclass_delegating_accessor, added in rails 2.0.1. At first it appears very similar to class_inheritable_accessor:
    class Base
      superclass_delegating_accessor :properties
      self.properties = []
    end

    class Derived < Base; end
This time however we can do this:
    Base.properties = [:useless]
    Derived.properties #=> [:useless]
superclass_delegating_accessor creates regular instance variables in the class. The interesting bit is the reader method: it looks at the current class and checks if the appropriate instance variable is defined. If so, it returns it; if not, it calls super (ie gets the superclass' instance variable) and so on. It stops when it reaches the class that created the superclass_delegating_accessor.
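The lookup performed by the reader can be sketched in plain ruby as follows. This is an approximation written for this post rather than the ActiveSupport implementation; DelegatingSketch, delegating_accessor and the Report classes are invented names.

```ruby
# Sketch of a reader that falls back to the superclass when the current
# class has not set its own value. Names are hypothetical.
module DelegatingSketch
  def delegating_accessor(name)
    ivar  = "@#{name}"
    owner = self              # the class that declared the accessor

    define_singleton_method(name) do
      if instance_variable_defined?(ivar)
        instance_variable_get(ivar)        # this class has its own value
      elsif equal?(owner)
        nil                                # reached the top: nothing set
      else
        superclass.public_send(name)       # otherwise ask the parent
      end
    end

    define_singleton_method("#{name}=") { |value| instance_variable_set(ivar, value) }
  end
end

class Report
  extend DelegatingSketch
  delegating_accessor :properties
  self.properties = []
end

class SalesReport < Report; end
class StockReport < Report; end

Report.properties = [:useless]
SalesReport.properties          #=> [:useless] (found on Report)
SalesReport.properties = [:sales]
StockReport.properties          #=> [:useless] (still delegating to Report)
```

Since nothing is copied, a subclass that has never assigned a value always sees whatever its parent currently holds.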
Unfortunately it doesn't always behave as you would expect: in place modifications will propagate upwards.
    Derived.properties << :derived
    Base.properties #=> [:useless, :derived]
Which is just horrible. The example is slightly less contrived when dealing with objects which you tend to modify in place. ActiveResource ran into this with its site property (an instance of URI) and rolls its own thing specially for this property that freezes things so that subclasses don't mess with their parents.
Unfortunately it's a decidedly quirky corner of ruby. Several different options with subtly different semantics that when they go wrong, go wrong in subtle ways that are easy to overlook and with no clear winner. Be careful!
[1] The important bit is actually when Derived and Derived2 create their @@bar variable. cattr_accessor does this for you so in this case the variable is created at the same time the class is.
Counting on both hands
August 17th, 2008
This is more of an SQL hint than anything else, but it's something I've found useful a number of times. If SQL scares you, now's the time to turn back (or crack open a beer - dutch courage).
More often than not if I'm not displaying or editing something then I'm counting. How many unpaid invoices are there, how many customers do I have who were active in the last 6 months etc...
I'm sure I'm not teaching anyone anything if I say that Rails has some helpers for this, for example
    Invoice.count :all, :conditions => {:status => 'unpaid'}
    Customer.count :all, :conditions => ["updated_at > ?", 6.months.ago]
    Invoice.sum :total, :conditions => {:status => 'unpaid'}
From top to bottom this counts the number of unpaid invoices, the number of customers updated in the last 6 months and the total value of all unpaid invoices. Pretty much any option you can stick in a find you can also use with the calculation helpers, for example:
    Customer.sum :weight, :joins => {:orders => :products}
    Customer.sum :weight, :joins => {:orders => :products}, :group => 'customers.id'
This sums the weight of all the items ordered by our customers (ignoring for now the quantity of a product in a given order). The second example groups this by customer.
But what if you want to sum (or count) more than one thing? For example I might want to know the number and the total value of the unpaid invoices. You could of course just do one and then the other but that feels a little wasteful: we're asking the database to scan over the corresponding invoices once and then we turn around and ask it to do it again. Luckily with some raw sql it's not hard to do this in one go:
connection.select_all "SELECT count(*), SUM(total) from invoices where status='unpaid'"
But what if you wanted to count more than one thing? For example I might want a count of all outstanding invoices and a count of those with non trivial value (say more than $10). Again easy enough to write as two queries, but it would be nice to get it all back in one go.
The key to this is understanding the link between summation and counting. To use a technical term, we're interested in indicator functions. If we have:
- a set X (which in our terms corresponds to all the rows in a table or, to be more correct, all the rows your query would return if it had no WHERE clause)
- a subset A (the rows our WHERE clause would select).
- a function I such that I(x) is 1 if x is in A and 0 if not
Then the cardinality (the number of things in it) of A is the sum of I(x) for x ranging over X.
That sounds complicated, but if you think about it, all it is saying is that to count the rows in a table you could use
SELECT SUM(1) from foos
Here our indicator function is dead simple: it always returns 1.
These three queries return the same thing
    SELECT COUNT(*) from invoices where status='pending'
    SELECT SUM(1) from invoices where status='pending'
    SELECT SUM(IF(status='pending', 1, 0)) from invoices
The third form is the one we're interested in. Again it shouldn't be hard to spot why it works: for each row that matches we count 1, for all others we count 0. When we add these all up we're just going to get the number of times we counted 1, i.e. the number of matching rows. We wouldn't want to use that on its own (it won't use an index and COUNT(*) probably takes some shortcuts) but in the context of our problem it's just what we need:
    SELECT COUNT(*) as count, SUM(IF(total<10, 1, 0)) as small_invoices from invoices where status = 'pending'
will return the number of pending invoices and the number of those invoices whose total is less than 10 dollars. Here our IF functions are our indicator functions: they return 1 if the condition evaluates to true and 0 if not.
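If the SQL obscures the idea, the same single-pass counting can be mimicked in plain ruby. The sketch below uses a made-up array of hashes in place of the invoices table, and count_pending_and_small is a name invented for this example.

```ruby
# One pass over the rows, two counters: the equivalent of
# COUNT(*) and SUM(IF(total < 10, 1, 0)) with WHERE status = 'pending'.
# The method name and the sample data are hypothetical.
def count_pending_and_small(invoices)
  count = 0
  small_invoices = 0

  invoices.each do |row|
    next unless row[:status] == 'pending'           # the WHERE clause
    count += 1                                      # COUNT(*)
    small_invoices += (row[:total] < 10 ? 1 : 0)    # the indicator function
  end

  [count, small_invoices]
end

INVOICES = [
  { :status => 'pending', :total => 5 },
  { :status => 'pending', :total => 25 },
  { :status => 'paid',    :total => 8 },
  { :status => 'pending', :total => 9 }
]

count_pending_and_small(INVOICES)   #=> [3, 2]
```

Summing an expression that is 1 for matching rows and 0 otherwise gives exactly the number of matching rows, which is why both counters can share one scan.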
Is it actually useful?
It's going to depend on your data and queries. The ones shown here are probably too simple for it to be of any use since they also had to be easy to explain. When you push some of your conditions from the WHERE clause into the IF statements you are effectively stopping the database from using any indexes to solve those conditions. This can hurt you, but obviously if you weren't using (or don't have) indexes that the database could have used for those conditions then you haven't lost anything.
The other basic premise is that if the database is going over some set of data it might as well be counting more than one thing. So if in order to find rows satisfying condition A the database needs to scan some subset X and in order to satisfy condition B the database also needs to scan that same subset X then you're onto a winner. If on the other hand the two are completely distinct (or if X is hopelessly big) then you won't be saving much.
Typically I've used this the most in reporting style applications where having applied a common set of conditions I want to count the number of occurrences of a large number of features. Not something to be using willy-nilly, but a neat trick to have in your toolbox. Use it wisely. And as with all performance things, don't do things blindly just because you read somewhere that it's faster. Profile, measure and so on before and after and come to your own conclusions.
When ducks go mutant
August 9th, 2008
Ruby is a permissive language. In general we don't care what objects are, as long as they respond in certain ways to certain methods: "If it walks like a duck and quacks like a duck, I would call it a duck".
As you may recall, in Rails 2.1 there was a rewrite of the :include mechanism, however the old mechanism persists for compatibility reasons. When using the new mechanism, the :included associations are not joined, and so if any part of your query references the tables that would formerly have been joined it won't work.
To work around this Rails looks at the conditions, order, select statements and so on to see if any of them mention tables other than the main table. If they do then the fallback to the old code is triggered and everything works fine. The code that detects tables in the order clause looks something like
    return [] unless order && order.is_a?(String)
    order.scan(/([\.\w]+).?\./).flatten
The code is quite simple: if we ever have "something." then that means we're using the something table.
The code that eventually adds the order to the statement looks something like
sql << " ORDER BY #{order}"
The code for the other options is similar.
So what's the problem here? If as the api docs indicate, you pass a string containing a fragment of sql then nothing at all is wrong. However some people (I assume that the :conditions option is the origin of this habit) have taken to doing things like
Foo.find :all, :order => ['bars']
Before 2.1, this happens to work, because the default to_s on an array just joins the strings together. However the table scanning code won't scan an array (my guess is that the explicit check for string was because people quite legitimately write :order => :name and things like that). So if you've got a select or order clause specified as an array that depends on an included table it will break when you move to rails 2.1.
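The gap is easy to reproduce outside Rails. The sketch below wraps the detection snippet quoted earlier in a method; tables_in_order is a name made up for this example and not the method name used inside ActiveRecord.

```ruby
# Wraps the table-detection snippet from above so it can be tried in irb.
# tables_in_order is a hypothetical name chosen for this sketch.
def tables_in_order(order)
  return [] unless order && order.is_a?(String)
  order.scan(/([\.\w]+).?\./).flatten
end

tables_in_order('bars.name desc')            #=> ["bars"]
tables_in_order('bars.name, foos.age desc')  #=> ["bars", "foos"]
tables_in_order(['bars.name desc'])          #=> [] (arrays are never scanned)
tables_in_order(:name)                       #=> []
```

With a string the bars table is spotted and the fallback kicks in; with the same text wrapped in an array nothing is detected, so the new loading strategy is used and the query no longer has the table it refers to.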
It's a complete accident that this ever worked (and it breaks if you were to try anything like :order =>['name desc', 'age desc']), but that isn't a huge amount of comfort when code that has been working suddenly stops working. You could probably waste a lot of time before working out that it was specifying the order option as an array, which obviously is not a good thing (and makes people scared of upgrading). On the other hand it's hard to anticipate how people will use things and explicitly checking types and so on isn't a very rubyish thing to do and could get in the way of legitimate uses.
I'm not sure how a framework provider should handle this in the general case. It's a delicate balance between not stifling some of the flexibility ruby offers and helping programmers not rely on things that only work by accident.
Nested includes and joins
August 8th, 2008
Nested eager loads are a not entirely obvious bit of rails syntax. It's not hard once you get it though. In a nutshell you have to tell ActiveRecord how it should walk through the associations (i.e. just listing them isn't enough). Before we get going it's worth pointing out that although I've written :include everywhere, everything I've said applies equally to :include and :joins (see my previous post on the difference between the two).
When you are nesting includes, you're building up a data structure that is inherently recursive. There are 3 rules for nested :includes:
- Always use the association name. Not the table name, not the class name but the association name (whatever it is that you typed just after belongs_to, has_many etc...). A corollary is that if you don't have all your associations set up, you're dead in the water. If you've got one side of a relationship Rails won't infer the other for you.
- If you want to load multiple associations from a model, use an array.
- If you want to load an association and some of its child associations then use a hash. The key should be the parent association name, the value should be a description of the child associations.
Now just combine those 3 rules and apply recursively. As an aid here's a quick snippet that takes an include option (such as [:comments, {:posts => :authors}]) and describes which associations are loaded. The structure is the same as the code in activerecord that handles the :include option, so it should give some insight into how things work.
    def describe associations, from = 'base'
      case associations
      when Symbol then puts "load #{associations} from #{from}"
      when Array then associations.each {|a| describe a, from}
      when Hash
        associations.each do |parent, child|
          raise "Invalid hash - key must be an association name" unless parent.is_a?(Symbol)
          describe parent, from
          describe child, parent
        end
      end
    end
This code isn't too hard to understand. It's all about the class of the associations parameter
- The easy case is if it's a string or a symbol: what we've got is just the name of an association, so just go ahead and load it.
- If what we've got is an array, then just call ourselves recursively on the contents of that array.
- If what we've got is a hash, then for each key value pair:
  - Load the association specified by the key (the parent association)
  - Load the associations specified by the value (from the parent association).
It produces (ugly) output like this:
describe [{:comments => :user}, :category]
load comments from base
load user from comments
load category from base
which is exactly what activerecord would do. If you see something like "load user from comments" but instances of Comment don't have an association named user then you've screwed up.
Examples
Still confused ? Here are some examples, from simple to complicated (these are purely examples of the :include syntax - don't see this as a recommendation to actually load 10 layers of nested associations). The models are from a hypothetical book selling application and are:
- Book
- Author
- Comment
- User
Books belong to authors, and users of the site can leave a comment on any book. The obvious associations are defined. In addition user has a favourite_books association and through the friends association they can list other users whose taste in books they generally share.
Book.find :all, :include => :author
I hope I don't have to explain that one to anyone :-)
Book.find :all, :include => [:author, :comments]
We want to include both the author and comments, so we place the two names in an array
    Book.find :all, :include => [:author, {:comments => :user}]
    Book.find :all, :include => [{:comments => :user}]
    Book.find :all, :include => {:comments => :user}
    Book.find :all, :include => {:comments => [:user]}
In the first example we still want to include author and comments, but now we want to include an association from comments. From the 3rd rule we need a hash containing the key :comments and with corresponding value a description of the associations from the Comment model that we want to load (ie just :user in this case).
In the next 3 examples we just want to include the comments and the user from each comment. These three forms are entirely equivalent (which should be fairly obvious).
    Book.find :all, :include => {:comments => {:user => :favourite_books}}
    Book.find :all, :include => {:comments => {:user => {:favourite_books => :author}}}
Here we are loading the books' comments, the user for each comment and the favourite books for each of those users. In the second example we're also loading the author for each of those favourite books. You can keep on nesting these as far as you want.
Book.find :all, :include => {:comments => {:user => {:favourite_books => [:author, :comments]}}}
Now we've come full circle - on each favourite book we've loaded the comments
    Book.find :all, :include => {:comments => {:user => [
      :friends,
      {:favourite_books => [:author, :comments]}
    ]}}

    Book.find :all, :include => {:comments => {:user => [
      {:friends => :favourite_books},
      {:favourite_books => [:author, :comments]}
    ]}}

    Book.find :all, :include => {:comments => {:user => {
      :friends => :favourite_books,
      :favourite_books => [:author, :comments]
    }}}
Our final examples. In addition to the favourite_books association, we're loading a user's friends, and in the second case the favourite books of those friends. The last two examples are identical: we can either have an array with two 1 item hashes, or just one hash with 2 items. We can't do that in the first example: because we're not loading any associations from friends we can't make it into a hash (what would the corresponding value be?)
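One way to convince yourself that two include forms are equivalent is to flatten each into a list of (from, association) pairs and compare. The helper below is a sketch that follows the same three rules as the describe method; association_paths is a name invented here and nothing in it touches activerecord.

```ruby
# Flatten an :include structure into [from, association] pairs.
# association_paths is a hypothetical helper written for this post.
def association_paths(associations, from = :base)
  case associations
  when Symbol, String
    [[from, associations.to_sym]]
  when Array
    associations.flat_map { |a| association_paths(a, from) }
  when Hash
    associations.flat_map do |parent, child|
      association_paths(parent, from) + association_paths(child, parent)
    end
  else
    raise ArgumentError, "don't know how to load #{associations.inspect}"
  end
end

array_form = {:comments => {:user => [
  {:friends => :favourite_books},
  {:favourite_books => [:author, :comments]}
]}}

hash_form = {:comments => {:user => {
  :friends => :favourite_books,
  :favourite_books => [:author, :comments]
}}}

association_paths(array_form) == association_paths(hash_form)   #=> true
association_paths([{:comments => :user}, :category])
#=> [[:base, :comments], [:comments, :user], [:base, :category]]
```

The array of two one-item hashes and the single two-item hash walk exactly the same associations, which is all that matters to the loader.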
Parametrised to the max
July 18th, 2008
If you’ve played with Rails for more than about 3 minutes you’ll know that the params hash can itself contain hashes. If you use form_for/fields_for then you’re used to things like params[:user] containing those fields pertaining to the user and things like that. But it can also contain arrays, arrays of hashes and so on.
Fundamentally html forms don’t know about any sort of structured data. All they know about is name-value pairs. Rails tacks some conventions onto parameter names which it uses to express some structure.
A lot of the time you don’t really care about how this happens. You use the helpers, rails generates its magic form element names and more magic on the other side stuffs that into your params hash. Every now and again you need to head off the beaten track, and there it’s useful to understand how things all fit together (eg if you are generating data to submit via javascript).
Boring cases & parlour tricks
First off we’ll need a way of trying out things. You could of course just mock up a form with the appropriate input elements but this would massively slow down the process of trying out stuff which is never a good thing. Instead we can tap into what Rails uses. Just open up a script/console prompt:
    ActionController::AbstractRequest.parse_query_parameters "name=fred&phone=0123456789"
    => {"name"=>"fred", "phone"=>"0123456789"}
Just stick name=value pairs (joined with &) into a string and pass it to the above function. What you get back is what would be in the params hash in your controller. From now on, for the sake of brevity I’ll just write parse instead of ActionController::AbstractRequest.parse_query_parameters.
If you’ve ever looked at the html your app generates you will have seen form inputs with names like user[name] (as generated by the text_field or form_for helpers). This notation indicates that the user parameter should be a hash, and that here we’re talking about the name key in that hash:
    parse "user[name]=fred&user[phone]=0123456789"
    => {"user"=>{"name"=>"fred", "phone"=>"0123456789"}}
The other basic structure is an array. By default, repeated parameters with the same name are ignored and only the first appearance counts.
    parse "name=fred&name=bob&name=henry"
    => {"name"=>"fred"}
If the name ends with [] then instead of discarding extra occurrences they are accumulated in an array:
    parse "aliases[]=fred&aliases[]=bob&aliases[]=dark+prince&things[]=foo"
    => {"aliases"=>["fred", "bob", "dark prince"], "things"=>["foo"]}
One use case might be a set of checkboxes which indicate which folders a user has access to. Just create all your checkboxes with the name accessible_folder_ids[] and with the ids of the folders as values, for example:
<% @folders.each do |f| %> <%= check_box_tag 'user[accessible_folder_ids][]', f.id %> <%= h f.name %> <br/> <% end %>
Stick this in a form and when you submit it, params[:user][:accessible_folder_ids] will be an array containing the ids of the folders the user can play with.
Nested parameters for dummies
We can nest hashes if we want. For example perhaps a user has an associated model such as an address:
    parse "user[name]=fred&user[address][town]=cambridge&user[address][line1]=4+Station+road"
    => {"user"=>{"name"=>"fred", "address"=>{"line1"=>"4 Station road", "town"=>"cambridge"}}}
A member of a hash can of course be an array. To do this you just append [] to the parameter name, as with a top level array parameter:
    parse "user[aliases][]=fred&user[aliases][]=bob&user[aliases][]=dark+prince"
    => {"user"=>{"aliases"=>["fred", "bob", "dark prince"]}}
The array can be several levels down: we can improve a previous example of a user with an address by saying that the address should have a lines array containing the lines from the address. Start with user[address][lines] and then just append [] to indicate the parameter should be an array:
    parse "user[name]=fred&user[address][town]=cambridge&user[address][lines][]=Random+house&user[address][lines][]=4+Station+road&user[address][lines][]=By+the+station"
    => {"user"=>{
         "name"=>"fred",
         "address"=>{
           "lines"=>["Random house", "4 Station road", "By the station"],
           "town"=>"cambridge"
         }}}
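If you need to generate such parameter names yourself, it can help to see the convention run in the other direction. The following sketch turns a nested hash into a query string using only the ruby standard library; query_string and to_pairs are names invented for this example, and parameter names are left unescaped to keep the output readable.

```ruby
require 'uri'

# Build "name[key][]=value" pairs from nested hashes and arrays.
# to_pairs and query_string are hypothetical helpers, not Rails methods.
def to_pairs(value, prefix)
  case value
  when Hash
    value.flat_map { |key, nested| to_pairs(nested, "#{prefix}[#{key}]") }
  when Array
    value.flat_map { |item| to_pairs(item, "#{prefix}[]") }
  else
    ["#{prefix}=#{URI.encode_www_form_component(value.to_s)}"]
  end
end

def query_string(params)
  params.flat_map { |key, value| to_pairs(value, key.to_s) }.join('&')
end

query_string(
  'user' => {
    'name' => 'fred',
    'address' => { 'town' => 'cambridge', 'lines' => ['Random house', '4 Station road'] }
  }
)
#=> "user[name]=fred&user[address][town]=cambridge&user[address][lines][]=Random+house&user[address][lines][]=4+Station+road"
```

Each hash level adds a [key] to the name and each array level adds [], which is exactly the structure the parser undoes on the other side.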
There’s another case in which nesting like this can be useful. Suppose for example that we have a list of users and we want the user to be able to change as many of them as they want with one edit operation. If we use the id of the records as the first key then this is easy:
    parse "users[1][name]=fred&users[1][email]=fred@example.com&users[2][name]=bob&users[2][email]=bob@example.com"
    => {"users"=>{
         "1"=>{"name"=>"fred", "email"=>"fred@example.com"},
         "2"=>{"name"=>"bob", "email"=>"bob@example.com"}
       }}
The corresponding controller code is similarly simple
    params[:users].each do |id, new_attributes|
      User.find(id).update_attributes new_attributes
    end
(Handling errors and invalid entries when editing multiple elements is an interesting problem in itself and is left as an exercise to the reader). If you are using the Rails form helpers, the :index option does precisely this:
    helper.text_field "person", "name"
    => '<input id="person_name" name="person[name]" size="30" type="text" />'
    helper.text_field "person", "name", "index" => 1
    => '<input id="person_1_name" name="person[1][name]" size="30" type="text" />'
Editing multiple objects
An interesting case is a form allowing us to create several models, for example to add several users to a mailing list. To do this we use parameters of the form users[][name]. This says that users is an array and we’re pushing a hash with key name onto it. So for example
    parse "users[][name]=fred&users[][email]=fred@example.com"
    => {"users"=>[{"name"=>"fred", "email"=>"fred@example.com"}]}
So how does rails know when one record is finished and the next starts? Simple: if we’ve already had a users[][name] parameter and we see a new one then we can usually assume that the next one must belong to a new user
    parse "users[][name]=fred&users[][email]=fred@example.com&users[][name]=bob&users[][email]=bob@example.com"
    => {"users"=>[{"name"=>"fred", "email"=>"fred@example.com"},
                  {"name"=>"bob", "email"=>"bob@example.com"}]}
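The "start a new record when a key repeats" rule is simple enough to sketch on its own. The code below is an illustration of the idea only, not the Rails parser: group_records is an invented name and it takes the key/value pairs that would follow users[] in order of appearance.

```ruby
# Group ordered key/value pairs into records, starting a new record
# whenever a key has already been seen in the current one.
# group_records is a hypothetical helper illustrating the rule.
def group_records(pairs)
  records = []
  pairs.each do |key, value|
    records << {} if records.empty? || records.last.key?(key)
    records.last[key] = value
  end
  records
end

group_records([
  ['name', 'fred'], ['email', 'fred@example.com'],
  ['name', 'bob'],  ['email', 'bob@example.com']
])
#=> [{"name"=>"fred", "email"=>"fred@example.com"},
#    {"name"=>"bob", "email"=>"bob@example.com"}]

# A repeated key splits a record even when that was not the intent:
group_records([['admin', '0'], ['admin', '1']])
#=> [{"admin"=>"0"}, {"admin"=>"1"}]
```

The second call shows why anything that deliberately submits the same parameter name twice within one record will confuse this scheme.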
To continue with a previous example, a user might have several addresses:
    parse "user[name]=fred&user[addresses][][line]=24+bob+street&user[addresses][][town]=cambridge&user[addresses][][line]=1+market+square&user[addresses][][town]=bedford"
    => {"user"=>{"name"=>"fred", "addresses"=>[
         {"line"=>"24 bob street", "town"=>"cambridge"},
         {"line"=>"1 market square", "town"=>"bedford"}]
       }}
Turtles all the way down
Hashes can be nested as much as you want:
    parse "users[a][b][c][d]=fred"
    => {"users"=>{"a"=>{"b"=>{"c"=>{"d"=>"fred"}}}}}
Arrays can’t be nested: you can’t for example have a parameter which is an array of arrays. You can have a hash with an array parameter or an array of hashes but in general there can only be one level of ‘arrayness’. It’s easy enough to see why this is: arrays are built by repeating the same parameter name multiple times, however if you are inside an array then parameter name repetition is precisely what rails uses to determine whether it should move on to the next array element[1].
To an extent this can be sidestepped, for example instead of an array of users you can have a hash of users keyed by id. If your data structure is genuinely just an array you can always make it into a hash that is keyed by array index. For example if we had a form displaying a list of users and each user has a name and a list of aliases then the following query string parses into what we want:
    parse "users[1][name]=fred&users[1][aliases][]=joker&users[1][aliases][]=the+bat&users[2][name]=bob&users[2][aliases][]=bobbo&users[2][aliases][]=bobster"
    => {"users"=>{
         "1"=>{"name"=>"fred", "aliases"=>["joker", "the bat"]},
         "2"=>{"name"=>"bob", "aliases"=>["bobbo", "bobster"]}
       }}
Where it can go wrong
Consider the following query string:
user[aliases][]=fred&user[aliases][name]=fred
This can’t work: user[aliases][] indicates that user has an aliases attribute that’s an array, but user[aliases][name] indicates that user has an aliases attribute that’s a hash. There are other ways in which this can happen, but the result is the same:
TypeError: Conflicting types for parameter containers
[1] This is also the reason for the problem Xavier mentions in his comment. Checkboxes submit no value if they are not checked, however it’s rather convenient to create the illusion that they send (for example) 1 if the box is checked, 0 if not.
The rails check_box helper does this by adding a hidden field with the same parameter name. If the checkbox submits nothing then the hidden field “wins” and submits 0, if the checkbox is checked then it wins because it’s the first parameter. However since rails uses parameter name repetition to distinguish between elements of an array the hidden parameter causes rails to start a new array element. I don’t know of a good workaround other than using a hash instead of an array or using check_box_tag instead of check_box.
The difference between :include and :joins
June 22nd, 2008
I think some people occasionally mix up :include and :joins (or possibly don’t know of the existence of :joins).
:include is for loading associations. Before Rails 2.1 it will always left outer join the appropriate tables, starting with rails 2.1 it will either load them with a separate query or it will join the appropriate tables.
:joins is for joining (duh). It either takes an sql fragment (eg “INNER JOIN foos on foos.id = foo_id”) or association names in a variety of ways and joins the relevant tables. For example:
    :joins => :products
    :joins => [:products, :customers]
    :joins => [:products, {:customers => [:friends, :foes]}]
    :joins => [:products, {:customers => {:friends => :parents}}]
    :joins => {:order => {:customer => {:address => :some_other_association}}}
and so on. You can nest these as deeply as you want (although bear in mind that if you end up requiring a 23 way join you may want to rethink your strategy). The same notation for nested associations is used by :include. The one difference is that :joins creates inner joins for you (if you desperately need those outer joins, you can always use the string form of :joins, but you will have to write the sql fragment explicitly).
If what you want is to use attributes from the joined tables for sorting or in your conditions then both :include and :joins will work since they both cause the relevant tables to be joined. But :include then does a lot more work massaging the results that come back from the database, instantiating lots of activerecord objects, gluing together all the relationships in the appropriate manner. If all you wanted was to order or filter results based on some of the joined attributes then this work is wasted.
Just to show that I’m not making things up, I performed the following unscientific test:
    Benchmark.bm(7) do |x|
      x.report("include:") do
        Customer.find :all, :include => :customer_detail,
          :conditions => "customer_details.date_of_birth >= '1980-01-1'"
      end
      x.report("joins:") do
        Customer.find :all, :joins => :customer_detail,
          :conditions => "customer_details.date_of_birth >= '1980-01-1'"
      end
    end
Some customers have provided extra details which we store in a separate table, and we want to get all customers born after 1980.
The results:
| | user | system | total | real |
|---|---|---|---|---|
| include | 1.310000 | 0.040000 | 1.350000 | ( 1.532910) |
| joins | 0.350000 | 0.040000 | 0.390000 | ( 0.454001) |
In short, don’t use :include unless you will actually be accessing those associations and want to avoid the hit caused by loading them from the database one by one.
Berlin, here we come!
June 10th, 2008
Hot off the presses: the Railsconf Europe proposal submitted by my colleague Paul Butcher and myself has been accepted! Join us in Berlin to hear about the nifty stuff we've been up to. I can't wait!
Dealing with concurrency
June 8th, 2008
Sometimes you don’t want a user doing two things at once (or two users doing something to the same third party at once). Dealing with this sort of issue is quite fiddly: such problems are easy to overlook and hard to track down when they happen, as they tend to be heavily timing dependent[4]. There are a few tricks in the rails toolbox for dealing with them.
There’s a related issue that this post is not about: suppose two users bring up the edit form for some object. They each make unrelated changes to the form and save, one after the other. The second user will squash the changes made by the first user. Oops. I’m not concerned about that, only with what happens if the two requests are actually concurrent, whereas here the two save requests could be minutes apart.
One way of solving this problem is for editable objects to have an edited_by association and disallowing edits if someone else is in the process of editing. Another way is to get clever about how you apply changes to attributes. While not directly relevant, the things outlined here may still be useful, for example if you go down the route of only allowing the user named in the edited_by association to edit the record then you need to be able to reliably set that field even if two users click the edit button at the same time.
And now, our feature presentation
When I was at school the teacher used to dish out ‘bon points’ when you did something good. You could later trade those in for stickers and stuff. In a world very far removed from French primary school classrooms of 1990, the teacher might have written a webapp allowing you to do this (probably complete with an obnoxious facebook app where you could show everyone how many magic stars you have).
Optimism
The people table has a magic_stars column, and we want people to be able to spend their magic stars on various perks. If someone has been a good boy, you might want to use the credit method:
def credit
  self.magic_stars += 1
  save!
end
You need to be a little bit careful, as there’s a race condition here: suppose two people try to give someone a magic star at the same time
| Person 1 | Person 2 |
|---|---|
| loads child (3 stars) | loads child (3 stars) |
| adds 1 | adds 1 |
| saves (4 stars) | |
| | saves (4 stars) |
The 2 requests execute in separate mongrels (or mod_rails listeners etc…) and are completely oblivious to each other and will happily overwrite the change made by the other one. Rails’ optimistic locking will help us here: Just add an integer lock_version column to the model (don’t forget to make it default to 0) and the second ‘bad’ save will raise ActiveRecord::StaleObjectError. We can rewrite our credit method like this:
def credit
  self.magic_stars += 1
  save!
rescue ActiveRecord::StaleObjectError
  reload
  retry
end
So how does optimistic locking work? The key (unsurprisingly) is in the lock_version column. Assume that we loaded a person object with a lock_version of 4 and we now want to save it. We increment our lock version to 5. If no one else has touched this object then in the database it will still have a lock_version of 4.
Rails appends “WHERE lock_version = 4” to the update query, and then examines the number of rows that were updated (the database driver will return this). If we get 1 then we know everything went ok. If 0 rows were updated then we know that it must have been because someone touched that row and so we raise StaleObjectError.
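To make the mechanism concrete, here is a minimal sketch in plain ruby, with a hash standing in for the people table. FakePerson, FakeStaleObjectError and TABLE are made-up names for illustration only; this is not ActiveRecord's actual code, just the same idea of "update only if the version I loaded is still there".

```ruby
# Hypothetical stand-ins: a one-row "table" and an error class playing the
# role of ActiveRecord::StaleObjectError. Not ActiveRecord's implementation.
class FakeStaleObjectError < StandardError; end

TABLE = { 1 => { :magic_stars => 3, :lock_version => 4 } }

class FakePerson
  attr_accessor :magic_stars
  attr_reader :id, :lock_version

  def self.find(id)
    row = TABLE.fetch(id)
    new(id, row[:magic_stars], row[:lock_version])
  end

  def initialize(id, magic_stars, lock_version)
    @id = id
    @magic_stars = magic_stars
    @lock_version = lock_version
  end

  # Emulates: UPDATE ... WHERE id = ? AND lock_version = <version we loaded>
  def save!
    row = TABLE.fetch(id)
    rows_updated = row[:lock_version] == lock_version ? 1 : 0
    raise FakeStaleObjectError, "row #{id} was changed by someone else" if rows_updated.zero?
    row[:magic_stars] = magic_stars
    row[:lock_version] = lock_version + 1
    @lock_version += 1
    true
  end

  def reload
    row = TABLE.fetch(id)
    @magic_stars = row[:magic_stars]
    @lock_version = row[:lock_version]
    self
  end
end

first  = FakePerson.find(1)
second = FakePerson.find(1)   # both loaded with 3 stars at lock_version 4

first.magic_stars += 1
first.save!                   # one row updated, the row is now at lock_version 5

conflicts = 0
begin
  second.magic_stars += 1
  second.save!                # first attempt matches no rows and raises
rescue FakeStaleObjectError
  conflicts += 1
  second.reload               # pick up the 4 stars saved by the first request
  retry
end
```

After this runs the row holds 5 stars rather than 4: the second save hit one conflict, reloaded and tried again, which is the same reload and retry dance as the credit method above.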
Optimistic locking is nice because we’re not sitting on a database lock at any point, so we won’t hold up anyone else.
Optimism only gets you so far
Let’s move on to another function: allowing children to swap their magic stars for some sort of perk. We need to check that they’ve got enough stars and, if they do, debit the appropriate amount and give them their perk. This is one of those classic times where you want a transaction: you don’t want a classroom full of 6 year olds screaming because you took their stars but didn’t give them their reward. You might write
def buy treat
  transaction do
    if magic_stars >= treat.cost
      self.magic_stars -= treat.cost
      self.treats << treat
      save!
    end
  end
end
[1]If something bad happens halfway through then the transaction will roll back all the changes together.
There’s a big problem with this code. Suppose the child tries to buy two things in very quick succession. They’ve got 10 stars to begin with and each item costs 5. If the timing is right, then things will look like this:
| Connection 1 | Connection 2 |
|---|---|
| start transaction | |
| check number of stars | start transaction |
| set number of stars to 5 | check number of stars |
| add the treat | set number of stars to 5 |
| save | add the treat |
| | save |
| commit the transaction | commit the transaction |
The 2 connections walk all over each other. In this case the children will be happy: both connections will set the remaining number of stars to 5, but both items will have been bought!
So why didn’t our optimistic locking help? Because of transaction isolation: in most databases by default your transaction won’t see changes made by other, uncommitted transactions. In fact you won’t see any changes made after your transaction was started. So when rails tries to save, as far as it can tell the row hasn’t changed and the save raises no errors.
Pessimism
The name optimistic locking suggests that there is an alternative, and there is: ask the database to get an actual lock for you. You can do this in two ways in rails: you can pass :lock => true to a finder (instead of passing true you can pass an sql fragment if you want to change the type of lock acquired) or you can call the lock! method (which is effectively a reload with :lock => true). It’s important to note that a lock is held as long as the current transaction. In particular if you aren’t inside a transaction then nothing happens, or in other words
def do_something_with_lock
  lock!
  do_something
end
doesn’t accomplish anything at all. Inside a transaction however, you’ll get an exclusive lock and no other transaction will be able to get a lock for that row. We can rewrite our buy method to look like
def buy treat
  transaction do
    lock!
    ...
  end
end
The lock stops another connection from fiddling with that person at the same time. It’s just a row lock, so it won’t stop you editing a different person at the same time[2]. If someone called our credit method from above at the same time that would be ok, as updating the customer row to change the amount of stars they have also requires an exclusive lock, so we’re in no danger here.
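If you want to convince yourself of the difference, here is a rough simulation in plain ruby with no database involved. Child, racy_buys and locked_buys are invented for this sketch, and a Mutex is only a stand-in for the row lock the database would hold for you.

```ruby
# In-memory stand-in for a row in the people table.
Child = Struct.new(:magic_stars, :treats)

# Two isolated transactions: each one checks the balance it saw when it
# started, so both see 10 stars and both write back 10 - cost.
def racy_buys(child, cost)
  seen_by_first  = child.magic_stars
  seen_by_second = child.magic_stars   # read before the first one commits
  [seen_by_first, seen_by_second].each do |seen|
    next unless seen >= cost
    child.magic_stars = seen - cost
    child.treats += 1
  end
  child
end

# With a lock, checking and debiting happen as one uninterrupted step.
def locked_buys(child, cost, attempts)
  row_lock = Mutex.new                 # plays the part of the database row lock
  threads = Array.new(attempts) do
    Thread.new do
      row_lock.synchronize do
        if child.magic_stars >= cost
          child.magic_stars -= cost
          child.treats += 1
        end
      end
    end
  end
  threads.each(&:join)
  child
end

racy   = racy_buys(Child.new(10, 0), 5)       # 5 stars left, but 2 treats bought
locked = locked_buys(Child.new(10, 0), 5, 3)  # 0 stars left, 2 treats, third attempt refused
```

The racy version hands out two treats for the price of one. The locked version charges for both and turns down the third attempt, however the threads happen to be scheduled.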
We can use these locks in more general ways, even if we don’t want to actually lock the customer row. For example if you have a constraint like a person may only borrow a certain number of books then you pretty much have to do it in ruby which renders you vulnerable to the various race conditions mentioned before.
You’re not modifying any rows (just adding one to a join table), so optimistic locking is no good. You can however lock the person row[3]. You’re not actually changing it at all, just using the database as a synchronisation method. We can repackage this up as something we can use all over our app:
def exclusive
  transaction do
    lock!
    yield
  end
end
Then in your app you can do things like
some_customer.exclusive do
  #some task
end
It should be an easy exercise for the reader to put this in a form where any ActiveRecord model has this exclusive method.
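One possible shape for that exercise is a module you mix in. ExclusiveLock is a name I made up, and FakeRecord below is only a stand-in so the sketch runs without a database; a real model already provides transaction and lock!.

```ruby
# Hypothetical module name. It relies only on the including class
# responding to transaction and lock!, as ActiveRecord models do.
module ExclusiveLock
  def exclusive
    transaction do
      lock!   # the row lock is held until the transaction ends
      yield
    end
  end
end

# Stand-in for a model: it just records the order in which things happen.
class FakeRecord
  include ExclusiveLock
  attr_reader :events

  def initialize
    @events = []
  end

  def transaction
    @events << :begin
    result = yield
    @events << :commit
    result
  end

  def lock!
    @events << :lock
    self
  end
end

record = FakeRecord.new
record.exclusive { record.events << :work }
record.events   # [:begin, :lock, :work, :commit]
```

In a real app something like ActiveRecord::Base.send(:include, ExclusiveLock) in an initializer should make exclusive available on every model, though I have only sketched it here rather than tested it against a database.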
[1]People sometimes ask about the difference between Person.transaction, the instance method transaction and ActiveRecord::Base.transaction. There is no difference between the first two – the instance method just calls self.class.transaction. The transaction methods on Person, ActiveRecord::Base or some other model class only differ if the models use a different database connection. Most of the time, the three can be used interchangeably.
[2]If you use a database without row locks (ie mysql with myisam tables) you’ll probably get something horrible like a table lock.
[3]Do be careful about deadlocks though.
[4]The sleep function can be rather helpful when you’re trying to force particular bits of code to overlap with each other in particular ways.
Squeeze your pipes
May 29th, 2008
It’s very easy to merrily write a web application without realising that all those images and ajax calls you’re using make it rather sluggish when you’re not just connecting to localhost. Even when you’ve uploaded it to your production or test servers chances are you’ll have pretty good bandwidth and latency between you and those servers, so it can be hard to see just what it will be like for an end user who is less well endowed in the broadband department (or even still on dialup).
It’s not just about size
Very often it’s not just about how many bytes a second you could be transferring: the amount of latency is also a critical component of what the app feels like to the end user. Luckily we can simulate both.
The piece of kit we need is a traffic shaper, a piece of software that sits in between you and the end server and modulates the flow of packets. If you’re using linux or Mac OS X you’ve probably already got everything you need. I’m a mac nerd so I’ll concentrate on that.
Luckily it’s pretty damn easy on the mac as ipfw has all the necessary bits (from 10.4 upwards). The first thing we need to set up is a pipe. To quote the man page, “A pipe emulates a link with given bandwidth, propagation delay, queue size and packet loss rate”.
ipfw pipe 1 config bw 300Kbit/s
ipfw pipe 2 config bw 500Kbit/s delay 100
This creates 2 pipes: one with a bandwidth of 300Kbit/s, and the other with a bandwidth of 500Kbit/s and a propagation delay of 100ms (ie when a packet goes into that pipe it won’t come out the other end for 100ms)
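To get a feel for what those numbers mean, here is a back of the envelope estimate in ruby. The helper name and the 150 KB page size are made up for illustration, and I am treating 1 Kbit as 1000 bits. It ignores TCP handshakes, slow start and queueing, so take it as a lower bound rather than a prediction.

```ruby
# Rough time in seconds for a payload to pass through an emulated pipe:
# time to push the bits through at the given bandwidth, plus the one-way delay.
def pipe_transfer_seconds(bytes, bandwidth_kbit_per_s, delay_ms = 0)
  serialization = (bytes * 8) / (bandwidth_kbit_per_s * 1000.0)
  serialization + delay_ms / 1000.0
end

page_bytes = 150 * 1024   # a hypothetical 150 KB page

through_pipe_1 = pipe_transfer_seconds(page_bytes, 300)        # about 4.1 seconds
through_pipe_2 = pipe_transfer_seconds(page_bytes, 500, 100)   # about 2.56 seconds
```

The delay looks negligible here, but it is paid on every round trip, so a page made of many small requests feels it far more than this single transfer suggests.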
Having set up the pipes, you then need to add firewall rules that send traffic through them. For example if you’ve got your mongrel running on port 3000 you can run
sudo ipfw add pipe 2 tcp from any to any src-port 3000
sudo ipfw add pipe 2 tcp from any to any dst-port 3000
to send traffic to/from port 3000 through pipe number 2. When you’re done with your testing or if you mess up, run
sudo ipfw show
to show all the firewall rules you’ve created. The output will look a little like
00300 72 18982 pipe 2 tcp from any 4500 to any
00400 24 1896 pipe 2 tcp from any to any dst-port 4500
65535 58931 55276029 allow ip from any to any
The first column is the rule number (so 300 and 400 in my case). To remove the rules just run
sudo ipfw delete rule-number
This isn’t the sort of thing I’d even begin to worry about at the beginning of the project, but worth keeping at the back of your mind.
Reload me, Reload me not
May 28th, 2008
Rails’ development[1] mode automatic reloading is pretty nifty. It would really suck having to restart the server every time you made a change, in terms of the seconds wasted each time, the manual pressing of buttons you’d have to do and the 5 minutes wasted here and there when you forgot to restart. There’s no point reloading rails itself though, so all that stuff stays around. No point reloading plugins either really. I should probably point out that reloading is a misnomer: it implies things are actively loaded (which they aren’t). What actually happens is sort of that Rails forgets that it has seen Customer and loaded customer.rb, so when it hits Customer it goes off down the const_missing chain and loads your file again.
Reloading plugins could be useful if you were working on a plugin or trying to diagnose a problem with one, but maybe that’s an edge case. That’s until your plugin starts to contain something with a long-lived reference to one of your application’s classes (or an instance thereof). A classic example is model classes with associations going over the plugin/app boundary. A quick trip to the console shows the problem:
I’ve got a teeny tiny app with 2 models: messages and conversations (think forums): conversations have many messages.
>> c = Conversation.find :first
=> #<Conversation id: 1, title: "hello world", created_at: "2008-05-28 09:03:58", updated_at: "2008-05-28 09:03:58">
>> c.messages
=> []
>> reload!
Reloading...
=> true
>> c.messages
NoMethodError: undefined method `messages' for #<Conversation:0x1849ccc>
reload! makes Rails do its class reloading (a useful trick in itself if you’re fiddling around at the console and want to see some changes). But what happened? We had a perfectly good instance of conversation and then it got trashed! Let’s take a closer look:
>> first_class = Conversation
=> Conversation(id: integer, title: string, created_at: datetime, updated_at: datetime)
>> first_class.object_id
=> 13302014
>> first_class.instance_methods - ActiveRecord::Base.instance_methods
=> ["message_ids", "message_ids=", ... ]
>> reload!
Reloading...
=> true
>> first_class.instance_methods - ActiveRecord::Base.instance_methods
=> []
>> Conversation.object_id
=> 13772244
So we stash away the conversation class. It’s got an object id, the methods we would expect, all good. Then we reload! and all the methods are gone. If you ask for Conversation again you get a different class! This is how class reloading of ActiveRecord classes works: the old class is gutted and the constant removed, but we can’t stop people hanging onto references to the old class [3](incidentally if you ever get an incomprehensible message about methods not existing when you can see the method definitions right in front of you then you’ve probably run into a variant of this).
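You can reproduce the "same name, different class" half of this in plain ruby, without Rails. The sketch below only removes the constant and defines it again; it does not gut the old class the way Rails does, and the messages method is just a placeholder.

```ruby
# Define a throwaway class, roughly what loading conversation.rb would do.
Object.const_set(:Conversation, Class.new { def messages; []; end })

first_class = Conversation
stale = first_class.new          # an instance created before the "reload"

# Forget the constant, then define it afresh.
Object.send(:remove_const, :Conversation)
Object.const_set(:Conversation, Class.new { def messages; []; end })

same_class   = first_class.equal?(Conversation)   # false: two distinct class objects
still_a_conv = stale.is_a?(Conversation)          # false: stale belongs to the old class
old_name     = first_class.name                   # "Conversation", the old class keeps its name
```

Both classes print as Conversation, which is exactly why the resulting error messages look so nonsensical.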
So if you’ve got a plugin holding onto a reference to some class, this is what is likely to happen to it: nonsensical messages about methods not existing when they really should. Maddeningly, the first time you load a page it will load fine, but refresh it and it’s gone! Your tests will pass too, and if you’re even half sane then your app dying even though the tests pass will give you a bad feeling. If you’re lucky the only bad thing will be that changes won’t be noticed until a restart, but even that’s quite annoying (especially if you didn’t expect it – application classes are reloaded after all!).
The way out
The obvious way out is to reload the plugin as well[4]. The set of things that are loaded only once is controlled by Dependencies.load_once_paths [5](except for rails itself, that’s special) and by default plugin lib directories are added to it. As long as you load via the rails dependency mechanisms, any constant loaded from a path not in that array will be reloaded after a request. This is why a ruby style require of application classes is usually a bad thing: it can stop rails from reloading some of your classes which leads to the problems seen above.
In my dummy app I’ve got
>> Dependencies.load_once_paths
=> ["/Users/fred/empty_app/vendor/plugins/dummy_plugin/lib"]
There’s an easy way out: in the plugin’s init.rb just stick [2]
Dependencies.load_once_paths.delete(File.expand_path(File.dirname(__FILE__))+'/lib')
If we boot up our sample app again:
>> Dependencies.load_once_paths
=> []
Job done!
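One thing to watch is that the delete is a plain string comparison on an array. A small sketch with a made-up path (the array here is an ordinary ruby array standing in for Dependencies.load_once_paths):

```ruby
# Stand-in for Dependencies.load_once_paths, using a hypothetical plugin path.
load_once_paths = [File.expand_path('/app/vendor/plugins/dummy_plugin/lib')]

# Same directory on disk, but spelled differently.
sloppy = '/app/vendor/plugins/dummy_plugin/../dummy_plugin/lib'

removed_sloppy = load_once_paths.delete(sloppy)                    # nil, nothing matched
removed_exact  = load_once_paths.delete(File.expand_path(sloppy))  # the normalised string matches

load_once_paths.empty?   # true once the exact string has been deleted
```

Array#delete returns nil when nothing matched, so checking its return value in init.rb is a cheap way to notice that your path is not spelled the way rails spelled it.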
[1]None of this applies if config.cache_classes is true (for example in production mode).
[2]If you load plugins from non standard locations, you may have to fiddle with that: the string you delete must exactly match what rails put in Dependencies.load_once_paths; that it corresponds to the same location on disk is not enough.
[3]This does mean I was lying slightly when I said this could be handy if you’re working on a plugin: if what your plugin is doing is extending some rails base class with a module, then this won’t help. Rails won’t have stripped the methods from the module (so nothing will break), but ActionController (for example) will still have the old module included in it and so won’t see the changes.
[4]Another lie. Aren’t I naughty? The classes from the plugin’s lib folder will be reloaded as needed but the plugin’s init.rb won’t be run again (which in a way is the reason for the previous caveat: we never include the new module into ActionController. Rerunning init.rb would be messy though, apart from anything else it wouldn’t unwind the changes we made, so if we removed some methods from our module they would still be in ActionController).
[5]Dependencies is moving to the ActiveSupport namespace, so replace every occurrence of Dependencies with ActiveSupport::Dependencies if you are running > 2.1.
:with or :without you: link_to_remote's mysterious parameter
May 17th, 2008
One of the nice things about Rails is that it makes it really easy to get off the ground with Ajax, thanks to the link_to_remote helper (and all the other functions that share the same options like remote_function, form_remote_tag, observe_field etc... [1]). The hard work is actually done by prototype but now is not the time to worry about that.
The question that comes up over and over again is how do you add extra parameters, and this is where the :with option comes into play. This is all in the docs, but it's a little on the terse side. First things first though: if the parameters are known to you at the time when you call link_to_remote there's no need to use :with. Just pretend it's a link_to and do
link_to_remote 'Click', :url => {:action => 'foo', :some_param => 42, :something_else => 'hello world'}
That of course is the boring case. The interesting case happens when you want to submit some javascript variables or some input elements. Take a step back and look at what remote_function actually generates. It looks a little like this
new Ajax.Request('/dummy/foo', {asynchronous:true, evalScripts:true, parameters:'authenticity_token=' + encodeURIComponent('snip')})
The interesting thing here is the parameters option. It either takes a javascript object or a query string. By default all you'll get here is the authenticity token (part of Rails' CSRF protection), and if you've turned that off you'll get nothing. Everything you stick here gets passed as a parameter to your action, and in a nutshell rails sticks the value of your :with option here.
There's another easy case before we get stuck in. If you just want to submit the contents of the form then the :submit option is your best friend. Just pass it the id of a form, and rails will set the parameters to Form.serialize('some_form') and all the inputs from that form will get sent over with the ajax request.
:with => 'fancy pants'
In the most general case, you can pass whatever you want to :with as long as it evaluates to a valid query string (or if you don't have forgery protection turned on any javascript object will do [2]). Object.toQueryString is your friend here, it will turn any javascript object into a query string and takes care of escaping anything. Your other important friend is Form.Element.serialize. Given a form element or id it chucks back an appropriate piece of query string. It's even available as a method on extended elements, so instead of Form.Element.serialize('some_field') you can write $('some_field').serialize().
So if you've got a form element with id message you can create an ajax link that submits it with
link_to_remote 'Click me', :url => {:action => 'foo'}, :with =>"$('message').serialize()"
If you had 2 elements you can just string them together (remember that all we're doing here is building up a query string):
link_to_remote 'Click me', :url => {:action => 'foo'}, :with =>"$('message').serialize() + '&' + $('comment').serialize()"
Personally I hate all that messing around concatenating strings and ampersands, so I would usually just write
link_to_remote 'Click me', :url => {:action => 'foo'}, :with =>"$H({message: $F('message'), comment: $F('comment'), fromUser:prompt('Tell me something')}).toQueryString()"
What we've done here is build up a prototype hash (that's the $H({...}) bit) and then called toQueryString on it, which does exactly what it says. $F is another prototype helper that, given an id, returns the value of the associated input element. The last part shows that we can do anything we want really. Here I'm asking the user to provide some text, but it could be some pre-existing javascript variable or anything you want.
If you decide not to use the various prototype serialisation helpers then you do need to be a little careful: it's up to you to ensure that what needs to be escaped is escaped. A handy function here is encodeURIComponent:
link_to_remote 'Click me', :url => {:action => 'foo'}, :with =>"'thing='+encodeURIComponent('I am a nasty & funky string')"
This ensures the ampersand is encoded rather than messing up everything. You can of course combine that with some of the other tricks:
link_to_remote 'Click me', :url => {:action => 'foo'}, :with =>"'thing='+encodeURIComponent(someFunctionReturningAString())"
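If you want to see what the server makes of an unescaped ampersand, you can play with the same idea from ruby using the standard uri library. This is only an analogue of what the browser and prototype do, not the code path rails uses to parse parameters, and the variable names are mine.

```ruby
require 'uri'

nasty = 'I am a nasty & funky string'

# Naive concatenation: the raw ampersand is read as a parameter separator,
# so one parameter turns into two.
mangled = URI.decode_www_form('thing=' + nasty)

# Escaping the value first keeps it in one piece.
escaped = 'thing=' + URI.encode_www_form_component(nasty)
intact  = URI.decode_www_form(escaped)

# Building the whole string from a hash, in the spirit of $H({...}).toQueryString()
query = URI.encode_www_form('message' => 'hello world', 'comment' => 'a & b')
```

Here mangled contains two pairs, with the value of thing cut off at the ampersand, while intact has a single thing parameter carrying the whole string. Ruby's helper writes spaces as + where encodeURIComponent would write %20, but both decode to the same thing.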
observe_field is a naughty boy
Everything I've said holds true for observe_field, but observe field allows you to take some short cuts. If what you pass to :with doesn't look like a valid query string or interesting javascript expression then rails assumes you're just specifying what name you want the parameter to be submitted as, so
observe_field 'some_field', :with => 'q'
expands to
observe_field 'some_field', :with => '"q="+value'
This is executed in a context where value is the new value of the form element. Can you spot the problem yet? You need to return a valid query string and value is just the value of the field. By some good fortune you get away with this right until you type an ampersand (it was fixed here, so edge and rails 2.1 are ok). It's not escaped and so your query string is mangled. On versions without the fix you should use something like
observe_field 'some_field', :with => '"q="+encodeURIComponent(value)'
[1]The canonical function is actually remote_function, everyone else just uses its output. For example form_remote_tag is basically just a normal form whose onSubmit is the output of remote_function. As far as the docs go though, all the goodies are under link_to_remote.
[2]Prototype is smart enough to know that if you pass an object as the parameters option it should call Object.toQueryString on it, so for example parameters: {a: 2+2, b: "hello"} would do the right thing. The reason forgery protection makes a difference is that rails assumes that your :with option evaluates to a query string and appends to that authenticity_token=xxx.
When Aptana messes with your gems
May 9th, 2008
Recently a fair few people on rubyonrails-talk have been getting strange error messages about config.time_zone or referring to funny versions of gems (like rails 2.0.2.9216). This rather suggests that they were using edge versions of rails (since config.time_zone is an edge thing, and rails 2.0.2.9216 means 'the edge gem from revision 9216'). These edge gems come from gems.rubyonrails.org. But why were so many people suddenly doing this, all at the same time?
It turns out that in its infinite wisdom Aptana silently adds gems.rubyonrails.org to your gem sources. If you remove gems.rubyonrails.org, Aptana will put it back. Don't ask me why, it's madness as far as I can tell. So you get edge rails gems installed, thus the rails command now refers to edge rails and generates you an environment.rb suitable for edge rails (including the config.time_zone line). The gem version specified in environment.rb is still 2.0.2 though, so your app loads 2.0.2 which doesn't know about the edge features. Failure.
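The version numbers themselves show why the two halves disagree. A quick check with rubygems' own version classes (this illustrates how the versions compare, it is not a trace of what the rails command or environment.rb actually run):

```ruby
require 'rubygems'

edge    = Gem::Version.new('2.0.2.9216')
release = Gem::Version.new('2.0.2')

edge_is_newer = edge > release   # true: the edge gem sorts as the newest one installed

# An exact requirement for 2.0.2 is only met by the release gem.
pinned = Gem::Requirement.new('= 2.0.2')
pinned_accepts_edge    = pinned.satisfied_by?(edge)      # false
pinned_accepts_release = pinned.satisfied_by?(release)   # true
```

So anything that simply picks the highest installed version ends up on edge, while anything pinned to exactly 2.0.2 stays on the release.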
So, to fix this:
- Uninstall all the edge gems (the ones with versions like 2.0.2.9216; the last number may vary)
- Remove gems.rubyonrails.org from your gem sources: gem sources -r http://gems.rubyonrails.org
- (Optional) get rid of Aptana, since it will keep adding gems.rubyonrails.org (or wait for the version that fixes this)
You might also want to recreate your app as the environment.rb that edge rails created for you will not work with rails 2.0.2
Creating multiple associations with the same table
May 6th, 2008
Rails makes setting up your associations dead easy and very concise. There's a whole bunch of options that can be set; however, as long as you are following Rails' conventions you very rarely need to use them. If you're new to rails you might not even know they exist and get away with it 99% of the time. One case where you do need them is when you've got more than one association pointing at the same table.
If you wrote an application tracking sales between individuals (let's call it railsbay) you might have a table called sales and in that table you'd need to track the buyer and the seller. Both buyer and seller should be objects from the users table, and we've got two columns that refer to them: buyer_id and seller_id. Before we tackle what we actually want to do, let's take a step back. This would all be really easy if there was only one user associated with a sale. So let's take a look at what happens inside when we write
class Sale < ActiveRecord::Base
  belongs_to :user
end
When you do this [1], ActiveRecord derives the class name by taking :user and camelizing it to get User, the assumed class name (if this was a has_many, it would also singularize, so Users -> User). If you specify a class_name option, that takes priority. The foreign key (the database column containing the id of the associated user, in this case user_id) is derived by taking the name (user) and adding _id [2]. It could be overridden with :foreign_key. The first parameter (:user) is the name of the association, so in particular it is the name of the method you call to get the value of the association. All this means that we could also have written
class Sale < ActiveRecord::Base
  belongs_to :user, :class_name => 'User', :foreign_key => 'user_id'
end
It's rather verbose, so we don't. If you didn't like your users (that's not really a good idea) you can say
class Sale < ActiveRecord::Base
  belongs_to :nasty_user, :class_name => 'User', :foreign_key => 'user_id'
end
and now you need to access the association as sale.nasty_user instead of sale.user.
We can now return to the original problem. We've got 2 users to associate, a buyer and a seller, so the associations might as well use those names. Our foreign keys are buyer_id and seller_id, and in both cases the association will refer to a User, so we need :class_name => 'User'
class Sale < ActiveRecord::Base
  belongs_to :buyer, :class_name => 'User', :foreign_key => 'buyer_id'
  belongs_to :seller, :class_name => 'User', :foreign_key => 'seller_id'
end
You don't actually need the foreign key in this case (since, from rails 2.0, the default is association name + _id), so this simplifies down to
class Sale < ActiveRecord::Base
  belongs_to :buyer, :class_name => 'User'
  belongs_to :seller, :class_name => 'User'
end
Easy! Now we just need to do the other side of the association. A user will have many sales as a seller and many sales as a buyer, we'll call the associations sales and purchases. Like before, the association name determines the method you call to get/set an association. ActiveRecord assumes the defaults in much the same way, so from
class User < ActiveRecord::Base
  has_many :sales
end
ActiveRecord infers that the class is Sale (overridable by the :class_name option). The foreign key inference is different. Since we're creating associations on the User class, ActiveRecord assumes that database columns pointing at this table are called user_id (underscorize the class name and add _id [3]). So the above is equivalent to
class User < ActiveRecord::Base
  has_many :sales, :class_name => 'Sale', :foreign_key => 'user_id'
end
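The two inference rules can be sketched in a few lines of plain ruby. Everything here (naive_camelize, naive_underscore, belongs_to_defaults, has_many_defaults) is made up for illustration and deliberately naive; the real work is done by ActiveSupport's inflector, which copes with far more cases.

```ruby
# Deliberately naive string helpers, nowhere near as thorough as the inflector.
def naive_camelize(name)
  name.to_s.split('_').map { |part| part.capitalize }.join
end

def naive_underscore(class_name)
  class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# belongs_to: class from the association name, key is association name + _id
def belongs_to_defaults(association)
  { :class_name => naive_camelize(association), :foreign_key => "#{association}_id" }
end

# has_many: class from the singularized association name,
# key from the class the association is declared on
def has_many_defaults(owner_class_name, association)
  singular = association.to_s.sub(/s\z/, '')   # naive singularize
  { :class_name  => naive_camelize(singular),
    :foreign_key => "#{naive_underscore(owner_class_name)}_id" }
end

buyer     = belongs_to_defaults(:buyer)            # class_name "Buyer", foreign_key "buyer_id"
sales     = has_many_defaults('User', :sales)      # class_name "Sale", foreign_key "user_id"
purchases = has_many_defaults('User', :purchases)  # class_name "Purchase", foreign_key "user_id"
```

The guesses for :buyer and :purchases name classes that don't exist in our app (Buyer, Purchase), and the has_many guesses all point at user_id, which is why those associations need the options spelled out.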
Our associations are called sales and purchases, both refer to things of class Sale and the foreign keys are buyer_id and seller_id, so our User class should look like
class User < ActiveRecord::Base
  has_many :purchases, :class_name => 'Sale', :foreign_key => 'buyer_id'
  has_many :sales, :class_name => 'Sale', :foreign_key => 'seller_id'
end
If you want you can drop the :class_name option from the sales association, it's not needed. There we go, you are now an association master! The same ideas apply in the other cases where the defaults aren't right. Always make the association name unique (try and come up with a good name, it will really help the readability of your code), and then set foreign_key and class_name options appropriately. I haven't talked about has_many :through at all, but that's because Josh Susser already has a lovely writeup of that over here.
[1]This is a slight lie. ActiveRecord does very little when you call belongs_to, has_many etc... all these things are computed when they are needed. That distinction isn't really important here.
[2]ActiveRecord used to base this on the :class_name option, ie the default key was (basically) the lowercased class name followed by _id. This changed in rails 2.0
[3]There's a bit more to this in cases where your models are in modules and so on. The gory details are in Inflector#foreign_key
Javascript: not as shit as you thought
May 5th, 2008
For a long time javascript was something I avoided like the plague, usually only ever handling it with the rubber gloves that rails provides. On a few occasions I had written some raw javascript myself and quite frankly I felt dirty afterwards and started hitting the bottle rather hard. I was writing clumsy procedural code, verbose, without unit tests and working round all sorts of quirks in the language. Sometimes it was the only choice but whenever I could avoid it, I did.
This year I've been doing quite a lot of work in javascript, as part of a supercool project at work. However I am not an alcoholic nor a gibbering wreck. So what happened? In a single word, prototype (note that a lot of what I'm saying probably applies to similar libraries like jQuery, dojo etc... The important stuff isn't really in the details).
Getting Prototypical
Now I'd heard of prototype before. It was that thing with the $ thing for getting dom elements by id that you had to include for the whizz-bang scriptaculous magic or for using ajax. Thanks to my blind raging hatred of javascript I'd never looked any closer. What a mistake! There is so much more to prototype, something which I discovered thanks to the Bungee Book which is a lovely piece of work (Seriously, stop reading this and go and buy it. It's great (apart from the index)).
Fun in the Browser
Prototype brings two unrelated (at least at the conceptual level) things together. The first is a whole pile of things that make the browser environment more palatable. The $ function is but the tip of the iceberg here. Some of my favourites include
$$ (and the Selector class that powers it): find all the elements in the document matching a CSS rule. The great thing here is that it knows a whole pile of tricks, even if your browser doesn't, so you can go from the basic $$('p.alert') (all paragraphs with class alert) to things that are a little more fun:
$$('.sources .source:not(.disabled)')
(all elements inside something with class 'sources' that have the class 'source' but not the class 'disabled').
down, next, up etc... Navigating the dom by hand is the biggest pain in the arse since Vlad the Impaler, but these functions make it oh so easy. The up function gives you the parent element, down gives you the first descendant and so on (and of course you can say 'give me the 3rd such element'). That in itself is a modest improvement in comfort. But the killer blow is that you can pass those lovely css selectors and say things like
foo.down('.source:not(.disabled)')
which does exactly what it should: keep going down until you find something that matches that selector. Now think about how you'd do that by hand!
Event handling: prototype smoothes over all the differences between browsers and makes it really easy to have your own custom events which is great for having page elements react to changes without coupling things too tightly
Style manipulation: getStyle (gets computed styles), setStyle, hasClassName, removeClassName, addClassName are just terrific. Before
foo.style.left = '8px'; foo.style.right = '8px'; foo.style.top = '8px';
After
foo.setStyle({left: '8px', right: '8px', top: '8px'})
There's loads more of this sort of stuff; the list above is just what sprang to mind. All along the way, prototype hides the differences between browsers.
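The custom events point from the list is the easiest one to undersell, so here's a minimal sketch of the idea in plain javascript. It is not prototype's implementation: makeHub is a hypothetical stand-in for a page element, the 'source:disabled' event name and the 'feed-3' id are made up, and I'm only assuming the general observe/fire shape that prototype gives elements. What it shows is that the code firing the event knows nothing about who is listening.

```javascript
// Hypothetical stand-in for an element that can observe and fire custom events.
// This is a plain-object sketch of the pattern, not Prototype's own code.
function makeHub() {
  var handlers = {};  // event name -> array of callbacks
  return {
    observe: function (name, callback) {
      (handlers[name] = handlers[name] || []).push(callback);
    },
    fire: function (name, memo) {
      (handlers[name] || []).forEach(function (callback) {
        callback({ eventName: name, memo: memo });
      });
    }
  };
}

var hub = makeHub();
var log = [];

// Two independent listeners; neither knows about the other.
hub.observe('source:disabled', function (event) {
  log.push('greyed out ' + event.memo.id);
});
hub.observe('source:disabled', function (event) {
  log.push('count updated after ' + event.memo.id);
});

// The firing side just announces what happened.
hub.fire('source:disabled', { id: 'feed-3' });
console.log(log.join('; '));
```

Adding a third reaction later means adding another observe call, without touching whatever fires the event.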
Inner Beauty
Great stuff, but it doesn't address the issues of Javascript's builtin classes and the language itself. In a way I think this is where prototype really shines, because after a while I no longer felt that self-loathing. I was actually enjoying writing javascript. Part of it is down to the fact that beneath the horribleness of the 90s there's actually some cool stuff in javascript: closures, functions are objects and so on. The trouble is I'd never noticed because I was so busy vomiting profusely at the rest of it. Prototype made me aware of the nice stuff and tidies up the dog turds (And if you ignore everything I've written but remember that javascript has these functional features hiding inside then that's probably the most important thing).
Enough waffling. What does prototype give you in this department?
- Enumerable. A beast of a module. Mixed into Array by default, it lets you use each, inject, map almost as if you were writing ruby (there's some punctuation clumsiness, but that's not prototype's fault). If that sounds awfully similar to ruby's Enumerable that's because it is.
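To make the closures and functions-are-objects point concrete, here's a small sketch in plain javascript with no prototype involved. The names (makeCounter, shout, twice) are made up for illustration.

```javascript
// A closure: the inner function keeps access to `count`
// even after makeCounter has returned.
function makeCounter() {
  var count = 0;
  return function () {
    count += 1;
    return count;
  };
}

var next = makeCounter();
next();
next();

// Functions are objects: they can carry their own properties.
function shout(text) {
  shout.calls += 1;
  return text.toUpperCase() + '!';
}
shout.calls = 0;

// Functions can be passed as arguments, much like a ruby block.
function twice(fn, value) {
  return fn(fn(value));
}

console.log(next());             // 3
console.log(twice(shout, 'hi')); // HI!!
console.log(shout.calls);        // 2
```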
- Class. Write classes, subclass them = win
- method binding
- Loads of little utilities like Object.isFunction that really should have been there all along
- Your sanity back.
It's really a joy the first time you write javascript using all this. Seriously it brought tears to my eyes.
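For a flavour of the list above, here's a sketch that runs without prototype loaded. It leans on the built-in map, reduce and bind that javascript engines have since grown, which play the same roles as Enumerable's map and inject and prototype's method binding. The isFunction helper is a hypothetical stand-in written in the spirit of Object.isFunction, not prototype's code, and the basket object is invented for the example.

```javascript
var prices = [5, 12, 8];

// map and reduce: built-in cousins of Enumerable's map and inject.
var doubled = prices.map(function (p) { return p * 2; });
var total = prices.reduce(function (sum, p) { return sum + p; }, 0);

// Method binding: without bind, `this` is lost when a method is passed around.
var basket = {
  label: 'basket',
  describe: function (count) {
    return this.label + ' holds ' + count + ' items';
  }
};
var describe = basket.describe.bind(basket);

// Hypothetical helper in the spirit of Object.isFunction.
function isFunction(value) {
  return typeof value === 'function';
}

console.log(doubled.join(', '));       // 10, 24, 16
console.log(total);                    // 25
console.log(describe(prices.length));  // basket holds 3 items
console.log(isFunction(describe));     // true
console.log(isFunction(prices));       // false
```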
Feeling Testy?
The one bit missing from my workflow was unit testing. I started off playing with jsunit and while it worked fine I wasn't really enjoying it. There just seemed to be too much machinery involved, and some things seemed needlessly complicated. For example if you want to run a test suite in the browser you have to open the testRunner page, then click on the choose file button on that page, navigate to your test file and hit run. And because of that you can't just set a firebug breakpoint. I ended up loading the test file itself in the browser and calling the setup, test and teardown functions from the firebug console. Yuck.
I know a lot of this is built for flexibility, but it was just completely overkill for what I needed. It turns out that prototype itself has a handy little unit testing framework that was much closer to what I was looking for. There's even the javascript_test plugin which makes all that fit nicely into your rails app (although it would seem that javascript_test hasn't seen much love - I've ended up ripping out most of its innards and replacing them with the latest stuff from prototype). If you want to run a test suite in your browser, just open the test file! A rake task lets you run them all in one go:
Started tests in Firefox
.......................
Finished in 23 seconds.
116 tests, 545 assertions, 0 failures, 0 errors
Finally, peace. Maintainable, structured, nonverbose javascript code, unit tested and running off a CI server. Who would have expected that!
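If you're wondering what the setup, test and teardown business boils down to, here's a deliberately tiny sketch. This is not prototype's unit testing framework or its API: runTests, the test names and the assertEqual helper are all hypothetical, and a real runner does far more (reporting, error handling, talking to the browser). It just shows the shape: find the functions whose names start with test, wrap each one in setup and teardown, and count what happened.

```javascript
// Hypothetical mini runner, only meant to show the general shape.
function runTests(testCase) {
  var results = { tests: 0, assertions: 0, failures: 0 };
  var context = {
    assertEqual: function (expected, actual) {
      results.assertions += 1;
      if (expected !== actual) {
        throw new Error('expected ' + expected + ' but got ' + actual);
      }
    }
  };
  Object.keys(testCase).forEach(function (name) {
    if (name.indexOf('test') !== 0) { return; }  // skip setup and teardown
    results.tests += 1;
    try {
      if (testCase.setup) { testCase.setup.call(context); }
      testCase[name].call(context);
    } catch (e) {
      results.failures += 1;
    } finally {
      if (testCase.teardown) { testCase.teardown.call(context); }
    }
  });
  return results;
}

var summary = runTests({
  setup: function () { this.list = [1, 2, 3]; },
  testLength: function () { this.assertEqual(3, this.list.length); },
  testSum: function () {
    var sum = this.list.reduce(function (a, b) { return a + b; }, 0);
    this.assertEqual(7, sum);  // deliberately wrong, to show a failure
  }
});

console.log(summary.tests + ' tests, ' + summary.assertions +
            ' assertions, ' + summary.failures + ' failures');
```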
It's always the butler
May 1st, 2008
I have a confession to make: I really like hospital-based TV series: House, ER, Scrubs, Green Wing. It's all good. Of the lot, House is the one I've been watching most recently, and in many ways a typical episode is basically a medical murder mystery. Understanding a bug or working out some unexplained behaviour is basically the same thing; it's just that this time you're the hardened police officer on the case!
This episode opens up with a strange scene: