Space Vatican

Ramblings of a curious coder

Memory Leak in YAML on Ruby 1.9.2

We recently upgraded to delayed_job 3.0 and immediately started seeing major memory leaks throughout our app: in the delayed_job workers, in the Passenger instances and even in standalone scripts that don’t use delayed_job at all. In the end I tracked it down to a bug in YAML.load.

Out of the box, YAML support in Ruby 1.9 can be provided by one of two backends: syck and psych. Syck is an older implementation built around a C library that is no longer supported, whereas psych uses the newer, supported libyaml. The default backend is psych, but earlier versions of delayed_job did not work with psych, and so forced the YAML engine to syck (which doesn’t have this bug). Version 3.0 fixed those problems with psych, so when we upgraded we (unintentionally) started using psych. Unfortunately the version of psych that ships with Ruby 1.9.2 has a memory leak in YAML.load. If YAML::ENGINE.yamler is 'psych' and Psych::VERSION is 1.0.0, then you are using an affected version.
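That check can be scripted. What follows is a minimal sketch rather than a definitive test: the helper name psych_leak_affected? is made up for illustration, and the script only relies on YAML::ENGINE (guarded with defined?, because only Ruby 1.9 defines it) and Psych::VERSION.

```ruby
require 'yaml'

# Hypothetical helper: true when the YAML engine is psych at version 1.0.0,
# the combination described above as leaking memory in YAML.load.
def psych_leak_affected?(yamler, psych_version)
  yamler == 'psych' && psych_version == '1.0.0'
end

# On Ruby 1.9, YAML::ENGINE reports the active backend ('syck' or 'psych').
# Later Rubies dropped syck and YAML::ENGINE, so fall back to 'psych' there.
yamler = defined?(YAML::ENGINE) ? YAML::ENGINE.yamler : 'psych'
psych_version = defined?(Psych::VERSION) ? Psych::VERSION : nil

puts "YAML engine: #{yamler}, Psych::VERSION: #{psych_version.inspect}"
puts "Affected by the YAML.load leak: #{psych_leak_affected?(yamler, psych_version)}"
```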

In particular, this means that each time you load a model with serialized attributes, you leak memory. One of our most frequently used models has serialized columns, which is why our app was leaking. Delayed Job, of course, does a lot of YAML loading (jobs are stored as YAML), so its workers were haemorrhaging memory.
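One way to see the leak for yourself is to call YAML.load in a loop and watch the process's resident set size. This is a rough sketch that assumes a Unix-like system with ps on the PATH. The helper names rss_kb and measure_yaml_growth, the payload and the iteration counts are all arbitrary choices for illustration. RSS is a noisy measure, so compare trends rather than exact numbers.

```ruby
require 'yaml'

# Resident set size of this process in kilobytes, read via `ps`
# (assumes a Unix-like system). Returns nil if ps is unavailable.
def rss_kb
  out = `ps -o rss= -p #{Process.pid}`
  out.strip.empty? ? nil : out.to_i
rescue Errno::ENOENT
  nil
end

# Load the same small YAML document `iterations` times and return how much
# RSS changed, in KB. With a leaky YAML.load the growth should keep rising
# with the iteration count; with a fixed one it should stay roughly flat.
def measure_yaml_growth(iterations, payload = "---\nname: widget\ntags: [a, b, c]\n")
  GC.start
  before = rss_kb
  iterations.times { YAML.load(payload) }
  GC.start
  after = rss_kb
  before && after ? after - before : nil
end

[10_000, 40_000].each do |n|
  puts "#{n} loads: RSS grew by #{measure_yaml_growth(n).inspect} KB"
end
```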

Plugging the leaks

It took a bit of work to narrow the leaks we were seeing down to YAML, but once that was done it turned out that a few people had already written about this, notably over at nerdd.dk. I am somewhat amazed that knowledge of this issue is not more widespread. The issue is perhaps clouded by the fact that if libyaml isn’t available when Ruby is built, Ruby will just skip building psych (in which case syck is the only backend). Ruby 1.9.3 has a fixed version of psych, but disappointingly the latest release of 1.9.2 (p290 at the time of writing) still has this bug, 18 months after 1.9.2 was released.
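Because psych is silently skipped when libyaml is missing at build time, a quick first step is to check whether a given Ruby can load psych at all. A minimal sketch, with the helper name psych_available? made up for illustration:

```ruby
# Returns true if the psych extension can be loaded. On a Ruby built
# without libyaml, psych was never compiled, so this require raises
# LoadError and syck is the only YAML backend left.
def psych_available?
  require 'psych'
  true
rescue LoadError
  false
end

if psych_available?
  puts "psych is available (Psych::VERSION #{Psych::VERSION})"
else
  puts 'psych was not built; YAML will use syck'
end
```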

Luckily there is a gem version of psych; however, using it can be a bit fiddly if, as most Rails apps do, you use Bundler. Bundler loads psych early in its setup process, so you can’t just put psych in your Gemfile: both versions end up being loaded, which causes an ugly mess.

nerdd.dk has a series of posts about how they tackled the various issues. In the end, what I did was:

  • set up config/setup_load_paths.rb to keep Passenger happy:
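    require 'rubygems'
    # Activate the psych gem before Bundler gets a chance to load the
    # older psych that ships with Ruby 1.9.2.
    gem 'psych'
    require 'bundler'
    Bundler.setup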
  • edited config/boot.rb to call gem 'psych' just after require 'rubygems'
  • hacked the stub executable for bundle so that it also calls gem 'psych' after RubyGems is loaded
  • added the same version of psych to the Gemfile as the one installed outside of Bundler
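After these changes it is worth confirming that the gem version of psych, not the one bundled with Ruby 1.9.2, is the one actually loaded, for example from a console. This sketch relies only on Psych::VERSION, YAML::ENGINE (guarded, since only Ruby 1.9 defines it) and RubyGems' Gem.loaded_specs. The helper name psych_report is hypothetical.

```ruby
require 'rubygems'
require 'yaml'

# Hypothetical helper summarising which psych this process is running.
# Gem.loaded_specs maps gem names to the specs RubyGems has activated; on
# Ruby 1.9.2 it should only contain 'psych' if the psych gem was activated.
def psych_report
  spec = Gem.loaded_specs['psych']
  {
    yamler: defined?(YAML::ENGINE) ? YAML::ENGINE.yamler : 'psych',
    psych_version: defined?(Psych::VERSION) ? Psych::VERSION : nil,
    gem_version: spec ? spec.version.to_s : nil
  }
end

report = psych_report
puts report.inspect
if report[:psych_version] == '1.0.0'
  warn 'Still running the leaky psych 1.0.0 bundled with Ruby 1.9.2'
end
```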