If you're writing an iOS app that uses Core Data, you may well want to ship it with an initial database (which may later receive over-the-air updates).

On iOS, Core Data stores are normally backed by SQLite. You could create a SQLite database directly, but you'd have to reverse engineer the way Apple uses SQLite: use the same name mangling for table and column names, generate the same metadata used for persistent store migration, etc. Too brittle for my liking.

Luckily, both RubyCocoa and MacRuby let you access the Core Data framework from a ruby script. RubyCocoa has been bundled with Mac OS X since 10.5 (you will need to use the system ruby). Its syntax is clunkier because it doesn't have the benefit of MacRuby's extensions to ruby, but at the moment MacRuby doesn't quite work with Active Record, which is where I was getting my seed data from. If your data is coming from elsewhere, this may not be a problem. Other than slight syntax differences in the handling of Objective-C's sort-of-named arguments, the code is the same.

The RubyCocoa version looks like this:

require 'rubygems'
require 'osx/cocoa'

OSX.require_framework 'CoreData'

class CoreDataStore
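  # Inserts a new managed object for the entity called +name+. +props+ are
  # set with setValue:forKey: (attributes and to-one relationships);
  # +relationships+ maps to-many relationship names to arrays of objects.
  def create_entity name, props={}, relationships={}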
    entity = OSX::NSEntityDescription.insertNewObjectForEntityForName_inManagedObjectContext(name, context)
    props.each do |k,v|
      entity.setValue_forKey v, k
    end
    relationships.each do |k, objects|
      collection = entity.mutableSetValueForKey(k)
      objects.each {|o| collection.addObject o}
    end
    entity
  end

  def initialize(data_store_path, mom_path)
    @data_store_path = data_store_path
    @mom_path = mom_path
  end

  def context
    @context ||= OSX::NSManagedObjectContext.alloc.init.tap do |context|
      model = OSX::NSManagedObjectModel.alloc.initWithContentsOfURL(
          OSX::NSURL.fileURLWithPath(@mom_path))
      coordinator = OSX::NSPersistentStoreCoordinator.alloc.initWithManagedObjectModel(model)

      result, error = coordinator.addPersistentStoreWithType_configuration_URL_options_error(
         OSX::NSSQLiteStoreType, nil, OSX::NSURL.fileURLWithPath(@data_store_path), nil)
      if !result
        raise "Add persistent store failed: #{error.description}"
      end
      context.setPersistentStoreCoordinator coordinator
    end
  end

  def save
    res, error = context.save_
    if !res
      raise "Save failed: #{error.description}"
    end
    res
  end
end

With that in place, you use it like this:

store = CoreDataStore.new('seedData.sqlite', 'yourmodel.mom')
blog = store.create_entity 'Blog', 'title' => 'Hello world', 'body' => "it's a fine day"
store.create_entity 'Comment', 'body' => 'I Agree', 'blog' => blog
store.save

The two arguments to CoreDataStore.new are the path to the file you want to store the data in and the path to your compiled model file.

Your Mom's a data model

You may be wondering what a .mom file is. In Xcode you work with a .xcdatamodel file; when you build your app, this is compiled down into a .mom file. You can also do this yourself by running /Developer/usr/bin/momc mymodel.xcdatamodel mymodel.mom
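If the seed script runs outside Xcode, a small helper can rerun momc only when the model has changed. This is a minimal sketch, not part of the original setup: the momc path is the one given above and may differ on other Xcode installs, the helper names are hypothetical, and because an .xcdatamodel can be a directory bundle it compares against the newest modification time of anything inside it.

```ruby
# Path taken from the post; other Xcode installs may put momc elsewhere.
MOMC = '/Developer/usr/bin/momc'

# Newest modification time of a model, which may be a plain file or a
# directory bundle (the glob returns nothing for a plain file).
def newest_mtime(path)
  paths = [path] + Dir.glob(File.join(path, '**', '*'))
  paths.map { |p| File.mtime(p) }.max
end

# True when the .mom is missing or older than anything in the model.
def mom_stale?(xcdatamodel, mom)
  !File.exist?(mom) || newest_mtime(xcdatamodel) > File.mtime(mom)
end

# Recompiles only when needed; returns true if momc was run.
def compile_model(xcdatamodel, mom)
  return false unless mom_stale?(xcdatamodel, mom)
  system(MOMC, xcdatamodel, mom) or raise "momc failed for #{xcdatamodel}"
  true
end
```

Calling compile_model('mymodel.xcdatamodel', 'mymodel.mom') before CoreDataStore.new keeps the seed script in step with the model.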

We recently upgraded to delayed_job 3.0 and immediately started seeing some major memory leaks in our app: in the delayed_job workers, in passenger instances and even in standalone scripts that don't use delayed_job at all. In the end I tracked it down to a bug in YAML.load.

Out of the box, YAML support in ruby 1.9 can be provided by one of two backends: syck and psych. Syck is an older implementation based around a C library that is no longer supported, whereas psych uses the newer and supported libYAML. The default backend is psych, but earlier versions of delayed_job didn't work with psych, and so forced the YAML engine to syck (which doesn't have this bug). In 3.0 they fixed their problems with psych, so when we upgraded we (unintentionally) started using psych. Unfortunately the version of psych that comes with ruby 1.9.2 has a memory leak in YAML.load. If YAML::ENGINE.yamler is 'psych' and Psych::VERSION is 1.0.0, then you are using an affected version.
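The check in the last sentence can be run from a script. This is a minimal sketch: leaky_psych? is a made-up helper, and the defined? guards are there because YAML::ENGINE only exists on rubies that could choose between syck and psych (later rubies always use psych), while Psych::VERSION is missing if psych wasn't built.

```ruby
require 'yaml'

# True for the combination described above: the psych backend at 1.0.0.
def leaky_psych?(yamler, psych_version)
  yamler == 'psych' && psych_version == '1.0.0'
end

# Fall back to 'psych' on rubies where YAML::ENGINE no longer exists.
yamler = defined?(YAML::ENGINE) ? YAML::ENGINE.yamler : 'psych'
version = defined?(Psych::VERSION) ? Psych::VERSION : nil

puts "YAML backend: #{yamler}, Psych version: #{version.inspect}"
warn 'This ruby has the leaky YAML.load' if leaky_psych?(yamler, version)
```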

In particular, this means that you leak memory each time you load a model with serialised attributes. One of our very frequently used models has some serialised columns, which is why we were leaking. Delayed job obviously does a lot of YAML loading, so its workers were haemorrhaging memory.

Plugging the leaks

It took a bit of work to narrow the leaks we were seeing down to YAML, but once that was done it turned out that a few people had already written about this, notably over at nerdd.dk. Still, I am somewhat amazed that knowledge of this issue is not more widespread. The issue is perhaps clouded by the fact that if libyaml isn't available when ruby is built, ruby will just skip building psych (in which case syck is the only backend). Ruby 1.9.3 has a fixed version of psych, but disappointingly the currently available version of 1.9.2 (p290) still has this bug, 18 months after the release of 1.9.2.

Luckily there is a gem version of psych; however, using it can be a bit fiddly if (as most rails apps do) you use bundler. Bundler loads psych early on in its setup process, so you can't just stick psych in your Gemfile: both versions end up being loaded, which causes an ugly mess.

nerdd.dk has a series of posts about how they tackled the various issues. In the end, what I did was:

  • set up config/setup_load_paths.rb to keep passenger happy:

        require 'rubygems'
        gem 'psych'
        require 'bundler'
        Bundler.setup

  • edited config/boot.rb to do gem 'psych' just after require 'rubygems'
  • hacked the stub executable for bundle to also do gem 'psych' after rubygems is loaded

I was looking at moving an application to ruby 1.9.3 and was getting some strange syntax errors along the lines of "syntax error, unexpected keyword_do_block" on code that worked fine on 1.9.2. I spent quite a few minutes staring at code that looked completely benign.

It turns out that ruby 1.9.2 is a bit too permissive: it allows you to write an extra comma after your argument list but before the do that marks the start of your block.

  some_method arg1, arg2, do
    ...
  end

ruby 1.9.3 on the other hand won't accept this.
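The fix is simply to drop the stray comma. A minimal sketch, where some_method is a stand-in defined here purely for illustration:

```ruby
# Stand-in for some_method above: it yields its two arguments to the block.
def some_method(arg1, arg2)
  yield arg1, arg2
end

# Accepted by 1.9.2 but rejected by 1.9.3:
#   some_method 1, 2, do |a, b| ... end
# Without the trailing comma the do block still attaches to some_method.
result = some_method 1, 2 do |a, b|
  a + b
end

puts result
```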

NSData and RubyCocoa on 10.7

August 3rd, 2011

For some reason code like

OSX::NSData.dataWithBytes_length(data, data.length)

segfaults on Lion. Others have noted that this only occurs when the data contains bytes with the high bit set. This smells like something trying to interpret the string in some encoding when it is in fact arbitrary binary data.

I'd love to use MacRuby instead of RubyCocoa, but unfortunately MacRuby doesn't seem to be able to handle Active Record at the moment, so I can't use it (yet). I haven't had time to delve properly into how RubyCocoa converts between ruby and Objective-C objects, but I was able to hack around the problem using RubyInline:

require 'inline'
class CFDataGenerator
  begin 
    inline do |builder|
      builder.include "<CoreFoundation/CoreFoundation.h>"
      builder.c_raw_singleton "
      
      static VALUE from_string(int argc, VALUE *args){
        CFDataRef d = CFDataCreate(NULL, RSTRING_PTR(args[0]), RSTRING_LEN(args[0]));    
        
        VALUE result;
        char type = '@';
        ocdata_to_rbobj(Qnil,&type, (const void *)&d, &result, false) ;
        return result;
      }
    "
    end
  end
end

Then you can use CFDataGenerator.from_string to convert ruby strings into NSData instances. Remember to release the instance when you're done!

I recently needed to deal with an SSL connection using client-side certificates. The ruby openssl bindings are fairly impenetrable, so here's what worked for me (at least in part as a note for myself in the future):

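require 'socket'
require 'openssl'

# Certificate.new and RSA.new expect the PEM/DER contents, not a filename
ctx = OpenSSL::SSL::SSLContext.new
ctx.cert = OpenSSL::X509::Certificate.new(File.binread("mycert.cer"))
ctx.key = OpenSSL::PKey::RSA.new(File.read("mykey.pem"))
sock = TCPSocket.new("example.com", 443) # placeholder host and port
ssl = OpenSSL::SSL::SSLSocket.new(sock, ctx)
ssl.connect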

If the key you've got is in a .p12 file (which is what the Keychain Access utility on the Mac exports), then you'll need to convert it like so:

openssl pkcs12 -in key.p12  -nocerts -nodes -out key.pem
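Alternatively, the ruby openssl bindings can read a PKCS#12 bundle directly through OpenSSL::PKCS12, which holds both the key and the certificate, so the conversion step can be skipped. A minimal sketch: context_from_p12 is a hypothetical helper name, and the password is whatever was chosen when the .p12 was exported.

```ruby
require 'openssl'

# Builds an SSLContext from a PKCS#12 bundle (e.g. a .p12 exported from
# Keychain Access) without converting it to PEM first.
def context_from_p12(path, password)
  p12 = OpenSSL::PKCS12.new(File.binread(path), password)
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.cert = p12.certificate # client certificate from the bundle
  ctx.key = p12.key          # matching private key from the bundle
  ctx
end
```

The returned context can be handed to OpenSSL::SSL::SSLSocket.new in the same way as before.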

Fun with ruby http clients

April 13th, 2009

Quite a few people have written about the performance failings of Net::HTTP, but until recently, to be honest, I never really cared much. Most of my HTTP request needs have been fairly meagre, often not much more than hitting a URL and checking the result code.

I've been playing with couchdb recently, so my app makes a fair number of HTTP requests. I've been using RelaxDB, which uses net/http, so Net::HTTP's performance has started to matter.

Net::HTTP is not the only game in town. I spent some time recently playing with rfuzz, eventmachine and taf2-curb and came to largely the same conclusion as Paul Dix.

Leaning on a mature library such as libcurl gives taf2-curb a huge advantage. While eventmachine was on par speed-wise, neither of the two HTTP clients it includes is a complete implementation of the HTTP protocol. For example, HttpClient tells the remote server that it speaks HTTP/1.1, yet it does not support chunked encoding (a mandatory part of the spec). HttpClient2 does understand chunked encoding, but doesn't let you set headers or a body on the request. That's fine for just pinging a URL, but not up to the task of working with couchdb. Something about couchdb's chunked encoding also seemed to confuse rfuzz.

taf2-curb does the job very nicely. On my dumb benchmark (1000 requests for a static HTML page hosted on the same machine, i.e. we're pretty much only testing overhead) the numbers are:

require 'benchmark'
require 'net/http'
require 'uri'
require 'curb'

Benchmark.bmbm(5) do |x|
  x.report 'net/http' do
    u = URI.parse('http://docs.local/')
    1000.times {Net::HTTP.get u}
  end

  x.report 'curb' do
    1000.times do
      c = Curl::Easy.new 'http://docs.local/'
      c.perform
    end
  end
end
               user     system      total        real
net/http   0.560000   0.270000   0.830000 (  1.065960)
curb       0.310000   0.170000   0.480000 (  0.696188)

At the other extreme, these numbers correspond to pulling ~1 MB of data from couchdb (the benchmark code is the same apart from the URLs, and I did 100 iterations rather than 1000).

               user     system      total        real
net/http  17.400000   8.900000  26.300000 (  32.067821)
curb       0.700000   1.300000   2.000000 (  29.586022)

curb comes out squarely on top. Another thing of note during this test is CPU usage (as you might expect from the difference in user time). With Net::HTTP the ruby process running the benchmark was taking up 60-70% CPU (on a 2.4GHz Core Duo); with curb it used around 5%.

The commit to switch RelaxDB from net/http to taf2-curb is here for those interested; it's really very straightforward stuff. There may well be more to be had by fiddling with libcurl options, but I haven't tried yet.

Confused by sync.rb?

May 18th, 2008

I needed a reader/writer lock the other day and followed the trail to ruby's Sync class (why this gets to call itself Sync/Synchronizer as opposed to all the other synchronisation primitives is beyond me). The documentation isn't exactly enlightening either. I'm sure there's all sorts of clever stuff to do with upgrading a lock you already hold, but if you're just interested in the boring case, all you need is:

  require 'sync'

  lock = Sync.new

  lock.synchronize(Sync::EX) do
    #do something that requires a writer (exclusive) lock
  end

  lock.synchronize(Sync::SH) do
    #do something that requires a reader (shared) lock
  end
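sync.rb has since been dropped from ruby's standard library (it now lives in the separate sync gem). Where it isn't available, the two modes used above can be approximated with a small reader/writer lock built on Monitor. This is a swapped-in technique rather than sync.rb, and only a minimal sketch: SimpleRWLock, shared and exclusive are names made up for illustration, it is not reentrant (taking exclusive inside shared deadlocks), and a steady stream of readers can starve a writer.

```ruby
require 'monitor'

# Minimal reader/writer lock: many concurrent readers or one writer.
class SimpleRWLock
  def initialize
    @monitor = Monitor.new
    @cond = @monitor.new_cond
    @readers = 0
    @writer = false
  end

  # Roughly Sync::SH: waits only while a writer holds the lock.
  def shared
    @monitor.synchronize do
      @cond.wait_while { @writer }
      @readers += 1
    end
    begin
      yield
    ensure
      @monitor.synchronize do
        @readers -= 1
        @cond.broadcast if @readers.zero? # wake any waiting writer
      end
    end
  end

  # Roughly Sync::EX: waits until there are no readers and no writer.
  def exclusive
    @monitor.synchronize do
      @cond.wait_while { @writer || @readers > 0 }
      @writer = true
    end
    begin
      yield
    ensure
      @monitor.synchronize do
        @writer = false
        @cond.broadcast # wake waiting readers and writers
      end
    end
  end
end
```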