Space Vatican

Ramblings of a curious coder

Goodbye Mephisto, Hello Octopress

For a long time, this blog was hosted on an ageing mephisto install. I took some time this weekend to migrate to octopress, which is more modern in many ways (a responsive layout, nicer syntax highlighting options, and integration with things like gist and jsfiddle, to name a few). These days a lot of my blogging is done on the train, so I was writing in a text editor and then pasting into mephisto, which was just convoluted.

The migration was relatively straightforward. I wasn’t interested in preserving the appearance of the existing site (which was just mephisto’s default theme), so I dumped my existing content out of mephisto using the code I found on meatleasing.com, updated a few settings, and was just about done.

Octopress generates static html, which makes sense for a blog. A few years ago this would have been a bit limiting if you wanted things like comments, but these days there’s an api for everything, so you can have quite a lot of dynamic content on a page even if the pages themselves are completely static. This widens deployment options too. Github pages is a popular one, but I thought I would try something different. I’m currently deploying with the following script:

(octopress_deploy.rb)
require 'fog'
require 'yaml'
require 'digest'
require 'mime/types'

config = YAML.load(File.read(File.join(File.dirname(__FILE__),'aws.yml')))

storage = Fog::Storage.new(config['storage']['credentials'])

directory = storage.directories.get(config['storage']['directory'])

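# map of key => etag for everything already in the bucket (for non-multipart
# uploads the etag is just the MD5 of the object)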
existant = directory.files.inject({}) do |hash, file|
  hash[file.key] = file.etag
  hash
end

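# files that already existed in the bucket but have changed - these are the
# ones that will need a cloudfront invalidation later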
updates = []
Dir.chdir 'public' do
  Dir['**/*.*'].each do |f|
    body = File.read f, :open_args => ['rb']
    digest = Digest::MD5.new
    digest.update(body)

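    # the bucket already has an identical copy of this file, nothing to do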
    if existant[f] == digest.hexdigest
      puts "skipping #{f}; already exists" if ENV['VERBOSE']
      next
    end
    if existant[f]
      updates << f
    end

    type = MIME::Types.of(f).first
    file = directory.files.new(:key => f, :body => body,  :content_type => type && type.content_type,
                               :content_md5 => digest.base64digest, :expires => 24*3600*7, :acl => 'public-read')
    file.save
  end
end

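# tell cloudfront to drop its cached copies of any files that changed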
if config['cdn']
  cdn = Fog::CDN.new(config['cdn']['credentials'])

  if updates.any?
    paths = updates.collect {|p| '/' + p}
    cdn.post_invalidation config['cdn']['distribution_id'], paths
    puts 'posted updates'
  end
end

In a nutshell it grabs the etags of all the files already present in the S3 bucket and uploads anything that isn’t present or is present with a different MD5. The aws.yml file has the details of which bucket to use, credentials (restricted to only allow access to that bucket) etc. I’m using fog, so in theory it should work with other storage providers supported by fog. I’ve put a cloudfront distribution in front of the bucket, so the script sends cloudfront invalidation requests for any updated files.
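
I haven’t reproduced my actual aws.yml, but its shape follows from how the script reads it - something along these lines (fog wants symbol keys for its credentials, and ruby’s YAML parser turns :provider style keys into symbols):

(aws.yml)

storage:
  credentials:
    :provider: AWS
    :aws_access_key_id: YOUR_ACCESS_KEY
    :aws_secret_access_key: YOUR_SECRET_KEY
  directory: name-of-your-bucket
cdn:
  credentials:
    :provider: AWS
    :aws_access_key_id: YOUR_ACCESS_KEY
    :aws_secret_access_key: YOUR_SECRET_KEY
  distribution_id: YOUR_DISTRIBUTION_ID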

Completely over the top for the amount of actual traffic, but should Stephen Fry tweet a link, everything should still work. I also set the max age on the files so that cloudfront won’t hold onto them for more than a day. Initially I was only going to use the api to invalidate files, but because of the “Recent posts” section, adding a page actually invalidates almost every page on the site, so normally I only invalidate the front page. Setting a max age of 1 day means that the other pages on the site will catch up reasonably soon (assuming they were cached at all).
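
The script above sets :expires on each upload; if you’d rather express the one-day limit as a Cache-Control max-age, fog exposes a cache_control attribute on S3 files, so the files.new call could be tweaked along these lines (a sketch of a variant, not what the script above actually does):

# sketch: same upload call as in the script, but with an explicit
# Cache-Control header so caches (cloudfront included) keep the object
# for at most a day
file = directory.files.new(:key => f, :body => body,
                           :content_type => type && type.content_type,
                           :cache_control => "max-age=#{24*3600}",
                           :acl => 'public-read')
file.save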

The only tiny niggle is that the blog is no longer accessible at the domain apex (spacevatican.org), because you can’t set up a CNAME on the zone apex. As it happens I was keeping the server that hosted mephisto around for other stuff, so I’ve added a redirect from the non-www domain to www.
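
I haven’t included my actual redirect config, but something as small as this rack app (or the equivalent rewrite rule in whatever web server is already running on that box) is all it takes:

(config.ru)

# minimal sketch: 301 every request on the apex over to the www hostname,
# preserving the path
run lambda { |env|
  [301,
   { 'Location' => "http://www.spacevatican.org#{env['PATH_INFO']}",
     'Content-Type' => 'text/html' },
   []]
}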

Update - more niggles

One niggle (found a few days later): mephisto would generate permalinks without a leading zero, e.g. /2012/6/6 for the 6th of June, but would accept leading zeroes in the url. Jekyll generates them with a leading zero and, since it’s just static html, only accepts links with the leading zeroes, so a bunch of old links to the site were 404ing. You can fix this by changing the :month and :day segments of your permalink (in _config.yml) to :i_month and :i_day.
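
For example, with a date-based permalink the change in _config.yml looks something like this (the :title segment is just illustrative - keep whatever the rest of your existing permalink setting is):

# _config.yml - :i_month and :i_day drop the leading zeros, so posts end up
# at /2012/6/6/... just like the old mephisto urls
permalink: /:year/:i_month/:i_day/:title/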

The other niggle was around trailing slashes. Initially I had cloudfront configured with my S3 bucket set as the origin. Configuring a bucket as a website means that as well as being able to access files at http://<bucket>.s3.amazonaws.com/ you can also access them at http://<bucket>.s3-website-eu-west-1.amazonaws.com/. This second url behaves differently to the first - it will serve /foo/index.html when you go to /foo instead of complaining about a missing key. By default the urls that jekyll generates are things like /blog/archives, with the actual html file being /blog/archives/index.html, so this behaviour is important.

When you configure a cloudfront distribution with an s3 bucket as its origin, it accesses the bucket with the first behaviour, so most of the links jekyll generated weren’t working. Setting cloudfront to use a custom origin whose url was http://<bucket>.s3-website-eu-west-1.amazonaws.com fixed this.