
Direct to S3 Browser Uploads

The easiest thing to do with S3 uploads is have them go straight to your app server and then post them on to S3. This is a bit clunky if you don't need to do any preprocessing. Fortunately you can post files straight to S3 from the user's browser with an appropriately crafted form. Amazon have some documentation on this but there are a few details I thought were worth spelling out.

What’s in a form

The starting point is a form configured to do a POST to the endpoint for the bucket (don’t forget to set it to be a multipart form). There are a few fields that your form has to submit:

  • bucket: where the file is going

  • key: where the file is going to be stored. You can use ${filename} here and it will be replaced with the name of the uploaded file. It's a good idea to specify a prefix, so that users cannot write files to arbitrary keys in the bucket (for example overwriting other users' files)

  • AWSAccessKeyId: the access key id for the account

  • policy: the policy document. More on this later

  • Signature: the result of taking the policy document, Base64 encoding it, computing the HMAC-SHA1 signature of that using your secret key and then Base64 encoding the result.

And of course the file itself, which must be submitted as a field named file (Amazon will ignore any inputs that come after the file field). You can use other input types (e.g. a textarea) - as long as it is named file the value will be stored in S3.
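
Putting that together, a minimal upload form might look like this (the bucket name, key prefix and access key id are placeholders, and the policy and Signature values are the Base64 encoded policy document and its signature):

```html
<form action="https://some-bucket.s3.amazonaws.com/" method="post" enctype="multipart/form-data">
  <input type="hidden" name="bucket" value="some-bucket">
  <input type="hidden" name="key" value="users/bob/${filename}">
  <input type="hidden" name="AWSAccessKeyId" value="AKIA...">
  <input type="hidden" name="policy" value="BASE64_ENCODED_POLICY">
  <input type="hidden" name="Signature" value="BASE64_ENCODED_SIGNATURE">
  <input type="file" name="file">
  <input type="submit" value="Upload">
</form>
```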

You can also set other fields if required. Some useful ones:

  • x-amz-security-token: if you are using IAM temporary credentials then you need to stick your session token here
  • success_action_redirect: tells S3 to redirect to the specified URL when the upload is complete. S3 will add key and bucket query string parameters to this url in case the user was able to control these. There is nothing stopping an attacker hitting this url without having done an upload at all, so be careful what you do here.
  • success_action_status: by default success results in http status 204 and a blank document. Set this field to 201 to get an XML document describing the upload instead.
  • x-amz-server-side-encryption: Turn on S3’s at rest encryption for the file
  • acl: what acl to apply to the file (as usual this will be used in conjunction with whatever bucket policies and user policies are in force)

You can also set the various metadata fields that you can with a normal S3 upload: Content-Disposition, Content-Encoding, Content-Type, x-amz-meta etc. You can use ${filename} as part of any of these too. For example you might set Content-Disposition to attachment; filename="${filename}".

The policy document

Since you’re not making api calls in the usual way, authorization has to be handled differently (unless you’re writing to a publicly writable S3 bucket, which seems like it should be a rare case). Instead of creating a canonical representation of the request and signing it, you instead create a policy document and sign that.

This policy document has nothing to do with an IAM policy or a bucket policy, instead it sets out which fields are in the form and their expected values. This ensures that someone cannot manipulate the form to upload files to a different bucket or make other changes you might not want. You can either specify the exact value for each field or looser constraints. For example you might want to let the user be in control of the key the file is stored under, as long as it starts with some user specific prefix.

A policy is a JSON document that states the expiry date for the policy (in XML schema date format) and a list of conditions:

  {
    "expiration": "2013-01-07T19:01:03Z",
    "conditions": [...]
  }

Each condition is either:

  • a json object with a single key/value pair. The key is the field name and the value is what you expect the form to submit
  • a content length restriction: an array containing the string ‘content-length-range’, followed by the minimum and maximum values
  • an array with 3 values: an operator, the key name (with ‘$’ prefixed) and an operand for the operator.

If S3 finds a field in your form which doesn’t match the condition or for which there is no condition it will raise an error and refuse the upload. The only exceptions are the file, policy and Signature fields and any field whose name begins with x-ignore-.

The available operators are

  • eq: equality matching
  • starts-with: you can use this for prefix matches (for example, requiring that the S3 key start with /users/bob). If you need to accept arbitrary values for a field then use this with the empty string as the desired prefix

The first form is just shorthand for an eq condition:

{"bucket": "some-bucket"}

is the same as

["eq", "$bucket", "some-bucket"]
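
Putting the pieces together, a complete policy document might look like this (the bucket name, key prefix and size limit are illustrative):

```json
{
  "expiration": "2013-01-07T19:01:03Z",
  "conditions": [
    {"bucket": "some-bucket"},
    ["starts-with", "$key", "users/bob/"],
    ["content-length-range", 0, 10485760]
  ]
}
```

This would allow uploads of up to 10MB, stored under keys beginning with users/bob/.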

Once you've got your policy, you set it as the value of a field named policy. The field value is the Base64 encoded version of the JSON document. I found it was necessary to strip the newlines from the Base64 data (with ruby you can do this by using Base64.strict_encode64). Then compute the HMAC-SHA1 signature of the Base64 encoded policy document (using your AWS secret access key as the key) and stick the Base64 encoded signature in the Signature field. All done!
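
As a concrete sketch of the encode-and-sign step in ruby (the policy contents and secret key here are made up - substitute your own):

```ruby
require 'base64'
require 'json'
require 'openssl'

aws_secret_access_key = '123' # placeholder - use your real secret access key

policy = {
  'expiration' => '2013-01-07T19:01:03Z',
  'conditions' => [
    {'bucket' => 'some-bucket'},
    ['starts-with', '$key', 'users/bob/']
  ]
}

# strict_encode64 avoids the newlines that plain encode64 inserts
encoded_policy = Base64.strict_encode64(policy.to_json)

# sign the *encoded* policy, then Base64 encode the raw digest
digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'),
                              aws_secret_access_key, encoded_policy)
signature = Base64.strict_encode64(digest)
```

encoded_policy and signature are the values that go in the policy and Signature form fields.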


You might expect that a condition of {"key": "prefix/${filename}"} would work when combined with key having the value prefix/${filename}. However, for this to work you actually need to have a condition of ["starts-with", "$key", "prefix/"]

Rails specific gotchas

If you use form_tag, Rails injects 2 fields into your forms that will by default cause the policy validation to fail: the authenticity token and the utf8 enforcer token. You can turn off the authenticity token by passing :authenticity_token => false. There's no downside to doing this - S3 isn't able to do anything with the token anyway.

The utf8 token you may want to keep. Depending on the headers you allow people to submit, whether you expect utf8 file names etc you may wish to ensure that a browser doesn’t accidentally mangle things by using a different encoding. You can either explicitly list this field in your policy document, or you can override the utf8_enforcer helper method to change its name to something beginning with x-ignore-.

Beyond browsers

While this all works in a browser with no additional software required, this is also usable by anything that can generate an http POST. For example you could do the actual upload from a mobile app without having to distribute credentials.

Making it easy

The ruby aws sdk has support for this built in. I’ve also written a gem named s3_browser_uploads that handles this.

To use the gem you start by creating an S3BrowserUploader::FormDefinition in your controller with basic details such as region, credentials and the bucket:

@uploader = S3BrowserUploader::FormDefinition.create(:region => 'eu-west-1',
                                                     :aws_access_key_id => 'XYZ',
                                                     :aws_secret_access_key => '123',
                                                     :bucket => 'my-bucket')

You can add data such as extra fields:

@uploader.add_field('x-amz-server-side-encryption', 'AES256')

This will result in a hidden input being generated for you and will configure the policy document to expect that exact value for that field.

If you want to let the user specify the value then you need to specify the condition for the policy document. To require that the key field start with users/fred/ you do

@uploader.add_condition('key', 'starts-with' => 'users/fred/')

This overwrites any previous condition set for this field. You’ll need to add the input field yourself.

You can combine the two, which is normally useful when you want to set a field to a value using the ${filename} placeholder. For example to use it in your Content-Disposition header you would do

@uploader.add_field('Content-Disposition', 'attachment; filename="${filename}"')
@uploader.add_condition('Content-Disposition', 'starts-with' => 'attachment; filename="')

In your view you do (I’ve used haml but it shouldn’t matter what template language you’ve used)

= s3_form @uploader do
  = file_field_tag :file
  = submit_tag

The uploader will set up a form tag for you and generate all the inputs for things like the policy document, signature and any data you specified when configuring the uploader. It also takes care of the utf8 enforcer tag and the other gotchas I mentioned. Obviously you can add whatever markup around this to make the form look pretty.

Any options you pass to s3_form will be set on the form as html options, so that you can still set html classes and so on.

Asynchronous uploads

The traditional way of doing asynchronous uploads is to set the target of the form to a hidden iframe. Having done this you need to do something when the upload has completed. It's tempting to use the load event on the iframe, but unfortunately this can't distinguish between the upload succeeding or failing - it fires just the same regardless and there's no way to get at the underlying http status code. The error event doesn't seem to work at all with iframes.

The way I've tackled this is to use success_action_redirect. If the upload is successful then the browser will load the content from that url in the iframe. That page is just a line of javascript that sets a property on the containing window. If the load event fires on the iframe but the upload successful flag hasn't been set then the upload must have failed. This is far from perfect (you still can't get at the actual S3 error message, other than by showing the hidden iframe) but I think it's the best you can do without resorting to a flash based uploader.
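
The page served from the success_action_redirect url can be as simple as this (the property name is arbitrary - it just needs to match whatever your outer page checks for):

```html
<script type="text/javascript">
  window.parent.s3UploadSucceeded = true;
</script>
```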

It's not a great user experience, particularly with larger files, as you don't in general get any feedback about progress (Chrome displays progress information in the status area).

Asynchronous uploads with HTML 5

Luckily the HTML 5 APIs take care of this. There are jQuery plugins such as jQuery-File-Upload that handle this, but if you don't need all the bells and whistles it's easy enough to roll your own - something like this does the trick:

$('form').on('submit', function(event){
  var data = new FormData(this);
  var xhr = new XMLHttpRequest();
  xhr.upload.addEventListener('progress', function(ev){
    //display progress in some way, e.g. using ev.loaded and ev.total
  }, false);
  xhr.onreadystatechange = function(ev){
    if(xhr.readyState == 4){
      //complete! - check xhr.status
    }
  };'POST', this.action, true); //the form's action is the bucket endpoint
  return false;
});

You need to use FormData because amazon is expecting a multipart encoded form - you can’t just send the file contents as the body of the post. Because this is an ajax submission you will need to create a CORS policy for your s3 bucket. You need to allow POST requests of course.
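
A minimal CORS configuration for the bucket might look like the following (you'll want to restrict the origin to your own domain rather than using a wildcard):

```xml
<CORSConfiguration>
  <CORSRule>
    <AllowedOrigin>https://www.example.com</AllowedOrigin>
    <AllowedMethod>POST</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
  </CORSRule>
</CORSConfiguration>
```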

If at the end xhr.status is in the 200 range then the upload succeeded (you can configure exactly which response code amazon uses). If the upload fails because amazon rejected it (as opposed to a network error) then xhr.responseXML will be a document containing the error. You can pull out a textual description of the error with

$('Message', xhr.responseXML).text()

Which probably won't be much help to the end user, but could be handy when they report the problem to you. Unfortunately the user will still have to wait for the upload to transfer in its entirety before getting the response.

You don't seem to be able to use success_action_redirect at the moment - the 303 response from S3 doesn't have any CORS headers, which seems to confuse browsers. If you need to access the key under which the file was saved, set success_action_status to 201 and S3 will respond with an XML document containing the key and bucket under which the file was saved. You can then access this using a similar approach to the error message:

$('Key', xhr.responseXML).text()

This is a bit inconvenient if you want a single form that falls back to a traditional form submission. However you can get around this by setting up both success_action_status and success_action_redirect in your policy, each with a condition of 'starts-with' => ''. Then in your pre-submit javascript, clear the value of success_action_redirect if you are about to go down the HTML5 route, or the value of success_action_status otherwise.