Uploading Files to Manta from a browser

How to use Signed URLs and CORS to Upload Files to Manta

One of the very first questions that came up after launching Manta was how to have a browser directly upload an object into Manta. There are several reasons you would want this, but most obvious is that it allows clients to bypass your web server(s), which (1) reduces your bandwidth costs, and (2) reduces client latency. Unfortunately the browser security model is fairly complicated here as CORS is brought into play. To answer this question, I created a small (and ugly!) sample application to illustrate how to accomplish this. This blog post will explain the tricky pieces of the sample application; it will not stand alone without looking at the sample code.

A few concepts

Signed URLs

There seems to be some general confusion around what signed URLs are used for. Basically, signed URLs are a way for you to hand out an expiring “ticket” that allows somebody else to access a single, private Manta object. For example, suppose I have an MP3 that I wanted to share with a few friends over email; rather than putting the file in /:login/public, and getting sued by the RIAA, I would place it in /:login/stor/song.mp3, generate a signed URL to it, and just send the URL to them. msign is the command line utility that will generate a presigned URL, but in this example we’ll be generating it programmatically.
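To make the “ticket” idea concrete, here is a minimal sketch of the general technique: sign the method, path, and expiry time with a secret, then reject anything that has expired or been tampered with. This is not Manta's actual signing scheme (the SDK signs with your key pair and handles the details for you); SECRET, signPath, verify, and the query parameter names are all hypothetical.

```javascript
var crypto = require('crypto');

// Hypothetical shared secret, for illustration only.
var SECRET = 'not-a-real-secret';

// Computes a hex HMAC over the method, path, and expiry time.
function hmac(method, path, expires) {
    return crypto.createHmac('sha256', SECRET)
        .update([method, path, expires].join('\n'))
        .digest('hex');
}

// Returns the path with an expiry (seconds since epoch) and signature attached.
function signPath(method, path, expires) {
    return path + '?expires=' + expires +
        '&signature=' + hmac(method, path, expires);
}

// Accepts the request only if the ticket is unexpired and the signature matches.
function verify(method, signedPath, now) {
    var parts = signedPath.split('?');
    var query = new URLSearchParams(parts[1] || '');
    var expires = parseInt(query.get('expires'), 10);
    if (!(expires > now))
        return false;
    var expected = Buffer.from(hmac(method, parts[0], expires), 'hex');
    var given = Buffer.from(query.get('signature') || '', 'hex');
    return given.length === expected.length &&
        crypto.timingSafeEqual(given, expected);
}
```

Because the method is part of what gets signed, a ticket issued for GET cannot be replayed as a PUT, and changing the path invalidates the signature.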

CORS

CORS, or “Cross-Origin Resource Sharing,” is a mechanism that allows browsers to access resources that did not originate on the same domain. While functional, it’s very complicated (personally, there is little else in the web world I hate more than CORS); for a gentler introduction than the W3C spec, see the MDN page. Manta fully supports CORS on a per-directory and per-object basis, so that you are empowered to be as restrictive or permissive as you like. To achieve direct upload, you will need to set CORS headers on your directories. In the examples below, I’ve basically set them “wide open.”

The General Flow

The gist is that you still run a web application, albeit a light one, that the browser interacts with to get “tickets” that allow it to write directly into an object under your control. There are other ways to dole out “tickets,” but this is the most practical. In the example I made, each browser session gets its own “dropbox” that the server sets up for it (in reality it would be tied to your webapp’s users, not a session). The browser has a small HTML form; when the user selects a file and submits the form, the browser asks your webserver for a location to upload to. Your webapp generates a Manta signed URL and gives that to the browser. The browser then creates an Ajax request and sends the bytes up. Here’s an illustration of all that text:

Upload Sequence Diagram

Of course the devil is in the details…

Our Storage Layout

In this example I’m using /$MANTA_USER/stor/dropbox as the root for uploads (/:login in the paths below stands for that same $MANTA_USER). Note that /:login/stor/ is “private,” so only you can read and write data. For each browser session that comes in, our webserver creates /:login/stor/dropbox/:session (which is just a random number in this example).

When a user selects a file to upload, we send it to /:login/stor/dropbox/:session/:filename. If the user uploads the same file multiple times, it just gets overwritten.
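As a rough sketch of that layout, the helpers below build the session directory and object paths. The function names are hypothetical and not taken from the sample application, and the session id here comes from crypto.randomBytes rather than a plain random number, on the assumption that you would not want session directories to be guessable.

```javascript
var crypto = require('crypto');

// Generates an unpredictable session identifier (32 hex characters).
function newSessionId() {
    return crypto.randomBytes(16).toString('hex');
}

// /:login/stor/dropbox/:session
function sessionDir(login, session) {
    return '/' + login + '/stor/dropbox/' + session;
}

// /:login/stor/dropbox/:session/:filename
function objectPath(login, session, filename) {
    return sessionDir(login, session) + '/' + filename;
}
```

Since the object path depends only on the session and the file name, uploading the same file twice resolves to the same path, which is why the second upload overwrites the first.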

The Web Server

We’ll start with an examination of the important parts of the web server. I used no dependencies in this example so there’s no confusion about which toolkit makes more sense, etc.; it’s all just “straight up” node http. I’m not going to walk through every line of the example application, but instead just give some more context on the particularly tricky parts that may not be clear.

Creating a per-session directory

When we see a new user session (to reemphasize, you assuredly want this based off your user’s name or id or something), we create a “private directory.” Ignoring all the setup and node HTTP stuff, here are the bits that create a per-session directory; I’ve slightly modified the example code here to be readable out of context:

Creating a Directory with CORS options

// "cookie" is the sessionid
var dir = '/' + process.env.MANTA_USER + '/stor/dropbox/' + cookie;
var opts = {
    headers: {
        'access-control-allow-headers': 'access-control-allow-origin, accept, origin, content-type',
        'access-control-allow-methods': 'PUT,GET,HEAD,DELETE',
        'access-control-allow-origin': '*'
    }
};
mantaClient.mkdir(dir, opts, function (err) {
    assert.ifError(err);

    // HTML is just the static string of our webapp
    res.setHeader('set-cookie', 'name=' + cookie);
    res.setHeader('content-type', 'text/html');
    res.setHeader('content-length', Buffer.byteLength(HTML));
    res.writeHead(200);
    res.end(HTML);
});

The important aspect to point out here is the header block we pass into the options block on mkdir. When a write request comes into Manta in a “CORS scenario,” the server honors the CORS settings on the parent directory. So setting up the requisite CORS headers on the directory we want to write into allows the browser to go through all the preflight garbage and send the headers it needs to for uploading an object directly.
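To see why those three headers matter, here is a simplified model of the check a server performs on a preflight request. It is only a sketch of the general CORS rules, not Manta's implementation; preflightAllowed, tokens, and the shape of the req argument are hypothetical.

```javascript
// CORS settings as stored on the parent directory (same values as above).
var dirHeaders = {
    'access-control-allow-headers': 'access-control-allow-origin, accept, origin, content-type',
    'access-control-allow-methods': 'PUT,GET,HEAD,DELETE',
    'access-control-allow-origin': '*'
};

// Splits a comma-separated header value into lowercased tokens.
function tokens(value) {
    return (value || '').split(',').map(function (s) {
        return s.trim().toLowerCase();
    }).filter(Boolean);
}

// Decides whether a preflight (OPTIONS) request would be allowed.
// "req" carries the Origin, Access-Control-Request-Method, and
// Access-Control-Request-Headers values the browser sent.
function preflightAllowed(headers, req) {
    var origins = tokens(headers['access-control-allow-origin']);
    var methods = tokens(headers['access-control-allow-methods']);
    var allowed = tokens(headers['access-control-allow-headers']);

    var originOk = origins.indexOf('*') !== -1 ||
        origins.indexOf(req.origin.toLowerCase()) !== -1;
    var methodOk = methods.indexOf(req.method.toLowerCase()) !== -1;
    var headersOk = tokens(req.headers).every(function (h) {
        return allowed.indexOf(h) !== -1;
    });
    return originOk && methodOk && headersOk;
}
```

Under this model a PUT that sends accept and content-type passes, while a POST or a request carrying an unlisted custom header is refused. It also shows why access-control-allow-origin is in the allowed header list: the client code later sends it as a request header on the PUT.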

Signing a Request

This portion is actually pretty straightforward and handled by the Manta SDK. The only thing of interest here is that we’re signing a request to the given URL with two methods: OPTIONS and PUT. Normally you’d only hand out a signed URL with one method signed, but in this case, as the browser preflights the request with the same URL, we need the server side to honor both. Again, I’ve slightly modified the example application code here:

Signing a URL for the browser

var body = '';
req.setEncoding('utf8');
req.on('data', function (chunk) {
    body += chunk;
});

req.once('end', function () {
    // qs is node's built-in module: var qs = require('querystring');
    var params = qs.parse(body) || {};
    if (!params.file) {
        // no file name was supplied, so reject the request
        res.writeHead(400);
        res.end();
        return;
    }

    var p = '/' + process.env.MANTA_USER + '/stor/dropbox/' + cookie + '/' + params.file;
    var opts = {
        expires: new Date().getTime() + (3600 * 1000), // 1hr
        method: ['OPTIONS', 'PUT'],
        path: p
    };
    mantaClient.signURL(opts, function (err, signature) {
        assert.ifError(err);

        var signed = JSON.stringify({
            url: process.env.MANTA_URL + signature
        });
        res.setHeader('content-type', 'application/json');
        res.setHeader('content-length', Buffer.byteLength(signed));
        res.writeHead(200);
        res.end(signed);
    });
});

So in the example above, the browser POSTed a form to us with the name of the file the user wants to upload. A “real app” would want to sanitize that, and stop CSRF attacks, etc., but that’s outside the scope of this little application. Here we blindly sign it, and spit the URL back to the browser.
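One possible starting point for that sanitizing step is sketched below. safeFileName is a hypothetical helper, and the rules it applies (strip directory components, reject empty or dot names, reject control characters, cap the length at 255) are assumptions about what a reasonable policy might look like, not something the sample application does.

```javascript
var path = require('path');

// Returns a bare file name that is safe to append to the session
// directory, or null if the input cannot be used.
function safeFileName(name) {
    if (typeof name !== 'string')
        return null;
    // Treat backslashes as separators too, then keep only the last component.
    var base = path.posix.basename(name.replace(/\\/g, '/'));
    if (base === '' || base === '.' || base === '..' || base.length > 255)
        return null;
    // Reject control characters outright.
    if (/[\x00-\x1f\x7f]/.test(base))
        return null;
    return base;
}
```

Keeping only the final path component means a submitted name such as ../../stor/other/cat.jpg can only ever resolve to cat.jpg inside the caller's own session directory.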

At this point we’re basically done with what our little webapp needed to do.

Client Side

First, a disclaimer: I am awful at client-side code, so please don’t be wed to anything I did here. Anyway, so I made a single HTML page with an upload form and some jQuery pieces: specifically I used their Ajax API where it made sense, along with their Form Helpers. Lastly, so you can see progress information, I stuck in a progress bar.

Ok, enough preamble, let’s see the code!

The tried and true form

<form enctype="multipart/form-data">
  <input name="file" type="file" accept="image/jpeg" />
  <input type="button" value="Upload" />
</form>

Yup, that’s a form. What do we do when the user submits? As per our flows above, we first need to request a place to write the file to, so we ask the webserver to sign the file name – note this is “straight up” jQuery and ajax, nothing fancy about this:

jQuery Sign The filename

$(function() {
    var file;
    $(':file').change(function(){
        file = this.files[0];
    });

    $(":button").click(function () {
        $.ajax({
            url: 'sign',
            type: 'POST',
            data: {
                file: file.name
            }
        }).done(function (signature) {
            // signature looks like
            // {
            //   url: 'https://...'
            // }
        });
    });
});

Uploading the file to Manta

At long last, we can now push the raw bytes into Manta. Our own webserver gave us a URL we can write to for an hour. We now make an XHR2 request and directly PUT the data.

XHR2.send() data

function onSignature(signature) {
    var xhr = new XMLHttpRequest();

    xhr.open('PUT', signature.url, true);
    xhr.setRequestHeader('accept', 'application/json');
    xhr.setRequestHeader('access-control-allow-origin', '*');
    xhr.setRequestHeader('content-type', 'image/jpeg');
    xhr.send(file);
}

A few notes:

XHR2.send() is the only way I know of to actually send an uninterpreted byte stream. XHR and jQuery’s Ajax both send a multipart framed message, which Manta will not interpret, meaning you would end up with an object that still has HTTP framing noise in it.
In this example I sent access-control-allow-origin: *. That’s specifically so that future GET requests will work from a web browser as well. When reading an object, the CORS semantics are inferred from the object itself.
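For the progress bar mentioned earlier, the upload-side progress events on the XHR are the natural hook. A minimal sketch, assuming a hypothetical onProgress callback that updates whatever widget you use; trackUpload and percentDone are illustrative names, not part of the sample application.

```javascript
// Converts a progress event into a whole-number percentage, or null
// when the browser cannot determine the total size.
function percentDone(evt) {
    if (!evt.lengthComputable || evt.total === 0)
        return null;
    return Math.round((evt.loaded / evt.total) * 100);
}

// Wires upload progress reporting onto an XMLHttpRequest.
// Call this before xhr.send().
function trackUpload(xhr, onProgress) {
    xhr.upload.addEventListener('progress', function (evt) {
        var pct = percentDone(evt);
        if (pct !== null)
            onProgress(pct);
    });
}
```

Inside the upload function you would call trackUpload(xhr, callback) after xhr.open() and before xhr.send(file), with the callback setting the progress bar's value.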

That’s pretty much it; once the browser completes the upload, you can see the object using mls or whatever other tool you want.

Conclusion

This article explained how to construct a web application that allows clients to directly upload to Manta. It highlighted the relevant portions of a sample application I created that does this using Ajax and XHR Level 2. Comments welcome!

Mark Cavage