Design of a File Upload API

Context

 

Recently, a customer asked us to design an API for uploading files.

The basic requirements were:

  • Consumers (partners and internal teams) should be able to upload files to our platform, from which these files are routed to other systems/partners/…
  • Along with these binary files, the consumer must be able to include metadata for the files, such as a subject, a description, a source and a target destination.
  • All functionality must be made available through web APIs.
  • The API must be protected with OAuth2 and JWT tokens.
  • In about 90% of cases, the file size is less than 10 MB.

 

There are multiple ways to upload files using web technologies:

  • Direct File Upload
  • Multipart
  • Transactional Upload
  • etc.

 

First of all, there is no “best solution”: every solution has its advantages and disadvantages, depending on your requirements, the type of consumers and the type of client applications. Choosing a solution for your scenario is a trade-off between several factors.

This blog post describes the most common techniques, with their pros and cons.

 

Direct File Upload

 

The document publisher uploads the file contents directly to the server using a binary endpoint. The publisher must specify the media type of the file in the Content-Type header field.

The following request creates a new file or updates the contents of an existing file:
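A request of this kind could be built as in the minimal sketch below. The host name, the /documents/{filename} path prefix, the bearer token value and the X-Meta-* header convention are assumptions made for illustration; only the PUT method, the Content-Type header and the idea of passing metadata as headers come from the text. The request is built but not sent.

```python
import urllib.request
from urllib.parse import quote


def build_direct_upload(base_url, filename, content, media_type, metadata, access_token):
    """Build (but do not send) a PUT request that creates or replaces a file."""
    # The publisher chooses the resource ID (the filename), hence PUT.
    url = f"{base_url}/documents/{quote(filename, safe='')}"
    headers = {
        "Content-Type": media_type,
        # The API is protected with OAuth2, so a bearer token is assumed here.
        "Authorization": f"Bearer {access_token}",
    }
    # Hypothetical convention: one X-Meta-* header per metadata field.
    for key, value in metadata.items():
        headers[f"X-Meta-{key}"] = value
    return urllib.request.Request(url, data=content, headers=headers, method="PUT")


request = build_direct_upload(
    "https://api.example.com",
    "invoice 2024.pdf",
    b"%PDF-1.7 example bytes",
    "application/pdf",
    {"Subject": "Invoice", "Target": "partner-a"},
    "example-token",
)
print(request.get_method(), request.full_url)
```

Sending the same request twice (for example with urllib.request.urlopen) would target the same URL both times, which is why the second call replaces the file rather than creating a new one.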

 

As you can see, I use the PUT method instead of the POST method to create the file on the server. This is because in this scenario the document publisher is responsible for providing the ID of the resource, which in this case is the filename. This means that sending multiple PUT requests with the same filename will overwrite the file on the server instead of creating a new file each time.

 

Metadata can be sent along with the request using header parameters, as shown in the previous example, or by exposing a separate endpoint for the metadata, e.g.: /documents/{filename}/metadata

Pros:

  • Easy solution
  • One call per document

 

Cons:

  • Sending the metadata within the request headers (key-value pairs) imposes limitations on the structure and length of the metadata.

 

Note: to prevent a document publisher from overwriting or downloading someone else’s files, you must store each file in the backend together with the security context of the user who created it, and always check the identity of the user making the request.
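The ownership check from the note could look like the sketch below. It is an in-memory stand-in, not a real backend: the DocumentStore class, the status codes it returns and the choice to answer 404 rather than 403 on downloads are all assumptions. The user_id is assumed to come from an already validated access token (for example its sub claim).

```python
class DocumentStore:
    """In-memory stand-in for a backend that stores files per owner."""

    def __init__(self):
        self._files = {}  # filename -> (owner_id, content)

    def put(self, user_id, filename, content):
        """Create or replace a file; return an HTTP-like status code."""
        existing = self._files.get(filename)
        if existing is not None and existing[0] != user_id:
            return 403  # someone else owns this filename: refuse to overwrite
        self._files[filename] = (user_id, content)
        return 201 if existing is None else 200

    def get(self, user_id, filename):
        """Return (status, content); only the owner can download."""
        existing = self._files.get(filename)
        if existing is None or existing[0] != user_id:
            return 404, None  # assumed choice: do not reveal that the file exists
        return 200, existing[1]


store = DocumentStore()
print(store.put("alice", "report.pdf", b"v1"))  # first upload by the owner
print(store.put("bob", "report.pdf", b"evil"))  # overwrite attempt by another user
```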

 

Multipart

 

The server exposes one endpoint which allows the document publisher to upload both the metadata and the file in one request. The API provider can even allow the document publisher to upload multiple files in one request.

 

The document publisher must set the Content-Type header field to “multipart/form-data”, which is used for submitting forms that contain files or binary data. The request contains multiple parts (in the example below, the file and the metadata) separated by part boundaries. Each part has its own Content-Type header field specifying the MIME type of that part.

 

The following example uploads a picture together with the metadata in JSON format:
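A multipart body of this shape can be assembled by hand, as in the sketch below. The part names "metadata" and "file", the metadata fields and the file name are assumptions for illustration; the structure (boundary-separated parts, each with its own Content-Type) follows the description above.

```python
import json
import uuid


def build_multipart(metadata, filename, file_bytes, file_type):
    """Return (content_type_header, body) for a two-part multipart/form-data request."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = [
        # Part 1: the metadata as JSON.
        delimiter,
        b'Content-Disposition: form-data; name="metadata"',
        b"Content-Type: application/json",
        b"",
        json.dumps(metadata).encode("utf-8"),
        # Part 2: the binary file.
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        f"Content-Type: {file_type}".encode("ascii"),
        b"",
        file_bytes,
        # Closing delimiter ends the multipart body.
        delimiter + b"--",
        b"",
    ]
    body = b"\r\n".join(lines)
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_multipart(
    {"subject": "Holiday picture", "description": "Beach at sunset"},
    "picture.png",
    b"fake-image-bytes",
    "image/png",
)
print(content_type)
print(len(body), "bytes")
```

The returned content type goes into the Content-Type header of the request, and the body is sent as-is. HTTP client libraries usually build this for you, which hides most of the complexity from the consumer.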

 

Pros:

  • You can send the metadata in JSON format
  • Only one request needed

 

Cons:

  • Not very developer-friendly

 

Transactional Upload

 

Transactional upload is a pattern for file uploads. The pattern allows the document publisher to send multiple files to the server in one transaction.

Transactional Upload is a three-step process in which the document publisher

  1. initiates an ‘upload session’
  2. uploads the files
  3. commits the files for further processing

 

Let me describe each step using the following example, in which a document publisher creates a case and attaches some documents to it.

      1. The document publisher initiates an upload session

Endpoint: POST /cases

The document publisher initiates a new case and includes the metadata in the body of the request:

 

The API provider responds with HTTP status code ‘201 Created’ and includes a link to the newly created case, containing the case ID, in the Location header of the response.

 

      2. The document publisher uploads files for a case

Endpoint: PUT /cases/{caseID}/documents/{fileName}

The document publisher uploads all the documents related to a case one by one via direct file upload. The case ID and the filename are specified as path parameters.

The response contains a link to the newly created file in the Location response header:

     

      3. The document publisher commits the case (session) for further processing

Endpoint: POST /cases/{caseID}/commit

Once all files have been uploaded to the server for a case, the document publisher can commit the files for further processing by the backend service.

The files will be processed asynchronously by a backend service and the API answers with HTTP response code “202 Accepted”.
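The three steps can be sketched with an in-memory stand-in for the server, shown below. The CaseService class and its storage are hypothetical. The 201 and 202 codes and the Location headers follow the text; the 200 on replacing a document, the 404 for an unknown case and the 409 for uploading to a committed case are assumptions.

```python
import uuid


class CaseService:
    """In-memory stand-in for the three endpoints of an upload session."""

    def __init__(self):
        self.cases = {}

    def create_case(self, metadata):
        """POST /cases: open the upload session."""
        case_id = uuid.uuid4().hex
        self.cases[case_id] = {"metadata": metadata, "documents": {}, "committed": False}
        return 201, {"Location": f"/cases/{case_id}"}

    def put_document(self, case_id, file_name, content):
        """PUT /cases/{caseID}/documents/{fileName}: upload one file."""
        case = self.cases.get(case_id)
        if case is None:
            return 404, {}
        if case["committed"]:
            return 409, {}  # assumed: no uploads after commit
        created = file_name not in case["documents"]
        case["documents"][file_name] = content
        location = f"/cases/{case_id}/documents/{file_name}"
        return (201 if created else 200), {"Location": location}

    def commit(self, case_id):
        """POST /cases/{caseID}/commit: hand the files over for async processing."""
        case = self.cases.get(case_id)
        if case is None:
            return 404, {}
        case["committed"] = True
        return 202, {}


service = CaseService()
status, headers = service.create_case({"subject": "Damage claim"})
case_id = headers["Location"].rsplit("/", 1)[-1]  # the case ID is the last path segment
service.put_document(case_id, "report.pdf", b"report bytes")
service.put_document(case_id, "photo.png", b"photo bytes")
status, _ = service.commit(case_id)
print(status)
```

The client only needs the Location header from the first call to address all later calls, so the case ID never has to be generated on the client side.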

Pros:

  • Developer (DX) friendly
  • Enables the ability to send multiple files in one batch

 

Cons:

  • Extra calls are required

 

Uploading big (GBs) files

 

Uploading big files, for example files exceeding 100 MB, is a different game. In such scenarios the platform has to deal with a number of additional challenges:

  • Uploading a big file in separate parts and reassembling these parts into one file on the server.
  • Resuming uploads after broken connections.
  • Memory consumption.

 

Uploading big files requires other solutions, which are not covered in this blog post.

 

Additional Features

 

The following features were not covered above, but are advisable to implement:

  • File Integrity: You can provide an extra header parameter in the request that allows the document publisher to send a hash of the file content along with the request. The server can use this hash to verify the integrity of the binary stream it received.
  • API Gateway: Introducing an API Gateway in the architecture can protect your API endpoints by enforcing policies on each request. The API Gateway can, for example:
    • Check if the access token is valid
    • Log the request details to create visibility
    • Block upload requests for files bigger than X MB
    • etc.
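The file integrity check from the list above could be implemented along the lines of the sketch below. The choice of SHA-256, the hex encoding and the header name X-Content-SHA256 are assumptions; the text only says that a hash of the content is sent in an extra header.

```python
import hashlib
import hmac

# Hypothetical header name carrying the hex-encoded SHA-256 of the file content.
HASH_HEADER = "X-Content-SHA256"


def content_hash(content: bytes) -> str:
    """Hash the publisher computes before uploading."""
    return hashlib.sha256(content).hexdigest()


def verify_upload(content: bytes, declared_hash: str) -> bool:
    """Server side: recompute the hash and compare it to the declared one."""
    expected = content_hash(content).encode("ascii")
    declared = declared_hash.strip().lower().encode("utf-8")
    # Constant-time comparison of the two byte strings.
    return hmac.compare_digest(expected, declared)


payload = b"example file content"
headers = {HASH_HEADER: content_hash(payload)}
print(verify_upload(payload, headers[HASH_HEADER]))          # content arrived intact
print(verify_upload(payload + b"!", headers[HASH_HEADER]))   # content was altered
```

A server that finds a mismatch would typically reject the upload so that the publisher can retry.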

 

Conclusion

 

As described above, various techniques can be used when designing a File Upload API. It is crucial to have a clear understanding of all the requirements and how your API will be used before making any design choice.

 

Author: Bert Meuris


