Metadata Design

Metadata is data that further describes an already existing piece of information. In our case, metadata refers to the properties carried in the headers of our HTTP calls.

There are quite a number of HTTP header properties. To see the full list, visit https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers.

HTTP headers can be grouped into four broad categories: general headers, request headers, entity headers, and response headers. We will mainly be interested in response headers.

Response headers can be further broken down into content headers, cache headers, media headers, and more. Let's take a look at some of the best practices and guidelines for HTTP headers and how they should be applied when building out our RESTful APIs.

Content Header Guidelines

Guideline #1: "Content-Type" Is Required

The Content-Type header indicates the media type of the response body. It lets the client know how to process the body that is being sent over.

Here are some common content types you may have seen.

  • application/json

  • application/octet-stream

  • image/png

  • audio/mpeg

  • text/html
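To illustrate how a client can use Content-Type to decide how to process a body, here is a minimal Python sketch. The function names parse_content_type and decode_body are hypothetical, and the parser is deliberately simplified: it ignores quoted-string escapes and other edge cases a full header parser would handle.

```python
from __future__ import annotations

import json


def parse_content_type(header: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into (media type, parameters). Simplified."""
    media_type, *params = header.split(";")
    parsed: dict[str, str] = {}
    for param in params:
        if not param.strip():
            continue  # tolerate stray semicolons
        key, _, value = param.strip().partition("=")
        parsed[key.strip().lower()] = value.strip().strip('"')
    return media_type.strip().lower(), parsed


def decode_body(content_type: str, body: bytes):
    """Turn raw body bytes into a Python value based on the media type."""
    media_type, params = parse_content_type(content_type)
    charset = params.get("charset", "utf-8")
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body.decode(charset))
    if media_type.startswith("text/"):
        return body.decode(charset)
    # image/png, audio/mpeg, application/octet-stream, ...: leave as raw bytes
    return body
```

For example, decode_body("application/json; charset=utf-8", body) parses the body as JSON, while an image/png body is handed back untouched.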

Guideline #2: "Content-Length" Should Be Provided

The Content-Length header specifies the size of the response body in bytes. This lets the client check whether it has read the correct number of bytes from the connection, and it also lets a client learn the size of the body with a HEAD request, without downloading the body itself.
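Because Content-Length counts bytes rather than characters, it should be computed from the encoded body. Here is a minimal sketch; the function name json_response is hypothetical, and the compact separators are chosen so the output matches the body format shown in this section's cURL examples.

```python
from __future__ import annotations

import json


def json_response(payload) -> tuple[bytes, dict[str, str]]:
    """Serialize payload as JSON and build matching content headers."""
    body = json.dumps(payload, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        # Count encoded bytes: non-ASCII characters take more than one byte.
        "Content-Length": str(len(body)),
    }
    return body, headers
```

For {"message": "Hello world! ETag Test!"} this produces a 37-byte body, the same Content-Length that appears in the ETag example of this section.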

Cache Header Guidelines

Guideline #1: "Cache-Control" Should Be Used for Caching

The Cache-Control header is the most widely used client-side caching mechanism and has been part of HTTP since HTTP/1.1 (1997). Any application that wants to take advantage of client-side caching should consider including it in its responses.

Here are some of the most common directives used by Cache-Control.

  1. public - Indicates that the response may be cached by any cache, whether in the client or in any intermediary proxy between the client and the server.

  2. private - Indicates that the response message is intended for a single user and must not be cached by a shared cache.

  3. no-cache - A cache must not use the response to satisfy a subsequent request without successful re-validation with the origin server.

  4. no-store - The cache must not store anything about the client request or the server response.

  5. must-revalidate - Once the response becomes stale, the cache must not use it without first revalidating it with the origin server.

  6. max-age=<seconds> - In a response, indicates that the response stays fresh for the specified number of seconds. (In a request, it indicates that the client is willing to accept a response whose age is not greater than that.)

For example, consider the following header:

Cache-Control: max-age=100, private

This indicates that the response can be cached for 100 seconds, but only by the requesting user's client (a private cache), not by shared caches such as proxies.
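To make these directives more concrete, here is a minimal sketch of how a cache might interpret a Cache-Control value. It is simplified on purpose: it only looks at no-store, no-cache and max-age, and ignores s-maxage, the Age header, Expires and heuristic freshness. The function names are hypothetical.

```python
from __future__ import annotations


def parse_cache_control(header: str) -> dict[str, str | None]:
    """Parse 'max-age=100, private' into {'max-age': '100', 'private': None}."""
    directives: dict[str, str | None] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def can_reuse(directives: dict[str, str | None], age_seconds: int) -> bool:
    """Can a stored response of this age be reused without revalidation?"""
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None:
        return False
    # Fresh while its age is still below max-age.
    return age_seconds < int(max_age)
```

With Cache-Control: max-age=100, private, a response that is 99 seconds old can be reused, while one that is 100 seconds old has to be revalidated.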

HTTP caching with Cache-Control is quite an extensive topic. Visit https://roadmap.sh/guides/http-caching#browser-cache for a complete list of directives and more information.

Guideline #2: "Pragma" and "Expires" Should Still Be Used To Support Caching

Pragma and Expires were the caching mechanisms used before HTTP/1.1, back when HTTP/1.0 was the current version of the protocol.

Cache-Control is the mechanism most widely used today: Pragma has been deprecated, and when a response carries both Expires and Cache-Control: max-age, HTTP/1.1 caches ignore Expires. Even so, it is still a good idea to include them for backward compatibility with older clients and caches.

The Expires header takes a date as its value; after that date, the cached response is considered stale. Here is an example.

Expires: Mon, 13 Mar 2017 12:22:00 GMT

The only standardized value of the Pragma header is no-cache, which tells caches not to reuse a stored response without first revalidating it with the server. You would only need the Pragma header when you do not want to provide client-side caching. Here is how it looks.

Pragma: no-cache
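Here is a minimal sketch of emitting Cache-Control together with the older Expires and Pragma headers. The function name caching_headers is hypothetical; Expires is computed as the current time plus max-age and formatted as an HTTP date with Python's email.utils.formatdate.

```python
from __future__ import annotations

import time
from email.utils import formatdate


def caching_headers(max_age: int | None, now: float | None = None) -> dict[str, str]:
    """Build Cache-Control plus HTTP/1.0-era fallbacks (Expires, Pragma)."""
    now = time.time() if now is None else now
    if max_age is None:
        # No reuse without revalidation, for HTTP/1.1 and HTTP/1.0 caches alike.
        return {
            "Cache-Control": "no-cache",
            "Pragma": "no-cache",
            "Expires": formatdate(0, usegmt=True),  # a date in the past
        }
    return {
        "Cache-Control": f"max-age={max_age}",
        # HTTP/1.1 caches prefer max-age; HTTP/1.0 caches fall back to Expires.
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

Passing an explicit now makes the output deterministic, which is handy in tests; in a server you would normally leave it as the current time.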

Guideline #3: "ETag" Should Be Provided

The ETag header carries an identifier, typically a generated hash, for a specific version of a resource. The client uses it to make conditional requests that validate its cached copy against the server. For a complete list of HTTP conditional headers and what they do, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests#conditional_headers.

The two conditional request headers that are used with ETags are If-Match and If-None-Match.

There are also two types of ETags. A strong ETag indicates that two representations are byte-for-byte identical. A weak ETag indicates only that the two representations are semantically equivalent, even if their bytes differ. A weak ETag is prefixed with "W/", whereas a strong ETag is not.
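The HTTP specification (RFC 9110) defines two ways of comparing ETags, and the distinction matters because If-Match uses the strong comparison while If-None-Match uses the weak one. A minimal sketch of both, with hypothetical function names:

```python
from __future__ import annotations


def _split_etag(tag: str) -> tuple[bool, str]:
    """Return (is_weak, opaque_tag) for values such as W/"abc" or "abc"."""
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_compare(a: str, b: str) -> bool:
    """Match only if neither tag is weak and the opaque tags are identical."""
    weak_a, opaque_a = _split_etag(a)
    weak_b, opaque_b = _split_etag(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_compare(a: str, b: str) -> bool:
    """Match if the opaque tags are identical, ignoring any W/ prefix."""
    return _split_etag(a)[1] == _split_etag(b)[1]
```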

The ETag header should always be provided, as it gives the client the ability to make conditional requests with the If-Match and If-None-Match headers.

Take the following example. Suppose we make a request to our server and we get the following output with cURL as our client.

$ curl -v localhost:3000/ETag-test

*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
> GET /ETag-test HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Cache-Control: max-age=100, private
< Content-Type: application/json; charset=utf-8
< Content-Length: 37
< ETag: "25-+vIrmGA7FcSjzeJueoK/J+jWGd4"
< Date: Sat, 25 Dec 2021 19:32:13 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
* Connection #0 to host localhost left intact
{"message":"Hello world! ETag Test!"}* Closing connection 0
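The ETag in this output appears to encode the body length in hexadecimal (0x25 is 37, matching Content-Length: 37) followed by a hash of the body. As far as we know, this is the format produced by the etag package that Express relies on: the byte length in hex, a dash, and the first 27 characters of a base64-encoded SHA-1 digest. The sketch below assumes that scheme (make_etag is a hypothetical name); we have not checked that it reproduces the exact value above, and any stable hash of the body would serve equally well as an ETag.

```python
import base64
import hashlib


def make_etag(body: bytes, weak: bool = False) -> str:
    """Build an ETag of the form "<length in hex>-<27 chars of base64 SHA-1>"."""
    digest = base64.b64encode(hashlib.sha1(body).digest()).decode("ascii")[:27]
    tag = f'"{len(body):x}-{digest}"'
    return f"W/{tag}" if weak else tag
```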

The response we get back has the following headers.

Cache-Control: max-age=100, private

ETag: "25-+vIrmGA7FcSjzeJueoK/J+jWGd4"

And the following body.

{
  "message": "Hello world! ETag Test!"
}

The Cache-Control header tells us that we, the client, can keep using this response from the cache for the next 100 seconds.

Once the cached response becomes stale, we can revalidate it by calling the server with an If-None-Match header that carries the ETag we received. The server then compares that ETag with the ETag of the current version of the resource.

If they don't match, the server responds with the new resource and its new ETag, which the client then uses to replace the old one.

If they do match, the server responds with the status code 304, which means "Not Modified", and the client can keep using its cached copy for another 100 seconds.
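Here is a minimal server-side sketch of that decision, written as a plain Python function rather than Express middleware so that it stays self-contained. The name handle_get, the fixed Cache-Control value (copied from the example output), and the assumption that request header names arrive already normalized are all illustrative.

```python
from __future__ import annotations

import re

# Matches entity tags such as "abc" or W/"abc" inside an If-None-Match value.
_ETAG = re.compile(r'(?:W/)?"[^"]*"')


def _opaque(tag: str) -> str:
    """Strip a W/ prefix so tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def handle_get(request_headers: dict[str, str], body: bytes, etag: str):
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    headers = {"Cache-Control": "max-age=100, private", "ETag": etag}
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        if if_none_match.strip() == "*":
            return 304, headers, b""
        # If-None-Match uses weak comparison: a W/ prefix is ignored.
        if any(_opaque(t) == _opaque(etag) for t in _ETAG.findall(if_none_match)):
            return 304, headers, b""
    headers["Content-Type"] = "application/json; charset=utf-8"
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

As in the cURL output that follows, the 304 response repeats Cache-Control and ETag but carries no body.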

Let's see it in action.

$ curl --header 'If-None-Match: "25-+vIrmGA7FcSjzeJueoK/J+jWGd4"' -v localhost:3000/ETag-test

*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
> GET /ETag-test HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
> If-None-Match: "25-+vIrmGA7FcSjzeJueoK/J+jWGd4"
>
< HTTP/1.1 304 Not Modified
< X-Powered-By: Express
< Cache-Control: max-age=100, private
< ETag: "25-+vIrmGA7FcSjzeJueoK/J+jWGd4"
< Date: Sat, 25 Dec 2021 19:33:23 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
* Connection #0 to host localhost left intact
* Closing connection 0

Notice the HTTP/1.1 304 Not Modified in the response, whereas before we had an HTTP/1.1 200 OK. That's because there was a match with the ETag we sent over. If there had been no match, the server would have returned a status code of 200 along with the full body and a new ETag in the response.

Guideline #4: "Last-Modified" Should Be Provided

The Last-Modified header serves a very similar purpose to the ETag header. It indicates the date and time at which the content was last modified. The client can make a conditional request by sending that date back in an If-Modified-Since header: if the resource has not been modified since then, the server returns 304 Not Modified, and if it has been modified, the server returns the brand new response. (If-Unmodified-Since works the other way around: it is typically used to guard updates, and the server answers 412 Precondition Failed if the resource has changed.)

It is therefore important to provide the Last-Modified header if you want the client to have an alternative way to validate its cache, so that it can make conditional requests with If-Modified-Since or If-Unmodified-Since.
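Here is a minimal sketch of honoring If-Modified-Since on the server, using Python's email.utils helpers to format and parse HTTP dates. The function name is hypothetical, and the sketch ignores the rule that If-None-Match takes precedence when a request carries both headers.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def handle_if_modified_since(request_headers: dict[str, str], last_modified: datetime, body: bytes):
    """Return (status, headers, body), honoring If-Modified-Since.

    last_modified must be timezone-aware. HTTP dates have one-second
    resolution, so microseconds are dropped before comparing.
    """
    lm = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(lm, usegmt=True)}
    value = request_headers.get("If-Modified-Since")
    if value is not None:
        try:
            since = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            since = None  # an invalid date is simply ignored
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if lm <= since:
                return 304, headers, b""
    return 200, headers, body
```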

Guideline #5: Add Optional Caching to 3XX and 4XX Responses

Although caching is most commonly applied to GET requests that return a 200 response, it can sometimes be beneficial to cache 3XX or even 4XX responses as well. This reduces the load that repeated redirects and errors place on a RESTful API.
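Here is a minimal sketch of a per-status caching policy. The status codes and max-age values are illustrative assumptions, not recommendations. Note that HTTP already treats some non-200 responses, including 301, 308, 404 and 410, as heuristically cacheable, meaning caches may store them even without explicit freshness information (RFC 9110, section 15.1), so stating a policy explicitly keeps that behavior predictable.

```python
# Hypothetical policy: seconds that clients may cache a response, per status code.
CACHE_POLICY = {
    200: 100,    # normal successful GET
    301: 86400,  # permanent redirect: the target is unlikely to change soon
    308: 86400,  # permanent redirect that preserves the request method
    404: 60,     # kept short, so a newly created resource shows up quickly
    410: 3600,   # resource is gone for good
}


def cache_control_for(status: int) -> str:
    """Pick a Cache-Control value for a response with the given status."""
    max_age = CACHE_POLICY.get(status)
    if max_age is None:
        return "no-store"  # e.g. 500 errors: do not keep them around
    return f"max-age={max_age}"
```

The trade-off is staleness: a cached 404 can hide a resource for up to its max-age after it is created, which is why the sketch keeps that value small.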

Additional Header Guidelines

Guideline #1: Consider Using An Application-Specific Media Type

You may have come across the most commonly used content types such as these:

  • application/json

  • application/octet-stream

  • image/png

  • audio/mpeg

  • text/html

But there are actually many custom media types available as well. Here are some examples:

  • application/vnd.ms-excel

  • application/vnd.lotus-notes

  • text/vnd.sun.j2me.app-descriptor

Notice the "vnd" prefix: these are vendor-specific media types, yet they are officially registered media types, just like your everyday application/json or text/html.

In fact, anyone can register a custom media type with the Internet Assigned Numbers Authority (IANA). Here is a list of officially registered media types: https://www.iana.org/assignments/media-types/media-types.xhtml.

When building a RESTful API, it is worth considering a custom, application-specific media type. Although this is quite uncommon for most projects, this additional layer of structure can be extremely beneficial for larger APIs that want better predefined structure and types.
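A custom media type usually goes hand in hand with content negotiation on the Accept header. The sketch below picks between a hypothetical vendor type, application/vnd.example.user+json, and plain application/json; the type name and function are made up for illustration, and the q-value handling is simplified (ties keep whichever match was found first, and the specificity rules of full content negotiation are ignored).

```python
from __future__ import annotations

# Hypothetical vendor media type for this API (not a registered type).
VENDOR_JSON = "application/vnd.example.user+json"
SUPPORTED = [VENDOR_JSON, "application/json"]


def choose_media_type(accept: str | None, supported: list[str] = SUPPORTED) -> str | None:
    """Pick the supported type with the highest q-value in an Accept header."""
    if not accept:
        return supported[0]  # no Accept header: any type is acceptable
    best, best_q = None, 0.0
    for item in accept.split(","):
        media_range, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        for candidate in supported:
            main_type = candidate.split("/")[0]
            if media_range in (candidate, "*/*", f"{main_type}/*") and q > best_q:
                best, best_q = candidate, q
    return best
```

A None result would typically be turned into a 406 Not Acceptable response.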

Guideline #2: Prefix Custom Headers

If you've been a web developer for some time, you may have encountered several custom headers provided by different companies or software during your career.

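By convention, a custom header starts with X-. This is done so you won't accidentally override any existing predefined HTTP headers. (Note that RFC 6648, published in 2012, deprecates this convention for new headers, although it remains widespread in practice.)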

Here are some examples of custom headers by software that you are probably already familiar with.

X-Powered-By: Express
X-Cache: Miss from cloudfront
X-Drupal-Cache: HIT

If you want to add a custom header of your own to a RESTful API, prefix it with X-.
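Here is a small sketch of a helper that applies the X- convention when adding custom headers. The helper name and the Request-Id example header are hypothetical. The name check uses the token characters HTTP allows in field names (RFC 9110), and the helper refuses to overwrite a header that is already set, comparing names case-insensitively.

```python
import re

# Characters HTTP allows in a field name ("token" in RFC 9110).
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def add_custom_header(headers: dict, name: str, value: str) -> None:
    """Add an application-specific header, prefixing its name with X- if needed."""
    if not name.lower().startswith("x-"):
        name = "X-" + name
    if not _TOKEN.fullmatch(name):
        raise ValueError(f"invalid header name: {name!r}")
    # Header names are case-insensitive, so compare them in lower case.
    if any(existing.lower() == name.lower() for existing in headers):
        raise ValueError(f"header already set: {name}")
    headers[name] = value
```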

Guideline #3: Use The Location Header To Specify The URI Of A Newly Created Resource

The Location header can contain a URI that identifies a resource useful to the client. When a resource is created, usually via a POST request, it is very useful to include the URI of the newly created resource in the Location header.

Below is a good example of a common use case involving the creation of new users in our application.

Suppose we make the following cURL call:

$ curl -v localhost:3000/api/v1/users -X POST

We get the following response.

*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
> POST /api/v1/users HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Location: /api/v1/users/181de1eb98aa7ffe6a127232178793d7
< Content-Type: application/json; charset=utf-8
< Content-Length: 79
< ETag: "4f-GFjDt03GZ9sYOxv4+1mObRaXK5I"
< Date: Sun, 26 Dec 2021 05:01:47 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
* Connection #0 to host localhost left intact
{"message":"New user has been created! Find the link the \"Location\" header!"}
* Closing connection 0

Notice the Location: /api/v1/users/181de1eb98aa7ffe6a127232178793d7 header. This is a great way for the client to know where the newly created user can be accessed. We could also have added a link in the body of our response; had we done so, we would also have satisfied the "hypermedia" aspect of our RESTful API. Nonetheless, using a header is still a good and clean convention to follow. Note that this example responds with 200 OK; the status code HTTP defines specifically for a newly created resource is 201 Created, which is typically sent together with a Location header.
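Here is a minimal sketch of a user-creation handler that sets the Location header and responds with 201 Created. It is a plain Python function with a hypothetical name rather than an Express route. It generates a random 32-character hex ID with uuid4, the same shape as the ID in the output, though we don't know how the original server generates its IDs. It also puts the same URI in the body as a link, the hypermedia option mentioned above.

```python
from __future__ import annotations

import json
import uuid


def create_user(payload: dict) -> tuple[int, dict[str, str], bytes]:
    """Hypothetical POST /api/v1/users handler returning (status, headers, body)."""
    user_id = uuid.uuid4().hex  # random 32-character hex string
    location = f"/api/v1/users/{user_id}"
    # A real handler would validate and persist `payload` before responding.
    body = json.dumps(
        {"id": user_id, "links": {"self": location}}, separators=(",", ":")
    ).encode("utf-8")
    headers = {
        "Location": location,
        "Content-Type": "application/json; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return 201, headers, body
```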
