# Metadata Design

Metadata is data that further describes an already existing piece of information. In our case, metadata refers to the properties that are inside the headers of our HTTP calls.

There are quite a number of HTTP header properties. To see the full list, visit <https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers>.

HTTP headers can be grouped into four broad categories: *general headers*, *request headers*, *entity headers*, and *response headers*. We will mainly be interested in the **response headers**.

Response headers can be further broken down into *content headers*, *cache headers*, *media headers*, and many more. We'll take a look at some of the best practices and guidelines regarding HTTP headers and how they should be applied when building out our RESTful APIs.

## Content Header Guidelines

### Guideline #1: "Content-Type" Is Required

The `Content-Type` header states the media type of the response body. It allows the client to know how to process the body of the response that is being sent over.

Here are some common content types you may have seen.

* `application/json`
* `application/octet-stream`
* `image/png`
* `audio/mpeg`
* `text/html`

### Guideline #2: "Content-Length" Should Be Provided

The `Content-Length` header specifies the size of the response's body in bytes. This allows the client to know whether it has read the correct number of bytes from the connection. It also allows a client to make a HEAD request to find out the size of the body without downloading it.
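As a minimal sketch of both content guidelines, the helper below builds the `Content-Type` and `Content-Length` headers for a JSON body. The name `jsonContentHeaders` is hypothetical and not part of any library; the point is only that the length is measured in bytes rather than characters.

```javascript
// Hypothetical helper: serialize a payload and derive its content headers.
function jsonContentHeaders(payload) {
  const body = JSON.stringify(payload);
  return {
    body,
    headers: {
      "Content-Type": "application/json; charset=utf-8",
      // Content-Length counts bytes, not characters, so measure the encoded size.
      "Content-Length": Buffer.byteLength(body, "utf8"),
    },
  };
}

const { body, headers } = jsonContentHeaders({ message: "Hello world! ETag Test!" });
console.log(body);                      // {"message":"Hello world! ETag Test!"}
console.log(headers["Content-Length"]); // 37
```

Frameworks such as Express usually set these headers for you when you send JSON, so a helper like this is mostly useful for understanding what ends up on the wire.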

## Cache Header Guidelines

### Guideline #1: "Cache-Control" Should Be Used for Caching

The `Cache-Control` header is the most widely used client-side caching mechanism and has been available since HTTP 1.1 (1997). Any application that wants to take advantage of client-side caching should consider including it in its responses.

Here are some of the most common directives used by `Cache-Control`.

1. **public** - Indicates that the response may be cached by any cache, whether in the client or in any intermediary proxy between the client and the server.
2. **private** - Indicates that the response message is intended for a single user and must not be cached by a shared cache.
3. **no-cache** - A cache must not use the response to satisfy a subsequent request without successful re-validation with the origin server.
4. **no-store** - The cache should not store anything about the client request or server response.
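5. **must-revalidate** - Once a cached response becomes stale, the cache must revalidate it with the origin server before using it again; expired responses should not be served as-is.
6. **max-age=seconds** - In a response, indicates that the response may be treated as fresh for up to the specified number of seconds. When sent by a client, it indicates that the client is willing to accept a response whose age is not greater than the specified time in seconds.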

For example, consider the following: `Cache-Control: max-age=100, private`.

This indicates that the response can be cached for 100 seconds, but only by the requesting user's own client and not by any shared cache.
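To make the directive syntax concrete, here is a small sketch that assembles a `Cache-Control` value from an options object. The function name `buildCacheControl` and its option names are assumptions made for illustration only.

```javascript
// Hypothetical helper: turn caching options into a Cache-Control header value.
function buildCacheControl({ visibility, maxAge, noCache, noStore, mustRevalidate } = {}) {
  const directives = [];
  if (noStore) directives.push("no-store");
  if (noCache) directives.push("no-cache");
  if (typeof maxAge === "number") directives.push(`max-age=${maxAge}`);
  if (mustRevalidate) directives.push("must-revalidate");
  // Only the two visibility directives described above are accepted here.
  if (visibility === "public" || visibility === "private") directives.push(visibility);
  return directives.join(", ");
}

console.log(buildCacheControl({ maxAge: 100, visibility: "private" }));
// prints: max-age=100, private
```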

HTTP `Cache-Control` is quite an extensive topic. Visit <https://roadmap.sh/guides/http-caching#browser-cache> for a complete list of directives and more information on it.

### Guideline #2: "Pragma" and "Expires" Should Still Be Used To Support Caching

`Pragma` and `Expires` were the caching mechanisms that were used pre-HTTP 1.1, when HTTP 1.0 was still in use between 1991 and 1997.

Although `Cache-Control` is the most widely used mechanism today, and `Pragma` and `Expires` have been superseded by it since the release of HTTP 1.1 in 1997, it is still a good idea to add them for backwards compatibility with older clients.

The `Expires` header takes a date as its value. That date indicates when the cached response expires, after which the client should treat it as stale. Here is an example.

`Expires: Mon, 13 Mar 2017 12:22:00 GMT`

The `Pragma` header has only one possible value, `no-cache`, which tells the client not to cache anything. You only need to use the `Pragma` header when you do not want to provide client-side caching. Here is what it looks like.

`Pragma: no-cache`
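A rough sketch of how these legacy headers could be derived from the same lifetime used for `Cache-Control`. The helper name `legacyCacheHeaders` is hypothetical. It relies on `Date.prototype.toUTCString()`, which produces the HTTP date format shown above.

```javascript
// Hypothetical helper: produce HTTP 1.0 style caching headers.
// A lifetime of zero or less is treated as "do not cache" (an assumption of this sketch).
function legacyCacheHeaders(maxAgeSeconds, now = new Date()) {
  if (maxAgeSeconds <= 0) {
    return { Pragma: "no-cache" };
  }
  const expires = new Date(now.getTime() + maxAgeSeconds * 1000);
  return { Expires: expires.toUTCString() };
}

// 100 seconds after 12:20:20 UTC on 13 March 2017.
const start = new Date(Date.UTC(2017, 2, 13, 12, 20, 20));
console.log(legacyCacheHeaders(100, start));
// prints: { Expires: 'Mon, 13 Mar 2017 12:22:00 GMT' }
```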

### Guideline #3: "ETag" Should Be Provided

The `ETag` header is a generated hash that identifies a specific version of a resource. The client uses it to make conditional requests in order to validate its cache against the server. For a complete list of HTTP conditional headers and what they do, see <https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests#conditional_headers>.

The two conditional request headers that are used with ETags are `If-Match` and `If-None-Match`.

There are also two different types of ETag hashes. A strong ETag indicates that two representations of a resource are byte-for-byte identical. A weak ETag indicates only that the two representations are semantically equivalent, even if they are not byte-for-byte identical. The weak ETag is prefixed with a "W/" whereas the strong ETag is not.

The `ETag` header should always be provided as it gives the client the ability to make conditional requests with the `If-Match` and `If-None-Match` headers.
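One possible way to derive an ETag is to hash the response body, as sketched below with Node's built-in `crypto` module. The function name `makeETag` and the choice of SHA-1 with base64 encoding are assumptions for illustration; frameworks may use a different scheme.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: derive an ETag from the response body.
// Identical bodies yield identical tags; pass { weak: true } for the "W/" form.
function makeETag(body, { weak = false } = {}) {
  const hash = createHash("sha1").update(body, "utf8").digest("base64");
  const tag = `"${hash}"`; // ETag values are wrapped in double quotes
  return weak ? `W/${tag}` : tag;
}

const payload = JSON.stringify({ message: "Hello world! ETag Test!" });
console.log(makeETag(payload));
console.log(makeETag(payload, { weak: true }));
```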

Take the following example. Suppose we make a request to our server and we get the following output with cURL as our client.

`$ curl -v localhost:3000/ETag-test`

```
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
> GET /ETag-test HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Cache-Control: max-age=100, private
< Content-Type: application/json; charset=utf-8
< Content-Length: 37
< ETag: "25-+vIrmGA7FcSjzeJueoK/J+jWGd4"
< Date: Sat, 25 Dec 2021 19:32:13 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
* Connection #0 to host localhost left intact
{"message":"Hello world! ETag Test!"}* Closing connection 0
```

The response we get back has the following headers.

`Cache-Control: max-age=100, private`

`ETag: "25-+vIrmGA7FcSjzeJueoK/J+jWGd4"`

And the following body.

```json
{
  "message": "Hello world! ETag Test!"
}
```

The `Cache-Control` header tells us that we, the client, may keep using this response from the cache for the next 100 seconds.

Suppose we make a call to the server with the `If-None-Match` header. The server will then compare the ETag we sent with the ETag of the current version of the resource.

If they don't match, the server will respond with the new ETag and the new resource, which will then replace the old one in the cache.

If they do match, the server will respond with the status code 304, which means "Not Modified", and the client will renew the cache for another 100 seconds.

Let's see it here in action.

`$ curl --header 'If-None-Match: "25-+vIrmGA7FcSjzeJueoK/J+jWGd4"' -v localhost:3000/ETag-test`

```
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
> GET /ETag-test HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
> If-None-Match: "25-+vIrmGA7FcSjzeJueoK/J+jWGd4"
>
< HTTP/1.1 304 Not Modified
< X-Powered-By: Express
< Cache-Control: max-age=100, private
< ETag: "25-+vIrmGA7FcSjzeJueoK/J+jWGd4"
< Date: Sat, 25 Dec 2021 19:33:23 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
* Connection #0 to host localhost left intact
* Closing connection 0
```

Notice the `HTTP/1.1 304 Not Modified` in the response, whereas before we had an `HTTP/1.1 200 OK`. That's because the ETag we sent over matched. If there had been no match, the server would have returned a status code of 200 along with a new ETag hash in the response.
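The server-side decision can be sketched roughly as follows. The function `respondConditionally` is a hypothetical name, the header lookup assumes lowercase header names (as Node's `http` module provides them), and splitting on commas is a simplification that assumes ETag values contain no commas.

```javascript
// Hypothetical helper: decide between 304 and 200 for a GET request.
function respondConditionally(requestHeaders, currentETag, body) {
  const ifNoneMatch = requestHeaders["if-none-match"];
  const candidates = ifNoneMatch ? ifNoneMatch.split(",").map((tag) => tag.trim()) : [];

  // If-None-Match uses weak comparison, so the "W/" prefix is ignored.
  const strip = (tag) => tag.replace(/^W\//, "");
  const matches = candidates.some(
    (tag) => tag === "*" || strip(tag) === strip(currentETag)
  );

  if (matches) {
    // The cached copy is still valid: send no body.
    return { status: 304, headers: { ETag: currentETag }, body: null };
  }
  return { status: 200, headers: { ETag: currentETag }, body };
}

const etag = '"25-+vIrmGA7FcSjzeJueoK/J+jWGd4"';
console.log(respondConditionally({ "if-none-match": etag }, etag, "{}").status); // 304
console.log(respondConditionally({}, etag, "{}").status);                        // 200
```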

### Guideline #4: "Last-Modified" Should Be Provided

The `Last-Modified` header serves a very similar purpose to the `ETag` header. It indicates the date and time at which the content was last modified. The client can send that date back in a conditional request using the `If-Modified-Since` or `If-Unmodified-Since` header. With `If-Modified-Since`, the effect is very similar to `If-None-Match`: if the resource has not been modified, the server returns 304 Not Modified, and if it has been modified, the server returns the brand new response. `If-Unmodified-Since` is the counterpart typically used to guard updates, where the server responds with 412 Precondition Failed if the resource has changed since the given date.

It is therefore important to provide the `Last-Modified` header if you want the client to have an alternative caching method so that they may make conditional requests with `If-Modified-Since` or `If-Unmodified-Since`.
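A minimal sketch of the `If-Modified-Since` check, assuming the server knows the resource's last modification time as a `Date`. The function name `isNotModifiedSince` is hypothetical.

```javascript
// Hypothetical helper: true when the server may answer 304 Not Modified.
function isNotModifiedSince(ifModifiedSince, lastModified) {
  const since = Date.parse(ifModifiedSince);
  // A missing or unparseable header means we must send the full response.
  if (Number.isNaN(since)) return false;
  // HTTP dates have one-second resolution, so drop the milliseconds.
  const modified = Math.floor(lastModified.getTime() / 1000) * 1000;
  return modified <= since;
}

const lastModified = new Date(Date.UTC(2021, 11, 25, 19, 0, 0));
console.log(isNotModifiedSince("Sat, 25 Dec 2021 19:32:13 GMT", lastModified)); // true
console.log(isNotModifiedSince("Sat, 25 Dec 2021 18:00:00 GMT", lastModified)); // false
```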

### Guideline #5: Add Optional Caching to 3XX and 4XX Responses

Although it is most common to cache GET requests that return a 200 response, it can sometimes be beneficial to add caching to 3XX or even 4XX responses. Doing so reduces the load that repeated redirects and errors place on a RESTful API.
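As an illustration only, a policy like this could be expressed as a small lookup by status code. The function name `cacheControlForStatus`, the chosen status codes, and the lifetimes are all assumptions of this sketch and should be tuned to the API in question.

```javascript
// Hypothetical policy: choose a Cache-Control value from the response status.
function cacheControlForStatus(status) {
  // Permanent redirects rarely change, so they can be cached for a long time.
  if (status === 301 || status === 308) return "public, max-age=86400";
  // Cache "not found" and "gone" briefly to absorb repeated bad requests.
  if (status === 404 || status === 410) return "public, max-age=60";
  // Server errors should not be cached in this sketch.
  if (status >= 500) return "no-store";
  // Leave everything else to the route's own caching rules.
  return null;
}

console.log(cacheControlForStatus(301)); // public, max-age=86400
console.log(cacheControlForStatus(404)); // public, max-age=60
```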

## Additional Header Guidelines

### Guideline #1: Consider Using An Application-Specific Media Type

You may have come across the most commonly used content types such as these:

* `application/json`
* `application/octet-stream`
* `image/png`
* `audio/mpeg`
* `text/html`

But there are actually many custom media types available as well. Here are some examples:

* `application/vnd.ms-excel`
* `application/vnd.lotus-notes`
* `text/vnd.sun.j2me.app-descriptor`

Notice the "vnd" prefix. These are vendor-specific custom media types, and they are official media types just like your everyday `application/json` or `text/html`.

In fact anyone can register a custom media type with the Internet Assigned Numbers Authority (IANA). Here is a list of officially registered media types: <https://www.iana.org/assignments/media-types/media-types.xhtml>.

When building a RESTful API, it is worth considering a custom application-specific media type. Although this is quite uncommon for most projects, the additional layer of structure can be extremely beneficial for larger APIs that want a better predefined structure and types.
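For illustration, suppose an API defined the media type `application/vnd.example.user+json` (a made-up, unregistered name). A server could check whether the client's `Accept` header allows it before responding. The helper `accepts` below is a simplified sketch that ignores quality values such as `q=0`.

```javascript
// Hypothetical application-specific media type, used only for this example.
const USER_MEDIA_TYPE = "application/vnd.example.user+json";

// Simplified Accept check: matches exact types, "type/*", and "*/*".
function accepts(acceptHeader, mediaType) {
  if (!acceptHeader) return true; // no Accept header means any type is acceptable
  const [type] = mediaType.split("/");
  return acceptHeader
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase())
    .some((range) => range === "*/*" || range === `${type}/*` || range === mediaType);
}

console.log(accepts("application/vnd.example.user+json, text/html", USER_MEDIA_TYPE)); // true
console.log(accepts("text/html", USER_MEDIA_TYPE));                                    // false
```

When the check passes, the server would send `Content-Type: application/vnd.example.user+json` so that clients know exactly which structure to expect.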

### Guideline #2: Prefix Custom Headers

If you've been a web developer for some time, you may have encountered several custom headers provided by different companies or software during your career.

By convention, a custom header starts with an `X-`. This is done so you won't accidentally override any existing predefined HTTP headers.

Here are some examples of custom headers by software that you are probably already familiar with.

```
X-Powered-By: Express
X-Cache: Miss from cloudfront
X-Drupal-Cache: HIT
```

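If you find yourself wanting to add a custom header of your own to a RESTful API, prefix it with an `X-`. Be aware that RFC 6648 (2012) discourages the `X-` prefix for newly defined headers, although the convention remains widespread in practice.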

### Guideline #3: Use The Location Header To Specify The URI Of A Newly Created Resource

The `Location` header can contain a URI which identifies a resource that can be helpful to the client. When a resource is created, usually via a POST request, it can be very useful to include the URI of the newly created resource in the `Location` header.

Below is a good example of a common use case involving the creation of new users in our application.

Suppose we make a cURL call with `curl -v localhost:3000/api/v1/users -X POST`.

We get the following response.

```
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
> POST /api/v1/users HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Location: /api/v1/users/181de1eb98aa7ffe6a127232178793d7
< Content-Type: application/json; charset=utf-8
< Content-Length: 79
< ETag: "4f-GFjDt03GZ9sYOxv4+1mObRaXK5I"
< Date: Sun, 26 Dec 2021 05:01:47 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
* Connection #0 to host localhost left intact
{"message":"New user has been created! Find the link the \"Location\" header!"}
* Closing connection 0
```

Notice the `Location: /api/v1/users/181de1eb98aa7ffe6a127232178793d7`. This is a great way for the client to know where the newly created user can be accessed. We could also have added a link in the body of our response, which would have satisfied the "hypermedia" aspect of a RESTful API. Nonetheless, doing this with a header is still a good and clean convention to follow.
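A rough sketch of how such a response could be assembled on the server. The function name `createdUserResponse` is hypothetical, and the random 32-character hex ID merely mimics the shape of the ID in the output above. Note that this sketch uses the status code 201 Created, which is conventional for resource creation, whereas the sample output above returned 200.

```javascript
const { randomBytes } = require("node:crypto");

// Hypothetical helper: build the response for a newly created user.
function createdUserResponse(basePath = "/api/v1/users") {
  const id = randomBytes(16).toString("hex"); // 16 random bytes -> 32 hex characters
  return {
    status: 201, // assumption: 201 Created rather than the 200 shown above
    headers: { Location: `${basePath}/${id}` },
    body: { message: "New user has been created!", id },
  };
}

const response = createdUserResponse();
console.log(response.status, response.headers.Location);
```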

