Cache headers have been an essential part of the HTTP spec from the very beginning, and they have played a crucial role in scaling the web to the enormous size it has today. Or at least, that statement is true for the “human web”. Unfortunately, the vast majority of APIs (the “machine web”) either completely ignore HTTP-level caching or have implementations that are broken to varying degrees.

What follows is a quick guide to implementing HTTP caching properly in your APIs.

The Role of HTTP Caching

If you ask a randomly-selected developer what caching is for, more likely than not you will get the answer that it’s for “making things faster”. That is a very generic answer, and it is actually inaccurate when we are talking about network-level caching (of which HTTP caching is a part). Network-level caching is not for making a slow computation faster. Its primary purpose is to increase scalability: the throughput, in requests/sec, that we can process without degrading performance compared to a single-user scenario. Generally speaking, the speed of a response when there are few users connected should already be at the desired level, achieved through other forms of optimization.

Network-level caching is appropriate for improving throughput (scalability), but not response time under low levels of load (speed).

Two Use-Cases for HTTP Caching

Given the role of HTTP caching explained above, you can use this type of caching both for mostly-static data and for dynamic, i.e. rapidly changing, data.

If you are dealing with dynamic data, you cannot cache for long periods of time, such as days, hours, or sometimes even minutes, because the data becomes stale too quickly. That doesn’t mean, however, that such data shouldn’t be cached at all.

Since network-level caching is mostly used to increase throughput, it makes sense to cache responses even for very short periods of time. Let’s see how this works.

At low load levels, there’s no sense in caching something for several seconds, as most clients won’t be able to reach the cached data: the rate of cache misses compared to cache hits will be very high. However, when the load increases (say, to hundreds of requests/sec), even a 5-second cache will be a lifesaver, satisfying thousands of requests from the cache rather than the origin and providing very effective protection for backend systems by avoiding database hits, etc. The amazing thing about this type of caching is that it becomes more effective as the load on the system increases (more cache hits before the cache expires).
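To make this concrete, here is a minimal back-of-the-envelope sketch in Python. The model (and the function name) is my own simplification: it assumes a steady request rate and a shared cache where one request per TTL window goes to the origin and repopulates the cache, while everything else in that window is served as a hit.

```python
def cache_hit_ratio(requests_per_sec: float, ttl_sec: float) -> float:
    """Approximate hit ratio for a shared cache with a fixed TTL.

    Simplified model: at a steady request rate, one request per TTL
    window is a miss (it repopulates the cache); the rest are hits.
    """
    requests_per_window = requests_per_sec * ttl_sec
    if requests_per_window <= 1:
        return 0.0  # at most one request per window: effectively no hits
    return 1 - 1 / requests_per_window

# Low load: 1 request every 10 seconds with a 5-second TTL -> no benefit
print(cache_hit_ratio(0.1, 5))   # 0.0

# High load: 200 requests/sec with the same 5-second TTL -> ~99.9% from cache
print(cache_hit_ratio(200, 5))   # 0.999
```

This is exactly the effect described above: the same 5-second TTL is useless at low load and absorbs almost all traffic at high load.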

When I worked at NPR, we used this type of caching extensively. With a news website you obviously cannot cache content for too long: reporters and journalists will update an article as they see fit, and they have very little patience for an article not refreshing despite the edit they made. However, if you cache content even for very short time periods, you get a huge payback when millions of readers are simultaneously accessing the very same page/API during a breaking news event of some kind.

This is the kind of scenario in which caching even dynamic content makes a lot of sense. Most people who have worked on high-traffic systems would have used this approach.

The second type of caching that can and should be used extensively deals with nearly-static data. Any application has plenty of such data-sets: lists of countries, states, and cities (any domain), insurance providers (healthcare), podcasts, series, and topics (news media), currencies (banking), etc. These lists do change, and generally we have no idea when they will change, but we do know that they change quite infrequently. Nearly-static data-sets are a very effective target for long-term caching.

In general, the way we cache long-lasting data-sets in HTTP differs from the way we cache dynamic data-sets.

Caching Dynamic Data with HTTP

If you have had an opportunity to hear Mike Amundsen talk about distributed systems architecture, you may already know that in distributed systems deployed over large geographic areas (e.g. the web), you cannot rely on the existence of a shared understanding of “now” (or of time in general). This, among other things, comes down to basic physics: information cannot propagate instantaneously, due to the speed-of-light limit. For instance, if a server in Chicago, at 11:55:00AM local time, tells a client in Melbourne, Australia that a response is valid until 11:55:02AM Chicago time:

  1. We have to be sure that timezone conversions are done properly by every participant in that exchange.
  2. We have to assume the clocks on the Chicago server and the Melbourne client are perfectly synchronized (generally, a pipe dream).
  3. For the client to leverage the cache, the response from the Chicago server must reach the Melbourne client in less than 2 seconds; otherwise the cached response will already be invalid by the time it is received. Considering the distance between Chicago and Melbourne, the theoretical speed-of-light limit, and the actual speed of data transmission on the public web (which is much slower), this goal may very well be unattainable.

In distributed systems deployed across large distances (such as the web), the above assumptions are so unrealistic that using date-based caching instructions, such as the Expires header, is highly ineffective. The same is true for the combination of the Last-Modified and If-Modified-Since headers, which also relies on a shared understanding of date-time.

Instead, when caching resources for short periods of time, you should use HTTP caching instructions that do not rely on a shared understanding of time, such as Cache-Control: max-age and ETags.
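The key property of max-age is that it is a relative duration, so no shared clock is needed. A minimal sketch of what a server might emit for a dynamic resource (the helper function and its name are my own illustration, not a standard API):

```python
def short_lived_cache_headers(ttl_sec: int = 5) -> dict:
    """Response headers for a dynamic resource cached for a few seconds.

    max-age is relative (seconds from receipt), so it needs no shared
    clock; "public" allows shared caches (CDNs, proxies) to store the
    response as well.
    """
    return {"Cache-Control": "public, max-age=%d" % ttl_sec}

print(short_lived_cache_headers(5))  # {'Cache-Control': 'public, max-age=5'}
```

Every cache along the path counts those 5 seconds from the moment it receives the response, so no participant ever has to agree on what time it is “now”.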

Caching Near-Static Data Sets

If you are caching resources (API responses) for sufficiently long periods of time (hours, days, potentially months), you usually do not have to worry about the date-time-related caching issues described in the previous section.

To facilitate caching of near-static data, where we have no reliable clue about when the data will become stale but we know it won’t happen too soon (and yet it will happen), you can use two approaches:

  1. Entity Tags (ETags), which do not rely on a shared agreement on time.
  2. The Last-Modified HTTP header, which is date-time-centric.

Let’s see how each one of those works:

Using Entity Tags

In this workflow, for each response the server provides an ETag header. For a specific “version” of the data, the ETag has to be unique and constant until the data changes:

HTTP/1.1 200 OK
Content-Type: application/vnd.uber+json; charset=UTF-8
Expires: Sat, 01 Jan 1970 00:00:00 GMT
Pragma: no-cache
ETag: "88d979a0a78942b5bda05ace4214556a"

… the rest of the response …

In a number of implementations, the ETag is some sort of hash of the response payload, but it can really be anything, as long as it is unique and changes in step with the data (same response = same ETag).
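A content hash is the simplest way to get those two properties. A minimal sketch in Python (the function name and the MD5 choice are illustrative; any digest with the same stability properties works):

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the response payload.

    Any scheme works as long as the same payload always yields the
    same tag and a changed payload yields a different one; hashing
    the body is a common way to guarantee both.
    """
    return '"%s"' % hashlib.md5(body).hexdigest()

v1 = make_etag(b'{"countries": ["FR", "DE"]}')
v2 = make_etag(b'{"countries": ["FR", "DE", "IT"]}')
assert v1 == make_etag(b'{"countries": ["FR", "DE"]}')  # stable for same data
assert v1 != v2                                         # changes with the data
```

Note the surrounding double quotes: the ETag value on the wire is a quoted string.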

Important: Please note that while we are discussing ETags in the context of caching, the ETag HTTP header is not, technically, a “caching” header per se. It is part of the RFC7232 - Conditional Requests specification, separate from the RFC7234 - HTTP 1.1 Caching spec. HTTP clients handle the ETags and cache instructions in a response independently. This is why the example response above has Pragma: no-cache and an Expires header set in the distant past: that specific API wants you to use ETags for determining the freshness of the response, but not cache headers. The two approaches can get in each other’s way, through double-caching, if they give the client inconsistent hints. In general, you should either use only one approach in your responses (explicitly disabling the other) or make absolutely sure that the two hints lead to the same result, regardless of which one the client pays attention to, or even if the client respects both instructions (“good” clients should).

The reason the example API decided to explicitly disable caching is that some clients (e.g. web browsers) make default assumptions about the cache-ability of content when no cache headers are present. To the best of my knowledge, no major HTTP client makes assumptions about missing ETags, so we typically don’t need to worry about that case.

Once the client receives the response and sees the ETag value, it should save the ETag and the response in a local store and issue subsequent requests for the same data with an If-None-Match header carrying the saved ETag value:

GET /countries HTTP/1.1
Host: api.example.org
Accept: application/vnd.uber+json
If-None-Match: "88d979a0a78942b5bda05ace4214556a"

If the data-set hasn’t been modified on the server, the server must respond with HTTP 304 and an empty body:

HTTP/1.1 304 Not Modified
Content-Type: application/vnd.uber+json
Expires: Sat, 01 Jan 1970 00:00:00 GMT
Pragma: no-cache
Content-Length: 0

If the data-set has been modified on the server, the server must respond with a full HTTP 200 response and the new ETag value.
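The server-side decision boils down to one comparison. A minimal sketch in Python (the function name and its (status, headers) return shape are illustrative, not any particular framework’s API):

```python
def handle_conditional_get(if_none_match, current_etag):
    """Decide between 304 and a full 200 for a conditional GET.

    Returns (status, headers): 304 with no body when the client's
    cached copy is still current, otherwise 200 with the new ETag
    (the full body would be sent alongside in a real server).
    """
    if if_none_match is not None and if_none_match == current_etag:
        return 304, {"ETag": current_etag}
    return 200, {"ETag": current_etag}

etag = '"88d979a0a78942b5bda05ace4214556a"'
status, _ = handle_conditional_get(etag, etag)       # cached copy still valid
print(status)  # 304
status, _ = handle_conditional_get('"stale"', etag)  # data has changed
print(status)  # 200
```

Note that the 304 path still sends the ETag header, so the client can keep validating against the same value.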

Using Last-Modified

In this workflow, for each response the server provides a Last-Modified header containing the date the specific data was last modified:

HTTP/1.1 200 OK
Last-Modified: Mon, 07 Dec 2015 15:29:14 GMT
Content-Length: 23456
Content-Type: application/vnd.uber+json; charset=UTF-8

… the rest of the response …

Once the client receives the response and sees the Last-Modified header, it should save the Last-Modified date-time and the corresponding response in a local store (cache). The client should then issue subsequent requests for the same data with an If-Modified-Since header carrying the saved date-time:

GET /countries HTTP/1.1
Host: api.example.org
Accept: application/vnd.uber+json
If-Modified-Since: Mon, 07 Dec 2015 15:29:14 GMT

If the data-set hasn’t been modified on the server since the date-time indicated, the server must respond with HTTP 304 and an empty body:

HTTP/1.1 304 Not Modified
Content-Type: application/vnd.uber+json
Content-Length: 0

If the data-set has been modified on the server, the server must respond with a full HTTP 200 response and a new value of the Last-Modified header.
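The server-side check here is a date comparison rather than a string comparison. A minimal sketch in Python, using the standard library’s HTTP-date parser (the function name is illustrative):

```python
from email.utils import parsedate_to_datetime

def is_modified(if_modified_since, last_modified):
    """True if the resource changed after the client's cached date.

    Both arguments are HTTP-date strings (e.g. "Mon, 07 Dec 2015
    15:29:14 GMT"); parsedate_to_datetime handles the format and the
    timezone, so the comparison is done on real datetimes, not text.
    """
    if if_modified_since is None:
        return True  # no cached copy: always send the full response
    return parsedate_to_datetime(last_modified) > parsedate_to_datetime(if_modified_since)

last_mod = "Mon, 07 Dec 2015 15:29:14 GMT"
print(is_modified("Mon, 07 Dec 2015 15:29:14 GMT", last_mod))  # False -> send 304
print(is_modified("Sun, 06 Dec 2015 10:00:00 GMT", last_mod))  # True  -> send 200
```

A True result means the server sends the full 200 response with the new Last-Modified value; False means a 304 with an empty body.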

One More Thing: Vary Header

When using cache-controlling headers (such as Last-Modified, Cache-Control: max-age, and Expires), the communicating parties have to determine what constitutes “access to the same resource representation”. This is generally decided by asking: “is the full request URL the same between the two interactions?” However, depending on what you are doing, two requests with the same full URL may point to different resource representations, because some of the headers in the exchange are different. Case in point: you don’t want a client to serve a cached JSON representation when you are asking for the XML representation at the same API endpoint. In this case the URL can be the same between the two interactions, but the expected content type of the payload will be different because of the HTTP header values.

If the values of HTTP headers have to be taken into account during HTTP caching, you need to utilize a special HTTP header called Vary. Basically, this header lets the communicating parties know which other headers, besides the URL, they need to pay attention to when making caching determinations.
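In practice, a cache honors Vary by folding the listed request-header values into its cache key. A minimal sketch in Python (the function and key format are my own illustration of the idea, not any particular cache’s implementation):

```python
def cache_key(url, request_headers, vary):
    """Build a cache key that honors a response's Vary header.

    The key is the URL plus the request's values for each header the
    server listed in Vary; header names are case-insensitive in HTTP,
    so they are normalized to lowercase before lookup.
    """
    normalized = {k.lower(): v for k, v in request_headers.items()}
    parts = [url]
    for name in vary:
        parts.append("%s=%s" % (name.lower(), normalized.get(name.lower(), "")))
    return "|".join(parts)

# Same URL, but "Vary: Accept" keeps the two representations apart:
key_json = cache_key("/countries", {"Accept": "application/json"}, ["Accept"])
key_xml = cache_key("/countries", {"Accept": "application/xml"}, ["Accept"])
assert key_json != key_xml
print(key_json)  # /countries|accept=application/json
```

With Vary: Accept in the response, the JSON and XML representations get distinct cache entries, so neither can ever be served in place of the other.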

You can see an example of effective usage of the Vary header in our recent blog post where we discussed usage of the Prefer header.