Concepts, Ideas, Strategies

Cache control via persisted queries [PRO]

GraphQL typically operates via POST, by executing all queries against a single endpoint, and passing parameters through the body of the request. That single endpoint's URL will produce different responses, which means it cannot be cached (at least, not using the URL as the identifier).

So the standard way to support caching in GraphQL is at the client layer, through the Apollo client and similar libraries, which cache the returned objects independently of each other, identifying them by their unique global ID.

(In contrast, when caching on the server, we normally use the URL as identifier, and we cache the data for all entities in the response all together.)

But this solution has several disadvantages:

  • The application has more JavaScript to run on the client side, so accessing the website from a low-end mobile phone will take a performance hit
  • The application becomes more complex, with more moving parts, since we now also need to worry about implementing the caching layer
  • Not everybody understands JavaScript (e.g. the website may be coded in PHP), but dealing with JS now also becomes a responsibility

A much better solution is to use HTTP caching. Let's see the preconditions needed for this to work out.

Accessing GraphQL via GET

Using HTTP caching means that we will cache the GraphQL response using the URL as identifier. This has 2 implications:

  1. We must access GraphQL's single endpoint via GET
  2. We must pass the query and variables as URL params

Then, if the single endpoint is /graphql, the GET operation can be executed against URL /graphql?query=...&variables=....
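As a minimal sketch of how such a URL could be assembled, the snippet below URL-encodes the query and passes the variables as a JSON string. The helper name build_get_url, the $limit variable, and the sample query are illustrative assumptions, not part of any particular client library.

```python
import json
from urllib.parse import urlencode


def build_get_url(endpoint, query, variables=None):
    """Return the endpoint URL with the query (and optional variables) as URL params."""
    params = {"query": query}
    if variables:
        # Variables are serialized as a JSON object, then URL-encoded with the rest.
        params["variables"] = json.dumps(variables, separators=(",", ":"))
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical query and variables, used only for illustration.
url = build_get_url(
    "/graphql",
    "query ($limit: Int) { posts(limit: $limit) { id title } }",
    {"limit": 5},
)
print(url)
```

Because the same query and variables always produce the same URL, an HTTP cache can use that URL as the identifier for the response.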

This applies to retrieving data from the server (via the query operation). For mutating data (via the mutation operation), we must still use POST. There is no problem here, since mutations are always executed fresh; we can't cache the results of a mutation, so we wouldn't use HTTP caching with it anyway.

This approach works (and it's even suggested on the official GraphQL site), but there are certain considerations we must pay attention to.

Coding GraphQL queries via URL param

A GraphQL query will normally span multiple lines. For instance:

{
  posts {
    id
    title
  }
}

However, we can't input this multi-line string directly in the URL param.

The solution is to encode it. For instance, the GraphiQL client will encode the query above like this:

%7B%0A%20%20posts%20%7B%0A%20%20%20%20id%0A%20%20%20%20title%0A%20%20%7D%0A%7D
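The same encoding can be reproduced with standard percent-encoding. This is a small sketch using Python's urllib.parse.quote; it is not how GraphiQL is implemented, only a way to check the result.

```python
from urllib.parse import quote, unquote

query = "{\n  posts {\n    id\n    title\n  }\n}"

# safe="" makes quote() percent-encode every reserved character,
# including braces, spaces, and newlines.
encoded = quote(query, safe="")
print(encoded)

# Decoding gives back the original multi-line query.
decoded = unquote(encoded)
```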

Alright, this works. But it doesn't look very good, right? Who can make sense of that query?

One of the virtues of GraphQL is that its queries are so easy to grasp. With some practice, once we see the query, we understand it immediately. But once it's been encoded, all that is gone, and only machines can comprehend it; the human is out of the equation.

Another solution could be to replace all the newlines in the query with a space, which works because newlines add no semantic meaning to the query. Then, the query above can be represented as:

?query={ posts { id title } }
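A minimal sketch of that transformation follows; the function name to_single_line is made up for this example. Note the caveat in the comment: it collapses all whitespace, so it assumes the query has no string arguments whose internal spacing matters.

```python
def to_single_line(query):
    """Collapse every run of whitespace (including newlines) into one space."""
    # Caveat: this also collapses runs of whitespace inside string arguments.
    return " ".join(query.split())


query = """
{
  posts {
    id
    title
  }
}
"""

single_line = to_single_line(query)
print(single_line)  # { posts { id title } }
```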

This works well for simple queries. But if you have a really long query, opening and closing many { }, and adding field arguments and directives, then it becomes increasingly difficult to understand.

For instance, this query:

{
  posts(limit:5) {
    id
    title @titleCase
    excerpt @default(
      value:"No title",
      condition:IS_EMPTY
    )
    author {
      name
    }
    tags {
      id
      name
    }
    comments(
      limit:3,
      order:"date|DESC"
    ) {
      id
      date(format:"d/m/Y")
      author {
        name
      }
      content
    }
  }
}

Would become this single-line query:

{ posts(limit:5) { id title @titleCase excerpt @default(value:"No title", condition:IS_EMPTY) author { name } tags { id name } comments(limit:3, order:"date|DESC") { id date(format:"d/m/Y") author { name } content } } } 

Once again, executing the query will work, but we won't know what it is we are executing.

And if the query also contains fragments, then forget it: there's no way we can make sense of it.

Persisted queries come to the rescue

If passing the query in the URL is not satisfactory, what other option do we have? Well, to not pass the query in the URL!

This is the approach called a "persisted query": we store the query on the server, and use an identifier (such as a numeric ID, or a unique string produced by applying a hashing algorithm with the query as input) to retrieve it. Finally, we pass this identifier as the URL parameter, instead of the query.

For instance, the query could be identified with ID 2908 (or a hash such as "50ac3e81"), and then we execute the GET operation against URL /graphql?id=2908. The GraphQL server will then retrieve the query corresponding to this ID, execute it, and return the results.
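The mechanism can be sketched as a lookup table keyed by identifier. Everything below is a hypothetical illustration: the class PersistedQueryStore, its methods, and the choice of a truncated SHA-256 hash as the identifier are assumptions, and a real server would keep the queries in persistent storage rather than in memory.

```python
import hashlib


class PersistedQueryStore:
    """In-memory stand-in for wherever the server keeps its persisted queries."""

    def __init__(self):
        self._queries = {}

    def register(self, query):
        """Store the query and return a short hash-based identifier for it."""
        query_id = hashlib.sha256(query.encode("utf-8")).hexdigest()[:8]
        self._queries[query_id] = query
        return query_id

    def resolve(self, query_id):
        """Return the stored query, or None if the identifier is unknown."""
        return self._queries.get(query_id)


store = PersistedQueryStore()
query_id = store.register("{ posts { id title } }")

# The client only ever sends the identifier, never the query itself.
url = f"/graphql?id={query_id}"
resolved = store.resolve(query_id)
```

On each request, the server would call resolve with the id URL parameter and execute the query it gets back, rejecting the request when the identifier is unknown.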

Gato GraphQL makes it even simpler: a persisted query is implemented as a custom post type, so we can create one and publish it as any regular post, and the slug we choose (which is by default based on the title we input) will become its identifier. Persisted queries make implementing HTTP caching trivial.

Calculating the max-age value

HTTP caching works by sending the Cache-Control header in the response, with either a max-age value indicating for how long (in seconds) the response may be cached, or no-store indicating that it must not be cached.

How will the GraphQL server calculate the max-age value for the query, considering that different fields can have different max-age values?

The answer is: get the max-age value for all fields requested in the query, and find the lowest one. That will be the response's max-age.

For instance, let's say we have an entity of type User. Based on how each of its fields behaves, we can decide for how long that field can be cached:

🛠 Its ID will never change ⇒ We give field id a max-age of 1 year

🛠 Its URL will be updated very rarely (if ever) ⇒ We give field url a max-age of 1 day

🛠 The person's name may change every now and then (eg: to add a status, or to say "Milton (wears a mask)") ⇒ We give field name a max-age of 1 hour

🛠 The user's karma on the site can change at all times (eg: after somebody upvotes their comment) ⇒ We give field karma a max-age of 1 minute

🛠 If querying the data from the logged-in user, then the response can't be cached at all (regardless of which fields we're fetching) ⇒ Instead of a max-age, the response must be sent with no-store

As a result, the response to the following GraphQL queries will have the following max-age values (for this example, we ignore the max-age for field Root.users, but in practice it will also be taken into account):

| Query | max-age value |
| --- | --- |
| { users { id } } | 1 year |
| { users { id url } } | 1 day |
| { users { id url name } } | 1 hour |
| { users { id url name karma } } | 1 minute |
| { me { id url name karma } } | no-store (don't cache) |
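The values in the table above can be reproduced with a short sketch of the "lowest max-age wins" rule. The names FIELD_MAX_AGE and cache_control, and the logged_in_user flag, are assumptions made for this example and do not reflect Gato GraphQL's internals. As in the table, the max-age of field Root.users is ignored.

```python
YEAR, DAY, HOUR, MINUTE = 31536000, 86400, 3600, 60

# Per-field max-age in seconds, following the User example.
FIELD_MAX_AGE = {
    "User.id": YEAR,
    "User.url": DAY,
    "User.name": HOUR,
    "User.karma": MINUTE,
}


def cache_control(fields, logged_in_user=False):
    """Return the Cache-Control value for a response containing `fields`."""
    if logged_in_user:
        # Data for the logged-in user must never be cached.
        return "no-store"
    # The response can only stay cached as long as its shortest-lived field.
    return f"max-age={min(FIELD_MAX_AGE[field] for field in fields)}"


print(cache_control(["User.id"]))                          # max-age=31536000
print(cache_control(["User.id", "User.url", "User.name"]))  # max-age=3600
print(cache_control(["User.id", "User.karma"], logged_in_user=True))  # no-store
```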

Creating the Cache Control List

Once we have identified the max-age for each field, we input this information via a Cache Control List:

[Image: Defining a cache control policy]

Gato GraphQL will then automatically calculate the response's max-age value, and send it back as the Cache-Control HTTP header.