Caching is a useful and surprisingly complex feature of web browsers.
In this article, we'll explain how the browser uses its cache to load pages faster, which factors determine cache duration, and how we can bypass the cache when necessary.
Why is Caching Important?
All browsers attempt to keep local copies of static assets in an effort to reduce page load times and minimize network traffic.
Fetching a resource over a network will always be slower than retrieving it from local cache. This is true whether the server is on the same network or it's located on the far side of the world.
How Browser Caching Works
Case 1: User has not visited the site before
The browser won't have any files cached for the site so it will fetch everything from the server.
Below is a snapshot of the resources downloaded when visiting the Wikipedia home page for the first time. The status bar at the bottom shows that 265KB of data was transferred to the browser.
Case 2: User has visited the site before
We can see the difference cache makes when we refresh the Wikipedia page:
The data transferred dropped to 928 bytes, which is roughly 0.3% of the size of the initial page load. The Size column shows us that most of the content is pulled from cache.
Chrome will pull files from either memory cache or disk cache. Since we didn't close our browser between Cases 1 & 2, the data was still in memory cache.
Show the Browser Cache
In Chrome, navigate to chrome://cache to view the contents of the cache. This will display a page of links to a detailed view for each cached file.
How Does the Browser Know What to Cache?
The browser inspects the headers of the HTTP response generated by the web server. There are four headers commonly used for caching: ETag, Cache-Control, Expires, and Last-Modified.
The ETag (or Entity Tag) is a string that serves as a cache validation token. This is usually a hash of the file contents.
The server can include an ETag in its response, which the browser can then send back in a future request (after the file has expired) so the server can determine whether the cache contains a stale copy.
If the hash is the same, then the resource hasn't changed and the server responds with a 304 response code (Not Modified) with an empty body. This lets the browser know it's still safe to use the cached copy.
Note that the ETag is only sent in a request once the file has expired from cache.
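The validation round trip described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular web server implements it: the names make_etag and respond are hypothetical, the hash choice (SHA-256, truncated) is an assumption, and the sketch assumes the browser sends its stored ETag back in the standard If-None-Match request header.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(content: bytes) -> str:
    """Build a quoted ETag from a hash of the file contents (assumed scheme)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(
    content: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a request for `content`.

    `if_none_match` is the ETag the browser stored from an earlier response.
    """
    etag = make_etag(content)
    if if_none_match == etag:
        # The cached copy is still current: 304 Not Modified with an empty body.
        return 304, {"ETag": etag}, b""
    # First visit, or the file changed: send the full resource.
    return 200, {"ETag": etag}, content
```

A first request returns 200 with the full body and an ETag; repeating the request with that ETag returns 304 and an empty body until the file contents change. Real servers also handle weak validators and lists of ETags, which this sketch ignores.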
The Cache-Control header has a number of directives we can set to control cache behavior, expiration, and validation.
public means that the resource can be cached by any cache (browser, CDN, etc)
private means that the resource can only be cached by the browser
no-store tells the browser to always request the resource from the server and never keep a copy of it.
no-cache is actually a bit misleading. It doesn't mean "do not cache".
It tells the browser to cache the file but not to use it until it has checked with the server that it holds the latest version. This validation is done with the ETag header.
This is commonly used with HTML files since it makes sense for the browser to always check for the latest markup.
max-age specifies the length of time, in seconds, that the resource should be cached. So max-age=60 means that it should be cached for 1 minute. RFC 2616 recommends that the maximum value should be no longer than 1 year (max-age=31536000).
s-maxage is only used by intermediate caches like a CDN.
must-revalidate tells the cache that it must verify the status of a stale resource with the server before using it, and that an expired resource must not be used without that check.
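To make the directive syntax concrete, here is a small sketch that splits a Cache-Control value into its directives. The function name parse_cache_control is hypothetical, and the parsing is deliberately simplified: it assumes directives are separated by commas and does not handle quoted arguments that themselves contain commas.

```python
from typing import Dict, Union


def parse_cache_control(value: str) -> Dict[str, Union[str, bool]]:
    """Split a Cache-Control header value into a {directive: argument} dict.

    Directives without an argument (public, no-cache, ...) map to True.
    """
    directives: Dict[str, Union[str, bool]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray or trailing commas
        name, sep, arg = part.partition("=")
        # Directive names are case-insensitive, so normalize to lowercase.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else True
    return directives


print(parse_cache_control("public, max-age=31536000"))
```

For example, parsing "private, max-age=604800" yields a dict with private set to True and max-age set to the string "604800".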
The Expires header is from the older HTTP 1.0 days but is still used on many sites.
This header field provides an expiration date after which the asset is considered invalid.
Expires: Wed, 25 Jul 2018 21:00:00 GMT
The browser will ignore this field if there's a max-age directive in Cache-Control.
The Last-Modified header is also from the HTTP 1.0 days.
Last-Modified: Mon, 12 Dec 2016 14:45:00 GMT
This field contains the date and time the resource was last modified.
HTML Meta Tags
Prior to HTML5, using meta tags inside HTML to specify cache-control was a valid approach:
<meta http-equiv="Cache-control" content="no-cache">
Using a meta tag like this is now discouraged and is not valid HTML5. Why? It's not a good idea because only browsers will be able to parse the meta tag and understand it. Intermediate caches won't.
So always send caching instructions via HTTP headers.
Let's take a look at a sample HTTP response:
Accept-Ranges: bytes
Cache-Control: max-age=3600
Connection: Keep-Alive
Content-Length: 4361
Content-Type: image/png
Date: Tue, 25 Jul 2017 17:26:16 GMT
ETag: "1109-554221c5c8540"
Expires: Tue, 25 Jul 2017 18:26:16 GMT
Keep-Alive: timeout=5, max=93
Last-Modified: Wed, 12 Jul 2017 17:26:05 GMT
Server: Apache
- Line 2 tells us that the max-age is 1 hour
- Line 5 tells us that this is a PNG image
- Line 7 shows us the ETag value which will be used for validation after the 1 hour mark to verify that the resource hasn't changed
- Line 8 is the Expires header which will be ignored since max-age is set
- Line 10 is the Last-Modified header which shows when the image was last modified
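The precedence between max-age and Expires can be illustrated with a short sketch that works out when the sample response goes stale. The function expiry_time is a hypothetical helper, and the calculation is simplified: it measures max-age from the Date header and ignores details such as the Age header and clock skew that real caches account for.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def expiry_time(headers: Dict[str, str]) -> Optional[datetime]:
    """Return when the response becomes stale, or None if no lifetime is given."""
    for part in headers.get("Cache-Control", "").split(","):
        name, sep, arg = part.strip().partition("=")
        if name.lower() == "max-age" and sep:
            # max-age wins: lifetime is counted from the response's Date header.
            served_at = parsedate_to_datetime(headers["Date"])
            return served_at + timedelta(seconds=int(arg))
    if "Expires" in headers:
        # Only consulted when no max-age directive is present.
        return parsedate_to_datetime(headers["Expires"])
    return None


sample = {
    "Cache-Control": "max-age=3600",
    "Date": "Tue, 25 Jul 2017 17:26:16 GMT",
    "Expires": "Tue, 25 Jul 2017 18:26:16 GMT",
}
print(expiry_time(sample))
```

In this sample the two headers happen to agree: one hour after the Date value is the same instant named by Expires.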
So we've established that browser caching is awesome, and we should take advantage of it.
But we also want users to see the latest version of our page when we make updates. We can't expect them to do a hard refresh (Ctrl-F5) every time they visit our site or to clear their cache regularly.
Let's explore another situation where stale files could bite us.
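Suppose our HTML page loads a script named app.min.js, and the server sends that script with the following header:
Cache-Control: private, max-age=604800
We then deploy an update that fixes a bug in app.min.js. After the update, some users report they are still having issues symptomatic of the bug.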
What's going on here?
- Bob visited the site 2 weeks ago and has a cached copy of buggy app.min.js. Since his copy is older than max-age, the browser will retrieve the file from the server, and he gets the latest bug-free version.
- Mary visited the site 2 days ago and also has a cached copy of buggy app.min.js. Her copy is newer than max-age so her browser is still happily using the cached copy.
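The arithmetic behind Bob's and Mary's different experiences is easy to check. The sketch below assumes the max-age of 604800 seconds from the header above and uses a hypothetical helper, is_fresh, that compares the age of a cached copy against it.

```python
MAX_AGE_SECONDS = 604800  # from Cache-Control: private, max-age=604800 (7 days)
SECONDS_PER_DAY = 24 * 60 * 60


def is_fresh(age_days: float, max_age: int = MAX_AGE_SECONDS) -> bool:
    """Return True if a copy cached `age_days` ago is still within max-age."""
    return age_days * SECONDS_PER_DAY < max_age


for name, age_days in [("Bob", 14), ("Mary", 2)]:
    outcome = "uses cached copy" if is_fresh(age_days) else "refetches from server"
    print(f"{name}: {outcome}")
# prints:
# Bob: refetches from server
# Mary: uses cached copy
```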
In the next section, we'll see how to prevent these issues with a technique called cache busting.
Cache busting is where we invalidate a cached file and force the browser to retrieve the file from the server.
We can instruct the browser to bypass the cache by simply changing the filename. To the browser, this is a completely new resource so it will fetch the resource from the server.
Cache busting also allows us to keep long max-age values for resources that may change frequently. Google recommends that max-age be set to 1 year.
We could add a version number to the filename:
We could add a fingerprint based on the file contents:
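As a rough illustration of fingerprinting, the sketch below derives a filename from a hash of the file contents, so the name changes whenever the contents do. The function fingerprinted_name, the choice of SHA-256, the 8-character digest length, and the name.digest.ext layout are all assumptions made for this example; build tools use their own conventions.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension.

    For example, app.min.js becomes app.min.<digest>.js.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # with_name keeps any directory part and replaces only the final component.
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


print(fingerprinted_name("js/app.min.js", b"console.log('v1');"))
print(fingerprinted_name("js/app.min.js", b"console.log('v2');"))
```

The two calls print different filenames because the contents differ, while identical contents always map to the same name, so unchanged files keep benefiting from a long max-age.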
Append Query String
We could append a query string to the end of the filename:
The query string approach has known issues with proxy servers so this method is generally discouraged.
Do:
- Use the Cache-Control and ETag headers to control cache behavior for static assets
- Set long max-age values to reap the benefits of browser cache
- Use fingerprinting or versioning for cache busting
Don't:
- Use HTML meta tags to specify cache behavior
- Use query strings for cache busting
How can I tell if a file was loaded from cache?
Check out the Developer Tools in your browser. In Chrome, this information is shown in the Network tab under the Size column.
How do I prevent caching for a file?
Use the following response header:
Cache-Control: no-cache, no-store, must-revalidate
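How the header gets set depends on the server or framework in use. As one possible illustration, here is a minimal Python WSGI application that attaches it to every response; the name no_cache_app and the response body are made up for this sketch.

```python
def no_cache_app(environ, start_response):
    """WSGI app that tells browsers and intermediate caches not to cache the response."""
    body = b"always fresh\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        # The header recommended above for preventing caching.
        ("Cache-Control", "no-cache, no-store, must-revalidate"),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the app can be served with the standard library's wsgiref.simple_server.make_server and inspected in the Network tab of the browser's Developer Tools.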