
Web Caching For Beginners

By Nathan on Dec 21, 2012

Caching is important. It’s one of the best tools we have to make websites faster, so any aspiring coder needs to understand it. But most of the articles you’ll find when you search for “caching” cover specific tools, or assume you’re already an experienced programmer. They don’t explain the basics of how a cache (pronounced “cash”) works or why it’s useful. My goal with this article is to fill that gap and provide a gentle introduction for smart beginners.

Caching, via http://www.flickr.com/photos/akash_k/125489887/in/photostream/

Caching 101

Imagine that you and I are sitting at a table with only a pen and paper (no calculators) and someone walks up and asks us “What is 930 ÷ 24?”. We’re helpful folks, so we work it out on paper and with a bit of effort figure out that the answer is 38.75.

A few minutes later, a second person walks up and asks the same question: “What is 930 ÷ 24?”. You tell him the answer, then notice that I’m slowly trying to work out the problem all over again on paper.

You didn’t know it yet, but you just beat me to the punch and saved effort by using a cache. Caching saves the result of a calculation so you can conserve energy and respond faster to requests for data by not performing the same calculation multiple times.
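
Here is a minimal Python sketch of the same idea. The names `cache`, `divide`, and `calculations_done` are made up for this illustration; the point is only that the second request is answered from memory instead of being worked out again.

```python
# A tiny cache: remember answers we have already worked out.
cache = {}
calculations_done = 0  # counts how often we do the slow work


def divide(a, b):
    """Return a / b, doing the actual division only once per question."""
    global calculations_done
    key = (a, b)
    if key not in cache:
        calculations_done += 1  # the slow pen-and-paper step
        cache[key] = a / b
    return cache[key]


first = divide(930, 24)   # worked out and stored
second = divide(930, 24)  # looked up, no recalculation
print(first, second, calculations_done)  # prints: 38.75 38.75 1
```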

Keeping your cache fresh

The answer to 930 ÷ 24 never changes, so it’s possible to store the answer in a cache forever and never have to run the calculation again. But what if someone asked you for the average temperature of San Francisco over the past 7 days? Since the answer changes daily, your cached data would quickly become stale (a caching jargon word that means “out of date”).

The solution is to store some metadata about whether or not the cache is fresh. Typically this is done either by setting an expiration date (“good until 1/30/13”) or by validation, which is a bit more advanced, but basically consists of asking the server whether the data in your cache is still fresh (more details on this below).
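
The expiration approach can be sketched in a few lines of Python. This is only an illustration: the `ExpiringCache` class, its method names, and the temperature value are all invented for this example, and real caching tools handle many more details.

```python
import time


class ExpiringCache:
    """Store values together with the time at which they stop being fresh."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so the example is easy to test
        self._entries = {}    # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        """Return the cached value, or None if it is missing or stale."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale: throw it away
            return None
        return value


weather = ExpiringCache()
# 57.0 is a made-up number; keep it for one day, then treat it as stale.
weather.set("sf_average_temp", 57.0, ttl_seconds=24 * 60 * 60)
```

A `get` that returns `None` is the signal to recalculate the answer and store it again.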

How is caching implemented in web apps?

In the example above, you stored your cached data on the paper where you wrote down the answer. In the context of a web application, there are two main places where you can store cached data:

  1. Your users’ web browsers
  2. A back-end service

1. Browser (HTTP) Caching

The easiest way to start caching data from your website is to tell your users’ browsers to store a local copy of files (HTML, CSS, JavaScript, images, audio, etc.) rather than downloading a new one from the server on every page load. Modern browsers cache some things automatically, but you can control exactly what is cached by including some special information in the response your server sends to the browser that is requesting data.

Every HTTP response (the raw data your server sends to the user’s browser over the internet) contains two main parts: the headers and the body. The headers contain metadata about the file your server is sending, and the body is the file itself. Here is an example of a typical HTTP response a server might send for a CSS file:
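
As a rough illustration, a response like that can be written out as text and split into its two parts in Python. Every value in `raw_response` below is made up for this sketch, and `split_response` is a hypothetical helper, not part of any library.

```python
# A made-up response for a small CSS file (all values are illustrative).
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/css\r\n"
    "Content-Length: 22\r\n"
    "Cache-Control: max-age=120\r\n"
    "\r\n"
    "body { color: black; }"
)


def split_response(raw):
    """Split a raw HTTP response into (status line, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # a blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body


status, headers, body = split_response(raw_response)
print(status)                    # the first line of the response
print(headers["Cache-Control"])  # one piece of metadata from the headers
print(body)                      # the CSS file itself
```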

There are several ways we can use HTTP headers to control how browsers cache files locally. Here are the four main ones:

A) The Expiration Method

Expires: Tue, 01 Jan 2013 04:00:25 GMT

Sometimes you know in advance how long the file will stay unchanged. In this case, you can put a line in the HTTP response header that will tell the browser when the local cached version should expire.
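
If you wanted to generate such a header yourself, a sketch in Python might look like the following. The `expires_header` function is a hypothetical helper written for this post; `format_datetime` comes from Python’s standard library and produces the date format HTTP expects.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now, lifetime):
    """Build an Expires header for a file that stays fresh for `lifetime`."""
    expires_at = now + lifetime
    # usegmt=True produces the "GMT" suffix that HTTP dates use
    return "Expires: " + format_datetime(expires_at, usegmt=True)


# Pretend "now" is fixed so the output is predictable.
now = datetime(2012, 12, 31, 4, 0, 25, tzinfo=timezone.utc)
print(expires_header(now, timedelta(days=1)))
# prints: Expires: Tue, 01 Jan 2013 04:00:25 GMT
```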

B) The Max-Age Method

Cache-Control: max-age=120

Instead of describing the expiration date as an absolute moment in time in the future, you could also use relative time (in seconds). For example, above is the HTTP header you would include if you wanted to cache a file for 2 minutes.
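
To make the idea concrete, here is a simplified sketch of the check a cache could perform. Both function names are invented for this example, and the sketch only looks at max-age; real browsers also take other Cache-Control directives into account.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


def is_fresh(cache_control, age_seconds):
    """True if a copy stored `age_seconds` ago may still be used."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age


print(is_fresh("max-age=120", 60))   # prints: True (one minute old)
print(is_fresh("max-age=120", 121))  # prints: False (just over two minutes)
```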

C) The Last-Modified Method

Last-Modified: Tue, 01 Jan 2013 04:00:25 GMT

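Another way to cache is to keep track of when files are modified. The browser remembers the modification date it received and sends it back with its next request (in an If-Modified-Since header). If the version of the file on the server hasn’t been modified since then, the server replies “304 Not Modified” and the browser reuses its cached copy instead of downloading a new one. All you have to do to enable this is to send along information about when your files were modified in the HTTP header.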
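
The server’s side of that conversation could be sketched like this. The function `respond_to_date` is hypothetical and ignores everything except the two dates; it is meant to show the decision, not a complete server.

```python
from email.utils import parsedate_to_datetime


def respond_to_date(last_modified, if_modified_since=None):
    """Return the status code a server might send for a request.

    Both arguments are HTTP date strings. `if_modified_since` is the date
    the browser sends back from its cached copy (None on a first visit).
    """
    if if_modified_since is not None:
        file_time = parsedate_to_datetime(last_modified)
        cached_time = parsedate_to_datetime(if_modified_since)
        if file_time <= cached_time:
            return 304  # Not Modified: the browser reuses its cached copy
    return 200          # send the full file


file_date = "Tue, 01 Jan 2013 04:00:25 GMT"
print(respond_to_date(file_date))             # prints: 200 (first visit)
print(respond_to_date(file_date, file_date))  # prints: 304 (nothing changed)
```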

D) The E-Tag Method

ETag: "xyzzy"

The e-tag method is very similar to last-modified, except it identifies each version of the file with a unique id rather than a date. The browser sends the e-tag of its cached copy back to the server (in an If-None-Match header). If it matches the current version on the server, the browser keeps using its local copy; if it doesn’t match, the browser downloads a new copy.
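
Here is one way this could work, sketched in Python. Deriving the tag from a hash of the file’s contents is an assumption made for this example (servers are free to generate tags however they like), and both function names are invented.

```python
import hashlib


def make_etag(content):
    """Derive a tag from the file's bytes, so new content gets a new tag."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond_with_etag(content, if_none_match=None):
    """Return (status, body) for a request that may carry If-None-Match."""
    if if_none_match == make_etag(content):
        return 304, b""   # tags match: the cached copy is still good
    return 200, content   # first visit or changed file: send it all


stylesheet = b"body { color: black; }"
tag = make_etag(stylesheet)
print(respond_with_etag(stylesheet, tag)[0])                  # prints: 304
print(respond_with_etag(b"body { color: blue; }", tag)[0])    # prints: 200
```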

Implementing HTTP Caching

In order to gain full control over your implementation of HTTP caching, you’ll need to configure your server to send the proper headers. There are many different ways to do this depending on the type of web server or framework you’re using, so I won’t go into that in this blog post. That being said, most web servers come with some HTTP caching out of the box, so you are probably already using it to some extent.

You can look at a site’s HTTP headers using a tool called curl (“Client for URLs”). On most Mac or Linux machines, it comes pre-installed. Windows folk need to download it.

Once you’ve got curl, you can go to your terminal and run curl -I url, replacing “url” with the appropriate link to the resource whose HTTP headers you want to inspect. It will return just the HTTP headers, so you can look for all the caching methods I mentioned above.
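
If you would rather do the same kind of inspection from Python, a rough equivalent is sketched below. The helper names `fetch_headers` and `caching_headers` are invented for this post; the HEAD request itself uses Python’s standard `urllib` and, like curl -I, asks only for the headers.

```python
from urllib.request import Request, urlopen

# The four headers discussed above, in lowercase for easy comparison.
CACHING_HEADERS = ("expires", "cache-control", "last-modified", "etag")


def fetch_headers(url):
    """Send a HEAD request (headers only) and return them as a dict."""
    with urlopen(Request(url, method="HEAD")) as response:
        return dict(response.headers.items())


def caching_headers(headers):
    """Keep only the caching-related headers; names are case-insensitive."""
    return {name: value for name, value in headers.items()
            if name.lower() in CACHING_HEADERS}
```

Calling `caching_headers(fetch_headers(url))` with a real link in place of `url` should show which of the four methods that resource uses, if any.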

2. Back-end Caching

Browser caching is great for static content like CSS files and images, but sometimes you need a more sophisticated caching mechanism. For example, many big sites, including Facebook and Twitter, use a system called Memcached that keeps a copy of information from their database (slower) in memory (faster). That way they can shift some of the load from their database to the faster alternative. Not all back-end caches sit between the database and the web server, but many do, so consider the following diagram a useful simplification:

Back-end Cache Diagram
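
In code, the flow of requests can be sketched like this. Two plain Python dictionaries stand in for the database and the in-memory cache, and every name and value here is made up; a real application would talk to Memcached through a client library rather than a dictionary.

```python
database = {"user:1": "Nathan"}  # pretend this lookup is slow
cache = {}                       # pretend this is a fast in-memory store
database_reads = 0               # counts trips to the slow database


def get(key):
    """Check the cache first and fall back to the database on a miss."""
    global database_reads
    if key in cache:
        return cache[key]  # cache hit: the database is not touched
    database_reads += 1
    value = database[key]  # cache miss: do the slow lookup
    cache[key] = value     # remember it for next time
    return value


get("user:1")  # first request goes to the database
get("user:1")  # second request is served from the cache
print(database_reads)  # prints: 1
```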

Implementing a back-end cache is only necessary for dynamic web applications with substantial traffic. There are many different caching tools like Memcached, and an even greater number of ways to integrate them into your application. For the purposes of this blog post it would be overkill to dive into the details about how they all work. If you want to learn more, check out the “further reading” section below.

Conclusion

I hope you enjoyed this brief tour of caching! Here is a summary of some of the main points:

Further Reading

If you’re interested in learning more about caching, here are a few of the best resources I found while researching this blog post:


Thanks to Kyle Mulka, Nathan Cahill, and James Kruth for reviewing drafts of this.

PS — If you enjoyed this post and are learning to code (or know someone who is), you might also enjoy a book I’m writing called Enough To Be Dangerous. It’s a step-by-step guide to coding your first web application. Check it out!


Nathan Bashaw

Nathan is the founder of Sandbox Labs — the San Francisco based startup behind Enough To Be Danger.us. Previously, he was employee #2 at Olark.
