Caching on the Server
Although the previous section covered most of the caching mechanisms for both the browser and the web server, there are other ways by which we can cache data on the server. The server is often not the last link in the chain, and it often works with data that it must retrieve from files on the system, or data in a database. Therefore, if we can find efficient ways to cache this data, we can again avoid the expensive operations required to retrieve the data from its original source. With dynamically generated pages, we can often benefit from caching the entire rendered version of the page, instead of re-rendering it on each request. This is useful when we have pages that change only once per hour, or once per day, for example, because we can cache the output of the page once per hour or once per day and use that copy instead of re-rendering the entire page. We can often cache the output of rendering sections of a page as well, so that if certain dynamic sections change only once per hour or day, we can again just output the cached versions of these page fragments instead of having to re-render them on every request.
Different web servers and application frameworks provide different methods for caching content. Older frameworks, such as PHP, JSP, and classic ASP, have no inherent support for caching and require some additional work to effectively cache content. The general workflow is described in Figure 6.1.
Figure 6.1 Activity Diagram of a Common Caching Strategy
When a request for a page hits the web server, it checks a file system cache to see if the page has previously been cached. If the cache exists, the timestamp of the cache is checked, and if the cache is out of date, a new page cache is created and served; otherwise, the cached version is served. The actual implementation is similar for ASP, PHP, or JSP; the PHP implementation might look something like this:
// Create a unique cache page identifier
$page = $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
$cacheFile = 'cache/' . md5($page) . '.cache';
$cacheFile_created = 0;
// Find out when the file was cached from the filesystem
if (@file_exists($cacheFile)) {
    $cacheFile_created = @filemtime($cacheFile);
}
// If page created < a minute ago then read cached file and serve it!
if (time() - 60 < $cacheFile_created) {
    @readfile($cacheFile);
    exit();
}
// Otherwise, buffer the output while the page is rendered
ob_start();
// ... render the page here ...
// Save the buffered output as the new cache file
@file_put_contents($cacheFile, ob_get_contents());
// Output the newly created and cached content
ob_end_flush();
In PHP, we take advantage of the ob_start() and ob_end_flush() functions that turn on and off output buffering, respectively. You can see that we first check if a valid cache exists, and if it does, we call exit() to prevent the entire page from being re-rendered and instead serve up the cached content. If the cache is not valid, we create the page and a new cached file with the contents of the output buffer.
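The strategy in Figure 6.1 is not specific to PHP. As a minimal, framework-neutral sketch in Python, the same check-then-serve-or-regenerate logic can be wrapped in one function. The names `serve_page`, `render`, `cache_dir`, and `max_age` are hypothetical and chosen only for illustration, and SHA-256 stands in for the MD5 hash used for the cache file name.

```python
import hashlib
import os
import time


def serve_page(page_id, render, cache_dir, max_age=60):
    """Return the page body, re-rendering only when the cached copy is stale."""
    # Hash the identifier so that any URL maps to a safe file name.
    name = hashlib.sha256(page_id.encode("utf-8")).hexdigest() + ".cache"
    cache_file = os.path.join(cache_dir, name)

    # Serve the cached copy if it exists and is younger than max_age seconds.
    try:
        if time.time() - os.path.getmtime(cache_file) < max_age:
            with open(cache_file, encoding="utf-8") as fh:
                return fh.read()
    except OSError:
        pass  # No usable cache file; fall through and render.

    # Render the page, write it to a temporary file, then rename it into
    # place so that a reader does not pick up a half-written cache file.
    body = render()
    os.makedirs(cache_dir, exist_ok=True)
    tmp_file = cache_file + ".tmp"
    with open(tmp_file, "w", encoding="utf-8") as fh:
        fh.write(body)
    os.replace(tmp_file, cache_file)
    return body
```

Here `render` is any zero-argument function that produces the page as a string. It is called only when no cached copy exists or the copy is older than `max_age` seconds.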
In ASP.NET, we can do full-page (output) caching and fragment caching, and we can also cache arbitrary data using the powerful built-in Cache object. To enable output caching on an ASP.NET page, we add the @OutputCache page directive to the beginning of the ASPX page, as follows:
<%@OutputCache Duration="60" VaryByParam="none" %>
The Duration attribute specifies, in seconds, how long to cache the page, and the VaryByParam attribute tells the page whether any parameters (either GET or POST) change the resulting output of the page. For example, suppose we have a page that returns the details about a specific product based on the id parameter sent in the request querystring. If we leave VaryByParam as "none" and request the page first for id=1 and then for id=2, the response for the first request is cached and then sent back for the second request. This is not the desired behavior. If we change VaryByParam to "id", the first request creates the response for the product with id=1 and caches it. The second request, on the other hand, is treated as a different page in the cache, and a new response is generated and cached for the product with id=2. If any subsequent requests for either of those products arrive within the next 60 seconds, the proper cached versions are returned. We can also set VaryByParam to "*", which means that every distinct combination of parameter values is cached as a separate page.
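To make the effect of VaryByParam concrete, the following is a rough Python sketch of how a cache key could be derived from the request parameters. It is not ASP.NET's actual implementation: the function `cache_key` is hypothetical, it looks only at the query string (not POST data), and it compares parameter names case-sensitively.

```python
from urllib.parse import parse_qsl, urlsplit


def cache_key(url, vary_by="none"):
    """Build an output-cache key from the parameters that vary the page."""
    parts = urlsplit(url)
    # Sort the parameters so that their order in the URL does not matter.
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    if vary_by == "none":
        selected = []            # every request shares one cache entry
    elif vary_by == "*":
        selected = params        # every parameter is part of the key
    else:
        # A semicolon-separated list of parameter names, e.g. "id;lang".
        names = {name.strip() for name in vary_by.split(";")}
        selected = [(k, v) for k, v in params if k in names]
    return parts.path + "?" + "&".join(f"{k}={v}" for k, v in selected)
```

With `vary_by="none"`, the URLs `/product.aspx?id=1` and `/product.aspx?id=2` map to the same key, which is the undesired behavior described above. With `vary_by="id"` they map to different keys, while unrelated parameters are ignored.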
In ASP.NET, we have control over the headers that are sent back to the client as well, and we can control how the browser caches the contents.
Response.Cache.SetExpires(DateTime.Now.AddMinutes(60));
Response.Cache.SetCacheability(HttpCacheability.Public);
This sets the expiry time for the page to be 1 hour from now, which causes all caches, the browser cache and any intermediate proxy or web caches, to reload the page after 1 hour. The SetCacheability setting tells intermediate proxy caches and the browser cache that the page is public, and the cached version of the page can be sent to anyone who requests the same page.
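For readers who want to see what these settings amount to on the wire, here is a hedged Python sketch that builds comparable header values. The function `caching_headers` is hypothetical, and the `max-age` directive is an addition that does not appear in the listing above; it is the HTTP/1.1 counterpart of the Expires date.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(max_age_seconds, public=True, now=None):
    """Return Cache-Control and Expires values for a cacheable response."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    scope = "public" if public else "private"
    return {
        "Cache-Control": f"{scope}, max-age={max_age_seconds}",
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

Calling `caching_headers(3600)` yields a public response that caches may reuse for one hour, which matches the intent of the two ASP.NET calls.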
We have similar control over the headers that control the caching in PHP, and the following demonstrates how to set the Cache-Control and Expires headers in a PHP page:
<?php header("Cache-Control: no-cache, must-revalidate");
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT"); ?>
In this example, we set the Cache-Control header to "no-cache", which tells proxy caches and the browser not to serve a cached copy without revalidating it. We also set the Expires header to a date in the past, so proxies that do not understand the Cache-Control header still reload the data on the next request because the cached data has already expired. Since PHP 4.2.0, the session_cache_expire and session_cache_limiter functions are also available to manage the HTTP caching headers when using sessions. Both must be called before session_start(). They are used as follows:
// set the cache limiter to 'private'
session_cache_limiter('private');
$cache_limiter = session_cache_limiter();
// set the cache expire to 30 minutes
session_cache_expire(30);
$cache_expire = session_cache_expire();
If headers are manually set in the page, they override the session_cache settings. Because PHP pages are typically different every time, the ETag and Last-Modified headers are not sent with the responses. If your PHP pages do not change on every request, adding an ETag header can reduce the server load. The ETag header can be set using the header function, or there is a library called cgi_buffer (footnote 1) that provides a number of performance-improving HTTP features, including ETag setting and validation through the If-None-Match request header.
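The ETag and If-None-Match exchange can be sketched in a few lines of Python. This is an illustrative outline rather than how cgi_buffer works internally: the functions `make_etag` and `respond` are hypothetical, and the tag is simply a truncated SHA-256 hash of the response body.

```python
import hashlib


def make_etag(body):
    """Derive a quoted entity tag from the response body (bytes)."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a conditional GET."""
    etag = make_etag(body)
    # If-None-Match may carry several comma-separated tags.
    candidates = [tag.strip() for tag in (if_none_match or "").split(",")]
    # Use weak comparison: ignore the W/ prefix on weak validators.
    candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
    if "*" in candidates or etag in candidates:
        # The client's copy is current, so send headers only.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

Note that this sketch still has to produce the body in order to hash it, so what it saves is the transfer of an unchanged page. Avoiding the rendering work as well requires a tag that can be computed without rendering, such as one based on a modification time.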
In ColdFusion, we can set headers using the CFHEADER tag. The following sets the Expires header to one month from the current date:
<cfheader name="Expires"
value="#GetHttpTimeString(DateAdd('m', 1, Now()))#">
1. http://www.mnot.net/cgi_buffer/
Caching in the database
A number of excellent resources on the web provide tools for checking the cacheability of a web site, and can be useful for understanding how the various parts of a web site can be cached (footnote 2).