One of the primary reasons for establishing a proxy server is to cache documents retrieved from remote hosts. Both forward and reverse proxies can benefit from caching. Forward proxies reduce the bandwidth demands of clients accessing servers elsewhere on the internet by caching frequently accessed pages, which is invaluable for networks with limited bandwidth to the outside world. Reverse proxies, conversely, cache frequently accessed pages on a local server so that it is not subjected to constant requests for static pages when it has more important dynamic queries to process.
Caching is not actually required by proxy servers and is not enabled by the use of the ProxyRequests directive. Rather, caching is implicitly enabled by defining the directory under which cached files are to be stored with CacheRoot:
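As a minimal sketch, assuming the Apache 1.3 mod_proxy syntax in which CacheRoot takes a single directory path, the line might look like the following. The path shown is a hypothetical location, not one given in the text; any directory writable by the user Apache runs as would do.

```apache
# Hypothetical cache location; substitute a directory the server can write to
CacheRoot /usr/local/apache/proxy
```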
Besides the root directory for caching, mod_proxy provides two other directives for controlling the layout of the cache:
CacheDirLevels: defines the number of levels of subdirectories that are created to store cached files. The default is three. To change it to six, we can put:
CacheDirLength: defines the length, in characters, of the directory names used in the cache. The default is 1. It is inadvisable to use names longer than 8 characters on Windows systems because of the problems with long file names on these platforms.
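Assuming the directive takes a single integer argument, as it does in Apache 1.3 mod_proxy, the line would look something like this:

```apache
# Six levels of subdirectories instead of the default three
CacheDirLevels 6
```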
These two directives trade off against each other. A single-letter directory name leaves relatively few permutations for Apache to run through, so a cache intended to store a lot of data will need more directory levels. Conversely, a longer directory name allows many more directories per level and therefore a shallower directory tree, but it can become a performance issue if the number of directories in one level grows large.
Setting the Cache Size
Probably the most important parameter to set for a proxy cache is its size. The default cache size is only 5 kilobytes, so we would usually increase it with the CacheSize directive, which takes a number of kilobytes as a parameter. To set a 100 MB cache, we would put:
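A sketch of that setting, assuming 1 MB is counted as 1024 kilobytes (so 100 MB is 100 x 1024 = 102400 KB):

```apache
# 100 MB expressed in kilobytes (assumes 1 MB = 1024 KB)
CacheSize 102400
```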
However, this in itself means nothing unless Apache is also told to trim down the size of the cache when it exceeds this limit. This is called garbage collection and is governed by the CacheGcInterval directive, which schedules a time period in hours between scans of the cache. To scan and trim down the cache once a day, we would put:
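Since the argument is a number of hours, a once-a-day scan would presumably be written as:

```apache
# Scan and trim the cache every 24 hours
CacheGcInterval 24
```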
The chosen value is a compromise between performance and disk space - if we have a quiet period once a day, it makes sense to trim the cache every 24 hours, but we also have to make sure that the cache can grow above its limit for a day without running into disk space limitations.
We can also schedule more frequent trimming by using a decimal number of hours:
# trim the cache every 75 minutes
CacheGcInterval 1.25
# trim the cache every 12 minutes
CacheGcInterval 0.2
Without a CacheGcInterval directive, the cache will never be trimmed and will continue to grow indefinitely. This is almost certainly a bad idea, so CacheGcInterval should always be set on caching proxies.
Delivering Cached Documents and Expiring Documents from the Cache
Apache will only deliver documents from the cache to clients if they are still valid; otherwise it will fetch a new copy from the remote server and cache it in place of the expired version. Apache also trims the cache based on the validity of the documents in it. Each time the period specified by CacheGcInterval elapses, Apache scans the cache looking for expired documents.
The expiry time of a document can be set in five ways:
The maximum time after which a document automatically expires is set by CacheMaxExpire, which takes a number of hours as an argument. The default period is one day, or 24 hours, which is equivalent to the directive:
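Written out explicitly, and noting that mod_proxy spells the directive CacheMaxExpire, the default would correspond to:

```apache
# Explicit form of the default: documents expire after at most 24 hours
CacheMaxExpire 24
```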
To change this to a week we would put:
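A week is 7 x 24 = 168 hours, so a sketch of the setting, using mod_proxy's CacheMaxExpire spelling, would be:

```apache
# One week expressed in hours (7 * 24)
CacheMaxExpire 168
```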
This time period defines the absolute maximum time a file is considered valid, starting from the time it was stored in the cache. Although other directives can specify shorter times, longer times will always be overridden by CacheMaxExpire.
HTTP documents that do not carry an expiry header can have an estimated expiry time set using the CacheLastModifiedFactor directive. This gives the document an expiry time equal to the time since the file was last modified, multiplied by the specified factor. The factor can be a decimal value, so to set an expiry time of half the age of the document, we would put:
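As a sketch, a factor of one half would be given as a decimal. Under this setting a document last modified ten hours ago would be treated as valid for a further five hours, subject to the maximum expiry time.

```apache
# Expiry estimate = 0.5 * (time since the document was last modified)
CacheLastModifiedFactor 0.5
```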
If the calculated time exceeds the maximum expiration time set by CacheMaxExpire, the maximum expiration time takes precedence, so outlandish values that would result from very old documents are avoided. Likewise, if a factor is not set at all, the document expires when it exceeds the maximum expiry time.
The HTTP protocol supports expiry times directly, but other protocols do not. In these cases, a default expiry time can be specified with CacheDefaultExpire, which takes a number of hours as a parameter. For example, to ensure that cached files fetched with FTP expire in three days, we could put:
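Three days is 3 x 24 = 72 hours, so the directive would presumably read:

```apache
# Default expiry of three days (72 hours) for documents without expiry information
CacheDefaultExpire 72
```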
For this directive to be effective, it has to specify a time period shorter than CacheMaxExpire; if no default expiry time is set, files fetched with protocols other than HTTP automatically expire at the time limit set by CacheMaxExpire.
A special case arises when the proxy receives a content-negotiated document from an HTTP/1.0 source. HTTP/1.1 provides additional information to let a proxy know how valid a content-negotiated document is, but HTTP/1.0 does not. By default, Apache does not cache documents from HTTP/1.0 sources if they are content negotiated unless they come with a header telling Apache it is acceptable to do so. If the remote host is running Apache, it can add this header with the CacheNegotiatedDocs directive. See "Content Negotiation" in Chapter 4 for more details.
Caching Incomplete Requests
Sometimes a client will disconnect from a proxy before it has finished transferring the requested document from the remote server. Ordinarily, Apache will discontinue transferring the document and discard what it has already transferred unless it has already transferred over 90 percent. This percentage can be changed with CacheForceCompletion, which takes a number between 0 and 100 as a percentage. For example, to force the proxy to continue loading a document and cache it if 75 percent or more of it has already been transferred we would put:
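Assuming the percentage is given as a bare number without a percent sign, as in Apache 1.3 mod_proxy, the line would look something like this:

```apache
# Finish fetching and cache the document once at least 75 percent has arrived
CacheForceCompletion 75
```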
A setting of 0 is equivalent to the default, 90. A setting of 100 means Apache will not cache the document unless it completely transfers before the client disconnects.
Disabling Caching for Selected Hosts, Domains, and Documents
Just as NoProxy defines hosts, domains, or words that cause matching URLs not to be passed to remote proxies, NoCache causes documents from hosts, domains, or words that match the URL to remain uncached. For example:
NoCache interactive.alpha-complex.com uncacheddomain.net badword
This will cause the proxy to avoid caching any document from interactive.alpha-complex.com, any host in the domain uncacheddomain.net, and any domain name with the word badword anywhere in it. If any parameter to NoCache resolves to a unique IP address via DNS, Apache will make a note of it at startup and also avoid caching any URL that equates to the same IP address. Caching can also be disabled completely with a wildcard:
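In Apache 1.3 mod_proxy the wildcard is a single asterisk, so a sketch of the line would be:

```apache
# Match every host, so nothing is cached
NoCache *
```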
This is equivalent to commenting out the corresponding CacheRoot directive.
©1999 Wrox Press Limited, US and UK.