Professional Apache - Caching
One of the primary reasons for establishing a proxy server is to cache
documents retrieved from remote hosts. Both forward and reverse proxies can
benefit from caching. Forward proxies reduce the bandwidth demands of clients
accessing servers elsewhere on the internet by caching frequently accessed
pages, which is invaluable for networks with limited bandwidth to the outside
world. Reverse proxies, conversely, cache frequently accessed pages on a local
server so that it is not subjected to constant requests for static pages when it
has more important dynamic queries to process.
Enabling Caching
Caching is not actually required by proxy servers and is not enabled by the
use of the ProxyRequests directive. Rather, caching is implicitly enabled by
defining the directory under which cached files are to be stored with
CacheRoot:
CacheRoot /usr/local/apache/proxy/
Besides the root directory for caching, mod_proxy provides two other
directives for controlling the layout of the cache:
CacheDirLevels: defines the number of subdirectories that are created to
store cached files. The default is three. To change it to six we can put:
CacheDirLevels 6
CacheDirLength: defines the length of the directory names used in the cache.
The default is 1. It is inadvisable to use names longer than 8 characters on
Windows systems due to the problems of long file names on these platforms.
These two directives are reciprocal - a single letter directory name leaves
relatively few permutations for Apache to run through, so a cache intended to
store a lot of data will need an increased number of directory levels.
Conversely, a longer directory name allows many more directories per level,
which can be a performance issue if the number of directories becomes large, but
allows a shallower directory tree.
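The trade-off between the two directives can be made concrete with a little arithmetic. The following is a minimal sketch, not Apache's own code: the function names are hypothetical, and the alphabet size of 64 characters per position is an assumption made purely for illustration.

```python
def leaf_directories(levels: int, length: int, alphabet_size: int = 64) -> int:
    """Upper bound on the number of bottom-level directories in the cache.

    alphabet_size is an assumed number of distinct characters that may
    appear at each position of a directory name.
    """
    names_per_level = alphabet_size ** length  # distinct names at one level
    return names_per_level ** levels


def cache_path(hashed_name: str, levels: int, length: int) -> str:
    """Illustrate how leading characters of a hashed name become directories."""
    prefix = levels * length
    if len(hashed_name) <= prefix:
        raise ValueError("hashed name too short for this layout")
    # One directory component per level, each 'length' characters long.
    parts = [hashed_name[i:i + length] for i in range(0, prefix, length)]
    return "/".join(parts + [hashed_name[prefix:]])
```

With these assumptions, three levels of one-character names and one level of three-character names offer the same number of bottom-level directories; the first gives a deep tree with few entries per directory, the second a flat tree with many. For a made-up name "abcdefghij", cache_path gives "a/b/c/defghij" for three levels of length 1.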
Setting the Cache Size
Probably the most important parameter to set for a proxy cache is its size.
The default cache size is only 5 kilobytes, so we would usually increase it with
the CacheSize directive, which takes a number of kilobytes as a parameter. To set
a 100 MB cache, we would put:
CacheSize 102400
However, this in itself means nothing unless Apache is also told to trim down
the size of the cache when it exceeds this limit. This is called garbage
collection and is governed by the CacheGcInterval directive, which schedules a
time period in hours between scans of the cache. To scan and trim down the cache
once a day, we would put:
CacheGcInterval 24
The chosen value is a compromise between performance and disk space - if we
have a quiet period once a day, it makes sense to trim the cache every 24 hours,
but we also have to make sure that the cache can grow above its limit for a day
without running into disk space limitations.
We can also schedule more frequent trimming by using a decimal number of hours:
# trim the cache every 75 minutes
CacheGcInterval 1.25
# trim the cache every 12 minutes
CacheGcInterval 0.2
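Because CacheSize is expressed in kilobytes and CacheGcInterval in hours, it is easy to slip on the unit conversion. The helper below is a small sketch that renders both directives from more familiar units; the function names are hypothetical and nothing here is part of Apache itself.

```python
def cache_size_kb(megabytes: float) -> int:
    """Convert megabytes to the kilobyte count that CacheSize expects."""
    return int(megabytes * 1024)


def gc_interval_hours(minutes: float) -> float:
    """Convert minutes to the (possibly fractional) hours CacheGcInterval expects."""
    return minutes / 60


def cache_directives(megabytes: float, gc_minutes: float) -> str:
    """Render the two directives as configuration lines."""
    return (
        f"CacheSize {cache_size_kb(megabytes)}\n"
        f"CacheGcInterval {gc_interval_hours(gc_minutes):g}"
    )
```

Calling cache_directives(100, 75) reproduces the values used in the examples above: a CacheSize of 102400 and a CacheGcInterval of 1.25.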
Without a CacheGcInterval directive, the cache will never be trimmed and will
continue to grow indefinitely. This is almost certainly a bad idea, so
CacheGcInterval should always be set on caching proxies.
Delivering Cached Documents and Expiring Documents from the Cache
Apache will only deliver documents from the cache to clients if they are
still valid; otherwise it will fetch a new copy from the remote server and cache
it in place of the expired version. Apache also trims the cache based on the
validity of the documents in it. Each time the period specified by
CacheGcInterval elapses, Apache scans the cache looking for expired documents.
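The two behaviours just described, serving only valid documents and discarding expired ones during a scan, can be sketched as follows. This is a simplified model under stated assumptions: the CachedDocument type and both functions are hypothetical, times are plain numbers of hours, and the size-based trimming governed by CacheSize is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedDocument:
    url: str
    expires_at: float  # hypothetical expiry time, in hours on some shared clock


def is_valid(doc: CachedDocument, now: float) -> bool:
    """A document may be served from the cache only while it has not expired."""
    return now < doc.expires_at


def scan_cache(cache: List[CachedDocument], now: float) -> List[CachedDocument]:
    """One garbage-collection pass: keep only the documents that are still valid."""
    return [doc for doc in cache if is_valid(doc, now)]
```

A request for a document that fails is_valid would be fetched again from the remote server, and the fresh copy would replace the expired one.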
The expiry time of a document can be set in five ways:
HTTP defines the Expires: header (present in both HTTP/1.0 and HTTP/1.1) that
a server can use to tell a proxy how long a document is considered valid.
We can set a maximum time after which all cached documents are considered
invalid irrespective of the expiry time set in the Expires: header.
HTTP documents that do not specify an expiry time can have one estimated
based on the time they were last modified.
Non-HTTP documents can have a default expiry time set for them.
Both HTTP/1.0 and HTTP/1.1 hosts may send a header telling the proxy whether
or not the document can be cached, though the header differs between the two.
The maximum time after which a document automatically expires is set by
CacheMaxExpire, which takes a number of hours as an argument. The default
period is one day, or 24 hours, which is equivalent to the directive:
CacheMaxExpire 24
To change this to a week we would put:
CacheMaxExpire 168
This time period defines the absolute maximum time a file is considered
valid, starting from the time it was stored in the cache. Although other
directives can specify shorter times, longer times will always be overridden by
CacheMaxExpire.
HTTP documents that do not carry an expiry header can have an estimated
expiry time set using the CacheLastModifiedFactor directive. This gives the
document an expiry time equal to the time since the file was last modified,
multiplied by the specified factor. The factor can be a decimal value, so to set
an expiry time of half the age of the document, we would put:
CacheLastModifiedFactor 0.5
If the calculated time exceeds the maximum expiration time set by
CacheMaxExpire, the maximum expiration time takes precedence, so outlandish
values that would result from very old documents are avoided. Likewise, if a
factor is not set at all, the document expires when it exceeds the maximum
expiry time.
The HTTP protocol supports expiry times directly, but other protocols do not.
In these cases, a default expiry time can be specified with CacheDefaultExpire,
which takes a number of hours as a parameter. For example, to ensure that cached
files fetched with FTP expire in three days, we could put:
CacheDefaultExpire 72
For this directive to be effective, it has to specify a time period shorter
than CacheMaxExpire; if no default expiry time is set, files fetched with
protocols other than HTTP automatically expire at the time limit set by
CacheMaxExpire.
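The expiry rules described in this section can be pulled together in one function. This is a sketch of the rules as stated in the text, not of Apache's source code: the function and parameter names are hypothetical, and all times are hours counted from the moment the document was cached.

```python
from typing import Optional


def validity_hours(
    max_expire: float = 24.0,                      # CacheMaxExpire
    expires_header: Optional[float] = None,        # validity given by Expires:, if any
    hours_since_modified: Optional[float] = None,  # age taken from Last-Modified, if any
    last_modified_factor: Optional[float] = None,  # CacheLastModifiedFactor, if set
    default_expire: Optional[float] = None,        # CacheDefaultExpire, if set
    is_http: bool = True,
) -> float:
    """Return how long a cached document is treated as valid, in hours."""
    if is_http and expires_header is not None:
        candidate = expires_header
    elif is_http and hours_since_modified is not None and last_modified_factor is not None:
        # Estimate: time since last modification multiplied by the factor.
        candidate = hours_since_modified * last_modified_factor
    elif not is_http and default_expire is not None:
        candidate = default_expire
    else:
        # Nothing else applies, so the document lives until the maximum.
        candidate = max_expire
    # The maximum always overrides any longer period.
    return min(candidate, max_expire)
```

For example, a document last modified 10 hours ago with a factor of 0.5 is valid for 5 hours, while one last modified 1000 hours ago is capped at the 24-hour maximum. An FTP document with a default of 72 hours only gets its 72 hours if the maximum has been raised above that, for instance to 168.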
A special case arises when the proxy receives a content-negotiated document
from an HTTP/1.0 source. HTTP/1.1 provides additional information to let a proxy
know how valid a content-negotiated document is, but HTTP/1.0 does not. By
default, Apache does not cache documents from HTTP/1.0 sources if they are
content negotiated unless they come with a header telling Apache it is
acceptable to do so. If the remote host is running Apache, it can add this
header with the CacheNegotiatedDocs directive - see "Content Negotiation" in
Chapter 4 for more details.
Caching Incomplete Requests
Sometimes a client will disconnect from a proxy before it has finished
transferring the requested document from the remote server. Ordinarily, Apache
will discontinue transferring the document and discard what it has already
transferred unless it has already transferred over 90 percent. This percentage
can be changed with CacheForceCompletion, which takes a number between 0 and 100
as a percentage. For example, to force the proxy to continue loading a document
and cache it if 75 percent or more of it has already been transferred we would
put:
CacheForceCompletion 75
A setting of 0 is equivalent to the default, 90. A setting of 100 means
Apache will not cache the document unless it completely transfers before the
client disconnects.
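The decision Apache makes when a client disconnects can be modelled in a few lines. This is a minimal sketch with a hypothetical function name; treating a transfer that has reached exactly the threshold as one that continues is an assumption, since the text does not say which way that boundary falls.

```python
def continue_after_disconnect(percent_received: float, force_completion: int = 90) -> bool:
    """Decide whether to finish fetching (and cache) a document once the client has gone.

    force_completion models the CacheForceCompletion value (0 to 100).
    """
    if not 0 <= force_completion <= 100:
        raise ValueError("CacheForceCompletion must be between 0 and 100")
    # A setting of 0 falls back to the default of 90.
    threshold = 90 if force_completion == 0 else force_completion
    return percent_received >= threshold
```

With the default, a transfer abandoned at 80 percent is discarded; with CacheForceCompletion 75 the same transfer is completed and cached. With a setting of 100, any disconnect before the transfer is complete means the document is not cached.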
Disabling Caching for Selected Hosts, Domains, and
Documents
Just as NoProxy defines hosts, domains, or words that cause matching URLs not
to be passed to remote proxies, NoCache prevents caching of documents whose URLs
match the listed hosts, domains, or words. For example:
NoCache interactive.alpha-complex.com uncacheddomain.net badword
This will cause the proxy to avoid caching any document from
interactive.alpha-complex.com, any host in the domain uncacheddomain.net, and any
domain name with the word badword anywhere in it. If any parameter to NoCache
resolves to a unique IP address via DNS, Apache will make a note of it at
startup and also avoid caching any URL that equates to the same IP address.
Caching can also be disabled completely with a wildcard:
NoCache *
This is equivalent to commenting out the corresponding CacheRoot
directive.
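The matching behaviour of NoCache can be approximated with a simple substring test. This is only a rough model under stated assumptions: the function name is hypothetical, matching is done case-insensitively on the host name alone, and the IP address comparison described above is omitted.

```python
from typing import Iterable


def is_uncached(hostname: str, no_cache: Iterable[str]) -> bool:
    """Return True if documents from this host should not be cached."""
    host = hostname.lower()
    # "*" disables caching entirely; any other entry matches if it
    # appears anywhere in the host name.
    return any(entry == "*" or entry.lower() in host for entry in no_cache)
```

Given the three parameters from the example, this model leaves www.uncacheddomain.net uncached but still caches www.alpha-complex.com, because only the interactive host was listed.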
©1999 Wrox Press Limited, US and UK.