
Caching - Administration

This excerpt from Wrox Press Ltd.'s Professional Apache covers Chapter 8 - Improving Apache's Performance. It explains how to configure Apache for peak performance using caching and clustering, plus much more.

TABLE OF CONTENTS:
  1. Professional Apache
  2. Apache's Performance Directives
  3. Configuring Apache for Better Performance
  4. Proxying
  5. Caching
  6. Fault Tolerance and Clustering
By: Dev Shed
May 18, 2000


One of the primary reasons for establishing a proxy server is to cache documents retrieved from remote hosts. Both forward and reverse proxies can benefit from caching. Forward proxies reduce the bandwidth demands of clients accessing servers elsewhere on the internet by caching frequently accessed pages, which is invaluable for networks with limited bandwidth to the outside world. Reverse proxies, conversely, cache frequently accessed pages on a local server so that it is not subjected to constant requests for static pages when it has more important dynamic queries to process.

Enabling Caching

Caching is not actually required by proxy servers and is not enabled by the use of the ProxyRequests directive. Rather, caching is implicitly enabled by defining the directory under which cached files are to be stored with CacheRoot:


CacheRoot /usr/local/apache/proxy/
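As a minimal sketch of how the two settings sit side by side in a forward proxy (assuming mod_proxy is already loaded and reusing the path from the example above), proxying and caching are switched on by separate directives:

```apache
# Act as a forward proxy; on its own this does no caching
ProxyRequests On

# Defining a cache directory is what switches caching on
CacheRoot /usr/local/apache/proxy
```

The cache directory needs to be writable by the user the server runs as, otherwise nothing can be stored in it.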

Apart from the root directory for caching, mod_proxy provides two other directives for controlling the layout of the cache:

CacheDirLevels: defines the number of subdirectories that are created to store cached files. The default is three. To change it to six we can put:


CacheDirLevels 6

CacheDirLength: defines the length of the directory names used in the cache. The default is 1. It is inadvisable to use names longer than 8 on Windows systems due to the problems of long file names on these platforms.

These two directives are reciprocal - a single letter directory name leaves relatively few permutations for Apache to run through, so a cache intended to store a lot of data will need an increased number of directory levels. Conversely, a longer directory name allows many more directories per level, which can be a performance issue if the number of directories becomes large, but allows a shallower directory tree.
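To illustrate the trade-off, here is a sketch of two alternative layouts. The specific values are illustrative assumptions rather than recommendations, and only one pair would be active at a time:

```apache
# Layout A: many levels of single-character names (deep, narrow tree)
CacheDirLevels 6
CacheDirLength 1

# Layout B: fewer levels of longer names (shallow, wide tree)
# CacheDirLevels 2
# CacheDirLength 3
```

Both layouts spend six characters on directory names in total, so, assuming every character is drawn from the same alphabet, they address the same number of lowest-level directories. They differ only in whether that spread comes from depth or from the number of entries per level.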

Setting the Cache Size

Probably the most important parameter to set for a proxy cache is its size. The default cache size is only 5 kilobytes, so we would usually increase it with the CacheSize directive, which takes a number of kilobytes as a parameter. To set a 100 MB cache, we would put:


CacheSize 102400

However, this in itself means nothing unless Apache is also told to trim down the size of the cache when it exceeds this limit. This is called garbage collection and is governed by the CacheGcInterval directive, which schedules a time period in hours between scans of the cache. To scan and trim down the cache once a day, we would put:


CacheGcInterval 24

The chosen value is a compromise between performance and disk space - if we have a quiet period once a day, it makes sense to trim the cache every 24 hours, but we also have to make sure that the cache can grow above its limit for a day without running into disk space limitations.
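A sketch of the two directives used together, with the disk headroom spelled out. The growth figure in the comments is a hypothetical assumption for the sake of the arithmetic, not a measured value:

```apache
# Upper target for the cache: 100 MB, given in kilobytes
CacheSize 102400

# Scan and trim once a day; between scans the cache may exceed CacheSize
CacheGcInterval 24

# Headroom estimate (hypothetical traffic):
#   if about 20 MB of new documents are cached per day,
#   the cache can reach roughly 100 MB + 20 MB = 120 MB before the next trim,
#   so the disk holding CacheRoot needs at least that much free space.
```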

We can also schedule scans at fractional intervals, including more often than once an hour, by using a decimal number:


# trim the cache every 75 minutes
CacheGcInterval 1.25

# trim the cache every 12 minutes
CacheGcInterval 0.2

Without a CacheGcInterval directive, the cache will never be trimmed and will continue to grow indefinitely. This is almost certainly a bad idea, so CacheGcInterval should always be set on caching proxies.

Delivering Cached Documents and Expiring Documents from the Cache

Apache will only deliver documents from the cache to clients if they are still valid; otherwise it will fetch a new copy from the remote server and cache it in place of the expired version. Apache also trims the cache based on the validity of the documents in it. Each time the period specified by CacheGcInterval elapses, Apache scans the cache looking for expired documents.

The expiry time of a document can be set in five ways:

  • HTTP defines the Expires: header (available in both HTTP/1.0 and HTTP/1.1), which a server can use to tell a proxy when a document stops being considered valid.

  • We can set a maximum time after which all cached documents are considered invalid irrespective of the expiry time set in the Expires: header.

  • HTTP documents that do not specify an expiry time can have one estimated based on the time they were last modified.

  • Non-HTTP documents can have a default expiry time set for them.

  • Both HTTP/1.0 and HTTP/1.1 hosts may send a header telling the proxy whether or not the document can be cached, though the header differs between the two.

    The maximum time after which a document automatically expires is set by CacheMaxExpire, which takes a number of hours as an argument. The default period is one day, or 24 hours, which is equivalent to the directive:


    CacheMaxExpire 24

    To change this to a week we would put:


    CacheMaxExpire 168

    This time period defines the absolute maximum time a file is considered valid, starting from the time it was stored in the cache. Although other directives can specify shorter times, longer times will always be overridden by CacheMaxExpire.

    HTTP documents that do not carry an expiry header can have an estimated expiry time set using the CacheLastModifiedFactor directive. This gives the document an expiry time equal to the time since the file was last modified, multiplied by the specified factor. The factor can be a decimal value, so to set an expiry time of half the age of the document, we would put:


    CacheLastModifiedFactor 0.5

    If the calculated time exceeds the maximum expiration time set by CacheMaxExpire, the maximum expiration time takes precedence, so outlandish values that would result from very old documents are avoided. Likewise, if a factor is not set at all, the document expires when it exceeds the maximum expiry time.
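    The interaction between the factor and the maximum can be sketched with a small worked example. The document ages in the comments are hypothetical and chosen only to show the arithmetic described above:

    ```apache
    # Cap every cached document at one week (168 hours)
    CacheMaxExpire 168

    # Estimate expiry as half the document's age when no expiry header is sent
    CacheLastModifiedFactor 0.5

    # Worked examples (hypothetical ages):
    #   last modified 48 hours ago   -> 48 * 0.5   = 24 hours
    #   last modified 1000 hours ago -> 1000 * 0.5 = 500 hours, capped at 168
    ```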

    The HTTP protocol supports expiry times directly, but other protocols do not. In these cases, a default expiry time can be specified with CacheDefaultExpire, which takes a number of hours as a parameter. For example, to ensure that cached files fetched with FTP expire in three days, we could put:


    CacheDefaultExpire 72

    For this directive to be effective, it has to specify a time period shorter than CacheMaxExpire; if no default expiry time is set, files fetched with protocols other than HTTP automatically expire at the time limit set by CacheMaxExpire.
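    Because the default maximum described earlier is 24 hours, a 72-hour default would be cut short unless the maximum is raised as well. A minimal sketch, assuming a one-week maximum is acceptable for the site:

    ```apache
    # Raise the absolute maximum above the non-HTTP default (default is 24)
    CacheMaxExpire 168

    # Documents fetched with non-HTTP protocols such as FTP expire after three days
    CacheDefaultExpire 72
    ```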

    A special case arises when the proxy receives a content-negotiated document from an HTTP/1.0 source. HTTP/1.1 provides additional information to let a proxy know how valid a content-negotiated document is, but HTTP/1.0 does not. By default, Apache does not cache documents from HTTP/1.0 sources if they are content negotiated unless they come with a header telling Apache it is acceptable to do so. If the remote host is running Apache, it can add this header with the CacheNegotiatedDocs directive - see "Content Negotiation" in Chapter 4 for more details.

    Caching Incomplete Requests

    Sometimes a client will disconnect from a proxy before it has finished transferring the requested document from the remote server. Ordinarily, Apache will discontinue transferring the document and discard what it has already transferred unless it has already transferred over 90 percent. This percentage can be changed with CacheForceCompletion, which takes a number between 0 and 100 as a percentage. For example, to force the proxy to continue loading a document and cache it if 75 percent or more of it has already been transferred we would put:


    CacheForceCompletion 75

    A setting of 0 is equivalent to the default, 90. A setting of 100 means Apache will not cache the document unless it completely transfers before the client disconnects.

    Disabling Caching for Selected Hosts, Domains, and Documents

    Just as NoProxy defines hosts, domains, or words that cause matching URLs not to be passed to remote proxies, NoCache causes documents from hosts, domains, or words that match the URL to remain uncached. For example:


    NoCache interactive.alpha-complex.com uncacheddomain.net badword

    This will cause the proxy to avoid caching any document from interactive.alpha-complex.com, any host in the domain uncacheddomain.net, and any host name with the word badword anywhere in it. If any parameter to NoCache resolves to a unique IP address via DNS, Apache will make a note of it at startup and also avoid caching any URL that equates to the same IP address. Caching can also be disabled completely with a wildcard:


    NoCache *

    This is equivalent to commenting out the corresponding CacheRoot directive.
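    Pulling the directives from this section together, a caching proxy configuration might look something like the following sketch. The values are the ones used in the examples above, not tuned recommendations, and the combination is an assumption about a typical setup rather than a configuration taken from the book:

    ```apache
    # Forward proxy with caching enabled by CacheRoot
    ProxyRequests On
    CacheRoot /usr/local/apache/proxy

    # Cache layout (these are the defaults)
    CacheDirLevels 3
    CacheDirLength 1

    # 100 MB cache, trimmed once a day
    CacheSize 102400
    CacheGcInterval 24

    # Expiry: one-week ceiling, half-age estimate, three days for non-HTTP
    CacheMaxExpire 168
    CacheLastModifiedFactor 0.5
    CacheDefaultExpire 72

    # Finish and cache a transfer if 75 percent has already arrived
    CacheForceCompletion 75

    # Never cache documents from this host
    NoCache interactive.alpha-complex.com
    ```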

    © 1999 Wrox Press Limited, US and UK.



     
     