Professional Apache - Configuring Apache for Better Performance
Many aspects of Apache's general configuration can have important performance
implications if set without regard to their processing cost.
Directives That Affect Performance
There are a large number of directives that can have a beneficial or adverse
effect on performance, depending on how they are used. Some of these are
obvious; others rather less so:
DNS and Host Name Lookups
Any use of DNS significantly affects Apache's performance. In particular, use
of the following two directives should be avoided if possible:
HostNameLookups on/off/double
This allows Apache to log information based on the host name rather than the
IP address, but it is very time consuming, even though Apache caches DNS results
for performance. Log analyzers like Analog, discussed in Chapter 9, do their own
DNS resolution when it comes to generating statistics from the log at a later
point, so there is little to be gained from forcing the running server to do it
on the fly.
UseCanonicalName on/off/dns
With the dns setting, Apache deduces the name of a server from its IP address,
rather than generating it from the ServerName and Port directives
(UseCanonicalName on) or just accepting the client value (UseCanonicalName off).
This can occasionally be useful for things like mass virtual hosting with
mod_vhost_alias. Because it only caches the names of hosts being served by
Apache rather than the whole Internet, it is less demanding than HostNameLookups,
but even so, if it is avoidable, avoid it.
In addition, any use of a host name, whole or partial, may cause DNS lookups
to take place, either from name to IP address or IP address to name. This
affects the allow and deny directives in mod_access, ProxyBlock, NoProxy
and NoCache in mod_proxy, and so on.
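As a rough sketch of the advice in this section, the fragment below turns off both kinds of lookup and controls access by partial IP address rather than by host name, so that no DNS query is needed to evaluate the rule. The directory path and the 192.168 network are illustrative assumptions, not values Apache requires.

```apache
# Log IP addresses only; names can be resolved later by a log analyzer
HostNameLookups off
# Build self-referential URLs from ServerName and Port, not from DNS
UseCanonicalName on

# Hypothetical directory: access is decided by address, not by host name
<Directory /home/www/alpha-complex/internal>
Order deny,allow
Deny from all
Allow from 192.168
</Directory>
```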
Following Symbolic Links and Permission Checking
Apache can be told to follow or refuse to follow symbolic links with the
FollowSymLinks option. Unless enabled, each time Apache retrieves a file or runs
a CGI script, it must spend extra time checking the entire path, from the root
directory down, to see if any parent directories (or the file itself) are
symbolic links.
Alternatively, if symbolic links are enabled with SymLinksIfOwnerMatch,
Apache will follow links, but only if the target file or directory is owned by
the same user as the link itself. This also causes Apache to check the entire
path for symbolic links and, in addition, compare the ownership of each link
with that of its target.
For maximum performance, always specify FollowSymLinks and never
SymLinksIfOwnerMatch:
Options FollowSymLinks
However, these options exist to improve security, and this strategy is the
most permissive, which may be unpalatable to administrators more worried about
security than performance.
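Where neither extreme is acceptable, one possible compromise is to pay for the ownership check only where untrusted users can create links. The sketch below assumes user directories live under /home/*/public_html; that path is an illustration, not something the text above prescribes.

```apache
# Fast path for the server's own content: follow links without checking
<Directory />
Options FollowSymLinks
</Directory>

# Hypothetical user directories: only follow links whose owner matches
<Directory /home/*/public_html>
Options SymLinksIfOwnerMatch
</Directory>
```

Requests for files under the user directories still incur the full path check, but the rest of the site does not.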
Caching Dynamic Content
Normally, Apache will not send information to proxies telling them to cache
documents if they have been generated dynamically. The burden on the server can
therefore be considerably reduced by using mod_expires to force an
expiration time onto documents, even if it is very short:
ExpiresActive on
ExpiresByType text/html A600
An expiry of 600 seconds would suit a server that updates an information
page, such as a stock market price page, every ten minutes. Even with an expiry
as short as this, a frequently accessed page saves us a lot of hits if clients
can get it from a proxy instead.
Even so, some proxies will not accept documents they think are generated
dynamically, requiring us to fool them by disguising CGI scripts as ordinary
HTML:
RewriteEngine on
RewriteRule ^(/pathtocgi/.*)\.html$ $1.cgi [T=application/x-httpd-cgi]
Caching Negotiated Content
HTTP/1.1 clients already have sufficient information to know how and when to
cache documents delivered by content negotiation. HTTP/1.0 proxies however do
not, so to make them cache negotiated documents we can use:
CacheNegotiatedDocs
This can have unexpected side effects if we are a multilingual site, however,
since clients may get the wrong page. It should therefore be used with caution,
if at all. The number of HTTP/1.0 clients affected is small and decreasing, so
this can usually be ignored.
A different aspect of content negotiation arises when the configured directory
index is specified without a suffix (thereby causing content negotiation to be
performed on it). Since index files are very common URLs for clients to
retrieve, it is always better to specify a list, even if most of them don't
exist, than have Apache generate an on-the-fly map with MultiViews. For example,
don't put:
DirectoryIndex index
Instead put something like:
DirectoryIndex index.html index.htm index.shtml index.cgi
Logging
One of the biggest users of disk and CPU time is logging. It therefore pays
not to log information that we don't care about or, if we really want to squeeze
performance from the server, not to log at all. It is inadvisable not to have an
error log, but we can disable the access log by simply not defining one.
Otherwise, we can minimize the level of error logging with the LogLevel directive:
LogLevel error
An alternative approach is to put the log on a different server, either by
NFS mounting the logging directory onto the web server or, preferably, using the
system log daemon to do it for us. NFS is not well known for its performance,
and it introduces security risks by making other servers potentially visible to
users on the web server.
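A minimal sketch of the system log approach follows. The syslog facility names, the path to the logger program, and the common log format nickname are assumptions about the local system; forwarding the messages to another host is then a matter for the syslog daemon's own configuration rather than Apache's.

```apache
# Send the error log to the system log daemon under the local7 facility
ErrorLog syslog:local7
LogLevel error

# Pipe the access log to syslog via the logger utility
# (assumes logger is at /usr/bin/logger and that the "common" format is defined)
CustomLog "|/usr/bin/logger -t apache -p local6.info" common
```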
Session Tracking
Any kind of session tracking is time consuming, first because Apache is
responsible for checking for cookies and/or URL elements and setting them if
missing, and second, because for tracking to be useful, it has to be logged
somewhere, creating additional work for Apache. The bottom line is not to use
modules like mod_usertrack or mod_session unless it is absolutely
necessary, and even then to use Directory, Location, or Files containers to limit
their scope.
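For instance, with mod_usertrack the tracking cookie can be confined to the one part of the site that needs it. This is only a sketch; the /shop location is a hypothetical example.

```apache
# No tracking cookies for the site as a whole
CookieTracking off

# Hypothetical area where sessions actually matter
<Location /shop>
CookieTracking on
</Location>
```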
.htaccess Files
If AllowOverride is set to anything other than None, Apache will check for
directives in .htaccess files for each directory from the root all the way down
to the directory in which the requested resource resides, after aliasing has
been taken into account. This can be extremely time consuming since Apache does
this check every time a URL is requested, so unless absolutely needed, always
put:
# AllowOverride is directory scope only, so we use the root directory
<Directory />
AllowOverride None
</Directory>
This also has the side effect of making the server more secure. Even if we do
wish to allow overrides in particular places, this is a good directive to have
in the server-level configuration to prevent Apache searching all the
directories from the root down. By enabling overrides only in the directories
that are needed, Apache will only search a small part of the pathname, rather
than the whole chain of directories.
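A sketch of this arrangement might look like the following, where the members directory and the choice of AuthConfig are illustrative assumptions:

```apache
# Default for the whole filesystem: never look for .htaccess files
<Directory />
AllowOverride None
</Directory>

# Hypothetical directory where authentication overrides are permitted
<Directory /home/www/alpha-complex/members>
AllowOverride AuthConfig
</Directory>
```

With this in place, Apache looks for .htaccess files only in the members directory and below it.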
Extended Status
mod_status allows an extended status page to be generated if the
ExtendedStatus directive is set to on. However, this causes Apache to make two
calls to the operating system for time information on each and every client
request. Time calls are one of the most expensive system calls on any platform,
so this can cause significant performance loss, especially as the directive is
only allowed at the server level and not on a per-virtual-host basis. The
solution is simply not to enable ExtendedStatus.
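The basic status page remains available without the extended information. In the sketch below, the /server-status location and the restriction to the local address are assumptions; the access rule uses an IP address rather than a host name to avoid a DNS lookup.

```apache
# Keep the cheap, basic status page only
ExtendedStatus off

<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 127.0.0.1
</Location>
```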
Rewriting URLs
Any use of mod_rewrite's URL rewriting capabilities can cause
significant performance loss, especially for complex rewriting strategies. The
RewriteEngine directive can be specified on a per-directory or per-virtual host
basis, so it is worth enabling and disabling mod_rewrite selectively if
the rules are complex and needed only in some cases.
In addition, certain rules can cause additional performance problems by
making internal HTTP requests to the server. Pay special attention to the NS
flag, and be wary of using the -F and especially -U conditional
tests.
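As an illustration of both points, the sketch below enables rewriting for a single virtual host only and marks its rule with the NS flag so that it is skipped for internal subrequests. The address, host name, and paths are hypothetical.

```apache
# Rewriting stays off for the main server
RewriteEngine off

<VirtualHost 192.168.1.10>
ServerName www.alpha-complex.com
RewriteEngine on
# NS: do not apply this rule to internal subrequests; L: stop once it matches
RewriteRule ^/oldnews/(.*)$ /news/$1 [NS,L]
</VirtualHost>
```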
Large Configuration Files
Lastly, the mere fact of a configuration file being large can cause Apache to
respond more sluggishly. Modules like mod_rewrite can benefit performance
by reducing the number of lines needed to achieve a desired effect. The
mod_vhost_alias module is also particularly useful for servers that need
to host large numbers of virtual hosts.
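To give an idea of the saving, the following sketch serves any number of name-based hosts without a single VirtualHost container. It assumes each host's files live in a directory named after the host under /home/www, which is an illustrative layout only.

```apache
# Take the server name from the client's Host: header
UseCanonicalName off
# %0 expands to the full host name requested by the client
VirtualDocumentRoot /home/www/%0/html
VirtualScriptAlias /home/www/%0/cgi-bin
```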
Performance Tuning CGI
Any script or application intended for use as a CGI script should already be
written with performance in mind; this means not consuming excessive quantities
of memory or CPU time, generating the output as rapidly as possible, and caching
the results wherever possible so they can be returned faster if the conditions
allow it.
In addition, Apache defines three directives for controlling what CGI
scripts are allowed to get away with:
| RLimitCPU | controls how much CPU time is allowed |
| RLimitMEM | controls how much memory can be allocated |
| RLimitNPROC | controls how many CGI instances can run simultaneously |
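Each of these directives takes a soft limit followed by an optional maximum. The values below are arbitrary illustrations rather than recommendations:

```apache
# Soft limit of 10 CPU seconds per process, hard limit of 20
RLimitCPU 10 20
# Soft limit of 8MB of memory per process, hard limit of 16MB (values in bytes)
RLimitMEM 8388608 16777216
# Soft limit of 10 processes, hard limit of 20
RLimitNPROC 10 20
```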
All these directives are described in more detail in Chapter 6. A better
approach is to write dynamic content applications in a more efficient way to
take better advantage of Apache. The most obvious option is FastCGI, also
covered in Chapter 6. Perl programmers will also want to check out mod_perl
in Chapter 11.
Additional Directives for Tuning Performance
Although not part of the standard Apache executable, there are several
modules, both included with Apache and third-party, designed to improve server
performance in various ways:
MMapFile
MMapFile is supplied by mod_mmap_static, an experimental UNIX specific
module supplied as standard with Apache but not compiled or enabled by default.
When active, it allows nominated files to be memory mapped, if the UNIX platform
supports it. Memory-mapped files are kept in memory permanently, allowing Apache
to deliver them to clients rapidly, without retrieving them from the disk first.
For example, to map the index page and a banner logo so they are stored in
memory, we might put:
MMapFile /home/www/alpha-complex/index.html /home/www/alpha-complex/banner.gif
This will only work for files that are static and present on the filing
system - dynamically generated content and CGI scripts will not work with
MMapFile. To cache CGI scripts, use the FastCGI module or mod_perl and
Apache::Registry (for Perl scripts).
The MMapFile directive is not flexible in its syntax and does not allow wildcards.
There is also no MMapDirectory equivalent to map groups of files at once.
It is important to realize that once a file is mapped, it will never be
retrieved from disk again, even if it changes. Apache must be restarted
(preferably gracefully with apachectl graceful) for changed files to be remapped
into memory.
mod_bandwidth
mod_bandwidth is available from the contributed modules archive on any
Apache mirror, and in addition, a current version can be found at
http://www.cohprog.com/. It provides Apache with the ability to limit the amount
of data sent out per second based on the domain or IP address of the remote
client or, alternatively, the size of the file requested.
Bandwidth limits may also be used to divide available bandwidth according to
the number of clients connecting, allowing a service to be maintained to all
clients even if there is theoretically insufficient bandwidth for them.
As it is a contributed module, mod_bandwidth is not enabled by default
and needs to be added to Apache in the usual way - by either rebuilding the
server or building and installing it as a dynamic module with the apxs utility.
Once installed, bandwidth limiting can be enabled with:
BandWidthModule on
Bandwidth limits are configured to work on a per-directory basis, allowing a
server to customize different parts of a web site with different bandwidth
restrictions. For example, we can limit bandwidth usage on the non-secure part
of a web site, ensuring that traffic to our secure online ordering system always
has bandwidth available to it.
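A sketch of that arrangement, using the BandWidth directive described in the next section, might look like this. The two directory paths and the 4096 byte rate are assumptions made for illustration.

```apache
BandWidthModule on

# Hypothetical public area: every client is limited to 4096 bytes per second
<Directory /home/www/alpha-complex/public>
BandWidth all 4096
</Directory>

# Hypothetical secure ordering area: a rate of 0 means no limit
<Directory /home/www/alpha-complex/secure>
BandWidth all 0
</Directory>
```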
Limiting Bandwidth Based on the Client
Once enabled, bandwidth limitations may be set with:
<Directory />
BandWidth localhost 0
BandWidth friendly.com 4096
BandWidth 192.168 512
BandWidth all 2048
</Directory>
This tells Apache not to limit local requests (potentially from CGI scripts)
by setting a value of 0, to limit internal network clients to 512 bytes per
second, to allow a favored domain 4k per second and to allow all other hosts 2k
with the special all keyword. The order is important as the first matching
directive will be used; if the friendly.com domain resolved to the network
address 192.168.30.0 it would be overridden by the directive for 192.168 if it
were placed after it. Similarly, if a client from 192.168.0.0 happened to be in
the friendly.com domain, they'd get 4k access.
Limiting Bandwidth Based on File Size
Bandwidth limits can also be set on file size with the LargeFileLimit
directive, allowing large files to be sent out more gradually than small ones.
This can be invaluable when large file transfers are being carried out on the
same server as ordinary static page requests. If a LargeFileLimit and BandWidth
directive apply to the same URL then the smaller of the two is selected.
The LargeFileLimit directive takes two parameters, a file size in kilobytes and a
transfer rate in bytes per second. Several directives can be cascaded to produce
a graded limit; for example:
<Directory /home/www/alpha-complex>
LargeFileLimit 50 8192
LargeFileLimit 1024 4096
LargeFileLimit 2048 2048
</Directory>
This tells Apache not to limit files smaller than 50kb, generally
corresponding to HTML pages and small images, to limit files from 50kb up to 1Mb
to 8kb per second, and files between 1Mb and 2Mb to 4k per second. Files larger
than 2Mb are limited to 2k per second. As with the BandWidth directive, order is
important: the first directive that has a file size greater than the file
requested will be used, so directives must be given in smallest to largest order
to work.
If more than one client is connected at the same time, mod_bandwidth
also treats the bandwidth limits as proportional values and shares out the
allowed bandwidth according to the limit value for each client; if ten
clients all connect with a total bandwidth limit of 4096 bytes per second, each
client is allocated about 410 bytes per second (4096 / 10).
Minimum Bandwidth and Dividing Bandwidth Between Clients
Bandwidth is normally shared between clients by mod_bandwidth, based
on their individual bandwidth settings. So, if two clients both have bandwidth
limits of 4k per second, mod_bandwidth divides it between them, giving
each client 2k per second. However, the allocated bandwidth will never drop
below the minimum bandwidth set by MinBandWidth, which defaults to 256 bytes per
second:
MinBandWidth all 256
MinBandWidth takes a domain name or IP address as a first parameter with the
same meaning as BandWidth. Just as with BandWidth, it is also applied in order
with the first matching directive being used:
<Directory />
BandWidth localhost 0
BandWidth friendly.com 4096
MinBandWidth friendly.com 2096
BandWidth 192.168 512
BandWidth all 2048
MinBandWidth all 512
</Directory>
Bandwidth allocation can also be disabled entirely, using a special rate of
-1. This causes the limits defined by BandWidth and LargeFileLimit to be taken
literally, rather than relative values to be applied in proportion when multiple
clients connect. To disable all allocation specify:
MinBandWidth all -1
In this case, if ten clients all connect with a limit of 4096 bytes per
second, mod_bandwidth will allow 4096 bytes per second for all clients,
rather than dividing the bandwidth between them.
Transmission Algorithm
mod_bandwidth can transmit data to clients based on two different
algorithms. Normally it parcels data into packets of 1kb and sends them as often
as the bandwidth allows. If the bandwidth available after allocation is only
512 bytes per second, a 1kb packet is sent out approximately every two seconds.
The alternative mode is set with the directive BandWidthPulse, which takes a
value in microseconds as a parameter. When this is enabled, mod_bandwidth
sends a packet after each interval, irrespective of the size. For example, to
set a pulse rate of one second, we would put:
BandWidthPulse 1000000
This means that for a client whose allocated bandwidth is 512 bytes per
second, a 512-byte packet is sent out once per second. The advantage of this is
smoother communications, especially when the load becomes very high and the gap
between packets gets large. The disadvantage is that the share of bandwidth
taken up by network overhead, as opposed to actual data transmission,
grows as the packets get smaller.
©1999 Wrox Press Limited, US and UK.