Configuring Apache for Better Performance - Administration

This excerpt from Wrox Press Ltd.'s Professional Apache covers Chapter 8 - Improving Apache's Performance. It tells you how to configure Apache for peak performance using caching and clustering, plus much more.

TABLE OF CONTENTS:
  1. Professional Apache
  2. Apache's Performance Directives
  3. Configuring Apache for Better Performance
  4. Proxying
  5. Caching
  6. Fault Tolerance and Clustering
By: Dev Shed
May 18, 2000

Many aspects of Apache's general configuration can have important performance implications if set without regard to their processing cost.

Directives That Affect Performance

There are a large number of directives that can have a beneficial or adverse effect on performance, depending on how they are used. Some of these are obvious; others rather less so:

DNS and Host Name Lookups

Any use of DNS significantly affects Apache's performance. In particular, use of the following two directives should be avoided if possible:

HostNameLookups on/off/double

This allows Apache to log information based on the host name rather than the IP address, but it is very time consuming, even though Apache caches DNS results for performance. Log analyzers like Analog, discussed in Chapter 9, do their own DNS resolution when it comes to generating statistics from the log at a later point, so there is little to be gained from forcing the running server to do it on the fly.
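
As a minimal sketch of this advice, the fragment below turns lookups off in the server-level configuration, so that the access log records IP addresses only. Names can then be resolved offline, for example by the log analyzer itself or by the logresolve utility supplied with Apache.

```apache
# Log IP addresses only; resolve host names later, outside the running server
HostNameLookups off
```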

UseCanonicalName on/off/dns

With UseCanonicalName dns, Apache deduces the name of a server from its IP address, rather than generating it from the ServerName and Port directives (UseCanonicalName on) or just accepting the client value (UseCanonicalName off). This is occasionally useful for things like mass virtual hosting with mod_vhost_alias. Because it only caches the names of hosts being served by Apache rather than the whole Internet, it is less demanding than HostNameLookups, but even so, if it is avoidable, avoid it.

In addition, any use of a host name, whole or partial, may cause DNS lookups to take place, either from name to IP address or IP address to name. This affects the allow and deny directives in mod_access, ProxyBlock, NoProxy and NoCache in mod_proxy, and so on.
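
To illustrate, host-based access control can usually be written with IP addresses instead of names, which avoids a lookup on each request. The sketch below assumes a hypothetical internal directory and an internal network of 192.168.1; neither comes from the original text.

```apache
# Hypothetical directory and network, shown for illustration only
<Directory /home/www/alpha-complex/internal>
order deny,allow
deny from all
# A partial IP address needs no DNS lookup, unlike a partial host name
allow from 192.168.1
</Directory>
```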

Following Symbolic Links and Permission Checking

Apache can be told to follow or refuse to follow symbolic links with the FollowSymLinks option. Unless enabled, each time Apache retrieves a file or runs a CGI script, it must spend extra time checking the entire path, from the root directory down, to see if any parent directories (or the file itself) are symbolic links.

Alternatively, if symbolic links are enabled with SymLinksIfOwnerMatch, Apache will follow a link only if the link and the file or directory it points to are owned by the same user. This also causes Apache to check the entire path for symbolic links and, in addition, to compare the ownership of each link with that of its target.

For maximum performance, always specify FollowSymLinks and never SymLinksIfOwnerMatch:


Options FollowSymLinks

However, these options exist to improve security, and this strategy is the most permissive, which may be unpalatable to administrators more worried about security than performance.
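
One possible compromise, sketched below, is to use the permissive setting for the server's own document tree and keep the ownership check only where untrusted users can create links. Both paths are assumptions made for the example.

```apache
# Fast path for content we control (hypothetical document root)
<Directory /home/www/alpha-complex>
Options FollowSymLinks
</Directory>

# Keep the more expensive ownership check for user directories (hypothetical path)
<Directory /home/*/public_html>
Options SymLinksIfOwnerMatch
</Directory>
```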

Caching Dynamic Content

Normally, Apache will not send information to proxies telling them to cache documents if they have been generated dynamically. The burden on the server can therefore be considerably reduced by using mod_expires to force an expiration time onto documents, even if it is very short:


ExpiresActive on
ExpiresByType text/html A600

This directive would be suitable for a server that updates an information page like a stock market price page every ten minutes - even if the page expires in a time as short as this, if the page is frequently accessed, we save ourselves a lot of hits if clients can get the page from a proxy instead.

Even so, some proxies will not accept documents they think are generated dynamically, requiring us to fool them by disguising CGI scripts as ordinary HTML:


RewriteEngine on
RewriteRule ^(/pathtocgi/.*)\.html$ $1.cgi [T=application/x-httpd-cgi]

Caching Negotiated Content

HTTP/1.1 clients already have sufficient information to know how and when to cache documents delivered by content negotiation. HTTP/1.0 proxies however do not, so to make them cache negotiated documents we can use:


CacheNegotiatedDocs

This can have unexpected side effects if we are a multilingual site, however, since clients may get the wrong page. It should therefore be used with caution, if at all. The number of HTTP/1.0 clients affected is small and decreasing, so this can usually be ignored.

A different aspect of content negotiation is when the configured directory index is specified without a suffix (thereby causing content negotiation to be performed on it). Since index files are very common URLs for clients to retrieve, it is always better to specify a list, even if most of them don't exist, than have Apache generate an on-the-fly map with MultiViews. For example, don't put:


DirectoryIndex index

Instead put something like:


DirectoryIndex index.html index.htm index.shtml index.cgi

Logging

One of the biggest users of disk and CPU time is logging. It therefore pays not to log information that we don't care about or, if we really want to squeeze performance from the server, not to log at all. It is inadvisable not to have an error log, but we can disable the access log by simply not defining one. Otherwise, we can minimize the level of logging with the LogLevel directive:


LogLevel error
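
If the access log is still needed, one way to avoid logging information we don't care about is conditional logging, which later 1.3 releases support through the env= clause of CustomLog together with mod_setenvif. The sketch below skips image requests; the nolog variable name and the log file path are arbitrary choices for the example.

```apache
# Mark requests for images with a hypothetical environment variable
SetEnvIf Request_URI "\.(gif|jpg|png)$" nolog
# Define the usual common log format
LogFormat "%h %l %u %t \"%r\" %>s %b" common
# Log everything except the marked requests
CustomLog logs/access_log common env=!nolog
```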

An alternative approach is to put the log on a different server, either by NFS mounting the logging directory onto the web server or, preferably, using the system log daemon to do it for us. NFS is not well known for its performance, and it introduces security risks by making other servers potentially visible to users on the web server.
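
A sketch of the system log approach follows. ErrorLog accepts the keyword syslog directly; the access log has no equivalent keyword, so one option is to pipe it to the system's logger program. The facility names and the path to logger are assumptions that vary between systems, and forwarding to the remote log host is then configured in the syslog daemon rather than in Apache.

```apache
# Send the error log to syslog, using a facility chosen for this example
ErrorLog syslog:local7
# Pipe the access log to logger; path and facility are assumptions, and
# the common nickname is assumed to be defined by a LogFormat directive
CustomLog "|/usr/bin/logger -p local6.info" common
```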

Session Tracking

Any kind of session tracking is time consuming, first because Apache is responsible for checking for cookies and/or URL elements and setting them if missing, and second, because for tracking to be useful, it has to be logged somewhere, creating additional work for Apache. The bottom line is not to use modules like mod_usertrack or mod_session unless absolutely necessary, and even then to use Directory, Location, or Files containers to limit their scope.
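
As an illustration, assuming mod_usertrack is compiled in, tracking can be switched off for the server as a whole and enabled only for the part of the site that needs it. The directory shown is hypothetical.

```apache
# No cookie handling for most requests
CookieTracking off

# Track only visitors to the (hypothetical) ordering area
<Directory /home/www/alpha-complex/orders>
CookieTracking on
</Directory>
```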

.htaccess Files

If AllowOverride is set to anything other than None, Apache will check for directives in .htaccess files for each directory from the root all the way down to the directory in which the requested resource resides, after aliasing has been taken into account. This can be extremely time consuming since Apache does this check every time a URL is requested, so unless absolutely needed, always put:


# AllowOverride is directory scope only, so we use the root directory
<Directory />
AllowOverride None
</Directory>

This also has the side effect of making the server more secure. Even if we do wish to allow overrides in particular places, this is a good directive to have in the server-level configuration to prevent Apache searching all the directories from the root down. By enabling overrides only in the directories that are needed, Apache will only search a small part of the pathname, rather than the whole chain of directories.
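
A sketch of this selective approach is shown below. The members directory is a hypothetical example, and AuthConfig is just one of the override classes that could be allowed there.

```apache
# Stop Apache looking for .htaccess files anywhere by default
<Directory />
AllowOverride None
</Directory>

# Hypothetical directory: .htaccess files are read here and below,
# and only authentication directives in them are honored
<Directory /home/www/alpha-complex/members>
AllowOverride AuthConfig
</Directory>
```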

Extended Status

mod_status allows an extended status page to be generated if the ExtendedStatus directive is set to on. However, this causes Apache to make two calls to the operating system for time information on each and every client request. Time calls are among the most expensive system calls on any platform, so this can cause significant performance loss, especially as the directive is only allowed at the server level and not on a per-virtual-host basis. The solution is simply not to enable ExtendedStatus.
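
The basic status page remains available without the extended information. A minimal sketch, assuming mod_status is compiled in and that status requests should be accepted only from the local machine:

```apache
# Keep the basic status page but skip the per-request timing calls
ExtendedStatus off

<Location /server-status>
SetHandler server-status
order deny,allow
deny from all
# Assumed administration host; adjust to suit
allow from 127.0.0.1
</Location>
```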

Rewriting URLs

Any use of mod_rewrite's URL rewriting capabilities can cause significant performance loss, especially for complex rewriting strategies. The RewriteEngine directive can be specified on a per-directory or per-virtual host basis, so it is worth enabling and disabling mod_rewrite selectively if the rules are complex and needed only in some cases.

In addition, certain rules can cause additional performance problems by making internal HTTP requests to the server. Pay special attention to the NS flag, and be wary of using the -F and especially -U conditional tests.
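
The sketch below shows both ideas under assumed names: rewriting is off by default and enabled only in the one virtual host that needs it, and the rule carries the NS flag so that it is skipped for internal subrequests. The address, host name, and paths are hypothetical.

```apache
# Rewriting disabled for the main server
RewriteEngine off

<VirtualHost 192.168.1.10>
ServerName www.alpha-complex.com
# Enabled only where the rules are needed
RewriteEngine on
# NS: do not apply this rule to internal subrequests
RewriteRule ^/old/(.*)$ /new/$1 [NS,L]
</VirtualHost>
```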

Large Configuration Files

Lastly, the mere fact of a configuration file being large can cause Apache to respond more sluggishly. Modules like mod_rewrite can benefit performance by reducing the number of lines needed to achieve a desired effect. The mod_vhost_alias module is also particularly useful for servers that need to host large numbers of virtual hosts.
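
For example, assuming mod_vhost_alias is compiled in and each site's files live in a directory named after its host name, a single directive can stand in for any number of VirtualHost containers. The /home/www layout is an assumption.

```apache
# Take the host name from the request rather than from ServerName
UseCanonicalName off
# %0 expands to the full requested host name, so a request for
# www.alpha-complex.com is served from /home/www/www.alpha-complex.com
VirtualDocumentRoot /home/www/%0
```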

Performance Tuning CGI

Any script or application intended for use as a CGI script should already be written with performance in mind; this means not consuming excessive quantities of memory or CPU time, generating the output as rapidly as possible, and, if at all possible, caching the results so they can be returned faster when conditions allow.

In addition, Apache defines three directives for controlling what CGI scripts are allowed to get away with:

RLimitCPU

controls how much CPU time is allowed

RLimitMEM

controls how much memory can be allocated

RLimitNPROC

controls how many CGI instances can run simultaneously
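
A hedged example of all three together is shown below. Each directive takes a soft limit and an optional hard limit; the figures are illustrative rather than recommendations.

```apache
# CPU time in seconds: soft limit 10, hard limit 20
RLimitCPU 10 20
# Memory in bytes: soft limit roughly 4MB, hard limit roughly 8MB
RLimitMEM 4000000 8000000
# Number of processes: soft limit 20, hard limit 30
RLimitNPROC 20 30
```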

All these directives are described in more detail in Chapter 6. A better approach is to write dynamic content applications in a more efficient way to take better advantage of Apache. The most obvious option is FastCGI, also covered in Chapter 6. Perl programmers will also want to check out mod_perl in Chapter 11.

Additional Directives for Tuning Performance

Although they are not part of the standard Apache executable, several modules, both included with Apache and third-party, are designed to improve server performance in various ways:

MMapFile

MMapFile is supplied by mod_mmap_static, an experimental UNIX specific module supplied as standard with Apache but not compiled or enabled by default. When active, it allows nominated files to be memory mapped, if the UNIX platform supports it. Memory-mapped files are kept in memory permanently, allowing Apache to deliver them to clients rapidly, without retrieving them from the disk first. For example, to map the index page and a banner logo so they are stored in memory, we might put:


MMapFile /home/www/alpha-complex/index.html /home/www/alpha-complex/banner.gif

This will only work for files that are static and present on the filing system - dynamically generated content and CGI scripts will not work with MMapFile. To cache CGI scripts, use the FastCGI module or mod_perl and Apache::Registry (for Perl scripts).

The MMapFile directive is not flexible in its syntax and does not allow wildcards. There is also no MMapDirectory equivalent to map groups of files at once.

It is important to realize that once a file is mapped, it will never be retrieved from disk again, even if it changes. Apache must be restarted (preferably gracefully with apachectl graceful) for changed files to be remapped into memory.

mod_bandwidth

mod_bandwidth is available from the contributed modules archive on any Apache mirror, and in addition, a current version can be found at http://www.cohprog.com/. It provides Apache with the ability to limit the amount of data sent out per second based on the domain or IP address of the remote client or, alternatively, the size of the file requested.

Bandwidth limits may also be used to divide available bandwidth according to the number of clients connecting, allowing a service to be maintained to all clients even if there is theoretically insufficient bandwidth for them.

As it is a contributed module, mod_bandwidth is not enabled by default and needs to be added to Apache in the usual way - by either rebuilding the server or building and installing it as a dynamic module with the apxs utility. Once installed, bandwidth limiting can be enabled with:


BandWidthModule on

Bandwidth limits are configured to work on a per-directory basis, allowing a server to customize different parts of a web site with different bandwidth restrictions. For example, we can limit bandwidth usage on the non-secure part of a web site, ensuring that traffic to our secure online ordering system always has bandwidth available to it.
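
A sketch of that arrangement, using the BandWidth directive described in the next section and hypothetical directory names, might look like this:

```apache
BandWidthModule on

# Non-secure pages: cap every client at 4096 bytes per second
<Directory /home/www/alpha-complex/public>
BandWidth all 4096
</Directory>

# Secure ordering area: a rate of 0 means no limit
<Directory /home/www/alpha-complex/secure>
BandWidth all 0
</Directory>
```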

Limiting Bandwidth Based on the Client

Once enabled, bandwidth limitations may be set with:


<Directory />
BandWidth localhost 0
BandWidth friendly.com 4096
BandWidth 192.168 512
BandWidth all 2048
</Directory>

This tells Apache not to limit local requests (potentially from CGI scripts) by setting a value of 0, to limit internal network clients to 512 bytes per second, to allow a favored domain 4k per second, and to allow all other hosts 2k with the special all keyword. The order is important because the first matching directive is used: if a host in the friendly.com domain resolved to an address in the 192.168.30 network, the friendly.com directive would be overridden by the 192.168 directive if it were placed after that directive. Similarly, with the order shown, a client in the 192.168 network that happened to be in the friendly.com domain would get 4k access.
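
To make the effect of ordering concrete, the variant below places the network directive first. Under the first-match rule, a friendly.com client whose address falls in the 192.168 network would now be limited to 512 bytes per second rather than 4096.

```apache
<Directory />
BandWidth localhost 0
# Matches first for any client in 192.168, whatever its domain
BandWidth 192.168 512
BandWidth friendly.com 4096
BandWidth all 2048
</Directory>
```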

Limiting Bandwidth Based on File Size

Bandwidth limits can also be set on file size with the LargeFileLimit directive, allowing large files to be sent out more gradually than small ones. This can be invaluable when large file transfers are being carried out on the same server as ordinary static page requests. If a LargeFileLimit and BandWidth directive apply to the same URL then the smaller of the two is selected.

The LargeFileLimit directive takes two parameters, a file size in kilobytes and a transfer rate. Several directives can be cascaded to produce a graded limit; for example:


<Directory /home/www/alpha-complex>
LargeFileLimit 50 8092
LargeFileLimit 1024 4096
LargeFileLimit 2048 2048
</Directory>

This tells Apache not to limit files smaller than 50kb, which generally correspond to HTML pages and small images, to limit files up to 1Mb to 8kb per second, and to limit files between 1Mb and 2Mb to 4k per second. Files larger than 2Mb are limited to 2k per second. As with the BandWidth directive, order is important - the first directive that has a file size greater than the file requested will be used, so directives must be given in smallest to largest order to work.

If more than one client is connected at the same time, mod_bandwidth also uses the bandwidth limits as proportional values and allocates the available bandwidth allowed, based on the limit values for each client; if ten clients all connect with a total bandwidth limit of 4096 bytes per second, each client gets 410 bytes per second allocated to it.

Minimum Bandwidth and Dividing Bandwidth Between Clients

Bandwidth is normally shared between clients by mod_bandwidth, based on their individual bandwidth settings. So, if two clients both have bandwidth limits of 4k per second, mod_bandwidth divides it between them, giving each client 2k per second. However, the allocated bandwidth will never drop below the minimum bandwidth set by MinBandWidth, which defaults to 256 bytes per second:


MinBandWidth all 256

MinBandWidth takes a domain name or IP address as a first parameter with the same meaning as BandWidth. Just as with BandWidth, it is also applied in order with the first matching directive being used:


<Directory />
BandWidth localhost 0
BandWidth friendly.com 4096
MinBandWidth friendly.com 2096
BandWidth 192.168 512
BandWidth all 2048
MinBandWidth all 512
</Directory>

Bandwidth allocation can also be disabled entirely, using a special rate of -1. This causes the limits defined by BandWidth and LargeFileLimit to be taken literally, rather than as relative values to be applied in proportion when multiple clients connect. To disable all allocation, specify:


MinBandWidth all -1

In this case, if ten clients all connect with a limit of 4096 bytes per second, mod_bandwidth will allow 4096 bytes per second for all clients, rather than dividing the bandwidth between them.

Transmission Algorithm

mod_bandwidth can transmit data to clients based on two different algorithms. Normally it parcels data into packets of 1kb and sends them as often as the bandwidth allows. If the bandwidth available after allocation is only 512 bytes per second, a 1kb packet is sent out approximately every two seconds.

The alternative mode is set with the directive BandWidthPulse, which takes a value in microseconds as a parameter. When this is enabled, mod_bandwidth sends a packet after each interval, irrespective of the size. For example, to set a pulse rate of one second, we would put:


BandWidthPulse 1000000

This means that for a client whose allocated bandwidth is 512 bytes per second, a 512-byte packet is sent out once per second. The advantage of this is smoother communications, especially when the load becomes very high and the gap between packets gets large. The disadvantage is that the proportion of bandwidth dedicated to network communications, as opposed to actual data transmission, increases in proportion.

1999 Wrox Press Limited, US and UK.



 
 