Professional Apache - Fault Tolerance and Clustering
When web sites become large and busy, issues of reliability and performance
become more significant. It can be disastrous if the server of an important web
site like an online store front or a web-hosting ISP falls over, and visitors
are put off by sites that are sluggish and hard to use.
Both these problems can be solved to a greater or lesser extent in two basic
ways:
- We can make our servers more powerful, adding more memory and faster disks
or upgrading the processor to a faster speed or a multiprocessor system. This is
simple, but potentially expensive.
- We can install more servers and distribute the load of client requests
between them. Because they are sharing the load, the individual servers do not
have to be expensive power servers, just adequate to the job.
Multiple servers are an attractive proposition for several reasons: They can
be cheap and therefore easily replaceable, individual servers can fall over
without the web site becoming unavailable, and increasing capacity is just a
case of adding another server without needing to open up or reconfigure an
existing one.
However, we can't just dump a bunch of servers on a network and expect them
to work as one. We need to make them into a cluster, so that external clients do
not have to worry about, and preferably aren't aware of, the fact that they are
talking to a group of servers and not just one.
There are two basic approaches to clustering, DNS load sharing and Web server
clustering, and several solutions in each. Which we choose depends on exactly
what we want to achieve and how much money we are prepared to spend to achieve
it. We'll first look at DNS solutions before going on to look at true web
clusters and a home-grown clustering solution using Apache.
In favor of DNS load sharing is the fact that it works not just for web
servers, but for FTP archives or any other kind of network server, since it is
protocol independent.
Backup Server Via Redirected Secondary DNS
The simplest of the DNS configuration options, this approach allows us to
create a backup server for the primary web server by taking advantage of the
fact that all domain names have at least two nominated name servers, a primary
and a secondary, from which their IP address can be determined.
Ordinarily, both name servers hold a record for the name of the web server
with the same IP address:
www.alpha-complex.com. IN A 204.148.170.3
However, there is no reason why the web server cannot be the primary name
server for itself. If we set up two identical servers, we can make the web
server its own primary name server and give the secondary server a different IP
address for the web server. For example:
www.alpha-complex.com. IN A 204.148.170.203
In normal operation, the IP address of the web server is requested by other
name servers directly from the web server's own DNS service. If for any reason
the web server falls over, however, the primary name server will no longer be
available and DNS requests will resort to the secondary name server. This
returns the IP address of the backup server rather than the primary so client
requests will succeed.
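The failover logic can be sketched in a few lines of Python. This is only a toy
model, not real DNS: the dictionaries and the resolve function are hypothetical,
and it assumes, as the text does, that the primary name server is always tried
first.

```python
# Toy model of the redirected-secondary scheme (hypothetical data structures).
# Each name server has an "up" flag and its own copy of the A record.
PRIMARY = {"up": True, "records": {"www.alpha-complex.com.": "204.148.170.3"}}
SECONDARY = {"up": True, "records": {"www.alpha-complex.com.": "204.148.170.203"}}


def resolve(name, name_servers):
    """Return the answer from the first name server that is reachable."""
    for server in name_servers:
        if server["up"]:
            return server["records"].get(name)
    return None  # no name server could be reached


servers = [PRIMARY, SECONDARY]
print(resolve("www.alpha-complex.com.", servers))  # primary's own address
PRIMARY["up"] = False  # the web server, and its DNS service, falls over
print(resolve("www.alpha-complex.com.", servers))  # backup server's address
```

The switch depends entirely on the primary's DNS service being unreachable,
which is why a crashed httpd on a machine that still answers DNS queries does
not trigger it.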
The Time To Live (TTL) of the data served by the primary DNS server on the web
server needs to be set to a low value such as 30 minutes. Otherwise, external
name servers will cache the primary web server's IP address and not request an
update from the secondary name server in a timely fashion, making the web server
apparently unavailable until the cached DNS information expires. TTLs in a zone
file are given in seconds, so we can give the A record a time to live of 30
minutes by altering it to:
www.alpha-complex.com. 1800 IN A 204.148.170.3
There are several caveats to this scheme: session tracking, user
authentication, and cookies are likely to get confused when the IP address
switches to the backup server and no provision is made for load sharing - the
backup server is never accessed until the primary server becomes unavailable, no
matter how busy it might be. Note that unavailable means totally unavailable. If
the httpd daemon crashes but the machine is still capable of DNS resolution, the
switch will not take place.
Load Sharing with Round-Robin DNS
Since version 4.9, BIND, the Internet daemon that runs the bulk of the world's
DNS servers, has provided a configuration called round-robin DNS. This was an
early approach to load sharing between servers and still works today. It works
by specifying multiple IP addresses for the same host:
www.alpha-complex.com. 60 IN A 204.148.170.1
www.alpha-complex.com. 60 IN A 204.148.170.2
www.alpha-complex.com. 60 IN A 204.148.170.3
When a DNS request for the IP address for www.alpha-complex.com is received,
BIND returns one of these three addresses and makes a note of it. The next
request then gets the next IP address in the file and so on until the last one,
after which BIND returns to the first address again. Subsequent requests will
therefore get IP addresses in the order: 204.148.170.1, 204.148.170.2,
204.148.170.3, 204.148.170.1 ...
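The rotation can be illustrated with a short Python sketch. It models the
behavior as described above, one address per request in file order, and is not
a description of BIND's internals; the round_robin function is a hypothetical
name.

```python
from itertools import cycle, islice

# The three A records from the zone file, in file order.
ADDRESSES = ["204.148.170.1", "204.148.170.2", "204.148.170.3"]


def round_robin(addresses):
    """Yield addresses in file order, wrapping to the first after the last."""
    return cycle(addresses)


# The answers given to four successive requests.
for address in islice(round_robin(ADDRESSES), 4):
    print(address)
```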
Just as with the backup server approach, we have to deal with the fact that
other name servers will cache the response they get from us, thwarting the
round-robin. To limit this, we set a short time-to-live on each record. The
value is given in seconds, so the 60 in the records given above lets other name
servers cache an answer for only one minute; a value of 3600 would allow an
hour.
We can specify a lower value, but this causes more DNS traffic in updates,
which improves the load sharing on our web servers at the expense of increasing
the load on our name server.
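The trade-off is simple arithmetic. As a rough sketch, assuming a caching name
server re-queries us as soon as its cached answer expires, the TTL sets an upper
bound on how many requests that one server sends per hour. The function name is
hypothetical.

```python
def refreshes_per_hour(ttl_seconds):
    """Upper bound on hourly re-queries from one caching name server."""
    if ttl_seconds <= 0:
        raise ValueError("TTL must be a positive number of seconds")
    return 3600 // ttl_seconds


# A one-minute, thirty-minute, and one-hour TTL.
for ttl in (60, 1800, 3600):
    print(ttl, refreshes_per_hour(ttl))
```

Every refresh is a chance for a different address to be handed out, so a
smaller TTL spreads clients more evenly while multiplying the queries our name
server must answer.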
The attraction of round-robin DNS is its simplicity - we only have to add a
few lines to one file to make it work (two files if you include the secondary
name server). It also works for any kind of server, not just web servers. The
drawback is that this is not true load balancing, only load sharing - the
round-robin takes no account of which servers are loaded and which are free or
even which are actually up and running.
Hardware Load Balancing
Various manufacturers such as Cisco have load balancing products for networks
that cluster servers at the TCP/IP level. These are highly effective but can
also be expensive.
Clustering with Apache
Apache provides a simple but clever way to cluster servers using features of
mod_rewrite and mod_proxy together. This gets around DNS caching problems by
hiding the cluster behind a proxy server and, because it uses Apache, it is
free.
To make this work, we have to nominate one machine to be a proxy server,
handling requests to several back-end servers on which the web site is actually
located. The proxy takes the name www.alpha-complex.com, and we call our
back-end servers www1 to www6.
The solution comprises two parts:
- Using mod_rewrite to randomly select a back-end server to service the
client request.
- Using mod_proxy's ProxyPassReverse directive to disguise the URL of
the back-end server so clients are compelled to direct further requests through
the proxy.
Part one makes use of the random text map feature of mod_rewrite,
which was developed primarily to allow this solution to work. We create a map
file containing a single line:
# /usr/local/apache/rewritemaps/cluster.txt
#
# Random map of back-end web servers
www www1|www2|www3|www4|www5|www6
When used, this map will take the key www and randomly return one of the
values www1 to www6.
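To make the map's behavior concrete, here is a minimal Python sketch of the
lookup. It imitates what the random map does and is not mod_rewrite's own code;
parse_random_map and lookup are hypothetical names.

```python
import random

MAP_TEXT = """\
# Random map of back-end web servers
www www1|www2|www3|www4|www5|www6
"""


def parse_random_map(text):
    """Parse 'key value1|value2|...' lines, skipping blanks and # comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, values = line.split(None, 1)
        table[key] = values.strip().split("|")
    return table


def lookup(table, key, default=None, rng=random):
    """Return one randomly chosen value for key, or default if key is absent."""
    choices = table.get(key)
    if not choices:
        return default
    return rng.choice(choices)


print(lookup(parse_random_map(MAP_TEXT), "www"))  # one of www1 to www6
```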
We now write some mod_rewrite directives into the proxy server's
configuration to make use of this map to redirect URLs to a random server:
# switch on URL rewriting
RewriteEngine on
# define the cluster servers map
RewriteMap cluster rnd:/usr/local/apache/rewritemaps/cluster.txt
# rewrite the URL if it matches the web server host
RewriteRule ^http://www\.(.*)$ http://${cluster:www}.$1 [P,L]
# forbid any URL that doesn't match
RewriteRule .* - [F]
Depending on how sophisticated we want to be, we can make this rewrite rule a
bit more advanced and cope with more than one cluster at a time:
Map file:
www www1|www2|www3|www4|www5|www6
secure secure-a|secure-b
users admin.users|normal.users
Rewrite Rule:
# rewrite the URL based on the hostname asked for. If nothing matches,
# default to 'www1':
RewriteRule ^http://([^.]+)\.(.*)$ http://${cluster:$1|www1}.$2 [P,L]
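The effect of this rule can be sketched in Python. The sketch is an imitation
for illustration only: the rewrite function is hypothetical, and it assumes the
map lookup is written ${cluster:$1|www1}, with a literal dot separating the
first host label from the rest of the URL.

```python
import random
import re

# The three clusters from the map file above.
CLUSTER = {
    "www": ["www1", "www2", "www3", "www4", "www5", "www6"],
    "secure": ["secure-a", "secure-b"],
    "users": ["admin.users", "normal.users"],
}

# Group 1 is the first host label, group 2 is everything after the dot.
RULE = re.compile(r"^http://([^.]+)\.(.*)$")


def rewrite(url, table=CLUSTER, default="www1", rng=random):
    """Return the URL to proxy to, or None if the rule does not match."""
    match = RULE.match(url)
    if match is None:
        return None
    host, rest = match.groups()
    choices = table.get(host)
    backend = rng.choice(choices) if choices else default
    return "http://{}.{}".format(backend, rest)


print(rewrite("http://secure.alpha-complex.com/login"))
print(rewrite("http://unknown.alpha-complex.com/"))  # falls back to www1
```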
We can even have the proxy cluster both HTTP and FTP servers, so long as it's
listening to port 21, the FTP control port:
Map file:
www www1|www2|www3|www4|www5|www6
ftp ftp|archive|attic|basement
Rewrite Rule:
# rewrite the URL based on the protocol and hostname asked for:
RewriteRule ^(http|ftp)://([^.]+)\.(.*)$ $1://${cluster:$2}.$3 [P,L]
Part two makes use of mod_proxy to rewrite URLs generated by the
back-end servers due to a redirection. Without this, clients will receive
redirection responses with locations starting with www1 or www3 rather than www.
We can fix this with ProxyPassReverse:
ProxyPassReverse / http://www1.alpha-complex.com/
ProxyPassReverse / http://www2.alpha-complex.com/
...
ProxyPassReverse / http://www6.alpha-complex.com/
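What these directives achieve can be sketched as a prefix substitution on the
Location header of a redirect. The Python below is an illustration, not
mod_proxy's implementation; reverse_location is a hypothetical name, and the
public URL is assumed to be the proxy's own, http://www.alpha-complex.com/.

```python
# Back-end prefixes, one per ProxyPassReverse line (www1 to www6).
BACKENDS = ["http://www{}.alpha-complex.com/".format(n) for n in range(1, 7)]
PUBLIC = "http://www.alpha-complex.com/"  # assumed address of the proxy


def reverse_location(location, backends=BACKENDS, public=PUBLIC):
    """Swap a back-end prefix in a Location header for the proxy's prefix."""
    for prefix in backends:
        if location.startswith(prefix):
            return public + location[len(prefix):]
    return location  # not one of ours, so leave it untouched


print(reverse_location("http://www3.alpha-complex.com/docs/"))
```

Redirects that point somewhere outside the cluster pass through unchanged.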
A complete Apache configuration for creating a web cluster via proxy would
look something like this:
# Apache Server Configuration for Clustering Proxy
#
### Basic Server Setup
# The proxy takes the identity of the web site...
ServerName www.alpha-complex.com
ServerAdmin webmaster@alpha-complex.com
ServerRoot /usr/local/apache
DocumentRoot /usr/local/apache/proxysite
ErrorLog /usr/local/apache/proxy_error
TransferLog /usr/local/apache/proxy_log
User nobody
Group nobody
# dynamic servers load their modules here...
# don't waste time on things we don't need
HostnameLookups off
# this server is only for proxying so switch off everything else
<Directory />
Options None
AllowOverride None
</Directory>
# allow a local client to access the server status
<Location />
order deny,allow
deny from all
allow from 127.0.0.1
SetHandler server-status
</Location>
### Part 1 - Rewrite
# switch on URL rewriting
RewriteEngine on
# Define a log for debugging but set the log level to zero to disable it for
# performance
RewriteLog logs/proxy_rewrite
RewriteLogLevel 0
# define the cluster servers map
RewriteMap cluster rnd:/usr/local/apache/rewritemaps/cluster.txt
# rewrite the URL if it matches the web server host
RewriteRule ^http://www\.(.*)$ http://${cluster:www}.$1 [P,L]
# forbid any URL that doesn't match
RewriteRule .* - [F]
### Part 2 - Proxy
ProxyRequests on
ProxyPassReverse / http://www1.alpha-complex.com/
ProxyPassReverse / http://www2.alpha-complex.com/
ProxyPassReverse / http://www3.alpha-complex.com/
ProxyPassReverse / http://www4.alpha-complex.com/
ProxyPassReverse / http://www5.alpha-complex.com/
ProxyPassReverse / http://www6.alpha-complex.com/
# We don't want caching, preferring to let the back end servers take the load,
# but if we did:
#
#CacheRoot /usr/local/apache/proxy
#CacheSize 102400
Because this works at the level of an HTTP/FTP proxy rather than lower level
protocols like DNS or TCP/IP, we can also have the proxy cache files and use it
to bridge a firewall, allowing the cluster to reside on an internal and
protected network.
The downside of this strategy is that it does not intelligently distribute
the load. We could fix this by replacing the random map file with an external
mapping program that attempts to make intelligent guesses about which servers
are most suitable. The program should be kept very simple so that it does not
adversely affect performance, since it is consulted for every client request.
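As a rough sketch of such a program, the Python below follows the convention
mod_rewrite uses for external program maps (the prg: map type): one key arrives
per line on standard input, and one value, or the string NULL for no match, is
written per line to standard output. The load figures and the least-loaded
policy are assumptions made for illustration.

```python
import sys

# Hypothetical load figures; a real program would refresh these from
# whatever monitoring is available.
LOAD = {"www1": 0.7, "www2": 0.2, "www3": 0.5}


def choose(key, load=LOAD):
    """Return the least-loaded server for key 'www', else the string NULL."""
    if key != "www" or not load:
        return "NULL"
    return min(load, key=load.get)


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Answer one lookup per input line until input ends."""
    for line in stdin:
        stdout.write(choose(line.strip()) + "\n")
        stdout.flush()  # the caller waits for each answer, so never buffer


# When deployed as a script, finish with a call to serve().
```

The proxy would refer to the script with a RewriteMap line using prg: and the
script's path in place of the rnd: map file. The selection is a single
dictionary scan, in keeping with the need for the program to stay cheap.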
Other Clustering Solutions
There are several commercial and free clustering solutions available from the
Internet. Here are a few that might be of interest if none of the other
solutions here is sophisticated enough:
Eddie
The Eddie Project is an open-source initiative sponsored by Ericsson to
develop advanced clustering solutions for Linux, FreeBSD, and Solaris; Windows
NT is under development.
There are two packages available: an enhanced DNS server that takes the place
of the BIND daemon and performs true load balancing and an intelligent HTTP
gateway that allows web servers to be clustered across disparate networks. A
sample Apache configuration is included with the software, and binary RPM
packages are available for x86 Linux systems.
Eddie is available from http://www.eddieware.org/.
TurboCluster
TurboCluster is a freely available clustering solution developed for
TurboLinux: http://community.turbolinux.com/cluster/.
Sun Cluster
Solaris users will most probably be interested in Sun's own clustering
application; however, this is not a free or open product. See http://www.sun.com/clusters/.
Freequalizer
Freequalizer is a freely available version of Equalizer, produced by Coyote
Point Systems, designed to run on a FreeBSD server (Equalizer, the commercial
version, runs on its own dedicated hardware). GUI monitoring tools are available
as part of the package.
Freequalizer is available from http://www.coyotepoint.com.
©1999 Wrox Press Limited, US and UK.