
Fault Tolerance and Clustering - Administration

This excerpt from Wrox Press Ltd.'s Professional Apache covers Chapter 8 - Improving Apache's Performance. It tells you how to configure Apache for peak performance using caching and clustering, plus much more.

TABLE OF CONTENTS:
  1. Professional Apache
  2. Apache's Performance Directives
  3. Configuring Apache for Better Performance
  4. Proxying
  5. Caching
  6. Fault Tolerance and Clustering
By: Dev Shed
May 18, 2000


When web sites become large and busy, issues of reliability and performance become more significant. It can be disastrous if the server of an important web site like an online store front or a web-hosting ISP falls over, and visitors are put off by sites that are sluggish and hard to use.

Both these problems can be solved to a greater or lesser extent in two basic ways:

  • We can make our servers more powerful, adding more memory and faster disks or upgrading the processor to a faster speed or a multiprocessor system. This is simple, but potentially expensive.
  • We can install more servers and distribute the load of client requests between them. Because they are sharing the load, the individual servers do not have to be expensive power servers, just adequate to the job.

Multiple servers are an attractive proposition for several reasons: They can be cheap and therefore easily replaceable, individual servers can fall over without the web site becoming unavailable, and increasing capacity is just a case of adding another server without needing to open up or reconfigure an existing one.

However, we can't just dump a bunch of servers on a network and expect them to work as one. We need to make them into a cluster, so that external clients do not have to worry about, and preferably aren't aware of, the fact that they are talking to a group of servers and not just one.

There are two basic approaches to clustering, DNS load sharing and Web server clustering, and several solutions in each. Which we choose depends on exactly what we want to achieve and how much money we are prepared to spend to achieve it. We'll first look at DNS solutions before going on to look at true web clusters and a home-grown clustering solution using Apache.

In its favor, however, DNS load sharing works not just for web servers but also for FTP archives or any other kind of network server, since it is protocol independent.

Backup Server Via Redirected Secondary DNS

The simplest of the DNS configuration options, this approach allows us to create a backup server for the primary web server by taking advantage of the fact that all domain names have at least two nominated name servers, a primary and a secondary, from which their IP address can be determined.

Ordinarily, both name servers hold a record for the name of the web server with the same IP address:


www.alpha-complex.com. IN A 204.148.170.3

However, there is no reason why the web server cannot be the primary name server for itself. If we set up two identical servers, we can make the web server its own primary name server and give the secondary server a different IP address for the web server. For example:


www.alpha-complex.com. IN A 204.148.170.203

In normal operation, the IP address of the web server is requested by other name servers directly from the web server's own DNS service. If for any reason the web server falls over, however, the primary name server will no longer be available and DNS requests will resort to the secondary name server. This returns the IP address of the backup server rather than the primary so client requests will succeed.

The Time To Live (TTL) setting of the data served by the primary DNS server on the web server needs to be set to a low value like 30 minutes. Otherwise, external name servers will cache the primary web server's IP address and not request an update from the secondary name server in a timely fashion, making the web server apparently unavailable until the DNS information expires. Zone file TTLs are expressed in seconds, so we can give the A record a time to live of 30 minutes (1800 seconds) by altering it to:


www.alpha-complex.com. 1800 IN A 204.148.170.3

There are several caveats to this scheme. Session tracking, user authentication, and cookies are likely to get confused when the IP address switches to the backup server. No provision is made for load sharing either: the backup server is never accessed until the primary server becomes unavailable, no matter how busy the primary might be. Note that unavailable means totally unavailable. If the httpd daemon crashes but the machine is still capable of DNS resolution, the switch will not take place.
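The failover behaviour described above can be modelled in a few lines. This is a minimal Python sketch and not a real resolver: the two name-server tables, the `resolve` function, and the `primary_up` flag are hypothetical names introduced for illustration, and the addresses are the ones used in the records above.

```python
# Hypothetical model of the backup-server scheme: each name server holds
# its own A record for the same host name.
PRIMARY_NS = {"www.alpha-complex.com.": "204.148.170.3"}      # runs on the web server itself
SECONDARY_NS = {"www.alpha-complex.com.": "204.148.170.203"}  # points at the backup server


def resolve(name, primary_up):
    """Return the address a client would be given for `name`.

    The secondary is consulted only when the primary name server
    (and therefore the whole primary machine) is unreachable.
    """
    if primary_up:
        return PRIMARY_NS[name]
    return SECONDARY_NS[name]


if __name__ == "__main__":
    print(resolve("www.alpha-complex.com.", primary_up=True))
    print(resolve("www.alpha-complex.com.", primary_up=False))
```

The flag stands for the whole machine, which is the weakness of the scheme: if only httpd dies while DNS keeps answering, `primary_up` stays true and clients are still sent to the dead web server.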

Load Sharing with Round-Robin DNS

Since version 4.9, BIND, the Internet daemon that runs the bulk of the world's DNS servers, has provided a configuration called round-robin DNS. This was an early approach to load sharing between servers and still works today. It works by specifying multiple IP addresses for the same host:


www.alpha-complex.com. 60 IN A 204.148.170.1

www.alpha-complex.com. 60 IN A 204.148.170.2

www.alpha-complex.com. 60 IN A 204.148.170.3

When a DNS request for the IP address for www.alpha-complex.com is received, BIND returns one of these three addresses and makes a note of it. The next request then gets the next IP address in the file and so on until the last one, after which BIND returns to the first address again. Subsequent requests will therefore get IP addresses in the order: 204.148.170.1, 204.148.170.2, 204.148.170.3, 204.148.170.1 ...

Just as with the backup server approach, we have to deal with the fact that other name servers will cache the response they get from us, thwarting the round-robin. To limit this, we set a short time-to-live value, which is what the 60 in each of the records given above does. Zone file TTLs are expressed in seconds, so 60 means one minute; a TTL on the order of an hour would be written as 3600.

We can specify a lower value, but this causes more DNS traffic in updates, which improves the load sharing on our web servers at the expense of increasing the load on our name server.

The attraction of round-robin DNS is its simplicity - we only have to add a few lines to one file to make it work (two files if you include the secondary name server). It also works for any kind of server, not just web servers. The drawback is that this is not true load balancing, only load sharing - the round-robin takes no account of which servers are loaded and which are free or even which are actually up and running.
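To make the difference between load sharing and load balancing concrete, here is a small Python sketch of the rotation described above. It is only a model of the behaviour as this section describes it, not of BIND's internals; `ADDRESSES` and `round_robin` are names invented for the example.

```python
from itertools import cycle, islice

# The three A records from the zone file above.
ADDRESSES = ["204.148.170.1", "204.148.170.2", "204.148.170.3"]


def round_robin(addresses):
    """Return an endless iterator that hands out addresses in fixed rotation.

    Nothing here looks at server load or health, which is exactly the
    limitation of round-robin DNS.
    """
    return cycle(addresses)


if __name__ == "__main__":
    answers = round_robin(ADDRESSES)
    for address in islice(answers, 4):
        print(address)
```

Because the iterator knows nothing about the servers themselves, a machine that is down keeps receiving its share of the answers until someone edits the zone file.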

Hardware Load Balancing

Various manufacturers such as Cisco have load balancing products for networks that cluster servers at the TCP/IP level. These are highly effective but can also be expensive.

Clustering with Apache

Apache provides a simple but clever way to cluster servers using features of mod_rewrite and mod_proxy together. This gets around DNS caching problems by hiding the cluster behind a proxy server, and because it uses only Apache it is, of course, totally free.

To make this work, we have to nominate one machine to be a proxy server, handling requests to several back-end servers on which the web site is actually located. The proxy takes the name www.alpha-complex.com, and we call our back-end servers www1 to www6.

The solution comprises two parts:

  • Using mod_rewrite to randomly select a back-end server to service the client request.
  • Using mod_proxy's ProxyPassReverse directive to disguise the URL of the back-end server so clients are compelled to direct further requests through the proxy.

Part one makes use of the random text map feature of mod_rewrite, which was developed primarily to allow this solution to work. We create a map file containing a single line:


# /usr/local/apache/rewritemaps/cluster.txt

#

# Random map of back-end web servers

www www1|www2|www3|www4|www5|www6

When used, this map will take the key www and randomly return one of the values www1 to www6.
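The lookup can be pictured with a short Python sketch. It imitates what the text above says the random map does (split the value on `|` and pick one alternative at random); it is not mod_rewrite's actual code, and `parse_rnd_map` and `lookup` are hypothetical helper names.

```python
import random

MAP_TEXT = """\
# Random map of back-end web servers
www www1|www2|www3|www4|www5|www6
"""


def parse_rnd_map(text):
    """Parse map text into {key: [alternatives]}, skipping blank and '#' lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        table[key] = value.split("|")
    return table


def lookup(table, key, default=None):
    """Return a random alternative for `key`, or `default` if the key is absent."""
    alternatives = table.get(key)
    if alternatives is None:
        return default
    return random.choice(alternatives)


if __name__ == "__main__":
    servers = parse_rnd_map(MAP_TEXT)
    print(lookup(servers, "www"))
```

Each call is independent of the last, so over many requests the six back-end servers should see roughly equal numbers of hits, but nothing guarantees an even spread in the short term.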

We now write some mod_rewrite directives into the proxy server's configuration to make use of this map to redirect URLs to a random server:


# switch on URL rewriting

RewriteEngine on

# define the cluster servers map

RewriteMap cluster rnd:/usr/local/apache/rewritemaps/cluster.txt

# rewrite the URL if it matches the web server host

RewriteRule ^http://www\.(.*)$ http://${cluster:www}.$1 [P,L]

# forbid any URL that doesn't match

RewriteRule .* - [F]

Depending on how sophisticated we want to be, we can make this rewrite rule a bit more advanced and cope with more than one cluster at a time:

Map file:


www www1|www2|www3|www4|www5|www6

secure secure-a|secure-b

users admin.users|normal.users

Rewrite Rule:


# rewrite the URL based on the hostname asked for. If nothing matches,

# default to 'www1':

RewriteRule ^http://([^.]+)\.(.*)$ http://${cluster:$1|www1}.$2 [P,L]
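As a rough illustration of what such a rule does to a URL, the following Python sketch applies a comparable regular expression (with the dot after the host name escaped) and falls back to `www1` when the host name is not a key in the map. `CLUSTERS`, `RULE`, and `rewrite` are names made up for this example, and the sketch only mimics the substitution; it does not proxy anything.

```python
import random
import re

# Same content as the three-line map file above.
CLUSTERS = {
    "www": ["www1", "www2", "www3", "www4", "www5", "www6"],
    "secure": ["secure-a", "secure-b"],
    "users": ["admin.users", "normal.users"],
}

# Group 1 is the first label of the host name, group 2 is everything after it.
RULE = re.compile(r"^http://([^.]+)\.(.*)$")


def rewrite(url, default="www1"):
    """Return the back-end URL for `url`, or None if the rule does not match."""
    match = RULE.match(url)
    if match is None:
        return None
    host, rest = match.groups()
    backends = CLUSTERS.get(host)
    backend = random.choice(backends) if backends else default
    return "http://%s.%s" % (backend, rest)


if __name__ == "__main__":
    print(rewrite("http://secure.alpha-complex.com/login"))
    print(rewrite("http://unknown.alpha-complex.com/"))
```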

We can even have the proxy cluster both HTTP and FTP servers, so long as it's listening to port 21:

Map file:


www www1|www2|www3|www4|www5|www6

ftp ftp|archive|attic|basement

Rewrite Rule:


# rewrite the URL based on the protocol and hostname asked for:

RewriteRule ^(http|ftp)://([^.]+)\.(.*)$ $1://${cluster:$2}.$3 [P,L]

Part two makes use of mod_proxy to rewrite URLs generated by the back-end servers due to a redirection. Without this, clients will receive redirection responses with locations starting with www1 or www3 rather than www. We can fix this with ProxyPassReverse:


ProxyPassReverse / http://www1.alpha-complex.com/

ProxyPassReverse / http://www2.alpha-complex.com/

...

ProxyPassReverse / http://www6.alpha-complex.com/
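The effect on a redirect can be sketched in Python. This is a simplified model of the idea only (replace a back-end prefix in the Location header with the proxy's own URL), not mod_proxy's implementation; `BACKENDS`, `PROXY`, and `reverse_location` are hypothetical names.

```python
# Back-end prefixes corresponding to the ProxyPassReverse lines above.
BACKENDS = ["http://www%d.alpha-complex.com/" % n for n in range(1, 7)]

# The public identity of the cluster.
PROXY = "http://www.alpha-complex.com/"


def reverse_location(location):
    """Map a back-end Location header onto the proxy's own URL space.

    Locations that do not start with a known back-end prefix are
    returned unchanged.
    """
    for prefix in BACKENDS:
        if location.startswith(prefix):
            return PROXY + location[len(prefix):]
    return location


if __name__ == "__main__":
    print(reverse_location("http://www3.alpha-complex.com/docs/"))
```

With this mapping in place the client only ever sees the www host name, so its next request comes back through the proxy rather than going straight to www3.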

A complete Apache configuration for creating a web cluster via proxy would look something like this:


# Apache Server Configuration for Clustering Proxy

#

### Basic Server Setup

# The proxy takes the identity of the web site...

ServerName www.alpha-complex.com

ServerAdmin webmaster@alpha-complex.com

ServerRoot /usr/local/apache

DocumentRoot /usr/local/apache/proxysite

ErrorLog /usr/local/apache/proxy_error

TransferLog /usr/local/apache/proxy_log

User nobody

Group nobody

# dynamic servers load their modules here...

# don't waste time on things we don't need

HostnameLookups off

# this server is only for proxying so switch off everything else

<Directory />

Options None

AllowOverride None

</Directory>

# allow a local client to access the server status

<Location /server-status>

Order deny,allow

Deny from all

Allow from 127.0.0.1

SetHandler server-status

</Location>

### Part 1 - Rewrite

# switch on URL rewriting

RewriteEngine on

# Define a log for debugging but set the log level to zero to disable it for
# performance

RewriteLog logs/proxy_rewrite

RewriteLogLevel 0

# define the cluster servers map

RewriteMap cluster rnd:/usr/local/apache/rewritemaps/cluster.txt

# rewrite the URL if it matches the web server host

RewriteRule ^http://www\.(.*)$ http://${cluster:www}.$1 [P,L]

# forbid any URL that doesn't match

RewriteRule .* - [F]

### Part 2 - Proxy

ProxyRequests on

ProxyPassReverse / http://www1.alpha-complex.com/

ProxyPassReverse / http://www2.alpha-complex.com/

ProxyPassReverse / http://www3.alpha-complex.com/

ProxyPassReverse / http://www4.alpha-complex.com/

ProxyPassReverse / http://www5.alpha-complex.com/

ProxyPassReverse / http://www6.alpha-complex.com/

# We don't want caching, preferring to let the back end servers take the load,
# but if we did:
#

#CacheRoot /usr/local/apache/proxy

#CacheSize 102400

Because this works at the level of an HTTP/FTP proxy rather than lower level protocols like DNS or TCP/IP, we can also have the proxy cache files and use it to bridge a firewall, allowing the cluster to reside on an internal and protected network.

The downside of this strategy is that it does not intelligently distribute the load. We could fix this by replacing the random map file with an external mapping program that attempts to make intelligent guesses about which servers are most suitable. The program should be very simple so as not to adversely affect performance, since it will be consulted for every client request.
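A minimal sketch of such a mapping program is given below in Python. It assumes mod_rewrite's external program map type (`prg:`), where the program reads one lookup key per line on standard input and writes one answer per line on standard output, with the string NULL meaning no match. The load figures are hard-coded and entirely hypothetical; how a real program would obtain them cheaply is left open here.

```python
import sys

# Hypothetical load figures; a real program would have to refresh these
# from somewhere cheap, which is not shown here.
LOAD = {"www1": 0.9, "www2": 0.2, "www3": 0.5}

# Hypothetical cluster membership, keyed the same way as the map file.
CLUSTERS = {"www": ["www1", "www2", "www3"]}


def pick_server(key, load=LOAD):
    """Return the least-loaded member of the cluster named `key`, or 'NULL'."""
    members = CLUSTERS.get(key)
    if not members:
        return "NULL"  # the string NULL signals "no match" to mod_rewrite
    # Servers with no load figure are treated as infinitely busy.
    return min(members, key=lambda name: load.get(name, float("inf")))


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Answer one lookup per input line, flushing after each answer."""
    for line in stdin:
        stdout.write(pick_server(line.strip()) + "\n")
        stdout.flush()

# When installed as a map program, the script would finish by calling serve().
```

Flushing after every answer matters because the proxy waits for each reply before it can continue with the request. Keeping the per-lookup work to a dictionary access and a `min` over a handful of names is in the spirit of the advice above to keep the program very simple.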

Other Clustering Solutions

There are several commercial and free clustering solutions available from the Internet. Here are a few that might be of interest if none of the other solutions here is sophisticated enough:

Eddie

The Eddie Project is an open-source initiative sponsored by Ericsson to develop advanced clustering solutions for Linux, FreeBSD, and Solaris; Windows NT is under development.

There are two packages available: an enhanced DNS server that takes the place of the BIND daemon and performs true load balancing and an intelligent HTTP gateway that allows web servers to be clustered across disparate networks. A sample Apache configuration is included with the software, and binary RPM packages are available for x86 Linux systems.

Eddie is available from http://www.eddieware.org/.

TurboCluster

TurboCluster is a freely available clustering solution developed for TurboLinux: http://community.turbolinux.com/cluster/.

Sun Cluster

Solaris users will most probably be interested in Sun's own clustering application; however, this is not a free or open product. See http://www.sun.com/clusters/.

Freequalizer

Freequalizer is a freely available version of Equalizer, produced by Coyote Point Systems, designed to run on a FreeBSD server (Equalizer, the commercial version, runs on its own dedicated hardware). GUI monitoring tools are available as part of the package.

Freequalizer is available from http://www.coyotepoint.com.

1999 Wrox Press Limited, US and UK.



 
 
 
