Professional Apache
By: Dev Shed
    2000-05-18

    Table of Contents:
  • Professional Apache
  • Apache's Performance Directives
  • Configuring Apache for Better Performance
  • Proxying
  • Caching
  • Fault Tolerance and Clustering


    Professional Apache - Fault Tolerance and Clustering
    (Page 6 of 6)

    When web sites become large and busy, issues of reliability and performance become more significant. It can be disastrous if the server of an important web site like an online store front or a web-hosting ISP falls over, and visitors are put off by sites that are sluggish and hard to use.

    Both these problems can be solved to a greater or lesser extent in two basic ways:

    • We can make our servers more powerful, adding more memory and faster disks or upgrading the processor to a faster speed or a multiprocessor system. This is simple, but potentially expensive.
    • We can install more servers and distribute the load of client requests between them. Because they are sharing the load, the individual servers do not have to be expensive power servers, just adequate to the job.

    Multiple servers are an attractive proposition for several reasons: They can be cheap and therefore easily replaceable, individual servers can fall over without the web site becoming unavailable, and increasing capacity is just a case of adding another server without needing to open up or reconfigure an existing one.

    However, we can't just dump a bunch of servers on a network and expect them to work as one. We need to make them into a cluster, so that external clients do not have to worry about, and preferably aren't aware of, the fact that they are talking to a group of servers and not just one.

    There are two basic approaches to clustering, DNS load sharing and Web server clustering, and several solutions in each. Which we choose depends on exactly what we want to achieve and how much money we are prepared to spend to achieve it. We'll first look at DNS solutions before going on to look at true web clusters and a home-grown clustering solution using Apache.

    DNS load sharing is the cruder of the two approaches. In its favor, however, is the fact that it works not just for web servers, but for FTP archives or any other kind of network server, since it is protocol independent.

    Backup Server Via Redirected Secondary DNS

    The simplest of the DNS configuration options, this approach allows us to create a backup server for the primary web server by taking advantage of the fact that all domain names have at least two nominated name servers, a primary and a secondary, from which their IP address can be determined.

    Ordinarily, both name servers hold a record for the name of the web server with the same IP address:


    www.alpha-complex.com. IN A 204.148.170.3

    However, there is no reason why the web server cannot be the primary name server for itself. If we set up two identical servers, we can make the web server its own primary name server and give the secondary server a different IP address for the web server. For example:


    www.alpha-complex.com. IN A 204.148.170.203

    In normal operation, the IP address of the web server is requested by other name servers directly from the web server's own DNS service. If for any reason the web server falls over, however, the primary name server will no longer be available and DNS requests will resort to the secondary name server. This returns the IP address of the backup server rather than the primary so client requests will succeed.

    The Time To Live (TTL) of the data served by the primary DNS server on the web server needs to be set to a low value such as 30 minutes. Otherwise, external name servers will cache the primary web server's IP address and not request an update from the secondary name server in a timely fashion, making the web server apparently unavailable until the cached DNS information expires. TTL values are given in seconds, so we can give the A record a time to live of 30 minutes by altering it to:


    www.alpha-complex.com. 1800 IN A 204.148.170.3

    There are several caveats to this scheme. Session tracking, user authentication, and cookies are likely to get confused when the IP address switches to the backup server. No provision is made for load sharing either: the backup server is never accessed until the primary server becomes unavailable, no matter how busy the primary might be. Note that unavailable means totally unavailable. If the httpd daemon crashes but the machine is still capable of DNS resolution, the switch will not take place.
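
The effect of the TTL on failover can be illustrated with a toy model. The following Python sketch is not a DNS implementation; `CachingResolver` and `authoritative_answer` are hypothetical names invented for this illustration, and the 1800 second TTL is simply 30 minutes expressed in seconds. It shows an external name server continuing to hand out the dead primary's address until its cached answer expires.

```python
# Toy model of DNS failover seen through a caching resolver.
# All names here are illustrative; this is not a real DNS API.

PRIMARY_IP = "204.148.170.3"    # answer served by the web server's own DNS
BACKUP_IP = "204.148.170.203"   # answer served by the secondary name server


def authoritative_answer(primary_up):
    """Return the address a fresh lookup would receive."""
    # If the primary machine is down, its name server is down with it,
    # so the lookup falls through to the secondary name server.
    return PRIMARY_IP if primary_up else BACKUP_IP


class CachingResolver:
    """An external name server that caches answers for ttl seconds."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.cached_ip = None
        self.expires_at = 0

    def resolve(self, now, primary_up):
        # Only ask the authoritative servers when nothing valid is cached.
        if self.cached_ip is None or now >= self.expires_at:
            self.cached_ip = authoritative_answer(primary_up)
            self.expires_at = now + self.ttl
        return self.cached_ip


if __name__ == "__main__":
    resolver = CachingResolver(ttl=1800)
    print(resolver.resolve(now=0, primary_up=True))      # primary address is cached
    print(resolver.resolve(now=600, primary_up=False))   # stale primary address
    print(resolver.resolve(now=1800, primary_up=False))  # cache expired, backup address
```

The second lookup is the window the TTL controls: the shorter the TTL, the sooner cached answers switch to the backup server.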

    Load Sharing with Round-Robin DNS

    Since version 4.9, BIND, the Internet daemon that runs the bulk of the world's DNS servers, has provided a configuration called round-robin DNS. This was an early approach to load sharing between servers and still works today. It works by specifying multiple IP addresses for the same host:


    www.alpha-complex.com. 60 IN A 204.148.170.1
    www.alpha-complex.com. 60 IN A 204.148.170.2
    www.alpha-complex.com. 60 IN A 204.148.170.3

    When a DNS request for the IP address for www.alpha-complex.com is received, BIND returns one of these three addresses and makes a note of it. The next request then gets the next IP address in the file and so on until the last one, after which BIND returns to the first address again. Subsequent requests will therefore get IP addresses in the order: 204.148.170.1, 204.148.170.2, 204.148.170.3, 204.148.170.1 ...
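
The rotation described above can be modeled in a few lines of Python. This is only a sketch of the ordering behavior, not of BIND itself; `round_robin` is a hypothetical helper name.

```python
from itertools import cycle

# The three A records for www.alpha-complex.com, in file order.
ADDRESSES = ["204.148.170.1", "204.148.170.2", "204.148.170.3"]


def round_robin(addresses):
    """Return an iterator that yields the addresses in order, wrapping forever."""
    return cycle(addresses)


if __name__ == "__main__":
    answers = round_robin(ADDRESSES)
    # Four successive requests: the fourth wraps back to the first address.
    for _ in range(4):
        print(next(answers))
```

Note that nothing in the rotation consults the state of the servers, which is why a server that is down still receives its share of answers.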

    Just as with the backup server approach, we have to deal with the fact that other name servers will cache the response they get from us, thwarting the round-robin. To stop this, we set a short time-to-live value, which is the purpose of the 60 in each of the records given above. TTL values are in seconds, so these records expire after one minute; a time to live on the order of an hour would be written as 3600.

    We can specify a lower value, but this causes more DNS traffic in updates, which improves the load sharing on our web servers at the expense of increasing the load on our name server.

    The attraction of round-robin DNS is its simplicity - we only have to add a few lines to one file to make it work (two files if you include the secondary name server). It also works for any kind of server, not just web servers. The drawback is that this is not true load balancing, only load sharing - the round-robin takes no account of which servers are loaded and which are free or even which are actually up and running.

    Hardware Load Balancing

    Various manufacturers such as Cisco have load balancing products for networks that cluster servers at the TCP/IP level. These are highly effective but can also be expensive.

    Clustering with Apache

    Apache provides a simple but clever way to cluster servers using features of mod_rewrite and mod_proxy together. This gets around DNS caching problems by hiding the cluster with a proxy server and because it uses Apache it is totally free, of course.

    To make this work, we have to nominate one machine to be a proxy server, handling requests to several back-end servers on which the web site is actually located. The proxy takes the name www.alpha-complex.com, and we call our back-end servers www1 to www6.

    The solution comprises two parts:

    • Using mod_rewrite to randomly select a back-end server to service the client request.
    • Using mod_proxy's ProxyPassReverse directive to disguise the URL of the back-end server so clients are compelled to direct further requests through the proxy.

    Part one makes use of the random text map feature of mod_rewrite, which was developed primarily to allow this solution to work. We create a map file containing a single line:


    # /usr/local/apache/rewritemaps/cluster.txt
    #
    # Random map of back-end web servers
    www www1|www2|www3|www4|www5|www6

    When used, this map will take the key www and randomly return one of the values www1 to www6.
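
To make the lookup concrete, here is a small Python sketch that mimics what a random text map does with this file. It is an illustration only, not mod_rewrite's code; `parse_map` and `lookup` are hypothetical names.

```python
import random

MAP_TEXT = """\
# Random map of back-end web servers
www www1|www2|www3|www4|www5|www6
"""


def parse_map(text):
    """Parse map text into {key: [alternatives]}, skipping '#' comment lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, values = line.split(None, 1)
        table[key] = values.strip().split("|")
    return table


def lookup(table, key, default=""):
    """Return one alternative for key chosen at random, or default if absent."""
    choices = table.get(key)
    return random.choice(choices) if choices else default


if __name__ == "__main__":
    table = parse_map(MAP_TEXT)
    # Each lookup may return a different back-end name.
    print([lookup(table, "www") for _ in range(5)])
```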

    We now write some mod_rewrite directives into the proxy server's configuration to make use of this map to redirect URLs to a random server:


    # switch on URL rewriting
    RewriteEngine on

    # define the cluster servers map
    RewriteMap cluster rnd:/usr/local/apache/rewritemaps/cluster.txt

    # rewrite the URL if it matches the web server host
    RewriteRule ^http://www\.(.*)$ http://${cluster:www}.$1 [P,L]

    # forbid any URL that doesn't match
    RewriteRule .* - [F]

    Depending on how sophisticated we want to be, we can make this rewrite rule a bit more advanced and cope with more than one cluster at a time:

    Map file:


    www www1|www2|www3|www4|www5|www6
    secure secure-a|secure-b
    users admin.users|normal.users

    Rewrite Rule:


    # rewrite the URL based on the hostname asked for. If nothing matches,
    # default to 'www1':
    RewriteRule ^http://([^.]+)\.(.*)$ http://${cluster:$1|www1}.$2 [P,L]
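
The intent of this rule can be sketched in Python: capture the first label of the host name, use it as the map key, and fall back to www1 when the key is not in the map. The `rewrite` function and `CLUSTERS` table are hypothetical stand-ins for the rule and the map file, not how mod_rewrite is implemented.

```python
import random
import re

# Stand-in for the multi-cluster map file shown above.
CLUSTERS = {
    "www": ["www1", "www2", "www3", "www4", "www5", "www6"],
    "secure": ["secure-a", "secure-b"],
    "users": ["admin.users", "normal.users"],
}

# Group 1 is the first host label, group 2 is everything after the first dot.
URL_PATTERN = re.compile(r"^http://([^.]+)\.(.*)$")


def rewrite(url, clusters=CLUSTERS, default="www1"):
    """Return the back-end URL for url, or None if the pattern does not match."""
    match = URL_PATTERN.match(url)
    if match is None:
        return None
    host, rest = match.groups()
    # Unknown keys fall back to the default, like the |www1 in the map lookup.
    backend = random.choice(clusters.get(host, [default]))
    return "http://%s.%s" % (backend, rest)


if __name__ == "__main__":
    print(rewrite("http://secure.alpha-complex.com/login"))
    print(rewrite("http://other.alpha-complex.com/"))
```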

    We can even have the proxy cluster both HTTP and FTP servers, so long as it's listening to port 21, the FTP control port:

    Map file:


    www www1|www2|www3|www4|www5|www6
    ftp ftp|archive|attic|basement

    Rewrite Rule:


    # rewrite the URL based on the protocol and hostname asked for:
    RewriteRule ^(http|ftp)://([^.]+)\.(.*)$ $1://${cluster:$2}.$3 [P,L]

    Part two makes use of mod_proxy to rewrite URLs generated by the back-end servers due to a redirection. Without this, clients will receive redirection responses with locations starting with www1 or www3 rather than www. We can fix this with ProxyPassReverse:


    ProxyPassReverse / http://www1.alpha-complex.com/
    ProxyPassReverse / http://www2.alpha-complex.com/
    ...
    ProxyPassReverse / http://www6.alpha-complex.com/
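
As a rough illustration of what these directives achieve, the Python sketch below rewrites the Location header of a back-end redirect so that it points at the proxy again. It is a simplified model that handles only the Location header, not mod_proxy's implementation; `reverse_location` is a hypothetical name, and the proxy's public URL is assumed to be http://www.alpha-complex.com/.

```python
# Simplified model of the header fix-up performed for redirects.
PROXY_BASE = "http://www.alpha-complex.com/"
BACKEND_BASES = ["http://www%d.alpha-complex.com/" % n for n in range(1, 7)]


def reverse_location(location):
    """Map a back-end redirect target back onto the proxy's own URL space."""
    for base in BACKEND_BASES:
        if location.startswith(base):
            return PROXY_BASE + location[len(base):]
    # Redirects to anywhere else are passed through untouched.
    return location


if __name__ == "__main__":
    print(reverse_location("http://www3.alpha-complex.com/shop/"))
    print(reverse_location("http://www.example.org/elsewhere"))
```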

    A complete Apache configuration for creating a web cluster via proxy would look something like this:


    # Apache Server Configuration for Clustering Proxy
    #

    ### Basic Server Setup

    # The proxy takes the identity of the web site...
    ServerName www.alpha-complex.com
    ServerAdmin webmaster@alpha-complex.com
    ServerRoot /usr/local/apache
    DocumentRoot /usr/local/apache/proxysite
    ErrorLog /usr/local/apache/proxy_error
    TransferLog /usr/local/apache/proxy_log
    User nobody
    Group nobody

    # dynamic servers load their modules here...

    # don't waste time on things we don't need
    HostnameLookups off

    # this server is only for proxying so switch off everything else
    <Directory />
    Options None
    AllowOverride None
    </Directory>

    # allow a local client to access the server status
    # (deny is evaluated first, so the allow for 127.0.0.1 takes effect)
    <Location />
    order deny,allow
    deny from all
    allow from 127.0.0.1
    SetHandler server-status
    </Location>

    ### Part 1 - Rewrite

    # switch on URL rewriting
    RewriteEngine on

    # Define a log for debugging but set the log level to zero to disable it for
    # performance
    RewriteLog logs/proxy_rewrite
    RewriteLogLevel 0

    # define the cluster servers map
    RewriteMap cluster rnd:/usr/local/apache/rewritemaps/cluster.txt

    # rewrite the URL if it matches the web server host
    RewriteRule ^http://www\.(.*)$ http://${cluster:www}.$1 [P,L]

    # forbid any URL that doesn't match
    RewriteRule .* - [F]

    ### Part 2 - Proxy

    ProxyRequests on
    ProxyPassReverse / http://www1.alpha-complex.com/
    ProxyPassReverse / http://www2.alpha-complex.com/
    ProxyPassReverse / http://www3.alpha-complex.com/
    ProxyPassReverse / http://www4.alpha-complex.com/
    ProxyPassReverse / http://www5.alpha-complex.com/
    ProxyPassReverse / http://www6.alpha-complex.com/

    # We don't want caching, preferring to let the back end servers take the load,
    # but if we did:
    #
    #CacheRoot /usr/local/apache/proxy
    #CacheSize 102400

    Because this works at the level of an HTTP/FTP proxy rather than lower level protocols like DNS or TCP/IP, we can also have the proxy cache files and use it to bridge a firewall, allowing the cluster to reside on an internal and protected network.

    The downside of this strategy is that it does not intelligently distribute the load. We could fix this by replacing the random map file with an external mapping program that attempts to make intelligent guesses about which servers are most suitable. The program should be kept very simple so that it does not adversely affect performance, since it will be called for every client request.
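
A minimal sketch of such a mapping program follows, written in Python. An external program map (declared with the prg: map type instead of rnd:) is started once by Apache, then receives one lookup key per line on standard input and must answer with one line on standard output, or the string NULL when it has no value. The `LOAD` table is a made-up placeholder; how a real program would learn the load on each back-end server is left open here, and `least_loaded` and `serve` are hypothetical names.

```python
import io

# Hypothetical load figures per back-end server. A real program would have
# to collect these somehow; that part is outside this sketch.
LOAD = {"www1": 0.9, "www2": 0.4, "www3": 0.7}


def least_loaded(load):
    """Return the name of the server with the smallest load figure."""
    return min(load, key=load.get)


def serve(stdin, stdout, load=LOAD):
    """Answer one lookup per input line until the input stream ends."""
    for line in stdin:
        key = line.strip()
        answer = least_loaded(load) if key == "www" else "NULL"
        stdout.write(answer + "\n")
        # Answers must not sit in a buffer, or the server would wait forever.
        stdout.flush()


if __name__ == "__main__":
    # Demonstration with in-memory streams instead of Apache's pipes.
    out = io.StringIO()
    serve(io.StringIO("www\nunknown\n"), out)
    print(out.getvalue(), end="")
```

When run under Apache, the script would instead call `serve(sys.stdin, sys.stdout)` after importing `sys`. Keeping the per-request work to a dictionary lookup reflects the point above that the program runs for every client request.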

    Other Clustering Solutions

    There are several commercial and free clustering solutions available from the Internet. Here are a few that might be of interest if none of the other solutions here is sophisticated enough:

    Eddie

    The Eddie Project is an open-source initiative sponsored by Ericsson to develop advanced clustering solutions for Linux, FreeBSD, and Solaris; Windows NT is under development.

    There are two packages available: an enhanced DNS server that takes the place of the BIND daemon and performs true load balancing and an intelligent HTTP gateway that allows web servers to be clustered across disparate networks. A sample Apache configuration is included with the software, and binary RPM packages are available for x86 Linux systems.

    Eddie is available from http://www.eddieware.org/.

    TurboCluster

    TurboCluster is a freely available clustering solution developed for TurboLinux: http://community.turbolinux.com/cluster/.

    Sun Cluster

    Solaris users will most probably be interested in Sun's own clustering application; however, this is not a free or open product. See http://www.sun.com/clusters/.

    Freequalizer

    Freequalizer is a freely available version of Equalizer, produced by Coyote Point Systems, designed to run on a FreeBSD server (Equalizer, the commercial version, runs on its own dedicated hardware). GUI monitoring tools are available as part of the package.

    Freequalizer is available from http://www.coyotepoint.com.

    ©1999 Wrox Press Limited, US and UK.

