CPU performance and plenty of memory by themselves won't prevent a bottleneck if the Input/Output (I/O) performance of the system (the frequency of access to the network interface card and hard disk) is insufficient.
An intranet places very high demands on the network and the interface card. Here, an older 10Base2 or 10BaseT connection can easily become a problem. A 10Base network can cope with a maximum effective throughput of six to eight megabits per second, and a Web server accessed at a rate of 90 hits per second will soon reach this limit.
100BaseT network cards and cabling are now only marginally more expensive, so there's no reason to invest in 10Base networking unless you have a legacy 10Base network that you can't easily replace. Even in this case, dual 10/100Base network cards are a better option—you can always upgrade the rest of the network later. For the most demanding applications, Gigabit Ethernet is also available, but it costs considerably more to implement.
Provided that other computers don't unduly stretch the network, a normal Ethernet card will in most cases be sufficient, as long as it's not the cheapest card available at a cut-price computer store. Note that many lower-end cards don't support features such as Direct Memory Access (DMA), also called bus mastering, so they perform significantly worse even though they're nominally compatible with more expensive cards.
Using Dual Network Connections
Fitting two network cards and assigning them different IP addresses on different networks is an excellent approach for servers, especially if you intend to connect them both to an ISP. The external network interface is then used exclusively for Web server access, and the internal network interface links to the database server or backup systems, allowing you to process database requests and make backups without affecting the external bandwidth. Similarly, a busy Web site won’t affect bandwidth on the internal network.
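As a concrete sketch of this arrangement (the interface names and addresses here are hypothetical examples, not taken from any particular setup), a dual-homed Linux server might be configured like this:

```shell
# Hypothetical dual-interface setup; eth0/eth1 and all addresses are
# examples only. The external interface serves Web traffic; the
# internal one carries database and backup traffic.
ifconfig eth0 203.0.113.10 netmask 255.255.255.0   # external, ISP-facing
ifconfig eth1 192.168.1.1 netmask 255.255.255.0    # internal, private
route add default gw 203.0.113.1                   # default route via the ISP
# Deliberately, no route is added between the two networks.
```

Because the only default route points at the ISP, traffic never crosses from one network to the other through the server.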
Dual network interfaces have an additional security benefit: By isolating the internal and external networks and eliminating any routing between them, it becomes relatively easy to deny external users access to the internal network. For example, if you have a firewall, you can put it between the internal network interface and the rest of the network, which leaves the server outside the firewall but everything else inside it.

Internet Connection
If the server is going to be on the Internet, you need to give careful consideration both to the type of connection you use and the capabilities of the ISP that’ll provide it.
Here are some questions to ask when considering an ISP:
If you’re running a site with an international context (for example, if you run all the regional sites of an international company from one place), find out the answers to the following, as well:
Note that just because an ISP can offer high bandwidth doesn't necessarily mean that users on the Internet can utilize that bandwidth—that depends on how well connected the ISP is to its peers and the Internet backbone in general. Many ISPs rely on one supplier for their own connectivity, so if that supplier is overloaded, your high bandwidth is useless to you and your visitors, even if the ISP's outgoing bandwidth is theoretically more than adequate.

Hard Disk and Controller
Fast hard disks and a matching controller definitely make sense for a Web server, and a SCSI system is infinitely preferable to Integrated Device Electronics (IDE) if performance is an issue.
For frequently accessed Web sites, it also makes sense to use several smaller disks rather than one large hard disk. If, for instance, you operate one large database or several busy virtual servers, give each its own disk; a single hard disk can read from only one place at a time, so spreading the data across disks improves access performance.
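For example (the device names and mount points here are hypothetical), you might dedicate one disk to the busiest virtual server and another to the database:

```shell
# Hypothetical layout: spread the heaviest consumers across spindles
# so they don't compete for the same disk head.
mount /dev/sdb1 /var/www/bigsite    # busiest virtual server's documents
mount /dev/sdc1 /var/lib/mysql      # database files on their own disk
```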
RAID 0 (striping) can also be used to increase the performance of a disk array, and combining it with RAID 1 (mirroring) adds redundancy. The combinations are written RAID 0+1 (mirrored stripes) and RAID 1+0, commonly called RAID 10 (striped mirrors); they perform similarly, but RAID 1+0 tolerates multiple disk failures better. Either way, it can be expensive.

Operating System Checklist
For the server to run effectively (that is, be both stable and efficient), the hosting operating system needs to be up to the task. I have discussed operating systems in reference to Apache’s supported platforms, and I mentioned that as a server platform, Unix is generally preferred for Apache installations. Whatever operating system you choose, it should have all the following features to some degree:
Third-party modules can be more of a problem, but Apache supplies the MaxRequestsPerChild directive to forcibly restart Apache processes periodically, preventing unruly modules (one with a memory leak, for example) from misbehaving for too long. If you plan to use large applications such as database servers, you should check their stability records, too.

Redundancy and Backup
If you're planning to run a server of any importance, you should give some attention to how you intend to recover the server if, for whatever reason, it dies. For example, you may have a hardware failure, or you might get cracked and have your data compromised. A RAID array is a good first line of defense, but it can be expensive. It also keeps the backup in the server itself, which isn't much comfort if the server happens to catch fire and explode. (Yes, this actually happens.)
A simple backup solution is to equip the server with a DAT drive or other mass storage device and configure the server to automatically copy the relevant files to tape at regular scheduled times. This is easy to set up even without specialist backup software; on a Unix platform, a simple cron job will do this for you.
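A minimal sketch of such a job, assuming GNU tar and using temporary directories to stand in for the document root and the tape or spool device:

```shell
set -e
SITE_DIR=$(mktemp -d)      # stands in for the Web document root
BACKUP_DIR=$(mktemp -d)    # stands in for the tape/spool target
echo '<html>index</html>' > "$SITE_DIR/index.html"

# Write a dated archive; on a real server the target might be a tape
# device such as /dev/st0 rather than a file.
STAMP=$(date +%Y-%m-%d)
tar -czf "$BACKUP_DIR/site-$STAMP.tar.gz" -C "$SITE_DIR" .

# A crontab entry to run a script like this at 2:30 a.m. every night:
# 30 2 * * * /usr/local/sbin/backup-site.sh
```

The script name in the crontab line is illustrative; the point is that cron, not specialist software, drives the schedule.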
A better solution is to back up across an internal network, if you have one. This would allow data to be copied off the server to a backup server that could stand in when the primary server goes down. It also removes the need for manual intervention because DAT tapes don’t swap by themselves.
If the server is placed on the Internet (or even if it isn’t), you should take precautions against the server being compromised. If this happens, there is only one correct course of action: Replace everything from reliable backups. That includes reinstalling the operating system, reconfiguring it, and reinstalling the site or sites from backups. If you’re copying to a single backup medium every day and don’t spot a problem before the next backup occurs, you have no reliable backup the following day. The moral is to keep multiple, dated backups.
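One way to keep multiple dated backups is to prune all but the newest few on each run. This sketch assumes a GNU userland and a site-YYYY-MM-DD.tar.gz naming scheme, using empty dummy archives for illustration:

```shell
set -e
BACKUP_DIR=$(mktemp -d)
# Create five dummy dated archives to prune.
for d in 2024-01-01 2024-01-02 2024-01-03 2024-01-04 2024-01-05; do
    touch "$BACKUP_DIR/site-$d.tar.gz"
done

KEEP=3   # how many dated backups to retain
# Dates in this format sort lexically, so newest-first is a reverse sort;
# delete everything after the first $KEEP entries.
ls -1 "$BACKUP_DIR"/site-*.tar.gz | sort -r | tail -n +$((KEEP + 1)) | xargs -r rm
```

After this runs, only the three most recent archives remain, so a compromise spotted within a couple of days still leaves a clean backup to restore from.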
There are several commercial tools for network backups, and your choice may be influenced by the server's environment—the corporate backup strategy most likely can extend to the server, too. Free options include obvious but crude tools such as FTP or NFS to copy directory trees from one server to another. (Unless you have a compelling reason to do so, you should not enable NFS on the server at all, because it could compromise the server's security.)
A better free tool for making backups is rsync, which is an intelligent version of the standard Unix rcp (remote copy) command that copies only the differences between directory hierarchies. Better still, it can run across an encrypted connection supplied by OpenSSH (secure shell), another free tool. If you need to make remote backups of the server's files across the Internet, you should seriously consider this approach. (I cover both rsync and ssh in Chapter 10.) On the subject of free tools, another more advanced option worth noting is the Concurrent Versions System (CVS). More often applied to source code, it works well on HTML files, too. (For more information on CVS, see http://www.cvshome.org/.)
A final note about backups across the network: Even if you use a smart backup system that knows how to make incremental backups of the server, a large site can still mean a large quantity of data. If the server is also busy, whenever a backup is performed this data will consume bandwidth that would otherwise be put toward handling browser requests, so it pays to plan backups and schedule them appropriately. If you have a lot of data to copy, consider doing it in stages (on a per-directory basis, for example) and definitely do it incrementally. Having dual network connections, backing up on the internal one, and leaving the external one for HTTP requests is a definite advantage here.

Specific Hardware Solutions
Many vendors now sell hardware with Apache or an Apache derivative preinstalled, coupled with administrative software to simplify server configuration and maintenance. At this point, all these solutions are Unix-based, predominantly Linux. Several ISPs are also offering some of these solutions as dedicated servers for purchase or hire.
Larger vendors include HP, Dell, Sun, and of course IBM, as well as a diverse list of smaller companies. The list of vendors is growing all the time—the Linux VAR HOWTO at http://en.tldp.org/HOWTO/VAR-HOWTO.html (and other places) has some useful pointers.

Get Someone Else to Do It
As an alternative to setting up a server yourself—with all the attendant issues of reliability, connectivity, and backups this implies—you can buy or hire a dedicated server at an ISP, commonly known as colocation.
The advantage of this is that the ISP handles all the day-to-day maintenance, but you still get all the flexibility of a server that belongs entirely to you. You can even rebuild and reconfigure Apache as you want because you have total control of the server. Of course, that also means you have total control over wrecking the server, so its physical absence doesn't eliminate the need for a Web server administrator.
The disadvantage is that you’re physically removed from the server. If it has a serious problem, you may be unable to access it to find out what the problem is. The ISP will most likely also impose bandwidth restrictions, which you should be aware of. You’re also reliant on the ISP’s service, so checking out their help desk before signing up is recommended.
Note that services vary from one ISP to another—some will back up the server files automatically; others will not. As with most things on the Internet, it pays to check prospective ISPs by looking them up on discussion lists and Usenet newsgroups.
NOTE More introductory material is available at http://httpd.apache.
In this chapter, I covered the basic concepts of what a Web server is and introduced you to Apache. There are many reasons that Apache is a popular Web server, including the important fact that it’s free. The best form of support for Apache is the informative and informal support of the online community that’s very active in developing and maintaining it.
I also discussed how Apache works on Unix and Windows as well as some networking tools such as ifconfig, netstat, snoop, tcpdump, ping, spray, and traceroute. In the latter part of the chapter, I covered the basic server requirements and some specific hardware solutions for your Web server.
In the next chapter, I’ll cover installing Apache and configuring it as a basic Web server.