
Network Interface - Apache

This article introduces those new to networking to Apache, the Hypertext Transfer Protocol (HTTP), and the basics of system administration. It is excerpted from chapter one of Peter Wainwright's book Pro Apache (Apress, 2004; ISBN: 1590593006).

TABLE OF CONTENTS:
  1. Apache and the Internet
  2. How Apache Works
  3. Configuring Apache
  4. The Hypertext Transfer Protocol
  5. Understanding the HTTP Protocol
  6. The TCP/IP Network Model
  7. Netmasks and Routing
  8. The Future: IPv6
  9. Monitoring a Network
  10. Network Interface
By: Apress Publishing
March 09, 2005

CPU performance and plenty of memory by themselves won’t prevent a bottleneck if the Input/Output (I/O) performance of the system (how quickly it can access the interface card and hard disk) is insufficient.

In an intranet, very high demands are made of and from the network and interface card. Here, an older 10Base2 or 10BaseT connection can easily become a problem. A 10Base network can cope with a maximum throughput of six to eight megabits per second, and a Web server accessed at a rate of 90 hits per second will soon reach this limit.
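To see why 90 hits per second is enough to saturate such a link, the figures above can be turned into a rough per-response budget. The following is a minimal sketch, not a measurement: the function name is made up for illustration, a kilobyte is taken as 1,000 bytes, and protocol overhead is ignored.

```python
def avg_response_budget_kbytes(throughput_mbit: float, hits_per_second: float) -> float:
    """Average response size (in KB) at which the link becomes saturated.

    Assumes 1 KB = 1000 bytes and ignores TCP/IP and HTTP header overhead.
    """
    bytes_per_second = throughput_mbit * 1_000_000 / 8
    return bytes_per_second / hits_per_second / 1000


# Usable 10Base throughput of six to eight megabits per second, 90 hits per second.
low = avg_response_budget_kbytes(6, 90)
high = avg_response_budget_kbytes(8, 90)
print(f"Link saturates at roughly {low:.1f} to {high:.1f} KB per response")
```

Under these assumptions, the network becomes the bottleneck as soon as the average response grows beyond roughly 8 to 11 KB, which is not much for a page with images.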

100BaseT network cards and cabling are now negligibly more expensive, so there’s no reason to invest in 10Base networking unless you have a legacy 10Base network that you can’t easily replace. Even in this case, dual 10/100Base network cards are a better option, because you can always upgrade the rest of the network later. For the most demanding applications, Gigabit Ethernet is also available, but it costs considerably more to implement.

Provided that other computers don’t unduly stretch the network, a normal Ethernet card will in most cases be sufficient, as long as it’s not the cheapest card available at a cut-price computer store. Note that many lower-end cards don’t use features such as Direct Memory Access (DMA), also called bus mastering, so they perform significantly worse even though they’re “compatible” with more expensive ones.

Using Dual Network Connections

Fitting two network cards and assigning them different IP addresses on different networks is an excellent approach for servers, especially if you intend to connect them both to an ISP. The external network interface is then used exclusively for Web server access, and the internal network interface links to the database server or backup systems, allowing you to process database requests and make backups without affecting the external bandwidth. Similarly, a busy Web site won’t affect bandwidth on the internal network.

Dual network interfaces have an additional security benefit: By isolating the internal and external networks and eliminating any routing between them, it becomes relatively easy to deny external users access to the internal network. For example, if you have a firewall, you can put it between the internal network interface and the rest of the network, which leaves the server outside but everything else inside.
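The separation described above depends on each service listening only on the interface meant for it. In Apache this is normally done with the Listen directive; the underlying idea can be sketched with a plain socket. This is an illustration only: the helper name and the addresses in the comments are hypothetical, and a real server needs far more than this.

```python
import socket


def listen_on(address: str, port: int, backlog: int = 128) -> socket.socket:
    """Open a TCP listening socket bound to one specific local address.

    Binding to a single interface address (rather than 0.0.0.0) means
    connections arriving on any other interface are never accepted.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((address, port))
    sock.listen(backlog)
    return sock


# Hypothetical layout: 192.0.2.10 is the external card, 10.0.0.10 the internal one.
# web = listen_on("192.0.2.10", 80)      # reachable only from the external network
# admin = listen_on("10.0.0.10", 8080)   # reachable only from the internal network
```

Passing port 0 lets the operating system choose a free port, which is handy for trying the function on the loopback address without needing root privileges.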

Internet Connection

If the server is going to be on the Internet, you need to give careful consideration both to the type of connection you use and the capabilities of the ISP that’ll provide it.

Here are some questions to ask when considering an ISP:

  • Are they reliable?

  • Do they have good connectivity with the Internet (who are they peered with, and do they have redundant circuits)?

  • Are you sharing bandwidth with many other customers?

  • If so, do they offer a dedicated connection?

If you’re running a site with an international context (for example, if you run all the regional sites of an international company from one place), find out the answers to the following, as well:

  • Do they have good global connectivity?

  • Does technical support know the answer to all these questions when called?

Note that just because an ISP can offer high bandwidth doesn’t necessarily mean that users on the Internet can utilize that bandwidth—that depends on how well connected the ISP is to its peers and the Internet backbone in general. Many ISPs rely on one supplier for their own connectivity, so if that supplier is overloaded, your high bandwidth is useless to you and your visitors, even if the ISP’s outgoing bandwidth is theoretically more than adequate.

Hard Disk and Controller

Fast hard disks and a matching controller definitely make sense for a Web server, and a SCSI system is far preferable to Integrated Device Electronics (IDE) if performance is an issue.

For frequently accessed Web sites, it also makes sense to use several smaller disks rather than one large hard disk. If, for instance, one large database or several large virtual servers are operated, for superior access performance, store the data on their own disks because one hard disk can read from only one place at one time.

RAID 0 (striping) can also be used to increase the performance of a disk array. Combining it with RAID 1 (mirroring) for redundancy can be an effective way of improving server performance. The combination is variously called RAID 0+1, RAID 1+0, or RAID 10; strictly speaking, RAID 1+0 and RAID 10 both denote a stripe across mirrored pairs, whereas RAID 0+1 mirrors two striped sets. Either way, it can be expensive.
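The expense comes from the disks given over to redundancy. As a rough illustration, the sketch below compares usable capacity for the RAID levels mentioned here. It assumes identical disks and two-way mirrors; the function name and the 36GB disk size are chosen purely for the example.

```python
def usable_capacity_gb(level: str, disks: int, disk_gb: float) -> float:
    """Usable capacity of an array of identical disks.

    Assumes two-way mirroring for RAID 10; controller overhead is ignored.
    """
    if level == "0":  # striping: every disk holds unique data
        if disks < 2:
            raise ValueError("RAID 0 needs at least two disks")
        return disks * disk_gb
    if level == "1":  # mirroring: every disk holds the same data
        if disks < 2:
            raise ValueError("RAID 1 needs at least two disks")
        return disk_gb
    if level == "10":  # striping across mirrored pairs
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least four")
        return (disks // 2) * disk_gb
    raise ValueError(f"unsupported RAID level: {level}")


for level in ("0", "10"):
    print(f"RAID {level}, four 36GB disks: {usable_capacity_gb(level, 4, 36)} GB usable")
```

With four disks, the striped and mirrored array offers half the capacity of plain striping, so the same storage costs twice as many disks.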

Operating System Checklist

For the server to run effectively (that is, be both stable and efficient), the hosting operating system needs to be up to the task. I have discussed operating systems in reference to Apache’s supported platforms, and I mentioned that as a server platform, Unix is generally preferred for Apache installations. Whatever operating system you choose, it should have all the following features to some degree:

Stability: The operating system should be reliable and capable of running indefinitely without recourse to rebooting. Bad memory management is a major cause of long-term unreliability.

Security: The operating system should be resistant to all kinds of attack, including DoS attacks (which tie up system resources and prevent legitimate users from getting service), and have a good track record of security. Security holes that are discovered should be fixed rapidly by the responsible authority. Note that rapidly means days, not weeks.

Performance: The operating system should use resources effectively by handling networking without undue load on the rest of the operating system and performing task-switching efficiently. Apache in particular runs multiple processes to handle incoming connections; inefficient switching causes a performance loss. If you plan to run on a multiprocessor system, Symmetric Multi Processor (SMP) performance is also a key issue to consider.

Maintenance: The operating system should be easy to upgrade or patch for security concerns, shouldn’t require rebooting or being taken offline to perform anything but significant upgrades, and shouldn’t require that the whole system be rebooted to maintain or upgrade just one part of it.

Memory: The operating system should use memory effectively, avoid swapping unless absolutely necessary and then swap intelligently, and have no memory leaks that tie up memory uselessly. (Leaky software is one of the biggest causes of unreliable Web servers. For example, until recently, Windows NT has had a very bad record in this department.) However, leaky applications are also problematic. Fortunately, Apache isn’t one of them, but it used to be less stellar in this regard than it is now.

License: The operating system shouldn’t come with strings attached that may compromise your ability to run a secure server. Some vendors, even large and very well-known ones, have been known to insert new clauses in license agreements that must be agreed to in order to apply critical security patches. Some of the terms and conditions in these licenses grant permission for the vendor to modify or install software at will over the Internet. This is a clear security concern, not to mention a confidentiality issue for systems handling company or client information, so any vendor with a track record of this kind of behavior should be eliminated from consideration, irrespective of how well they score (or claim to score) otherwise.

Third-party modules can be more of a problem when it comes to memory leaks, but Apache supplies the MaxRequestsPerChild directive to forcibly restart Apache processes periodically, preventing unruly modules from misbehaving too badly. If you plan to use large applications such as database servers, you should check their records, too.

Redundancy and Backup

If you’re planning to run a server of any importance, you should give some attention to how you intend to recover the server if, for whatever reason, it dies. For example, you may have a hardware failure, or you might get cracked and have your data compromised. A RAID array is a good first line of defense, but it can be expensive. It also keeps the backup in the server itself, which isn’t much comfort if the server happens to catch fire and explode. (Yes, this actually happens.)

A simple backup solution is to equip the server with a DAT drive or other mass storage device and configure the server to automatically copy the relevant files to tape at regular scheduled times. This is easy to set up even without specialist backup software; on a Unix platform, a simple cron job will do this for you.
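As an illustration of how little is needed, here is a minimal sketch of a script that cron could run. It writes a dated, compressed archive to a directory rather than directly to a tape device; the function name, the file naming scheme, and all paths are assumptions made for the example.

```python
import tarfile
from datetime import date
from pathlib import Path
from typing import Optional


def make_dated_backup(source_dir, dest_dir, day: Optional[date] = None) -> Path:
    """Archive source_dir into dest_dir as site-YYYY-MM-DD.tar.gz."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    stamp = (day or date.today()).isoformat()
    archive = dest / f"site-{stamp}.tar.gz"

    # Store paths relative to the directory name so the archive unpacks cleanly.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(str(source), arcname=source.name)
    return archive


# Example call (hypothetical paths):
#   make_dated_backup("/home/www/htdocs", "/mnt/backup")
#
# Example crontab entry to run a script containing that call at 2:00 a.m. daily:
#   0 2 * * * /usr/bin/python3 /usr/local/bin/backup_site.py
```

Including the date in the file name means each run produces a new archive instead of overwriting the previous one.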

A better solution is to back up across an internal network, if you have one. This would allow data to be copied off the server to a backup server that could stand in when the primary server goes down. It also removes the need for manual intervention because DAT tapes don’t swap by themselves.

If the server is placed on the Internet (or even if it isn’t), you should take precautions against the server being compromised. If this happens, there is only one correct course of action: Replace everything from reliable backups. That includes reinstalling the operating system, reconfiguring it, and reinstalling the site or sites from backups. If you’re copying to a single backup medium every day and don’t spot a problem before the next backup occurs, you have no reliable backup the following day. The moral is to keep multiple, dated backups.
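Keeping multiple dated backups also means deciding when old ones can go. The sketch below shows one simple retention policy: keep the newest few archives and report the rest as candidates for deletion. The site-YYYY-MM-DD.tar.gz naming scheme and the default of seven archives are assumptions for the example, not recommendations.

```python
from pathlib import Path


def backups_to_prune(backup_dir, keep: int = 7) -> list:
    """Return the dated archives that fall outside the newest `keep` backups.

    Relies on ISO dates in the file name, which sort chronologically
    when sorted as plain text.
    """
    archives = sorted(Path(backup_dir).glob("site-*.tar.gz"))
    if keep <= 0:
        return archives
    return archives[:-keep]


# Example (hypothetical path): list what would be removed, then delete it.
#   for old in backups_to_prune("/mnt/backup", keep=7):
#       old.unlink()
```

The longer the retention window, the better the chance that at least one archive predates a compromise that went unnoticed for several days.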

There are several commercial tools for network backups, and your choice may be influenced by the server’s environment—the corporate backup strategy most likely can extend to the server, too. Free options include obvious but crude tools such as FTP or NFS to copy directory trees from one server to another. (Unless you have a commandingly good reason to do so, you should probably not ever have NFS enabled on the server because this could compromise its security.)

A better free tool for making backups is rsync, which is an intelligent version of the standard Unix rcp (remote copy) command that copies only the differences between directory hierarchies. Better still, it can run across an encrypted connection supplied by OpenSSH (secure shell), another free tool. If you need to make remote backups of the server’s files across the Internet, you should seriously consider this approach. (I cover both rsync and ssh in Chapter 10.) On the subject of free tools, another more advanced option worth noting is the Concurrent Versions System (CVS). More often applied to source code, it works well on HTML files, too. (For more information on CVS, see http://www.cvshome.org/.)
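A minimal sketch of driving rsync over ssh from a script follows. The rsync options used (-a, -z, --delete, and -e ssh) are standard ones; the helper names, the host name, and the paths are hypothetical, and the sketch assumes that key-based ssh login to the backup host is already set up.

```python
import subprocess


def rsync_command(source_dir: str, remote: str, dest_dir: str) -> list:
    """Build an rsync command line that mirrors source_dir to remote:dest_dir.

    -a        preserve permissions, times, and symlinks, and recurse
    -z        compress data in transit
    --delete  remove files on the backup that no longer exist on the server
    -e ssh    carry the transfer over an encrypted ssh connection
    """
    # A trailing slash tells rsync to copy the directory's contents,
    # not the directory itself.
    source = source_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", "-e", "ssh", source, f"{remote}:{dest_dir}"]


def run_backup(source_dir: str, remote: str, dest_dir: str) -> None:
    """Run the transfer and raise CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(source_dir, remote, dest_dir), check=True)


# Example (hypothetical host and paths):
#   run_backup("/home/www/htdocs", "backup@backuphost", "/srv/backups/htdocs")
```

Because --delete mirrors removals as well as changes, this form is best paired with dated copies on the backup host instead of being relied on as the only backup.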

A final note about backups across the network: Even if you use a smart backup system that knows how to make incremental backups of the server, a large site can still mean a large quantity of data. If the server is also busy, whenever a backup is performed this data will consume bandwidth that would otherwise be put toward handling browser requests, so it pays to plan backups and schedule them appropriately. If you have a lot of data to copy, consider doing it in stages (on a per-directory basis, for example) and definitely do it incrementally. Having dual network connections, backing up on the internal one, and leaving the external one for HTTP requests is a definite advantage here.
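When planning a schedule, it helps to estimate how long a backup will occupy the link. The following is a rough sketch only: the function name is made up, a megabyte is taken as 1,000,000 bytes, and protocol overhead and compression are ignored.

```python
def transfer_seconds(data_mbytes: float, link_mbit: float, share: float = 1.0) -> float:
    """Time to move data_mbytes over a link when the backup may use only
    the given share (0 to 1) of its bandwidth.

    Assumes 1 MB = 1,000,000 bytes; ignores overhead and compression.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be greater than 0 and at most 1")
    return data_mbytes * 8 / (link_mbit * share)


# 500MB of changed files over a 100Mbit link, with the backup limited
# to a quarter of the bandwidth so that browser requests keep the rest.
print(f"{transfer_seconds(500, 100, share=0.25):.0f} seconds")
```

Running the same estimate for each stage of a per-directory backup gives a quick idea of whether the whole job fits inside a quiet period.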

Specific Hardware Solutions

Many vendors now sell hardware with Apache or an Apache derivative preinstalled, coupled with administrative software to simplify server configuration and maintenance. At this point, all these solutions are Unix-based, predominantly Linux. Several ISPs are also offering some of these solutions as dedicated servers for purchase or hire.

Larger vendors include HP, Dell, Sun, and of course IBM, as well as a diverse list of smaller companies. The list of vendors is growing all the time—the Linux VAR HOWTO at http://en.tldp.org/HOWTO/VAR-HOWTO.html (and other places) has some useful pointers.

Get Someone Else to Do It

As an alternative to setting up a server yourself—with all the attendant issues of reliability, connectivity, and backups this implies—you can buy or hire a dedicated server at an ISP, commonly known as colocation.

The advantages of this are that the ISP handles all the issues involving day-to-day maintenance, but you still get all the flexibility of a server that belongs entirely to you. You can even rebuild and reconfigure Apache as you want it because you have total control of the server. This also means you have total control over wrecking the server, so this doesn’t eliminate the need for a Web server administrator just because the server isn’t physically present.

The disadvantage is that you’re physically removed from the server. If it has a serious problem, you may be unable to access it to find out what the problem is. The ISP will most likely also impose bandwidth restrictions, which you should be aware of. You’re also reliant on the ISP’s service, so checking out their help desk before signing up is recommended.

Note that services vary from one ISP to another—some will back up the server files automatically; others will not. As with most things on the Internet, it pays to check prospective ISPs by looking them up on discussion lists and Usenet newsgroups.

Caveat emptor!

NOTE More introductory material is available at http://httpd.apache.org/docs/misc/FAQ.html#what.

Summary

In this chapter, I covered the basic concepts of what a Web server is and introduced you to Apache. There are many reasons that Apache is a popular Web server, including the important fact that it’s free. The best form of support for Apache is the informative and informal support of the online community that’s very active in developing and maintaining it.

I also discussed how Apache works on Unix and Windows as well as some networking tools such as ifconfig, netstat, snoop, tcpdump, ping, spray, and traceroute. In the latter part of the chapter, I covered the basic server requirements and some specific hardware solutions for your Web server.

In the next chapter, I’ll cover installing Apache and configuring it as a basic Web server.

This article is excerpted from Pro Apache by Peter Wainwright (Apress, 2004; ISBN: 1590593006). Check it out at your favorite bookstore today.



 
 