Home arrow Apache arrow Page 5 - Apache and the Internet

Understanding the HTTP Protocol - Apache

This article introduces those new to networking to Apache, the Hypertext Transfer Protocol (HTTP), and the basics of system administration. It is excerpted from chapter one of Peter Wainwright's book Pro Apache (Apress, 2004; ISBN: 1590593006).

  1. Apache and the Internet
  2. How Apache Works
  3. Configuring Apache
  4. The Hypertext Transfer Protocol
  5. Understanding the HTTP Protocol
  6. The TCP/IP Network Model
  7. Netmasks and Routing
  8. The Future: IPv6
  9. Monitoring a Network
  10. Network Interface
By: Apress Publishing
Rating: starstarstarstarstar / 24
March 09, 2005

print this article



The protocol version is one of the following:

  • HTTP/0.9

  • HTTP/1.0

  • HTTP/1.1

In practice, nothing ever sends HTTP/0.9 because the protocol argument itself was introduced with HTTP/1.0 to distinguish 1.0 requests from 0.9 requests. HTTP/0.9 is assumed if the client doesnít send a protocol, but only GET and POST can work this way because other methods didnít exist before the introduction of HTTP version 1.0.

HTTP Headers

HTTP headers (also known as HTTP header fields) can pass with HTTP messages in either direction between client and server. Any header can be sent if both ends of the connection agree about its meaning, but HTTP defines only a specific subset of headers.

Recognized HTTP headers are divided into three groups:

Request headers are sent by clients to the server to add information or modify the nature of the request. The Accept-Language header, for example, informs the server of the languages the client accepts, which Apache can use for content negotiation.

Response headers are sent by the server to clients in response to requests.

Standard headers generally sent by Apache include the Date and Connection.

Entity headers may be sent in either direction and add descriptive information (also called meta information) about the body of an HTTP message. HTTP requests are permitted to use entity headers only for methods that allow a body, that is, PUT and POST. Requests with bodies are obliged to send a Content-Length header to inform the server how large the body is. Servers may instead send a Transfer-Encoding header but must otherwise send a Content-Length header. In addition to content headers, which also include Content-Language, Content-Encoding, and the familiar Content-Type, two useful entity headers are Expires and Last-Modified. Expires tells browsers and proxies how long a document remains valid; Last-Modified enables a client to determine if a cached document is current. (To illustrate how this is useful, consider a proxy with a cached document for which a request arrives. The proxy first sends the server a HEAD request and looks for a Last-Modified header in the response. If it finds one and itís no newer than the cached document, the proxy doesnít bother to request the document from the server but sends the cached version instead. (See mod_expires in Chapter 4 for more details.)

Online Appendix H gives a full list of recognized HTTP headers.

Networking and TCP/IP

Although a computer can work in isolation, itís generally more useful to connect it to a network. For a Web server to be accessible, it needs to be connected to the outside world.

To network two or more computers together, some kind of communication medium is required. In an office this is usually something such as Ethernet, with a network card installed in each participating computer and connecting cables. Wireless networking cards and hubs are another increasingly common option. Wired or not, however, hardware alone isnít enough. Although itís still possible to get away with sending data as-is on a serial connection, computers sharing a network with many other computers need a more advanced protocol for defining how data is transmitted, delivered, received, and acknowledged.

Transport Communication Protocol/Internet Protocol (TCP/IP) is one of several such protocols for communicating between computers on a network, and itís the protocol predominantly used on the Internet. Others include Token Ring (which doesnít run on Ethernet) and SPX/IPX (which does), both of which are generally used in corporate intranets.


TCP/IP is two protocols, one built on top of the other. As the lower level, IP routes data between sender and recipient by splitting the data into packets and attaching a source and destination address to each packet.

There are now two versions of IP available. The older, and most common, is IPv4 (IP version 4). This is the protocol on which the bulk of the Internet still operates, but itís now beginning to show its age. Its successor is IPv6, which extends the addressing range from 32 to 128 bits, adds support for mobile IP and quality-of-service determination, and provides optional authentication and encryption of network connections. This part of the protocol is large enough in its own right that itís published in a separate specification known as IPSec, and itís the basis of Virtual Private Networks (VPNs).

TCP relies on IP to handle the details of getting data from one point to another. On top of this, TCP provides mechanisms for establishing connections, ensuring that data arrives in the order that it was sent, and handling data loss, errors, and recovery. TCP defines a handshake protocol to detect network errors and defines its own set of envelope information, including a sequence number, which it adds to the packet of data IP sends.

TCP isnít the only protocol that uses IP. Also part of the TCP/IP protocol suite is User Datagram Protocol (UDP ). Unlike TCP, which is a reliable and connection-oriented protocol, UDP is a connectionless and nonguaranteed protocol used for noncritical transmissions and broadcasts, generally for messages that can fit into one packet. Because it doesnít check for successful transmission or correct sequencing, UDP is useful in situations where TCP would be too unwieldy, such as Internet broadcasts and multiplayer games. Itís also the basis of peer-to-peer networks such as Gnutella (http://www.gnutella.com/), which implement their own specialized error detection and retransmission protocols.

TCP/IP also includes Internet Control Message Protocol (ICMP), which is used by TCP/IP software to communicate messages concerning the protocol itself, such as a failure to connect to a given host. ICMP is intended for use by the low-level TCP/IP protocol and is rarely intended for user-level applications.

NOTE The TCP, UDP, ICMP, and IP protocols are defined in the following RFCs: UDP: 768, IP: 791, ICMP: 792, TCP: 793. See Online Appendix A for a complete list of useful RFCs and other documents.

Packets and Encapsulation

I mentioned earlier that IP and TCP both work by adding information to packets of data that are then transmitted between hosts. To really understand TCP/IP, itís helpful to know a little more about IP and TCP.

When an application sends a block of dataóa file or a page of HTMLóTCP splits the data into packets for transmission. The Ethernet standard defines a maximum packet size of 1500 bytes. (On older networks, the hardware might limit packets to a size of 576 bytes.) When establishing a connection, TCP/IP determines how large a packet is allowed to be. Even if the local network can handle 1500 bytes, the destination or an intermediate network might not. Unless an intermediate step can perform packet splitting, the whole communication will have to drop down to the lowest packet size.

Once TCP knows what the packet size is, it encapsulates each block of data destined for a packet with a TCP header that contains a sequence number, source and destination ports, and a checksum for detecting errors. This header is like the address on an envelope, with the data packet as the enclosed letter.

IP then adds its own header to the TCP packet, in which it records the source and destination IP addresses so intermediate stages know how to route the packet. It also adds a protocol type to identify the packet as TCP, UDP, ICMP, or some other protocol, and another checksum. If youíre using IPv6, the packet can be signed to authenticate the sender and encrypted for transmission.

Furthermore, if the packet is to be sent over an Ethernet network, Ethernet adds yet another header containing the source and destination Ethernet addresses for the current link in the chain, a type code, and another checksum. The reason for this is that while IP records the IP addresses of the sending and receiving hosts in the header, Ethernet uses the Ethernet addresses of the network interfaces for each stage of the packetís trip. Each protocol works at a closer range than the one it encapsulates, describing shorter and shorter hops in the journey from source to destination.

Both IP and TCP add 20 bytes of information to a data packet, all of which has to fit inside the 1500-byte limit imposed by Ethernet. So the maximum size of data that can fit into a TCP/IP packet is actually 1460 bytes. Of course, if IP is running a serial connection instead an Ethernet, it isnít necessarily limited to 1500 bytes for a packet. Other protocols may impose their own limitations.

ACKs, NAKs, and Other Messages

The bulk of TCP transmissions are made up of data packets, as I just described. However, IP makes no attempt to ensure that the packet reaches its destination, so TCP requires that the destination send an Acknowledged message (ACK) to tell the sending host that the message arrived. ACKs are therefore nearly as common as data messages, and in an ideal network, exactly as many ACKs occur as data messages. If something is wrong with the packet, TCP requires the destination to send a Not Acknowledged message (NAK) instead.

In addition to data, ACKs, and NAKs, TCP also defines synchronization (SYN), for establishing connections, and FIN, for ending them. The client requests a connection by sending a SYN message to a server, which establishes or denies the connection by sending an ACK or NAK, respectively. When either end of the connection wants to end it, it sends a FIN message to indicate it no longer wants to communicate. Figure 1-2 illustrates this process.

Figure 1-2. TCP communication messages

There are, therefore, three eventualities the sending host can expect:

  • The destination host receives a packet, and if the packet is the one it expected or a new connection, it sends an ACK.

  • The packetís checksum doesnít match or the sequence number of the packet is wrong, so the destination sends a NAK to inform the host that it needs to send the packet again.

  • The destination doesnít send anything at all. In this case, TCP eventually decides that the packet or the response got lost and sends it again.

Several kinds of Denial of Service (DoS) attacks exploit aspects of TCP/IP to attempt to tie up servers unnecessarily. One such attack is the SYN flood, when many SYN packets are sent to a server, but the acceptance of the requested connections is never acknowledged by the client. Clearly, a little understanding of TCP packets can be of more than just academic interest. Actually doing something about such attacks is one of the topics of Chapter 10.

This article is excerpted from Pro Apache by Peter Wainwright (Apress, 2004; ISBN  1590593006). Check it out at your favorite bookstore today. Buy this book now.

>>> More Apache Articles          >>> More By Apress Publishing

blog comments powered by Disqus
escort Bursa Bursa escort Antalya eskort


- Apache Unveils Cassandra 1.2
- Apache on ARM Chips? Dell and Calxeda Help M...
- The Down Side of Open Source Software
- VMware Unveils Serengeti for Apache Hadoop
- SAP Takes Steps to Improve Hadoop Integration
- Looking to Hone Apache Hadoop Skills?
- How to Install Joomla on WAMPP
- Working with XAMPP and Wordpress
- GUI Available for Apache Camel
- Reduce Server Load for Apache and PHP Websit...
- Creating a VAMP (Vista, Apache, MySQL, PHP) ...
- Putting Apache in Jail
- Containing Intrusions in Apache
- Server Limits for Apache Security
- Setting Permissions in Apache

Developer Shed Affiliates


Dev Shed Tutorial Topics: