This article discusses the various types of networks, the methods for connecting networks, how network data is moved from network to network, and the protocols used on today's popular networks. It is excerpted from chapter one of the book The Definitive Guide to Linux Network Programming, written by Keir Davis et al. (Apress, 2004; ISBN: 1590593227).
So far, you’ve seen that on a particular network, every device must have a unique address, and you can connect many networks together to form a larger network using gateways. Before a node on your network “talks,” it checks to see if anyone else is “talking,” and if not, it goes ahead with its communication. Your networks are interconnected, though! What happens if the nodes on your office LAN have to wait for some node on a LAN in Antarctica to finish talking before they can talk? Nothing would ever be sent—the result would be gridlock! How do you handle the need to identify a node with a unique address on interconnected networks while at the same time isolating your own network from every other network? Unless one of your nodes has a communication for another node on another network, there should be no communication between networks and no need for one to know of the existence of the other until the need to communicate exists. You handle the need for a unique address by assigning protocol addresses to physical addresses in conjunction with your gateways. In our scenario, these protocol addresses are known as Internet Protocol (IP) addresses.
IP addresses are virtual. That is, there is no required correlation between a particular IP address and its physical interface. An IP address can be moved from one node to another at will, without requiring anything but a software configuration change, whereas changing a node’s physical address requires changing the network hardware. Thus, any node on an internet has both a physical Ethernet address (MAC address) and an IP address.
Unlike an Ethernet address, an IP address is 32 bits long and consists of both a network identifier and a host identifier. The network identifier bits of the IP addresses for all nodes on a given network are the same. The common format for listing IP addresses is known as dotted quad notation because it divides the IP address into four parts (see Table 1-1). The network bits of an IP address are the leading octets, and the address space is divided into three classes: Class A, Class B, and Class C. Class A addresses use just 8 bits for the network portion, while Class B addresses use 16 bits and Class C addresses use 24 bits.
Table 1-1. Internet Protocol Address Classes
Class A: Networks 1.0.0.0 through 127.0.0.0
Class B: Networks 128.0.0.0 through 191.255.0.0
Class C: Networks 192.0.0.0 through 223.255.255.0
Reserved for future use
Let’s look at an example. Consider the IP address 192.168.2.1. From Table 1-1, you can tell that this is a Class C address. Since it is a Class C address, you know that the network identifier is 24 bits long and the host identifier is 8 bits long. This translates to “the node with address 1 on the network with address 192.168.2.” Adding a host to the same Class C network would require a second address with the same network identifier but a different host identifier in the final octet, since every host on a given network must have a unique address.
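The class lookup and the split into network and host identifiers can be sketched in a few lines of Python. This is only an illustration, not code from the book: the function names are hypothetical, the sample address 192.168.2.1 is taken as the Class C example, the class boundaries are the conventional first-octet ranges (below 128, below 192, below 224), and everything above Class C is lumped together as "reserved" for simplicity.

```python
import ipaddress

# Network-identifier length in bits for each class, as given in the text.
NETWORK_BITS = {"A": 8, "B": 16, "C": 24}


def address_class(address):
    """Return "A", "B", "C", or "reserved" for a dotted-quad IPv4 address."""
    first_octet = ipaddress.IPv4Address(address).packed[0]
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    return "reserved"  # simplification: anything above the Class C range


def split_address(address):
    """Split an address into (network identifier, host identifier) strings."""
    cls = address_class(address)
    if cls not in NETWORK_BITS:
        raise ValueError(f"{address} is not a Class A, B, or C address")
    octets = str(ipaddress.IPv4Address(address)).split(".")
    network_octets = NETWORK_BITS[cls] // 8  # 1, 2, or 3 leading octets
    return ".".join(octets[:network_octets]), ".".join(octets[network_octets:])


if __name__ == "__main__":
    print(address_class("192.168.2.1"))   # C
    print(split_address("192.168.2.1"))   # ('192.168.2', '1')
```

Because the classful boundaries fall on whole octets, the split can be done on the dotted quad text itself rather than with bit masks.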
You may have noticed that the table doesn’t include every possible value. This is because the octets 0 and 255 are reserved for special use. The octet 0 (all 0s) is the address of the network itself, while the octet 255 (all 1s) is called the broadcast address because it refers to all hosts on a network simultaneously. Thus, in our Class C example, the network address would be 192.168.2.0, and the broadcast address would be 192.168.2.255. Because every address range needs both a network address and a broadcast address, the number of usable addresses in a given range is always 2 less than the total. For example, you would expect that on a Class C network you could have 256 unique hosts, but you cannot have more than 254, since one address is needed for the network and another for the broadcast.
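To check the arithmetic, here is a minimal Python sketch using the standard ipaddress module. The function name is hypothetical, and the "/24" suffix is prefix-length notation (24 network bits), which the text above does not use; it is assumed here only as a convenient way to describe a Class C sized network.

```python
import ipaddress


def describe_network(network):
    """Report the network address, broadcast address, and usable host count."""
    net = ipaddress.ip_network(network)
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        # One address is the network itself and one is the broadcast address.
        "usable_hosts": net.num_addresses - 2,
    }


if __name__ == "__main__":
    print(describe_network("192.168.2.0/24"))
```

For 192.168.2.0/24 this reports 192.168.2.0 as the network address, 192.168.2.255 as the broadcast address, and 254 usable host addresses.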
In addition to the reserved network and broadcast addresses, a portion of each public address range has been set aside for private use. These address ranges can be used on internal networks without fear of conflicts. This helps alleviate the problem of address conflicts and shortages when public networks are connected together. The address ranges reserved for private use are shown in Table 1-2.
Table 1-2. Internet Address Ranges Reserved for Private Use
Class A: 10.0.0.0 through 10.255.255.255
Class B: 172.16.0.0 through 172.31.255.255
Class C: 192.168.0.0 through 192.168.255.255
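A program can test whether an address falls inside one of these private ranges. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the three ranges are written in prefix-length notation (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), which is how these blocks are commonly expressed rather than how the table lists them.

```python
import ipaddress

# The three private blocks, written as network/prefix-length pairs.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def in_private_range(address):
    """Return True if the address lies in one of the three private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_NETWORKS)


if __name__ == "__main__":
    for candidate in ("10.1.2.3", "172.16.0.1", "192.168.2.1", "8.8.8.8"):
        print(candidate, in_private_range(candidate))
```

The explicit list is used instead of the module's own is_private attribute because that attribute also covers other special-purpose ranges, such as loopback addresses.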
If you know your particular network will not be connected publicly, you are allowed to use any of the addresses in the private, reserved ranges as you wish. If you do this, however, you must use software address translation to connect your private network to a public network. For example, if your office LAN uses one of the private ranges as its network, your company’s web server or mail server cannot use one of those addresses, since they are private. To connect your private network to a public network such as the Internet, you would need a public address for your web server or mail server. The private addresses can be “hidden” behind a single public address using a technique called Network Address Translation (NAT), where an entire range of addresses is translated into a single public address by the private network’s gateway. When packets are received by the gateway on its public interface, the destination address of each packet is converted back to the private address. The public address used in this scenario could be one assigned dynamically by your service provider, or it could be from a range of addresses delegated to your network, also by your service provider. When a network address range is delegated, it means that your gateway takes responsibility for routing that address range and receiving packets addressed to the network.
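The translation step can be pictured as a lookup table kept by the gateway. The toy model below is not how any particular gateway is implemented: the class and method names are hypothetical, 203.0.113.5 is a documentation-only address standing in for the public address, and keying the table on port numbers is an assumption about how the gateway tells private hosts apart, since the text does not go into that detail.

```python
class ToyNat:
    """Toy translator: many private endpoints share one public address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (private_ip, private_port) -> public_port
        self.inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip, private_port):
        """Rewrite a private source endpoint to the shared public address."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_inbound(self, public_port):
        """Map a public port back to the private endpoint, or None if unknown."""
        return self.inbound.get(public_port)


if __name__ == "__main__":
    nat = ToyNat("203.0.113.5")
    print(nat.translate_outbound("192.168.2.10", 5000))
    print(nat.translate_outbound("192.168.2.11", 5000))
    print(nat.translate_inbound(40001))
```

Two private hosts using the same local port end up with different public ports, which is what lets the gateway convert reply packets back to the right private address.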
Another IP address is considered special. This IP address is known as the loopback address, and it’s typically denoted as 127.0.0.1. The loopback address is used to specify the local machine, also known as localhost. For example, if you were to open a connection to the address 127.0.0.1, you would be opening a network connection to yourself. Thus, when using the loopback address, the sender is the receiver and vice versa. In fact, the entire 127.0.0.0 network is considered a reserved network for loopback use, though anything other than 127.0.0.1 is rarely used.
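You can see the loopback address at work with a short Python sketch in which one process connects to itself. The function name is hypothetical, and the example leans on two assumptions: binding to port 0 asks the operating system for any free port, and the message is small enough to arrive in a single read.

```python
import socket


def loopback_round_trip(message):
    """Send bytes to ourselves over 127.0.0.1 and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        # The connection completes in the listen queue before accept() runs.
        with socket.create_connection(server.getsockname()) as client:
            conn, peer = server.accept()
            with conn:
                client.sendall(message)
                received = conn.recv(len(message))  # fine for short messages
    return received, peer[0]


if __name__ == "__main__":
    data, sender = loopback_round_trip(b"hello")
    print(data, sender)
```

The sender address reported by the receiving side is 127.0.0.1 as well, which matches the description above: the sender is the receiver.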
The final component of IP addressing is the port. Ports are virtual destination “points” and allow a node to conduct multiple network communications simultaneously. They also provide a standard way to designate the point where a node can send or receive information. Conceptually, think of ports as “doors” where information can come and go from a network node.
On Linux systems, the number of ports is limited to 65,535, and many of the lower port numbers are reserved, such as port 80 for web servers, port 25 for sending mail, and port 23 for telnet servers. Ports are designated with a colon when describing an IP address and port pair. For example, the address 10.0.0.2:80 can be read as “port 80 on the address 10.0.0.2,” which would also mean “the web server on 10.0.0.2” since port 80 is typically used by and reserved for web services. Which port is used is up to the discretion of the developer, provided the ports are not already in use or reserved. A list of reserved ports and the names of the services that use them can be found on your Linux system in the /etc/services file, or at the Internet Assigned Numbers Authority (IANA) site listed here: http://www.iana.org/assignments/port-numbers. Table 1-3 contains a list of commonly used (and reserved) ports.
Table 1-3. Commonly Used Ports
Port  Service
21    File Transfer Protocol (FTP)
22    Secure Shell (SSH)
25    Simple Mail Transfer Protocol (SMTP)
53    Domain Name System (DNS)
80    Hypertext Transfer Protocol (HTTP)
110   Post Office Protocol 3 (POP3)
143   Internet Message Access Protocol (IMAP)
443   Hypertext Transfer Protocol Secure (HTTPS)
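The address and port pair notation is easy to take apart in code. The following Python sketch uses hypothetical function names; the service lookup relies on the standard socket.getservbyport call, which consults the system's services database, so it returns None on a system where that database is missing or has no entry for the port.

```python
import socket


def parse_endpoint(endpoint):
    """Split "address:port" into (address, port), validating the port range."""
    address, _, port_text = endpoint.rpartition(":")
    port = int(port_text)  # raises ValueError if the port is not a number
    if not address or not 0 <= port <= 65535:
        raise ValueError(f"invalid endpoint: {endpoint!r}")
    return address, port


def service_name(port):
    """Return the TCP service name registered for a port, or None if unknown."""
    try:
        return socket.getservbyport(port, "tcp")
    except OSError:
        return None


if __name__ == "__main__":
    address, port = parse_endpoint("10.0.0.2:80")
    print(address, port, service_name(port))
```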
Without ports, a network host would be allowed to provide only one network service at a time. By allowing the use of ports, a host can conceivably provide more than 65,000 services at any time using a given IP address, assuming each service is offered on a different port. We cover using ports in practice when writing code first in Chapter 2 and then extensively in later chapters.
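As a small preview, the Python sketch below opens several listening sockets on a single address, each on its own port. It is an illustration only: the function name is hypothetical, the loopback address stands in for a real host address, and the ports are whatever free ports the operating system hands out.

```python
import socket


def open_services(count, host="127.0.0.1"):
    """Open `count` listening sockets on one address, each on its own port."""
    services = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))  # port 0 asks the OS for any free port
        sock.listen(1)
        services.append(sock)
    return services


if __name__ == "__main__":
    services = open_services(3)
    for sock in services:
        print("listening on %s:%d" % sock.getsockname())
    for sock in services:
        sock.close()
```

Each socket reports the same address but a different port, and an attempt to bind a second socket to a port that is already listening fails with an "address already in use" error.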
This version of IP addressing is known as version 4, or IPv4. Because the number of available public addresses has been diminishing with the explosive growth of the Internet, a newer addressing scheme has been developed and is slowly being implemented. The new scheme is known as version 6, or IPv6. IPv6 addresses are 128 bits long instead of the traditional 32 bits, giving 2^128 possible addresses, which is 2^96 times as many as IPv4 provides. For more on IPv6, consult Appendix A.
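The sizes involved are easy to verify, since Python integers have arbitrary precision. The constant and variable names below are chosen only for this illustration.

```python
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_addresses = 2 ** IPV4_BITS   # 4,294,967,296 possible addresses
ipv6_addresses = 2 ** IPV6_BITS

# The factor 2^96 is the ratio between the two address spaces.
ratio = ipv6_addresses // ipv4_addresses

if __name__ == "__main__":
    print(f"IPv4 addresses: {ipv4_addresses:,}")
    print(f"IPv6 addresses: {ipv6_addresses:,}")
    print(f"ratio is 2**96: {ratio == 2 ** 96}")
```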
Network Byte Order
One final note on IP addressing. Because each hardware manufacturer can develop its own hardware architecture, it becomes necessary to define a standard representation for data. For example, some platforms store integers in what is known as Little Endian format, which means the lowest memory address contains the lowest order byte of the integer (remember that addresses are 32-bit integers). Other platforms store integers in what is known as Big Endian format, where the lowest memory address holds the highest order byte of the integer. Still other platforms can store integers in any number of ways. Without standardization, it becomes impossible to copy bytes from one machine to another directly, since doing so might change the value of the number.
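The difference between the two layouts is easiest to see by printing the bytes. This Python sketch uses the standard struct module; the function name is hypothetical, and the sample value is simply the address 192.168.2.1 written as a 32-bit integer.

```python
import struct
import sys


def byte_layouts(value):
    """Return the little-endian and big-endian byte layouts of a 32-bit integer."""
    return {
        "little": struct.pack("<I", value),  # lowest order byte first
        "big": struct.pack(">I", value),     # highest order byte first
    }


if __name__ == "__main__":
    value = 0xC0A80201  # 192.168.2.1 as a 32-bit integer
    layouts = byte_layouts(value)
    print("this machine is", sys.byteorder, "endian")
    print("little:", layouts["little"].hex())  # 0102a8c0
    print("big:   ", layouts["big"].hex())     # c0a80201
    # Reading big-endian bytes as if they were little-endian changes the value.
    print(hex(struct.unpack("<I", layouts["big"])[0]))  # 0x102a8c0
```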
In an internet, packets can carry numbers such as the source address, destination address, and packet length. If those numbers were to be corrupted, network communications would fail. The Internet protocols solve this byte-order problem by defining a standard way of representing integers called network byte order that must be used by all nodes on the network when describing binary fields within packets. Each host platform makes the conversion from its local byte representation to the standard network byte order before sending a packet. On receipt of a packet, the conversion is reversed. Since the data payload within a packet often contains more than just numbers, it is not converted.
The standard network byte order specifies that the most significant byte of an integer is sent first (Big Endian). From a developer’s perspective, each platform defines a set of conversion functions that can be used by an application to handle the conversion transparently, so it is not necessary to understand the intricacies of integer storage on each platform. These conversion functions, as well as many other standard network programming functions, are covered in Chapter 2.
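As a preview in Python rather than the language used later in the book, the sketch below packs an address and port into network byte order and unpacks them again. The helper names to_wire and from_wire are hypothetical; socket.inet_aton, socket.inet_ntoa, and the "!" (network order) format prefix of the struct module are standard library features, as are socket.htonl and socket.htons for converting individual integers.

```python
import socket
import struct


def to_wire(address, port):
    """Pack an IPv4 address and port as 6 bytes in network byte order."""
    return socket.inet_aton(address) + struct.pack("!H", port)


def from_wire(data):
    """Unpack 6 bytes in network byte order into (address, port)."""
    address = socket.inet_ntoa(data[:4])
    (port,) = struct.unpack("!H", data[4:6])
    return address, port


if __name__ == "__main__":
    wire = to_wire("10.0.0.2", 80)
    print(wire.hex())        # 0a0000020050
    print(from_wire(wire))   # ('10.0.0.2', 80)
    # htons/ntohs convert a 16-bit value between host and network order.
    print(socket.ntohs(socket.htons(80)))
```

The packed bytes are the same on every platform, which is the point of agreeing on a network byte order: only the conversion step differs between Little Endian and Big Endian hosts.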