
Dealing with Large Amounts of Data

Servers typically need to be able to handle multiple clients simultaneously. This presents several problems that need to be solved. This article addresses three of those issues: allowing multiple clients to connect and stay connected, efficient use of resources, and keeping the server responsive to each of the clients. It is excerpted from chapter five of the book The Definitive Guide to Linux Network Programming, written by Keir Davis et al. (Apress, 2004; ISBN: 1590593227).

TABLE OF CONTENTS:
  1. Design and Architecture
  2. Multiplexing
  3. Forking
  4. Preforking: Process Pools
  5. Multithreading
  6. Combining Preforking and Prethreading
  7. Dealing with Large Amounts of Data
By: Apress Publishing
November 03, 2005


In practice, the data transmitted between the client and the server is much larger than in the examples earlier in this chapter, and such large transfers raise a few issues. First, large amounts of data will be broken up by the underlying transport layer. An IP packet can carry at most 65,535 bytes, and even smaller amounts of data may be broken up depending on buffer availability. This means that when you call recv(), for example, it may not return all of the data the first time, and subsequent calls may be required. Second, while you are sending or receiving large amounts of data, your user interface still needs to remain responsive. In this section, we will address these issues.

Nonblocking Sockets

The first step to carry out large-sized data transfers is to create nonblocking sockets. By default, whenever we create a socket, it will be a blocking socket. This means that if we call recv() and no data is available, our program will be put to sleep until some data arrives. Calling send() will put our program to sleep if there is not enough outgoing buffer space available to hold all of the data we want to send. Both conditions will cause our application to stop responding to a user.

Table 5-1. Method Pros and Cons

| Method | Multiplexing | Forking | Threading | Preforking | Prethreading | Preforking plus prethreading |
|---|---|---|---|---|---|---|
| Code Complexity | Can be very complex and difficult to follow | Simple | Simple | Can be complex if using a dynamic pool and shared memory | Only complex if using a dynamic pool | Complex |
| Shared Memory | Yes | Only through shmget() | Yes | Only through shmget() | Yes | Yes, but only within each process |
| Number of Connections | Small | Large | Large, but not as large as forking | Depends on the size of the process pool | Depends on the size of the thread pool | Depends on pool sizes |
| Frequency of New Connections | Can handle new connections quickly | Time is required for a new process to start | Time is required for a new thread to start | Can handle new connections quickly if the process pool is large enough | Can handle new connections quickly if the thread pool is large enough | Can handle new connections quickly if pools are large enough |
| Length of Connections | Good for long or short connections | Better for longer connections; reduces start penalty | Better for longer connections; reduces thread start penalty | Good for long or short connections | Good for long or short connections | Good for long or short connections |
| Stability | One client can crash the server | One client can crash the server | One client can crash the server | One client cannot crash the server | One client can crash the server | One client will crash only its parent process, not the whole server |
| Context Switching | N/A | Not as fast as threads | Fast | Not as fast as threads | Fast | Fast |
| Resource Use | Low | High | Medium | High | Medium | Similar to threading, but depends on how much preforking is done |
| SMP Aware | No | Yes | Yes | Yes | Yes | Yes |

Creating a nonblocking socket involves two steps. First, we create the socket as we usually would, using the socket() function. Then, we use the following call to ioctl():

  unsigned long nonblock = 1;
  ioctl(sock, FIONBIO, &nonblock);

With a nonblocking socket, when we call recv() and no data is available, it will return immediately with EWOULDBLOCK (reported through errno). If data is available, it will read what it can and then return, telling us how much data was read. Likewise with send(): if there is no room in the outgoing buffer, it will return immediately with EWOULDBLOCK. Otherwise, it will send as much of our outgoing data as it can before returning the number of bytes sent. Keep in mind that this may be less than the total number of bytes we asked it to send, so we may need to call send() again.
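
To see this behavior in isolation, here is a small sketch of our own (not from the book) that uses a local socketpair() in place of a network connection and checks that recv() on an empty nonblocking socket fails immediately with EWOULDBLOCK or EAGAIN rather than sleeping:

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if recv() on an empty nonblocking socket reports
   EWOULDBLOCK/EAGAIN instead of putting the process to sleep. */
int demo_nonblocking_recv(void)
{
    int pair[2];
    char buf[16];
    int nonblock = 1;   /* FIONBIO takes a nonzero int to enable */
    int ok = 0;

    /* A connected pair of local sockets stands in for a client/server link. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) < 0)
        return 0;

    ioctl(pair[0], FIONBIO, &nonblock);

    /* Nothing has been written yet, so a blocking socket would sleep here;
       the nonblocking socket fails immediately instead. */
    if (recv(pair[0], buf, sizeof(buf), 0) < 0 &&
        (errno == EWOULDBLOCK || errno == EAGAIN))
        ok = 1;

    close(pair[0]);
    close(pair[1]);
    return ok;
}
```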

The select() call from the section on multiplexing is the second step in carrying out large-sized data transfers. As mentioned earlier, we use select() to tell us when a socket is ready for reading or writing. In addition, we can specify a timeout, so that if a socket is not ready for reading or writing within the specified time period, select() will return control to our program. This allows us to be responsive to a user's commands while still polling the socket for activity.

Putting It All Together

Combining nonblocking sockets with select() will allow us to send and receive large amounts of data while keeping our application responsive to the user, but we still need to deal with the data itself. Sending large amounts of data is relatively easy because we know how much we need to send. Receiving data, on the other hand, can be a little harder unless we know how much data to expect. Because of this, we will need to build into our communications protocol either a method to tell the receiving program how much data to expect or a fixed data segment size.

Communicating the expected size of the data to the receiver is fairly simple. In fact, this strategy is used by HTTP, for example. The sender calculates the size of the data to send and then transmits that size to the receiver. The receiver then knows exactly how much data to receive.
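
A minimal sketch of this size-first strategy might look like the following. The framing helpers and the 4-byte, network-byte-order length prefix are our own illustration, not a protocol defined by the book:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Writes a 4-byte, network-byte-order length prefix followed by the
   payload into 'out', which must hold at least len + 4 bytes.
   Returns the total number of bytes produced. */
size_t frame_message(const char *data, uint32_t len, char *out)
{
    uint32_t wire_len = htonl(len);   /* sender states the payload size */
    memcpy(out, &wire_len, 4);
    memcpy(out + 4, data, len);
    return 4 + (size_t)len;
}

/* Reads the prefix back, so the receiver knows exactly how much
   payload to expect after these first 4 bytes. */
uint32_t frame_payload_len(const char *in)
{
    uint32_t wire_len;
    memcpy(&wire_len, in, 4);
    return ntohl(wire_len);
}
```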

Another option is to use a fixed-sized segment. In this way, we will always send the same amount of data in each segment sent to the receiver. Because of this, the sender may need to break up data into multiple segments or fill undersized segments. Therefore, care must be taken in determining the segment size. If our segment size is too large, then it will be broken up in the transport and will be inefficient. If it is too small, then we will incur a lot of underlying packet overhead by sending undersized packets. The extra work on the sending side pays off on the receiving side, however, because the receiving is greatly simplified.

Since the receiver is always receiving the same amount of data in each segment, buffer overruns are easily preventable. Using fixed sizes can be a little more complex on the sending side, but simpler on the receiving side.
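
The sender's bookkeeping for fixed-size segments can be sketched as follows (the helper names are ours; the segment size would be chosen with the trade-offs above in mind):

```c
#include <string.h>

/* Number of fixed 'segsize'-byte segments needed for 'len' bytes. */
size_t segment_count(size_t len, size_t segsize)
{
    return (len + segsize - 1) / segsize;   /* ceiling division */
}

/* Copies 'len' bytes into 'out' as full segments, zero-filling the
   final undersized segment. 'out' must hold segment_count(len, segsize)
   * segsize bytes. Returns the number of segments produced. */
size_t fill_segments(const char *data, size_t len, size_t segsize, char *out)
{
    size_t n = segment_count(len, segsize);
    memset(out, 0, n * segsize);   /* pad the last segment with zeros */
    memcpy(out, data, len);
    return n;
}
```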

Finally, here is some code that demonstrates sending data using nonblocking sockets and select(). The strategy for receiving data is very similar.

  int mysend(int sock, const char *buffer, long buffsize) {

NOTE: This code does not deal with the big-endian/little-endian issue. It sends a buffer of bytes in the order provided. If you will be dealing with clients and servers that use differing byte orders, then you will need to take care in how the data is formatted before sending with this function.

First, we declare some variables that we’ll need.

  fd_set fset;
  struct timeval tv;
  int sockStatus;
  int bytesSent;
  char *pos;
  char *end;
  unsigned long blockMode;

Then, we set the socket to nonblocking. This is necessary for our send but can be removed if we are already using nonblocking sockets.

  /* set socket to non-blocking */
  blockMode = 1;
  ioctl(sock, FIONBIO, &blockMode);

Now we set up a variable to keep our place in the outgoing buffer and a variable to point to the end of the buffer.

  pos = (char *) buffer;
  end = (char *) buffer + buffsize;

Next, we loop until we get to the end of the outgoing buffer.

  while (pos < end) {

We send some data. If send() returns a negative number, then an error has occurred. Note that 0 is a valid return value. Also, we want to ignore an EAGAIN error, which errno reports when the outgoing buffer is full. Our call to select() will tell us when there is room in the buffer again.

  bytesSent = send(sock, pos, end - pos, 0);
  if (bytesSent < 0) {
    /* errno (from <errno.h>) distinguishes a full buffer from a real error */
    if (errno == EAGAIN) {
      bytesSent = 0;
    } else {
      return 0;
    }
  }

We update our position in the outgoing buffer.

  pos += bytesSent;

If we are already at the end of the buffer, then we want to break out of the while loop. There is no need to wait in the select() because we are already done.

  if (pos >= end) {
    break;
  }

Next, we get our watch list ready for select(). We also specify a timeout of 5 seconds. In this example, we treat a timeout as a failure, but you could do some processing and continue trying to send. It is important to use select() here because if the outgoing buffer is full, we would otherwise end up in a tight busy-wait loop that consumes far too many CPU cycles. Instead, we allow our process to sleep until buffer space is available or too much time has elapsed without space becoming available.

  FD_ZERO(&fset);
  FD_SET(sock, &fset);
  tv.tv_sec = 5;
  tv.tv_usec = 0;
  sockStatus = select(sock + 1, NULL, &fset, &fset, &tv);
  if (sockStatus <= 0) {
    return 0;
  }
  }   /* end of while loop */

  return 1;
  }   /* end of mysend() */

Summary

In this chapter, we looked at the different ways to handle multiple, simultaneous clients. First, we examined how to handle multiple clients in a single server process by using multiplexing. Then, we moved on to multiprocessing servers and the single process per client versus a process pool. Next, we introduced multithreaded servers. Much like multiprocess servers, multithreaded servers can be either a one-thread-per-client or a thread-pooled architecture. Afterward, we looked at an interesting approach used by the Apache Web Server version 2, in which multiprocessing is combined with multiple threads. We closed the chapter by covering how to handle sending and receiving large amounts of data by using nonblocking sockets and the select() system call.

In the next chapter, we’ll examine what’s involved in implementing a custom protocol.



 
 