Managing Standalone Scripts in PHP

Last week, we began our discussion of PHP standalone scripts. This week, we’ll be talking about child processes, shared resources, signals, and writing daemons. The second of three parts, this article is excerpted from chapter five of the book Advanced PHP Programming, written by George Schlossnagle (Sams; ISBN: 0672325616).

Creating and Managing Child Processes

PHP has no native support for threads, which makes it difficult for developers coming from thread-oriented languages such as Java to write programs that must accomplish multiple tasks simultaneously. All is not lost, though: PHP supports traditional Unix multitasking by allowing a process to spawn child processes via pcntl_fork() (a wrapper around the Unix system call fork()). To enable this function (and all the pcntl_* functions), you must build PHP with the --enable-pcntl flag.

When you call pcntl_fork() in a script, a new process is created, and it continues executing the script from the point of the pcntl_fork() call. The original process also continues execution from that point forward. This means that you then have two copies of the script running—the parent (the original process) and the child (the newly created process).

pcntl_fork() actually returns twice—once in the parent and once in the child. In the parent, the return value is the process ID (PID) of the newly created child, and in the child, the return value is 0. This is how you distinguish the parent from the child.

The following simple script creates a child process:

#!/usr/bin/env php
<?php

if($pid = pcntl_fork()) {
 $my_pid = getmypid();
 print "My pid is $my_pid. pcntl_fork() return $pid,
this is the parentn"; } else { $my_pid = getmypid(); print "My pid is $my_pid. pcntl_fork() returned 0,
this is the childn"; } ?>

Running this script outputs the following:

> ./4.php
My pid is 4286. pcntl_fork() return 4287, this is
the parent My pid is 4287. pcntl_fork() returned 0, this is
the child

Note that the return value of pcntl_fork() does indeed match the PID of the child process. Also, if you run this script multiple times, you will see that sometimes the parent prints first and other times the child prints first. Because they are separate processes, they are both scheduled on the processor in the order in which the operating system sees fit, not based on the parent–child relationship.

{mospagebreak title=Closing Shared Resources}

When you fork a process in the Unix environment, the parent and child processes both have access to any file resources that are open at the time fork() was called. As convenient as this might sound for sharing resources between processes, in general it is not what you want. Because there are no flow-control mechanisms preventing simultaneous access to these resources, resulting I/O will often be interleaved. For file I/O, this will usually result in lines being jumbled together. For complex socket I/O such as with database connections, it will often simply crash the process completely.

Because this corruption happens only when the resources are accessed, simply being strict about when and where they are accessed is sufficient to protect yourself; however, it is much safer and cleaner to simply close any resources you will not be using immediately after a fork.

Sharing Variables

Remember: Forked processes are not threads. The processes created with pcntl_fork() are individual processes, and changes to variables in one process after the fork are not reflected in the others. If you need to have variables shared between processes, you can either use the shared memory extensions to hold variables or use the “tie” trick from Chapter 2, “Object-Oriented Programming Through Design Patterns.”

Cleaning Up After Children

In the Unix environment, a defunct process is one that has exited but whose status has not been collected by its parent process (this is also called reaping the child process). A responsible parent process always reaps its children.

PHP provides two ways of handing child exits:

  • pcntl_wait($status, $options)pcntl_wait() instructs the calling process to suspend execution until any of its children terminates. The PID of the exiting child process is returned, and $status is set to the return status of the function.

  • pcntl_waitpid($pid, $status, $options)pcntl_waitpid() is similar to pcntl_wait(), but it only waits on a particular process specified by $pid. $status contains the same information as it does for pcntl_wait().

For both functions, $options is an optional bit field that can consist of the following two parameters:

  • WNOHANG—Do not wait if the process information is not immediately available.

  • WUNTRACED—Return information about children that stopped due to a SIGTTIN, SIGTTOU, SIGSTP, or SIGSTOP signal. (These signals are normally not caught by waitpid().)

Here is a sample process that starts up a set number of child processes and waits for them to exit:

#!/usr/bin/env php
<?php

define('PROCESS_COUNT', '5');
$children = array();
for($i = 0; $i < PROCESS_COUNT; $i++) {
 if(($pid = pcntl_fork()) == 0) {
  exit(child_main());
 }
 else {
  $children[] = $pid;
 }
}

foreach($children as $pid) {
 $pid = pcntl_wait($status);
 if(pcntl_wifexited($status)) {
  $code = pcntl_wexitstatus($status);
  print "pid $pid returned exit code: $coden";
 }
 else {
  print "$pid was unnaturally terminatedn";
 }
}

function child_main()
{
 $my_pid = getmypid();
 print "Starting child pid: $my_pidn";
 sleep(10);
 return 1;
?>

One aspect of this example worth noting is that the code to be run by the child process is all located in the function child_main(). In this example it only executes sleep(10), but you could change that to more complex logic.

Also, when a child process terminates and the call to pcntl_wait() returns, you can test the status with pcntl_wifexited() to see whether the child terminated because it called exit() or because it died an unnatural death. If the termination was due to the script exiting, you can extract the actual code passed to exit() by calling pcntl_wexitstatus($status). Exit status codes are signed 8-bit numbers, so valid values are between –127 and 127.

Here is the output of the script if it runs uninterrupted:

> ./5.php
Starting child pid 4451
Starting child pid 4452
Starting child pid 4453
Starting child pid 4454
Starting child pid 4455
pid 4453 returned exit code: 1
pid 4452 returned exit code: 1
pid 4451 returned exit code: 1
pid 4454 returned exit code: 1
pid 4455 returned exit code: 1

If instead of letting the script terminate normally, you manually kill one of the children, you get output like this:

> ./5.php
Starting child pid 4459
Starting child pid 4460
Starting child pid 4461
Starting child pid 4462
Starting child pid 4463
4462 was unnaturally terminated
pid 4463 returned exit code: 1
pid 4461 returned exit code: 1
pid 4460 returned exit code: 1
pid 4459 returned exit code: 1

{mospagebreak title=Signals}

Signals send simple instructions to processes. When you use the shell command kill to terminate a process on your system, you are in fact simply sending an interrupt signal (SIGINT). Most signals have a default behavior (for example, the default behavior for SIGINT is to terminate the process), but except for a few exceptions, these signals can be caught and handled in custom ways inside a process.

Some of the most common signals are listed next (the complete list is in the signal(3) man page):

Signal Name

Description

Default Behavior

SIGCHLD

Child termination

Ignore

SIGINT

Interrupt request

Terminate process

SIGKILL

Kill program

Terminate process

SIGHUP

Terminal hangup

Terminate process

SIGUSR1

User defined

Terminate process

SIGUSR2

User defined

Terminate process

SIGALRM

Alarm timeout

Terminate process


To register your own signal handler, you simply define a function like this:

function sig_usr1($signal)
{
 print "SIGUSR1 Caught.n";
}

and then register it with this:

declare(ticks=1);
pcntl_signal(SIGUSR1, "sig_usr1");

Because signals occur at the process level and not inside the PHP virtual machine itself, the engine needs to be instructed to check for signals and run the pcntl callbacks. To allow this to happen, you need to set the execution directive ticks. ticks instructs the engine to run certain callbacks every N statements in the executor. The signal callback is essentially a no-op, so setting declare(ticks=1) instructs the engine to look for signals on every statement executed.

The following sections describe the two most useful signal handlers for multiprocess scripts—SIGCHLD and SIGALRM—as well as other common signals.

SIGCHLD

SIGCHLD is a common signal handler that you set in applications where you fork a number of children. In the examples in the preceding section, the parent has to loop on pcntl_wait() or pcntl_waitpid() to ensure that all children are collected on. Signals provide a way for the child process termination event to notify the parent process that children need to be collected. That way, the parent process can execute its own logic instead of just spinning while waiting to collect children.

To implement this sort of setup, you first need to define a callback to handle SIGCHLD events. Here is a simple example that removes the PID from the global $children array and prints some debugging information on what it is doing:

function sig_child($signal) 
{
 global $children;
 pcntl_signal(SIGCHLD, "sig_child");
 fputs(STDERR, "Caught SIGCHLDn");
 while(($pid = pcntl_wait($status, WNOHANG)) > 0) {
  $children = array_diff($children, array($pid));
  fputs(STDERR, "Collected pid $pidn");
 }
}

The SIGCHLD signal does not give any information on which child process has terminated, so you need to call pcntl_wait() internally to find the terminated processes. In fact, because multiple processes may terminate while the signal handler is being called, you must loop on pcntl_wait() until no terminated processes are remaining, to guarantee that they are all collected. Because the option WNOHANG is used, this call will not block in the parent process.

Most modern signal facilities restore a signal handler after it is called, but for portability to older systems, you should always reinstate the signal handler manually inside the call.

When you add a SIGCHLD handler to the earlier example, it looks like this:

#!/usr/bin/env php
<?php

declare(ticks=1);
pcntl_signal(SIGCHLD, "sig_child");

define('PROCESS_COUNT', '5');
$children = array();

for($i = 0; $i < PROCESS_COUNT; $i++) {
 if(($pid = pcntl_fork()) == 0) {
  exit(child_main());
 }
 else {
  $children[] = $pid;
 } 
}

while($children) {
 sleep(10); // or perform parent logic
}
pcntl_alarm(0);

function child_main() 
{
 sleep(rand(0, 10)); // or perform child logic
 return 1;
}

function sig_child($signal) 
{
 global $children;
 pcntl_signal(SIGCHLD, "sig_child");
 fputs(STDERR, "Caught SIGCHLDn");
 while(($pid = pcntl_wait($status, WNOHANG)) > 0) {
  $children = array_diff($children, array($pid));
  if(!pcntl_wifexited($status)) {
   fputs(STDERR, "Collected killed pid $pidn");
  }
  else {
   fputs(STDERR, "Collected exited pid $pidn");
  }
 }
}
?>

Running this yields the following output:

> ./8.php
Caught SIGCHLD
Collected exited pid 5000
Caught SIGCHLD
Collected exited pid 5003
Caught SIGCHLD
Collected exited pid 5001
Caught SIGCHLD
Collected exited pid 5002
Caught SIGCHLD
Collected exited pid 5004

SIGALRM

Another useful signal is SIGALRM, the alarm signal. Alarms allow you to bail out of tasks if they are taking too long to complete. To use an alarm, you define a signal handler, register it, and then call pcntl_alarm() to set the timeout. When the specified timeout is reached, a SIGALRM signal is sent to the process.

Here is a signal handler that loops through all the PIDs remaining in $children and sends them a SIGINT signal (the same as the Unix shell command kill):

function sig_alarm($signal)
{
 global $children;
 fputs(STDERR, "Caught SIGALRMn");
 foreach ($children as $pid) {
  posix_kill($pid, SIGINT);
 }
}

Note the use of posix_kill(). posix_kill() signals the specified process with the given signal.

You also need to register the sig_alarm() SIGALRM handler (alongside the SIGCHLD handler) and change the main block as follows:

declare(ticks=1);
pcntl_signal(SIGCHLD, "sig_child");
pcntl_signal(SIGALRM, "sig_alarm");

define('PROCESS_COUNT', '5');
$children = array();

pcntl_alarm(5);
for($i = 0; $i < PROCESS_COUNT; $i++) {
 if(($pid = pcntl_fork()) == 0) {
  exit(child_main());
 }
 else {
  $children[] = $pid;
 }
}

while($children) {
 sleep(10); // or perform parent logic
}
pcntl_alarm(0);

It is important to remember to set the alarm timeout to 0 when it is no longer needed; otherwise, it will fire when you do not expect it. Running the script with these modifications yields the following output:

> ./9.php
Caught SIGCHLD
Collected exited pid 5011
Caught SIGCHLD
Collected exited pid 5013
Caught SIGALRM
Caught SIGCHLD
Collected killed pid 5014
Collected killed pid 5012
Collected killed pid 5010

In this example, the parent process uses the alarm to clean up (via termination) any child processes that have taken too long to execute.

Other Common Signals

Other common signals you might want to install handlers for are SIGHUP, SIGUSR1, and SIGUSR2. The default behavior for a process when receiving any of these signals is to terminate. SIGHUP is the signal sent at terminal disconnection (when the shell exits). A typical process in the background in your shell terminates when you log out of your terminal session.

If you simply want to ignore these signals, you can instruct a script to ignore them by using the following code:

pcntl_signal(SIGHUP, SIGIGN);

Rather than ignore these three signals, it is common practice to use them to send simple commands to processes—for instance, to reread a configuration file, reopen a logfile, or dump some status information.

{mospagebreak title=Writing Daemons}

A daemon is a process that runs in the background, which means that once it is started, it takes no input from the user’s terminal and does not exit when the user’s session ends. Once started, daemons traditionally run forever (or until stopped) to perform recurrent tasks or to handle tasks that might last beyond the length of the user’s session. The Apache Web server, sendmail, and the cron daemon crond are examples of common daemons that may be running on your system. Daemonizing scripts is useful for handling long jobs and recurrent back-end tasks.

To successfully be daemonized, a process needs to complete the two following tasks:

  • Process detachment

  • Process independence

In addition, a well-written daemon may optionally perform the following:

  • Setting its working directory

  • Dropping privileges

  • Guaranteeing exclusivity

You learned about process detachment earlier in this chapter, in the section “Creating and Managing Child Processes.” The logic is the same as for daemonizing processes, except that you want to end the parent process so that the only running process is detached from the shell. To do this, you execute pnctl_fork() and exit if you are in the parent process (that is, if the return value is greater than zero).

In Unix systems, processes are associated with process groups, so if you kill the leader of a process group, all its associates will terminate as well. The parent process for everything you start in your shell is your shell’s process. Thus, if you create a new process with fork() and do nothing else, the process will still exit when you close the shell. To avoid having this happen, you need the forked process to disassociate itself from its parent process. This is accomplished by calling pcntl_setsid(), which makes the calling process the leader of its own process group.

Finally, to sever any ties between the parent and the child, you need to fork the process a second time. This completes the detachment process. In code, this detachment process looks like this:

if(pcntl_fork()) {
 exit;
}
pcntl_setsid();
if(pcntl_fork()) {
 exit;
}
# process is now completely daemonized

It is important for the parent to exit after both calls to pcntl_fork(); otherwise, multiple processes will be executing the same code.

Please check back next week for the conclusion of this article.

[gp-comments width="770" linklove="off" ]

chat