Implementing with PHP: Standalone Scripts

If you’ve ever been interested in making significant use of PHP outside of a web environment, this article will show you how. The first of three parts, it is excerpted from chapter five of the book Advanced PHP Programming, written by George Schlossnagle (Sams; ISBN: 0672325616).

This chapter describes how to reuse existing code libraries to perform administrative tasks in PHP and how to write standalone and one-liner scripts. It also presents a couple of paradigm-breaking projects that put PHP to use outside the Web environment.

For me, one of the most exciting aspects of participating in the development of PHP has been watching the language grow from the simple Web-scripting-specific language of the PHP 3 (and earlier) days into a more robust and versatile language that also excels at Web scripting.

There are benefits to being an extremely specialized language:

  • It is easy to be the perfect tool for a given job if you were written specifically to do that job.

  • It is easier to take over a niche than to compete with other, more mature, general-purpose languages.

On the other hand, there are also drawbacks to being an extremely specialized language:

  • Companies rarely focus on a single niche to the exclusion of all others. For example, even Web-centric companies have back-end and systems scripting requirements.

  • Satisfying a variety of needs with specialist languages requires developers to master more than one language.

  • Common code gets duplicated in every language used.

As a Web professional, I see these drawbacks as serious problems. Duplicated code means that bugs need to be fixed in more than one place (and worse, in more than one language), which equates with a higher overall bug rate and a tendency for bugs to live on in lesser-used portions of the code base. Actively developing in a number of languages means that instead of developers becoming experts in a single language, they must know multiple languages. This makes it increasingly hard to have really good programmers, as their focus is split between multiple languages. Alternatively, some companies tackle the problem by having separate programmer groups handle separate business areas. Although that can be effective, it does not solve the code-reuse problem, it is expensive, and it decreases the agility of the business.


Pragmatism - In their excellent book The Pragmatic Programmer: From Journeyman to Master, David Thomas and Andrew Hunt suggest that all professional programmers learn (at least) one new language per year. I agree whole-heartedly with this advice, but I often see it applied poorly. Many companies have a highly schizophrenic code base, with different applications written in different languages because the developer who was writing them was learning language X at the time and thought it would be a good place to hone his skills. This is especially true when a lead developer at the company is particularly smart or driven and is able to juggle multiple languages with relative ease.

This is not pragmatic.

The problem is that although you might be smart enough to handle Python, Perl, PHP, Ruby, Java, C++, and C# at the same time, many of the people who will be working on the code base will not be able to handle this. You will end up with tons of repeated code. For instance, you will almost certainly have the same basic database access library rewritten in each language. If you are lucky and have foresight, all the libraries will at least have the same API. If not, they will all be slightly different, and you will experience tons of bugs as developers code to the Python API in PHP.

Learning new languages is a good thing. I try hard to take Thomas and Hunt’s advice. Learning languages is important because it expands your horizons, keeps your skills current, and exposes you to new ideas. Bring the techniques and insights you get from your studies with you to work, but be gentle about bringing the actual languages to your job.


In my experience, the ideal language is the one that has a specialist-like affinity for the major focus of your projects but is general enough to handle the peripheral tasks that arise. For most Web-programming needs, PHP fills that role quite nicely. The PHP development model has remained close to its Web-scripting roots. For ease of use and fit to the “Web problem,” it remains without parallel (as evidenced by its continually rising adoption rate). PHP has also adapted to fill the needs of more general problems: starting in PHP 4 and continuing into PHP 5, it has become well suited to a number of non-Web-programming needs.

Is PHP the best language for scripting back-end tasks? If you have a large API that drives many of your business processes, the ability to merge and reuse code from your Web environment is incredibly valuable. This value might easily outweigh the fact that Perl and Python are more mature back-end scripting languages.

Introduction to the PHP Command-Line Interface (CLI)

If you built PHP with --enable-cli, a binary called php is installed into the binaries directory of the installation path. By default this is /usr/local/bin. To prevent having to specify the full path of php every time you run it, this directory should be in your PATH environment variable. To execute a PHP script phpscript.php from the command line on a Unix system, you can type this:

> php phpscript.php

Alternatively, you can add the following line to the top of your script:

#!/usr/bin/env php

and then mark the script as executable with chmod, as follows:

> chmod u+rx phpscript.php

Now you can run phpscript.php as follows:

> ./phpscript.php

This #! syntax is known as a “shebang,” and using it is the standard way of making scripts executable on Unix systems.

On Windows systems, your registry will be modified to associate .php scripts with the php executable so that when you click on them, they will be parsed and run. However, because PHP has a wider deployment on Unix systems (mainly for security, cost, and performance reasons) than on Windows systems, this book uses Unix examples exclusively.

Except for the way they handle input, PHP command-line scripts behave very much like their Web-based brethren.
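Because the same library code may end up running both on the Web and on the command line, it can be useful to check which environment a script is in. The following is a minimal sketch, not taken from the book: the PHP_SAPI constant (and the equivalent php_sapi_name() function) reports 'cli' under the command-line binary. The helper name isCommandLine() is made up for illustration.

```php
<?php
// Hypothetical helper (not from the book): true when running under the
// command-line binary rather than a Web server SAPI.
function isCommandLine()
{
    // PHP_SAPI holds the same value that php_sapi_name() returns.
    return PHP_SAPI === 'cli';
}

if (isCommandLine()) {
    echo "Running from the command line\n";
} else {
    echo "Running under the " . PHP_SAPI . " SAPI\n";
}
```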

Handling Input/Output (I/O)

A central aspect of the Unix design philosophy is that a number of small and independent programs can be chained together to perform complicated tasks. This chaining is traditionally accomplished by having a program read input from the terminal and send its output back to the terminal. The Unix environment provides three special file handles that can be used to send and receive data between an application and the invoking user’s terminal (also known as a tty):

  • stdin—Pronounced “standard in” or “standard input,” standard input captures any data that is input through the terminal.

  • stdout—Pronounced “standard out” or “standard output,” standard output goes directly to your screen (and if you are redirecting the output to another program, it is received on its stdin). When you use print or echo in the PHP CGI or CLI, the data is sent to stdout.

  • stderr—Pronounced “standard error,” this is also directed to the user’s terminal, but over a different file handle than stdout. Output that a program writes to stderr will not be read into another application’s stdin file handle without the use of output redirection. (See the man page for your terminal shell to see how to do this; it’s different for each one.)

In the PHP CLI, the special file handles can be accessed by using the following constants:

  • STDIN

  • STDOUT

  • STDERR

Using these constants is identical to opening the streams manually. (If you are running the PHP CGI version, you need to do this manually.) You explicitly open those streams as follows:

$stdin = fopen("php://stdin", "r");
$stdout = fopen("php://stdout", "w");
$stderr = fopen("php://stderr", "w");
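If the same script has to run under both the CLI and another SAPI, one option is to define the constants yourself only when they are missing. This is a sketch of that idea rather than something the chapter prescribes. PHP permits resource-valued constants through define(), although the manual discourages them in general.

```php
<?php
// The CLI binary predefines STDIN, STDOUT, and STDERR. Other SAPIs (such as
// CGI) do not, so define each constant only when it is missing.
if (!defined('STDIN')) {
    define('STDIN', fopen('php://stdin', 'r'));
}
if (!defined('STDOUT')) {
    define('STDOUT', fopen('php://stdout', 'w'));
}
if (!defined('STDERR')) {
    define('STDERR', fopen('php://stderr', 'w'));
}

// From here on, the rest of the script can use the constants either way.
fwrite(STDOUT, "regular output\n");
fwrite(STDERR, "diagnostic output\n");
```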

Why Use STDOUT? - Although it might seem pointless to use STDOUT as a file handle when you can directly print by using print/echo, it is actually quite convenient. STDOUT allows you to write output functions that simply take stream resources, so that you can easily switch between sending your output to the user’s terminal, to a remote server via an HTTP stream, or to anywhere via any other output stream.

The downside is that you cannot take advantage of PHP’s output filters or output buffering, but you can register your own stream filters via stream_filter_register().
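To make the point about stream resources concrete, here is a small sketch of an output function that accepts any writable stream. The function name writeRows() and the sample data are invented for illustration. The php://memory stream stands in for "any other output stream."

```php
<?php
// Hypothetical helper: writes numbered rows to any writable stream resource
// and returns the number of bytes written.
function writeRows($stream, array $rows)
{
    $bytes = 0;
    foreach ($rows as $i => $row) {
        $bytes += fwrite($stream, ($i + 1) . " $row\n");
    }
    return $bytes;
}

$rows = array('alpha', 'beta');

// Send the rows to the terminal (open the stream by hand outside the CLI)...
$out = defined('STDOUT') ? STDOUT : fopen('php://stdout', 'w');
writeRows($out, $rows);

// ...or to an in-memory stream, without changing writeRows() at all.
$mem = fopen('php://memory', 'w+');
writeRows($mem, $rows);
rewind($mem);
$captured = stream_get_contents($mem);
fclose($mem);
```

Because writeRows() knows only about the stream it is handed, the caller alone decides where the output goes.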


Here is a quick script that reads in a file on stdin, numbers each line, and outputs the result to stdout:

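#!/usr/bin/env php
<?php

$lineno = 1;
// fgets() returns false at end of input. Compare strictly so that a
// final line containing only "0" is not mistaken for the end.
while(($line = fgets(STDIN)) !== false) {
    fputs(STDOUT, "$lineno $line");
    $lineno++;
}
?>

When you run this script on itself, you get the following output:

1 #!/usr/bin/env php
2 <?php
3 
4 $lineno = 1;
5 // fgets() returns false at end of input. Compare strictly so that a
6 // final line containing only "0" is not mistaken for the end.
7 while(($line = fgets(STDIN)) !== false) {
8     fputs(STDOUT, "$lineno $line");
9     $lineno++;
10 }
11 ?>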

stderr is convenient to use for error notifications and debugging because it will not be read in by a receiving program’s stdin. The following is a program that reads in an Apache combined-format log and reports on the number of unique IP addresses and browser types seen in the file:

<?php
$counts = array('ip' => array(), 'user_agent' => array());
$lineno = 0;
while(($line = fgets(STDIN)) !== false) {
    # This regex matches a combined log format line field-by-field.
    $regex = '/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] '.
             '"(\S+) (.*?) (\S+)" (\S+) (\S+) "([^"]*)" "([^"]*)"$/';
    # Skip lines that are not in combined log format.
    if(!preg_match($regex, $line, $matches)) {
        continue;
    }
    list(, $ip, $ident_name, $remote_user, $date, $time,
         $gmt_off, $method, $url, $protocol, $code,
         $bytes, $referrer, $user_agent) = $matches;
    # Start each counter at 1 the first time a key is seen.
    $counts['ip'][$ip] = isset($counts['ip'][$ip]) ? $counts['ip'][$ip] + 1 : 1;
    $counts['user_agent'][$user_agent] = isset($counts['user_agent'][$user_agent])
        ? $counts['user_agent'][$user_agent] + 1 : 1;
    # Print a '.' to STDERR every thousand lines processed.
    if(($lineno++ % 1000) == 0) {
        fwrite(STDERR, ".");
    }
}
arsort($counts['ip'], SORT_NUMERIC);
reset($counts['ip']);
arsort($counts['user_agent'], SORT_NUMERIC);
reset($counts['user_agent']);

foreach(array('ip', 'user_agent') as $field) {
    $i = 0;
    print "Top number of requests by $field\n";
    print "--------------------------------\n";
    foreach($counts[$field] as $k => $v) {
        print "$v\t\t$k\n";
        # Stop after the top 10 entries.
        if(++$i == 10) {
            break;
        }
    }
    print "\n\n";
}
?>

The script works by reading in a logfile on STDIN and matching each line against $regex to extract individual fields. The script then computes summary statistics, counting the number of requests per unique IP address and per unique Web server user agent. Because combined-format logfiles are large, the script writes a . to stderr every 1,000 lines to show parsing progress. If the output of the script is redirected to a file, the end report will appear in the file, but the dots will appear only on the user’s screen.
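To see what the regular expression extracts, it can help to run it against a single line. The sketch below wraps the same pattern in a helper that returns named fields. The function name parseCombinedLogLine() and the sample log line are invented for illustration and do not come from a real log.

```php
<?php
// Hypothetical helper: parse one combined-format line into named fields.
// Returns null when the line does not match the pattern.
function parseCombinedLogLine($line)
{
    $regex = '/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] ' .
             '"(\S+) (.*?) (\S+)" (\S+) (\S+) "([^"]*)" "([^"]*)"$/';
    if (!preg_match($regex, $line, $m)) {
        return null;
    }
    $names = array('ip', 'ident_name', 'remote_user', 'date', 'time',
                   'gmt_off', 'method', 'url', 'protocol', 'code',
                   'bytes', 'referrer', 'user_agent');
    // Drop $m[0] (the whole match) and pair each capture with its name.
    return array_combine($names, array_slice($m, 1));
}

// Invented sample line, for illustration only.
$sample = '192.0.2.10 - - [12/Mar/2004:10:15:32 -0500] ' .
          '"GET /index.php HTTP/1.1" 200 5120 ' .
          '"http://example.com/" "Mozilla/5.0"';

$fields = parseCombinedLogLine($sample);
print $fields['ip'] . " requested " . $fields['url'] . "\n";
```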

Parsing Command-Line Arguments

When you are running a PHP script on the command line, you obviously can’t pass arguments via $_GET and $_POST variables (the CLI has no concept of these Web protocols). Instead, you pass in arguments on the command line. Command-line arguments can be read in raw from the $argv autoglobal.

The following script:

#!/usr/bin/env php
<?php
 print_r($argv);
?>

when run as this:

> ./dump_argv.php foo bar barbara

gives the following output:

Array
(
  [0] => ./dump_argv.php
  [1] => foo
  [2] => bar
  [3] => barbara
)

Notice that $argv[0] is the name of the running script.
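Since $argv[0] is always the script name, hand-rolled argument handling usually starts by skipping it. The following is a minimal sketch of that pattern. The helper name positionalArgs() and the usage message are assumptions made for illustration.

```php
<?php
// Hypothetical helper: return the positional arguments (everything after
// the script name), or null if fewer than $required were supplied.
function positionalArgs(array $argv, $required)
{
    $args = array_slice($argv, 1);   // skip $argv[0], the script name
    return (count($args) >= $required) ? $args : null;
}

// Under the CLI, $argv is set in the global scope. Fall back to a
// placeholder script name if it is not available.
$given = isset($argv) ? $argv : array('script.php');
$err = defined('STDERR') ? STDERR : fopen('php://stderr', 'w');

$args = positionalArgs($given, 1);
if ($args === null) {
    fwrite($err, "usage: {$given[0]} <name> [more names...]\n");
} else {
    print "First argument: {$args[0]}\n";
}
```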

Taking configuration directly from $argv can be frustrating because it requires you to put your options in a specific order. A more robust option than parsing options by hand is to use PEAR’s Console_Getopt package. Console_Getopt provides a simple interface for breaking up command-line options into an easy-to-manage array. In addition to simple parsing, Console_Getopt handles both long and short options and provides basic validation to ensure that the options passed are in the correct format.

Console_Getopt works by being given format strings for the arguments you expect. Two forms of options can be passed: short options and long options.

Short options are single-letter options with optional data. The format specifier for the short options is a string of allowed tokens. Option letters can be followed with a single : to indicate that the option requires a parameter or with a double :: to indicate that the parameter is optional.

Long options are an array of full-word options (for example, --help). The option strings can be followed by a single = to indicate that the option takes a parameter or by a double == if the parameter is optional.

For example, for a script to accept the -h and --help flags with no parameters, and the --file option with a mandatory parameter, you would use the following code:

require_once "Console/Getopt.php";

$shortoptions = "h";
$longoptions = array("file=", "help");

$con = new Console_Getopt;
$args = Console_Getopt::readPHPArgv();
$ret = $con->getopt($args, $shortoptions, $longoptions);

The return value of getopt() is an array containing two arrays. The first holds the parsed options, short and long alike, each as a two-element array of the option name and its value (null if the option took no parameter). The second holds any remaining non-option arguments. Console_Getopt::readPHPArgv() is a cross-configuration way of bringing in $argv (for instance, if you have register_argc_argv set to off in your php.ini file).

I find the normal output of getopt() to be a bit awkward to work with. I prefer to have my options presented as a single associative array of key/value pairs, with the option symbol as the key and the option value as the array value. The following block of code uses Console_Getopt to achieve this effect:

function getOptions($default_opt, $shortoptions, $longoptions)
{
    require_once "Console/Getopt.php";
    $con = new Console_Getopt;
    $args = Console_Getopt::readPHPArgv();
    $ret = $con->getopt($args, $shortoptions, $longoptions);
    $opts = array();
    foreach($ret[0] as $arr) {
        $rhs = ($arr[1] !== null)?$arr[1]:true;
        if(array_key_exists($arr[0], $opts)) {
            if(is_array($opts[$arr[0]])) {
                $opts[$arr[0]][] = $rhs;
            }
            else {
                $opts[$arr[0]] = array($opts[$arr[0]], $rhs);
            }
        }
        else {
            $opts[$arr[0]] = $rhs;
        }
    }
    if(is_array($default_opt)) {
        foreach ($default_opt as $k => $v) {
            if(!array_key_exists($k, $opts)) {
                $opts[$k] = $v;
            }
        }
    }
    return $opts;
}

If an argument flag is passed multiple times, the value for that flag will be an array of all the values set, and if a flag is passed without an argument, it is assigned the Boolean value true. Note that this function also accepts a list of default values, which are used for any option that was not passed on the command line.
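The folding behavior is easier to see in isolation. The sketch below pulls that step into a standalone function so it can run without PEAR installed. The name foldOptions() is hypothetical, and the input is a hand-written list in the array(option, value) shape that the loop over $ret[0] expects, not the result of a real getopt() call.

```php
<?php
// Hypothetical helper: the folding step of getOptions() on its own.
// $pairs is a list of array(option, value) entries.
function foldOptions(array $pairs, $defaults = null)
{
    $opts = array();
    foreach ($pairs as $pair) {
        list($name, $value) = $pair;
        // A flag with no parameter is recorded as true.
        $rhs = ($value !== null) ? $value : true;
        if (!array_key_exists($name, $opts)) {
            $opts[$name] = $rhs;                       // first occurrence
        } elseif (is_array($opts[$name])) {
            $opts[$name][] = $rhs;                     // third and later
        } else {
            $opts[$name] = array($opts[$name], $rhs);  // second occurrence
        }
    }
    if (is_array($defaults)) {
        $opts += $defaults;   // array union never overwrites existing keys
    }
    return $opts;
}

// Hand-written input standing in for parsed options.
$pairs = array(
    array('h', null),
    array('--file', 'error.log'),
    array('--file', 'access.log'),
);
$folded = foldOptions($pairs, array('--verbose' => false));
print_r($folded);
```

Here the repeated --file option becomes an array of both values, h becomes true, and --verbose is filled in from the defaults because it was never passed.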

Using this function, you can recast the help example as follows:

$shortoptions = "h";
$longoptions = array("file=", "help");

$ret = getOptions(null, $shortoptions, $longoptions);

If this is run with the parameters -h --file=error.log, $ret will have the following structure:

Array
(
  [h] => 1
  [--file] => error.log
)

Please check back next week for the continuation of this article.
