What is the benefit of doing the analysis on the server side? First and foremost, security for both the scripts and your tool. Many online statistical analysis tools are written in JavaScript, which runs on the client side, in the visitor's browser. There, both the code and the data it processes can be inspected and altered, for example through JavaScript injection.

When the calculations run on the server instead, visitors can only submit input; they cannot change the code that processes it. Provided the script validates that input, the data collection and analysis are protected from tampering even though the tool is open to the public over the Internet. This increases the integrity of the results.

Another benefit of using PHP for statistical analysis is that you can easily share your analysis tool online with fellow students, engineers or analysts. This is one of the biggest downsides of MS Excel: despite its superior statistical features, a spreadsheet cannot easily be shared as a working online tool. Where it can (third-party programs exist that convert an Excel sheet into an equivalent working HTML page that processes data), the result relies on JavaScript. As mentioned, JavaScript is open to tampering because any validation it performs happens on the client side and can be bypassed, and it also exposes your computational scripts to the public, which you might not want.

This article shows how to use PHP for the most common statistical analyses, such as:

- Accepting numerical data and computing its average, standard deviation, % CV, median and range (in general, its descriptive statistics).

- Comparing two data samples and concluding whether or not they are statistically different (a task of inferential statistics).

- Estimating a confidence interval for a population value. For example, if you are studying the average hours of sleep of IT professionals, it is more informative to report the result as a confidence interval (such as 5 hours ± 1 hour) than as a point average (5 hours). This gives your readers an idea of the plausible minimum and maximum values of the population average.

{mospagebreak title=Computing descriptive statistics using PHP}

"Descriptive statistics," as the name suggests, give a numerical description of your sample, such as its central location (the mean, or average) and its variability (measured as the standard deviation or the % coefficient of variation, % CV, which is the standard deviation expressed as a percentage of the mean).

In PHP, we can perform these calculations with functions. A function is a reusable block of code that performs one task, such as calculating an average, a standard deviation or a % coefficient of variation. Below is an example web application form written in PHP that accepts numerical data for descriptive statistics analysis:

<html>
<head>
<title>Compute Descriptive Statistics of Numerical Data Using PHP by www.php-developer.org</title>
</head>
<body>
<?php
//Check if the form is submitted
if (!isset($_POST['submit']))
{
    //form not submitted, display form
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
Compute descriptive statistics such as mean, standard deviation and % CV for the data entered below.<br />
Copy and paste numerical data for analysis below (one value per line):<br />
<textarea name="figures" rows="50" cols="20"></textarea>
<br />
<input type="submit" name="submit" value="Give me descriptive statistics of this sample numerical data">
</form>
<a href="/descriptivestats.php">Click here to reset or clear this form</a>
<?php
}

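else
{
    //form submitted, grab the data from POST
    $figures = isset($_POST['figures']) ? trim($_POST['figures']) : '';
    //test if it contains some data.
    if ($figures === '')
    {
        //feedback to user that it contains no data
        die('ERROR: Enter figures. <a href="/descriptivestats.php">Click here to proceed with the analysis</a>');
    }
    else
    {
        //explode data on line breaks, trim stray "\r" and spaces, and drop blank lines
        $data = array_values(array_filter(array_map('trim', explode("\n", $figures)), 'strlen'));
        //server-side validation: accept only numbers, and at least two of them
        foreach ($data as $value)
        {
            if (!is_numeric($value))
            {
                die('ERROR: Enter numbers only, one per line. <a href="/descriptivestats.php">Click here to try again</a>');
            }
        }
        if (count($data) < 2)
        {
            die('ERROR: Enter at least two figures. <a href="/descriptivestats.php">Click here to try again</a>');
        }
        $data = array_map('floatval', $data);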

        //function to compute statistical mean
        function average($data) {
            return array_sum($data) / count($data);
        }
        //function to compute the sample standard deviation
        function stdev($data) {
            $average = average($data);
            $variance = array(); //squared deviation of each value from the mean
            foreach ($data as $value) {
                $variance[] = pow($value - $average, 2);
            }
            $standarddeviation = sqrt(array_sum($variance) / (count($data) - 1));
            return $standarddeviation;
        }
        //compute % coefficient of variation (undefined when the mean is zero)
        $mean = average($data);
        $CV = ($mean != 0) ? (stdev($data) / $mean) * 100 : NAN;
        //function to compute median of the datasets
        function median($data) {
            sort($data);
            $arrangements = count($data);
            if (($arrangements % 2) == 0) {
                $i = $arrangements / 2;
                return (($data[$i - 1] + $data[$i]) / 2);
            } else {
                $i = ($arrangements - 1) / 2;
                return $data[$i];
            }
        }
        //function to compute the range
        function statisticalrange($data) {
            return (max($data) - min($data));
        }

        //display results to browser
        echo '<h2>Descriptive Statistics of the Analyzed Sample Data:</h2>';
        echo '<br />';
        echo 'The mean of the sample is: <b>' . round(average($data), 4) . '</b>';
        echo '<br />';
        echo 'The standard deviation of the sample is: <b>' . round(stdev($data), 4) . '</b>';
        echo '<br />';
        echo 'The % coefficient of variation (in percent) is: <b>' . round($CV, 4) . '</b>';
        echo '<br />';
        echo 'The median of the sample is: <b>' . round(median($data), 4) . '</b>';
        echo '<br />';
        echo 'The maximum value in the sample is: <b>' . round(max($data), 4) . '</b>';
        echo '<br />';
        echo 'The minimum value in the sample is: <b>' . round(min($data), 4) . '</b>';
        echo '<br />';
        echo 'The statistical range of the sample is: <b>' . round(statisticalrange($data), 4) . '</b>';
        echo '<br /><br />';
        echo 'Below is the submitted/analyzed data for your reference';
        echo '<br /><br />';
        //escape each value before echoing it back, so user input cannot inject HTML
        $display = implode("<br />\n", array_map('htmlspecialchars', $data));
        echo $display;
        echo '<br /><br />';
        echo '<a href="/descriptivestats.php">Click here to do another analysis</a>';
    }
}
?>
</body>
</html>

{mospagebreak title=Detailed explanation of the scripts}

The script first checks whether the form has been submitted:

if (!isset($_POST['submit']))

If not, it displays the form; otherwise, it processes the submitted data and stores it in an array variable, $data, which holds all the values used in the statistical analysis.

The values come from the HTML textarea named figures. The PHP explode() function splits that text at each line break, so every line becomes a separate element of the array.
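Because the point of a server-side tool is that the browser cannot bypass its checks, the parsing step is a natural place to validate input. The sketch below is an illustration rather than the script's own code: the function name `parse_figures` is hypothetical, and it assumes one value per line, tolerates both `\n` and `\r\n` line endings, and rejects any line that PHP's `is_numeric()` does not accept.

```php
<?php
// Hypothetical helper (not part of descriptivestats.php): turn the raw
// textarea contents into an array of floats, one value per line.
function parse_figures(string $text): array
{
    $values = [];
    // explode() on "\n" splits the text into lines; trim() then removes the
    // "\r" left over from Windows-style line endings, plus surrounding spaces.
    foreach (explode("\n", $text) as $lineNumber => $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        if (!is_numeric($line)) {
            // Server-side validation: a visitor cannot switch this check off.
            throw new InvalidArgumentException(
                'Line ' . ($lineNumber + 1) . ' is not a number: ' . $line
            );
        }
        $values[] = (float) $line;
    }
    return $values;
}

// Usage: Windows line endings, a blank line and extra spaces are tolerated.
print_r(parse_figures("12.5\r\n 7 \r\n\r\n-3e2\r\n"));
```

Throwing an exception instead of calling die() keeps the helper reusable; the calling page can catch it and show its own error message.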

To simplify calculations, PHP functions are defined for average, standard deviation, median and range.
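The median is the middle value of the sorted data. With an odd number of values n, it is the single value at zero-based index (n - 1) / 2; with an even number, it is the mean of the two values at indices n/2 - 1 and n/2. The range is simply the maximum minus the minimum. The standalone sketch below restates these two calculations with type declarations and an empty-input check; the names `median_of` and `range_of` are hypothetical, chosen so they do not clash with the script's own `median()` and `statisticalrange()`.

```php
<?php
// Median: the middle value of the sorted sample (illustrative sketch).
function median_of(array $data): float
{
    $n = count($data);
    if ($n === 0) {
        throw new InvalidArgumentException('median_of() needs at least one value');
    }
    sort($data); // sorts a local copy; the caller's array is untouched
    $middle = intdiv($n, 2);
    if ($n % 2 === 0) {
        // Even count: average the two central values.
        return ($data[$middle - 1] + $data[$middle]) / 2;
    }
    // Odd count: intdiv($n, 2) equals ($n - 1) / 2, the single central value.
    return (float) $data[$middle];
}

// Range: the spread between the largest and smallest values.
function range_of(array $data): float
{
    if ($data === []) {
        throw new InvalidArgumentException('range_of() needs at least one value');
    }
    return max($data) - min($data);
}

echo median_of([7, 1, 5]), "\n";     // odd count
echo median_of([7, 1, 5, 3]), "\n";  // even count
echo range_of([7, 1, 5, 3]), "\n";
```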

{mospagebreak title=Performing calculations with PHP}

For the average:

return array_sum($data)/count($data);

This expression adds up all the values in the array with array_sum() and divides the total by the number of values, as returned by count().
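For example, for the values 4, 8, 15, 16, 23 and 42, array_sum() returns 108 and count() returns 6, so the mean is 108 / 6 = 18. One caveat: with an empty array the expression divides by zero, which throws a DivisionByZeroError in PHP 8. The sketch below adds that check; the name `safe_average` is hypothetical.

```php
<?php
// Arithmetic mean with an explicit empty-input check (illustrative sketch).
function safe_average(array $data): float
{
    if (count($data) === 0) {
        throw new InvalidArgumentException('Cannot average an empty sample');
    }
    // Sum of all values divided by the number of values.
    return array_sum($data) / count($data);
}

echo safe_average([4, 8, 15, 16, 23, 42]), "\n";
```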

The standard deviation function takes a little more work:

function stdev($data) {
    $average = average($data);
    $variance = array(); //squared deviation of each value from the mean
    foreach ($data as $value) {
        $variance[] = pow($value - $average, 2);
    }
    $standarddeviation = sqrt(array_sum($variance) / (count($data) - 1));
    return $standarddeviation;
}

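Statistical formula for the sample standard deviation, where $x_1, \dots, x_n$ are the values, $\bar{x}$ is their mean and $n$ is the number of values:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$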

PHP's core has no built-in standard deviation function. The PECL stats extension offers stats_standard_deviation(), but it is a separate extension that is not bundled with PHP, so a user-defined function is the more portable way to do the computation.

The function first computes the average of the values in the array. It then loops over the data and, for each value, stores the square of its difference from the average (its squared deviation). Finally, it adds up the squared deviations, divides the sum by the number of values minus 1 to get the sample variance, and takes the square root of that variance to obtain the standard deviation.
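As a worked example, for the values 2, 4, 4, 4, 5, 5, 7 and 9 the mean is 40 / 8 = 5, the squared deviations are 9, 1, 1, 1, 0, 0, 4 and 16, their sum is 32, the sample variance is 32 / 7 ≈ 4.571, and the sample standard deviation is its square root, ≈ 2.138. The sketch below returns each of these intermediate values so they can be checked by hand; the function name `sample_stdev_steps` is hypothetical.

```php
<?php
// Sample standard deviation, keeping each intermediate step visible
// (illustrative sketch; the function name is hypothetical).
function sample_stdev_steps(array $data): array
{
    $n = count($data);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two values');
    }
    $mean = array_sum($data) / $n;
    // Squared deviation of each value from the mean.
    $squaredDeviations = array_map(function ($x) use ($mean) {
        return pow($x - $mean, 2);
    }, $data);
    $sumOfSquares = array_sum($squaredDeviations);
    $variance = $sumOfSquares / ($n - 1); // sample variance: divide by n - 1
    return [
        'mean' => $mean,
        'squared_deviations' => $squaredDeviations,
        'sum_of_squares' => $sumOfSquares,
        'variance' => $variance,
        'stdev' => sqrt($variance), // square root of the variance
    ];
}

print_r(sample_stdev_steps([2, 4, 4, 4, 5, 5, 7, 9]));
```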

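A simpler-looking approach would be to take the average of the squared deviations, average($variance), instead of

array_sum($variance)/((count($data))-1)

However, that divides by n instead of n - 1, so it does NOT give the "sample" standard deviation. Statistics distinguishes two kinds of standard deviation: population and sample. The population standard deviation divides by n, so for it you can use the average of the squared deviations directly. The sample standard deviation divides by n - 1 to compensate for the fact that the mean was itself estimated from the same data. Most scientific experiments work with samples rather than whole populations, which is why the script uses n - 1.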
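To make the difference concrete: for the values 2, 4, 4, 4, 5, 5, 7 and 9, the sum of squared deviations is 32 and n = 8, so the population standard deviation is sqrt(32 / 8) = 2, while the sample standard deviation is sqrt(32 / 7) ≈ 2.138. The sketch below switches between the two with a boolean flag; the function name `standard_deviation` and its `$sample` parameter are hypothetical, not part of the original script.

```php
<?php
// Population vs. sample standard deviation (illustrative sketch; the
// function name and the $sample flag are hypothetical).
function standard_deviation(array $data, bool $sample = true): float
{
    $n = count($data);
    if ($n === 0 || ($sample && $n < 2)) {
        throw new InvalidArgumentException('Not enough values');
    }
    $mean = array_sum($data) / $n;
    $sumOfSquares = 0.0;
    foreach ($data as $value) {
        $sumOfSquares += pow($value - $mean, 2);
    }
    // Sample: divide by n - 1 (Bessel's correction). Population: divide by n,
    // which is the same as averaging the squared deviations.
    $divisor = $sample ? $n - 1 : $n;
    return sqrt($sumOfSquares / $divisor);
}

$data = [2, 4, 4, 4, 5, 5, 7, 9];
printf("population: %.4f\n", standard_deviation($data, false));
printf("sample:     %.4f\n", standard_deviation($data));
```

Defaulting the flag to the sample form matches the article's reasoning that most experimental data are samples.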

After you submit data, the script shown earlier in this article displays the summary statistics followed by the analyzed data. To see this in action, you can go to:

http://www.php-developer.org/descriptivestats.php

If you would like to download/copy the script, you can go to:

http://www.php-developer.org/wp-content/uploads/scripts/descriptivestats.txt