Performing Inferential Statistical Analysis with PHP
Overview of Statistical Comparison
Statistics aren't just for Excel spreadsheets anymore. If you've been wondering how to use PHP to help you do a basic statistical analysis, keep reading. In this article you'll learn how to use PHP to help you compare data sets.
Strictly speaking, all samples must be gathered at random to avoid bias, which can distort the results. For example, say you want to find out whether your campaign has improved your website's traffic. A suitable comparison would be to record daily traffic on 15 randomly chosen days in the month before the campaign, and on another 15 randomly chosen days in the month after you have put the campaign into action. You can then use inferential statistical techniques to analyze the data and reach a conclusion.
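The random selection of days can itself be scripted. The following minimal sketch assumes a month of daily visit counts is already loaded into an array keyed by day number; the helper `sampleRandomDays()` and the synthetic `$dailyVisits` data are hypothetical, for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pick $n distinct days at random from an array of
 * daily traffic figures, returning the selected values in calendar order.
 *
 * @param array<int, int> $dailyVisits visits per day, keyed by day number
 * @return array<int, int> the sampled values, keyed by their original day
 */
function sampleRandomDays(array $dailyVisits, int $n): array
{
    if ($n < 1 || $n > count($dailyVisits)) {
        throw new InvalidArgumentException('Sample size must be between 1 and the number of days.');
    }
    // array_rand() returns a single key when $n == 1 and an array of keys otherwise.
    $keys = (array) array_rand($dailyVisits, $n);
    // Keep only the chosen days; array_intersect_key() preserves day order.
    return array_intersect_key($dailyVisits, array_flip($keys));
}

// Synthetic placeholder data (not real traffic): 30 days keyed 1..30.
$dailyVisits = array_combine(range(1, 30), array_map(fn (int $d): int => 100 + $d, range(1, 30)));

$before = sampleRandomDays($dailyVisits, 15);
echo implode(', ', $before), PHP_EOL;
```

`array_rand()` relies on PHP's pseudo-random generator, which is adequate for choosing sample days; you would run the same selection again on the month after the campaign.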
One of the most popular of these techniques is Student's T-test, which compares the means of two randomly drawn samples. Its test statistic, the T-value, is defined as:
$$T = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\operatorname{Var}(x)}{N_x} + \dfrac{\operatorname{Var}(y)}{N_y}}}$$
It looks complex, doesn't it? Each term is simple arithmetic, though, which makes the calculation a good candidate for automation: rather than working it out by hand every time you compare data collected online, you can let a PHP script do it.
In short, once the T-value has been computed, it is compared with a T-critical value, which determines whether the difference between the two samples is statistically significant or could plausibly be due to chance.
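In the formula above, x and y denote the two samples, Nx is the number of observations in sample x, and Ny is the number of observations in sample y. Because each variance is divided by its own sample size, this is strictly the unequal-variance (Welch) form of the statistic; when both samples have the same size, as in the 15-day example, it gives exactly the same value as the classic pooled Student formula.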
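To make the formula concrete, here is a minimal PHP sketch that evaluates it from summary statistics you have already computed. The function name `tValueFromSummary()` and its parameter order are hypothetical choices for this illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: T-value from summary statistics,
 * (mean of x - mean of y) / sqrt(var(x)/Nx + var(y)/Ny).
 */
function tValueFromSummary(float $meanX, float $varX, int $nX, float $meanY, float $varY, int $nY): float
{
    if ($nX < 1 || $nY < 1) {
        throw new InvalidArgumentException('Each sample needs at least one observation.');
    }
    // Denominator of the formula: the standard error of the difference in means.
    $standardError = sqrt($varX / $nX + $varY / $nY);
    if ($standardError == 0.0) {
        throw new DivisionByZeroError('Both samples have zero variance; T is undefined.');
    }
    return ($meanX - $meanY) / $standardError;
}

// Means 14 and 10, variances 8 and 8, four observations each:
// sqrt(8/4 + 8/4) = 2, so T = (14 - 10) / 2.
echo tValueFromSummary(14.0, 8.0, 4, 10.0, 8.0, 4), PHP_EOL; // prints 2
```

The sign of T only tells you which mean is larger; swapping x and y flips it.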
Variance is the square of the standard deviation. For a sample, it is normally computed by dividing the sum of squared deviations from the mean by N - 1 rather than by N.
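The two descriptive statistics can be computed in plain PHP along these lines. This is a sketch that assumes the sample variance (N - 1 denominator); the helper names `mean()` and `sampleVariance()` are hypothetical, not built-in PHP functions.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: arithmetic mean of a non-empty list of numbers. */
function mean(array $values): float
{
    if (count($values) === 0) {
        throw new InvalidArgumentException('Cannot take the mean of an empty sample.');
    }
    return array_sum($values) / count($values);
}

/**
 * Hypothetical helper: sample variance, the sum of squared deviations
 * from the mean divided by N - 1. Its square root is the standard deviation.
 */
function sampleVariance(array $values): float
{
    $n = count($values);
    if ($n < 2) {
        throw new InvalidArgumentException('Sample variance needs at least two values.');
    }
    $m = mean($values);
    $sumSquares = 0.0;
    foreach ($values as $v) {
        $sumSquares += ($v - $m) ** 2;
    }
    return $sumSquares / ($n - 1);
}

$sample = [2, 4, 4, 4, 5, 5, 7, 9];
echo mean($sample), PHP_EOL;                 // prints 5
echo sampleVariance($sample), PHP_EOL;       // 32 / 7, about 4.571
echo sqrt(sampleVariance($sample)), PHP_EOL; // sample standard deviation
```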
The basic strategy is to compute the descriptive statistics (average and variance) of each sample first. Once these are available, the T-value can be computed and then compared with a T-critical value.
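Putting the strategy together, the following sketch goes from two raw samples to a T-value in two steps: descriptive statistics first, then the formula. The helpers `describe()` and `tValue()` and the small before/after arrays are hypothetical, chosen only so the arithmetic is easy to follow.

```php
<?php
declare(strict_types=1);

/** Step 1 (hypothetical helper): size, mean and sample variance (N - 1) of one sample. */
function describe(array $values): array
{
    $n = count($values);
    if ($n < 2) {
        throw new InvalidArgumentException('Each sample needs at least two values.');
    }
    $mean = array_sum($values) / $n;
    $sumSquares = 0.0;
    foreach ($values as $v) {
        $sumSquares += ($v - $mean) ** 2;
    }
    return ['n' => $n, 'mean' => $mean, 'variance' => $sumSquares / ($n - 1)];
}

/** Step 2 (hypothetical helper): plug the descriptive statistics into the T formula. */
function tValue(array $x, array $y): float
{
    $sx = describe($x);
    $sy = describe($y);
    $standardError = sqrt($sx['variance'] / $sx['n'] + $sy['variance'] / $sy['n']);
    if ($standardError == 0.0) {
        throw new DivisionByZeroError('Both samples have zero variance; T is undefined.');
    }
    return ($sx['mean'] - $sy['mean']) / $standardError;
}

$before = [1, 2, 3, 4, 5];
$after = [3, 4, 5, 6, 7];
echo tValue($after, $before), PHP_EOL; // prints 2
```

With these numbers both samples have a variance of 2.5, so the denominator is sqrt(0.5 + 0.5) = 1 and the T-value is simply the difference of the means, 5 - 3 = 2.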
The T-critical value depends on the confidence level and the degrees of freedom; for the classic two-sample test, the degrees of freedom are Nx + Ny - 2. A 95% confidence level, the conventional choice, means you accept a 5% chance of concluding that the two samples differ when in fact they do not.
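A fuller script might get the T-critical value from a statistics library, but for small samples a lookup table is enough. The sketch below hard-codes the standard two-tailed 95% values for 1 to 30 degrees of freedom and assumes the classic pooled degrees of freedom, Nx + Ny - 2. The names `T_CRITICAL_95`, `tCritical95()` and `isSignificant95()` are hypothetical.

```php
<?php
declare(strict_types=1);

// Two-tailed critical t-values at 95% confidence (alpha = 0.05),
// from a standard t-distribution table, for 1 to 30 degrees of freedom.
const T_CRITICAL_95 = [
    1 => 12.706, 2 => 4.303, 3 => 3.182, 4 => 2.776, 5 => 2.571,
    6 => 2.447, 7 => 2.365, 8 => 2.306, 9 => 2.262, 10 => 2.228,
    11 => 2.201, 12 => 2.179, 13 => 2.160, 14 => 2.145, 15 => 2.131,
    16 => 2.120, 17 => 2.110, 18 => 2.101, 19 => 2.093, 20 => 2.086,
    21 => 2.080, 22 => 2.074, 23 => 2.069, 24 => 2.064, 25 => 2.060,
    26 => 2.056, 27 => 2.052, 28 => 2.048, 29 => 2.045, 30 => 2.042,
];

/** Hypothetical helper: look up the 95% critical value for the given degrees of freedom. */
function tCritical95(int $degreesOfFreedom): float
{
    if (!array_key_exists($degreesOfFreedom, T_CRITICAL_95)) {
        throw new OutOfRangeException('No tabulated value for these degrees of freedom.');
    }
    return T_CRITICAL_95[$degreesOfFreedom];
}

/** Hypothetical helper: two-tailed decision, significant when |T| exceeds the critical value. */
function isSignificant95(float $t, int $nX, int $nY): bool
{
    $df = $nX + $nY - 2; // assumption: classic pooled degrees of freedom
    return abs($t) > tCritical95($df);
}

// 15 sampled days before and 15 after the campaign: df = 28.
var_dump(isSignificant95(2.5, 15, 15)); // bool(true)
var_dump(isSignificant95(1.2, 15, 15)); // bool(false)
```

For the 15-day example, a T-value of 2.5 exceeds the critical value of 2.048 at 28 degrees of freedom, so the change in traffic would be called significant at the 95% level; a T-value of 1.2 would not.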