Performing Descriptive Statistical Analysis with PHP
Have you ever considered doing statistical analysis with PHP? Simple as the language is, it lets an analyst write server-side scripts that accept data from a web form and then analyze that data on the server.
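Here is a minimal sketch of the server-side part. It assumes the web form posts its numbers as one comma- or space-separated text field named `data`. The field name and the `parseNumericInput` helper are hypothetical, not from the article. The script validates every token before any statistics are computed:

```php
<?php
// Hypothetical helper (not from the article): turn a raw form field such as
// "4.5, 6, 7.25" into an array of floats, rejecting non-numeric tokens.
function parseNumericInput(string $raw): array
{
    // Split on commas, semicolons, or whitespace, dropping empty pieces.
    $tokens = preg_split('/[\s,;]+/', trim($raw), -1, PREG_SPLIT_NO_EMPTY);
    $values = [];
    foreach ($tokens as $token) {
        if (!is_numeric($token)) {
            throw new InvalidArgumentException('Not a number: ' . $token);
        }
        $values[] = (float) $token;
    }
    return $values;
}

// Assumed form field name "data"; adjust it to match your HTML form.
if (isset($_POST['data']) && is_string($_POST['data'])) {
    try {
        $data = parseNumericInput($_POST['data']);
        echo count($data) . ' values received';
    } catch (InvalidArgumentException $e) {
        // Escape the message because it echoes user input back to the page.
        echo htmlspecialchars($e->getMessage());
    }
}
```

Because this check runs on the server, a visitor cannot bypass it by editing the page's JavaScript.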
What is the benefit of doing it on the server side? First and foremost, it secures your scripts and your tool. Many online statistical analysis tools run in JavaScript on the client side, where users can alter the data or the code itself, for example through JavaScript injection.
If the scripts are secure, you can be more confident that the data collection and analysis are not being tampered with, even though the tool is exposed to the public via the Internet. This increases the integrity of the results.
Another benefit of using PHP for statistical analysis is that you can easily share your analysis tool online with fellow students, engineers, or analysts. This is one of the biggest downsides of MS Excel: despite its statistical strengths, a spreadsheet cannot easily be shared as a working online tool. Third-party programs can convert an Excel sheet into equivalent working HTML that processes data, but the result relies on JavaScript. As mentioned, client-side JavaScript validation can be bypassed through injection, and it also exposes your computational scripts to the public, which you might not want.
This article discusses how to use PHP for the most common statistical analyses, such as:
Accepting numerical data and computing the average, standard deviation, %CV, median, and range; in general, calculating the descriptive statistics.
Comparing two data samples and concluding whether or not they are statistically different, a task that belongs to "inferential statistics".
Estimating the confidence interval of the population mean. For example, if you are studying the average hours of sleep of IT professionals, it is more appropriate to report the result as a confidence interval (such as 5 hours ± 1 hour) than as a point average (5 hours). This gives your readers a more accurate picture, because the interval shows the plausible minimum and maximum values of the population average.
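For the descriptive statistics in the first item, a minimal PHP sketch might look like the following. The function name `describe` is hypothetical. The sketch assumes you want the sample standard deviation (dividing by n - 1). Divide by n instead if your data covers the whole population.

```php
<?php
// Hypothetical helper: descriptive statistics for an array of numbers.
// Assumes the SAMPLE standard deviation (divides by n - 1).
function describe(array $x): array
{
    $n = count($x);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two values');
    }
    $mean = array_sum($x) / $n;

    // Sum of squared deviations from the mean.
    $ss = 0.0;
    foreach ($x as $v) {
        $ss += ($v - $mean) ** 2;
    }
    $sd = sqrt($ss / ($n - 1));

    // Median: average the two middle values when n is even.
    $sorted = $x;
    sort($sorted);
    $mid = intdiv($n, 2);
    $median = ($n % 2 === 0)
        ? ($sorted[$mid - 1] + $sorted[$mid]) / 2
        : $sorted[$mid];

    return [
        'n'      => $n,
        'mean'   => $mean,
        'sd'     => $sd,
        // Coefficient of variation in percent; undefined when the mean is 0.
        'cv_pct' => $mean != 0 ? 100 * $sd / $mean : NAN,
        'median' => $median,
        'min'    => $sorted[0],
        'max'    => $sorted[$n - 1],
        'range'  => $sorted[$n - 1] - $sorted[0],
    ];
}

$stats = describe([2, 4, 4, 4, 5, 5, 7, 9]);
printf("mean=%.3f sd=%.3f cv=%.1f%% median=%.2f range=%.2f\n",
    $stats['mean'], $stats['sd'], $stats['cv_pct'], $stats['median'], $stats['range']);
```

For the sample data shown, the mean is 5, the median is 4.5, and the range is 7.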
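The article does not say which test it uses to compare two samples. One common choice is Welch's two-sample t-test, which does not assume equal variances. The sketch below (function names hypothetical) computes its t statistic and the Welch-Satterthwaite degrees of freedom. Core PHP has no t-distribution function, so the sketch stops short of a p-value. Instead, compare |t| with a two-sided critical value from a t table for the computed degrees of freedom.

```php
<?php
// Hypothetical helper: sample variance (divides by n - 1).
function sampleVariance(array $x): float
{
    $n = count($x);
    $mean = array_sum($x) / $n;
    $ss = 0.0;
    foreach ($x as $v) {
        $ss += ($v - $mean) ** 2;
    }
    return $ss / ($n - 1);
}

// Hypothetical helper: Welch's t statistic and approximate degrees of freedom.
function welchT(array $a, array $b): array
{
    $na = count($a);
    $nb = count($b);
    if ($na < 2 || $nb < 2) {
        throw new InvalidArgumentException('Each sample needs at least two values');
    }
    $va = sampleVariance($a) / $na;  // squared standard error of mean A
    $vb = sampleVariance($b) / $nb;  // squared standard error of mean B
    if ($va + $vb == 0.0) {
        throw new InvalidArgumentException('Both samples have zero variance');
    }
    $t = (array_sum($a) / $na - array_sum($b) / $nb) / sqrt($va + $vb);
    // Welch-Satterthwaite approximation of the degrees of freedom.
    $df = ($va + $vb) ** 2 / ($va ** 2 / ($na - 1) + $vb ** 2 / ($nb - 1));
    return ['t' => $t, 'df' => $df];
}

// Made-up samples for illustration only.
$r = welchT([5.1, 4.9, 5.6, 5.8, 6.0], [6.5, 6.9, 7.1, 6.4, 7.3]);
// Compare abs($r['t']) with the two-sided critical value for $r['df']
// degrees of freedom from a t table (e.g. at the 5% significance level).
printf("t = %.3f, df = %.2f\n", $r['t'], $r['df']);
```

If |t| exceeds the critical value, the two means are considered statistically different at that significance level.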
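For the confidence interval in the last item, the usual interval for a mean is mean ± critical value × s / √n. Core PHP has no built-in t distribution, so this sketch (function name hypothetical) takes the critical value as a parameter. The default of 1.96 is the normal-approximation value for 95% confidence and suits large samples only. Small samples should instead use a t-table value with n - 1 degrees of freedom. The sleep-hours data below is made up for illustration.

```php
<?php
// Hypothetical helper: confidence interval for a population mean,
// mean ± critical * s / sqrt(n), with s the sample standard deviation.
// Default critical value 1.96 = normal approximation for 95% confidence.
function meanConfidenceInterval(array $x, float $critical = 1.96): array
{
    $n = count($x);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two values');
    }
    $mean = array_sum($x) / $n;
    $ss = 0.0;
    foreach ($x as $v) {
        $ss += ($v - $mean) ** 2;
    }
    $sd = sqrt($ss / ($n - 1));
    $margin = $critical * $sd / sqrt($n);
    return [
        'mean'   => $mean,
        'margin' => $margin,
        'lower'  => $mean - $margin,
        'upper'  => $mean + $margin,
    ];
}

// Made-up hours of sleep for 10 respondents.
$sleep = [5, 6, 4, 5, 7, 5, 4, 6, 5, 3];
// 2.262 is the two-sided 95% t value for 9 degrees of freedom.
$ci = meanConfidenceInterval($sleep, 2.262);
printf("%.2f hours ± %.2f hours (95%% CI: %.2f to %.2f)\n",
    $ci['mean'], $ci['margin'], $ci['lower'], $ci['upper']);
```

With these ten made-up values the script prints `5.00 hours ± 0.83 hours (95% CI: 4.17 to 5.83)`, the same "5 hours ± margin" form described above.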