Implementing Bayesian Inference Using PHP: Part 2
Computing the MLE
While the first article in this series discussed building intelligent Web applications through conditional probability, this Bayesian inference article examines how you can use Bayesian methods to solve parameter estimation problems. Relevant concepts are explained in the context of Web survey analysis using PHP and JPGraph. (This intermediate-level article was first published by IBM developerWorks on April 12, 2004 at http://www.ibm.com/developerWorks).
In the context of surveys with questions having only binary response options, you can model the responses as a binomial random variable: each response can take on only one of two values. Given this probability distribution model, one of the parameters you want to estimate from your survey data is the probability of success for a given question, denoted p, where success is defined as a participant giving a 1-coded response. The letter q denotes the probability of "failure" (a 0-coded response) and is equal to 1 - p.
To see how to compute an MLE of p (the probability of responding "yes"), imagine that you have the following survey data on which to base your estimate:
Table 3. Survey data used as the basis for the estimate of p

participant    q1
1              0
2              1
3              0
4              0
5              0
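To make the observed results concrete, the Table 3 responses can be restated as PHP data. This short sketch is not from the original article; the variable names $responses, $k, and $n are illustrative only:

<?php
# Question 1 responses from Table 3, one entry per participant
# (1 = "yes"/success, 0 = "no"/failure).
$responses = array(0, 1, 0, 0, 0);

$n = count($responses);       # number of participants (5)
$k = array_sum($responses);   # number of 1-coded responses (1)

printf("k/n = %.2f\n", $k / $n);   # prints "k/n = 0.20"
?>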
To estimate p for question 1 using maximum likelihood techniques, you need to try out various values of p to see which one maximizes the conditional probability of the observed results R:
P(R | p_i)
The results R can be summarized as the proportion k/n of successes k observed among n sample items. From the table above, one in five (or 20 percent) of the participants responded with a 1-coded response. Therefore, to compute the MLE of p, you try out various values of p and see which one maximizes the conditional probability of k/n:
MLE = max( P(k/n | p_i) )
Given that the distribution of question scores can be modeled as a binomial random variable, you can use the binomial distribution function to calculate the probability of the observed results. The binomial distribution function returns the likelihood of an event occurring $k times in $n attempts, in which the probability of success on a single attempt is $p:
$likelihood = binomial($n, $k, $p);
The equation for computing the binomial probability of a particular result k/n given a particular value of p is:
P(k/n | p_i) = nCk * p^k * (1 - p)^(n - k)
A binomial probability is a product of three terms:
The number of ways of selecting k items from n items: nCk.
The probability of success raised to an exponent equal to the number of success events involved in the outcome: p^k.
The probability of failure raised to an exponent equal to the number of failure events involved in the outcome: (1 - p)^(n - k).
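For example, with the Table 3 data (n = 5, k = 1) and a trial value of p = 0.2, the probability of the observed results is:

P(1/5 | 0.2) = 5C1 * (0.2)^1 * (0.8)^4 = 5 * 0.2 * 0.4096 = 0.4096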
The code for computing a binomial probability looks like this:
Listing 1. Computing a binomial probability
<?php

# Probability functions ported from Mastering Algorithms
# With Perl by Macdonald, Orwant, and Hietaniemi

# choose($n, $k) is the number of ways to choose $k
# elements from a set of $n elements, when the order
# of selection is irrelevant.

function binomial($n, $k, $p) {
    if ($p == 0) {
        if ($k == 0) return 1;
        else return 0;
    }
    if ($p == 1) {
        if ($k == $n) return 1;
        else return 0;
    }
    return choose($n, $k) * pow($p, $k) * pow(1 - $p, $n - $k);
}

?>
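Listing 1 calls a choose() function whose body is not reproduced in this excerpt. The sketch below is one possible implementation that matches the documented signature; the iterative body is an assumption, not necessarily the code used in the original port:

<?php
# choose($n, $k): the number of ways to choose $k elements
# from a set of $n elements when the order of selection is
# irrelevant (the binomial coefficient "n choose k").
function choose($n, $k) {
    if ($k < 0 || $k > $n) return 0;
    $result = 1;
    for ($i = 1; $i <= $k; $i++) {
        # Multiply and divide incrementally so intermediate
        # values stay small for modest $n.
        $result = $result * ($n - $k + $i) / $i;
    }
    return $result;
}
?>

With this helper in place, binomial(5, 1, 0.2) evaluates to 5 * 0.2 * 0.8^4 = 0.4096, matching the hand calculation above.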
So to determine the value of $p that maximizes the probability of the observed results, you can simply construct a loop that keeps the values of $n and $k fixed on each iteration but varies the probability of success $p for a particular trial (as in the following listing):
Listing 2. Constructing a loop to determine the value of $p