HomePHP Page 3 - Implement Bayesian inference using PHP, Part 1
Learning from experience - PHP
Have you ever wanted to build an intelligent Web application? Paul Meagher shows how to do it using conditional probability. (This intermediate-level article was first published by IBM developerWorks, March 16, 2004, at http://www.ibm.com/developerWorks).
To appreciate how the getConditionalProbabiltity function might be used in practice, consider a doctor confronted with the problem of determining whether a patient has cancer given that the patient tested positive on some cancer test. The test could be something as simple as a "yes" or "no" answer to a question (such as, were you ever exposed to high levels of radiation?) or it could be the result of a physical examination of the patient.
To compute the conditional probability of cancer given a positive test result, the doctor might tally the number of past cases where cancer and a positive test result occurred together and divide by the overall number of positive test results. The following code computes this probability based on a total of four past cases where this co-variation information was collected -- perhaps from the doctor's personal experiences with this particular cancer test.
Listing 2. Computing a conditional probability using getConditionalProbabiltity
/** * The elements of the $Data array use this coding convention: * * +cancer - patient has cancer * -cancer - patient does not have cancer * +test - patient tested positive on cancer test * -test - patient tested negative on cancer test */
// compute the conditional probability of having cancer given 1) // a positive test and 2) a sample of covariation data $probability = getConditionalProbabilty($A, $B, $Data);
echo "P($A|$B) = $probability";
// P(+cancer|+test) = 0.66666666666667
As you can see, the probability of having cancer given:
A positive test result
The data collected to date is estimated at 67 percent. In other words, in the next 100 cases where a patient tests positive, the best point estimate is that in 67 of those cases, the patient will actually have cancer. The doctor will need to weight this probability along with other information to arrive at a final diagnosis if one is warranted.
I can summarize what has been demonstrated here in more radical terms as follows:
An agent that derives a conditional probability estimate using the enumeration method appears to learn from experience and will provide an optimal estimate of the true conditional probability if it has enough representative data to draw upon.
If I replace the hypothetical doctor with a software agent implementing the enumeration algorithm above and being fed a steady diet of the case data, I might expect the agent's conditional probability estimates to become increasingly more reliable and accurate. I might say that such an agent is capable of "learning from experience."
If this is so, perhaps I want to ask what the relationship is between this simple enumeration technique for computing a conditional probability and more legitimate examples of "learning from experience," such as the semi-automated classification of spam using Bayes methods. In the next section, I will show a simple spam filter can be constructed using the enumerative power of a database.