Implement Bayesian inference using PHP, Part 1

Have you ever wanted to build an intelligent Web application? Paul Meagher shows how to do it using conditional probability. (This intermediate-level article was first published by IBM developerWorks, March 16, 2004, at http://www.ibm.com/developerWorks).




Conditional probability, the probability of observing one event given that another event has been observed, is a potentially important factor in designing intelligent Web applications. Paul Meagher introduces Bayesian inference by discussing the basic mathematical concepts involved and demonstrating how to implement the underlying conditional probability calculations using PHP. In this article, the author discusses how Bayesian inference can be used to build an online PHP-based wizard that guides a user through the process of making a medical diagnosis. This three-part series features interesting applications designed to help you appreciate the power and potential of Bayesian inference concepts.

If you examine current artificial intelligence, statistics, and data-mining journals, books, and conferences, you will notice that Bayesian inference techniques are being applied to increasingly complex problems in a growing number of application areas. Many Web developers, however, lack a constructive understanding of Bayesian inference and this prevents them from utilizing these techniques in their own software-development practice. This article (the first of three) aims to remedy that situation.

For a Web developer, a constructive understanding of Bayesian inference means that you are able to see how Bayesian inference can be applied to the Web development problems you are facing. To achieve this level of understanding, a developer cannot be content to simply study a few examples of how the relevant math formulas work; he must also:

  1. See how these math formulas might be distilled into software routines

  2. Determine how these mathematical software routines can be integrated into interesting Web applications

In mathematics, we are usually concerned with declarative (what is) descriptions, whereas in computer science we are usually concerned with imperative (how to) descriptions. — Abelson, Sussman, and Sussman in Structure and Interpretation of Computer Programs, p.22

Bayesian inference techniques have been widely used in developing various types of Artificial Intelligence (AI) systems (for instance, for text retrieval, classification, medical diagnosis, data mining, troubleshooting, and more), so this article series will be of interest to anyone interested in building intelligent Web applications.

In this article, I will introduce some of the basic mathematical concepts and notations you need to know in order to appreciate Bayesian inference. I will also demonstrate how you can implement conditional probability and Bayes theorem calculations using PHP, and how these calculations can be used to build an online medical diagnosis wizard.

In the next two articles, I will also explore the application of Bayesian inference to the design and analysis of Web surveys (see Resources). This article lays some of the groundwork necessary for understanding this advanced application of Bayesian inference concepts.

The first and most important piece of groundwork to mention concerns the concept of conditional probability.

{mospagebreak title=Conditional probability}

A conditional probability refers to the probability of observing an event A given that you have observed a separate event B. The mathematical shorthand for expressing this idea is:

P(A | B)

Imagine that A refers to “customer buys product A” and B refers to “customer buys product B”. P(A | B) would then read as the “probability that a customer will buy product A given that they have bought product B.” If A tends to occur when B occurs, then knowing that B has occurred allows you to assign a higher probability to A’s occurrence than in a situation in which you did not know that B occurred.

More generally, if A and B systematically co-vary in some way, then P(A | B) will not be equal to P(A). Conversely, if A and B are independent events, then P(A | B) would be expected to equal P(A).

The need to compute a conditional probability thus arises any time you think the occurrence of some event has a bearing on the probability of another event’s occurring.

The most basic and intuitive method for computing P(A | B) is the set enumeration method. Using this method, P(A | B) can be computed by counting the number of times A and B occur together {A & B} and dividing by the number of times B occurs {B}:

P(A | B) = {A & B} / {B}

If you observe that 12 customers to date bought product B and of those 12, 10 also bought product A, then P(A | B) would be estimated at 10/12 or 0.833. In other words, the probability of a customer buying product A given that they have purchased product B can be estimated at 83 percent by using a method that involves enumerating relative frequencies of A and B events from the data gathered to date.

You can compute a conditional probability using the set enumeration method with the following PHP code:

Listing 1. Computing conditional probability using set enumeration

<?php

/**
* Returns conditional probability of $A given $B and $Data.
* $Data is an indexed array.  Each element of the $Data array
* consists of an A measurement and B measurement on a sample
* item.
*/
function getConditionalProbability($A, $B, $Data) {
  $NumAB   = 0;
  $NumB    = 0;
  $NumData = count($Data);
  for ($i=0; $i < $NumData; $i++) {
    if (in_array($B, $Data[$i])) {
      $NumB++;
      if (in_array($A, $Data[$i])) {
        $NumAB++;
      }
    }
  }
  // P(A | B) is undefined when B was never observed
  if ($NumB == 0) {
    return null;
  }
  return $NumAB / $NumB;
}

?>

{mospagebreak title=Learning from experience}

To appreciate how the getConditionalProbability function might be used in practice, consider a doctor confronted with the problem of determining whether a patient has cancer given that the patient tested positive on some cancer test. The test could be something as simple as a “yes” or “no” answer to a question (such as, were you ever exposed to high levels of radiation?) or it could be the result of a physical examination of the patient.

To compute the conditional probability of cancer given a positive test result, the doctor might tally the number of past cases where cancer and a positive test result occurred together and divide by the overall number of positive test results. The following code computes this probability based on a total of four past cases where this co-variation information was collected, perhaps from the doctor’s personal experiences with this particular cancer test.

Listing 2. Computing a conditional probability using getConditionalProbability

<?php

require "getConditionalProbability.php";

/**
* The elements of the $Data array use this coding convention:
*
* +cancer - patient has cancer
* -cancer - patient does not have cancer
* +test   - patient tested positive on cancer test
* -test   - patient tested negative on cancer test
*/

$Data[0] = array("+cancer", "+test");
$Data[1] = array("-cancer", "-test");
$Data[2] = array("+cancer", "+test");
$Data[3] = array("-cancer", "+test");

// specify query variable $A and conditioning variable $B
$A = "+cancer";
$B = "+test";

// compute the conditional probability of having cancer given 1)
// a positive test and 2) a sample of covariation data
$probability = getConditionalProbability($A, $B, $Data);

echo "P($A|$B) = $probability";

// P(+cancer|+test) = 0.66666666666667

?>

As you can see, the probability of having cancer given (1) a positive test result and (2) the data collected to date is estimated at 67 percent. In other words, in the next 100 cases where a patient tests positive, the best point estimate is that in 67 of those cases, the patient will actually have cancer. The doctor will need to weigh this probability along with other information to arrive at a final diagnosis if one is warranted.

I can summarize what has been demonstrated here in more radical terms as follows:

An agent that derives a conditional probability estimate using the enumeration method appears to learn from experience and will provide an optimal estimate of the true conditional probability if it has enough representative data to draw upon.

If I replace the hypothetical doctor with a software agent implementing the enumeration algorithm above and being fed a steady diet of the case data, I might expect the agent’s conditional probability estimates to become increasingly reliable and accurate. I might say that such an agent is capable of “learning from experience.”
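To make the idea of an agent fed a steady diet of case data concrete, here is a minimal sketch that updates its estimate one case at a time. The ConditionalTally class is a hypothetical helper written for this illustration (it is not part of the article's support files), and the four cases are the same ones used in Listing 2.

```php
<?php
// Hypothetical helper: keeps running tallies so the estimate of
// P(A | B) can be refreshed after every new case.
class ConditionalTally {
    private $a;
    private $b;
    private $numB  = 0; // cases in which B was observed
    private $numAB = 0; // cases in which both A and B were observed

    public function __construct($a, $b) {
        $this->a = $a;
        $this->b = $b;
    }

    // Fold one new case (an array of labels) into the tallies.
    public function observe(array $case) {
        if (in_array($this->b, $case, true)) {
            $this->numB++;
            if (in_array($this->a, $case, true)) {
                $this->numAB++;
            }
        }
    }

    // Current estimate of P(A | B), or null before any B case is seen.
    public function estimate() {
        return $this->numB > 0 ? $this->numAB / $this->numB : null;
    }
}

$tally = new ConditionalTally("+cancer", "+test");
$cases = array(
    array("+cancer", "+test"),
    array("-cancer", "-test"),
    array("+cancer", "+test"),
    array("-cancer", "+test"),
);

$estimates = array();
foreach ($cases as $case) {
    $tally->observe($case);
    $estimates[] = $tally->estimate();
}
// $estimates holds the estimate after each case: 1, 1, 1, then 2/3.
```

With so few cases the estimate swings widely, which is one reason the agent needs "enough representative data" before its output can be trusted.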

If this is so, perhaps I want to ask what the relationship is between this simple enumeration technique for computing a conditional probability and more legitimate examples of “learning from experience,” such as the semi-automated classification of spam using Bayes methods. In the next section, I will show how a simple spam filter can be constructed using the enumerative power of a database.

{mospagebreak title=Conditional probability and SQL}

P(A | B) can be mapped onto database-query operations. For example, the probability of cancer given a positive test result, P(+cancer | +test), can be obtained by issuing the following SQL query and then tallying the cancer_status values in the result set:

SELECT cancer_status FROM Data WHERE test_status='+test'

If I gather information about how several boolean-valued tests co-vary with a boolean-valued diagnosis (like that of cancer or not cancer), then I can perform slightly more complex queries to study how diagnostically useful other factors are in determining whether a patient has cancer, such as in the following:

SELECT cancer_status
FROM Data
WHERE genetic_status='+'
AND age_status='+'
AND biopsy_status='+'

In the case of detecting e-mail spam, I might be interested in computing P(+spam | title_word='viagra' AND title_word='free'), which could be viewed as a directive to issue the following SQL query (the % wildcards make LIKE match the word anywhere in the title):

SELECT spam_status FROM Emails WHERE email_title LIKE '%viagra%'
     AND email_title LIKE '%free%'

After enumerating the number of e-mails that are spam and have “viagra” and “free” in the title (like so):

count_emails(spam_status='+spam' AND email_title LIKE '%viagra%'
     AND email_title LIKE '%free%')

and dividing by the overall number of e-mails with the words “viagra” and “free” in the title:

count_emails(email_title LIKE '%viagra%' AND email_title LIKE '%free%')

I might arrive at the conclusion that the appearance of these words in the title strongly and specifically co-varies with the message being spam (if, say, all 18 of the 18 matching e-mails were spam, the estimate would be 18/18 = 100 percent) and this rule might be used to automatically filter such messages.

In Bayes spam filtering, you need to initially train the software on which e-mails are spam and which are not. One can imagine storing spam_status information with each e-mail record (for example, email_id, spam_status, email_title, and email_message) and running the previous queries and counts on this data to decide whether to forward a new e-mail to your inbox.
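The counting logic behind those queries can be sketched in plain PHP without a database. The $emails records and the spamProbabilityGivenWords function below are assumptions made up for this illustration; a real filter would read its training records from the Emails table instead.

```php
<?php
// Hypothetical training data: each record stores a title and a spam label.
$emails = array(
    array("title" => "Free Viagra today",    "spam" => true),
    array("title" => "viagra - free trial",  "spam" => true),
    array("title" => "Free lunch on Friday", "spam" => false),
    array("title" => "Meeting notes",        "spam" => false),
);

// Returns the estimated P(+spam | every word in $words occurs in the title),
// or null when no stored title contains all of the words.
function spamProbabilityGivenWords(array $emails, array $words) {
    $numMatch = 0;
    $numSpamAndMatch = 0;
    foreach ($emails as $email) {
        $matchesAll = true;
        foreach ($words as $word) {
            // stripos() performs a case-insensitive substring search
            if (stripos($email["title"], $word) === false) {
                $matchesAll = false;
                break;
            }
        }
        if ($matchesAll) {
            $numMatch++;
            if ($email["spam"]) {
                $numSpamAndMatch++;
            }
        }
    }
    return $numMatch > 0 ? $numSpamAndMatch / $numMatch : null;
}

// Two titles contain both words and both are spam, so the estimate is 2/2.
$pSpam = spamProbabilityGivenWords($emails, array("viagra", "free"));
```

Returning null when nothing matches keeps the caller from treating "no data" as "zero probability of spam."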

{mospagebreak title=Frequency versus probability format}

The getConditionalProbability function you’ve developed operates on counts and frequencies rather than on probabilities. In reading the literature on Bayesian reasoning, you will notice that the enumeration method for computing P(A | B) is only briefly discussed. Most authors quickly move on to describing how P(A | B) can be formulated using terms denoting probability values rather than frequency counts. For example, you can recast the formula for computing P(A | B) using such probability terms as:

P(A | B) = P(A & B) / P(B)

The advantage of recasting the formula using terms denoting probabilities instead of frequency counts arises because in practice, you often don’t have access to a data set you can use to derive conditional probability estimates by enumerating cases. Instead, you often have access to higher-level summary information from past studies in the form of percentages and probabilities. The challenge then becomes finding a way to use these probability estimates to compute the conditional probabilities you are interested in. Recasting the conditional probability formula in terms of probabilities allows you to make inferences based on related probability information that is more readily accessible.

The enumeration method might still be regarded as the most basic and intuitive method for computing a conditional probability. In Thomas Bayes’ “Essay on the Doctrine of Chances,” he uses enumeration to arrive at the conclusion that P( 2nd Event = b | 1st Event = a ) is equal to [P / N] / [ a / N], which is equal to P / a, which one can also denote as {a & b} / {a}:

Figure 1. Graphical representation of relations

Another reason why it is important to be aware of frequency versus probability format issues is because it has been demonstrated by Gerd Gigerenzer (and others) that people are better at reasoning in accordance with prescriptive Bayesian rules of inference when background information is presented in terms of frequencies of cases (1 in 10 cases) rather than probabilities (10 percent probability). A practical application of this research is that medical students are now being taught to communicate risk information in terms of frequencies of cases instead of probabilities, making it easier for patients to make better informed judgements about what actions are warranted given the test results.

Joint probability

The most basic method for computing a conditional probability using a probability format is:

P(A | B) = P(A & B) / P(B)

This probability format is identical to the frequency format, except for the probability operator P( ) surrounding the numerator and denominator terms. The P(A & B) term denotes the joint probability of A and B occurring together. To understand how the joint probability P(A & B) can be computed from cross-tabulated data, consider the following hypothetical data (taken from pp. 147-48 of Grinstead and Snell’s online textbook):

          -Smokes   +Smokes   Totals
-Cancer      40        10       50
+Cancer       7         3       10
Totals       47        13       60

To convert this table of frequencies to a table of probabilities, you divide each cell frequency by the total frequency (60). Note that dividing by the total frequency also ensures that the Cancer x Smokes cell probabilities sum to 1 and permits you to refer to the four inner cells of the table below (everything except the Totals row and column) as the joint probability distribution of Cancer and Smoking.

          -Smokes   +Smokes   Totals
-Cancer    40/60     10/60    50/60
+Cancer     7/60      3/60    10/60
Totals     47/60     13/60    60/60

To compute the probability of cancer given that a person smokes P(+Cancer | +Smokes), you can simply substitute the values from this table into the above formula as follows:

P(+Cancer | +Smokes) = ( 3 / 60 ) / ( 13 / 60 ) = 0.05 / 0.217 = 0.23

Note that you could have derived this value from the table of frequencies as well:

P(+Cancer | +Smokes) = 3 / 13 = 0.23

How do you interpret this result? Using the recommended approach of communicating risk in terms of frequencies, you might say that of the next 100 smokers you encounter, you can expect 23 of them to experience cancer in their lifetime. What is the probability of getting cancer if you do not smoke?

P(+Cancer | -Smokes) = ( 7 / 60 ) / ( 47 / 60 ) = 0.117 / 0.783 = 0.15

So it appears that you are more likely to get cancer if you smoke than if you do not smoke, even though the tallies appearing in the table might not have initially given you that impression. It is interesting to speculate on what the true conditional probabilities might be for various types of cancer given various criteria for defining someone as a smoker.

A “cohort” research methodology would also require you to equate smokers and non-smokers on other variables like age, gender, and weight so that smoking, and not these other co-variates, can be isolated as the root cause of the different cancer rates.

To summarize, you can compute a conditional probability P(+Cancer | +Smokes) from joint distribution data by dividing the relevant joint probability P(+Cancer & +Smokes) by the relevant marginal probability P(+Smokes). As you might imagine, it is often easier and more feasible to derive estimates of a conditional probability from summary tables like this, rather than expecting to apply more data-intensive enumeration methods.
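The whole calculation, from cell frequencies to joint probabilities to a conditional probability, can be sketched in a few lines of PHP. The function names toJointDistribution and conditionalGivenColumn are assumptions chosen for this illustration; the cell counts are the ones from the Cancer x Smokes table.

```php
<?php
// Cell frequencies from the Cancer x Smokes table (totals are derived).
$counts = array(
    "-Cancer" => array("-Smokes" => 40, "+Smokes" => 10),
    "+Cancer" => array("-Smokes" => 7,  "+Smokes" => 3),
);

// Divide every cell by the grand total to get joint probabilities.
function toJointDistribution(array $counts) {
    $total = 0;
    foreach ($counts as $row) {
        $total += array_sum($row);
    }
    $joint = array();
    foreach ($counts as $rowLabel => $row) {
        foreach ($row as $colLabel => $count) {
            $joint[$rowLabel][$colLabel] = $count / $total;
        }
    }
    return $joint;
}

// P(row | column) = P(row & column) / P(column). The marginal P(column)
// is the sum of that column's joint probabilities over all rows.
function conditionalGivenColumn(array $joint, $rowLabel, $colLabel) {
    $marginal = array_sum(array_column($joint, $colLabel));
    if ($marginal == 0) {
        return null; // the conditioning event never occurs
    }
    return $joint[$rowLabel][$colLabel] / $marginal;
}

$joint = toJointDistribution($counts);
$pCancerGivenSmokes    = conditionalGivenColumn($joint, "+Cancer", "+Smokes");
$pCancerGivenNonSmoker = conditionalGivenColumn($joint, "+Cancer", "-Smokes");

printf("P(+Cancer | +Smokes) = %.2f\n", $pCancerGivenSmokes);    // 0.23
printf("P(+Cancer | -Smokes) = %.2f\n", $pCancerGivenNonSmoker); // 0.15
```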

{mospagebreak title=Deriving Bayes Theorem}

You are now in a position to discuss the canonical formula for Bayes inference. The derivation of Bayes formula follows naturally from the definition of conditional probability using the probability format:

P(A | B) = P(A & B) / P(B)

Using some algebra, this equation can be rewritten as:

P(A & B) = P(A | B) P(B)

The same joint probability can also be computed using A as the conditioning variable:

P(A & B) = P(B | A) P(A)

Given this equivalence, you can write:

P(A | B) P(B) = P(B | A) P(A)

Simplifying, you arrive at Bayes theorem:

P(A | B) = P(B | A) P(A) / P(B)

Notice that this formula for computing a conditional probability is similar to the original formula with the exception that the joint probability P(A & B) that used to appear in the numerator has been replaced with the equivalent expression P(B | A) P(A).
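As a quick check on the derivation, the following sketch applies Bayes theorem to the smoking table from the previous section and recovers the same answer that direct counting gave (3 / 13). The bayesTheorem function name is an assumption made for this example.

```php
<?php
// P(A | B) = P(B | A) P(A) / P(B)
function bayesTheorem($pBgivenA, $pA, $pB) {
    return $pBgivenA * $pA / $pB;
}

// Values read off the Cancer x Smokes table:
$pSmokesGivenCancer = 3 / 10;   // 3 of the 10 cancer cases smoke
$pCancer            = 10 / 60;  // marginal probability of cancer
$pSmokes            = 13 / 60;  // marginal probability of smoking

$pCancerGivenSmokes = bayesTheorem($pSmokesGivenCancer, $pCancer, $pSmokes);

printf("P(+Cancer | +Smokes) = %.2f\n", $pCancerGivenSmokes); // 0.23
```

The value of the theorem is that it lets you reverse the direction of conditioning: here you start from P(+Smokes | +Cancer) and end up with P(+Cancer | +Smokes).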

Computing the full posterior

Bayesian inference is often put forth as a prescriptive framework for hypothesis testing. Using this framework, it is standard to replace P(A | B) with P(H | E) where H stands for hypothesis and E stands for evidence. Bayes inference rule then looks like this:

P(H | E) = P(E | H) P(H) / P(E)

In words, the formula says that the posterior probability of a hypothesis given the evidence P(H | E) is equal to the likelihood of the evidence given the hypothesis P(E | H) multiplied by the prior probability of the hypothesis P(H), divided by the probability of the evidence P(E). Because P(E) is the same for every hypothesis, it serves only a normalizing role (in other words, it ensures that the posterior probabilities of the competing hypotheses sum to 1.0). You can thus mentally simplify the equation to a statement of proportionality:

P(H | E) ∝ P(E | H) P(H)

The prior distribution P(H) in this equation can be represented in PHP as an indexed array of probability values (shown here as the property declaration used in the Bayes class later in this article):

var $priors = array();

The $priors array is expected to contain a list of numbers denoting the prior probability of each hypothesis. In the context of medical diagnosis, the $priors array might contain the prevalence rates of each hypothesized disease in the population. Alternatively, the array might contain a medical specialist’s best guess as to the prior probability of each disease under consideration given everything they know about each disease and current conditions.

The exact nature of the full posterior probability computation is made clearer by seeing that the posterior and likelihood terms appear in a PHP implementation as two-dimensional arrays (the closest you can currently get to a matrix datatype in PHP).

Listing 3. The posterior and likelihood terms appear in a PHP implementation as 2D arrays
<?php

// $m denotes the number of hypotheses
// $n denotes the number of evidence patterns

$m = 3;
$n = 4;

$priors      = getPriorDistribution();
$likelihoods = getlikelihoodDistribution();
$evidence    = getEvidenceDistribution();

for($e=0; $e < $n; $e++) {
  for ($h=0; $h < $m; $h++) {
    $posterior[$e][$h] = $priors[$h]
       * $likelihoods[$h][$e] / $evidence[$e];
  }
}

?>

For now, ignore the issue of how the $priors, $likelihoods, and $evidence distribution values are computed from raw data. You can posit magical get functions to obtain these values. The previous code shows how the values of the posterior probability matrix are computed by looping over the evidence items and the hypothesis alternatives.

The order of the index elements $e and $h in the posterior matrix might be puzzling until you realize that in PHP the evidence key should appear first in the posterior matrix because it is a lookup key. If you access the posterior matrix using an evidence key $e, it will return an array containing the probability of each hypothesis under consideration (such as, +cancer, -cancer) given the particular evidence key you have supplied (like +test). The code above computes the full posterior distribution over all evidence keys. To compute a row of the full posterior distribution for a particular evidence key, you would use this formula:

Figure 2. Formula to compute posterior distribution
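Since the figure is not reproduced here, the sketch below assumes the standard normalized form of the rule for a single evidence key e: P(h | e) = P(e | h) P(h) / (sum over all hypotheses h' of P(e | h') P(h')). The posteriorRow function and the numbers fed to it are hypothetical and chosen only for illustration; the code also assumes P(e) is greater than zero.

```php
<?php
// Hypothetical inputs for a two-hypothesis, two-evidence-key problem.
$priors = array(0.01, 0.99);   // P(+cancer), P(-cancer)
$likelihoods = array(          // $likelihoods[$h][$e] = P(e | h)
    array(0.90, 0.10),         // +cancer: P(+test), P(-test)
    array(0.05, 0.95),         // -cancer: P(+test), P(-test)
);

// Returns P(h | e) for every hypothesis h, for one evidence key $e.
function posteriorRow(array $priors, array $likelihoods, $e) {
    $products = array();
    foreach ($priors as $h => $prior) {
        $products[$h] = $prior * $likelihoods[$h][$e];
    }
    $evidence = array_sum($products); // P(e), the normalizing constant
    $row = array();
    foreach ($products as $h => $product) {
        $row[$h] = $product / $evidence;
    }
    return $row;
}

$row = posteriorRow($priors, $likelihoods, 0); // evidence key 0 is +test

printf("P(+cancer | +test) = %.3f\n", $row[0]);
printf("P(-cancer | +test) = %.3f\n", $row[1]);
```

Because the denominator is just the sum of the numerators, the entries of the returned row always add up to 1.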

{mospagebreak title=Medical diagnosis wizard}

To make it clear how Bayes theorem works, you will develop an online medical diagnosis wizard using PHP. This wizard could also have been called a calculator except that it takes four input steps to supply the prerequisite information and then a fifth step to review the result.

The wizard works by asking the user to supply the various pieces of information critical to computing the full posterior probability. The user can examine the posterior distribution to determine which disease hypothesis enjoys the highest probability based on:

  1. The diagnostic test information
  2. The sample data used to estimate the prior and likelihood distributions

Bayes Wizard: Step 1

Step 1 in using Bayes theorem to make a medical diagnosis involves specifying the number of disease alternatives that you will examine along with the number of symptoms or evidence keys. In the generic example you will look at, you will evaluate three disease alternatives based on evidence from two diagnostic tests. Each diagnostic test can only produce a positive or negative result. This means that the total number of symptom combinations, or evidence keys, you can observe is four (++, +-, -+, or --).

Figure 3. Form to enter disease hypotheses and symptom possibilities

Bayes Wizard: Step 2

Step 2 involves entering the disease and symptom labels. In this case, you are just going to enter d1, d2, and d3 for the disease labels and ++, +-, -+, and -- for the symptom labels. The two symbols used for symptom labels signify whether the results of the two diagnostic tests came out positive or negative.

Figure 4. Form to enter disease and symptom labels

Bayes Wizard: Step 3

Step 3 involves entering the prior probabilities for each disease. You will use the data table below to determine the prior probabilities to enter for Step 3 and the likelihoods to enter for Step 4 (this data table originally appeared in Introduction to Probability). Using this example allows you to confirm that the final result you obtain from the wizard agrees with the results you can find in this book.

Figure 5. Joint frequency of diseases and symptoms

The prior probability of each disease refers to the number of patients diagnosed with each disease divided by the total number of diagnosed cases in this sample. The relevant prior probabilities for each disease are entered in the following:

Figure 6. Form to enter disease priors
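Deriving priors from a data table amounts to dividing each disease total by the grand total. In the sketch below, the disease totals are assumptions recalled from the textbook example, since the data table itself is not reproduced in this text; check them against the source before relying on them. The priorsFromCounts function name is likewise made up for this illustration.

```php
<?php
// Assumed number of diagnosed patients per disease (not confirmed by
// this text; treat as illustrative).
$diseaseCounts = array("d1" => 3215, "d2" => 2125, "d3" => 4660);

// Prior for each disease = its count / total number of diagnosed cases.
function priorsFromCounts(array $counts) {
    $total = array_sum($counts);
    $priors = array();
    foreach ($counts as $label => $count) {
        $priors[$label] = $count / $total;
    }
    return $priors;
}

$priors = priorsFromCounts($diseaseCounts);

foreach ($priors as $label => $p) {
    printf("P(%s) = %.3f\n", $label, $p);
}
```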

You do not have to rely upon a data table such as the previous one to derive the prior probability estimates. In some cases, you can derive prior probabilities by using common-sense reasoning: The prior probability of a fair two-sided coin coming up heads is 0.5. The prior probability of selecting a queen of hearts from a randomized deck of cards is 1/52.

You also commonly run into situations where you initially have no good estimates of what the prior probability of each hypothesis might be. In such cases, it is common to posit noninformative priors. If you have four hypothesis alternatives, then the noninformative prior distribution would be 1/4 or 0.25 for each hypothesis. You might note here that Bayesians often criticize the use of a null hypothesis in significance testing because it amounts to assuming noninformative priors in cases where positing informative priors might be more theoretically or empirically justified.
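A noninformative prior of this kind is simple to generate in code. The noninformativePriors function below is a hypothetical helper, shown only as a sketch.

```php
<?php
// Assign every hypothesis the same prior probability, 1 / k.
function noninformativePriors($numHypotheses) {
    return array_fill(0, $numHypotheses, 1 / $numHypotheses);
}

$priors = noninformativePriors(4); // four hypotheses, 0.25 each
```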

A final way to derive estimates of the prior probability of each hypothesis P(Hi) is through a subjective estimate of what those probabilities might be given everything you have learned about the way the world works up to that point P( H=h | Everything you know). You will often find Bayesian inference sharing the same bed with a subjective view of probability in which the probability of a proposition is equated with one’s subjective degree of belief in the proposition.

What is important in this discussion is that Bayesian inference is a flexible technique that allows you to estimate prior probabilities using objective methods, common-sense logical methods, and subjective methods. When using subjective methods, you must still be willing to defend your prior probability estimates. You may use objective data to help set and justify your subjective estimates, which means that Bayesian inference is not necessarily in conflict with more objectively oriented approaches to statistical inference.

Bayes Wizard: Step 4

The data table provides you with information you can use to compute the probability of the symptoms (like test results) given the disease, also known as the likelihood distribution P(E | H).

To see how the likelihood values entered below were computed, you can unpack P(E|H) using the frequency format for computing conditional probabilities:

P(E | H) = {E & H} / {H}

This tells you that you need to divide a joint frequency count {E & H} by a marginal frequency count {H} to obtain the likelihood value for each cell in your likelihood matrix. The top left cell of your likelihood matrix P(E='++' | H='d1') can be immediately computed from the joint and marginal frequency counts appearing in the data table:

P(E='++' | H='d1') = 2110 / 3215 = 0.6563

All the likelihood values entered in Step 4 were computed in this manner.
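The same division can be applied to every cell at once. In the sketch below, the joint counts are assumptions recalled from the textbook example (the data table is not reproduced in this text, and only the 2110 count is quoted above), and likelihoodsFromCounts is a hypothetical function name. Each disease's marginal count {H} is taken to be the sum of its row.

```php
<?php
// Assumed joint frequency counts {E & H}: rows are diseases, columns
// are evidence keys. Treat these numbers as illustrative.
$jointCounts = array(
    "d1" => array("++" => 2110, "+-" => 301,  "-+" => 704,  "--" => 100),
    "d2" => array("++" => 396,  "+-" => 132,  "-+" => 1187, "--" => 410),
    "d3" => array("++" => 510,  "+-" => 3568, "-+" => 73,   "--" => 509),
);

// P(E | H) = {E & H} / {H} for every cell of the table.
function likelihoodsFromCounts(array $jointCounts) {
    $likelihoods = array();
    foreach ($jointCounts as $h => $row) {
        $marginal = array_sum($row); // {H}: all cases with disease $h
        foreach ($row as $e => $count) {
            $likelihoods[$h][$e] = $count / $marginal;
        }
    }
    return $likelihoods;
}

$likelihoods = likelihoodsFromCounts($jointCounts);

printf("P(++ | d1) = %.4f\n", $likelihoods["d1"]["++"]);
```

Note that each row of the resulting matrix sums to 1, because for a given disease the four evidence keys exhaust the possible test outcomes.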

Figure 7. Form to enter likelihood of symptoms given the disease

It should be noted that many statisticians use likelihood as a system of inference instead of, or in addition to, Bayesian inference. This is because likelihoods also provide a metric one can use to evaluate the relative degree of support for several hypotheses given the data.

In the previous example, you can see that the probability of a particular evidence key varies for each hypothesis under consideration. The probability of the ++ evidence key is the greatest for the d1 hypothesis. You can assess which hypothesis is best supported by the data by:

  1. Examining the likelihood of the evidence key given each hypothesis key

  2. Selecting the hypothesis that maximizes the likelihood of the evidence key

Doing so would be an example of inference according to the principle of maximum likelihood.
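The two steps above can be sketched as a small selection routine. The likelihood values below are rounded, hypothetical numbers chosen for illustration, and maximumLikelihoodHypothesis is an assumed function name rather than part of the wizard's code.

```php
<?php
// Hypothetical likelihoods P(e | h): rows are hypotheses, columns are
// evidence keys. Each row sums to 1.
$likelihoods = array(
    "d1" => array("++" => 0.66, "+-" => 0.09, "-+" => 0.22, "--" => 0.03),
    "d2" => array("++" => 0.19, "+-" => 0.06, "-+" => 0.56, "--" => 0.19),
    "d3" => array("++" => 0.11, "+-" => 0.77, "-+" => 0.02, "--" => 0.10),
);

// Returns the label of the hypothesis under which the observed
// evidence key is most probable. Priors play no part in this choice.
function maximumLikelihoodHypothesis(array $likelihoods, $evidenceKey) {
    $best = null;
    $bestValue = -1;
    foreach ($likelihoods as $h => $row) {
        if ($row[$evidenceKey] > $bestValue) {
            $bestValue = $row[$evidenceKey];
            $best = $h;
        }
    }
    return $best;
}

$choice = maximumLikelihoodHypothesis($likelihoods, "++"); // "d1"
```

Unlike the posterior calculation, this selection can disagree with the Bayesian answer when one hypothesis has a much larger prior than the others.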

Another interesting point to note is that all the values in the above likelihood distribution sum to a value greater than 1. What this means is that the likelihood distribution is not really a probability distribution because it lacks the defining property that the distribution of values sum to 1. This summation property is not essential for the purposes of evaluating the relative support for different hypotheses. What is important for this purpose is that the “likelihood supplies a natural order of preference among the possibilities under consideration” (from R.A. Fisher’s Statistical Methods and Scientific Inference, p. 68).

You may not understand fully the concept of likelihood from this brief discussion, but I do hope that you appreciate its importance to the overall Bayes theorem calculation and its importance as the foundation for another system of inference. The likelihood system of inference is preferred by many statisticians because you don’t have to resort to the dubious practice of trying to estimate the prior probability of each hypothesis.

Maximum likelihood estimators also have many desirable mathematical properties that make them nice to work with (under standard regularity conditions they are consistent, asymptotically unbiased, and invariant under transformations of the parameter, among others). For these reasons, it is often a good idea to closely examine your likelihood distribution in addition to your posterior distribution when making inferences from your data.

Bayes Wizard: Step 5

The final step of the process involves displaying the posterior distribution of the diseases given the symptoms P(H | E):

Figure 8. Probability of each disease given symptoms

The section of the script that was used to compute and display the posterior distribution looks like this:

Listing 4. Computing and displaying the posterior distribution

<?php
include "Bayes.php";

$disease_labels = $_POST["disease_labels"];
$symptom_labels = $_POST["symptom_labels"];
$priors         = $_POST["priors"];
$likelihoods    = $_POST["likelihoods"];

$bayes = new Bayes($priors, $likelihoods);
$bayes->getPosterior();
$bayes->setRowLabels($symptom_labels);    // aka evidence labels
$bayes->setColumnLabels($disease_labels); // aka hypothesis labels
$bayes->toHTML();
?>

You begin by loading the Bayes constructor with the priors and likelihoods obtained from previous wizard steps. Using this information, you compute the posterior using the $bayes->getPosterior() method. To output the posterior distribution to the browser, you first set the row and column labels to display, then output the posterior distribution using the $bayes->toHTML() method.

{mospagebreak title=Implementing the calculation with Bayes.php}

The Bayes.php class implements the Bayes theorem calculation. The getPosterior method is where most of the mathematically interesting code resides.

Listing 5. Implementing the calculation with the Bayes.php class
<?php

/**
* Bayes
*
* Calculates posterior probabilities for m hypotheses and n evidence
* alternatives.  The code was inspired by a procedural TrueBasic version
* (Bayes.tru) bundled with Grinstead and Snell's excellent online
* textbook "Introduction to Probability".
*/
class Bayes {

  /**
  * Number of hypothesis alternatives (that is, number of rows in the
  * likelihood matrix).
  */
  var $m;

  /**
  * Number of evidence alternatives (that is, number of columns in the
  * likelihood matrix).
  */
  var $n;

  /**
  * Output labels for evidence alternatives.
  */
  var $row_labels = array();
 
  /**
  * Output labels for hypothesis alternatives.
  */ 
  var $column_labels = array();

  /**
  * Vector container for prior probabilities.
  */
  var $priors = array();

  /**
  * Matrix container for likelihood of evidence e given hypothesis h.
  */
  var $likelihoods = array();

  /**
  * Matrix container for posterior probabilities.
  */
  var $posterior = array();

  /**
  * Vector container for evidence probabilities.
  */
  var $evidence = array();

  /**
  * Initialize the Bayes algorithm by setting the priors, likelihoods
  * and dimensions of the likelihood and posterior matrices.
  */
  function __construct($priors, $likelihoods) {
    $this->priors = $priors;
    $this->likelihoods = $likelihoods;
    $this->m = count($this->likelihoods);    // num hypotheses (likelihood rows)
    $this->n = count($this->likelihoods[0]); // num evidence keys (likelihood cols)
  }
 
  /**
  * Output method for setting row labels prior to display.
  */
  function setRowLabels($row_labels) {
    $this->row_labels = $row_labels;
    return true;
  }

  /**
  * Output method for setting column labels prior to display.
  */
  function setColumnLabels($column_labels) {
    $this->column_labels = $column_labels;
    return true;
  }

  /**
  * Compute the posterior probability matrix given the priors and
  * likelihoods.
  *
  * The first set of loops computes the denominator of the canonical
  * Bayes equation. The probability appearing in the denominator
  * serves a normalizing role in the computation – it ensures that
  * posterior probabilities sum to 1.
  *
  * The second set of loops:
  *
  *   1. multiplies the prior[$h] by the likelihood[$h][$e]
  *   2. divides the result by the denominator
  *   3. assigns the result to the posterior[$e][$h] probability matrix
  */
  function getPosterior() {
    // Find probability of evidence e
    for($e=0; $e < $this->n; $e++) {
      // Start each evidence total at zero before summing over hypotheses
      $this->evidence[$e] = 0;
      for ($h=0; $h < $this->m; $h++) {
        $this->evidence[$e] += $this->priors[$h]
           * $this->likelihoods[$h][$e];
      }
    }
    // Find probability of hypothesis given evidence
    for($e=0; $e < $this->n; $e++) {
      for ($h=0; $h < $this->m; $h++) {
        $this->posterior[$e][$h] = $this->priors[$h]
           * $this->likelihoods[$h][$e] / $this->evidence[$e];
      }
    }
    return true;
  }
 
  /**
  * Output method for displaying posterior probability matrix
  */
  function toHTML($number_format="%01.3f") {
    ?>
    <table border='1' cellpadding='5' cellspacing='0'>
      <tr>
        <td> </td>
        <?php
        for ($h=0; $h < $this->m; $h++) {
          ?>
          <td align='center'>
             <b><?php echo $this->column_labels[$h] ?></b>
          </td>
          <?php
        }
        ?>
      </tr>
      <?php
      for($e=0; $e < $this->n; $e++) {
        ?>
        <tr>
          <td><b><?php echo $this->row_labels[$e] ?></b></td>
          <?php
          for ($h=0; $h < $this->m; $h++) {
            ?>
            <td align='right'>
               <?php printf($number_format, $this->posterior[$e][$h]) ?>
            </td>
            <?php
          }
          ?>
        </tr>
        <?php
      }
      ?>
    </table>
    <?php
  }
}
?>
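
In equation form, the two sets of loops in getPosterior() can be read as follows. The notation is an assumption made here for illustration: P(h) stands for $priors[$h], P(e | h) for $likelihoods[$h][$e], P(e) for $evidence[$e], and P(h | e) for $posterior[$e][$h].

```latex
% First set of loops: the normalizing denominator for each evidence alternative e
P(e) = \sum_{h} P(h)\, P(e \mid h)

% Second set of loops: the posterior for each hypothesis h given evidence e
P(h \mid e) = \frac{P(h)\, P(e \mid h)}{P(e)}
```

Because every posterior for a given e is divided by the same P(e), the values in each row of the posterior matrix sum to 1.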

{mospagebreak title=Sensitivity analysis}

An important aspect of Bayesian inference is examining the effect of small changes to your prior and likelihood distributions. If the prior probability values you are using are best guesses, you might want to see what happens when you adjust the prior probability of each hypothesis slightly. The adjustment may change the posterior distribution significantly, or it may have little effect. Either way, it is good to know how sensitive your results are to the exact prior values (or likelihood values) used.
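
The kind of adjustment described above can be sketched in a few lines of PHP. This is a minimal illustration rather than part of the Bayes class or the wizard: the function name posterior_for_evidence(), the hypothesis labels, and all of the probability values are assumptions invented for the example, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of the article's Bayes class): returns
// P(h | e) for every hypothesis h, given the priors P(h) and the
// likelihood P(e | h) of one observed piece of evidence under each h.
function posterior_for_evidence(array $priors, array $likelihoods): array {
    $joint = [];
    foreach ($priors as $h => $prior) {
        $joint[$h] = $prior * $likelihoods[$h];
    }
    $evidence = array_sum($joint); // normalizing constant P(e)
    if ($evidence <= 0) {
        throw new InvalidArgumentException('Evidence has zero probability under every hypothesis.');
    }
    $posterior = [];
    foreach ($joint as $h => $value) {
        $posterior[$h] = $value / $evidence;
    }
    return $posterior;
}

// Made-up likelihoods: how probable the symptom is under each hypothesis.
$likelihoods = ['disease' => 0.9, 'healthy' => 0.2];

// Nudge the prior for 'disease' up and down and watch the posterior move.
foreach ([0.05, 0.10, 0.15] as $p) {
    $priors = ['disease' => $p, 'healthy' => 1 - $p];
    $post = posterior_for_evidence($priors, $likelihoods);
    printf("P(disease) = %.2f  ->  P(disease | symptom) = %.3f\n", $p, $post['disease']);
}
```

With these invented numbers, moving the prior from 0.05 to 0.15 moves the posterior from roughly 0.19 to roughly 0.44, so a conclusion drawn from this particular posterior would be quite sensitive to the prior.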

The final screen of the Bayes diagnosis wizard gives you the option to:

  • Start again
  • Re-enter labels
  • Re-enter your priors
  • Re-enter your likelihoods

If you decide to re-enter your priors, the wizard remembers your previously entered likelihoods. After you re-enter your priors, you can click forward to Step 5 without having to re-enter your likelihood values (or you can modify the likelihoods as well). In other words, the design of the Bayes wizard encourages you to engage in sensitivity analysis prior to drawing any final conclusions.

It’s only a beginning

Since you have made it this far, you should have a basic understanding of how Bayesian inference works. I will, however, continue to focus on the more general concept of conditional probability and the availability of various techniques, including but not restricted to the Bayes theorem, that you might use to compute a conditional probability value.

One way to compute a conditional probability is by enumeration, and you have explored the idea that databases might be good tools for computing conditional probabilities in this way. Indeed, these conditional probability computations often form the primitives used in many data-mining applications. I’ll present other opportunities to explore the role of databases in computing conditional probabilities in the upcoming articles on Web survey analysis.
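
As a reminder of what enumeration amounts to, here is a minimal sketch that counts matching records in a PHP array standing in for a database table. The field names, the sample records, and the function names row_matches() and conditional_probability() are all hypothetical, chosen only for illustration; the nullable return type assumes PHP 7.1 or later.

```php
<?php
// True when the row has every field/value pair listed in $conditions.
function row_matches(array $row, array $conditions): bool {
    foreach ($conditions as $field => $value) {
        if (!array_key_exists($field, $row) || $row[$field] !== $value) {
            return false;
        }
    }
    return true;
}

// P(target | given) by enumeration: among the rows that satisfy $given,
// the fraction that also satisfy $target. Returns null when no row
// satisfies $given, because the probability is then undefined.
function conditional_probability(array $records, array $target, array $given): ?float {
    $givenCount = 0;
    $bothCount = 0;
    foreach ($records as $row) {
        if (!row_matches($row, $given)) {
            continue;
        }
        $givenCount++;
        if (row_matches($row, $target)) {
            $bothCount++;
        }
    }
    return $givenCount > 0 ? $bothCount / $givenCount : null;
}

// Invented sample data: eight patient records.
$records = [
    ['symptom' => 'present', 'disease' => 'yes'],
    ['symptom' => 'present', 'disease' => 'yes'],
    ['symptom' => 'present', 'disease' => 'yes'],
    ['symptom' => 'present', 'disease' => 'no'],
    ['symptom' => 'absent',  'disease' => 'yes'],
    ['symptom' => 'absent',  'disease' => 'no'],
    ['symptom' => 'absent',  'disease' => 'no'],
    ['symptom' => 'absent',  'disease' => 'no'],
];

$p = conditional_probability($records, ['disease' => 'yes'], ['symptom' => 'present']);
printf("P(disease = yes | symptom = present) = %.2f\n", $p);
```

In a database-backed version, the two counts would typically come from COUNT queries with the corresponding WHERE clauses; the division is the same.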

Bayes theorem is another method you can use to compute a conditional probability. In this article, I demonstrated what a prior, likelihood, and posterior distribution are; how to estimate the prior and likelihood distributions from raw data; and how you can use PHP to compute the full posterior distribution. Becoming more skillful in the art of Bayesian inference requires that you become thoroughly familiar with these three concepts.

You’ve only scratched the surface of Bayesian inference. Hopefully this article has provided you with a good foundation for exploring more advanced topics in Bayesian inference, such as Bayes classifiers, Bayes learning algorithms, and Bayes networks.

{mospagebreak title=Resources}

• Download the source code used in this article. Find updates to the code at phpmath.com.

• Read “SMART HEURISTICS” by Gerd Gigerenzer. This study of fast and frugal decision-making looks at smart heuristics people actually use to make good decisions.

• Look at the paper “Teaching Bayesian Reasoning in Less Than Two Hours.” Authors Gerd Gigerenzer and Peter Sedlmeier present and test a new method of teaching Bayesian reasoning.

• Discover one of the oldest (but still powerful) computer science references, “An Essay towards solving a Problem in the Doctrine of Chances” by Thomas Bayes (1763), which introduces Bayes’s theorem.

• Explore the development of a new concept for aggregating items of evidence in classification problems in “Multiplicative Adjustment of Class Probability: Educating Naive Bayes.”

• Read “A Decomposition Of Classes Via Clustering To Explain And Improve Naive Bayes” for a method to improve the probability estimates made by Naive Bayes and avoid the effects of poor class conditional probabilities based on product distributions when each class spreads into multiple regions.

• In “An analysis of data characteristics that affect naive Bayes performance,” identify some data characteristics for which naive Bayes works well.

• Take the “Web site user modeling with PHP” tutorial and learn how to construct a user-modeling platform that can use clickstream data to build Web site user models (developerWorks, December 2003).

• Explore “An autonomic computing roadmap” for more about the Agent Building and Learning Environment (ABLE) that provides Learning Beans that implement Bayesian reasoning (developerWorks, February 2004).

• Halt spam with these two Bayesian-based techniques in “Spam filtering techniques” (developerWorks, September 2002).

• In “Apply probability models to Web data using PHP,” discover how to fit the benefits of probability modeling into Web application development (developerWorks, October 2003).

• Get good ideas on how to apply Bayesian inference to database technology in these data mining publications by Rakesh Agrawal, an IBM Fellow recently recognized as an ACM Fellow for his pioneering research in data mining.

• Read the later articles in the author’s series on Bayesian inference:
o “Implement Bayesian inference using PHP, Part 2” solves parameter estimation problems (developerWorks, April 2004).
o “Implement Bayesian inference using PHP, Part 3” solves classification problems in medical diagnostic testing and Web survey analysis as it applies Bayesian and conditional probability concepts to both building classifier systems and analyzing the accuracy of their output (developerWorks, May 2004).

• Explore the Hugin Expert site for such Bayesian networking software tools as BayesCredit, a tool for risk protection.

• See what Norsys offers with Netica, a Bayesian network development software that helps manage uncertainty.

• Visit Bayesia for products that facilitate knowledge modeling and data mining, and can help developers add the power of a Bayesian decision engine to their applications.

• Get a good start on topics in probability with the textbook Introduction to Probability by Charles Grinstead and J. Laurie Snell (American Mathematical Society, 2nd Ed., available in PDF).

• For excellent coverage of more advanced Bayesian reasoning techniques, read Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig (Prentice Hall, 2003).

• If you’re interested in applying Bayesian inference to data mining problems, Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber is a good starting point (Morgan Kaufmann Publishers, 2000).

• Learn the conceptual basis of programming in this well-known text, Structure and Interpretation of Computer Programs by Harold Abelson, Gerald Jay Sussman, and Julie Sussman (MIT Press, 2nd. Ed., 1996).

• Check out Statistical Methods, Experimental Design, and Scientific Inference, a single volume that brings together the classical works of R.A. Fisher: Statistical Methods for Research Workers, Statistical Methods and Scientific Inference (mentioned in this article), and The Design of Experiments (Oxford University Press, 1990).

• Browse the developerWorks bookstore for titles on this and other related subjects.

• Visit developerWorks Web Architecture zone for a range of articles on the topic of Web architecture and usability.

Name Size Download Method
wa-bayes1.tar.gz 5KB FTP


About the author
Paul Meagher is a freelance Web developer, writer, and data analyst. Paul has a graduate degree in Cognitive Science and has spent the last six years developing Web applications. His current interests include statistical computing, data mining, content management, and e-learning. Paul can be contacted at paul@datavore.com.
