Implement Bayesian inference using PHP, Part 1
By: developerWorks
    2005-01-05


    Table of Contents:
  • Implement Bayesian inference using PHP, Part 1
  • Conditional probability
  • Learning from experience
  • Conditional probability and SQL
  • Frequency versus probability format
  • Deriving Bayes Theorem
  • Medical diagnosis wizard
  • Implementing the calculation with Bayes.php
  • Sensitivity analysis
  • Resources



    Implement Bayesian inference using PHP, Part 1 - Frequency versus probability format
    ( Page 5 of 10 )

    The getConditionalProbability function you've developed operates on counts and frequencies rather than on probabilities. In reading the literature on Bayesian reasoning, you will notice that the enumeration method for computing P(A | B) is only briefly discussed. Most authors quickly move on to describing how P(A | B) can be formulated using terms that denote probability values rather than frequency counts. For example, you can recast the formula for computing P(A | B) in probability terms as:

    P(A | B) = P(A & B) / P(B)

    The advantage of recasting the formula in terms of probabilities rather than frequency counts is that, in practice, you often don't have access to a data set from which you can derive conditional probability estimates by enumerating cases. Instead, you often have access to higher-level summary information from past studies in the form of percentages and probabilities. The challenge then becomes finding a way to use these probability estimates to compute the conditional probabilities you are interested in. Recasting the conditional probability formula in terms of probabilities allows you to make inferences based on related probability information that is more readily accessible.
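
    As a minimal sketch of the probability format, the formula above can be wrapped in a small PHP function. The function name conditionalFromProbabilities and the input values below are illustrative assumptions; they are not part of the article's code or taken from any study.

```php
<?php
// Hypothetical helper (not from the article's code): computes P(A | B)
// from a joint probability P(A & B) and a marginal probability P(B).
function conditionalFromProbabilities(float $pJoint, float $pB): float
{
    if ($pB <= 0.0 || $pB > 1.0) {
        throw new InvalidArgumentException('P(B) must be in (0, 1].');
    }
    if ($pJoint < 0.0 || $pJoint > $pB) {
        throw new InvalidArgumentException('P(A & B) must be in [0, P(B)].');
    }
    return $pJoint / $pB;
}

// Illustrative summary figures (assumed): 5% of cases show both A and B,
// and 20% of cases show B.
$pAGivenB = conditionalFromProbabilities(0.05, 0.20);
printf("P(A | B) = %.2f\n", $pAGivenB); // prints "P(A | B) = 0.25"
```

    The range checks reflect the fact that a joint probability can never exceed either of its marginals, so inconsistent summary figures are rejected before the division.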

    The enumeration method might still be regarded as the most basic and intuitive method for computing a conditional probability. In Thomas Bayes' "An Essay towards solving a Problem in the Doctrine of Chances," he uses enumeration to arrive at the conclusion that P( 2nd Event = b | 1st Event = a ) is equal to [P / N] / [ a / N], which is equal to P / a, which one can also denote as {a & b} / {a}:

    Figure 1. Graphical representation of relations

    Another reason to be aware of frequency versus probability format issues is that Gerd Gigerenzer (and others) have demonstrated that people are better at reasoning in accordance with prescriptive Bayesian rules of inference when background information is presented in terms of frequencies of cases (1 in 10 cases) rather than probabilities (10 percent probability). A practical application of this research is that medical students are now being taught to communicate risk information in terms of frequencies of cases instead of probabilities, making it easier for patients to make better-informed judgements about what actions are warranted given the test results.

    Joint probability

    The most basic method for computing a conditional probability using a probability format is:

    P(A | B) = P(A & B) / P(B)

    This probability format is identical to the frequency format, except for the probability operator P( ) surrounding the numerator and denominator terms. The P(A & B) term denotes the joint probability of A and B occurring together. To understand how the joint probability P(A & B) can be computed from cross-tabulated data, consider the following hypothetical data (taken from pp. 147-48 of Grinstead and Snell's online textbook):

                -Smokes    +Smokes    Totals
    -Cancer        40         10        50
    +Cancer         7          3        10
    Totals         47         13        60

    To convert this table of frequencies to a table of probabilities, you divide each cell frequency by the total frequency (60). Note that dividing by the total frequency also ensures that the four Cancer x Smokes cell probabilities sum to 1 and permits you to refer to the inner cells of the table below (the silver area in the original) as the joint probability distribution of Cancer and Smoking.

                -Smokes    +Smokes    Totals
    -Cancer      40/60      10/60      50/60
    +Cancer       7/60       3/60      10/60
    Totals       47/60      13/60      60/60
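
    The conversion from counts to a joint probability distribution can be sketched in PHP as follows. The helper name toJointDistribution and the nested-array layout are assumptions made for illustration; only the four cell counts come from the table.

```php
<?php
// Cell counts from the Cancer x Smokes table (rows: cancer, columns: smoking).
$counts = [
    '-Cancer' => ['-Smokes' => 40, '+Smokes' => 10],
    '+Cancer' => ['-Smokes' => 7,  '+Smokes' => 3],
];

// Hypothetical helper: divides every cell count by the grand total.
function toJointDistribution(array $counts): array
{
    $total = array_sum(array_map('array_sum', $counts));
    if ($total <= 0) {
        throw new InvalidArgumentException('The table must contain at least one case.');
    }
    $joint = [];
    foreach ($counts as $rowLabel => $row) {
        foreach ($row as $colLabel => $n) {
            $joint[$rowLabel][$colLabel] = $n / $total;
        }
    }
    return $joint;
}

$joint = toJointDistribution($counts);

// The four cell probabilities sum to 1.
$sum = array_sum(array_map('array_sum', $joint));

// Marginal probabilities are row or column sums of the joint distribution.
$pCancer = array_sum($joint['+Cancer']);                // 10/60
$pSmokes = array_sum(array_column($joint, '+Smokes'));  // 13/60

printf("Sum of cells = %.3f\n", $sum);
printf("P(+Cancer)   = %.3f\n", $pCancer);
printf("P(+Smokes)   = %.3f\n", $pSmokes);
```

    Computing the totals from the cells, rather than storing them, keeps the marginals consistent with the joint distribution by construction.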

    To compute the probability of cancer given that a person smokes P(+Cancer | +Smokes), you can simply substitute the values from this table into the above formula as follows:

    P(+Cancer | +Smokes) = (3 / 60) / (13 / 60) = 0.050 / 0.217 = 0.23

    Note that you could have derived this value from the table of frequencies as well:

    P(+Cancer | +Smokes) = 3 / 13 = 0.23
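
    A short PHP check, offered as a sketch with illustrative variable names, confirms that the frequency format and the probability format give the same answer because the total (60) cancels out:

```php
<?php
$both    = 3;   // cases that are +Cancer and +Smokes
$smokers = 13;  // +Smokes column total
$total   = 60;  // grand total

// Frequency format: count of joint cases over count of conditioning cases.
$byFrequency = $both / $smokers;

// Probability format: P(+Cancer & +Smokes) / P(+Smokes).
$byProbability = ($both / $total) / ($smokers / $total);

printf("Frequency format:   %.2f\n", $byFrequency);    // prints 0.23
printf("Probability format: %.2f\n", $byProbability);  // prints 0.23

// Compare with a tolerance rather than ==, since these are floats.
$agree = abs($byFrequency - $byProbability) < 1e-12;
echo $agree ? "The two formats agree.\n" : "The two formats differ.\n";
```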

    How do you interpret this result? Using the recommended approach of communicating risk in terms of frequencies, you might say that of the next 100 smokers you encounter, you can expect about 23 of them to experience cancer in their lifetime. What is the probability of getting cancer if you do not smoke?

    P(+Cancer | -Smokes) = (7 / 60) / (47 / 60) = 0.117 / 0.783 = 0.15
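
    To present both results in the frequency format recommended earlier, a small helper can turn a probability into an "N in 100" statement. This is only a sketch; the function name asFrequencyStatement is hypothetical and the rounding to whole cases is a presentation choice.

```php
<?php
// Hypothetical helper: expresses a probability as a whole number of cases.
function asFrequencyStatement(float $p, int $outOf = 100): string
{
    if ($p < 0.0 || $p > 1.0) {
        throw new InvalidArgumentException('A probability must be in [0, 1].');
    }
    return sprintf('about %d in %d', (int) round($p * $outOf), $outOf);
}

// Conditional probabilities taken from the table of frequencies.
$pCancerGivenSmokes    = 3 / 13;
$pCancerGivenNonSmoker = 7 / 47;

echo 'Smokers:     ', asFrequencyStatement($pCancerGivenSmokes), "\n";    // about 23 in 100
echo 'Non-smokers: ', asFrequencyStatement($pCancerGivenNonSmoker), "\n"; // about 15 in 100
```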

    So it appears that you are more likely to get cancer if you smoke than if you do not smoke, even though the tallies appearing in the table might not have initially given you that impression. It is interesting to speculate on what the true conditional probabilities might be for various types of cancer given various criteria for defining someone as a smoker.

    A "cohort" research methodology would also require you to equate smokers and non-smokers on other variables like age, gender, and weight so that smoking, and not these other covariates, can be isolated as the root cause of the different cancer rates.

    To summarize, you can compute a conditional probability P(+Cancer | +Smokes) from joint distribution data by dividing the relevant joint probability P(+Cancer & +Smokes) by the relevant marginal probability P(+Smokes). As you might imagine, it is often easier and more feasible to derive estimates of a conditional probability from summary tables like this than to apply more data-intensive enumeration methods.



     
     
    © 2003-2009 by Developer Shed. All rights reserved.