Implementing Bayesian Inference Using PHP: Part 2

# Defining simple surveys

While the first article in this series discussed building intelligent Web applications through conditional probability, this second article on Bayesian inference examines how you can use Bayesian methods to solve parameter estimation problems. The relevant concepts are explained in the context of Web survey analysis using PHP and JpGraph. (This intermediate-level article was first published by IBM developerWorks on April 12, 2004, at http://www.ibm.com/developerWorks.)

January 12, 2005

Surveys come in many forms. You can present questions and solicit answers in a wide variety of ways. I won't try to cover how to analyze every possible type of survey; instead, I will be a bit more strategic and start with the simplest possible types of surveys.

The first type of survey to examine is one in which every question requires a boolean response (yes/no, agree/disagree, male/female, and so forth). Think of this as a multiple-choice survey in which every question offers only two mutually exclusive response options. It is probably the simplest type of survey you can imagine constructing.

When participants take a Web survey, you need to store their answers in a form suitable for later analysis. For analysis purposes, the best way to store survey answers is in a database table dedicated to the responses from a single survey. The table should have one column per question, each recording the boolean-valued response to that question (columns q1 to q3 in the following table):

Table 1. Storing survey answers so they're suitable for later analysis

| participant | q1 | q2 | q3 |
|---|---|---|---|
| 1 | 0 | 0 | 0 |
| 2 | 0 | 0 | 1 |
| 3 | 1 | 1 | 0 |
| add more rows here | | | |

Opinion polls with binary response options fit naturally into this format. Surveys whose binary data are collected in this format are referred to as binary surveys.
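To make the storage format concrete, here is a minimal PHP sketch that holds the rows of Table 1 in an array (standing in for rows fetched from the survey table) and counts the 1 responses to each question, a tally that analyses of binary responses typically start from. The array layout and the `countYesPerQuestion` helper are illustrative assumptions, not code from the article.

```php
<?php
// Hypothetical in-memory copy of Table 1: one row per participant,
// each question answered with 0 or 1.
$responses = [
    ['participant' => 1, 'q1' => 0, 'q2' => 0, 'q3' => 0],
    ['participant' => 2, 'q1' => 0, 'q2' => 0, 'q3' => 1],
    ['participant' => 3, 'q1' => 1, 'q2' => 1, 'q3' => 0],
];

/**
 * Count how many participants answered 1 to each question.
 * countYesPerQuestion is a hypothetical helper for illustration.
 *
 * @param array $rows      survey rows keyed by column name
 * @param array $questions column names to tally, e.g. ['q1', 'q2', 'q3']
 * @return array           question => number of 1 responses
 */
function countYesPerQuestion(array $rows, array $questions): array
{
    $counts = array_fill_keys($questions, 0);
    foreach ($rows as $row) {
        foreach ($questions as $q) {
            // Responses are assumed to be coded strictly as 0 or 1.
            $counts[$q] += (int) $row[$q];
        }
    }
    return $counts;
}

$yes = countYesPerQuestion($responses, ['q1', 'q2', 'q3']);
print_r($yes);
```

For the three rows of Table 1, each question receives exactly one 1 response.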

A survey constructed for the purpose of classifying participants differs from the above in that it requires at least one extra classification field (denoted c1 below) to record, for example, the participant's employment status (coded as 0 = unemployed and 1 = employed).

Table 2. Extra field allows participants to be classified

| participant | q1 | q2 | q3 | c1 |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 | 1 |
| 3 | 1 | 1 | 0 | 0 |
| add more rows here | | | | |

Note that records used for medical diagnostic testing are likely to have a similar format.

Surveys with binary data collected in this format are referred to as binary classification surveys.
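The classification field lets you break the question tallies down by class. The following minimal PHP sketch does this for the rows of Table 2, counting the participants in each class and their 1 responses per question. The array layout and the `tallyByClass` helper are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical in-memory copy of Table 2: q1..q3 are test questions,
// c1 is the classification field (0 = unemployed, 1 = employed).
$rows = [
    ['participant' => 1, 'q1' => 0, 'q2' => 0, 'q3' => 0, 'c1' => 1],
    ['participant' => 2, 'q1' => 0, 'q2' => 0, 'q3' => 1, 'c1' => 1],
    ['participant' => 3, 'q1' => 1, 'q2' => 1, 'q3' => 0, 'c1' => 0],
];

/**
 * For each value of $classField, count the participants in that class
 * and how many of them answered 1 to each question.
 * tallyByClass is a hypothetical helper for illustration.
 */
function tallyByClass(array $rows, array $questions, string $classField): array
{
    $tally = [];
    foreach ($rows as $row) {
        $class = (int) $row[$classField];
        if (!isset($tally[$class])) {
            // 'n' holds the class size; one counter per question follows.
            $tally[$class] = ['n' => 0] + array_fill_keys($questions, 0);
        }
        $tally[$class]['n']++;
        foreach ($questions as $q) {
            $tally[$class][$q] += (int) $row[$q];
        }
    }
    ksort($tally); // list class 0 before class 1
    return $tally;
}

$byClass = tallyByClass($rows, ['q1', 'q2', 'q3'], 'c1');
print_r($byClass);
```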

When the adjective "simple" is used to describe a binary survey, it means that the survey consists of only one binary response per participant, which most people would call a poll. A simple binary survey can also be viewed as the limiting case of a binary survey: one with a single question.
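Because a poll has one 0/1 response per participant, estimating the proportion of 1 responses only requires two numbers: the number of participants and the number of 1 responses. This minimal PHP sketch computes them along with the sample proportion; `summarizePoll` and the poll data are made up for illustration.

```php
<?php
/**
 * Summarize a simple binary survey (a poll): one 0/1 response per participant.
 * Returns the number of participants, the number of 1 responses, and the
 * sample proportion of 1 responses. summarizePoll is a hypothetical helper.
 */
function summarizePoll(array $responses): array
{
    $n = count($responses);
    if ($n === 0) {
        throw new InvalidArgumentException('No responses to summarize.');
    }
    $yes = 0;
    foreach ($responses as $r) {
        // Reject anything that is not the integer 0 or 1.
        if ($r !== 0 && $r !== 1) {
            throw new InvalidArgumentException('Responses must be coded as 0 or 1.');
        }
        $yes += $r;
    }
    return ['n' => $n, 'yes' => $yes, 'proportion' => $yes / $n];
}

// Made-up poll data: 7 of 10 participants answered 1.
$poll = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1];
$summary = summarizePoll($poll);
printf("n=%d, yes=%d, proportion=%.2f\n", $summary['n'], $summary['yes'], $summary['proportion']);
```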

Similarly, when the adjective "simple" is used to describe a binary classification survey, it means that the survey consists of exactly two binary responses per participant: one to the test question q1 and one to the classification question c1.
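Since a simple binary classification survey has only two binary fields, all of its data can be summarized as a 2x2 table of counts, one cell for each combination of q1 and c1. Here is a minimal PHP sketch of that tally; the record layout and the `buildContingencyTable` helper are hypothetical.

```php
<?php
/**
 * Build the 2x2 table of counts for a simple binary classification survey.
 * Each record holds a test response q1 and a classification c1 (both 0/1).
 * The result is indexed as $table[c1][q1].
 * buildContingencyTable is a hypothetical helper for illustration.
 */
function buildContingencyTable(array $records): array
{
    $table = [0 => [0 => 0, 1 => 0], 1 => [0 => 0, 1 => 0]];
    foreach ($records as $rec) {
        $q = $rec['q1'];
        $c = $rec['c1'];
        // Strict checks so that values like '1' or 2 are rejected.
        if (!in_array($q, [0, 1], true) || !in_array($c, [0, 1], true)) {
            throw new InvalidArgumentException('q1 and c1 must be integers 0 or 1.');
        }
        $table[$c][$q]++;
    }
    return $table;
}

// The q1 and c1 columns of Table 2, as (q1, c1) records.
$records = [
    ['q1' => 0, 'c1' => 1],
    ['q1' => 0, 'c1' => 1],
    ['q1' => 1, 'c1' => 0],
];
$table = buildContingencyTable($records);
foreach ($table as $c => $row) {
    printf("c1=%d: q1=0 -> %d, q1=1 -> %d\n", $c, $row[0], $row[1]);
}
```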

The parameter estimation concepts I discuss in this article are useful for analyzing simple binary surveys. In my next article, I will focus on concepts and code useful for analyzing simple binary classification surveys and multivariate binary classification surveys.
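As one illustration of Bayesian parameter estimation for a simple binary survey, consider the standard conjugate Beta-binomial model (assumed here, not necessarily the model developed in this article). Place a Beta(alpha, beta) prior on the proportion p of 1 responses. After observing y ones in n responses, the posterior is Beta(alpha + y, beta + n - y), with mean (alpha + y)/(alpha + beta + n). The sketch below computes that update; `betaBinomialPosterior` and its default uniform prior are hypothetical choices.

```php
<?php
/**
 * Beta-binomial update for the proportion p of 1 responses in a poll.
 * Assumes a Beta(alpha, beta) prior on p; with $yes ones out of $n responses,
 * the posterior is Beta(alpha + yes, beta + n - yes).
 * betaBinomialPosterior is a hypothetical helper for illustration.
 */
function betaBinomialPosterior(int $yes, int $n, float $alpha = 1.0, float $beta = 1.0): array
{
    if ($n < 0 || $yes < 0 || $yes > $n) {
        throw new InvalidArgumentException('Require 0 <= yes <= n.');
    }
    if ($alpha <= 0 || $beta <= 0) {
        throw new InvalidArgumentException('Prior parameters must be positive.');
    }
    $a = $alpha + $yes;          // posterior alpha
    $b = $beta + ($n - $yes);    // posterior beta
    $sum = $a + $b;
    return [
        'alpha'    => $a,
        'beta'     => $b,
        'mean'     => $a / $sum,
        // Variance of a Beta(a, b) distribution.
        'variance' => ($a * $b) / ($sum * $sum * ($sum + 1)),
    ];
}

// Uniform Beta(1, 1) prior, 7 ones out of 10 responses (made-up numbers).
$post = betaBinomialPosterior(7, 10);
printf("posterior Beta(%.0f, %.0f), mean %.4f\n", $post['alpha'], $post['beta'], $post['mean']);
```

With the default uniform Beta(1, 1) prior, the posterior mean reduces to (y + 1)/(n + 2), Laplace's rule of succession. For 7 ones in 10 responses that is 8/12, about 0.667, slightly closer to 0.5 than the sample proportion of 0.7.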

Binary surveys, as defined here, form a distinct and interesting class of surveys to study. A wealth of literature and a broad palette of analytic techniques are available for analyzing binary data. Binary surveys are also interesting because responses coded as 0s and 1s are written in the native language of hardware-based computing. Fields as diverse as statistics, computer science, physics, medical diagnosis, data compression, and electrical engineering can be treated in a unified manner within the mathematics of binary data analysis and modeling.
