Graphical Interfaces and Unit Testing

In this final part of a three-part series on unit testing, we discuss the use of graphical interfaces, unit testing in a web environment, and more. The article is excerpted from chapter six of the book Advanced PHP Programming, written by George Schlossnagle (Sams; ISBN: 0672325616).

Using Graphical Interfaces

Because PHP is a Web-oriented language, you might want an HTML-based user interface for running your unit tests. PHPUnit comes bundled with this ability, using PHPUnit_WebUI_TestRunner::run(). This is in fact a nearly identical framework to TextUI; it simply uses its own listener to generate HTML-beautified output.

Hopefully, in the future some of the PHP Integrated Development Environments (IDEs; programming GUIs) will expand their feature sets to include integrated support for unit testing (as many of the Java IDEs do). Also, with PHP-GTK available (a PHP interface to the GTK graphics library API that allows for Windows and X11 GUI development in PHP), we can always hope for a PHP-GTK front end for PHPUnit. In fact, there is a stub for PHPUnit_GtkUI_TestRunner in the PEAR repository, but at this time it is incomplete.

Test-Driven Design

There are three major times when you can write tests: before implementation, during implementation, and after implementation. Kent Beck, author of JUnit and renowned Extreme Programming guru, advises that you “never write a line of functional code without a broken test case.” What this quote means is that before you implement anything, including new code, you should predefine some sort of call interface for the code and write a test that validates the functionality that you think it should have. Because there is no code to test, the test will naturally fail, but the point is that you have gone through the exercise of determining how the code should look to an end user, and you have thought about the type of input it should receive and the output it should produce. As radical as this may sound at first, test-driven development (TDD) has a number of benefits:

  • Encourages good design: You fully design your class/function APIs before you begin coding because you actually write code to use the APIs before they exist.

  • Discourages attempts to write tests to match your code: With TDD, you write code to match your tests rather than tests to match your code. This helps keep your testing efforts honest.

  • Helps constrain the scope of code: Features that are not tested do not need to be implemented.

  • Improves focus: With failing tests in place, development efforts are naturally directed to making those tests complete successfully.

  • Sets milestones: When all your tests run successfully, your code is complete.

The test-first methodology takes a bit of getting used to and is a bit difficult to apply in some situations, but it goes well with ensuring good design and solid requirements specifications. By writing tests that implement project requirements, you not only get higher-quality code, but you also minimize the chance of overlooking a feature in the specification.

The Flesch Score Calculator

Rudolf Flesch was a linguist who studied the comprehensibility of languages, English in particular. Flesch’s work on what constitutes readable text and how children learn (and don’t learn) languages inspired Theodor Seuss Geisel (Dr. Seuss) to write a unique series of children’s books, starting with The Cat in the Hat. In his 1943 doctoral thesis from Columbia University, Flesch describes a readability index that analyzes text to determine its level of complexity. The Flesch index is still widely used to rank the readability of text.

The test works like this:

  1. Count the number of words in the document.

  2. Count the number of syllables in the document.

  3. Count the number of sentences in the document.

The index is computed as follows:

    Flesch score = 206.835 – 84.6 x (syllables/words) – 1.015 x (words/sentences)
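
As a rough sketch, the formula can be wrapped in a small function that takes the three counts. The function name fleschScore() and the empty-input guard are assumptions made for illustration; they are not part of the original listing.

```php
<?php
// Hypothetical helper (not from the article): computes the Flesch score
// from raw counts, using the formula given above.
function fleschScore($numWords, $numSyllables, $numSentences)
{
    // Guard against empty input to avoid division by zero (an added assumption).
    if ($numWords <= 0 || $numSentences <= 0) {
        throw new InvalidArgumentException('Need at least one word and one sentence');
    }
    return 206.835
         - 84.6  * ($numSyllables / $numWords)
         - 1.015 * ($numWords / $numSentences);
}

// Example: 100 words, 150 syllables, 5 sentences.
printf("%.3f\n", fleschScore(100, 150, 5));  // prints 59.635
```

With these made-up counts the text averages 1.5 syllables per word and 20 words per sentence.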

The score represents the readability of the text. (The higher the score, the more readable.) These scores translate to grade levels as follows:

    Score     School Level
    90–100    5th grade
    80–90     6th grade
    70–80     7th grade
    60–70     8th and 9th grades
    50–60     high school
    30–50     college
    0–30      college graduate

Flesch calculated that Newsweek magazine has a mean readability score of 50, Seventeen magazine a mean score of 67, and the U.S. Internal Revenue Service tax code a score of -6. Readability indexes are used to ensure proper audience targeting (for example, to ensure that a 3rd-grade textbook is not written at a 5th-grade level). Marketing companies use them to ensure that their materials are easily comprehensible, and the government and large corporations use them to ensure that manuals are on level with their intended audiences.
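
The score-to-level table above can be turned into a simple lookup. This is only a sketch: the function name fleschSchoolLevel() is hypothetical, and because the table's bands share their boundary values, the code assumes that a boundary score (for example, 90) belongs to the easier band. Scores above 100 or below 0 are assumed to fall into the first and last bands.

```php
<?php
// Hypothetical helper: maps a Flesch score to the school levels in the table.
// Assumption: a score exactly on a boundary is assigned to the easier band.
function fleschSchoolLevel($score)
{
    if ($score >= 90) { return '5th grade'; }
    if ($score >= 80) { return '6th grade'; }
    if ($score >= 70) { return '7th grade'; }
    if ($score >= 60) { return '8th and 9th grades'; }
    if ($score >= 50) { return 'high school'; }
    if ($score >= 30) { return 'college'; }
    return 'college graduate';
}

echo fleschSchoolLevel(67), "\n";  // Seventeen's mean score
echo fleschSchoolLevel(-6), "\n";  // the tax code's score
```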

Testing the Word Class

Let’s start by writing a test to count the number of syllables in a word:

<?php
require "PHPUnit/Framework/TestSuite.php";
require "PHPUnit/TextUI/TestRunner.php";
require "Text/Word.inc";

class Text_WordTestCase extends PHPUnit_Framework_TestCase {
  public $known_words = array('the' => 1,
                              'late' => 1,
                              'frantic' => 2,
                              'programmer' => 3);
  public function __construct($name) {
    parent::__construct($name);
  }
  public function testKnownWords() {
    foreach ($this->known_words as $word => $syllables) {
      $obj = new Text_Word($word);
      $this->assertEquals($syllables, $obj->numSyllables());
    }
  }
}
$suite = new PHPUnit_Framework_TestSuite('Text_WordTestCase');
PHPUnit_TextUI_TestRunner::run($suite);
?>

Of course this test immediately fails because you don’t even have a Word class, but you will take care of that shortly. The interface used for Word is just what seemed obvious. If it ends up being insufficient to count syllables, you can expand it.

The next step is to implement a Text_Word class that attempts to pass the test:

<?php
class Text_Word {
  public $word;
  public function __construct($name) {
    $this->word = $name;
  }
  protected function mungeWord($scratch) {
    // lower case for simplicity
    $scratch = strtolower($scratch);
    return $scratch;
  }
  // public, because the test case calls it from outside the class
  public function numSyllables() {
    $scratch = $this->mungeWord($this->word);
    // Split the word on its consonants, leaving the vowel groups:
    // a e i o u, and for us always y
    $fragments = preg_split("/[^aeiouy]+/", $scratch);
    // Clean up both ends of our array if they have null elements
    if(!$fragments[0]) {
      array_shift($fragments);
    }
    if (!$fragments[count($fragments) - 1]) {
      array_pop($fragments);
    }
    return count($fragments);
  }
}
?>

This set of rules breaks for late. When an English word ends in an e alone, it rarely counts as a syllable of its own (in contrast to, say, y, or ie). You can correct this by removing a trailing e if it exists. Here’s the code for that:

function mungeWord($scratch) { 
    $scratch = strtolower($scratch);
    $scratch = preg_replace("/e$/", "", $scratch);
    return $scratch; 
}

The test now fails for the, which has no vowels left when you drop the trailing e. You can handle this by ensuring that numSyllables() always returns at least one syllable. Here’s how:

function numSyllables() {
    $scratch = $this->mungeWord($this->word);
    // Split the word on its consonants, leaving the vowel groups:
    // a e i o u, and for us always y
    $fragments = preg_split("/[^aeiouy]+/", $scratch);
    // Clean up both ends of our array if they have null elements
    if(!$fragments[0]) {
        array_shift($fragments);
    }
    if (!$fragments[count($fragments) - 1]) {
        array_pop($fragments);
    }
    if(count($fragments)) {
        return count($fragments);
    }
    else {
        return 1;
    }
}

When you expand the word list a bit, you see that you have some bugs still, especially with nondiphthong multivowel sounds (such as ie in alien and io in biography). You can easily add tests for these rules:

<?php
require_once "Text/Word.inc";
require_once "PHPUnit/Framework/TestSuite.php";

class Text_WordTestCase extends PHPUnit_Framework_TestCase {
  public $known_words = array('the' => 1,
                              'late' => '1',
                              'hello' => '2',
                              'frantic' => '2',
                              'programmer' => '3');
  public $special_words = array('absolutely' => 4,
                                'alien' => 3,
                                'ion' => 2,
                                'tortion' => 2,
                                'gracious' => 2,
                                'lien' => 1,
                                'syllable' => 3);
  function __construct($name) {
    parent::__construct($name);
  }
  public function testKnownWords() {
    foreach ($this->known_words as $word => $syllables) {
      $obj = new Text_Word($word);
      $this->assertEquals($syllables, $obj->numSyllables(),
                          "$word has incorrect syllable count");
    }
  }
  public function testSpecialWords() {
    foreach ($this->special_words as $word => $syllables) {
      $obj = new Text_Word($word);
      $this->assertEquals($syllables, $obj->numSyllables(),
                          "$word has incorrect syllable count");
    }
  }
}
if(realpath($_SERVER['PHP_SELF']) == __FILE__) {
  require_once "PHPUnit/TextUI/TestRunner.php";
  $suite = new PHPUnit_Framework_TestSuite('Text_WordTestCase');
  PHPUnit_TextUI_TestRunner::run($suite);
}
?>

This is what the test yields now:

PHPUnit 1.0.0-dev by Sebastian Bergmann.

..F

Time: 0.00660002231598
There was 1 failure:
1) TestCase text_wordtestcase->testspecialwords() failed: absolutely has incorrect syllable count expected 4, actual 5

FAILURES!!!
Tests run: 2, Failures: 1, Errors: 0.

To fix this error, you start by adding an additional check to numSyllables() that adds a syllable for the io and ie sounds, adds a syllable for the two-syllable able, and deducts a syllable for the silent e in absolutely. Here’s how you do this:

<?php
// Both functions are methods of Text_Word.
function countSpecialSyllables($scratch) {
  $additionalSyllables = array(
                           '/\wlien/', // alien but not lien
                           '/bl$/',    // syllable
                           '/io/',     // biography
                         );
  $silentSyllables = array(
                       '/\wely$/',     // absolutely but not ely
                     );
  $mod = 0;
  foreach( $silentSyllables as $pat ) {
    if(preg_match($pat, $scratch)) {
      $mod--;
    }
  }
  foreach( $additionalSyllables as $pat ) {
    if(preg_match($pat, $scratch)) {
      $mod++;
    }
  }
  return $mod;
}
function numSyllables() {
  // $_numSyllables caches the result; it should be declared in the class
  if($this->_numSyllables) {
    return $this->_numSyllables;
  }
  $scratch = $this->mungeWord($this->word);
  // Split the word on its consonants, leaving the vowel groups:
  // a e i o u, and for us always y
  $fragments = preg_split("/[^aeiouy]+/", $scratch);
  if(!$fragments[0]) {
    array_shift($fragments);
  }
  if(!$fragments[count($fragments) - 1]) {
    array_pop($fragments);
  }
  $this->_numSyllables += $this->countSpecialSyllables($scratch);
  if(count($fragments)) {
    $this->_numSyllables += count($fragments);
  }
  else {
    $this->_numSyllables = 1;
  }
  return $this->_numSyllables;
}
?>

The test is close to finished now, but tortion and gracious are both two-syllable words. The check for io was too aggressive. You can counterbalance this by adding -ion and -iou to the list of silent syllables:

function countSpecialSyllables($scratch) {
  $additionalSyllables = array(
                           '/\wlien/', // alien but not lien
                           '/bl$/',    // syllable
                           '/io/',     // biography
                         );
  $silentSyllables = array(
                       '/\wely$/',     // absolutely but not ely
                       '/\wion/',      // to counter the io match
                       '/iou/',
                     );
  $mod = 0;
  foreach( $silentSyllables as $pat ) {
    if(preg_match($pat, $scratch)) {
      $mod--;
    }
  }
  foreach( $additionalSyllables as $pat ) {
    if(preg_match($pat, $scratch)) {
      $mod++;
    }
  }
  return $mod;
}
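
Putting the pieces together, the following is a self-contained sketch of the Text_Word class as developed so far. It follows the listings above with two assumptions that are not in the original: the $_numSyllables cache is declared explicitly, and PREG_SPLIT_NO_EMPTY replaces the manual array_shift()/array_pop() cleanup (the split can only produce empty pieces at the two ends, so the result should be the same).

```php
<?php
// Sketch: the Text_Word pieces shown so far, assembled into one class.
class Text_Word
{
    public $word;
    protected $_numSyllables = 0;   // cached count; 0 means "not computed yet"

    public function __construct($name)
    {
        $this->word = $name;
    }

    protected function mungeWord($scratch)
    {
        $scratch = strtolower($scratch);
        // a trailing e is rarely a syllable of its own
        return preg_replace('/e$/', '', $scratch);
    }

    protected function countSpecialSyllables($scratch)
    {
        $additionalSyllables = array('/\wlien/', '/bl$/', '/io/');
        $silentSyllables     = array('/\wely$/', '/\wion/', '/iou/');
        $mod = 0;
        foreach ($silentSyllables as $pat) {
            if (preg_match($pat, $scratch)) { $mod--; }
        }
        foreach ($additionalSyllables as $pat) {
            if (preg_match($pat, $scratch)) { $mod++; }
        }
        return $mod;
    }

    public function numSyllables()
    {
        if ($this->_numSyllables) {
            return $this->_numSyllables;
        }
        $scratch = $this->mungeWord($this->word);
        // Assumption: PREG_SPLIT_NO_EMPTY stands in for the manual
        // array_shift()/array_pop() cleanup used in the article.
        $fragments = preg_split('/[^aeiouy]+/', $scratch, -1, PREG_SPLIT_NO_EMPTY);
        if (count($fragments)) {
            $this->_numSyllables = count($fragments)
                                 + $this->countSpecialSyllables($scratch);
        } else {
            // no vowel groups left (for example, "the" after munging)
            $this->_numSyllables = 1;
        }
        return $this->_numSyllables;
    }
}

// Quick manual check against a few of the test words.
foreach (array('the', 'late', 'alien', 'gracious') as $w) {
    $obj = new Text_Word($w);
    echo $w, ': ', $obj->numSyllables(), "\n";
}
```

Running the words from $known_words and $special_words through this sketch is a quick way to confirm the rules by hand before wiring the class into PHPUnit.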

The Word class passes the tests, so you can proceed with the rest of the implementation and calculate the number of words and sentences. Again, you start with a test case:

<?php
require_once "PHPUnit/Framework/TestCase.php";
require_once "Text/Statistics.inc";

class TextTestCase extends PHPUnit_Framework_TestCase {
  public $sample;
  public $object;
  public $numSentences;
  public $numWords;
  public $numSyllables;
  public function setUp() {
    $this->sample = "
Returns the number of words in the analyzed text
file or block. A word must consist of letters a-z with at least
one vowel sound, and optionally an apostrophe or a hyphen.";
    $this->numSentences = 2;
    $this->numWords = 31;
    $this->numSyllables = 45;
    $this->object = new Text_Statistics($this->sample);
  }
  function __construct($name) {
    parent::__construct($name);
  }
  function testNumSentences() {
    $this->assertEquals($this->numSentences, $this->object->numSentences);
  }
  function testNumWords() {
    $this->assertEquals($this->numWords, $this->object->numWords);
  }
  function testNumSyllables() {
    $this->assertEquals($this->numSyllables, $this->object->numSyllables);
  }
}
if(realpath($_SERVER['PHP_SELF']) == __FILE__) {
  require_once "PHPUnit/Framework/TestSuite.php";
  require_once "PHPUnit/TextUI/TestRunner.php";
  $suite = new PHPUnit_Framework_TestSuite('TextTestCase');
  PHPUnit_TextUI_TestRunner::run($suite);
}
?>

You’ve chosen tests that implement exactly the statistics you need to be able to calculate the Flesch score of a text block. You manually calculate the “correct” values for comparison against the values produced by the soon-to-be-written class. Especially with functionality such as collecting statistics on a text document, it is easy to get lost in feature creep. With a tight set of tests to code to, you should be able to stay on track more easily.

Now let’s take a first shot at implementing the Text_Statistics class:

<?php
require_once "Text/Word.inc";
class Text_Statistics {
 public $text = '';
 public $numSyllables = 0;
 public $numWords = 0;
 public $uniqWords = 0;
 public $numSentences = 0;
 public $flesch = 0;
 protected $_uniques = array();
 public function __construct($block) {
  $this->text = $block;
  $this->analyze();
 }
 protected function analyze() {
  $lines = explode("\n", $this->text);
  foreach($lines as $line) {
   $this->analyze_line($line);
  }
  $this->flesch = 206.835 -
          (1.015 * ($this->numWords / $this->numSentences)) -
          (84.6 * ($this->numSyllables / $this->numWords));
 }
 protected function analyze_line($line) {
  preg_match_all("/\b(\w[\w'-]*)\b/", $line, $words);
  foreach($words[1] as $word) {
   $word = strtolower($word);
   $w_obj = new Text_Word($word);
   $this->numSyllables += $w_obj->numSyllables();
   $this->numWords++;
   if(!isset($this->_uniques[$word])) {
    // first time we have seen this word
    $this->_uniques[$word] = 1;
    $this->uniqWords++;
   }
  }
  preg_match_all("/[.!?]/", $line, $matches);
  $this->numSentences += count($matches[0]);
 }
}
?>

How does this all work? First, the constructor stores the text block and calls the analyze() method. analyze() uses the explode() function to split the document on newlines and creates an array, $lines, of all the individual lines in the document. Then you call analyze_line() on each of those lines. analyze_line() uses the regular expression /\b(\w[\w'-]*)\b/ to break the line into words. This regular expression matches the following:

\b        # a zero-space word break
(         # start capture
\w        # a single letter or number
[\w'-]*   # zero or more alphanumeric characters plus 's or -s
          # (to allow for hyphenations and contractions)
)         # end capture, now $words[1] is our captured word
\b        # a zero-space word break

For each of the words that you capture via this method, you create a Word object and extract its syllable count. After you have processed all the words in the line, you count the number of sentence-terminating punctuation characters by counting the number of matches for the regular expression /[.!?]/.
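
To see the two regular expressions at work outside the class, here is a small standalone sketch. The sample line is made up for illustration; the patterns are the ones analyze_line() uses.

```php
<?php
// Illustrative input (not from the article).
$line = "Don't panic: the well-known answer is 42. Isn't it?";

// Words: a word character followed by word characters, apostrophes, or hyphens.
preg_match_all("/\b(\w[\w'-]*)\b/", $line, $words);

// Sentences: count the terminating punctuation marks.
preg_match_all("/[.!?]/", $line, $matches);

echo implode(' | ', $words[1]), "\n";
echo count($words[1]), " words, ", count($matches[0]), " sentences\n";
// last line printed: 9 words, 2 sentences
```

Note that contractions (Don't) and hyphenated words (well-known) each count as a single word, and that the colon does not end a sentence.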

When all your tests pass, you’re ready to push the code to an application testing phase. Before you roll up the code to hand off for quality assurance, you need to bundle all the testing classes into a single harness. With PHPUnit::TestHarness, which you wrote earlier, this is a simple task:

<?php
require_once "TestHarness.php";
require_once "PHPUnit/TextUI/TestRunner.php";

$suite = new TestHarness();
$suite->register("Text/Word.inc");
$suite->register("Text/Statistics.phpt");
PHPUnit_TextUI_TestRunner::run($suite);
?>

In an ideal world, you would now ship your code off to a quality assurance team that would put it through its paces to look for bugs. In a less perfect world, you might be saddled with testing it yourself. Either way, any project of even this low level of complexity will likely have bugs.

Bug Report 1

Sure enough, when you begin testing the code you created in the previous sections, you begin receiving bug reports. The sentence counts seem to be off for texts that contain abbreviations (for example, Dear Mr. Smith). The counts come back as having too many sentences in them, skewing the Flesch scores.

You can quickly add a test case to confirm this bug. The tests you ran earlier should have caught this bug but didn’t because there were no abbreviations in the text. You don’t want to replace your old test case (you should never casually remove test cases unless the test itself is broken); instead, you should add an additional case that runs the previous statistical checks on another document that contains abbreviations. Because you want to change only the data that you are testing on and not any of the tests themselves, you can save yourself the effort of writing this new TestCase object from scratch by simply subclassing the TextTestCase class and overriding the setUp method. Here’s how you do it:

class AbbreviationTestCase extends TextTestCase {
 function setUp() {
  $this->sample = "
Dear Mr. Smith,

Your request for a leave of absence has been
approved. Enjoy your vacation.
";
  $this->numSentences = 2;
  $this->numWords = 16;
  $this->numSyllables = 24;
  $this->object = new Text_Statistics($this->sample);
 }
 function __construct($name) {
  parent::__construct($name);
 }
}

Sure enough, the bug is there. Mr. matches as the end of a sentence. You can try to avoid this problem by removing the periods from common abbreviations. To do this, you need to add a list of common abbreviations and expansions that strip the abbreviations of their punctuation. You make this a static attribute of Text_Statistics and then substitute on that list during analyze_line. Here’s the code for this:

class Text_Statistics {
 // ...
 static $abbreviations = array('/Mr\./' =>'Mr',
                '/Mrs\./i' =>'Mrs',
                '/etc\./i' =>'etc',
                '/Dr\./i' =>'Dr',
                );
 // ...
 protected function analyze_line($line) {
  // replace our known abbreviations
  $line = preg_replace(array_keys(self::$abbreviations),
                       array_values(self::$abbreviations),
                       $line);
  preg_match_all("/\b(\w[\w'-]*)\b/", $line, $words);
  foreach($words[1] as $word) {
   $word = strtolower($word);
   $w_obj = new Text_Word($word);
   $this->numSyllables += $w_obj->numSyllables();
   $this->numWords++;
   if(!isset($this->_uniques[$word])) {
    // first time we have seen this word
    $this->_uniques[$word] = 1;
    $this->uniqWords++;
   }
  }
  preg_match_all("/[.!?]/", $line, $matches);
  $this->numSentences += count($matches[0]);
 }
}

The sentence count is correct now, but now the syllable count is off. It seems that Mr. counts as only one syllable (because it has no vowels). To handle this, you can expand the abbreviation expansion list to not only eliminate punctuation but also to expand the abbreviations for the purposes of counting syllables. Here’s the code that does this:

class Text_Statistics {
 // ...
 static $abbreviations = array('/Mr\./' =>'Mister',
                '/Mrs\./i' =>'Misses', //Phonetic
                '/etc\./i' =>'etcetera',
                '/Dr\./i' =>'Doctor',
                );
 // ...
}
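
The effect of the expansion list can be checked in isolation with a few lines of standalone code. This is a minimal sketch: the sample sentence is invented, and the dots in the patterns are escaped so that they match only a literal period.

```php
<?php
// The abbreviation list, with each dot escaped to match a literal period.
$abbreviations = array(
    '/Mr\./'   => 'Mister',
    '/Mrs\./i' => 'Misses',   // phonetic
    '/etc\./i' => 'etcetera',
    '/Dr\./i'  => 'Doctor',
);

// Illustrative input (not from the article).
$line = "Dear Mr. Smith, Dr. Jones and Mrs. Brown send their regards.";

// preg_replace() accepts parallel arrays of patterns and replacements.
$expanded = preg_replace(array_keys($abbreviations),
                         array_values($abbreviations),
                         $line);
echo $expanded, "\n";

// Only the real sentence terminator is left to count.
preg_match_all("/[.!?]/", $expanded, $matches);
echo count($matches[0]), " sentence(s)\n";
```

After the substitution, only the final period remains, so the line counts as one sentence instead of four.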

There are still many improvements you can make to the Text_Statistics routine. The $silentSyllables and $additionalSyllables arrays for tracking exceptional cases are a good start, but there is still much work to do. Similarly, the abbreviations list is pretty limited at this point and could easily be expanded as well. Adding multilingual support by extending the classes is an option, as is expanding the statistics to include other readability indexes (for example, the Gunning FOG index, the SMOG index, the Flesch-Kincaid grade estimation, the Powers-Sumner-Kearl formula, and the FORCAST Formula). All these changes are easy, and with the regression tests in place, it is easy to verify that modifications to any one of them do not affect current behavior.
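
As one illustration of such an extension, the Flesch-Kincaid grade estimation needs only the same three counts that Text_Statistics already collects. The sketch below uses the commonly published form of that formula; the function name and the empty-input guard are assumptions and not part of the original code.

```php
<?php
// Hypothetical helper: Flesch-Kincaid grade level from raw counts,
// using the commonly published coefficients.
function fleschKincaidGrade($numWords, $numSyllables, $numSentences)
{
    // Guard against empty input to avoid division by zero (an added assumption).
    if ($numWords <= 0 || $numSentences <= 0) {
        throw new InvalidArgumentException('Need at least one word and one sentence');
    }
    return 0.39 * ($numWords / $numSentences)
         + 11.8 * ($numSyllables / $numWords)
         - 15.59;
}

// Example: 100 words, 150 syllables, 5 sentences.
printf("%.2f\n", fleschKincaidGrade(100, 150, 5));  // prints 9.91
```

A regression test for a new index like this can reuse the sample texts and hand-counted values from the existing test cases.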

Unit Testing in a Web Environment

When I have spoken with developers about unit testing in PHP in the past, they have often said, “PHP is a Web-centric language, and it’s really hard to unit test Web pages.” This is not really true, however.

With just a reasonable separation of presentation logic from business logic, the vast majority of application code can be unit tested and certified completely independently of the Web. The small portion of code that cannot be tested independently of the Web can be validated through the curl extension.


About curl - curl is a client library that supports file transfer over an incredibly wide variety of Internet protocols (for example, FTP, HTTP, HTTPS, LDAP). The best part about curl is that it provides highly granular access to the requests and responses, making it easy to emulate a client browser. To enable curl, you must either configure PHP by using --with-curl if you are building it from source code, or you must ensure that your binary build has curl enabled.


We will talk about user authentication in much greater depth in Chapter 13, “User Authentication and Session Security,” but for now let’s evaluate a simple example. You can write a simple inline authentication system that attempts to validate a user based on his or her user cookie. If the cookie is found, this HTML comment is added to the page:

<!-- crafted for NAME -->

First, you need to create a unit test. You can use curl to send a user=george cookie to the authentication page and then try to match the comment that should be set for that user. For completeness, you can also test to make sure that if you do not pass a cookie, you do not get authenticated. Here’s how you do all this:

<?php
require_once "PHPUnit/Framework/TestCase.php";

// WebAuthTestCase is an abstract class which just sets up the
// url for testing but runs no actual tests.
class WebAuthTestCase extends PHPUnit_Framework_TestCase {
  public $curl_handle;
  public $url;
  function __construct($name) {
    parent::__construct($name);
  }
  function setUp() {
    // initialize curl
    $this->curl_handle = curl_init();
    // set curl to return the response back to us after curl_exec
    curl_setopt($this->curl_handle, CURLOPT_RETURNTRANSFER, 1);
    // set the url
    $this->url = "http://devel.omniti.com/auth.php";
    curl_setopt($this->curl_handle, CURLOPT_URL, $this->url);
  }
  function tearDown() {
    // close our curl session when we're finished
    curl_close($this->curl_handle);
  }
}

// WebGoodAuthTestCase implements a test of successful authentication
class WebGoodAuthTestCase extends WebAuthTestCase {
  function __construct($name) {
    parent::__construct($name);
  }
  function testGoodAuth() {
    $user = 'george';
    // Construct a user=NAME cookie
    $cookie = "user=$user;";
    // Set the cookie to be sent
    curl_setopt($this->curl_handle, CURLOPT_COOKIE, $cookie);
    // execute our query
    $ret = curl_exec($this->curl_handle);
    $this->assertRegExp("/<!-- crafted for $user -->/", $ret);
  }
}

// WebBadAuthTestCase implements a test of unsuccessful authentication
class WebBadAuthTestCase extends WebAuthTestCase {
  function __construct($name) {
    parent::__construct($name);
  }
  function testBadAuth() {
    // Don't pass a cookie; just execute our query
    $ret = curl_exec($this->curl_handle);
    if(preg_match("/<!-- crafted for /", $ret)) {
      $this->fail();
    }
    else {
      $this->pass();
    }
  }
}
if(realpath($_SERVER['PHP_SELF']) == __FILE__) {
  require_once "PHPUnit/Framework/TestSuite.php";
  require_once "PHPUnit/TextUI/TestRunner.php";
  $suite = new PHPUnit_Framework_TestSuite('WebGoodAuthTestCase');
  $suite->addTestSuite("WebBadAuthTestCase");
  PHPUnit_TextUI_TestRunner::run($suite);
}
?>

In contrast with the unit test, the test page is very simple. It is just a small block that emits the HTML comment when a user cookie is present:

<HTML>
<BODY> 
<?php
 if(isset($_COOKIE['user'])) {
  echo "<!-- crafted for $_COOKIE[user] -->";
 }
?> 
<?php print_r($_COOKIE) ?> 
Hello World.
</BODY>
</HTML>

This test is extremely rudimentary, but it illustrates how you can use curl and simple pattern matching to easily simulate Web traffic. In Chapter 13, “User Authentication and Session Security,” which discusses session management and authentication in greater detail, you use this WebAuthTestCase infrastructure to test some real authentication libraries.
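
Outside of PHPUnit, the same idea can be reduced to two small functions: one that fetches a page with an optional cookie, and one that looks for the comment in the response. This is a sketch only; the names fetchWithCookie() and craftedUser() are hypothetical, and the capture pattern assumes the user name contains no whitespace.

```php
<?php
// Hypothetical helpers; the function names are assumptions for illustration.

// Fetch a URL, optionally sending a cookie string such as "user=george;".
// Returns the response body, or false if the request fails.
function fetchWithCookie($url, $cookie = null)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // return the response instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    if ($cookie !== null) {
        curl_setopt($ch, CURLOPT_COOKIE, $cookie);
    }
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Return the user name found in the "crafted for" comment, or null if none.
function craftedUser($html)
{
    if (preg_match('/<!-- crafted for (\S+) -->/', $html, $m)) {
        return $m[1];
    }
    return null;
}

// The matching logic can be exercised without any network access:
var_dump(craftedUser("<BODY><!-- crafted for george --></BODY>"));
var_dump(craftedUser("<BODY>Hello World.</BODY>"));
```

Against the test page, a call such as craftedUser(fetchWithCookie("http://devel.omniti.com/auth.php", "user=george;")) would be expected to return the user name when authentication succeeds. Keeping the pattern match separate from the HTTP request means the matching logic itself can be unit tested offline.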

Further Reading

An excellent source for information on unit testing is Test Driven Development By Example by Kent Beck (Addison-Wesley). The book uses Java and Python examples, but its approach is relatively language agnostic. Another excellent resource is the JUnit homepage, at http://www.junit.org.

If you are interested in learning more about the Extreme Programming methodology, see Testing Extreme Programming, by Lisa Crispin and Tip House (Addison-Wesley), and Extreme Programming Explained: Embrace Change, by Kent Beck (Addison-Wesley), which are both great books.

Refactoring: Improving the Design of Existing Code, by Martin Fowler (Addison-Wesley), is an excellent text that discusses patterns in code refactoring. The examples in the book focus on Java, but the patterns are very general.

There are a huge number of books on qualitative analysis of readability, but if you are primarily interested in learning about the actual formulas used, you can do a Google search on readability score to turn up a number of high-quality results.
