
Bug Report 1 - PHP

In this final part of a three-part series on unit testing, we discuss the use of graphical interfaces, unit testing in a web environment, and more. The article is excerpted from chapter six of the book Advanced PHP Programming, written by George Schlossnagle (Sams; ISBN: 0672325616).

TABLE OF CONTENTS:
  1. Graphical Interfaces and Unit Testing
  2. Testing the Word Class
  3. Bug Report 1
  4. Unit Testing in a Web Environment
By: Sams Publishing
November 02, 2006

Sure enough, once you begin testing the code you created in the previous sections, bug reports start to arrive. The sentence counts are off for texts that contain abbreviations (for example, Dear Mr. Smith): they come back too high, which skews the Flesch scores.

You can quickly add a test case to confirm this bug. The tests you ran earlier should have caught it but didn't, because the sample text contained no abbreviations. You don't want to replace your old test case (you should never casually remove test cases unless the test itself is broken); instead, you should add a case that runs the same statistical checks on another document, one that does contain abbreviations. Because you want to change only the data being tested and not the tests themselves, you can avoid writing the new TestCase object from scratch by subclassing the TextTestCase class and overriding its setUp method. Here's how you do it:

class AbbreviationTestCase extends TextTestCase {
  // Only the fixture data changes; the inherited tests run unmodified.
  function setUp() {
    $this->sample = "
Dear Mr. Smith,
Your request for a leave of absence has been
approved. Enjoy your vacation.
";
    $this->numSentences = 2;
    $this->numWords = 16;
    $this->numSyllables = 24;
    $this->object = new Text_Statistics($this->sample);
  }
  function __construct($name) {
    parent::__construct($name);
  }
}

Sure enough, the bug is there: the period in Mr. is counted as the end of a sentence. You can try to avoid this problem by removing the periods from common abbreviations. To do this, you add a list that maps each common abbreviation to a replacement with its punctuation stripped. You make this list a static attribute of Text_Statistics and then apply the substitutions in analyze_line. Here's the code for this:

class Text_Statistics {
  // ...
  static $abbreviations = array('/Mr\./'   => 'Mr',
                                '/Mrs\./i' => 'Mrs',
                                '/etc\./i' => 'etc',
                                '/Dr\./i'  => 'Dr',
                                );
  // ...
  protected function analyze_line($line) {
    // replace our known abbreviations
    $line = preg_replace(array_keys(self::$abbreviations),
                         array_values(self::$abbreviations),
                         $line);
    preg_match_all("/\b(\w[\w'-]*)\b/", $line, $words);
    foreach($words[1] as $word) {
      $word = strtolower($word);
      $w_obj = new Text_Word($word);
      $this->numSyllables += $w_obj->numSyllables();
      $this->numWords++;
      // count each distinct word once, the first time it is seen
      if(!isset($this->_uniques[$word])) {
        $this->_uniques[$word] = 1;
        $this->uniqWords++;
      }
    }
    // every remaining '.', '!' or '?' ends a sentence
    preg_match_all("/[.!?]/", $line, $matches);
    $this->numSentences += count($matches[0]);
  }
}
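To see the effect of the substitution in isolation, here is a minimal, self-contained sketch. The functions countSentencesNaive() and countSentencesWithAbbreviations() are hypothetical helpers written for this illustration; they are not part of Text_Statistics and only mirror the two regular-expression steps used in analyze_line.

```php
<?php
// Hypothetical helpers for illustration only; not part of Text_Statistics.

// Count every '.', '!' or '?' as a sentence terminator.
function countSentencesNaive($text) {
    return preg_match_all('/[.!?]/', $text, $matches);
}

// Strip the period from known abbreviations first, then count.
function countSentencesWithAbbreviations($text) {
    $abbreviations = array(
        '/Mr\./'   => 'Mr',
        '/Mrs\./i' => 'Mrs',
        '/etc\./i' => 'etc',
        '/Dr\./i'  => 'Dr',
    );
    $text = preg_replace(array_keys($abbreviations),
                         array_values($abbreviations),
                         $text);
    return preg_match_all('/[.!?]/', $text, $matches);
}

$sample = "Dear Mr. Smith,\n"
        . "Your request for a leave of absence has been\n"
        . "approved. Enjoy your vacation.";

echo countSentencesNaive($sample), "\n";              // 3: the period in "Mr." is counted
echo countSentencesWithAbbreviations($sample), "\n";  // 2
```

One limitation of this approach is that an abbreviation such as etc. at the very end of a sentence loses the only period that marked the sentence boundary, so that sentence goes uncounted.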

The sentence count is now correct, but the syllable count is off. Mr. is counted as only one syllable because it contains no vowels. To handle this, you can have the abbreviation list not only strip the punctuation but also expand each abbreviation to its full word for the purposes of counting syllables. Here's the code that does this:

class Text_Statistics {
  // ...
  static $abbreviations = array('/Mr\./'   => 'Mister',
                                '/Mrs\./i' => 'Misses',   // Phonetic
                                '/etc\./i' => 'etcetera',
                                '/Dr\./i'  => 'Doctor',
                                );
  // ...
}
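The following sketch shows why the expansion changes the syllable count. It assumes a deliberately crude estimator, roughSyllables(), that counts one syllable per run of vowels with a floor of one; this hypothetical function stands in for Text_Word and is not the algorithm the class actually uses.

```php
<?php
// Hypothetical, deliberately crude estimate: one syllable per run of
// vowels, with a minimum of one. This is NOT the Text_Word algorithm.
function roughSyllables($word) {
    $groups = preg_match_all('/[aeiouy]+/i', $word, $matches);
    return max(1, $groups);
}

// Same expansion list as in the listing above.
$expansions = array(
    '/Mr\./'   => 'Mister',
    '/Mrs\./i' => 'Misses',
    '/etc\./i' => 'etcetera',
    '/Dr\./i'  => 'Doctor',
);

$line = 'Dear Mr. Smith';
$expanded = preg_replace(array_keys($expansions),
                         array_values($expansions),
                         $line);

echo $expanded, "\n";                 // Dear Mister Smith
echo roughSyllables('Mr'), "\n";      // 1: no vowels, so the floor applies
echo roughSyllables('Mister'), "\n";  // 2
```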

There are still many improvements you can make to the Text_Statistics routine. The $silentSyllable and $additionalSyllable arrays for tracking exceptional cases are a good start, but there is still much work to do. Similarly, the abbreviations list is quite limited at this point and could easily be expanded. Adding multilingual support by extending the classes is an option, as is expanding the statistics to include other readability indexes (for example, the Gunning FOG index, the SMOG index, the Flesch-Kincaid grade estimation, the Powers-Sumner-Kearl formula, and the FORCAST formula). These changes are all straightforward, and with the regression tests in place, it is easy to verify that a modification to any one of them does not affect current behavior.
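As an example of one such extension, here is a minimal sketch of the Flesch-Kincaid grade estimation alongside the Flesch Reading Ease score, using the standard published coefficients. The function names are hypothetical and the functions are standalone; in practice the three counts would come from a Text_Statistics object, and how that class exposes them is an assumption left open here.

```php
<?php
// Hypothetical standalone helpers. Callers must pass nonzero word and
// sentence counts, since both are used as divisors.

// Flesch Reading Ease: higher scores indicate easier text.
function fleschReadingEase($words, $sentences, $syllables) {
    return 206.835
         - 1.015 * ($words / $sentences)
         - 84.6 * ($syllables / $words);
}

// Flesch-Kincaid grade level: an approximate U.S. school grade.
function fleschKincaidGrade($words, $sentences, $syllables) {
    return 0.39 * ($words / $sentences)
         + 11.8 * ($syllables / $words)
         - 15.59;
}

// The counts that AbbreviationTestCase expects for its sample letter.
printf("%.2f\n", fleschReadingEase(16, 2, 24));
printf("%.2f\n", fleschKincaidGrade(16, 2, 24));
```

Because both formulas depend only on the word, sentence, and syllable counts, the existing regression tests on those counts also guard the inputs to any new index built this way.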



 
 