Graphical Interfaces and Unit Testing
Bug Report 1
In this final part of a three-part series on unit testing, we discuss the use of graphical interfaces, unit testing in a web environment, and more. The article is excerpted from chapter six of the book Advanced PHP Programming, written by George Schlossnagle (Sams; ISBN: 0672325616).
Sure enough, when you begin testing the code you created in the previous sections, you begin receiving bug reports. The sentence counts are off for texts that contain abbreviations (for example, Dear Mr. Smith). The reported sentence counts are too high, which skews the Flesch scores.
You can quickly add a test case to confirm this bug. The tests you ran earlier should have caught this bug but didn't because there were no abbreviations in the text. You don't want to replace your old test case (you should never casually remove test cases unless the test itself is broken); instead, you should add an additional case that runs the previous statistical checks on another document that contains abbreviations. Because you want to change only the data that you are testing on and not any of the tests themselves, you can save yourself the effort of writing this new TestCase object from scratch by simply subclassing the TextTestCase class and overriding the setUp method. Here's how you do it:
class AbbreviationTestCase extends TextTestCase {
    // Reuse the inherited tests; only the fixture data changes.
    function setUp() {
        $this->sample = "
Dear Mr. Smith,
Your request for a leave of absence has been
approved. Enjoy your vacation.
";
        $this->numSentences = 2;
        $this->numWords = 16;
        $this->numSyllables = 24;
        $this->object = new Text_Statistics($this->sample);
    }
    function __construct($name) {
        parent::__construct($name);
    }
}
Sure enough, the bug is there. The period in Mr. is matched as the end of a sentence. You can try to avoid this problem by removing the periods from common abbreviations. To do this, you need to add a list of common abbreviations and expansions that strip the abbreviations of their punctuation. You make this a static attribute of Text_Statistics and then substitute on that list during analyze_line. Here's the code for this:
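As a minimal sketch of that substitution, and not the book's own listing, the example below uses standalone functions instead of a static attribute of Text_Statistics. The ABBREVIATIONS list, both function names, and the naive sentence-counting rule are assumptions made for illustration.

```php
<?php
// Hypothetical abbreviation list: regex pattern => the same token
// with its trailing period removed.
const ABBREVIATIONS = [
    '/\bMr\./'  => 'Mr',
    '/\bMrs\./' => 'Mrs',
    '/\bDr\./'  => 'Dr',
];

// Strip the periods from known abbreviations so they no longer
// look like sentence endings.
function stripAbbreviationPeriods(string $text): string
{
    return preg_replace(
        array_keys(ABBREVIATIONS),
        array_values(ABBREVIATIONS),
        $text
    );
}

// Naive sentence counter (an assumption, not the logic used by
// Text_Statistics): count runs of terminal punctuation that are
// followed by whitespace or the end of the text.
function countSentences(string $text): int
{
    return preg_match_all('/[.!?]+(?=\s|$)/', $text);
}

$sample = "Dear Mr. Smith,\n"
        . "Your request for a leave of absence has been\n"
        . "approved. Enjoy your vacation.\n";

$before = countSentences($sample);                           // 3: "Mr." is miscounted
$after  = countSentences(stripAbbreviationPeriods($sample)); // 2
```

With the substitution applied before counting, the sample letter yields the two sentences that the test case expects.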
The sentence count is now correct, but the syllable count is off. It seems that Mr. counts as only one syllable (because it has no vowels). To handle this, you can extend the abbreviation list so that it not only eliminates punctuation but also expands each abbreviation to its full word for the purposes of counting syllables. Here's the code that does this:
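A rough sketch of the expansion idea follows; it is not the book's listing. The ABBREVIATION_EXPANSIONS list, the function names, and the vowel-group syllable estimate are all illustrative assumptions, and the real Text_Statistics syllable logic is more involved.

```php
<?php
// Hypothetical expansion list: regex pattern => full word. Expanding
// removes the period and gives the syllable counter vowels to work with.
const ABBREVIATION_EXPANSIONS = [
    '/\bMr\./' => 'Mister',
    '/\bDr\./' => 'Doctor',
];

function expandAbbreviations(string $text): string
{
    return preg_replace(
        array_keys(ABBREVIATION_EXPANSIONS),
        array_values(ABBREVIATION_EXPANSIONS),
        $text
    );
}

// Crude syllable estimate (an assumption): one syllable per run of
// consecutive vowels, with a minimum of one per word.
function estimateSyllables(string $word): int
{
    $groups = preg_match_all('/[aeiouy]+/i', $word);
    return max(1, $groups);
}

$abbreviated = estimateSyllables('Mr');                        // 1: no vowels
$expanded    = estimateSyllables(expandAbbreviations('Mr.'));  // 2: "Mister"
```

Because the expansion runs before any counting, the same substitution fixes both the sentence count and the syllable count.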
There are still many improvements you can make to the Text_Statistics routine. The $silentSyllable and $additionalSyllable arrays for tracking exceptional cases are a good start, but there is still much work to do. Similarly, the abbreviations list is pretty limited at this point and could easily be expanded as well. Adding multilingual support by extending the classes is an option, as is expanding the statistics to include other readability indexes (for example, the Gunning FOG index, the SMOG index, the Flesch-Kincaid grade estimation, the Powers-Sumner-Kearl formula, and the FORCAST Formula). All these changes are easy, and with the regression tests in place, it is easy to verify that modifications to any one of them do not affect current behavior.
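To illustrate how one of these additional indexes could sit alongside the Flesch score, here is a small sketch using the standard Flesch Reading Ease and Flesch-Kincaid grade level formulas. The function names and the standalone layout are assumptions; Text_Statistics itself may expose these values differently.

```php
<?php
// Both indexes are computed from the same three counts.
// Function names are hypothetical, not part of Text_Statistics.

function assertCounts(int $sentences, int $words): void
{
    if ($sentences < 1 || $words < 1) {
        throw new InvalidArgumentException('Need at least one sentence and one word.');
    }
}

// Flesch Reading Ease: higher scores mean easier text.
function fleschReadingEase(int $sentences, int $words, int $syllables): float
{
    assertCounts($sentences, $words);
    return 206.835
        - 1.015 * ($words / $sentences)
        - 84.6 * ($syllables / $words);
}

// Flesch-Kincaid grade level: approximate U.S. school grade.
function fleschKincaidGrade(int $sentences, int $words, int $syllables): float
{
    assertCounts($sentences, $words);
    return 0.39 * ($words / $sentences)
        + 11.8 * ($syllables / $words)
        - 15.59;
}

// Counts taken from the AbbreviationTestCase fixture.
$ease  = fleschReadingEase(2, 16, 24);
$grade = fleschKincaidGrade(2, 16, 24);
```

For the fixture counts of 2 sentences, 16 words, and 24 syllables, the reading ease comes to about 71.8 and the grade level to about 5.2. A regression test can pin such values so that later changes to the abbreviation or syllable handling are caught when they shift the scores.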