Tracking Parsing Errors with the Tidy Library in PHP 5

Creating well-formed (X)HTML documents can be a hard task, particularly for PHP developers who need to focus mainly on the data and business layers of their web applications, and not on their visual presentation modules. However, this issue can be addressed with minimal hassle with the assistance of the excellent Tidy extension.

Introduction

Welcome to the last tutorial of the series "Working with the Tidy library in PHP 5." As you might have guessed, this series offers a friendly guide to using the most important functions that come bundled with the Tidy library, so you can start quickly incorporating them into your own PHP applications.

Going straight to the subject of this series, you’ll remember that in the preceding article I showed you how to dissect and extract different parts of a given (X)HTML string (or even an (X)HTML file) for further processing. More concretely, I demonstrated how to use the straightforward "tidy_get_html()," "tidy_get_head()" and "tidy_get_body()" functions, all included with the Tidy extension, to retrieve the entire content of a specific (X)HTML string and to extract its <head> and <body> sections respectively.

While it must be admitted that retrieving the distinct parts of a concrete (X)HTML string might not be the most useful task for a seasoned PHP developer, it’s worth mentioning here that the Tidy library comes equipped with a remarkable set of functions for breaking up a web document into its main sections, which makes it simpler to parse the different nodes of the document in question.

All right, at this stage you’ve hopefully learned how to parse and fix the format of a given (X)HTML file via its specific set of Tidy functions, in addition to splitting a file into its most important pieces. So, the question that comes up is the following: what is the next step? Well, from a PHP developer’s point of view, tracking all the errors that occurred when parsing a concrete (X)HTML string might be quite useful. Therefore, in this final tutorial of the series I’m going to cover some new functions bundled with the Tidy extension which are designed to show you the potential errors raised when interpreting (X)HTML data.

Now, with the preliminaries out of our way, it’s time to tackle this last article of the series and learn how to handle parsing errors with the Tidy extension. Let’s move on!

Summarizing some Tidy library concepts

Before I move on, I’d first like to remind you of the most important topics treated in the preceding article of the series. This review should help you see more easily how the Tidy functions learned in that tutorial can be linked with the ones that I plan to cover in a few moments.

Having said that, below I’ve listed some basic examples of how to use the hopefully familiar "tidy_get_html()," "tidy_get_head()" and "tidy_get_body()" functions, which are tasked with extracting the different sections of a specified (X)HTML string. Here are the corresponding code samples; please have a look at them:

// example on using the ‘tidy_get_html()’ function

$html='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html
xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing
Tidy</title></head><body><p>Testing Tidy</p></body></html>';
$tidy=tidy_parse_string($html);
$htmlNode=tidy_get_html($tidy);
echo $htmlNode->value;

/* displays the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Testing Tidy</title>
</head>
<body><p>Testing Tidy</p>
</body>
</html>
*/

// example on using the ‘tidy_get_head()’ function

$html='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html
xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing
Tidy</title></head><body><p>Testing Tidy</p></body></html>';
$tidy=tidy_parse_string($html);
$headNode=tidy_get_head($tidy);
echo $headNode->value;

/* displays the following:
<head>
  <title>Testing Tidy</title>
</head>
*/

// example on using the ‘tidy_get_body()’ function

$html='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html
xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing
Tidy</title></head><body><p>Testing Tidy</p></body></html>';
$tidy=tidy_parse_string($html);
$bodyNode=tidy_get_body($tidy);
echo $bodyNode->value;

/* displays the following:
<body>
  <p>Testing Tidy</p>
</body>
*/

Hopefully, after studying the respective signatures for the three previous examples, you’ll recall how easy it is to extract the distinct sections of a specified (X)HTML string using the functions provided by the Tidy library. Nonetheless, I believe that these functions in particular don’t bear any additional discussion, since they were already covered in detail in the previous tutorial of the series.

Therefore, it’s a good time to move forward and start looking into the group of Tidy functions aimed specifically at handling all the errors that occur when parsing a concrete (X)HTML string.

Want to see how these brand new functions are used in practice? Keep reading.

Using the tidy_get_error_buffer() function

As I stated in the section that you just read, not surprisingly the Tidy library comes equipped with a neat group of functions whose primary capacity is to return to calling code all the errors raised when parsing a specified (X)HTML string.

Let me show you the first function that belongs to this specific group. It’s called "tidy_get_error_buffer()," and it allows you to easily retrieve, from Tidy’s error buffer, all the errors and warnings raised when interpreting a concrete (X)HTML string. The corresponding code sample is as follows:

// example on using the ‘tidy_get_error_buffer()’ function
$html='<p>This paragraph will be parsed by tidy</p>';
$tidy=tidy_parse_string($html);
echo tidy_get_error_buffer($tidy);

/* displays the following:
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: inserting missing 'title' element
*/

As illustrated above, the "tidy_get_error_buffer()" function returns, as a single string, all the errors and warnings that were raised while parsing a concrete (X)HTML string. In this case, I coded an incomplete (X)HTML fragment that lacks a <!DOCTYPE> declaration and a <title> element, and then used the function to show the pertinent warnings raised by "tidy_parse_string()." Not too complex, right?
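Since the error buffer comes back as plain text, it can be handy to break it into individual entries before logging or displaying them. The snippet below is only a minimal sketch of that idea: "parse_tidy_messages()" is a hypothetical helper of my own, not a function of the Tidy extension, and it assumes that each message follows the "line N column N - Type: message" layout shown in the output above.

```php
<?php
// Hypothetical helper (not part of the Tidy extension): turns the text
// returned by tidy_get_error_buffer() into an array of structured entries.
function parse_tidy_messages($buffer)
{
    $entries = array();
    foreach (preg_split('/\r?\n/', trim((string) $buffer)) as $line) {
        // Assumed layout: "line 1 column 1 - Warning: some message"
        if (preg_match('/^line (\d+) column (\d+) - (\w+): (.*)$/', $line, $m)) {
            $entries[] = array(
                'line'    => (int) $m[1],
                'column'  => (int) $m[2],
                'type'    => $m[3],
                'message' => $m[4],
            );
        }
    }
    return $entries;
}

// Only call Tidy when the extension is actually loaded.
if (function_exists('tidy_parse_string')) {
    $tidy = tidy_parse_string('<p>This paragraph will be parsed by tidy</p>');
    foreach (parse_tidy_messages(tidy_get_error_buffer($tidy)) as $entry) {
        echo $entry['type'], ' at line ', $entry['line'], ': ', $entry['message'], "\n";
    }
}
```

Lines that don’t match the assumed layout are simply skipped, so the helper returns an empty array when the buffer is empty.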

However, as I said earlier, the Tidy library comes with a remarkable set of functions for retrieving all of the potential errors triggered when parsing some (X)HTML data. Therefore, in the following section I’m going to teach you how to use these additional error-handling functions.

Using the tidy_access_count(), tidy_error_count() and tidy_warning_count() functions

Counting the errors and warnings raised when parsing a specified (X)HTML string can be a no-brainer process with the assistance of some handy functions included with the Tidy extension.

In this case, I’m talking about the useful "tidy_access_count()," "tidy_error_count()" and "tidy_warning_count()" functions. As their names suggest, they return the number of accessibility warnings, errors and warnings, respectively, that Tidy encountered when interpreting and fixing a given (X)HTML string.

All right, having introduced these brand new error-handling functions bundled with the Tidy extension, I’m going to show you some hands-on examples which demonstrate in a friendly fashion how to use the functions in question in some concrete cases. Please take a look at them:

// example on using the ‘tidy_access_count()’ function

$html='<html><head><title>This file will be parsed by
Tidy</title></head><body><p>This is paragraph</p></body></html>';
// set accessibility check level: 1, 2 or 3
$params=array('accessibility-check'=>3);
$tidy=tidy_parse_string($html,$params);
$tidy->cleanRepair();
// diagnose() must be called before tidy_access_count()
$tidy->diagnose();
echo tidy_access_count($tidy);

/* displays the following:
5
*/

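// example on using the ‘tidy_error_count()’ function

// the unrecognized <bogustag> element is what Tidy reports as an error
$html='<p>This is an erroneous line</i><bogustag>bogus</bogustag>';
$tidy=tidy_parse_string($html);
echo 'Number of errors encountered when parsing string is the following: '.tidy_error_count($tidy);

/* displays the following:
Number of errors encountered when parsing string is the following: 1
*/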

// example on using the ‘tidy_warning_count()’ function

$html='<p>This is an erroneous line</i>';
$tidy=tidy_parse_string($html);
echo 'Number of warnings encountered when parsing string is the following: '.tidy_warning_count($tidy);

/* displays the following:
Number of warnings encountered when parsing string is the following: 4
*/

As illustrated above, the first example uses the "tidy_access_count()" function to display the number of accessibility warnings found in a sample (X)HTML string. Also, it’s worth noting here that this function is used along with another one named "diagnose()." Because of the way the underlying Tidy library is designed, "diagnose()" has to be called before "tidy_access_count()," and the "accessibility-check" option has to be enabled; otherwise the count is always 0.

Now that I have clarified the issue surrounding the implementation of the "tidy_access_count()" function, I will explain the second hands-on example. In this case, the simple "tidy_error_count()" function returns the number of errors Tidy found when parsing a sample (X)HTML string. Keep in mind that Tidy distinguishes errors, such as an unrecognized tag, from warnings, such as a stray closing tag or a missing <!DOCTYPE> declaration, and that the exact counts may vary with the version of the library.

And finally, the third example demonstrates how to count the number of warnings raised when parsing a malformed sample (X)HTML string, utilizing the "tidy_warning_count()" function.
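If you need the three counters together, they can be collected in one place. What follows is just a sketch under a few assumptions: "collect_tidy_counts()" is a hypothetical wrapper that I made up for illustration, it uses the procedural "tidy_clean_repair()" and "tidy_diagnose()" functions in place of the methods shown earlier, and it returns null when the Tidy extension isn’t loaded or the string can’t be parsed.

```php
<?php
// Hypothetical wrapper (not part of the Tidy extension): gathers the
// error, warning and accessibility counters for a single (X)HTML string.
function collect_tidy_counts($html, $config = array())
{
    if (!function_exists('tidy_parse_string')) {
        return null; // Tidy extension not available
    }
    $tidy = tidy_parse_string($html, $config);
    if ($tidy === false) {
        return null; // parsing failed
    }
    tidy_clean_repair($tidy);
    // tidy_diagnose() has to run before tidy_access_count() reports anything.
    tidy_diagnose($tidy);

    return array(
        'errors'   => tidy_error_count($tidy),
        'warnings' => tidy_warning_count($tidy),
        'access'   => tidy_access_count($tidy),
    );
}

$counts = collect_tidy_counts(
    '<p>This is an erroneous line</i>',
    array('accessibility-check' => 3)
);
if ($counts !== null) {
    foreach ($counts as $label => $number) {
        echo $label, ': ', $number, "\n";
    }
}
```

I’ve left out the expected output on purpose, since the numbers depend on the accessibility level you choose and on the version of the Tidy library installed on your server.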

In addition to the error-handling functions discussed above, I’d like to show you one more, named "tidy_get_status()," which comes in handy for determining the status of a tidy object after parsing a couple of badly-formatted (X)HTML strings.

The corresponding code sample is as follows:  

// example on using the ‘tidy_get_status()’ function

// a stray closing tag only raises warnings
$badhtml1='<p>This is an erroneous line</i>';
$tidyObj1=tidy_parse_string($badhtml1);
// an unrecognized tag raises an error
$badhtml2='<bogus>This is another erroneous line</bogus>';
$tidyObj2=tidy_parse_string($badhtml2);
echo 'Status of tidy object is the following: '.tidy_get_status($tidyObj1);

/* displays the following:
Status of tidy object is the following: 1
*/

echo 'Status of tidy object is the following: '.tidy_get_status($tidyObj2);

/* displays the following:
Status of tidy object is the following: 2
*/

As you can see, the above hands-on example simply shows how the status of a tidy object, which is returned by the already familiar "tidy_parse_string()" function, changes according to the problems found when interpreting a specified (X)HTML string. The "tidy_get_status()" function returns 0 when no errors or warnings were raised, 1 when warnings or accessibility errors were raised, and 2 when errors were raised.
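Because a bare number isn’t very expressive, you may want to translate the status into a readable label. Here’s a small sketch of how that could look; "describe_tidy_status()" and its label texts are hypothetical and only serve to illustrate the three values that "tidy_get_status()" can return.

```php
<?php
// Hypothetical helper (not part of the Tidy extension): maps the integer
// returned by tidy_get_status() to a human-readable label.
function describe_tidy_status($status)
{
    switch ((int) $status) {
        case 0:
            return 'no errors or warnings';
        case 1:
            return 'warnings or accessibility errors';
        case 2:
            return 'errors';
        default:
            return 'unknown status';
    }
}

// Only call Tidy when the extension is actually loaded.
if (function_exists('tidy_parse_string')) {
    $tidy = tidy_parse_string('<p>This is an erroneous line</i>');
    echo 'Status of tidy object: ', describe_tidy_status(tidy_get_status($tidy)), "\n";
}
```

Keeping the mapping in its own function means the labels can be reused for any tidy object, whether it comes from a string or a file.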

Even though the prior function clearly has rather limited utility in certain cases, it deserves at least a basic analysis to complete the coverage of the error-handling functions included with the Tidy extension.

Final thoughts

That’s all for the moment. Sadly, we’ve come to the end of this series, but hopefully the experience has been pretty instructive. As you learned in these three tutorials, the Tidy library can be really useful if you’re a PHP developer who doesn’t spend much time formatting the (X)HTML documents included in your web applications.

See you in the next PHP tutorial!
