Parsing Web Document Nodes with the Tidy Library in PHP 5
Using the tidy_get_html() and tidy_get_head() functions
Writing well-formatted (X)HTML code to include in the presentation layers of certain PHP applications can be an annoying and time-consuming process for many web developers. However, the Tidy extension that is bundled with PHP 5 can turn this ugly task into a pleasant experience. Keep reading to learn how.
Admittedly, breaking an (X)HTML string into different parts for further processing isn't a task that web developers have to tackle very often. Regardless, the Tidy library has a respectable number of functions aimed precisely at extracting specific sections of a given (X)HTML string.
More specifically, Tidy offers two functions, called "tidy_get_html()" and "tidy_get_head()", which take a parsed (X)HTML string and return its <html> and <head> elements, respectively, as tidyNode objects that can be processed separately.
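Since both functions return node objects rather than plain strings, it may help to peek at what they hand back before looking at the full examples. The following is a minimal sketch, assuming the tidy extension is enabled; it also calls tidy::html() and tidy::head(), the object-oriented counterparts of the two functions. The shortened sample markup is an assumption made for brevity.

```php
<?php
// Minimal sketch: inspect the tidyNode objects returned by tidy_get_html() and tidy_get_head().
// Assumes the tidy extension is enabled; the sample markup is illustrative only.
$html = '<html><head><title>Testing Tidy</title></head><body><p>Testing Tidy</p></body></html>';

// Parse the string into a tidy object
$tidy = tidy_parse_string($html);

// Procedural API: each call returns a tidyNode object, not a string
$htmlNode = tidy_get_html($tidy);
$headNode = tidy_get_head($tidy);

// Object-oriented equivalents called on the same tidy object
$htmlNodeOop = $tidy->html();
$headNodeOop = $tidy->head();

// The "name" property holds the element's tag name
echo $htmlNode->name, "\n";
echo $headNode->name, "\n";

// Both APIs point at the same element, so the markup in "value" matches
echo ($headNode->value === $headNodeOop->value) ? "same head markup\n" : "different markup\n";
```

The procedural and object-oriented calls are interchangeable here; which one to use is mostly a matter of the coding style already used in the application.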
But let me skip these boring explanations and show you a couple of illustrative examples of how to use these Tidy functions. Here are the corresponding code samples:
// example of using the 'tidy_get_html()' function
$html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing Tidy</title></head><body><p>Testing Tidy</p></body></html>';
// parse the string into a tidy object
$tidy = tidy_parse_string($html);
// get the <html> element as a tidyNode object
$htmlNode = tidy_get_html($tidy);
// output the markup of the node and everything it contains
echo $htmlNode->value;
/* displays output along these lines (the DOCTYPE is a separate node, so it is not part of the <html> node's value):
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Testing Tidy</title>
</head>
<body>
<p>Testing Tidy</p>
</body>
</html>
*/
// example of using the 'tidy_get_head()' function
$html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing Tidy</title></head><body><p>Testing Tidy</p></body></html>';
// parse the string into a tidy object
$tidy = tidy_parse_string($html);
// get the <head> element as a tidyNode object
$headNode = tidy_get_head($tidy);
// output the markup of the <head> section
echo $headNode->value;
/* displays the following:
<head>
<title>Testing Tidy</title>
</head>
*/
True to form, that's all the source code required to test the "tidy_get_html()" and "tidy_get_head()" functions. As you can see, these examples are very easy to follow, since they demonstrate in a simple fashion how the different sections of a specific (X)HTML string can be extracted separately.
Of course, as you might have guessed, the first hands-on example is rather pointless, simply because the "tidy_get_html()" function returns the <html> element, which holds practically the whole (X)HTML string (everything except the DOCTYPE), as a new node whose markup is then displayed on the browser via its "value" property. The second case, however, is slightly more useful, since it first extracts the <head> part of the sample (X)HTML string and then displays its contents through the same "value" property.
So far, so good, right? At this point I'm pretty certain that you've already grasped the logic behind dissecting an (X)HTML string into different parts for further processing. As you learned from the pair of practical examples shown above, the process boils down to parsing the string with "tidy_parse_string()", calling the appropriate Tidy function to extract the selected part of the string, and finally displaying the pertinent contents on the browser.
However, the Tidy extension has a couple of additional functions that can be useful when it comes to breaking an (X)HTML string into several sections. Therefore, considering that these functions might be interesting to you, in the following section I'm going to show you how to use them to extract the <body> part of a given (X)HTML string, in addition to parsing and fixing the string in question as a single node.
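To make the <head> node a bit more useful than a plain echo, its children can be walked to pull out individual elements. The sketch below finds the document title; it assumes the tidy extension is enabled, findTitle() is a hypothetical helper written for this illustration (not a Tidy function), and the code avoids PHP 7-only syntax so it should also run on PHP 5.

```php
<?php
// Sketch: walk the children of the <head> node to pull out the document's title text.
// findTitle() is a hypothetical helper for this example, not part of the Tidy extension.
function findTitle(tidyNode $head)
{
    if (!$head->hasChildren()) {
        return null;
    }
    foreach ($head->child as $node) {
        // Look for the <title> element among the head's child nodes
        if ($node->name === 'title' && $node->hasChildren()) {
            foreach ($node->child as $textNode) {
                // The title text is stored in a child node of type TIDY_NODETYPE_TEXT
                if ($textNode->type === TIDY_NODETYPE_TEXT) {
                    return trim($textNode->value);
                }
            }
        }
    }
    // No <title> element (or an empty one) was found
    return null;
}

$html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing Tidy</title></head><body><p>Testing Tidy</p></body></html>';
$tidy = tidy_parse_string($html);

echo findTitle(tidy_get_head($tidy)); // Testing Tidy
```

The same child-walking approach would work for other <head> elements, such as <meta> or <link>, by checking for a different node name.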
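The parse, extract and display steps can be folded into one small function. This is only a sketch under a few assumptions: the tidy extension is enabled, getSection() is a hypothetical name chosen for the illustration, and it relies on tidy::html() and tidy::head(), the object-oriented forms of the two functions covered above. Note that echoing raw markup to a browser renders it rather than showing the tags, so the usage line escapes it with htmlspecialchars().

```php
<?php
// Sketch of the parse -> extract -> display workflow in one helper.
// getSection() is a hypothetical name; it only dispatches to tidy::html() or tidy::head().
function getSection($markup, $section)
{
    // Step 1: parse the (X)HTML string into a tidy object
    $tidy = tidy_parse_string($markup);

    // Step 2: extract the requested section as a tidyNode object
    switch ($section) {
        case 'html':
            $node = $tidy->html();
            break;
        case 'head':
            $node = $tidy->head();
            break;
        default:
            throw new InvalidArgumentException('Unsupported section: ' . $section);
    }

    // Step 3: hand back the section's markup for display
    return $node->value;
}

$html = '<html><head><title>Testing Tidy</title></head><body><p>Testing Tidy</p></body></html>';

// Escape the markup so the browser shows the tags instead of rendering them
echo '<pre>' . htmlspecialchars(getSection($html, 'head')) . '</pre>';
```

Throwing on an unknown section name keeps typos from silently producing empty output.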
To learn how these tasks can be performed with the Tidy library, please jump ahead and read the following lines. I'll be there, waiting for you.