
Using the tidy_get_html() and tidy_get_head() functions

Writing well-formatted (X)HTML code to include in the presentation layers of certain PHP applications can be an annoying and time-consuming process for many web developers. However, the Tidy extension that comes integrated with PHP 5 can turn this ugly task into a pleasant experience. Keep reading to learn how.

TABLE OF CONTENTS:
  1. Parsing Web Document Nodes with the Tidy Library in PHP 5
  2. Parsing and formatting basic (X)HTML code with Tidy
  3. Using the tidy_get_html() and tidy_get_head() functions
  4. Using the tidy_get_body() and tidy_get_output() functions
By: Alejandro Gervasio
July 03, 2007

Admittedly, breaking an (X)HTML string into separate parts for further processing isn't something a web developer does every day. Even so, the Tidy library includes several functions aimed precisely at extracting the main sections of an (X)HTML string.

More specifically, Tidy offers two functions, "tidy_get_html()" and "tidy_get_head()", which take a parsed Tidy document and return a node for its <html> element and its <head> section, respectively.

But let me skip the explanations and show you a couple of examples of how to use these Tidy functions. Here are the corresponding code samples:

// example on using the 'tidy_get_html()' function

$html='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html
xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing
Tidy</title></head><body><p>Testing Tidy</p></body></html>';
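// parse the string into a Tidy document
$tidy=tidy_parse_string($html);
// fetch the node that starts at the <html> tag
$htmlNode=tidy_get_html($tidy);
// display the markup held by the node
echo $htmlNode->value;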

/* displays the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Testing Tidy</title>
</head>
<body><p>Testing Tidy</p>
</body>
</html>
*/

// example on using the 'tidy_get_head()' function

$html='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html
xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing
Tidy</title></head><body><p>Testing Tidy</p></body></html>';
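// parse the string into a Tidy document
$tidy=tidy_parse_string($html);
// fetch the node that starts at the <head> tag
$headNode=tidy_get_head($tidy);
// display the markup held by the node
echo $headNode->value;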

/* displays the following:
<head>
<title>Testing Tidy</title>
</head>
*/

That's all the source code required to test the "tidy_get_html()" and "tidy_get_head()" functions. As you can see, both are very easy to use, and the examples demonstrate in a simple fashion how the different sections of an (X)HTML string can be extracted separately.

Of course, as you might have guessed, the first example is of little practical use, because the "tidy_get_html()" function returns the whole (X)HTML string as a new node, which is displayed directly in the browser via its "value" property. The second case is slightly more useful, since it first extracts the <head> section of the sample (X)HTML string and then displays its contents through the same "value" property.
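Since both functions return a node object, the "value" property is not the only thing that can be read from it. Here is a minimal sketch, assuming the Tidy extension is loaded, that prints the node's "name" property and walks its "child" array. The shortened $html sample and the variable names are illustrative only and aren't taken from the examples above.

```php
<?php
// Sketch only: assumes the Tidy extension is enabled in this PHP build.
// The shortened markup below is an illustrative sample.
$html = '<html><head><title>Testing Tidy</title></head>'
      . '<body><p>Testing Tidy</p></body></html>';

$tidy = tidy_parse_string($html);
$headNode = tidy_get_head($tidy);

// 'name' holds the element name of the node
echo $headNode->name, "\n";

// 'child' is an array of tidyNode objects when the node has children
$childNames = array();
if ($headNode->hasChildren()) {
    foreach ($headNode->child as $childNode) {
        $childNames[] = $childNode->name;
        echo ' - ', $childNode->name, "\n";
    }
}
```

Walking the "child" array this way is handy when only one element inside the <head> section is of interest, instead of the whole block of markup stored in "value".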

So far, so good, right? At this point I'm pretty certain that you've grasped the logic behind dissecting an (X)HTML string into different parts for further processing. As you learned from the pair of practical examples shown above, the process comes down to parsing the string, calling the appropriate Tidy function to extract the selected part, and finally displaying its contents in the browser.
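The steps just described can be wrapped in a small helper. What follows is only a sketch under stated assumptions: "getSectionMarkup()" is a hypothetical function name, not part of the Tidy extension, and returning null on failure is a design choice made for this example.

```php
<?php
// Hypothetical helper (not part of Tidy): returns the markup of the
// <html> or <head> node of $html, or null when it can't be extracted.
function getSectionMarkup($html, $section)
{
    // bail out if the Tidy extension isn't available
    if (!extension_loaded('tidy')) {
        return null;
    }
    $tidy = tidy_parse_string($html);
    if (!$tidy) {
        return null;
    }
    // pick the Tidy function that matches the requested section
    switch ($section) {
        case 'html':
            $node = tidy_get_html($tidy);
            break;
        case 'head':
            $node = tidy_get_head($tidy);
            break;
        default:
            return null; // unknown section name
    }
    return $node ? $node->value : null;
}

// illustrative sample markup
$sample = '<html><head><title>Testing Tidy</title></head>'
        . '<body><p>Testing Tidy</p></body></html>';
$headMarkup = getSectionMarkup($sample, 'head');
echo $headMarkup;
```

Returning null instead of echoing inside the helper leaves the calling code free to decide what to display when Tidy is missing or the section name is not recognized.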

However, the Tidy extension still has a couple of extra functions which can be useful when it comes to breaking an (X)HTML string into several sections. Since these functions might be interesting to you, in the following section I'm going to show you how to use them to extract the <body> part of a given (X)HTML string, in addition to parsing and fixing the string in question as a unique node.

To learn how these tasks can be performed with the Tidy library, please read the next section. I'll be there, waiting for you.



 
 