
Parsing and formatting basic (X)HTML code with Tidy

Writing well-formatted (X)HTML code for the presentation layer of a PHP application can be a tedious and time-consuming process for many web developers. However, the Tidy extension bundled with PHP 5 can automate much of this work. Keep reading to learn how.

TABLE OF CONTENTS:
  1. Parsing Web Document Nodes with the Tidy Library in PHP 5
  2. Parsing and formatting basic (X)HTML code with Tidy
  3. Using the tidy_get_html() and tidy_get_head() functions
  4. Using the tidy_get_body() and tidy_get_output() functions
By: Alejandro Gervasio
July 03, 2007


Before I continue discussing the other functions included with the Tidy extension, I'd like to review the functions covered in the preceding article of the series. Doing so should give you a much better idea of how they connect with the new ones that I plan to explain in a few moments.

Having said that, below are some illustrative examples of the hopefully familiar "tidy_parse_string()," "tidy_repair_string()" and "tidy_parse_file()" functions, in that order. All of them were explained in detail in the first tutorial of this series.

Here are the corresponding code samples:

// example of 'tidy_parse_string()' function

<?php
ob_start();
?>
<html>
  <head>
   <title>This file will be parsed by Tidy</title>
  </head>
  <body>
   <p>This is an erroneous line
   <p>This is another erroneous line</i>
  </body>
</html>
<?php
$fileContents=ob_get_clean();
$params=array('indent'=>TRUE,'output-xhtml'=>TRUE,'wrap'=>200);
$tidy=tidy_parse_string($fileContents,$params,'UTF8');
$tidy->cleanRepair();
echo $tidy;

/* displays the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
        This file will be parsed by Tidy
    </title>
  </head>
  <body>
    <p>This is an erroneous line</p>
    <p>This is another erroneous line</p>
  </body>
</html>
*/

// example of 'tidy_repair_string()' function

<?php
ob_start();
?>
<html>
  <head>
   <title>This file will be parsed by Tidy</title>
  </head>
  <body>
   <p>This is an erroneous line
   <p>This is another erroneous line</i>
  </body>
</html>
<?php
$fileContents=ob_get_clean();
$tidy=tidy_repair_string($fileContents);
echo $tidy;

/* displays the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      This file will be parsed by Tidy
    </title>
  </head>
  <body>
    <p>This is an erroneous line</p>
    <p>This is another erroneous line</p>
  </body>
</html>
*/

// example of 'tidy_parse_file()' function
// definition of (target_file.html)

<html>
  <head>
   <title>This file will be parsed by Tidy</title>
  </head>
  <body>
   <p>This is an erroneous line
   <p>This is another erroneous line</i>
  </body>
</html>

<?php
// script that parses the file defined above
$tidy=tidy_parse_file('target_file.html');
$tidy->cleanRepair();
if(!empty($tidy->errorBuffer)){
    trigger_error('Some errors occurred when parsing target file: '.$tidy->errorBuffer,E_USER_ERROR);
}
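
The "tidy_parse_file()" example depends on a file that already exists on disk. As a minimal, self-contained sketch of the same idea, the snippet below writes similar markup to a temporary file first and then reports whatever Tidy stores in its error buffer. The sample markup, the variable names and the use of a temporary file are assumptions made for illustration only; they are not part of the original example.

```php
<?php
// Illustrative markup with the same kinds of mistakes as the sample file above
$markup = '<html><head><title>Sample</title></head>'
        . '<body><p>First line<p>Second line</i></body></html>';

// Write it to a temporary file so the sketch does not rely on target_file.html
$path = tempnam(sys_get_temp_dir(), 'tidy');
file_put_contents($path, $markup);

$report = 'Tidy extension not loaded';
if (extension_loaded('tidy')) {
    $tidy = tidy_parse_file($path, array('output-xhtml' => true), 'utf8');
    if ($tidy === false) {
        $report = 'The file could not be parsed';
    } else {
        $tidy->cleanRepair();
        // errorBuffer holds the warnings and errors Tidy found, as plain text
        $report = empty($tidy->errorBuffer) ? 'No problems reported' : $tidy->errorBuffer;
    }
}

// Remove the temporary file and show the report
unlink($path);
echo $report, "\n";
```

Reporting the buffer with "echo" instead of "trigger_error()" keeps the script running, which may be more convenient while experimenting.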

After working through the examples above, you should recall how the "tidy_parse_string()," "tidy_parse_file()" and "tidy_repair_string()" functions can be used to repair and reformat a web document. Of course, as I stated in the first tutorial of the series, these are simple demonstrations, but I'm sure that they'll be a good starting point for developing more complex (X)HTML parsing applications.
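
As one possible starting point, the repair step can be wrapped in a small reusable function. The sketch below assumes a helper named "repairMarkup()", which is hypothetical and not part of the Tidy extension; it simply forwards to "tidy_repair_string()" with the same options used in the first example, and returns null when the extension is not available or the repair fails.

```php
<?php
// Hypothetical helper (not provided by Tidy): returns the repaired markup,
// or null if the Tidy extension is missing or the repair fails
function repairMarkup($markup)
{
    if (!extension_loaded('tidy')) {
        return null;
    }
    // Same options as the tidy_parse_string() example in this article
    $config = array('indent' => true, 'output-xhtml' => true, 'wrap' => 200);
    $result = tidy_repair_string($markup, $config, 'utf8');
    return $result === false ? null : $result;
}

// Usage: repair a fragment with unclosed and mismatched tags
$clean = repairMarkup('<p>This is an erroneous line<p>This is another erroneous line</i>');
if ($clean === null) {
    echo "Tidy is not available\n";
} else {
    echo $clean;
}
```

Returning null rather than raising an error is only one design choice; an application could just as well throw an exception or fall back to the unmodified markup.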

All right, at this stage you have hopefully recalled how to use, at least at a basic level, the three functions shown above. So, what's the next step? Well, in keeping with the goals outlined in the introduction of this article, I plan to show you a few additional functions integrated with Tidy, whose main purpose is extracting different nodes from a given (X)HTML string.
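
To give a rough idea of what extracting a node looks like, here is a minimal sketch that uses "tidy_get_body()", one of the functions listed in the table of contents. The sample markup and variable names are assumptions made for illustration, and the detailed explanation is left to the following pages.

```php
<?php
// Illustrative markup; the <p> element is intentionally left unclosed
$markup = '<html><head><title>Sample</title></head><body><p>Hello</body></html>';

if (extension_loaded('tidy')) {
    $tidy = tidy_parse_string($markup, array('output-xhtml' => true), 'utf8');
    $tidy->cleanRepair();

    // tidy_get_body() returns a tidyNode object for the <body> element
    $body = tidy_get_body($tidy);
    echo $body->name, "\n";  // name of the extracted node
    echo $body->value;       // markup of the node, including its children
} else {
    echo "Tidy is not available\n";
}
```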

To see how these new functions can be put to work, please click on the link below and keep reading.



 
 