Parsing Web Document Nodes with the Tidy Library in PHP 5
By: Alejandro Gervasio
    2007-07-03


    Table of Contents:
  • Parsing Web Document Nodes with the Tidy Library in PHP 5
  • Parsing and formatting basic (X)HTML code with Tidy
  • Using the tidy_get_html() and tidy_get_head() functions
  • Using the tidy_get_body() and tidy_get_output() functions



    Parsing Web Document Nodes with the Tidy Library in PHP 5 - Parsing and formatting basic (X)HTML code with Tidy
    ( Page 2 of 4 )

    Before discussing the remaining functions of the Tidy extension, I'd like to review how the functions covered in the preceding article of the series were implemented. This should make it clearer how those functions connect with the new ones that I plan to explain in a few moments.

    With that in mind, below are some illustrative examples that use the hopefully familiar "tidy_parse_string()," "tidy_repair_string()" and "tidy_parse_file()" functions, in that order. All of them were explained in detail in the first tutorial of this series.

    Here are the corresponding code samples:

    <?php
    // example of the 'tidy_parse_string()' function
    ob_start();
    ?>
    <html>
      <head>
       <title>This file will be parsed by Tidy</title>
      </head>
      <body>
       <p>This is an erroneous line
       <p>This is another erroneous line</i>
      </body>
    </html>
    <?php
    // capture the buffered markup, then parse and repair it as indented XHTML
    $fileContents=ob_get_clean();
    $params=array('indent'=>TRUE,'output-xhtml'=>TRUE,'wrap'=>200);
    $tidy=tidy_parse_string($fileContents,$params,'UTF8');
    $tidy->cleanRepair();
    echo $tidy;

    /* displays output similar to the following:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>
            This file will be parsed by Tidy
        </title>
      </head>
      <body>
        <p>This is an erroneous line</p>
        <p>This is another erroneous line</p>
      </body>
    </html>
    */

    <?php
    // example of the 'tidy_repair_string()' function
    ob_start();
    ?>
    <html>
      <head>
       <title>This file will be parsed by Tidy</title>
      </head>
      <body>
       <p>This is an erroneous line
       <p>This is another erroneous line</i>
      </body>
    </html>
    <?php
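    $fileContents=ob_get_clean();
    // pass the same options as in the first example; without 'output-xhtml'
    // Tidy emits HTML rather than the XHTML shown below
    $params=array('indent'=>TRUE,'output-xhtml'=>TRUE,'wrap'=>200);
    $tidy=tidy_repair_string($fileContents,$params,'UTF8');
    echo $tidy;

    /* displays output similar to the following: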
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>
          This file will be parsed by Tidy
        </title>
      </head>
      <body>
        <p>This is an erroneous line</p>
        <p>This is another erroneous line</p>
      </body>
    </html>
    */

    // example of 'tidy_parse_file()' function
    // definition of (target_file.html)

    <html>
      <head>
       <title>This file will be parsed by Tidy</title>
      </head>
      <body>
       <p>This is an erroneous line
       <p>This is another erroneous line</i>
      </body>
    </html>
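
    <?php
    // parse and repair the file defined above
    $tidy=tidy_parse_file('target_file.html');
    if($tidy===false){
        trigger_error('Unable to read target file',E_USER_ERROR);
    }
    $tidy->cleanRepair();
    // errorBuffer holds the warnings and errors Tidy reported for the markup
    if(!empty($tidy->errorBuffer)){
        trigger_error('Some errors occurred when parsing target file: '.$tidy->errorBuffer,E_USER_ERROR);
    }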

    After working through the examples above, you should recall how the "tidy_parse_string()," "tidy_parse_file()" and "tidy_repair_string()" functions can be used to repair and format a web document. As I stated in the first tutorial of the series, these are simple demonstrations of how to use these functions, but they should be a good starting point for developing more complex (X)HTML parsing applications.
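
    As one possible starting point of that kind, the parsing and repairing steps can be wrapped in a small helper that returns the cleaned markup together with Tidy's diagnostics instead of halting the script. This is only a sketch: the function name repairMarkup() and its default options are my own assumptions, not part of the Tidy extension, and the code presumes that the extension is loaded.

```php
<?php
// Hypothetical helper (not part of Tidy): repairs a markup string and
// returns the cleaned document together with Tidy's diagnostics.
function repairMarkup($markup, array $options = array())
{
    // Illustrative defaults; callers may override any of them.
    $config = array_merge(
        array('indent' => true, 'output-xhtml' => true, 'wrap' => 200),
        $options
    );

    $tidy = tidy_parse_string($markup, $config, 'utf8');
    if ($tidy === false) {
        return false; // the string could not be parsed
    }
    $tidy->cleanRepair();

    return array(
        'output'   => tidy_get_output($tidy),    // repaired markup
        'warnings' => tidy_warning_count($tidy), // number of warnings
        'errors'   => tidy_error_count($tidy),   // number of errors
        'log'      => (string) $tidy->errorBuffer,
    );
}

$result = repairMarkup('<p>This is an erroneous line<p>This is another erroneous line</i>');
if ($result !== false) {
    echo $result['output'];
    // send the diagnostics to the PHP error log rather than stopping the script
    error_log($result['log']);
}
```

    Returning the warning and error counts separately lets the calling code decide whether a document with only minor warnings is still acceptable.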

    All right, at this stage you have hopefully recalled how to implement, at least at a basic level, the three functions shown above. So, what's the next step? In line with the goals outlined in the introduction of this article, I plan to show you a few additional Tidy functions whose main purpose is to extract specific nodes from a given (X)HTML string.
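
    As a brief preview of that node-extraction approach, here is a minimal sketch that uses the "tidy_get_html()," "tidy_get_head()" and "tidy_get_body()" functions named in the table of contents. The sample markup and variable names are illustrative assumptions, and the code again presumes that the Tidy extension is loaded.

```php
<?php
// Sketch only: the sample markup and variable names are illustrative.
$markup = '<html><head><title>Sample page</title></head>'
        . '<body><p>First paragraph</p><p>Second paragraph</p></body></html>';

$tidy = tidy_parse_string($markup, array('output-xhtml' => true), 'utf8');
$tidy->cleanRepair();

// Each accessor returns a tidyNode object for one section of the document.
$html = tidy_get_html($tidy);
$head = tidy_get_head($tidy);
$body = tidy_get_body($tidy);

echo $head->name, "\n";   // tag name of the node
echo $body->value, "\n";  // markup of the node, including its children

// Walk the direct children of the <body> node.
if ($body->hasChildren()) {
    foreach ($body->child as $child) {
        echo $child->name, "\n";
    }
}
```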

    To see how these new functions can be put to work in a useful way, please continue to the next page.



     
     
    © 2003-2009 by Developer Shed. All rights reserved.