Parsing Web Document Nodes with the Tidy Library in PHP 5
By: Alejandro Gervasio
    2007-07-03


    Table of Contents:
  • Parsing Web Document Nodes with the Tidy Library in PHP 5
  • Parsing and formatting basic (X)HTML code with Tidy
  • Using the tidy_get_html() and tidy_get_head() functions
  • Using the tidy_get_body() and tidy_get_output() functions



    Parsing Web Document Nodes with the Tidy Library in PHP 5 - Using the tidy_get_body() and tidy_get_output() functions
    ( Page 4 of 4 )

    Following on from the previous section, the last two Tidy functions I'll cover in this tutorial are "tidy_get_body()" and "tidy_get_output()." As you may guess, the first one extracts the <body> section of a given (X)HTML string and returns it as a node, while the second one returns the whole parsed document as a string.

    Now that I have explained how these brand new Tidy functions work, please take a look at the following code samples, which demonstrate their rather limited functionality:

    // example of using the 'tidy_get_body()' function

    $html='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html
    xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing
    Tidy</title></head><body><p>Testing Tidy</p></body></html>';
    $tidy=tidy_parse_string($html);
    $bodyNode=tidy_get_body($tidy);
    echo $bodyNode->value;

    /* displays the following:
    <body>
      <p>Testing Tidy</p>
    </body>
    */

    // example of using the 'tidy_get_output()' function

    $html='<html><head><title>This file will be parsed by
    Tidy</title></head><body><p>This is an erroneous line</i>This is
    another erroneous line</i></body></html>';
    $tidy=tidy_parse_string($html);
    $tidy->cleanRepair();
    echo tidy_get_output($tidy);

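    /* displays output similar to the following; the exact doctype,
    indentation, and paragraph structure depend on the Tidy version
    and configuration: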
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>
          This file will be parsed by Tidy
        </title>
      </head>
      <body>
        <p>This is an erroneous line</p>
        <p>This is another erroneous line</p>
      </body>
    </html>
    */

    As you can see, the above examples are very easy to follow. In the first case, the "tidy_get_body()" function retrieves the <body> node of a sample (X)HTML string, and the node's "value" property holds its markup, so the procedure doesn't need much further discussion.
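
    Because "tidy_get_body()" returns a node rather than a plain string, the result can also be inspected further. The following is a minimal sketch, not part of the original listings: the sample markup and the $childNames variable are made up for illustration, and it assumes the Tidy extension is enabled.

```php
<?php
// Sketch only: the markup below is an illustrative sample.
$html = '<html><head><title>Testing Tidy</title></head>'
      . '<body><h1>Heading</h1><p>Testing Tidy</p></body></html>';

$tidy = tidy_parse_string($html, array(), 'utf8');
$bodyNode = tidy_get_body($tidy);

// Collect the tag names of the direct children of <body>.
// tidy_get_body() can return null, so check before using the node.
$childNames = array();
if ($bodyNode !== null && $bodyNode->hasChildren()) {
    foreach ($bodyNode->child as $childNode) {
        $childNames[] = $childNode->name;
    }
}

echo implode(', ', $childNames), PHP_EOL;
```

    The "child" property holds an array of nodes, so the same loop could be applied to each child in turn to walk deeper into the document.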

    The second code listing demonstrates how to repair the markup of the sample string via the already familiar "cleanRepair()" method, and then send the result to the browser using the "tidy_get_output()" function. Quite simple, right?
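
    The shape of the string returned by "tidy_get_output()" depends on the options handed to the parser. As a hedged sketch, the snippet below passes a configuration array to "tidy_parse_string()" and uses the procedural "tidy_clean_repair()" function; the sample markup and the chosen option values are assumptions for illustration, and the Tidy extension must be enabled.

```php
<?php
// Sketch only: the markup and the option values below are illustrative.
$html = '<html><head><title>Tidy options</title></head>'
      . '<body><p>First line<p>Second line</body></html>';

// Tidy options: emit XHTML, indent the result, disable line wrapping.
$config = array(
    'output-xhtml' => true,
    'indent'       => true,
    'wrap'         => 0,
);

$tidy = tidy_parse_string($html, $config, 'utf8');
tidy_clean_repair($tidy);          // procedural twin of $tidy->cleanRepair()
$output = tidy_get_output($tidy);  // the repaired document as a string

echo $output;
```

    Here the two unclosed <p> elements come back properly closed. When viewing the result in a browser, wrap the string in "htmlspecialchars()" if you want to see the markup itself rather than the rendered page.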

    Finally, as usual with many of my articles on PHP-based web development, feel free to modify the source code of all the examples shown here, if you want to continue exploring how to handle these useful Tidy functions.

    Final thoughts

    This second article of the series was entirely aimed at demonstrating how to use some simple functions bundled with the Tidy library to extract the different parts of a specified (X)HTML string.

    The story isn't finished yet, though. In the last tutorial of the series I'm going to show you how to use Tidy's capabilities to keep track of any errors that occur when parsing a web document. You won't want to miss it!



     
     
    © 2003-2009 by Developer Shed. All rights reserved.