Parsing Web Document Nodes with the Tidy Library in PHP 5
By: Alejandro Gervasio
    2007-07-03


    Table of Contents:
  • Parsing Web Document Nodes with the Tidy Library in PHP 5
  • Parsing and formatting basic (X)HTML code with Tidy
  • Using the tidy_get_html() and tidy_get_head() functions
  • Using the tidy_get_body() and tidy_get_output() functions


    Using the tidy_get_html() and tidy_get_head() functions
    ( Page 3 of 4 )

    Admittedly, breaking an (X)HTML string into separate parts for further processing isn't a task that a web developer tackles every day. Even so, the Tidy library includes several functions aimed precisely at extracting the main sections of an (X)HTML string.

    More specifically, Tidy offers two functions, "tidy_get_html()" and "tidy_get_head()", which return the <html> and <head> sections of a parsed (X)HTML string as nodes.

    Rather than dwell on explanations, let me show you a couple of examples of how to use these Tidy functions. Here are the corresponding code samples:

    // example of using the 'tidy_get_html()' function

    $html='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html
    xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing
    Tidy</title></head><body><p>Testing Tidy</p></body></html>';
    $tidy=tidy_parse_string($html);
    $htmlNode=tidy_get_html($tidy);
    echo $htmlNode->value;

    /* displays the following:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>Testing Tidy</title>
    </head>
    <body><p>Testing Tidy</p>
    </body>
    </html>
    */

    // example of using the 'tidy_get_head()' function

    $html='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html
    xmlns="http://www.w3.org/1999/xhtml"><head><title>Testing
    Tidy</title></head><body><p>Testing Tidy</p></body></html>';
    $tidy=tidy_parse_string($html);
    $headNode=tidy_get_head($tidy);
    echo $headNode->value;

    /* displays the following:
    <head>
    <title>Testing Tidy</title>
    </head>
    */

    That's all the source code required to test the "tidy_get_html()" and "tidy_get_head()" functions. As you can see, both examples are easy to follow: each one parses the sample (X)HTML string, asks Tidy for one section of it, and prints that section.

    As you might have guessed, the first example isn't very useful in practice, simply because the "tidy_get_html()" function returns the whole (X)HTML string as a new node, which is sent straight to the browser via its "value" property. The second case is slightly more useful, since it first extracts the <head> part of the sample (X)HTML string, and then displays its contents by using the same "value" property.
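    The node returned by "tidy_get_head()" offers more than its "value" property. The sketch below assumes the tidy extension is loaded, and walks the children of the <head> node to pull out the title text. Note that "findTitle()" is a hypothetical helper written for this illustration, not a function of the Tidy library, and the sample markup is shortened for brevity.

```php
<?php
// Hypothetical helper (not part of Tidy): returns the text of the <title>
// element found among the children of a <head> node, or null if none exists.
function findTitle($head)
{
    if (!$head || !$head->hasChildren()) {
        return null;
    }
    foreach ($head->child as $child) {
        // Each child is itself a tidyNode; 'name' holds the tag name.
        if ($child->name === 'title' && $child->hasChildren()) {
            // The first child of <title> is a text node whose value is the text.
            return trim($child->child[0]->value);
        }
    }
    return null;
}

// Shortened sample document (an assumption for this sketch).
$html = '<html><head><title>Testing Tidy</title></head>'
      . '<body><p>Testing Tidy</p></body></html>';

$tidy = tidy_parse_string($html);
$headNode = tidy_get_head($tidy);

echo findTitle($headNode);
```

    Walking the "child" array this way is what makes the extracted node useful for further processing, rather than only for display.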

    So far, so good, right? At this point I'm pretty certain that you've grasped the logic behind dissecting an (X)HTML string into different parts for further processing. As the two examples above show, the process comes down to parsing the string, calling the appropriate Tidy function to extract the selected part, and finally displaying its contents in the browser.

    However, the Tidy extension has a couple of extra functions which can be useful when it comes to breaking an (X)HTML string into several sections. In the following section I'm going to show you how to use them to extract the <body> part of a given (X)HTML string, and to parse and fix the string in question as a single node.

    To learn how these tasks can be performed with the Tidy library, please read on to the next section. I'll be there, waiting for you.
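    One practical detail: echoing a node's "value" directly sends raw markup to the browser, which renders it instead of showing the tags. The following sketch, which assumes the tidy extension is available, wraps the parse, extract and display steps in a single function and escapes the result. The "showSection()" helper and its 'html'/'head' keys are hypothetical names introduced here, not part of the Tidy API.

```php
<?php
// Hypothetical helper (not part of Tidy): parses $html, extracts the requested
// section and returns it escaped, so the browser prints the tags literally.
function showSection($html, $section)
{
    $tidy = tidy_parse_string($html);
    if (!$tidy) {
        return '';
    }

    // Pick the Tidy accessor matching the requested section.
    switch ($section) {
        case 'html':
            $node = tidy_get_html($tidy);
            break;
        case 'head':
            $node = tidy_get_head($tidy);
            break;
        default:
            return '';
    }

    // htmlspecialchars() converts < and > into entities for safe display.
    return $node ? '<pre>' . htmlspecialchars($node->value) . '</pre>' : '';
}

// Shortened sample document (an assumption for this sketch).
$sample = '<html><head><title>Testing Tidy</title></head>'
        . '<body><p>Testing Tidy</p></body></html>';

echo showSection($sample, 'head');
```

    Returning a string instead of echoing inside the function keeps the helper easy to reuse, for instance when the extracted section needs to be logged or stored rather than displayed.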



     
     