Parsing Web Document Nodes with the Tidy Library in PHP 5
Writing well-formatted (X)HTML code to include in the presentation layers of certain PHP applications can be an annoying and time-consuming process for many web developers. However, the Tidy extension that comes integrated with PHP 5 can turn this ugly task into a pleasant experience. Keep reading to learn how.
Welcome to the second tutorial of the series that began with "Working with the Tidy Library in PHP 5." Made up of three instructive articles, this series steps you through using the most important functions bundled with this powerful library, and complements the corresponding theory with illustrative hands-on examples.
If you've already read the first installment of the series, then the Tidy extension probably feels familiar by now, since its remarkable capacity for parsing and formatting (X)HTML markup comes with a very gentle learning curve. True to form, Tidy is equipped with a decent arsenal of functions (or methods and properties, if you're using the object-based syntax) that allow you to correct the format of any web document in a few simple steps.
And speaking of simple tasks, you'll recall that in the first article of the series I discussed how to parse and format several basic (X)HTML documents by using some straightforward functions bundled with this library, namely "tidy_parse_file()", "tidy_repair_file()" and "tidy_parse_string()".
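As a quick refresher, here is a minimal sketch of how those three functions can be called. The sample markup, the configuration options and the temporary file are assumptions made for illustration only; they aren't taken from the first article.

```php
<?php
// Hypothetical, deliberately sloppy markup (the <p> and <b> tags are never closed).
$markup = '<html><head><title>Demo</title></head><body><p>Unclosed paragraph<b>bold</body></html>';

// Assumed configuration: indent the output and emit XHTML.
$config = array('indent' => true, 'output-xhtml' => true);

// tidy_parse_string(): parse markup held in a variable, then repair it in memory.
$fromString = tidy_parse_string($markup, $config, 'utf8');
tidy_clean_repair($fromString);
$repairedString = tidy_get_output($fromString);

// The file-based functions need a file, so write the markup to a temporary one.
$path = tempnam(sys_get_temp_dir(), 'tidy');
file_put_contents($path, $markup);

// tidy_parse_file(): same idea as above, but the markup is read from disk.
$fromFile = tidy_parse_file($path, $config, 'utf8');
tidy_clean_repair($fromFile);

// tidy_repair_file(): parse and repair in one call, returning the fixed markup.
$repairedFile = tidy_repair_file($path, $config, 'utf8');

unlink($path);

echo $repairedString;
```

Whichever entry point you pick, the result is the same kind of repaired markup; the choice only depends on whether the source lives in a string or in a file.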
As you learned in that tutorial, repairing badly-formatted web documents is an effortless process with the assistance of the Tidy extension. Since Tidy has much more to offer when it comes to parsing and fixing (X)HTML code, in this second article of the series I'm going to discuss how to extract different sections of a specific (X)HTML document (called file nodes) by using some additional functions included with this library.
At the end of this tutorial you'll be equipped with the required background to dissect the principal nodes of a concrete (X)HTML file with the help of some easy-to-follow Tidy functions.
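To give you a rough idea of what extracting nodes looks like, the following sketch uses four node-related functions from the Tidy extension: "tidy_get_root()", "tidy_get_html()", "tidy_get_head()" and "tidy_get_body()". Treat it as a preview under my own assumptions: the sample document is hypothetical, and the exact functions covered later in this tutorial may differ.

```php
<?php
// Hypothetical document used only for illustration.
$markup = '<html><head><title>Node demo</title></head>'
        . '<body><h1>Heading</h1><p>Some text</p></body></html>';

$tidy = tidy_parse_string($markup, array('output-xhtml' => true), 'utf8');
tidy_clean_repair($tidy);

// The root node represents the whole parsed document.
$root = tidy_get_root($tidy);

// Each of these functions returns a tidyNode object for one section.
$nodes = array(
    'html' => tidy_get_html($tidy),
    'head' => tidy_get_head($tidy),
    'body' => tidy_get_body($tidy),
);

foreach ($nodes as $label => $node) {
    // $node->name holds the tag name; $node->value holds the node's markup.
    echo $label . ' node: <' . $node->name . '>' . PHP_EOL;
}

// Collect the tag names of the direct children of <body>.
$childNames = array();
if ($nodes['body']->hasChildren()) {
    foreach ($nodes['body']->child as $child) {
        $childNames[] = $child->name;
    }
}
echo 'Children of body: ' . implode(', ', $childNames) . PHP_EOL;
```

Each returned node exposes its markup through the "value" property and its descendants through the "child" property, so any section of the document can be walked the same way.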
So, are you ready to explore some more useful features integrated with the Tidy extension? Okay, let's begin this journey now!