Working with the Tidy Library in PHP 5

As a PHP developer, you’ve probably built database-driven applications that deliver their content to the end user as (X)HTML. If so, you know that when you hand-code (X)HTML files, it’s easy to forget a closing tag or a DTD declaration, and tracking those mistakes down is annoying and time-consuming. Keep reading; help is on the way.

Introduction

Hand-coding (X)HTML files can give your PHP applications a clean, tight presentation layer, but that quality comes at a cost: you need to verify that every one of those files is well-formed and valid. Is there any way to make this process as painless as possible?

Of course there is! If you’ve ever worked with code editors like HomeSite or Dreamweaver MX, to name just two, you may recall that they include the popular Tidy application among their code-cleaning tools. It’s extremely useful for quickly correcting errors introduced while coding (X)HTML files.

The really good news about Tidy is that you can use its features from your own PHP scripts, since this code-cleaning package is available as an extension in PHP 5 (it relies on the external libtidy library). This means that you can correct (X)HTML source files efficiently and with minimal hassle.

Now that you know the Tidy (X)HTML formatting and correcting application can be called directly from your PHP 5 scripts, over the course of this three-part series I’m going to walk you through the most useful functions included with this library. Naturally, I’ll accompany the theory with plenty of code samples, so you can quickly learn how to make this PHP extension work for you.

Now it’s time to leave the preliminaries behind and start learning how to work with the Tidy library and PHP 5. Let’s go!

Parsing (X)HTML strings

As I stated at the beginning of this article, the Tidy library is really useful when a section of (X)HTML has been badly formatted and needs to be fixed quickly.

For this code-correcting task, Tidy comes with a neat set of format-cleaning functions, starting with "tidy_parse_string()", whose use is demonstrated by the example below:

<?php
// example of 'tidy_parse_string()' function
ob_start();
?>
<html>
  <head>
   <title>This file will be parsed by Tidy</title>
  </head>
  <body>
   <p>This is an erroneous line
   <p>This is another erroneous line</i>
  </body>
</html>
<?php
$fileContents=ob_get_clean();
$params=array('indent'=>TRUE,'output-xhtml'=>TRUE,'wrap'=>200);
$tidy=tidy_parse_string($fileContents,$params,'utf8');
$tidy->cleanRepair();
echo $tidy;

/* displays the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
        This file will be parsed by Tidy
    </title>
  </head>
  <body>
    <p>This is an erroneous line</p>
    <p>This is another erroneous line</p>
  </body>
</html>
*/

There are some important things to note about the above example. First, I used the "tidy_parse_string()" function along with a few simple formatting parameters to parse a given (X)HTML string. In this case, the string was captured with an output buffer and then parsed. This can easily be changed, however; for instance, the markup could be read with a native PHP function such as "file_get_contents()".

Also, notice that the "tidy_parse_string()" function returns a new "Tidy" object, which offers a number of methods and properties for performing a variety of tasks, including correcting missing and erroneous tags. The previous example shows how to properly format the (X)HTML string via the "cleanRepair()" method.

And finally, you can see that the sample string has been fixed, not only by closing its <p> tags and discarding the stray </i> tag, but also by adding a DTD declaration on top of it. After studying the previous code sample, you’ll have to agree that using the Tidy extension with PHP 5 is a no-brainer, right?
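The "Tidy" object returned by "tidy_parse_string()" also keeps a record of what was wrong with the markup. Here is a minimal sketch of how that report could be read, assuming the Tidy extension is enabled. The helper name summarizeTidyProblems() and the sample markup are my own illustrations; "tidy_error_count()", "tidy_warning_count()" and the "errorBuffer" property come with the extension.

```php
<?php
// Sketch only: summarizeTidyProblems() is a hypothetical helper name,
// not a function provided by the Tidy extension.
function summarizeTidyProblems($markup)
{
    if (!extension_loaded('tidy')) {
        return null; // the Tidy extension is not enabled in this PHP build
    }
    // Parsing alone is enough for Tidy to collect its diagnostics.
    $tidy = tidy_parse_string($markup, array('output-xhtml' => true), 'utf8');
    if ($tidy === false) {
        return null; // the markup could not be parsed
    }
    return array(
        'errors'   => tidy_error_count($tidy),
        'warnings' => tidy_warning_count($tidy),
        'report'   => (string) $tidy->errorBuffer,
    );
}

$summary = summarizeTidyProblems(
    '<html><head><title>Test</title></head>' .
    '<body><p>This is an erroneous line<p>This is another erroneous line</i></body></html>'
);
if ($summary !== null) {
    echo $summary['warnings'] . " warning(s), " . $summary['errors'] . " error(s)\n";
    echo $summary['report'] . "\n";
}
```

Checking these counts before calling "cleanRepair()" is one way to decide whether a document needs fixing at all, or to log what Tidy is about to change.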

Okay, so far I’ve shown you how to use the "tidy_parse_string()" function to parse and correctly format some basic (X)HTML markup. However, Tidy has another function called "tidy_clean_repair()", which, as you’ll see in a moment, can also be helpful for repairing badly formatted (X)HTML strings.

To learn how this function is used, please keep reading.

Implementing the tidy_clean_repair() function

As I stated previously, the Tidy library comes equipped with another useful function, named "tidy_clean_repair()", which is the procedural counterpart of the "cleanRepair()" method demonstrated in the previous section. It fixes badly formatted (X)HTML markup in a parsed document, and its usage is illustrated by the following example:

<?php
// example of 'tidy_clean_repair()' function
$html='<html><head><title>This file will be parsed by
Tidy</title></head><body><p>This is an erroneous line<p>This is
another erroneous line</i></body></html>';
// same formatting options as before, so the output below is XHTML
$params=array('indent'=>TRUE,'output-xhtml'=>TRUE,'wrap'=>200);
$tidy=tidy_parse_string($html,$params,'utf8');
tidy_clean_repair($tidy);
echo $tidy;

/* displays the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      This file will be parsed by Tidy
    </title>
  </head>
  <body>
    <p>This is an erroneous line</p>
    <p>This is another erroneous line</p>
  </body>
</html>
*/

As you can see, using the "tidy_clean_repair()" function is very straightforward: it performs a clean-up on a parsed (X)HTML string, behaving identically to its cousin, the "cleanRepair()" method.

Additionally, when it comes to correcting the format of an (X)HTML string, the Tidy library also offers the neat "tidy_repair_string()" function, which parses and repairs the markup in a single call and returns the result as a string. It can be used as indicated below:
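To make the equivalence between the two styles concrete, here is a small sketch that repairs the same markup once with the "cleanRepair()" method and once with "tidy_clean_repair()", and then compares the results. The helper name repairBothWays() and the sample markup are hypothetical, and the sketch assumes the Tidy extension is enabled.

```php
<?php
// Sketch only: repairBothWays() is a hypothetical helper used for comparison.
function repairBothWays($markup, array $config)
{
    if (!extension_loaded('tidy')) {
        return null; // the Tidy extension is not enabled in this PHP build
    }
    // Object-oriented style: call the method on the parsed document.
    $viaMethod = tidy_parse_string($markup, $config, 'utf8');
    $viaMethod->cleanRepair();

    // Procedural style: pass the parsed document to the function.
    $viaFunction = tidy_parse_string($markup, $config, 'utf8');
    tidy_clean_repair($viaFunction);

    // tidy_get_output() returns the repaired markup as a string.
    return array(tidy_get_output($viaMethod), tidy_get_output($viaFunction));
}

$result = repairBothWays(
    '<p>This is an erroneous line<p>This is another erroneous line</i>',
    array('output-xhtml' => true)
);
if ($result !== null) {
    // Both styles should produce the same repaired markup.
    var_dump($result[0] === $result[1]);
}
```

Which style you pick is mostly a matter of taste; the procedural function is handy in code that doesn’t otherwise use objects.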

<?php
// example of 'tidy_repair_string()' function
ob_start();
?>
<html>
  <head>
   <title>This file will be parsed by Tidy</title>
  </head>
  <body>
   <p>This is an erroneous line
   <p>This is another erroneous line</i>
  </body>
</html>
<?php
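$fileContents=ob_get_clean();
// tidy_repair_string() returns the repaired markup as a string
$params=array('indent'=>TRUE,'output-xhtml'=>TRUE,'wrap'=>200);
$fixedMarkup=tidy_repair_string($fileContents,$params,'utf8');
echo $fixedMarkup;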

/* displays the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>
      This file will be parsed by Tidy
    </title>
  </head>
  <body>
    <p>This is an erroneous line</p>
    <p>This is another erroneous line</p>
  </body>
</html>
*/
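Since "tidy_repair_string()" works in a single call, it’s also handy for cleaning small pieces of markup, such as a snippet pulled from a database. The following is only a sketch under that assumption: the helper name repairFragment() is hypothetical, while "show-body-only" and "wrap" are Tidy configuration options.

```php
<?php
// Sketch only: repairFragment() is a hypothetical helper name.
function repairFragment($fragment)
{
    if (!extension_loaded('tidy')) {
        return false; // the Tidy extension is not enabled in this PHP build
    }
    $config = array(
        'output-xhtml'   => true,
        'show-body-only' => true, // Tidy option: emit only the contents of <body>
        'wrap'           => 0,    // Tidy option: 0 disables line wrapping
    );
    // Returns the repaired markup as a string, or false on failure.
    return tidy_repair_string($fragment, $config, 'utf8');
}

$fixed = repairFragment('<p>Hello <b>world');
if ($fixed !== false) {
    echo $fixed;
}
```

With "show-body-only" enabled, Tidy leaves out the DTD, <html> and <head> wrappers, so the repaired fragment can be dropped into an existing page template.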

So far, so good, right? At this point you’ve hopefully learned how to use a few of the functions included with the Tidy library to correctly format an (X)HTML string. As you might have guessed, though, Tidy has plenty more functions for fixing badly formatted markup.

With that in mind, in the next section I’m going to show you how to use the Tidy extension to parse, and eventually correct, the format of (X)HTML files. Keep reading to see how it’s done.

Using the tidy_parse_file() and tidy_repair_file() functions

As you saw in the previous section, the Tidy extension comes equipped with a set of functions for parsing and fixing (X)HTML strings. Similarly, it includes a pair of additional functions, called "tidy_parse_file()" and "tidy_repair_file()", which are helpful for parsing and correcting badly formatted (X)HTML files.

Although the difference between parsing files and plain strings may seem subtle in theory, in practice it can be quite relevant, particularly when your PHP applications need to deal with (X)HTML markup stored in text files.

With that point clarified, take a look at the following pair of code samples, which demonstrate a simple use of these functions:

// example of 'tidy_parse_file()' function

// definition of (target_file.htm)
<html>
  <head>
   <title>This file will be parsed by Tidy</title>
  </head>
  <body>
   <p>This is an erroneous line
   <p>This is another erroneous line</i>
  </body>
</html>

$tidy=tidy_parse_file('target_file.htm');
$tidy->cleanRepair();
if(!empty($tidy->errorBuffer)){
    trigger_error('Some errors occurred when parsing target file: '.$tidy->errorBuffer,E_USER_ERROR);
}

// example of 'tidy_repair_file()' function
$brokenFile='target_file.htm';
$fixedFile=tidy_repair_file($brokenFile);
// tidy_repair_file() returns FALSE on failure; check before overwriting the file
if($fixedFile===FALSE||file_put_contents($brokenFile,$fixedFile)===FALSE){
    trigger_error('Error putting fixed contents on target file',E_USER_ERROR);
}

The first example shows how to use the "tidy_parse_file()" function to parse the markup of an (X)HTML file, which is then corrected via the already familiar "cleanRepair()" method that you learned in a previous section.

The second example illustrates a simple use of the "tidy_repair_file()" function, which is very convenient for reading and fixing the contents of an (X)HTML file in a single step.

At this stage, after analyzing the pair of hands-on examples above, you’ll hopefully have a much better idea of how to use, at least at a basic level, some of the most useful functions that come with the Tidy extension in PHP 5.

Of course, this tutorial is intended as a simple introduction to the main features of this PHP 5 extension. If you’re searching for a more detailed reference on Tidy’s functions and properties, the official PHP manual is the best place to look.
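If you’d rather not overwrite the original file, the repaired markup can be written to a separate file instead. Here is a minimal sketch of that variation, assuming the Tidy extension is enabled. The helper name repairFileTo() and the target file name are hypothetical.

```php
<?php
// Sketch only: repairFileTo() is a hypothetical helper name.
function repairFileTo($sourcePath, $targetPath)
{
    if (!extension_loaded('tidy') || !is_readable($sourcePath)) {
        return false; // no Tidy extension, or the source file cannot be read
    }
    $config = array('indent' => true, 'output-xhtml' => true, 'wrap' => 200);
    $fixed = tidy_repair_file($sourcePath, $config, 'utf8');
    if ($fixed === false) {
        return false; // Tidy could not read or repair the file
    }
    // Write to a separate file so the original stays untouched.
    return file_put_contents($targetPath, $fixed) !== false;
}

// 'target_file.fixed.htm' is an assumed name for the repaired copy.
if (!repairFileTo('target_file.htm', 'target_file.fixed.htm')) {
    echo "Could not repair target_file.htm\n";
}
```

Keeping the broken file around makes it easy to compare both versions before replacing anything.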

Final thoughts

We’ve come to the end of this first article of the series. In this tutorial, I walked you through the basic concepts of using the Tidy extension with PHP 5. This journey is just beginning, though, since there are many other useful Tidy functions still to be reviewed, which will give you a more solid background in this helpful PHP extension.

In the next installment of the series I’m going to teach you how to use some additional functions included with the Tidy library for extracting specific nodes of a given (X)HTML document.

Now that you know what to expect from the next part, you won’t want to miss it! 
