Handling HTML Strings and Files with the DOM XML Extension in PHP 5

The DOM XML extension has a few additional methods that can be used to process HTML files and strings, at least at a pretty basic level. This fifth part of the series focuses entirely on explaining how to work with these methods, with illustrative code samples along the way.

Working on XML documents with PHP can be quite a challenging task, since this process often involves reading data from remote files, parsing specific nodes, inserting and removing elements from the document tree, and so forth. However, if you’re just about to tackle the development of a web application that requires thorough processing of XML data, there’s no need to start pulling your hair out, because PHP comes equipped with a helpful extension, called DOM XML, which can be used to parse XML documents in all sorts of clever ways using the API of the Document Object Model.

Understanding how to use the methods and properties provided by this library requires a little effort from you, despite its fairly easy learning curve. Therefore, if you’re looking for a comprehensive guide on how to get the most out of this powerful XML-related PHP extension, then you should take a closer look at this article series. In doing so, you’ll hopefully learn a few useful things, such as creating XML documents from scratch, appending, copying, and removing nodes, and even processing HTML strings by way of their DOM representation. What else can you ask for?

Now that you’re well aware of the topics that are treated in this series of articles, let me quickly review the items that were discussed in the last tutorial. This way you’ll have a clearer idea of how they link with the ones I plan to cover in this article.

Throughout the course of the preceding tutorial I explained how to access multiple nodes of a specific XML document using the “getElementsByTagName()” method, which presents practically the same functionality as the one utilized to parse web pages with JavaScript. In addition, I demonstrated how to read XML data from an existing text file via the “load()” method. And finally, I demonstrated how to read XML strings by way of a similar method, called “loadXML()”.
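As a quick refresher, here is a minimal sketch of that recap in action. The XML string and the “headline” element name are made up for illustration and are not taken from the previous tutorial; only “loadXML()” and “getElementsByTagName()” come from the text above.

```php
<?php
// Hypothetical XML string, used only to illustrate the recap
$xml = '<?xml version="1.0"?>'
     . '<headlines><headline>First</headline><headline>Second</headline></headlines>';

$dom = new DOMDocument();

// Parse the XML string into the document object
$dom->loadXML($xml);

// Collect the text of every <headline> node
$values = array();
foreach ($dom->getElementsByTagName('headline') as $node) {
    $values[] = $node->nodeValue;
}

echo implode(', ', $values), "\n";
```

Swapping “loadXML()” for “load()” and passing a file path instead of a string should give the same node list for a file with the same content.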

As you can see, the DOM XML extension has plenty of options when it comes to reading XML data, whether it lives in a file or in a string, and to locating specific nodes within it. This is certainly a process that can be performed with little hassle by utilizing the intuitive DOM API mentioned in the beginning.

So far, so good. At this moment, I’m assuming that you’ve acquired a considerable background in performing some basic operations on simple XML documents through the functionality offered by the DOM XML extension. But what if I tell you that this library permits you to work with data formatted in plain HTML?

Let’s not waste any more time in preliminaries and continue learning how to work with HTML and the DOM XML library. Let’s get started!

Loading HTML code from a specific string with the loadHTML() method

As I mentioned in the introduction, the DOM XML extension includes some useful methods aimed specifically at handling HTML. To demonstrate this capacity of the library, I’m going to teach you how to utilize the brand new “loadHTML()” method, which comes in handy for parsing an HTML string into a “DOMDocument” object held in memory.

That being said, here’s an example that illustrates how the aforementioned method does its thing:


// example on loading HTML from an existing string using the 'loadHTML()' method

$html='<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /><title>Example of loading HTML</title></head><body><p>Example on loading HTML</p></body></html>';

$dom=new DOMDocument();

// parse the HTML string into the DOMDocument object

$dom->loadHTML($html);

// turn the document back into an HTML string and display it

echo $dom->saveHTML();


/* displays the following

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<title>Example of loading HTML</title>

</head>

<body><p>Example on loading HTML</p></body>

</html>

*/


As demonstrated by the above code sample, the “loadHTML()” method is convenient for loading an HTML string into a PHP object. In this particular case, the method first parses some simple HTML content into a “DOMDocument” object, which is then turned back into a string by “saveHTML()” and echoed to the browser. Not too hard to understand, right?
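Once the string has been loaded, its nodes can be traversed like those of any other DOM document. The following is only a sketch under a few assumptions of mine: the markup is a made-up string with a deliberately unclosed paragraph, and the calls to “libxml_use_internal_errors()” and “libxml_clear_errors()” are my own addition, used to keep parser warnings about sloppy HTML from being printed.

```php
<?php
// Hypothetical, deliberately sloppy markup: the second <p> is never closed
$html = '<html><head><title>Example of loading HTML</title></head>'
      . '<body><p>First paragraph</p><p>Second paragraph</body></html>';

// Collect libxml parse warnings internally instead of emitting PHP warnings
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Discard the collected warnings and restore the previous setting
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Query the parsed document with the usual DOM methods
$titles = $dom->getElementsByTagName('title');
$paragraphs = $dom->getElementsByTagName('p');

echo 'Title: ', $titles->item(0)->textContent, "\n";
foreach ($paragraphs as $p) {
    echo 'Paragraph: ', $p->textContent, "\n";
}
```

The HTML parser is forgiving, so the unclosed paragraph should still end up as a regular “p” node in the resulting tree.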

So far, so good. With the previous hands-on example, hopefully you’ve already grasped how the “loadHTML()” method does its business. However, the DOM XML extension still has some other helpful methods that can be used to handle HTML. Thus, since you may want to learn how to put them to work for you, in the upcoming section I’ll be teaching you a brand new method, named “loadHTMLFile()”, which can be used to read the contents of a specified HTML file.

To learn the full details of how to use this handy method, please keep reading.

Reading contents from HTML files with the loadHTMLFile() method

As I stated in the previous section, the DOM XML library comes equipped with another useful method that can be used to read the contents of a specified HTML file, which can then be easily processed later on.

The method that performs such a useful task is called “loadHTMLFile()” and can be implemented in the following way:


(definition of ‘sample_file.htm’ file)


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

<title>This is a sample HTML file</title>

</head>

<body>

<p>This is a sample HTML file</p>

</body>

</html>



// example on loading HTML from an existing file using the 'loadHTMLFile()' method

$dom=new DOMDocument();

// load HTML from existing file

$dom->loadHTMLFile('sample_file.htm');

// turn the document back into an HTML string and display it

echo $dom->saveHTML();


/* displays the following

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<title>This is a sample HTML file</title>

</head>

<body>

<p>This is a sample HTML file</p>

</body>

</html>

*/


As you can see, I first defined a basic HTML file, called “sample_file.htm”, and then coded a simple script that uses the aforementioned “loadHTMLFile()” method to read and parse the file’s contents. In the end, the script finishes its execution by displaying this content in the browser.
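In a real application the target file may be missing or unreadable, so it can be worth checking for that before working with the document. Here is a hedged sketch of one way to do it. The temporary file name “sample_file_demo.htm” and the “is_readable()” guard are my own assumptions; the script creates the file itself so that the example is self-contained.

```php
<?php
// Hypothetical file in the system temp directory, created here so the sketch runs on its own
$path = sys_get_temp_dir() . '/sample_file_demo.htm';
file_put_contents(
    $path,
    '<html><head><title>This is a sample HTML file</title></head>'
    . '<body><p>This is a sample HTML file</p></body></html>'
);

// Keep libxml parse warnings out of the output
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$title = null;

// Only touch the document when the file is readable and was parsed successfully
if (is_readable($path) && $dom->loadHTMLFile($path)) {
    $node = $dom->getElementsByTagName('title')->item(0);
    $title = ($node !== null) ? $node->textContent : null;
}

libxml_clear_errors();
libxml_use_internal_errors($previous);

// Remove the temporary file again
unlink($path);

echo ($title === null) ? "Could not read the file\n" : "Title: $title\n";
```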

After looking at the previous hands-on example, you’ll have to agree with me that using the DOM XML API to read the contents of an HTML file is a straightforward process that takes only a few lines of code.

All right, you’ve now learned a couple of additional methods that come packaged with the DOM XML extension and are aimed at handling HTML. Of course, the functionality of these methods is rather limited, but they do perform decently when it comes to working with basic HTML strings and files.

However, the DOM XML library provides you with yet another HTML-related method that can potentially be useful, particularly in those cases where you need to build an HTML file from its corresponding DOM representation. This sounds interesting, right? Assuming that this topic has caught your attention, in the last section of this tutorial, I’m going to show you how to create HTML files using the DOM API.

To see how this will be achieved, please jump ahead and read the next few lines.

Building HTML files from their DOM representation with the saveHTMLFile() method


As I mentioned in the section that you just read, the last example that I’m going to show you in this article is specifically aimed at demonstrating how to build a basic HTML file from its corresponding DOM representation. Surely, at this very moment, you’re wondering how this process can be performed. The answer to that question is via the brand new “saveHTMLFile()” method.

In order to help you grasp how this method functions, below I coded another illustrative example that shows how to create a primitive HTML document via the DOM API and then save it to a specified file destination.

Here’s the corresponding code sample. Please take a close look at it: 

// example on building an HTML document from the DOM representation using the 'saveHTMLFile()' method

$dom=new DOMDocument('1.0');

// format output

$dom->formatOutput=true;

// create <html> element

$root=$dom->createElement('html');

$root=$dom->appendChild($root);

// create <head> element

$head=$dom->createElement('head');

$head=$root->appendChild($head);

// create <title> element

$title=$dom->createElement('title');

$title=$head->appendChild($title);

// create text for title of HTML page

$text=$dom->createTextNode('This title was created with the DOM XML extension of PHP');

$text=$title->appendChild($text);

// save HTML to file and display the number of bytes written

echo 'The file just created has a size of '. $dom->saveHTMLFile('test_file.htm'). ' bytes.';


/* displays the following

The file just created has a size of 168 bytes.

*/


As shown in the previous example, a trivial HTML document is built by using some of the methods that you’ve learned so far, such as “createElement()” and “appendChild()”. Then, once the document in question has been created, it’s simply saved to a target file, called “test_file.htm”, by calling the “saveHTMLFile()” method, which returns the number of bytes written.

While the functionality of “saveHTMLFile()” is pretty easy to grasp, you should notice two important things. First, I used the brand new property, called “formatOutput”, to adequately format the contents of the file that is just about to be created. And second, if the mentioned file doesn’t exist, then the “saveHTMLFile()” method will attempt to build it. Simple and short!

Lastly, before I forget, here are the contents of the sample HTML file constructed with “saveHTMLFile()”:


<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<title>This title was created with the DOM XML extension of PHP</title>

</head></html>
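The same approach extends naturally to a document with a body. The sketch below rests on a few assumptions of mine: the “body” and “p” elements, the sample texts and the temporary file name “test_file_demo.htm” are invented for illustration. It also checks the return value of “saveHTMLFile()”, which is false when the file cannot be written.

```php
<?php
$dom = new DOMDocument('1.0');
$dom->formatOutput = true;

// Build <html><head><title>...</title></head><body><p>...</p></body></html>
$root  = $dom->appendChild($dom->createElement('html'));
$head  = $root->appendChild($dom->createElement('head'));
$title = $head->appendChild($dom->createElement('title'));
$title->appendChild($dom->createTextNode('Demo title'));
$body  = $root->appendChild($dom->createElement('body'));
$body->appendChild($dom->createElement('p', 'Demo paragraph'));

// Hypothetical target file in the system temp directory
$path = sys_get_temp_dir() . '/test_file_demo.htm';

// saveHTMLFile() returns the number of bytes written, or false on failure
$bytes = $dom->saveHTMLFile($path);

$written = '';
if ($bytes === false) {
    echo "The file could not be written.\n";
} else {
    // Read the file back to inspect what was saved, then clean up
    $written = file_get_contents($path);
    unlink($path);
    echo "Saved $bytes bytes:\n", $written;
}
```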


As usual with my articles on PHP development, you’re free to tweak all of the code samples developed earlier so you can improve your skills using the DOM XML PHP extension.

Final thoughts

In this fifth chapter of the series, you learned how to handle HTML strings and files through the set of methods provided by the DOM XML extension. Since they’re quite simple to use, you shouldn’t have major difficulties incorporating them into your own PHP applications.

In the upcoming part, I’ll be exploring the solid capabilities of the DOM XML library when it comes to parsing parent and child nodes of a specified XML document, so you don’t have any excuses to miss it! 
