By separating content from presentation, XML offers Web developers a powerful alternative to traditional HTML technology...and when you combine that with PHP, you have a truly compelling new set of tools. In this article, find out how PHP's SAX parser can be used to parse XML data and generate HTML Web pages.
Now, the startElement(), endElement() and characterData() functions will be called by the parser as it progresses through the document. We haven't defined these yet - let's do that next:
<?php
// use this to keep track of which tag the parser is currently processing
$currentTag = "";

function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
    // output opening HTML tags
    switch ($name) {
        case "BOOK":
            echo "<tr>";
            break;
        case "TITLE":
            echo "<td>";
            break;
        case "AUTHOR":
            echo "<td>";
            break;
        case "PRICE":
            echo "<td>";
            break;
        case "RATING":
            echo "<td>";
            break;
        default:
            break;
    }
}
?>
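Each time the parser encounters a starting tag, it calls startElement() with the name of the tag (and attributes, if any) as arguments. The name arrives in upper case (a <book> element is reported as "BOOK") because PHP's XML parser folds element names to upper case by default. The startElement() function then processes the tag, printing corresponding HTML markup in place of the XML tag.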
I've used a "switch" statement, keyed on the tag name, to decide how to process each tag. For example, since I know that <book> indicates the beginning of a new row in my desired output, I replace it with a <tr>, while other elements like <title> and <author> correspond to table cells, and are replaced with <td> tags.
Finally, I've also stored the current tag name in the global variable $currentTag - this can be used to identify which tag is being processed at any stage, and it'll come in useful a little further down.
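To see exactly what the parser hands to a start-tag handler, here's a minimal sketch that records each call instead of printing HTML. The collectStartTags() function and the sample XML string are hypothetical, made up for illustration (they are not part of the library script), and the handlers are written as closures, which assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper: returns one array(name, attributes) entry
// for every opening tag the parser reports.
function collectStartTags($xml) {
    $seen = array();

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        // start-tag handler: remember the tag name and its attributes
        function ($parser, $name, $attrs) use (&$seen) {
            $seen[] = array($name, $attrs);
        },
        // end-tag handler: nothing to do in this sketch
        function ($parser, $name) {
        }
    );
    xml_parse($parser, $xml, true);

    return $seen;
}

// Made-up sample data, not the article's library.xml
$sample = '<library><book id="1"><title>Sample Title</title></book></library>';

foreach (collectStartTags($sample) as $call) {
    echo $call[0] . " (" . count($call[1]) . " attributes)\n";
}
```

With the parser's default case folding, the tag names come back as LIBRARY, BOOK and TITLE, which is why the "switch" statement above tests for upper-case names.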
The endElement() function takes care of closing tags, and looks similar - note that I've specifically cleaned up $currentTag at the end.
<?php
function endElement($parser, $name) {
    global $currentTag;
    // output closing HTML tags
    switch ($name) {
        case "BOOK":
            echo "</tr>";
            break;
        case "TITLE":
            echo "</td>";
            break;
        case "AUTHOR":
            echo "</td>";
            break;
        case "PRICE":
            echo "</td>";
            break;
        case "RATING":
            echo "</td>";
            break;
        default:
            break;
    }
    // clear current tag variable
    $currentTag = "";
}
?>
So this takes care of replacing XML tags with corresponding HTML tags...but what about handling the data between them?
<?php
// process data between tags
function characterData($parser, $data) {
    global $currentTag;
    // text ratings
    $ratings = array("Words fail me!", "Terrible", "Bad", "Indifferent",
                     "Good", "Excellent");
    // format the data
    switch ($currentTag) {
        case "TITLE":
            // italics for title
            echo "<i>$data</i>";
            break;
        case "AUTHOR":
            echo $data;
            break;
        case "PRICE":
            // add currency symbol for price
            echo "$" . $data;
            break;
        case "RATING":
            // get text rating
            echo $ratings[$data];
            break;
        default:
            break;
    }
}
?>
The characterData() function is called whenever the parser encounters data between an XML tag pair. Note that the function is only passed the data as an argument; it has no way of telling which tags surround it. Since the parser works through the document in order, though, the $currentTag variable (set by startElement() and cleared by endElement()) identifies which tag this data belongs to.
Depending on the value of $currentTag, a "switch" statement is used to print data with appropriate formatting; this is the place where I add italics to the title, a currency symbol to the price, and a text rating (corresponding to a numerical index) from the $ratings array.
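One caveat with printing data as it arrives: the parser is allowed to deliver the text of a single element in several pieces, so the character data handler can run more than once per tag pair. A common way around this is to buffer the data and act on it when the closing tag arrives. Here's a minimal sketch of that idea; collectText() and the sample string are hypothetical names and data of my own, and the closures assume PHP 5.3 or later.

```php
<?php
// Hypothetical helper: returns array(tag name, complete text) pairs,
// assembling each element's text from however many chunks the parser sends.
function collectText($xml) {
    $buffer = "";
    $result = array();

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        // opening tag: start with an empty buffer
        function ($parser, $name, $attrs) use (&$buffer) {
            $buffer = "";
        },
        // closing tag: the buffer now holds the element's full text
        function ($parser, $name) use (&$buffer, &$result) {
            $text = trim($buffer);
            if ($text !== "") {
                $result[] = array($name, $text);
            }
            $buffer = "";
        }
    );
    xml_set_character_data_handler(
        $parser,
        // append each chunk; this may be called several times per element
        function ($parser, $data) use (&$buffer) {
            $buffer .= $data;
        }
    );
    xml_parse($parser, $xml, true);

    return $result;
}

// Made-up sample data, not the article's library.xml
$sample = '<book><title>War &amp; Peace</title><rating>5</rating></book>';

foreach (collectText($sample) as $pair) {
    // escape the text before sending it to the browser
    echo $pair[0] . ": " . htmlspecialchars($pair[1]) . "\n";
}
```

The same buffering could be folded into the characterData() and endElement() functions above, with the formatting "switch" moved into endElement(); I've kept it separate here so the sketch stands on its own.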
Here's what the finished script, with some additional HTML, looks like:
<html>
<head>
<title>The Library</title>
<style type="text/css">
TD {font-family: Arial; font-size: smaller}
H2 {font-family: Arial}
</style>
</head>
<body bgcolor="white">
<h2>The Library</h2>
<table border="1" cellspacing="1" cellpadding="5">
<tr>
<td align="center">Title</td>
<td align="center">Author</td>
<td align="center">Price</td>
<td align="center">User Rating</td>
</tr>
<?php
// data file
$file = "library.xml";

// use this to keep track of which tag the parser is currently processing
$currentTag = "";

function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
    // output opening HTML tags
    switch ($name) {
        case "BOOK":
            echo "<tr>";
            break;
        case "TITLE":
            echo "<td>";
            break;
        case "AUTHOR":
            echo "<td>";
            break;
        case "PRICE":
            echo "<td>";
            break;
        case "RATING":
            echo "<td>";
            break;
        default:
            break;
    }
}

function endElement($parser, $name) {
    global $currentTag;
    // output closing HTML tags
    switch ($name) {
        case "BOOK":
            echo "</tr>";
            break;
        case "TITLE":
            echo "</td>";
            break;
        case "AUTHOR":
            echo "</td>";
            break;
        case "PRICE":
            echo "</td>";
            break;
        case "RATING":
            echo "</td>";
            break;
        default:
            break;
    }
    // clear current tag variable
    $currentTag = "";
}

// process data between tags
function characterData($parser, $data) {
    global $currentTag;
    // text ratings
    $ratings = array("Words fail me!", "Terrible", "Bad", "Indifferent",
                     "Good", "Excellent");
    // format the data
    switch ($currentTag) {
        case "TITLE":
            // italics for title
            echo "<i>$data</i>";
            break;
        case "AUTHOR":
            echo $data;
            break;
        case "PRICE":
            // add currency symbol for price
            echo "$" . $data;
            break;
        case "RATING":
            // get text rating
            echo $ratings[$data];
            break;
        default:
            break;
    }
}

// initialize parser
$xml_parser = xml_parser_create();

// set callback functions
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

// open XML file
if (!($fp = fopen($file, "r"))) {
    die("Cannot locate XML data file: $file");
}

// read and parse data
while ($data = fread($fp, 4096)) {
    // error handler
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

// clean up
fclose($fp);
xml_parser_free($xml_parser);
?>
</table>
</body>
</html>
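And when you run it, you'll see an HTML table with one row for each book in library.xml, under the headings Title, Author, Price and User Rating.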
You can now add new items to your XML document, or edit existing items, and your rendered HTML page will change accordingly. By separating the data from the presentation, XML has imposed standards on data collections, making it possible, for example, for users with no technical knowledge of HTML to easily update content on a Web site, or to present data from a single source in different ways.
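To make the "single source, different presentations" point concrete, here's a rough sketch that parses the book records once and then renders them two ways. Everything named here (parseBooks(), renderHtmlRows(), renderText() and the sample XML string) is hypothetical and written for illustration; it assumes each book has a title and an author, and it uses closures, so it needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper: turn <book> elements into plain arrays,
// keyed by the (upper-cased) names of their child tags.
function parseBooks($xml) {
    $books = array();
    $current = null;   // record being built, or null outside a <book>
    $tag = "";         // child tag whose text is currently being read

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$current, &$tag) {
            if ($name === "BOOK") {
                $current = array();
            }
            $tag = $name;
        },
        function ($p, $name) use (&$books, &$current, &$tag) {
            if ($name === "BOOK") {
                $books[] = $current;
                $current = null;
            }
            $tag = "";
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$current, &$tag) {
            // only keep text that sits directly inside a child of <book>
            if ($current !== null && $tag !== "" && $tag !== "BOOK") {
                if (!isset($current[$tag])) {
                    $current[$tag] = "";
                }
                $current[$tag] .= $data;
            }
        }
    );
    xml_parse($parser, $xml, true);

    return $books;
}

// Presentation 1: HTML table rows
function renderHtmlRows($books) {
    $out = "";
    foreach ($books as $b) {
        $out .= "<tr><td>" . htmlspecialchars($b["TITLE"]) . "</td><td>"
              . htmlspecialchars($b["AUTHOR"]) . "</td></tr>\n";
    }
    return $out;
}

// Presentation 2: plain text, one book per line
function renderText($books) {
    $out = "";
    foreach ($books as $b) {
        $out .= $b["TITLE"] . " by " . $b["AUTHOR"] . "\n";
    }
    return $out;
}

// Made-up sample data, not the article's library.xml
$sample = '<library><book><title>Sample Title</title>'
        . '<author>Sample Author</author></book></library>';

$books = parseBooks($sample);
echo renderHtmlRows($books);
echo renderText($books);
```

The parsing step knows nothing about HTML, so a new output format only needs another render function; the XML file and the parser code stay untouched.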