The first step in this script is to set up some global variables. The $currentTag variable will hold the name of the element that the parser is currently processing; you'll see why this is needed shortly. Since my ultimate goal is to display the individual items in the channel, with links, I also need to know when the parser has exited the <channel>...</channel> block and entered the <item>...</item> sections of the document. Because a SAX parser works sequentially, reporting one event at a time, it offers no API for discovering the current depth or location in the document tree. So I have to invent my own mechanism, which is where the $flag variable comes in: it records whether the parser is inside the <channel> block or an <item> block.

The next step is to initialize the SAX parser and begin parsing the RSS document. This is a fairly standard sequence of commands, and the comments should explain it sufficiently. The xml_parser_create() function instantiates the parser and assigns it to the handle $xp. Next, callback functions are registered to handle opening tags, closing tags, and the character data between them. Finally, the xml_parse() function, in conjunction with a series of fread() calls, reads the RDF file and parses it chunk by chunk.

Each time the parser encounters an opening tag in the document, it calls the opening tag handler, elementBegin(). This function receives the name of the current tag and its attributes (if any) as arguments. The tag name is assigned to the global $currentTag variable and, if the tag is an opening <item> tag, $flag is set to 1.

Conversely, when the parser finds a closing tag, it invokes the closing tag handler, elementEnd(), which also receives the tag name as a parameter. If this is a closing </item> tag, $flag is reset to 0 and $currentTag is cleared.

Now, what about the data between the tags, which is what we're really interested in?
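Before answering that, here's a rough sketch of the state tracking described so far. It's written in Python rather than PHP, using the standard library's xml.parsers.expat module, whose callback model (start-element and end-element handlers fed by repeated Parse() calls) closely resembles PHP's xml_* functions. The names element_begin(), element_end() and parse(), the trace list and the 64-byte chunk size are illustrative assumptions, not the script's actual code. One difference worth knowing: PHP's parser upper-cases element names by default (the XML_OPTION_CASE_FOLDING option), so a PHP handler would compare against "ITEM", whereas Python's expat reports names exactly as written.

```python
import io
import xml.parsers.expat

# Module-level state, mirroring the script's $currentTag and $flag globals.
current_tag = ""  # name of the element the parser most recently entered
flag = 0          # 1 while inside an <item>...</item> block, else 0
trace = []        # (event, tag, flag) tuples, recorded only for illustration


def element_begin(name, attrs):
    """Opening-tag handler: remember the tag; raise the flag on <item>."""
    global current_tag, flag
    current_tag = name
    if name == "item":
        flag = 1
    trace.append(("start", name, flag))


def element_end(name):
    """Closing-tag handler: on </item>, lower the flag and clear the tag."""
    global current_tag, flag
    if name == "item":
        flag = 0
        current_tag = ""
    trace.append(("end", name, flag))


def parse(stream, chunk_size=64):
    """Create a parser, register the handlers and feed it the stream in chunks."""
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = element_begin
    parser.EndElementHandler = element_end
    # Read fixed-size chunks, like the fread() loop around xml_parse().
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)
    parser.Parse(b"", True)  # signal the end of the document


if __name__ == "__main__":
    doc = b"<rss><channel><title>C</title></channel><item><title>I</title></item></rss>"
    parse(io.BytesIO(doc), chunk_size=8)
    for event in trace:
        print(event)
```

The trace shows the point of the flag: the <title> inside <channel> is recorded with a flag of 0, while the <title> inside <item> is recorded with a flag of 1, even though the parser itself never reports how deeply nested either one is.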
Say hello to the character data handler, characterData(). If you look at the arguments passed to this function, you'll see that characterData() receives only the data between the opening and closing tags; it has no idea which tag the parser is currently processing. That's why we needed the global $currentTag variable in the first place (told you this would make sense eventually!).

If $flag is 1 (in other words, if the parser is currently inside an <item>...</item> block) and the element being processed is a <title>, <link> or <description> element, the data is printed to the output device (in this case, the Web browser), followed by a line break. The entire RDF document is processed in this sequential manner, with output appearing every time an item is found.

Here's what you'll see when you run the script:
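Putting the three handlers together, a self-contained Python sketch of the whole flow might look like the following. Again, this is an analogue built on xml.parsers.expat, not the script itself; render_items(), the WANTED tuple and the 4096-byte chunk size are hypothetical names and values chosen for illustration. It deviates from the described script in two deliberate ways: the state lives in a dict shared by nested handlers instead of module-level globals, so the function can be called more than once, and whitespace-only text is skipped, because the current tag is cleared only at </item> (as described above) and so still holds "title" in the gap between </title> and <link>. Setting buffer_text asks expat to merge adjacent text fragments into fewer callbacks, since character data can otherwise arrive in several pieces.

```python
import io
import xml.parsers.expat

# Elements whose text is reported when found inside an <item> block.
WANTED = ("title", "link", "description")


def render_items(stream, chunk_size=4096):
    """Return the title/link/description text of every <item>, one entry per line."""
    state = {"current_tag": "", "flag": 0}  # stands in for $currentTag and $flag
    lines = []

    def element_begin(name, attrs):
        state["current_tag"] = name
        if name == "item":
            state["flag"] = 1

    def element_end(name):
        if name == "item":
            state["flag"] = 0
            state["current_tag"] = ""

    def character_data(data):
        # The handler sees only the text; current_tag supplies the context.
        if state["flag"] == 1 and state["current_tag"] in WANTED and data.strip():
            lines.append(data.strip())

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # coalesce adjacent text fragments where possible
    parser.StartElementHandler = element_begin
    parser.EndElementHandler = element_end
    parser.CharacterDataHandler = character_data
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)
    parser.Parse(b"", True)  # signal the end of the document
    return lines


if __name__ == "__main__":
    sample = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/">
  <channel><title>Channel title</title></channel>
  <item><title>First item</title><link>http://example.com/1</link></item>
</rdf:RDF>"""
    for line in render_items(io.BytesIO(sample)):
        print(line)
```

Collecting the entries in a list rather than printing them keeps the sketch easy to test; a Web version would emit each entry followed by a line break, as the PHP script does. Note that text split across two chunk boundaries can still reach the handler in more than one piece, so a more robust variant would accumulate text per element and emit it at the closing tag.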