While the above XML structure is fine for many things, moving simple properties into attributes often makes a file shorter and easier to write. Let's consider a music library (consisting of individual songs rather than whole albums). Instead of creating individual tags for the album the song comes from, the artist of the song, the name of the song and the length of the song, which would get tiresome to type, we could simply create attributes for each of these properties. Here's an XML file that does this:

    <?xml version="1.0" encoding="UTF-8"?>

If we had given every property of the song its own tag, the file would have been much longer than it is now. Putting the properties in attributes keeps it short. We are still left with the task of parsing the data, though. The process is similar to what we did above, but there are some differences since we are dealing with attributes this time around. Here's how it all works:

    import xml.sax

    # Create a class to handle the contents of the XML file
        def __init__ ( self ):
            self.artist = ''

        # Handle the start of an element
            # Check to see if it is a "track" element

        # Handle content
            # If the content isn't a newline or blank space, then we

    # Parse the file

It's neither a complex script nor a long one. It strongly resembles the previous script, but note that we choose not to store anything in a dictionary. Rather, we just print everything out to the user as we receive it. The attributes variable in the startElement method is an object representing all the attributes of that tag. We access the attributes by name with the getValue method, saving the values in variables that we print out in the characters method.

That's all there is to parsing attributes. What if, however, we do not know the names of the attributes?
That isn't much of a problem, since we can loop through all the attributes and get their values:

    import xml.sax

    # Create a class to handle the XML
        def __init__ ( self ):
            self.attributes = None

        # Handle the beginning part of each tag
            # Check to see if we're dealing with a track tag

        # Handle content
            if ( content != '\n' ) and ( content.replace ( ' ', '' ) != '' ):
                # Loop through each attribute and print the name and value

    # Parse it all

In the above script, we use the getNames method to retrieve a list of attribute names. We then loop through the list and print each attribute's name (with the first letter capitalized) and its value.
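To make the two approaches concrete, here is a minimal, self-contained sketch of both. It is not the original script. The sample XML, the library root element, the attribute names artist, album and length, and the names KnownAttributesHandler, AnyAttributesHandler and parse_tracks are all assumptions made for illustration. It also differs from the description above in one respect: instead of printing from the characters method, it buffers the text and emits it in endElement, because a SAX parser is allowed to deliver a text node in several chunks.

```python
import xml.sax

# Hypothetical sample data; the element and attribute names are assumptions.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<library>
  <track artist="Example Artist" album="Example Album" length="3:45">First Song</track>
  <track artist="Another Artist" album="Another Album" length="4:10">Second Song</track>
</library>
"""


class KnownAttributesHandler(xml.sax.ContentHandler):
    """Reads attributes whose names are known in advance."""

    def __init__(self):
        super().__init__()
        self.lines = []     # formatted output, one string per line
        self._attrs = None  # (label, value) pairs for the current <track>
        self._text = []     # text chunks for the current <track>

    def startElement(self, name, attrs):
        if name == "track":
            # getValue raises KeyError if the attribute is absent.
            self._attrs = [
                ("Artist", attrs.getValue("artist")),
                ("Album", attrs.getValue("album")),
                ("Length", attrs.getValue("length")),
            ]
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it while inside a track.
        if self._attrs is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "track" and self._attrs is not None:
            self.lines.append("Name: " + "".join(self._text).strip())
            for label, value in self._attrs:
                self.lines.append(label + ": " + value)
            self._attrs = None


class AnyAttributesHandler(KnownAttributesHandler):
    """Reads every attribute without knowing the names beforehand."""

    def startElement(self, name, attrs):
        if name == "track":
            # getNames lists the attribute names present on this tag.
            self._attrs = [
                (key.capitalize(), attrs.getValue(key)) for key in attrs.getNames()
            ]
            self._text = []


def parse_tracks(handler, data=SAMPLE_XML):
    """Run the SAX parser over data and return the handler's output lines."""
    xml.sax.parseString(data, handler)
    return handler.lines


if __name__ == "__main__":
    for line in parse_tracks(AnyAttributesHandler()):
        print(line)
```

Swapping AnyAttributesHandler for KnownAttributesHandler in the last few lines switches between the two styles. The second handler only overrides startElement, which reflects the point made above: the only real difference is whether the attribute names are hard-coded or discovered with getNames.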