Working with XML Documents and Python - Describing a Music Library
While the XML structure above works well for many things, moving simple properties into attributes often makes a document more compact and easier to type. Let's consider a music library (consisting of individual songs rather than whole albums). Instead of creating individual tags for the album the song comes from, the artist of the song, the name of the song and the length of the song, which would get tiresome to type, we can store the artist, album and length as attributes of a single track tag and keep the song name as the tag's text. Here's an XML file that does this:
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <track artist='Boston' album='Greatest Hits' time='5:04'>Peace of Mind</track>
  <track artist='Dire Straits' album='Sultans Of Swing-Best of Dire Straits' time='5:50'>Sultans of Swing</track>
  <track artist='Dire Straits' album='Sultans Of Swing-Best of Dire Straits' time='4:12'>Walk of Life</track>
  <track artist='The Eagles' album='The Very Best of The Eagles' time='3:33'>Take It Easy</track>
  <track artist='Gary Allan' album='Best I Ever Had' time='4:18'>Best I Ever Had</track>
  <track artist='Goo Goo Dolls' album='Dizzy Up the Girl' time='4:50'>Iris</track>
  <track artist='Kansas' album='The Ultimate Kansas' time='3:24'>Dust in the Wind</track>
  <track artist='Toby Keith' album='Greatest Hits 2' time='3:27'>How Do You Like Me Now?!</track>
  <track artist='Toby Keith' album='Greatest Hits 2' time='3:15'>Courtesy of the Red, White and Blue</track>
  <track artist='ZZ Top' album='ZZ Top-Greatest Hits' time='4:03'>Got Me Under Pressure</track>
</library>
Had we given every property of the song its own tag, the file would have been much longer than it is now. Putting the properties in attributes keeps each track down to a single tag.
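To get a rough feel for the difference, here is a small sketch that builds one track both ways with Python's xml.etree.ElementTree module and compares the lengths of the results. The child tag names used for the tag-per-property layout (name, artist, album, time) are an assumption made for illustration, not something taken from an earlier file.

```python
import xml.etree.ElementTree as ET

# One track from the library, as plain data
info = {'artist': 'Kansas', 'album': 'The Ultimate Kansas', 'time': '3:24'}
title = 'Dust in the Wind'

# Attribute style: the properties live on the <track> tag itself
attr_track = ET.Element('track', info)
attr_track.text = title

# Element style: every property gets its own child tag
# (the <name> tag for the song title is a hypothetical choice)
elem_track = ET.Element('track')
for tag, value in [('name', title)] + list(info.items()):
    ET.SubElement(elem_track, tag).text = value

# Serialize both versions to strings so they can be compared
attr_xml = ET.tostring(attr_track, encoding='unicode')
elem_xml = ET.tostring(elem_track, encoding='unicode')

print(attr_xml)
print(elem_xml)
print(len(attr_xml), 'characters versus', len(elem_xml))
```

The gap for a single track is modest, but it is paid again for every track, and it grows further once the nested tags are indented on their own lines.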
Now, however, we are left with the task of parsing the data. The process is similar to what we did above, but there are some differences since we are dealing with attributes this time around. Here's how it all works:
import xml.sax

# Create a class to handle the contents of the XML file
class HandleLibrary(xml.sax.ContentHandler):

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.artist = ''
        self.album = ''
        self.time = ''

    # Handle the start of an element
    def startElement(self, name, attributes):
        # Check to see if it is a "track" element
        # If so, store the attributes
        if name == 'track':
            self.artist = attributes.getValue('artist')
            self.album = attributes.getValue('album')
            self.time = attributes.getValue('time')

    # Handle content
    def characters(self, content):
        # If the content isn't just newlines or blank space, then we
        # have the track name
        # Print out all the track info
        if content.strip():
            print()
            print('Track: ' + content)
            print('Artist: ' + self.artist)
            print('Album: ' + self.album)
            print('Length: ' + self.time)

# Parse the file
parser = xml.sax.make_parser()
parser.setContentHandler(HandleLibrary())
parser.parse('music.xml')
The script is short and not very complex. It strongly resembles the previous script, but note that we choose not to store anything in a dictionary. Rather, we just dump it all out to the user as we receive it. The attributes variable in the startElement method is an object representing all the attributes of that tag. We access the attributes by name with the getValue method, saving the values in instance variables that we then print out in the characters method. That's all there is to parsing attributes.
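One caveat is worth keeping in mind: a SAX parser is allowed to hand over a tag's text in several pieces, so the characters method may be called more than once for a single track. Here is a minimal sketch of one way to guard against that, assuming we are happy to collect the text as it arrives and finish each track in endElement. The names BufferedLibrary and SAMPLE are made up for this illustration, and the sample is parsed from memory with xml.sax.parseString so it runs without music.xml.

```python
import xml.sax

# A small in-memory sample so the sketch runs without music.xml
SAMPLE = b"""<library>
<track artist='Kansas' album='The Ultimate Kansas' time='3:24'>Dust in the Wind</track>
<track artist='Goo Goo Dolls' album='Dizzy Up the Girl' time='4:50'>Iris</track>
</library>"""

class BufferedLibrary(xml.sax.ContentHandler):

    def __init__(self):
        super().__init__()
        self.tracks = []      # finished tracks, one dictionary each
        self.current = None   # the track being read, if any
        self.text = []        # pieces of text seen inside that track

    def startElement(self, name, attributes):
        if name == 'track':
            # getValue raises KeyError if the attribute is missing;
            # get lets us supply a fallback value instead
            self.current = {
                'artist': attributes.getValue('artist'),
                'album': attributes.getValue('album'),
                'time': attributes.get('time', 'unknown'),
            }
            self.text = []

    def characters(self, content):
        # Only collect text while we are inside a track tag
        if self.current is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == 'track':
            # Join the collected pieces to get the full track name
            self.current['name'] = ''.join(self.text).strip()
            self.tracks.append(self.current)
            self.current = None

handler = BufferedLibrary()
xml.sax.parseString(SAMPLE, handler)

for track in handler.tracks:
    print(track['name'], '-', track['artist'], '(' + track['time'] + ')')
```

Unlike the script above, this version does store each track in a dictionary, which costs a little memory but means the track name is always complete before anything is printed.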
What if, however, we do not know the names of the attributes? It isn't too much of a problem, since we can loop through all the attributes and get their values:
import xml.sax

# Create a class to handle the XML
class HandleLibrary(xml.sax.ContentHandler):

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.attributes = None

    # Handle the beginning part of each tag
    def startElement(self, name, attributes):
        # Check to see if we're dealing with a track tag
        # If we are, store a copy of the attributes (the parser is
        # allowed to reuse the object it passes in)
        if name == 'track':
            self.attributes = attributes.copy()

    # Handle content
    def characters(self, content):
        if content.strip():
            # Loop through each attribute and print the name and value
            print()
            print('Track: ' + content)
            for attribute in self.attributes.getNames():
                print(attribute[0].upper() + attribute[1:] + ': ' + self.attributes.getValue(attribute))
            print()

# Parse it all
parser = xml.sax.make_parser()
parser.setContentHandler(HandleLibrary())
parser.parse('music.xml')
In the above script, we use the getNames method to retrieve a list of attribute names. We loop through the list and print each attribute's name (with the first letter capitalized) and value.
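The attributes object also behaves much like a read-only dictionary, so the same loop can be written with its items method, which pairs each name with its value. The following is only a sketch under that assumption; the class name ListAttributes and the one-track SAMPLE string are invented for the example.

```python
import xml.sax

# A one-track sample, parsed from memory instead of from music.xml
SAMPLE = (b"<library><track artist='ZZ Top' album='ZZ Top-Greatest Hits' "
          b"time='4:03'>Got Me Under Pressure</track></library>")

class ListAttributes(xml.sax.ContentHandler):

    def __init__(self):
        super().__init__()
        self.found = {}   # attribute names and values of the last track

    def startElement(self, name, attributes):
        if name == 'track':
            # items() yields (name, value) pairs, so no separate
            # getValue call is needed inside the loop
            for attribute, value in attributes.items():
                self.found[attribute] = value
                print(attribute[0].upper() + attribute[1:] + ': ' + value)

handler = ListAttributes()
xml.sax.parseString(SAMPLE, handler)
```

Because we never need to know the attribute names in advance, adding a new attribute to the XML file (a year, say) would require no change to this handler.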
More By Peyton McCullough