SAX is not the only way to process XML data in Python. The Document Object Model exists, and it gives us an object-oriented interface to XML data. Python contains a library called minidom that provides a simple interface to DOM without a whole lot of bells and whistles. Let's recreate our music library parser, making use of DOM this time:
# Load the music library
# Get a list of the tracks
# Go through each track
# Print each track's information
We end up with a very short script in the above example. We start by pointing Python to the file we wish to parse. Then we get all tags by the name of “track” that belong to the main tag. We loop through the list provided, printing out the information contained within. To access the text inside the element, we access the track's list of childNodes. The text is stored in a node, and we print out the value of it. The attributes are stored in attributes, and we reference them by name, printing out nodeValue.
Again, though, what if we don't know everything we are going to parse? Let's recreate the second example of the previous section using DOM:
# Load the XML file
# Get a list of tracks
# Loop through the tracks
# Print the track name
# Loop through the attributes
We loop through the names of the attributes returned in the keys method, printing the name of the attribute out with the first letter capitalized. We then print the value of the attribute out by referencing the attribute by its key and then accessing nodeValue.
Let's say we have tags nested in each other. Consider our very first script that parsed a book collection. Let's rebuild it with DOM:
# Load the book collection
# Get a list of books
# Loop through the books
# Print out the book's information
We load the collection file and then get a list of books. Then, we loop through the list of books and print out what's wrapped inside of each tag, which is in the form of a child node that we must get the value of. It's all very simple.
DOM includes many more features, but all you need to simply read a document is contained within the minidom module.
XML is a useful tool for describing data. It can be used to describe just about anything –- from book collections and music libraries to user settings for an application. Python contains a few utilities that can be used to read and process XML data, namely SAX and DOM. SAX allows you to create handler classes that can process individual items within an XML document, and DOM abstracts the entire document, allowing you to easily navigate through a tree of XML data. Both tools are extremely simple to use and contribute to the phrase “batteries included” that is used to describe the Python language.
blog comments powered by Disqus