Page 2 - Working with XML Documents and Python
Organizing a Book Collection - Python
XML can be used for describing data without needing a database. However, this leaves us with the problem of interpreting the data embedded within the XML. This is where Python comes to the rescue, as Peyton explains.
Let's say we want to organize a book collection using XML to describe it all. We don't need anything fancy. We only need to store the title, author, and genre of the book. Let's go ahead and create the markup for a few books:
<?xml version="1.0" encoding="UTF-8"?>
<collection>
   <book>
      <title>The Once and Future King</title>
      <author>T.H. White</author>
      <genre>Fantasy</genre>
   </book>
   <book>
      <title>The Curse of Chalion</title>
      <author>Lois McMaster Bujold</author>
      <genre>Fantasy</genre>
   </book>
   <book>
      <title>Paladin of Souls</title>
      <author>Lois McMaster Bujold</author>
      <genre>Fantasy</genre>
   </book>
   <book>
      <title>Alas, Babylon</title>
      <author>Pat Frank</author>
      <genre>Fiction</genre>
   </book>
   <book>
      <title>Rifles for Wattie</title>
      <author>Harold Keith</author>
      <genre>Fiction</genre>
   </book>
</collection>
Now we're left with parsing the data and turning it into something presentable. If you examine the way the data is stored, you will notice that it is similar to a dictionary in Python. Therefore, a dictionary would be an ideal type to store the data in. We'll create a chunk of code that does just this. SAX, Simple API for XML, will be used for this project. It is contained in xml.sax:
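import xml.sax

# Create a collection list
collection = []

# This handles the parsing of the content
class HandleCollection(xml.sax.ContentHandler):
    ...  # __init__, startElement, characters and endElement, as described in the text that follows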
As you can see, there's really not much work involved. All we have to do is write the instructions that organize each book into a dictionary and put all the dictionaries into the collection list. We start by subclassing xml.sax.ContentHandler. The class we create is charged with handling the content of the document we parse. In our class's __init__ method, we define a few variables. The book dictionary will, of course, house the book's information. The title variable will be used by the characters method to determine whether we are dealing with the title tag's content. The same goes for the author variable and the genre variable. These are set to True in startElement if we're dealing with that particular element. They are then set to False when we have finished using them in endElement. Finally, we instruct Python to parse the file in the last three lines.
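The handler described above could be sketched roughly as follows. This is a reconstruction from the description, written for Python 3, and not the original listing. Two details are assumptions: characters appends to the current value instead of assigning it, because a SAX parser may deliver one element's text in several pieces, and the document is parsed from an inline SAMPLE string with xml.sax.parseString so that the sketch runs without a file on disk.

```python
import xml.sax

# Hypothetical inline sample: two of the books from the markup above
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection>
  <book>
    <title>The Once and Future King</title>
    <author>T.H. White</author>
    <genre>Fantasy</genre>
  </book>
  <book>
    <title>Alas, Babylon</title>
    <author>Pat Frank</author>
    <genre>Fiction</genre>
  </book>
</collection>"""

FIELDS = ('title', 'author', 'genre')

# Create a collection list
collection = []


# This handles the parsing of the content
class HandleCollection(xml.sax.ContentHandler):

    def __init__(self):
        super().__init__()
        self.book = {}
        # Flags that tell characters() which element we are inside
        self.title = False
        self.author = False
        self.genre = False

    def startElement(self, name, attrs):
        if name == 'book':
            # Start a fresh dictionary for each book
            self.book = {}
        elif name in FIELDS:
            setattr(self, name, True)
            self.book[name] = ''

    def characters(self, content):
        # Append, since the text of one element may arrive in chunks
        for field in FIELDS:
            if getattr(self, field):
                self.book[field] += content

    def endElement(self, name):
        if name in FIELDS:
            setattr(self, name, False)
        elif name == 'book':
            # The book is complete, so store it in the collection
            collection.append(self.book)


# Parse the sample document with our handler
xml.sax.parseString(SAMPLE, HandleCollection())
```

To read from a file instead, the final line could be replaced with xml.sax.parse(filename, HandleCollection()), where filename is whatever path holds the markup.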
We are now free to present this information to the user in whichever way we see fit. For example, if we wanted to just output the book information without dressing it up too much, we could simply append some code to the above script that sorts through the list of dictionaries that the script creates:
for book in collection:
    print()
    print('Title:', book['title'])
    print('Author:', book['author'])
    print('Genre:', book['genre'])
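Because the result is an ordinary list of dictionaries, other presentations take only a few lines. As one illustration, the sketch below groups titles by genre. The group_by_genre helper is a hypothetical name introduced here, and the collection list is filled in by hand with three of the books from the markup so that the example runs on its own.

```python
# Hand-written sample in the same shape the parsing script builds
collection = [
    {'title': 'The Curse of Chalion', 'author': 'Lois McMaster Bujold', 'genre': 'Fantasy'},
    {'title': 'Alas, Babylon', 'author': 'Pat Frank', 'genre': 'Fiction'},
    {'title': 'Paladin of Souls', 'author': 'Lois McMaster Bujold', 'genre': 'Fantasy'},
]


def group_by_genre(books):
    """Return a dict mapping each genre to a sorted list of its titles."""
    genres = {}
    for book in books:
        genres.setdefault(book['genre'], []).append(book['title'])
    for titles in genres.values():
        titles.sort()
    return genres


# Print each genre followed by its titles, indented
for genre, titles in sorted(group_by_genre(collection).items()):
    print(genre)
    for title in titles:
        print('  ' + title)
```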