Let's say we want to organize a book collection and describe it in XML. We don't need anything fancy: just the title, author, and genre of each book. Let's go ahead and create the markup for a few books:

    <?xml version="1.0" encoding="UTF-8"?>

Now we're left with parsing the data and turning it into something presentable. If you examine the way the data is stored, you will notice that each book looks a lot like a Python dictionary, so a dictionary is an ideal type to store it in. We'll create a chunk of code that does just this. SAX, the Simple API for XML, will be used for this project. It is contained in the xml.sax module:

    import xml.sax

    # Create a collection list

    # This handles the parsing of the content
    def __init__(self):
        self.book = {}

    # Called at the start of an element
    if name == 'title':

    # Called at the end of an element
    if name == 'book':

    # Called to handle content besides elements
    if self.title:

    # Parse the collection

As you can see, there's really not much work involved. All we have to do is write the instructions that organize each book into a dictionary and put all the dictionaries into the collection list.

We start by subclassing xml.sax.ContentHandler. The class we create is charged with handling the content of the document we parse. In our class's __init__ method, we define a few variables. The book dictionary will, of course, house the book's information. The title variable is used by the characters method to determine whether we are dealing with the title tag's content. The same goes for the author and genre variables. Each one is set to True in startElement when we enter that particular element and set back to False in endElement once we have finished with it.

Finally, we instruct Python to parse the file in the last three lines. We are now free to present this information to the user in whichever way we see fit.
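One way to fill in the outline above is sketched here. It rests on several assumptions: the sample markup, the root element name collection, and the class name BookHandler are all hypothetical, and the XML is parsed from an in-memory string with xml.sax.parseString rather than from a file so that the example stands on its own.

```python
import xml.sax

# Hypothetical sample markup; only the title, author, and genre
# elements inside each book come from the article's description.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection>
  <book>
    <title>Dune</title>
    <author>Frank Herbert</author>
    <genre>Science Fiction</genre>
  </book>
  <book>
    <title>Emma</title>
    <author>Jane Austen</author>
    <genre>Romance</genre>
  </book>
</collection>
"""

# Create a collection list
collection = []


# This handles the parsing of the content
# (BookHandler is an assumed name, not taken from the article)
class BookHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.book = {}
        self.title = False
        self.author = False
        self.genre = False

    # Called at the start of an element
    def startElement(self, name, attrs):
        if name == 'title':
            self.title = True
        elif name == 'author':
            self.author = True
        elif name == 'genre':
            self.genre = True

    # Called at the end of an element
    def endElement(self, name):
        if name == 'title':
            self.title = False
        elif name == 'author':
            self.author = False
        elif name == 'genre':
            self.genre = False
        elif name == 'book':
            # The book is complete: store it and start a fresh one
            collection.append(self.book)
            self.book = {}

    # Called to handle content besides elements
    def characters(self, content):
        # SAX may deliver one text node in several pieces, so append
        if self.title:
            self.book['title'] = self.book.get('title', '') + content
        elif self.author:
            self.book['author'] = self.book.get('author', '') + content
        elif self.genre:
            self.book['genre'] = self.book.get('genre', '') + content


# Parse the collection (from a string here, instead of a file)
xml.sax.parseString(SAMPLE_XML, BookHandler())
```

The whitespace between elements also reaches characters, but it is ignored there because none of the three flags is set at that point.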
For example, if we just wanted to output the book information without dressing it up too much, we could append some code to the above script that loops through the list of dictionaries the script creates:

    for book in collection:
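The body of that loop could be as simple as one formatted line per book. The following is only a sketch: the format_book helper and the output layout are assumptions, and the collection list is a hypothetical stand-in for the one the parsing script builds.

```python
# Hypothetical stand-in for the list that the parsing script builds
collection = [
    {'title': 'Dune', 'author': 'Frank Herbert', 'genre': 'Science Fiction'},
    {'title': 'Emma', 'author': 'Jane Austen', 'genre': 'Romance'},
]


def format_book(book):
    """Return one plain line of text describing a book dictionary."""
    return '{title} by {author} ({genre})'.format(**book)


for book in collection:
    print(format_book(book))
```

Keeping the formatting in its own function means the layout can be changed later without touching the loop.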