
Organizing a Book Collection - Python

XML can be used for describing data without needing a database. However, this leaves us with the problem of interpreting the data embedded within the XML. This is where Python comes to the rescue, as Peyton explains.

  1. Working with XML Documents and Python
  2. Organizing a Book Collection
  3. Describing a Music Library
  4. The Document Object Model
By: Peyton McCullough
November 17, 2005


Let's say we want to organize a book collection using XML to describe it all. We don't need anything fancy. We only need to store the title, author, and genre of the book. Let's go ahead and create the markup for a few books:

<?xml version="1.0" encoding="UTF-8"?>
<collection>
   <book>
      <title>The Once and Future King</title>
      <author>T.H. White</author>
   </book>
   <book>
      <title>The Curse of Chalion</title>
      <author>Lois McMaster Bujold</author>
   </book>
   <book>
      <title>Paladin of Souls</title>
      <author>Lois McMaster Bujold</author>
   </book>
   <book>
      <title>Alas, Babylon</title>
      <author>Pat Frank</author>
   </book>
   <book>
      <title>Rifles for Watie</title>
      <author>Harold Keith</author>
   </book>
</collection>

Now we're left with parsing the data and turning it into something presentable. If you examine the way the data is stored, you will notice that each book looks a lot like a Python dictionary: a set of named fields, each with a value. Therefore, a dictionary is an ideal type to store each book in. We'll create a chunk of code that does just this. We'll use SAX, the Simple API for XML, for this project. It is contained in the xml.sax module:

import xml.sax

# Create a collection list
collection = []

# This handles the parsing of the content
class HandleCollection(xml.sax.ContentHandler):

    def __init__(self):
        super().__init__()
        self.book = {}
        self.title = False
        self.author = False
        self.genre = False

    # Called at the start of an element
    def startElement(self, name, attributes):
        if name == 'title':
            self.title = True
        elif name == 'author':
            self.author = True
        elif name == 'genre':
            self.genre = True

    # Called at the end of an element
    def endElement(self, name):
        if name == 'book':
            collection.append(self.book)
            self.book = {}
        elif name == 'title':
            self.title = False
        elif name == 'author':
            self.author = False
        elif name == 'genre':
            self.genre = False

    # Called to handle content besides elements. The parser may deliver
    # one element's text in several chunks, so append rather than overwrite.
    def characters(self, content):
        if self.title:
            self.book['title'] = self.book.get('title', '') + content
        elif self.author:
            self.book['author'] = self.book.get('author', '') + content
        elif self.genre:
            self.book['genre'] = self.book.get('genre', '') + content

# Parse the collection
parser = xml.sax.make_parser()
parser.setContentHandler(HandleCollection())
parser.parse('collection.xml')

As you can see, there's really not much work involved. All we have to do is write the instructions that organize each book into a dictionary and put all the dictionaries into the collection list. We start by subclassing xml.sax.ContentHandler. The class we create is charged with handling the content of the document we parse. In our class's __init__ method, we define a few variables. The book dictionary will, of course, house the current book's information. The title variable is a flag that the characters method checks to determine whether it is dealing with the title tag's content. The same goes for the author and genre variables. Each flag is set to True in startElement when its element opens and set back to False in endElement when that element closes. When a book element closes, endElement appends the finished dictionary to the collection list and starts a fresh one. Finally, the last three lines create a parser, attach our handler, and parse the file.
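
To make the order of those callbacks easier to see, here is a minimal sketch that simply records every event the parser fires. The TraceHandler class and the one-book SAMPLE document are hypothetical names made up for this illustration, and the sample is parsed from memory with xml.sax.parseString so that no file is needed:

```python
import xml.sax

# Hypothetical one-book sample, inlined so the sketch needs no file on disk.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<collection>
   <book>
      <title>Alas, Babylon</title>
      <author>Pat Frank</author>
   </book>
</collection>"""


class TraceHandler(xml.sax.ContentHandler):
    """Records each SAX callback in the order the parser fires it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attributes):
        self.events.append(('start', name))

    def endElement(self, name):
        self.events.append(('end', name))

    def characters(self, content):
        # Skip the whitespace-only text that sits between elements.
        if content.strip():
            self.events.append(('text', content))


handler = TraceHandler()
xml.sax.parseString(SAMPLE, handler)

for event in handler.events:
    print(event)
```

Running it should show each element's start event, then its text, then its end event, with the book element's end event arriving only after both of its children have closed. That ordering is what lets a handler treat the end of a book as the signal that the dictionary is complete.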

We are now free to present this information to the user in whichever way we see fit. For example, if we wanted to just output the book information without dressing it up too much, we could simply append some code to the above script that loops through the list of dictionaries that the script creates:

for book in collection:
    # Use get() so a book with a missing field does not raise KeyError
    print('Title:  ', book.get('title', ''))
    print('Author: ', book.get('author', ''))
    print('Genre:  ', book.get('genre', ''))
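
Since the result is just a list of dictionaries, other presentations are easy to build as well. As one possible sketch, the code below groups titles under their author. The sample_collection list is a hypothetical stand-in shaped like the list the handler builds, so the snippet runs on its own:

```python
from collections import defaultdict

# Hypothetical stand-in for the list of dictionaries built by the handler.
sample_collection = [
    {'title': 'The Once and Future King', 'author': 'T.H. White'},
    {'title': 'The Curse of Chalion', 'author': 'Lois McMaster Bujold'},
    {'title': 'Paladin of Souls', 'author': 'Lois McMaster Bujold'},
]

# Map each author to the list of titles they wrote.
by_author = defaultdict(list)
for book in sample_collection:
    by_author[book['author']].append(book['title'])

# Print authors alphabetically, with their titles indented underneath.
for author in sorted(by_author):
    print(author)
    for title in sorted(by_author[author]):
        print('   ' + title)
```

In the real script you would loop over collection instead of sample_collection. The grouping step stays the same.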
