
The Document Object Model - Python

XML can be used for describing data without needing a database. However, this leaves us with the problem of interpreting the data embedded within the XML. This is where Python comes to the rescue, as Peyton explains.

  1. Working with XML Documents and Python
  2. Organizing a Book Collection
  3. Describing a Music Library
  4. The Document Object Model
By: Peyton McCullough
November 17, 2005




SAX is not the only way to process XML data in Python. The Document Object Model (DOM) gives us an object-oriented interface to XML data. Python's standard library includes a module called xml.dom.minidom that provides a simple DOM interface without a whole lot of bells and whistles. Let's recreate our music library parser, making use of DOM this time:

import xml.dom.minidom

# Load the music library
library = xml.dom.minidom.parse('music.xml')

# Get a list of the tracks
tracks = library.documentElement.getElementsByTagName('track')

# Go through each track
for track in tracks:

    # Print each track's information
    print('Track:  ' + track.childNodes[0].nodeValue)
    print('Artist: ' + track.attributes['artist'].nodeValue)
    print('Album:  ' + track.attributes['album'].nodeValue)
    print('Length: ' + track.attributes['time'].nodeValue)

We end up with a very short script in the above example. We start by pointing minidom's parse function at the file we wish to parse. Then we get every “track” element that belongs to the document's root element. We loop through the list provided, printing out the information contained within. To access the text inside the element, we use the track's childNodes list: the text is stored in a text node, which is the first child here, and we print out its nodeValue. The attributes are stored in attributes, and we reference them by name, printing out each one's nodeValue.
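To try these steps without a file on disk, the same calls can be run against a string. This is only a minimal sketch: the sample XML below is an assumed layout (a root element holding track elements with artist, album and time attributes), and the describe_tracks helper is a made-up name, not part of minidom.

```python
import xml.dom.minidom

# Hypothetical sample data; the real music.xml may be laid out differently
SAMPLE = (
    '<library>'
    '<track artist="Artist One" album="Album One" time="3:41">Song One</track>'
    '<track artist="Artist Two" album="Album Two" time="4:05">Song Two</track>'
    '</library>'
)

def describe_tracks(xml_text):
    """Return one dictionary per <track> element (illustrative helper)."""
    library = xml.dom.minidom.parseString(xml_text)
    tracks = library.documentElement.getElementsByTagName('track')
    result = []
    for track in tracks:
        result.append({
            # The title is stored in the first child node, a text node
            'track': track.childNodes[0].nodeValue,
            # Each attribute is looked up by name, then read via nodeValue
            'artist': track.attributes['artist'].nodeValue,
            'album': track.attributes['album'].nodeValue,
            'length': track.attributes['time'].nodeValue,
        })
    return result

for info in describe_tracks(SAMPLE):
    print('Track:  ' + info['track'])
    print('Artist: ' + info['artist'])
    print('Album:  ' + info['album'])
    print('Length: ' + info['length'])
```

Note that childNodes[0] only yields the title when the text really is the element's first child, which is an assumption about the data.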

Again, though, what if we don't know everything we are going to parse? Let's recreate the second example of the previous section using DOM:

import xml.dom.minidom

# Load the XML file
library = xml.dom.minidom.parse('music.xml')

# Get a list of tracks
tracks = library.documentElement.getElementsByTagName('track')

# Loop through the tracks
for track in tracks:

    # Print the track name
    print('Track: ' + track.childNodes[0].nodeValue)

    # Loop through the attributes, capitalizing the first letter of each name
    for attribute in track.attributes.keys():
        print(attribute[0].upper() + attribute[1:] + ': ' +
              track.attributes[attribute].nodeValue)

We loop through the attribute names returned by the keys method, printing the name of each attribute with its first letter capitalized. We then print the value of the attribute by referencing the attribute by its key and then accessing nodeValue.
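As a variation on the loop above, the attributes object also offers an items method that hands back name and value pairs together. The sketch below assumes a small made-up document (including a year attribute the script has never seen), and attribute_lines is a hypothetical helper name.

```python
import xml.dom.minidom

# Hypothetical sample data with an attribute we did not plan for
SAMPLE = (
    '<library>'
    '<track artist="Artist One" year="2005">Song One</track>'
    '</library>'
)

def attribute_lines(track):
    """Build one 'Name: value' line per attribute (illustrative helper)."""
    lines = []
    # items() returns (name, value) pairs, so nodeValue is not needed here
    for name, value in track.attributes.items():
        # Capitalize only the first letter and leave the rest untouched
        lines.append(name[0].upper() + name[1:] + ': ' + value)
    return lines

library = xml.dom.minidom.parseString(SAMPLE)
track = library.documentElement.getElementsByTagName('track')[0]

print('Track: ' + track.childNodes[0].nodeValue)
for line in attribute_lines(track):
    print(line)
```

Slicing is used instead of str.capitalize because capitalize would also lowercase the rest of the name.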

Let's say we have tags nested in each other. Consider our very first script that parsed a book collection. Let's rebuild it with DOM:

import xml.dom.minidom

# Load the book collection
collection = xml.dom.minidom.parse('collection.xml')

# Get a list of books
books = collection.documentElement.getElementsByTagName('book')

# Loop through the books
for book in books:

    # Print out the book's information
    print('Title:  ' + book.getElementsByTagName('title')[0].childNodes[0].nodeValue)
    print('Author: ' + book.getElementsByTagName('author')[0].childNodes[0].nodeValue)
    print('Genre:  ' + book.getElementsByTagName('genre')[0].childNodes[0].nodeValue)

We load the collection file and then get a list of books. Then we loop through the list and, for each book, look up its title, author and genre elements. The text wrapped inside each of those elements is stored in a child text node, so we print that node's nodeValue. It's all very simple.
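The repeated lookup in the script above can be folded into a small function. This is a sketch under assumptions: the sample collection below is invented to match the title, author and genre tags used in the script, and child_text is a hypothetical helper, not something minidom provides. Unlike the direct [0] indexing, it returns an empty string when a tag is missing or empty instead of raising an IndexError.

```python
import xml.dom.minidom

# Hypothetical sample data; the real collection.xml may differ
SAMPLE = (
    '<collection>'
    '<book><title>Book One</title><author>Author One</author>'
    '<genre>Fiction</genre></book>'
    '<book><title>Book Two</title><author>Author Two</author>'
    '<genre></genre></book>'
    '</collection>'
)

def child_text(parent, tag_name):
    """Return the text of the first tag_name element under parent, or ''."""
    elements = parent.getElementsByTagName(tag_name)
    if not elements:
        return ''
    # Join every text node directly inside the element
    return ''.join(
        node.data for node in elements[0].childNodes
        if node.nodeType == node.TEXT_NODE
    )

collection = xml.dom.minidom.parseString(SAMPLE)
books = collection.documentElement.getElementsByTagName('book')

for book in books:
    print('Title:  ' + child_text(book, 'title'))
    print('Author: ' + child_text(book, 'author'))
    print('Genre:  ' + child_text(book, 'genre'))
```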

DOM includes many more features, but all you need to simply read a document is contained within the minidom module.


XML is a useful tool for describing data. It can be used to describe just about anything, from book collections and music libraries to user settings for an application. Python's standard library includes utilities for reading and processing XML data, namely SAX and DOM. SAX allows you to create handler classes that process individual items as they are encountered in an XML document, and DOM abstracts the entire document, allowing you to easily navigate through a tree of XML data. Both tools are simple to use and help earn Python its “batteries included” reputation.
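To make the contrast concrete, here is a rough side-by-side sketch that pulls the same information out of one document with each tool. The sample data, the ArtistHandler class and both function names are invented for illustration.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample data
SAMPLE = (
    '<library>'
    '<track artist="Artist One">Song One</track>'
    '<track artist="Artist Two">Song Two</track>'
    '</library>'
)

class ArtistHandler(xml.sax.ContentHandler):
    """SAX style: react to each opening tag as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.artists = []

    def startElement(self, name, attrs):
        if name == 'track':
            self.artists.append(attrs.getValue('artist'))

def artists_with_sax(xml_text):
    handler = ArtistHandler()
    # Bytes are passed so the call works across Python 3 versions
    xml.sax.parseString(xml_text.encode('utf-8'), handler)
    return handler.artists

def artists_with_dom(xml_text):
    """DOM style: load the whole tree, then query it."""
    document = xml.dom.minidom.parseString(xml_text)
    return [track.getAttribute('artist')
            for track in document.getElementsByTagName('track')]

print(artists_with_sax(SAMPLE))
print(artists_with_dom(SAMPLE))
```

The SAX version never holds the whole document in memory, while the DOM version builds the full tree first and can then be queried in any order.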


