This tutorial explains how to parse an XML document using the SAX API implementation available for Python. There is more than one way to parse XML data with Python; in this article we will focus on its built-in SAX module.
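To give a first impression of what working with the built-in module looks like, here is a minimal sketch. It is not the example developed later in this article: the TitleCollector class, the collect_titles helper, and the sample document are hypothetical names made up for illustration, and the code assumes a current Python 3 interpreter rather than the minimum version mentioned below.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # True while the parser is inside a <title>
        self.titles = []        # finished title strings
        self._buffer = []       # text chunks of the current title

    def startElement(self, name, attrs):
        # Called by the parser for every opening tag.
        if name == "title":
            self.in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect and join later.
        if self.in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # Called by the parser for every closing tag.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self.in_title = False


# Made-up sample data, used only for this illustration.
SAMPLE = (b"<library><book><title>Dune</title></book>"
          b"<book><title>Emma</title></book></library>")


def collect_titles(xml_bytes):
    """Parse the given XML bytes and return the list of title texts."""
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


if __name__ == "__main__":
    print(collect_titles(SAMPLE))
```

The point to notice is that the parser drives the handler: you never ask for an element, you simply react to the events it reports as it reads through the document.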
You don’t have to be an expert in Python or XML to follow this article; it is an introduction rather than an in-depth analysis. I do assume that you have some basic knowledge of programming and, preferably, of Python syntax and structures. It will also help if you are familiar with the basic XML principles and terms.
In the next part of this article, I will describe the SAX classes of Python. Afterwards, I will use an example to show how the theory can be applied. In the last parts I will provide some homework and some links that will help you delve deeper into the subjects introduced in this article.
If you want to test the code in this tutorial, you will need Python 2.1 or later installed. I don’t provide any installation details; if you need them, I recommend that you check other sources, such as the article Getting Started With Python.
Our example is web-based, so it helps if Python is integrated with your web server, but you may modify the script to run as a standalone application or in any other way you desire.
By the end of the article you will be able to use Python to parse XML documents with the SAX interfaces, but let’s first make sure we cover the theory...