Working with XML Documents and Python
By: Peyton McCullough
    2005-11-17


    Table of Contents:
  • Working with XML Documents and Python
  • Organizing a Book Collection
  • Describing a Music Library
  • The Document Object Model

    Working with XML Documents and Python - Describing a Music Library
    ( Page 3 of 4 )

    While the XML structure above is fine for many things, packing simple properties into attributes often makes a document more compact. Let's consider a music library (consisting of individual songs rather than whole albums). Instead of creating individual tags for the album the song comes from, the artist of the song and the length of the song, which would get tiresome to type, we can simply create an attribute for each of these properties and keep the name of the song as the text of the tag. Here's an XML file that does this:

    <?xml version="1.0" encoding="UTF-8"?>
    <library>
       <track artist='Boston' album='Greatest Hits' time='5:04'>Peace of Mind</track>
       <track artist='Dire Straits' album='Sultans Of Swing-Best of Dire Straits' time='5:50'>Sultans of Swing</track>
       <track artist='Dire Straits' album='Sultans Of Swing-Best of Dire Straits' time='4:12'>Walk of Life</track>
       <track artist='The Eagles' album='The Very Best of The Eagles' time='3:33'>Take It Easy</track>
       <track artist='Gary Allan' album='Best I Ever Had' time='4:18'>Best I Ever Had</track>
       <track artist='Goo Goo Dolls' album='Dizzy Up the Girl' time='4:50'>Iris</track>
       <track artist='Kansas' album='The Ultimate Kansas' time='3:24'>Dust in the Wind</track>
       <track artist='Toby Keith' album='Greatest Hits 2' time='3:27'>How Do You Like Me Now?!</track>
       <track artist='Toby Keith' album='Greatest Hits 2' time='3:15'>Courtesy of the Red, White and Blue</track>
       <track artist='ZZ Top' album='ZZ Top-Greatest Hits' time='4:03'>Got Me Under Pressure</track>
    </library>

    If we had given every property of the song its own tag, the file would have been much longer. Putting the properties in attributes keeps each track down to a single element.
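To make the size difference concrete, here is a rough sketch that builds one track both ways and compares the results. It assumes Python 3 and uses the standard xml.etree.ElementTree module, which this article does not otherwise cover. The helper function names and the title, artist, album and time child tags are made up for illustration.

```python
import xml.etree.ElementTree as ET

def track_with_attributes(title, artist, album, time):
    """Hypothetical helper: properties as attributes, title as text."""
    element = ET.Element('track', artist=artist, album=album, time=time)
    element.text = title
    return element

def track_with_child_tags(title, artist, album, time):
    """Hypothetical helper: one child tag per property instead."""
    element = ET.Element('track')
    for tag, value in (('title', title), ('artist', artist),
                       ('album', album), ('time', time)):
        ET.SubElement(element, tag).text = value
    return element

info = ('Iris', 'Goo Goo Dolls', 'Dizzy Up the Girl', '4:50')

# encoding='unicode' makes tostring return a str rather than bytes
compact = ET.tostring(track_with_attributes(*info), encoding='unicode')
verbose = ET.tostring(track_with_child_tags(*info), encoding='unicode')

print(compact)
print(verbose)
print(len(compact), len(verbose))
```

The attribute form comes out shorter because each property name appears once instead of twice (an opening and a closing tag).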

    Now, however, we are left with the task of parsing the data. The process is similar to what we did above, but there are some differences since we are dealing with attributes this time around. Here's how it all works:

    import xml.sax

    # Create a class to handle the contents of the XML file
    class HandleLibrary ( xml.sax.ContentHandler ):

       def __init__ ( self ):

          self.artist = ''
          self.album = ''
          self.time = ''

       # Handle the start of an element
       def startElement ( self, name, attributes ):

          # Check to see if it is a "track" element
          # If so, store the attributes
          if name == 'track':
             self.artist = attributes.getValue ( 'artist' )
             self.album = attributes.getValue ( 'album' )
             self.time = attributes.getValue ( 'time' )

       # Handle content
       def characters ( self, content ):

          # If the content isn't just whitespace, then we have the track name
          # Print out all the track info
          if content.strip():
             print ( '' )
             print ( 'Track:  ' + content )
             print ( 'Artist: ' + self.artist )
             print ( 'Album:  ' + self.album )
             print ( 'Length: ' + self.time )

    # Parse the file
    parser = xml.sax.make_parser()
    parser.setContentHandler ( HandleLibrary() )
    parser.parse ( 'music.xml' )

    The script is neither complex nor lengthy. It strongly resembles the previous script, but note that we choose not to store anything in a dictionary. Rather, we just dump it all out to the user as we receive it. The attributes argument of the startElement method is an object representing all the attributes of that tag. We read each attribute by name with the getValue method and save the values in instance variables, which we then print in the characters method once the track name arrives. That's all there is to parsing attributes.
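Two details are worth keeping in mind with this approach: getValue raises a KeyError when an attribute is missing, and a SAX parser is allowed to deliver the text of one tag in several calls to characters. The following is only a sketch of one way to handle both, assuming Python 3. The class name BufferedLibraryHandler, the in-memory SAMPLE document and the 'unknown' fallback are made up for illustration.

```python
import xml.sax

# A small in-memory sample (an assumption; the article parses music.xml).
# The second track deliberately has no time attribute.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<library>
   <track artist='Kansas' album='The Ultimate Kansas' time='3:24'>Dust in the Wind</track>
   <track artist='Boston' album='Greatest Hits'>Peace of Mind</track>
</library>"""

class BufferedLibraryHandler(xml.sax.ContentHandler):
    """Hypothetical variant: collect the text, report at the end tag."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.tracks = []      # one dictionary per finished track
        self.current = None   # attributes of the track being read
        self.text = []        # chunks of character data seen so far

    def startElement(self, name, attributes):
        if name == 'track':
            self.current = {
                'artist': attributes.getValue('artist'),
                'album': attributes.getValue('album'),
                # get() returns the fallback instead of raising KeyError
                'time': attributes.get('time', 'unknown'),
            }
            self.text = []

    def characters(self, content):
        # May be called more than once for a single run of text
        if self.current is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == 'track':
            self.current['track'] = ''.join(self.text).strip()
            self.tracks.append(self.current)
            self.current = None

handler = BufferedLibraryHandler()
xml.sax.parseString(SAMPLE, handler)
for track in handler.tracks:
    print(track['track'], '-', track['artist'], '-', track['time'])
```

Waiting for endElement means each track is reported exactly once, however the parser chooses to split the text.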

    What if, however, we do not know the names of the attributes? It isn't too much of a problem, since we can loop through all the attributes and get their values:

    import xml.sax

    # Create a class to handle the XML
    class HandleLibrary ( xml.sax.ContentHandler ):

       def __init__ ( self ):

          self.attributes = None

       # Handle the beginning part of each tag
       def startElement ( self, name, attributes ):

          # Check to see if we're dealing with a track tag
          # If we are, store a copy of the attributes, since the parser
          # may reuse the object it passes in
          if name == 'track':
             self.attributes = attributes.copy()

       # Handle content
       def characters ( self, content ):

          if content.strip():

             # Loop through each attribute and print the name and value
             print ( '' )
             print ( 'Track: ' + content )
             for attribute in self.attributes.getNames():
                print ( attribute [ 0 ].upper() + attribute [ 1: ] + ': ' + self.attributes.getValue ( attribute ) )
             print ( '' )

    # Parse it all
    parser = xml.sax.make_parser()
    parser.setContentHandler ( HandleLibrary() )
    parser.parse ( 'music.xml' )

    In the above script, we use the getNames method to retrieve a list of attribute names. We loop through the list and print each attribute's name (with the first letter capitalized) and value.
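The attributes object also offers an items method that returns name and value pairs, which makes it easy to copy everything into an ordinary dictionary. Here is a minimal sketch along those lines, assuming Python 3. The class name AttributeDump and the one-track SAMPLE document are made up for illustration.

```python
import xml.sax

# A one-track sample (an assumption; the article parses music.xml)
SAMPLE = b"""<library>
   <track artist='ZZ Top' album='ZZ Top-Greatest Hits' time='4:03'>Got Me Under Pressure</track>
</library>"""

class AttributeDump(xml.sax.ContentHandler):
    """Hypothetical handler that records every attribute of every track."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.seen = []

    def startElement(self, name, attributes):
        if name == 'track':
            # items() yields (name, value) pairs; dict() copies them,
            # so the data is still ours after startElement returns
            self.seen.append(dict(attributes.items()))

handler = AttributeDump()
xml.sax.parseString(SAMPLE, handler)
for attrs in handler.seen:
    # Sort the names so the output order does not depend on the parser
    for key in sorted(attrs):
        print(key.capitalize() + ': ' + attrs[key])
```

Since nothing here names artist, album or time, the same handler keeps working if new attributes are added to the track tags later.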



     
     
    © 2003-2009 by Developer Shed. All rights reserved.