Python’s email package contains many useful classes for performing routine tasks with email. The most important of these is the “Message” class, the base class from which all other “message-part” classes in the MIME object hierarchy are derived. Any multipart message (for example, a message with attachments) is represented as a nested tree of “Message” objects, while a simple message is represented by a single “Message” object. Some useful methods on the “Message” object are:
__len__(): Gives the number of header fields for the message.
__getitem__(name): Returns the value of the name header field.
__setitem__(name, val): Sets the name header field to val.
__delitem__(name): Deletes the name header field.
attach(payload): Appends the given Message object to the message’s payload.
as_string(): Returns a string representation of the message, often used when sending the message through the SMTP object.
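The methods listed above can be tried out with a short sketch like the following. The addresses, subject lines, and body text are made-up placeholders, and the MIMEMultipart and MIMEText helper classes are used here only as a convenient way to show attach(); they are an illustrative choice rather than something this article has covered.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a simple message by hand (all values are placeholders).
message = Message()
message["From"] = "alice@example.com"   # __setitem__
message["To"] = "bob@example.com"
message["Subject"] = "Hello"
message.set_payload("Just a short note.")

header_count = len(message)             # __len__: number of header fields
subject = message["Subject"]            # __getitem__

# __setitem__ appends a header rather than replacing an existing one,
# so delete the old Subject before setting a new value.
del message["Subject"]                  # __delitem__
message["Subject"] = "Hello again"

flat_text = message.as_string()         # flat text, e.g. for sending via SMTP

# attach() applies to multipart containers: each attached part is
# itself a Message object stored in the container's payload list.
container = MIMEMultipart()
container["Subject"] = "With one part"
container.attach(MIMEText("Body text"))
part_count = len(container.get_payload())

print(flat_text)
```

Deleting a header before setting it again is worth the extra line, because assigning to an existing header name would otherwise leave two headers with that name in the message.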
The Email Parser
Another useful object is Parser. This object takes either a string representation of a message (for instance, a string downloaded from an email server) or a file containing flat text of an email and turns it into the appropriate Python “Message” object representing the email. There are two ways to parse an email message. First, you can simply create the whole message object structure to represent the entire message, including all the different message-parts. Second, it is possible to tell the parser to just parse the headers of a message.
When you are only interested in the headers of a message, the second method is significantly faster. It has limited usefulness in client-side operations, as you will generally be downloading message data from a remote server, in which case you will only download the data you wish to use. However, if you were writing a script to run over messages stored as files on a server, this can be useful if you want to get message statistics without having to parse the entire message body structure -- in case someone has a 30 megabyte attachment in their mailbox, for instance.
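As a rough sketch of the server-side case described above, the following collects the Subject header from every file in a directory while parsing headers only. The function name collect_subjects, the file names, and the sample messages are hypothetical and exist only for this illustration; a real script would point at an actual mail directory.

```python
import os
import tempfile
from email.parser import Parser


def collect_subjects(directory):
    """Return the Subject header of each file in `directory` (headers-only parse)."""
    parser = Parser()
    subjects = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        with open(path, "r", encoding="utf-8", errors="replace") as fp:
            # headersonly=True skips building the message-part structure.
            headers = parser.parse(fp, headersonly=True)
        subjects.append(headers["Subject"])
    return subjects


# Demonstration with a throwaway directory holding two tiny messages.
with tempfile.TemporaryDirectory() as maildir:
    samples = {
        "1.eml": "From: a@example.com\nSubject: First\n\nBody one\n",
        "2.eml": "From: b@example.com\nSubject: Second\n\nBody two\n",
    }
    for filename, text in samples.items():
        with open(os.path.join(maildir, filename), "w", encoding="utf-8") as fp:
            fp.write(text)
    subjects = collect_subjects(maildir)

print(subjects)
```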
Additionally, there are a couple of shortcut functions available that allow you to skip the actual creation of a Parser object. These functions are called “message_from_string” and “message_from_file.” They take a string or file pointer respectively and return the message object, without the coding overhead of creating a Parser object and then calling the actual method on the object. However, these functions don’t allow you to parse only headers.
The following are a few examples to show how to use these objects. In these cases, we assume that there is a string representation of a message in the variable “msg.” This could be built up directly from strings provided by the users or by downloading from a mail server, something that was described in the previous two articles in this series.
from email.parser import Parser
p = Parser()
emailMessage = p.parsestr(msg)  # parse the whole message, including all parts
msgJustHeaders = p.parsestr(msg, headersonly=True)  # parse only the headers
The second-to-last line parses the entire message, and the last line parses just the headers, as previously discussed. Alternatively, a full parse can be done with the shortcut function “message_from_string,” which takes the message string and returns the parsed message object directly.
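A minimal sketch of the shortcut form, assuming “msg” holds the flat text of a message (the sample text here is a made-up placeholder):

```python
import email

# Placeholder message text; in practice this would come from the user
# or from a mail server.
msg = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Shortcut test\n"
    "\n"
    "Hello from the shortcut.\n"
)

# Parses the entire message; there is no headers-only option here.
emailMessage = email.message_from_string(msg)

print(emailMessage["Subject"])
print(emailMessage.get_payload())
```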
The shortcut methods are great if you only need plain vanilla functionality for a relatively small set of messages, and you want the entire message parsed each time. In addition, parsing a message directly from a file would work the same as above, but you would substitute the “parse” method for “parsestr” in the first example, and change the shortcut method to “message_from_file.” In both examples, you would pass a file pointer object rather than a message string.
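The file-based variants might look like the sketch below. To keep it self-contained, an io.StringIO object stands in for a file opened in text mode, and the message text is a placeholder; with a real file you would pass the object returned by open() instead.

```python
import email
import io
from email.parser import Parser

raw = (
    "From: alice@example.com\n"
    "Subject: From a file\n"
    "\n"
    "Body read from a file-like object.\n"
)

# io.StringIO stands in for a text-mode file object in this sketch.
fp = io.StringIO(raw)

# Parser.parse() is the file counterpart of parsestr().
parsed = Parser().parse(fp)

# Rewind before reading the same data again.
fp.seek(0)
headers_only = Parser().parse(fp, headersonly=True)

fp.seek(0)
# message_from_file() is the file counterpart of message_from_string().
shortcut = email.message_from_file(fp)

print(parsed["Subject"], headers_only["From"], shortcut["Subject"])
```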