Python Email Libraries: SMTP and Email Parsing

The previous two articles in this series discussed connecting to the two major types of email servers and downloading messages. The task yet to be discussed is the actual sending of a message and how to represent an email message logically. In the first part of this article, we will discuss how to create the local data structure to represent an email message. The second part deals with actually connecting to an SMTP server and sending the message.

MIME Messaging

Python’s main library for dealing with email messages is called the email library, which contains many different objects. A good portion of these objects are concerned with creating and manipulating MIME objects. MIME stands for Multipurpose Internet Mail Extensions and is the standard for formatting email messages, especially email messages that carry attachments. A full discussion of MIME is quite extensive and beyond the scope of this article; however, some familiarity with the standard is useful for understanding how MIME messages are composed.

A MIME message is composed of many “parts,” each of which holds a separate piece of data. The data can be of many different types: text, an image, a media file, and so on. Most messages today are classified overall as “multipart,” which means they contain several parts, each potentially holding a different type of data. MIME defines several header fields that classify and separate the parts of an email message.

In particular, Python’s email libraries hide many of the details involved in creating MIME messages. These include encoding specific types of data so that other email clients know how to decode them, and generating the boundary strings that separate message parts. Both are tedious low-level tasks that the typical Python programmer doesn’t want to handle directly.
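
As a rough illustration of what the library handles for you, the sketch below builds a two-part message and lets the email package choose the transfer encoding and the boundary string. The subject, the body text, the sample bytes, and the “report.bin” name are all made up for the example.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a multipart container; no boundary string is supplied by hand.
message = MIMEMultipart()
message["Subject"] = "Report attached"

# One plain-text part and one binary part (the bytes are made-up sample data).
message.attach(MIMEText("The report is attached.", "plain"))
attachment = MIMEApplication(b"\x00\x01\x02 sample bytes", Name="report.bin")
message.attach(attachment)

# Flattening the message generates the boundary and writes every part.
flat = message.as_string()

print(message.get_content_type())               # multipart/mixed
print(attachment["Content-Transfer-Encoding"])  # base64
print(message.get_boundary() in flat)           # True
```

The binary part is base64-encoded automatically, and the boundary only exists once the message has been flattened with “as_string.”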

The Python Email Library

Python’s email library contains many useful classes for handling routine email tasks. The most important is the “Message” class, the base class from which all other “message-part” classes in the MIME object hierarchy are derived. Python represents a multipart message (for example, a message with attachments) as a nested tree of “Message” objects, while a simple message is a single “Message” object. Some useful methods on the “Message” object are:

__len__(): Gives the number of header fields for the message.

__getitem__(name): Returns the value of the name header field.

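__setitem__(name, val):  Adds a name header field with the value val. This appends a new field rather than overwriting an existing field of the same name, so delete the old field first if you want to replace it.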

__delitem__(name):  Deletes the name header field.

attach(payload):  Adds the payload Message object to the payload for the message.

as_string():  Returns a string representation of the message, often used when sending the message through the SMTP object.
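
The methods above are normally reached through ordinary Python syntax rather than called by name. The following is a minimal sketch, with invented header values, showing each one in use. It assumes a Python 3 interpreter.

```python
from email.message import Message

note = Message()
note["Subject"] = "First draft"    # __setitem__ adds a header
note["Subject"] = "Second draft"   # a second call appends; it does not overwrite
print(len(note))                   # __len__ counts both headers
print(note.get_all("Subject"))

del note["Subject"]                # __delitem__ removes every Subject header
note["Subject"] = "Final"
note.set_payload("Body text goes here.")

# attach() belongs on a container message that holds other Message objects.
container = Message()
container.set_type("multipart/mixed")
container.attach(note)
print(container.is_multipart())
print(container.as_string())
```

Note that “attach” is called on a separate container: a message whose payload is already a plain string cannot have parts attached to it.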

The Email Parser

Another useful object is Parser. This object takes either a string representation of a message (for instance, a string downloaded from an email server) or a file containing flat text of an email and turns it into the appropriate Python “Message” object representing the email. There are two ways to parse an email message.  First, you can simply create the whole message object structure to represent the entire message, including all the different message-parts. Second, it is possible to tell the parser to just parse the headers of a message. 

When you are only interested in the headers of a message, the second method is significantly faster. It has limited usefulness in client-side operations, as you will generally be downloading message data from a remote server, in which case you will only download the data you wish to use. However, if you were writing a script to run over messages stored as files on a server, this can be useful if you want to get message statistics without having to parse the entire message body structure — in case someone has a 30 megabyte attachment in their mailbox, for instance. 

Additionally, there are a couple of shortcut functions available that allow you to skip the actual creation of a Parser object.  These functions are called “message_from_string” and “message_from_file.” They take a string or file pointer respectively and return the message object, without the coding overhead of creating a Parser object and then calling the actual method on the object.  However, these functions don’t allow you to parse only headers. 

The following are a few examples to show how to use these objects. In these cases, we assume that there is a string representation of a message in the variable “msg.”  This could be built up directly from strings provided by the users or by downloading from a mail server, something that was described in the previous two articles in this series. 

from email.parser import Parser

p = Parser()
emailMessage = p.parsestr(msg)  # parse the entire message
msgJustHeaders = p.parsestr(msg, headersonly=True)  # parse only the headers

The second to last line parses the entire message, and the last line parses just the headers, as previously discussed. Alternatively, the same thing can be accomplished by using the shortcut methods like this:

import email

emailMessage = email.message_from_string(msg)

The shortcut methods are great if you only need plain vanilla functionality for a relatively small set of messages, and you want the entire message parsed each time. In addition, parsing a message directly from a file would work the same as above, but you would substitute the “parse” method for “parsestr” in the first example, and change the shortcut method to “message_from_file.” In both examples, you would pass a file pointer object rather than a message string.
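
A small sketch of the file-based variants follows. To keep it self-contained, it uses “io.StringIO” as a stand-in for a file opened in text mode, and the message text is a made-up sample.

```python
import email
import io
from email.parser import Parser

# A small hand-written message standing in for one stored on disk.
raw = "From: a@example.com\nTo: b@example.com\nSubject: Hello\n\nBody line.\n"

# io.StringIO behaves like a text file opened for reading.
full = Parser().parse(io.StringIO(raw))
headers_only = Parser().parse(io.StringIO(raw), headersonly=True)
shortcut = email.message_from_file(io.StringIO(raw))

print(full["Subject"], "|", full.get_payload())
print(headers_only["From"])
print(shortcut["To"])
```

With a real file, you would pass the object returned by “open” in place of the “StringIO” object.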

Manipulating Message Objects

One of the major tasks you will want to accomplish when working with email messages is header manipulation. You can add, delete, modify, and list the headers associated with an email message. Let’s assume you have parsed a message as above and the resulting “Message” object is stored in the variable “emailMessage.” The following code modifies its headers:

fields = emailMessage.keys()
if 'To' in emailMessage:
    del emailMessage['To']  # calls __delitem__
emailMessage['To'] = 'test@yourdomain.com'  # calls __setitem__
subj = emailMessage['Subject']  # calls __getitem__
print("The message contains the following keys:\n")
for field in fields:
    print(field + "\n")

The preceding code grabs the header fields from the message. It then checks to see whether the message contains a “To” field. If it does, it deletes that field.  Next, it sets the “To” field to the value test@yourdomain.com. After that, it fetches the “Subject” field and then prints out all of the field names on separate lines. 

The above code is useful for fairly simple actions with email, and for many applications this will be sufficient. However, the Python email library also makes it easy to manipulate much more complex header fields through the various “param” methods. Some headers carry parameters, which let them hold additional data in a compact form and associate that data with a particular purpose. The standard example is the “Content-Type” header.

Usually, when you are working with MIME messages, the constructors for the various MIME Message objects take care of setting the “Content-Type” header correctly, as does parsing a message with the parsing methods described above, so you won’t necessarily have to set this header manually. However, if you are defining a content type or subtype that isn’t covered by the built-in Python classes, this becomes important. To set a parameter, you use the “set_param” method, and to retrieve a parameter, you use “get_param.” In addition, Python provides a ready-made “set_type” method, which takes a string of the form “maintype/subtype” and sets the Content-Type header appropriately.
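
The sketch below shows these three methods together. The “text/x-log” subtype and the “source” parameter are hypothetical, chosen only to illustrate a type the built-in classes don’t cover.

```python
from email.message import Message

part = Message()
part.set_type("text/x-log")                 # hypothetical subtype
part.set_param("charset", "utf-8")          # added to Content-Type by default
part.set_param("source", "billing-server")  # made-up parameter

print(part["Content-Type"])
print(part.get_content_type())
print(part.get_param("charset"))
print(part.get_param("source"))
print(part.get_param("missing"))            # None when the parameter is absent
```

By default, “set_param” and “get_param” operate on the Content-Type header; both accept a “header” argument if you need to work with a different one.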

Sending Email Messages

The first step in sending an email message is to instantiate an “SMTP” object with the IP address and port of the remote SMTP server where the client should connect. If these arguments aren’t supplied, you will have to call the “connect” method after instantiation to connect to the server. Once you’ve created the SMTP object, normally all you will need to use are the “sendmail” method and “quit” to close the connection. 

In addition to these methods, the SMTP library also allows a client to authenticate with a username and password for servers that require authentication. You can also send an SMTP command directly to the server using the “docmd” method. This is great for writing your own SMTP extensions, or for beginning to implement your own SMTP class. Finally, you can issue the SMTP “VRFY” command through the “verify” method to ask whether a server recognizes a specific address. However, remember that this functionality is often disabled on servers today for security reasons and to prevent spammers from taking advantage of it to send junk email.
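
The following is a minimal sketch of these extra methods. The function name “probe_server” and its arguments are invented for illustration, and it assumes a server that accepts a plain connection; many real servers will require an encrypted session before they accept a login.

```python
import smtplib


def probe_server(host, port, address, username=None, password=None):
    """Connect, optionally log in, then issue NOOP and VRFY (hypothetical helper)."""
    server = smtplib.SMTP(host, port)
    try:
        if username is not None:
            server.login(username, password)
        # docmd() sends a raw command and returns (reply code, reply text).
        noop_code, noop_reply = server.docmd("NOOP")
        # verify() issues VRFY; servers with VRFY disabled answer with a
        # non-committal or error code.
        vrfy_code, vrfy_reply = server.verify(address)
    finally:
        server.quit()
    return {"noop": noop_code, "vrfy": vrfy_code}
```

A call such as probe_server("mail.example.com", 25, "user@example.com") would return the two reply codes, which you can then compare against the codes defined by the SMTP standard.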

The SMTP class’s “sendmail” method is the most interesting for this article. When you call it, you pass the “From:” address, the “To:” address (or a list of addresses), and the full message as a string, in that order. To make parsing email useful, you will probably want to send the messages you’ve parsed at some point, or send an email from a “Message” object structure you have built yourself. Either way, you need a string representation of your “Message” objects. This is as easy as calling the “as_string” method on the top “Message” object in the hierarchy, which returns a string-encoded representation of the message ready for sending with the SMTP object. Following is an example of sending a message. Again we will assume that “emailMessage” is a Message object with data about some email.

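from smtplib import SMTP

server = SMTP("123.231.12.1")
server.login("username", "password")
# sendmail takes the sender, the recipient(s), and the message as a string
server.sendmail(emailMessage['from'], emailMessage['to'], emailMessage.as_string())
server.quit()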
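
For a message built locally rather than parsed, the whole flow might look like the sketch below. The “send_text” helper, its parameters, and the addresses in the usage note are assumptions made for the example, not part of the library.

```python
import smtplib
from email.mime.text import MIMEText


def send_text(host, port, sender, recipients, subject, body):
    """Build a plain-text message and hand it to sendmail() (hypothetical helper)."""
    message = MIMEText(body)
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject

    server = smtplib.SMTP(host, port)
    try:
        # sendmail() returns a dict of recipients the server refused;
        # an empty dict means every recipient was accepted.
        return server.sendmail(sender, recipients, message.as_string())
    finally:
        server.quit()
```

You could call it as send_text("mail.example.com", 25, "me@example.com", ["you@example.com"], "Hi", "Hello there."). The try/finally block makes sure the connection is closed even if the server rejects the message.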

 

Conclusion

In general, sending email with Python is very easy. You can take as much or as little control over the process as you wish. It’s certainly possible to jump in with just the built-in classes and build most email applications. However, if you need to go deeper, the library provides many tools for doing just that.
